
Add metadata fields

Add, modify, or delete metadata fields across CKAN and its related components, including DCAT-AP schema updates, Solr search configuration, SeMPyRO, the Discovery Service, and the FAIR Data Point (FDP).

In this guide

CKAN DCAT model
CKAN scheming schemas
Solr search integration
SeMPyRO
FAIR Data Point (FDP)
Discovery Service

CKAN DCAT model

Use the GDI-maintained DCAT extension when adding fields that affect DCAT parsing, serialisation, or mapping between DCAT and CKAN:

  1. Clone the GDI DCAT extension repository:

    git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-dcat.git
  2. Add the new field to the relevant DCAT schema.

    • For DCAT-AP fields, update files such as ckanext/dcat/schemas/dcat_ap_full.yaml.
    • For HealthDCAT fields, update files such as ckanext/dcat/schemas/health_dcat_ap.yaml or ckanext/dcat/schemas/health_dcat_ap_multilingual.yaml.
    • For dataset series fields, review ckanext/dcat/schemas/dcat_ap_dataset_series.yaml and ckanext/dcat/schemas/dcat_ap_in_series.yaml.
    • Use appropriate field types (e.g., text, repeating subfield, URI).
    • Follow examples from other fields for consistency.
  3. Extend the existing mapping in the profile for each DCAT-AP version concerned. The mapping files are located in ckanext/dcat/profiles.

    Example

    Review existing multi-valued fields such as creator in the GDI DCAT fork for mapping and test patterns.

  4. Update any existing unit tests affected by the new field.

  5. Create a pull request to gdi-userportal-ckanext-dcat.

    • Include unit tests for the new fields.
    • Ensure compatibility across different DCAT-AP versions.
  6. Update the following repositories after a new release. Update development and production Dockerfiles in these repositories (order is important):

    Check that CKAN works locally with the newly added fields by harvesting an example FDP.

Note

Always handle the CKAN → DCAT mapping (serialisation) as well as the DCAT → CKAN mapping (parsing).
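One way to honour this rule is a round-trip check: parse DCAT values into a CKAN field, serialise them back, and compare. The sketch below is only an illustration under simplifying assumptions. It models RDF as plain (subject, predicate, object) tuples rather than the rdflib graphs that the profiles in ckanext/dcat/profiles work on, and the field name custom_field and its predicate URI are hypothetical.

```python
# Minimal round-trip sketch: DCAT -> CKAN (parse) and CKAN -> DCAT (serialise).
# Triples are plain tuples; the field name and predicate URI are hypothetical.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping: CKAN field name -> RDF predicate URI.
FIELD_TO_PREDICATE = {
    "custom_field": "http://example.com/ns#customField",
}


def parse_dataset(triples: Set[Triple], subject: str) -> Dict[str, List[str]]:
    """DCAT -> CKAN: collect every value of each mapped predicate (multi-valued)."""
    package: Dict[str, List[str]] = {}
    for field_name, predicate in FIELD_TO_PREDICATE.items():
        values = sorted(o for s, p, o in triples if s == subject and p == predicate)
        if values:
            package[field_name] = values
    return package


def serialise_dataset(package: Dict[str, List[str]], subject: str) -> Set[Triple]:
    """CKAN -> DCAT: emit one triple per value of each mapped field."""
    triples: Set[Triple] = set()
    for field_name, predicate in FIELD_TO_PREDICATE.items():
        for value in package.get(field_name, []):
            triples.add((subject, predicate, value))
    return triples


if __name__ == "__main__":
    subject = "http://example.com/dataset/1"
    source = {
        (subject, "http://example.com/ns#customField", "a"),
        (subject, "http://example.com/ns#customField", "b"),
    }
    package = parse_dataset(source, subject)
    # The round trip should reproduce exactly the mapped triples.
    assert serialise_dataset(package, subject) == source
    print(package)  # {'custom_field': ['a', 'b']}
```

A test along these lines catches the common mistake of implementing only the parsing direction, because a field that is parsed but never serialised makes the round trip lose triples.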

CKAN scheming schemas

Use the GDI User Portal CKAN extension when adding fields that should appear in CKAN dataset forms or be stored as CKAN package fields:

  1. Clone the GDI User Portal CKAN extension repository:

    git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-gdi-userportal.git
  2. Update the active scheming schemas. The Docker setup configures these files through scheming.dataset_schemas:

    • ckanext/gdi_userportal/scheming/schemas/dataset_multilingual.yaml
    • ckanext/gdi_userportal/scheming/schemas/dataset_series_multilingual.yaml
  3. Update GDI-specific presets if the field needs custom rendering or validation.

    • ckanext/gdi_userportal/scheming/presets/gdi_presets.yaml
  4. Check the Docker scheming configuration. The active configuration is maintained in gdi-userportal-ckan-docker, in ckan/docker-entrypoint.d/setup_scheming.sh.
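Because the Docker setup loads both multilingual schemas, a field added to one schema but forgotten in the other is easy to miss. The hypothetical helper below reports which schemas do not define a given field in their dataset_fields list. It assumes the YAML files have already been loaded into dictionaries (for example with PyYAML's yaml.safe_load), and the field name custom_field is only an example. Whether a field belongs in the series schema as well depends on your use case.

```python
# Sketch: report active scheming schemas that do not define a given field.
# Schemas are assumed to be already-loaded dicts with a "dataset_fields" list,
# as in ckanext-scheming YAML files; "custom_field" is a hypothetical name.
from typing import Dict, List


def schemas_missing_field(schemas: Dict[str, dict], field_name: str) -> List[str]:
    """Return the names of schemas whose dataset_fields lack field_name."""
    missing = []
    for schema_name, schema in schemas.items():
        names = {f.get("field_name") for f in schema.get("dataset_fields", [])}
        if field_name not in names:
            missing.append(schema_name)
    return sorted(missing)


if __name__ == "__main__":
    active = {
        "dataset_multilingual.yaml": {
            "dataset_fields": [
                {"field_name": "title", "label": "Title"},
                {"field_name": "custom_field", "label": "Custom field"},
            ],
        },
        "dataset_series_multilingual.yaml": {
            "dataset_fields": [{"field_name": "title", "label": "Title"}],
        },
    }
    print(schemas_missing_field(active, "custom_field"))
    # ['dataset_series_multilingual.yaml']
```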

For field keys and schema syntax, see the CKAN scheming documentation and Work with CKAN schemas.

Solr search integration

To make new CKAN fields searchable via Solr, modify the schema.xml file.

To add and configure a searchable field:

  1. Define the field type and name. In the top part of the schema.xml file, define the type and name of the new field. The type specifies how Solr handles the data in the field (for example, as text, integers, or dates).

    Find the section in schema.xml where other fields are defined, and then add your new field with its corresponding type.

    Example:

    <field name="custom_field" type="string" indexed="true" stored="true" />

    In this example, custom_field is the field name with string type. The indexed="true" attribute enables searching, while stored="true" allows retrieval in results.

    Indexing vs Storing
    • indexed="true": The field can be used in searches.
    • stored="true": The field can be retrieved in search results.
  2. Add the field to search. In the lower part of the schema.xml file, add a copyField directive to include the new field in the search index. This allows Solr to use the contents of the new field when performing searches.

    Example:

    <copyField source="custom_field" dest="text" />

    This example copies the contents of custom_field into the text field, which Solr uses for full-text searches, so free-text queries also match values of custom_field.

  3. Release a new version and update. After modifying the schema.xml file, release a new version of the Solr configuration. Then, update the GenomicDataInfrastructure/gdi-userportal-ckan-docker repository to ensure that the new Solr configuration is used in both development and production environments when running CKAN with Docker Compose.

  4. Test your configuration. Restart your Solr instance, then rebuild the CKAN search index so that the new field is indexed and searchable:

    ckan -c /etc/ckan/default/ckan.ini search-index rebuild
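Before releasing a new Solr configuration, it can help to check the edited schema.xml automatically. The sketch below uses Python's standard xml.etree.ElementTree to report whether a field is declared, indexed, stored, and copied into the text field, mirroring steps 1 and 2. It is a hypothetical helper, not part of the GDI tooling, and it expects indexed and stored to be set explicitly on the field, as in the examples above, rather than inherited from the field type.

```python
# Sketch: check that schema.xml makes a field searchable as described above.
# Parsing only; the file itself is edited by hand. Field names are examples.
import xml.etree.ElementTree as ET
from typing import List


def check_searchable_field(schema_xml: str, name: str) -> List[str]:
    """Return a list of problems; an empty list means the field looks searchable."""
    root = ET.fromstring(schema_xml)
    problems = []
    # Step 1: the <field> must exist; indexed enables search, stored enables retrieval.
    # Attributes are required to be explicit here (inheritance is not resolved).
    field = next((f for f in root.iter("field") if f.get("name") == name), None)
    if field is None:
        problems.append(f"no <field name={name!r}> defined")
    else:
        if field.get("indexed") != "true":
            problems.append("field is not indexed, so it cannot be searched")
        if field.get("stored") != "true":
            problems.append("field is not stored, so it is not returned in results")
    # Step 2: a <copyField> must feed the field into the full-text "text" field.
    copied = any(c.get("source") == name and c.get("dest") == "text"
                 for c in root.iter("copyField"))
    if not copied:
        problems.append(f"no <copyField source={name!r} dest='text'/>")
    return problems


if __name__ == "__main__":
    schema = """<schema name="example" version="1.6">
  <fields>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
    <field name="custom_field" type="string" indexed="true" stored="true" />
  </fields>
  <copyField source="custom_field" dest="text" />
</schema>"""
    print(check_searchable_field(schema, "custom_field"))  # []
```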

SeMPyRO

SeMPyRO validates and transforms metadata between different semantic formats. To extend SeMPyRO with new fields, define the field's RDF properties in the appropriate Python class.

  1. Define field properties. Before adding a field, identify these required properties:

    • Predicate - The RDF term for the field
    • Cardinality - Single or multiple-valued
    • Range - The datatype or class
  2. Add the field to the class. For HealthDCAT-AP fields, use the relevant class under sempyro.healthdcatap and add a property definition. Example for the health_theme property in HEALTHDCATAPDataset:

    health_theme: List[AnyHttpUrl] = Field(
        default=None,
        description="A category of the Dataset or tag describing the Dataset.",
        json_schema_extra={
            "rdf_term": HEALTHDCATAP.healthTheme,
            "rdf_type": "uri",
        },
    )

    Each field is defined as a class property with the following structure:

    • Line 1: Property name and range. Use List[] for multi-valued fields (cardinality > 1). Common range types include AnyHttpUrl, LiteralField, or classes like Agent or VCard.
    • Line 2: Set default=None for optional fields. Omit this line for mandatory fields.
    • Line 3: Human-readable description of the field.
    • Line 4: json_schema_extra containing the RDF mapping metadata.
    • Line 5: RDF predicate (for example, HEALTHDCATAP.healthTheme). Common namespaces such as DCTERMS, DCAT, and HEALTHDCATAP are imported by default. Define custom predicates with URIRef("http://example.com/range#property").
    • Line 6: RDF type such as rdfs_literal, xsd:string, or uri. Review other properties in the class for guidance.
  3. Regenerate schemas. Regenerate the JSON and YAML schemas. For the HEALTHDCATAPDataset class:

    hatch run python sempyro/healthdcatap/healthdcatap_dataset.py
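To see why each field carries a predicate, a cardinality, and an RDF type, it can help to model the same metadata without pydantic. The sketch below is a stdlib-only analogue, not SeMPyRO's implementation. It stores rdf_term and rdf_type in dataclasses field metadata (standing in for json_schema_extra) and emits one triple per value, so a list-valued field produces several triples. The namespace URI, the ExampleDataset class, and the version_notes field are hypothetical placeholders.

```python
# Stdlib-only analogue of a SeMPyRO field definition (not SeMPyRO's own code).
# dataclasses metadata stands in for pydantic's json_schema_extra.
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple

EX = "http://example.com/healthdcatap#"  # hypothetical namespace


@dataclass
class ExampleDataset:
    # Multi-valued (cardinality > 1) and optional, like health_theme above.
    health_theme: Optional[List[str]] = field(
        default=None,
        metadata={"rdf_term": EX + "healthTheme", "rdf_type": "uri"},
    )
    # Single-valued, optional literal (hypothetical field).
    version_notes: Optional[str] = field(
        default=None,
        metadata={"rdf_term": EX + "versionNotes", "rdf_type": "rdfs_literal"},
    )


def to_triples(obj, subject: str) -> List[Tuple[str, str, str, str]]:
    """Emit (subject, predicate, object, rdf_type) for every value that is set."""
    triples = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Lists yield one triple per item; single values are wrapped in a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                triples.append((subject, f.metadata["rdf_term"], v, f.metadata["rdf_type"]))
    return triples


if __name__ == "__main__":
    ds = ExampleDataset(health_theme=["http://example.com/theme/a",
                                      "http://example.com/theme/b"])
    for triple in to_triples(ds, "http://example.com/dataset/1"):
        print(triple)
```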

FAIR Data Point (FDP)

FDP field changes are managed through the GDI metadata repository, not by manually editing metadata schemas in the FDP UI.

To add or update a field for FDP:

  1. Update the source SHACL shapes and metadata documentation in gdi-metadata.

  2. Review schema-tool/Properties.yaml if the field changes how shapes are combined, inherited, or published.

  3. Publish the updated SHACLs with the automated schema tool:

    cd schema-tool
    docker compose up
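The same three properties identified for SeMPyRO (predicate, cardinality, range) are what a SHACL property shape constrains. As a rough illustration, the hypothetical helper below renders a Turtle sh:property block from them using the standard SHACL terms sh:path, sh:minCount, sh:maxCount, sh:datatype, and sh:nodeKind. It is not the format or tooling used in gdi-metadata, and the prefixes are assumed to be declared elsewhere; compare its output with the existing shapes there before writing your own.

```python
# Hypothetical helper: render a SHACL property shape (Turtle) for one field.
# Uses standard SHACL terms; prefixes (sh:, xsd:, ...) are assumed to be
# declared elsewhere in the shapes file.
from typing import Optional


def property_shape(path: str, min_count: int = 0, max_count: Optional[int] = None,
                   datatype: Optional[str] = None, iri: bool = False) -> str:
    """Return a Turtle sh:property block; max_count=None leaves the upper bound open."""
    lines = [f"sh:path {path} ;"]
    if min_count > 0:
        # A minimum count of 1 or more makes the field mandatory.
        lines.append(f"sh:minCount {min_count} ;")
    if max_count is not None:
        lines.append(f"sh:maxCount {max_count} ;")
    if iri:
        lines.append("sh:nodeKind sh:IRI ;")
    elif datatype:
        lines.append(f"sh:datatype {datatype} ;")
    # Turtle allows a trailing ';' before ']', so every line can end with ';'.
    body = "\n    ".join(lines)
    return f"sh:property [\n    {body}\n] ;"


if __name__ == "__main__":
    # Optional, multi-valued IRI field; the healthdcatap: prefix is hypothetical.
    print(property_shape("healthdcatap:healthTheme", iri=True))
    # Mandatory, single-valued string.
    print(property_shape("dct:title", min_count=1, max_count=1, datatype="xsd:string"))
```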

For FDP deployment and User Portal connection details, refer to the starter-kit deployment guide.

Discovery Service

Update the Discovery Service to include the new field in both the OpenAPI definitions and the mapping between CKAN and the Discovery Service.

  1. Update the OpenAPI definitions. Include the new field in both the CKAN API and the Discovery Service API definitions. Both files are located in the src/main/openapi folder:

    • ckan.yaml: Describes the API responses returned by CKAN. Java classes are generated automatically from this definition. To add a field to a dataset, the main change is usually in the CkanPackage definition; follow the existing properties there as examples.
    • discovery.yaml: Defines what the Discovery Service returns. This definition does not have to mirror CKAN one-to-one. To add a property here, modify the RetrievedDataset definition, again following the existing examples in the file.
  2. Regenerate the Java classes. Run the following command to regenerate the Java classes from the updated OpenAPI definitions:

    mvn clean compile
    Expected errors

    This command regenerates the classes reflecting the OpenAPI objects. Compilation errors are expected until the mapping is completed in the next step.

  3. Add the mapping between the CKAN and Discovery service fields.

    • Modify the RetrievedDatasetBuilder in src/main/java/io/github/genomicdatainfrastructure/discovery/utils/PackageShowMapper.java.
    • Review existing field mappings in this file for implementation patterns.
  4. Update test cases.

    • Update the test cases in src/test/java/io/github/genomicdatainfrastructure/discovery/services/PackageShowMapperTest.java.
    • Update both empty and filled dataset examples, ensuring that both the CkanPackage objects (representing CKAN API output) and the expected RetrievedDataset output reflect the new fields.
  5. Verify the implementation with automated testing (mvn test) and manual testing. Run the application (mvn compile quarkus:dev) and use Postman to confirm that the mapping and output match expectations.
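The mapping and its tests follow a simple pattern: copy the new value from the CKAN package into the Discovery output, and check that both an empty and a filled package produce the expected result. The real code is Java (PackageShowMapper and PackageShowMapperTest); the Python sketch below only illustrates that pattern with dictionaries, and the field names custom_field and customField are hypothetical.

```python
# Pattern sketch (Python, not the Java code): map a CKAN package to a
# Discovery-style dataset and cover both empty and filled inputs.
from typing import Any, Dict


def to_retrieved_dataset(ckan_package: Dict[str, Any]) -> Dict[str, Any]:
    """Map CKAN package fields to Discovery output fields (hypothetical names)."""
    return {
        "id": ckan_package.get("id"),
        "title": ckan_package.get("title"),
        # New field: the Discovery name need not match the CKAN name.
        "customField": ckan_package.get("custom_field"),
    }


def test_empty_package() -> None:
    # An empty CKAN package maps to None values instead of raising an error.
    assert to_retrieved_dataset({}) == {"id": None, "title": None, "customField": None}


def test_filled_package() -> None:
    package = {"id": "ds-1", "title": "Example", "custom_field": "value"}
    expected = {"id": "ds-1", "title": "Example", "customField": "value"}
    assert to_retrieved_dataset(package) == expected


if __name__ == "__main__":
    test_empty_package()
    test_filled_package()
    print("both mapping tests passed")
```

Keeping one test for an empty package and one for a fully populated package makes it obvious when a new field is mapped in one case but not handled in the other.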