Add metadata fields
Add, modify, or delete metadata fields across the CKAN ecosystem, including DCAT-AP schema updates, Solr search configuration, SeMPyRO, Discovery Service, and FAIR Data Point (FDP).
In this guide
CKAN DCAT model
CKAN scheming schemas
Solr search integration
SeMPyRO
FAIR Data Point (FDP)
Discovery Service
CKAN DCAT model
Use the GDI-maintained DCAT extension when adding fields that affect DCAT parsing, serialisation, or mapping between DCAT and CKAN:
- Clone the GDI DCAT extension repository:
    git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-dcat.git
- Add the new field to the relevant DCAT schema.
  - For DCAT-AP fields, update files such as ckanext/dcat/schemas/dcat_ap_full.yaml.
  - For HealthDCAT fields, update files such as ckanext/dcat/schemas/health_dcat_ap.yaml or ckanext/dcat/schemas/health_dcat_ap_multilingual.yaml.
  - For dataset series fields, review ckanext/dcat/schemas/dcat_ap_dataset_series.yaml and ckanext/dcat/schemas/dcat_ap_in_series.yaml.
  - Use appropriate field types (e.g., text, repeating subfield, URI).
  - Follow examples from other fields for consistency.
- Extend the existing mapping for the relevant DCAT-AP version. The mapping files are located in the ckanext/dcat/profiles directory.
  Example: Review existing multi-valued fields such as creator in the GDI DCAT fork for mapping and test patterns.
- Fix the corresponding unit tests.
- Create a pull request to gdi-userportal-ckanext-dcat.
  - Include unit tests for the new fields.
  - Ensure compatibility across different DCAT-AP versions.
- After a new release, update the development and production Dockerfiles in the following repositories (order is important):
  - https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-fairdatapoint
  - https://github.com/GenomicDataInfrastructure/gdi-userportal-ckan-docker
Check that CKAN works locally with the newly added fields by harvesting an example FDP.
Always take into account the mapping from CKAN → DCAT in addition to DCAT → CKAN.
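Part of the local check can be scripted. The following is a minimal sketch, not part of the GDI tooling: it assumes a local CKAN instance at http://localhost:5000 and uses placeholder dataset and field names. It relies only on CKAN's standard package_show action and reports which expected fields are absent or empty in a harvested dataset.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CKAN_URL = "http://localhost:5000"  # assumption: local development instance


def fetch_package(dataset_id, base_url=CKAN_URL):
    """Return the dataset dict from CKAN's package_show action."""
    query = urlencode({"id": dataset_id})
    with urlopen(f"{base_url}/api/3/action/package_show?{query}") as response:
        payload = json.load(response)
    return payload["result"]


def missing_fields(package, expected):
    """List the expected field names that are absent or empty in the package."""
    empty_values = (None, "", [], {})
    return [name for name in expected if package.get(name) in empty_values]


# Usage against a running instance ("example-dataset" and "my_new_field"
# are placeholders, not real identifiers):
#   package = fetch_package("example-dataset")
#   print(missing_fields(package, ["title", "my_new_field"]))
```

An empty result for the new field names suggests that the DCAT → CKAN direction works for that dataset; the CKAN → DCAT direction still needs to be checked separately on the serialised output.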
CKAN scheming schemas
Use the GDI User Portal CKAN extension when adding fields that should appear in CKAN dataset forms or be stored as CKAN package fields:
- Clone the GDI User Portal CKAN extension repository:
    git clone https://github.com/GenomicDataInfrastructure/gdi-userportal-ckanext-gdi-userportal.git
- Update the active scheming schemas. The Docker setup configures these files through scheming.dataset_schemas:
  - ckanext/gdi_userportal/scheming/schemas/dataset_multilingual.yaml
  - ckanext/gdi_userportal/scheming/schemas/dataset_series_multilingual.yaml
- Update GDI-specific presets if the field needs custom rendering or validation:
  - ckanext/gdi_userportal/scheming/presets/gdi_presets.yaml
- Check the Docker scheming configuration. The active configuration is maintained in gdi-userportal-ckan-docker, in ckan/docker-entrypoint.d/setup_scheming.sh.
For field keys and schema syntax, see the CKAN scheming documentation and Work with CKAN schemas.
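To confirm that a new field is actually declared in an active schema, the schema can be inspected as data. The sketch below assumes the schema is already available as a Python dict, for example from the scheming_dataset_schema_show action of ckanext-scheming or from a parsed YAML file. The sample schema and its field names are illustrative only.

```python
def field_names(fields, prefix=""):
    """Return all field names, with repeating subfields as 'parent.child'."""
    names = []
    for field in fields:
        name = prefix + field["field_name"]
        names.append(name)
        # Scheming declares nested fields under "repeating_subfields".
        subfields = field.get("repeating_subfields", [])
        names.extend(field_names(subfields, prefix=name + "."))
    return names


# Illustrative schema fragment, not copied from the GDI schemas.
sample_schema = {
    "dataset_fields": [
        {"field_name": "title"},
        {
            "field_name": "creator",
            "repeating_subfields": [
                {"field_name": "name"},
                {"field_name": "email"},
            ],
        },
    ]
}

declared = field_names(sample_schema["dataset_fields"])
print("creator.name" in declared)
```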
Solr search integration
To make new CKAN fields searchable via Solr, modify the schema.xml file.
To add and configure a searchable field:
- Define the field type and name. In the top part of the schema.xml file, define the type and name of the new field. The type specifies how Solr handles the data in the field (e.g., as text, integers, or dates). Find the section in schema.xml where other fields are defined, and add your new field with its corresponding type.
  Example:
    <field name="custom_field" type="string" indexed="true" stored="true" />
  In this example, custom_field is the field name with string type. The indexed="true" attribute enables searching, while stored="true" allows retrieval in results.
  Indexing vs Storing
  - indexed="true": The field can be used in searches.
  - stored="true": The field can be retrieved in search results.
- Add the field to search. In the lower part of the schema.xml file, add a copyField directive to include the new field in the search index. This allows Solr to use the contents of the new field when performing searches.
  Example:
    <copyField source="custom_field" dest="text" />
  This example copies custom_field to the text field, which Solr uses for full-text searches.
- Release a new version and update. After modifying the schema.xml file, release a new version of the Solr configuration. Then update the GenomicDataInfrastructure/gdi-userportal-ckan-docker repository so that the new Solr configuration is used in both development and production environments when running CKAN with Docker Compose.
- Test your configuration. After making these changes, restart your Solr instance and reindex your CKAN data so that the new field is indexed and searchable:
    ckan -c /etc/ckan/default/ckan.ini search-index rebuild
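Before releasing the Solr configuration, the two schema.xml changes described above can be checked with a short script. This is a minimal sketch that uses only the Python standard library. The embedded fragment stands in for the real schema.xml, and check_field is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Stand-in for the real schema.xml; only the two relevant elements are shown.
SCHEMA_FRAGMENT = """
<schema name="ckan" version="1.6">
  <field name="custom_field" type="string" indexed="true" stored="true" />
  <copyField source="custom_field" dest="text" />
</schema>
"""


def check_field(schema_xml, field_name, dest="text"):
    """Report whether a field is defined, indexed, stored, and copied to dest."""
    root = ET.fromstring(schema_xml.strip())
    field = next(
        (f for f in root.iter("field") if f.get("name") == field_name), None
    )
    copied = any(
        c.get("source") == field_name and c.get("dest") == dest
        for c in root.iter("copyField")
    )
    return {
        "defined": field is not None,
        "indexed": field is not None and field.get("indexed") == "true",
        "stored": field is not None and field.get("stored") == "true",
        "copied_to_dest": copied,
    }


result = check_field(SCHEMA_FRAGMENT, "custom_field")
print(result)
```

To check the real file, read its contents and pass them as schema_xml. The function searches nested elements, so it also works when fields are grouped inside a wrapper element.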
SeMPyRO
SeMPyRO validates and transforms metadata between different semantic formats. To extend SeMPyRO with new fields, define the field's RDF properties in the appropriate Python class.
- Define field properties. Before adding a field, identify these required properties:
  - Predicate - The RDF term for the field
  - Cardinality - Single or multiple-valued
  - Range - The datatype or class
- Add the field to the class. For HealthDCAT-AP fields, use the relevant class under sempyro.healthdcatap and add a property definition. Example for the health_theme property in HEALTHDCATAPDataset:

    health_theme: List[AnyHttpUrl] = Field(
        default=None,
        description="A category of the Dataset or tag describing the Dataset.",
        json_schema_extra={
            "rdf_term": HEALTHDCATAP.healthTheme,
            "rdf_type": "uri",
        },
    )

  Each field is defined as a class property with the following structure:
  - Line 1: Property name and range. Use List[] for multi-valued fields (cardinality > 1). Common range types include AnyHttpUrl, LiteralField, or classes like Agent or VCard.
  - Line 2: Set default=None for optional fields. Omit this line for mandatory fields.
  - Line 3: Human-readable description of the field.
  - Line 4: json_schema_extra containing the RDF mapping metadata.
  - Line 5: RDF predicate (for example HEALTHDCATAP.healthTheme). Common namespaces like DCTERMS, DCAT, and HEALTHDCATAP are imported by default. Define custom predicates with URIRef("http://example.com/range#property").
  - Line 6: RDF type such as rdfs_literal, xsd:string, or uri. Review other properties in the class for guidance.
- Regenerate schemas. Regenerate the JSON and YAML schemas. For the HEALTHDCATAPDataset class:
    hatch run python sempyro/healthdcatap/healthdcatap_dataset.py
FAIR Data Point (FDP)
FDP field changes are managed through the GDI metadata repository, not by manually editing metadata schemas in the FDP UI.
To add or update a field for FDP:
- Update the source SHACL shapes and metadata documentation in gdi-metadata.
  - The current model documentation lives in Documentation.
  - Shape files live under Formulasation(shacl)/core/PiecesShape.
- Review schema-tool/Properties.yaml if the field changes how shapes are combined, inherited, or published.
- Publish the updated SHACLs with the automated schema tool:
    cd schema-tool
    docker compose up
For FDP deployment and User Portal connection details, refer to the starter-kit deployment guide.
Discovery Service
Update the Discovery Service to include the new field in both the OpenAPI definitions and the mapping between CKAN and the Discovery Service.
- Update OpenAPI definition. Include the new field in both the CKAN API and the Discovery Service API. Both files are located in the src/main/openapi folder:
  - ckan.yaml: Contains the API returned by CKAN. Java classes are generated automatically from this YAML. To add a field to a Dataset, the primary change will likely be in the CkanPackage definition. See the examples there on how to add a property.
  - discovery.yaml: Defines what the Discovery Service should return. This definition does not have to correspond one-to-one with CKAN. To add a property here, modify the RetrievedDataset definition. Again, see the examples in the file.
- Update the mapping. Run the following command to regenerate the Java classes based on the OpenAPI definitions:
    mvn clean compile
  Expected errors: This command regenerates the classes reflecting the OpenAPI objects. Compilation errors are expected until the mapping is completed in the next step.
- Add the mapping between the CKAN and Discovery Service fields.
  - Modify the RetrievedDatasetBuilder in src/main/java/io/github/genomicdatainfrastructure/discovery/utils/PackageShowMapper.java.
  - Review existing field mappings in this file for implementation patterns.
- Update test cases.
  - Update the test cases in src/test/java/io/github/genomicdatainfrastructure/discovery/services/PackageShowMapperTest.java.
  - Update both empty and filled dataset examples, ensuring that both the CkanPackage objects (representing CKAN API output) and the expected RetrievedDataset output reflect the new fields.
- Verify the implementation with automated testing (mvn test) and manual testing. Run the application (mvn compile quarkus:dev) and use Postman to confirm that the mapping and output match expectations.
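The manual comparison can also be expressed as a small script once both JSON responses have been saved. The sketch below assumes the new field is copied unchanged from CKAN to the Discovery Service. The key names my_new_field and myNewField are placeholders, and mapping_mismatches is a hypothetical helper, not part of the Discovery Service code.

```python
def mapping_mismatches(ckan_package, retrieved_dataset, field_map):
    """Return (ckan_key, discovery_key, ckan_value, discovery_value) for differing fields."""
    mismatches = []
    for ckan_key, discovery_key in field_map.items():
        ckan_value = ckan_package.get(ckan_key)
        discovery_value = retrieved_dataset.get(discovery_key)
        if ckan_value != discovery_value:
            mismatches.append((ckan_key, discovery_key, ckan_value, discovery_value))
    return mismatches


# Placeholder payloads standing in for the CKAN and Discovery Service responses.
ckan_package = {"title": "Example", "my_new_field": "value"}
retrieved_dataset = {"title": "Example", "myNewField": None}

# Maps each CKAN key to the Discovery Service key it should populate.
field_map = {"title": "title", "my_new_field": "myNewField"}

print(mapping_mismatches(ckan_package, retrieved_dataset, field_map))
```

Fields that the mapper transforms (for example, by changing structure or labels) need their own comparison logic; this check only covers values that pass through as-is.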