Configure metadata schemas
Configure and manage CKAN metadata schemas to define dataset fields, validation rules, and data entry forms. This guide covers schema development, deployment, and maintenance for system administrators.
Schema structure and format
Define CKAN schemas as JSON or YAML files that specify dataset metadata fields and their properties.
Example field definition
{
"field_name": "license_id",
"label": "License",
"form_snippet": "license.html",
"help_inline": true,
"help_text": {
"en": "[dct:license] This property refers to the licence under which the Dataset is made available.",
"nl": "[dct:license] Deze eigenschap heeft betrekking op de licentie waaronder de Dataset beschikbaar wordt gesteld."
}
}
Key field properties
Configure field behaviour using these properties:
- field_name: CKAN field identifier that defines the field in the database
- label: UI field representation that displays to end users
- help_text: Explanatory text that appears under the field in the UI
- choices: List of dictionaries with value and label for dropdown menus
- choices_helper: Generates form dropdowns dynamically from API endpoints
- presets: Validation presets like
radio,multiple_checkbox, anddatefor automatic checks - form_snippet: Defines field representation for data input in Jinja2 format
- display_snippet: Defines how the system displays data in the UI
- validators: Data validation functions that enforce field requirements
- output_validators: Functions that convert complex data structures from the database
- repeating_subfields: Handles cardinality requirements for multi-value fields
- start_form_page: Controls which form page displays the field
- display_property: Overrides the DCAT mapping representation for a field
Example with display_property:
{
"field_name": "author",
"label": "Author",
"display_property": "dc:creator"
}
Default validation includes ignore_missing and unicode. When you specify custom validators, include these explicitly if you need them.
Schema configuration
Configure your CKAN instance to use the defined schemas for dataset metadata management.
Single schema setup
Configure your primary schema in CKAN configuration:
ckan config-tool $CKAN_INI -s app:main \
"scheming.dataset_schemas = ckanext.healthri:scheming/schemas/gdi_userportal.json"\
"scheming.presets = ckanext.scheming:presets.json"\
"scheming.dataset_fallback = false"
Multiple schema support
Configure multiple schemas using a declaration file for different dataset types:
[
{
"dataset_type": "dataset",
"about": "Dataset",
"about_url": "https://dataplatform.nl/what-is-a-dataset",
"schemas": [
"ckanext.healthri:scheming/schemas/core_schema.json",
"ckanext.healthri:scheming/schemas/health_ri.json"
]
},
{
"dataset_type": "geo_dataset",
"about": "Geo Document",
"about_url": "https://dataplatform.nl/what-is-a-dataset",
"schemas": ["ckanext.healthri:scheming/schemas/geo_document.json"]
}
]
Reference the multi-schema file in ckan.ini:
scheming.dataset_multi_schemas = ckanext.healthri:scheming/schemas/multi_schemas.json
Schema merging behaviour
The system handles schema merging differently based on the implementation:
- Core CKAN: The latest schema with the same
dataset_typetakes precedence over earlier definitions - GDI implementation: The system merges schemas with the same type, and field order follows the schema order in the configuration
- Field merging: The
ckanext.scheming.overwrite_fieldsparameter controls how the system merges individual fields
Schema deployment
Update running CKAN instance
Change the schema in a running Docker container:
docker exec -it ckan /bin/sh
vi /srv/app/ckan.ini # change the schema
The system automatically updates CKAN when you make changes to ckan.ini.
Schema path format
Define schemas using the format: <extension name with dashes replaced with dots>:<path to schema .json file>
Example: ckanext.healthri:scheming/schemas/gdi_userportal.json
Schema management APIs
Manage schemas programmatically using CKAN APIs:
# List all dataset schema types
GET http(s)://<ckan-host>/api/action/scheming_dataset_schema_list
# Get specific schema details
GET http(s)://<ckan-host>/api/action/scheming_dataset_schema_show?type=<dataset_type>
Best practices
Follow these practices when designing and deploying schemas.
Schema design
- Follow DCAT-AP standards: Ensure interoperability with other data catalogues
- Design for user experience: Prioritise usability over technical complexity
- Include comprehensive help text: Provide clear guidance for complex fields
- Test with real users: Validate schemas with actual users before deployment
Deployment
- Test in development first: Validate schema changes in a development environment before production
- Document modifications: Record all schema changes for audit trails and troubleshooting
- Consider migration impact: Assess how schema changes affect existing datasets
- Back up data: Create backups before applying major schema updates
For comprehensive schema development, see the CKAN scheming documentation.
After configuring schemas:
- Manage user roles and permissions: Control access to schema management
- Manage data and services: Configure data workflows
- Monitor and maintain the system: Track schema usage and performance