Reset Datasets and Run Harvester
Goal: clear all non-harvest CKAN content in the target environment, run a local CKAN with the deployed version against that environment’s database and Solr, and execute harvester run-test for every harvest source.
Warning: this is a destructive operation. The SQL statements below truncate harvest_object (and, through CASCADE, every table that references it) and delete every non-harvest package (dataset) together with its resources, extras, and tag associations from the target CKAN database. Take a backup before proceeding. Pointing a local CKAN instance at a shared environment database or Solr instance can also affect the environment’s availability. Coordinate with the team and product owner before continuing.
Scope: the commands target the CKAN PostgreSQL database (ckandb).
- Access to the environment PostgreSQL CKAN database (ckandb) via a database client.
- Access to the environment Solr endpoint (URL and credentials).
- Local CKAN checkout with the exact tag or commit deployed in the environment.
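Because the statements that follow are destructive, a backup of ckandb could be taken first along these lines. This is only a sketch: it assumes pg_dump is installed locally and can reach the database, and the function names and the backups directory are hypothetical choices for this example.

```shell
#!/bin/sh
# Sketch: dump the CKAN database before running destructive SQL.
# backup_file, backup_ckandb and the "backups" directory are hypothetical names.

backup_file() {
    # Build a timestamped file name from a directory and a timestamp.
    dir=$1
    stamp=$2
    printf '%s/ckandb-%s.dump\n' "$dir" "$stamp"
}

backup_ckandb() {
    url=$1
    dir=${2:-backups}
    mkdir -p "$dir" || return 1
    out=$(backup_file "$dir" "$(date +%Y%m%dT%H%M%S)")
    # --format=custom writes an archive that pg_restore can read.
    pg_dump --format=custom --file="$out" --dbname="$url" || return 1
    printf 'Backup written to %s\n' "$out"
}

# Usage (not executed here):
#   backup_ckandb "postgresql://<user>:<pass>@<host>:<port>/<dbname>" backups
```

A custom-format archive can later be restored with pg_restore if the reset needs to be undone.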
Connect your database client to the environment CKAN PostgreSQL database and run:
TRUNCATE TABLE harvest_object CASCADE;
DELETE FROM package_tag
WHERE package_id IN (
SELECT id
FROM package
WHERE type != 'harvest'
);
DELETE FROM package_extra
WHERE package_id IN (
SELECT id
FROM package
WHERE type != 'harvest'
);
DELETE FROM resource
WHERE package_id IN (
SELECT id
FROM package
WHERE type != 'harvest'
);
DELETE FROM package
WHERE type != 'harvest';
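The statements above depend on their order (child rows are removed before the package rows), so it may be safer to run them as a single transaction that rolls back if any statement fails. A minimal sketch, assuming the statements have been saved to a hypothetical file reset_datasets.sql and that psql is available:

```shell
#!/bin/sh
# Sketch: run the reset statements in one transaction.
# reset_datasets and reset_datasets.sql are hypothetical names for this example.

reset_datasets() {
    url=$1
    sql_file=${2:-reset_datasets.sql}
    if [ ! -r "$sql_file" ]; then
        echo "missing SQL file: $sql_file" >&2
        return 1
    fi
    # ON_ERROR_STOP=1 aborts on the first failing statement;
    # --single-transaction wraps the whole file in BEGIN/COMMIT,
    # so a failure leaves the database unchanged.
    psql --set ON_ERROR_STOP=1 --single-transaction \
         --file "$sql_file" --dbname="$url"
}

# Usage (not executed here):
#   reset_datasets "postgresql://<user>:<pass>@<host>:<port>/<dbname>"
```

TRUNCATE is transactional in PostgreSQL, so it is covered by the rollback as well.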
Optional verification of remaining harvest sources:
SELECT name
FROM package
WHERE type = 'harvest'
ORDER BY name;
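As a complementary check, one could also confirm that no non-harvest packages are left. The sketch below assumes psql is available; count_non_harvest and check_reset are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: confirm that the reset removed every non-harvest package.

count_non_harvest() {
    # --no-align and --tuples-only make psql print just the bare number.
    psql --dbname="$1" --no-align --tuples-only \
         --command "SELECT count(*) FROM package WHERE type != 'harvest';"
}

check_reset() {
    remaining=$1
    case $remaining in
        0)
            echo "OK: no non-harvest packages remain"
            ;;
        ''|*[!0-9]*)
            echo "ERROR: unexpected count '$remaining'" >&2
            return 2
            ;;
        *)
            echo "ERROR: $remaining non-harvest packages remain" >&2
            return 1
            ;;
    esac
}

# Usage (not executed here):
#   check_reset "$(count_non_harvest "postgresql://<user>:<pass>@<host>:<port>/<dbname>")"
```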
Identify the deployed version (tag or commit) running in the target environment.
Check out the same version locally:
git fetch --all --tags
git checkout <DEPLOYED_TAG_OR_COMMIT>
git status  # should show a clean working tree
Ensure there are no outstanding local commits or untracked changes.
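The manual check of the checkout can be scripted. The following is a rough sketch rather than an established procedure: verify_checkout is a hypothetical helper that compares HEAD with the expected tag or commit and refuses to continue if the working tree has modified or untracked files.

```shell
#!/bin/sh
# Sketch: verify that HEAD matches the deployed version and the tree is clean.

verify_checkout() {
    expected=$1
    # Resolve the expected tag or commit to a commit hash.
    want=$(git rev-parse --verify --quiet "${expected}^{commit}") || {
        echo "unknown tag or commit: $expected" >&2
        return 1
    }
    have=$(git rev-parse HEAD) || return 1
    if [ "$want" != "$have" ]; then
        echo "HEAD is $have, expected $want" >&2
        return 1
    fi
    # --porcelain prints one line per modified or untracked file.
    if [ -n "$(git status --porcelain)" ]; then
        echo "working tree is not clean" >&2
        return 1
    fi
    echo "OK: HEAD is $have and the working tree is clean"
}

# Usage (not executed here):
#   verify_checkout <DEPLOYED_TAG_OR_COMMIT>
```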
Point your local CKAN instance at the environment database and Solr by updating the environment variables used by your ckan.ini:
CKAN_SQLALCHEMY_URL=postgresql://<user>:<pass>@<host>:<port>/<dbname>
CKAN_SOLR_URL=http://<solr-host>:<solr-port>/solr/<core>
CKAN_SITE_URL=https://<ckan-env>.healthdata.nl
Restart the local stack:
docker compose down
docker compose up -d --build
# or: docker-compose ... depending on your CLI
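Since the local stack will talk to a shared environment, it can help to confirm which targets are configured before restarting. The sketch below assumes the three variables are exported in the current shell (if they live in a .env file, source it first); mask_password and check_ckan_env are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: show the configured targets with the database password masked.

mask_password() {
    # Replace the password in scheme://user:pass@host/... with ***.
    printf '%s\n' "$1" | sed 's#^\([a-z+]*://[^:/@]*\):[^@]*@#\1:***@#'
}

check_ckan_env() {
    missing=0
    for name in CKAN_SQLALCHEMY_URL CKAN_SOLR_URL CKAN_SITE_URL; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        else
            printf '%s=%s\n' "$name" "$(mask_password "$value")"
        fi
    done
    return "$missing"
}

# Usage (not executed here):
#   check_ckan_env && docker compose up -d --build
```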
Obtain the IDs (names) of all harvest sources:
SELECT name
FROM package
WHERE type = 'harvest'
AND state = 'active'
ORDER BY name;
Execute harvester run-test for each source (replace the list with the names from your environment):
for id in ist fdp-test ega csfi lnds-fdp university-of-oslo radboud maxima-medisch-centrum-test nbis lumc dataseries-blob health-dcat missing-fields-test aumc-fdp
do
  ckan --config=ckan.ini harvester run-test "$id"
  # Alternative syntax: ckan -c ckan.ini harvester run-test "$id"
done
Ensure the ckan CLI resolves inside your container or virtual environment. For Dockerised setups:
docker compose exec ckan ckan -c /srv/app/ckan.ini harvester run-test <id>
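Instead of maintaining the list of names by hand, the query and the loop could be combined. This is a sketch under assumptions: psql is available, the database URL is a plain postgresql:// URI that psql accepts, and list_sources, run_one, and run_all are hypothetical helpers. The loop keeps going after a failing source and reports the failures at the end.

```shell
#!/bin/sh
# Sketch: run harvester run-test for every active harvest source.

list_sources() {
    # Print one bare source name per line.
    psql --dbname="$1" --no-align --tuples-only --command \
        "SELECT name FROM package WHERE type = 'harvest' AND state = 'active' ORDER BY name;"
}

run_one() {
    # Adjust the config path, or prefix with "docker compose exec ckan",
    # to match your setup.
    ckan -c ckan.ini harvester run-test "$1"
}

run_all() {
    # Read source names from stdin, one per line.
    failed=""
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        echo "== $name =="
        # </dev/null keeps the command from consuming the remaining names.
        if ! run_one "$name" </dev/null; then
            failed="$failed $name"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed sources:$failed" >&2
        return 1
    fi
    echo "all sources completed"
}

# Usage (not executed here):
#   list_sources "postgresql://<user>:<pass>@<host>:<port>/<dbname>" | run_all
```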
Run the following to synchronise Solr with CKAN:
ckan -c ckan.ini search-index clear
ckan -c ckan.ini search-index rebuild
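After the rebuild, a rough sanity check is to ask Solr how many documents it now holds. The sketch below assumes curl is installed, that the core exposes the standard /select handler, and that any credentials are supplied separately (for example with curl's --user option); num_found and solr_count are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: report the number of documents in the Solr core after a rebuild.

num_found() {
    # Extract the numFound value from a Solr JSON response on stdin.
    sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

solr_count() {
    # $1 is the core URL, e.g. http://<solr-host>:<solr-port>/solr/<core>
    # rows=0 asks only for the match count, not the documents themselves.
    curl -fsS "$1/select?q=*:*&rows=0&wt=json" | num_found
}

# Usage (not executed here):
#   solr_count "http://<solr-host>:<solr-port>/solr/<core>"
```

A count of zero right after a rebuild would suggest that the index was cleared but not repopulated.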