
About GDI

This page provides an overview of the Genomic Data Infrastructure (GDI) and its components.

What is GDI?

The Genomic Data Infrastructure (GDI) is Europe's federated network for genomic data sharing, enabling secure access to over one million genome sequences held across European countries.

The 1+ Million Genomes Initiative: GDI is the technical implementation of the European Union's 1+ Million Genomes Initiative, launched in 2018 when EU member states signed a declaration to create the world's largest genomic dataset for research and healthcare.

Key objectives of the 1+ Million Genomes Initiative include:

  • Advance personalised medicine: Enable precision treatments based on genetic profiles
  • Accelerate rare disease research: Provide researchers with unprecedented access to genomic data
  • Support population health: Understand genetic variations across European populations
  • Maintain data sovereignty: Keep genomic data under national control while enabling cross-border research

How it works

GDI operates through two main portals that work together to form a single data-sharing ecosystem:

  • GDI User Portal: For researchers, clinicians, policy-makers and other data users

    • Discover datasets: Search and explore available genomic datasets across Europe
    • Request access: Submit access requests for research or clinical purposes
    • Access data: Receive approved access to genomic data for authorised research
    • Accessible at: portal.gdi.lu
  • GDI Data Catalogue: For data stewards, catalogue managers, and data managers

    • Publish datasets: Add genomic datasets and metadata to the federated network
    • Manage organisations: Oversee institutional data contributions
    • Configure harvesters: Set up automated metadata collection from data sources
    • Accessible at: catalogue.portal.gdi.lu

[Figure: Dataset portals]

Dataset metadata published in the Data Catalogue becomes discoverable through the User Portal, creating a federated network where European genomic data remains distributed but searchable.
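
The relationship between the two portals can be illustrated with a small model. This is a minimal sketch, not GDI's actual implementation: the class names (`NationalNode`, `FederatedCatalogue`), the metadata fields, and the dataset identifiers are all hypothetical. It only shows the idea that a central index holds metadata records while the data itself stays with the node that published it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetMetadata:
    """Descriptive record only; no genomic data is carried here."""
    identifier: str
    title: str
    country: str  # hypothetical field naming the node that holds the data


@dataclass
class NationalNode:
    """Hypothetical national node: keeps data locally, exposes metadata."""
    country: str
    records: list[DatasetMetadata] = field(default_factory=list)

    def publish(self, identifier: str, title: str) -> None:
        self.records.append(DatasetMetadata(identifier, title, self.country))

    def metadata(self) -> list[DatasetMetadata]:
        # Return a copy so callers cannot modify the node's own list.
        return list(self.records)


class FederatedCatalogue:
    """Hypothetical central index built by harvesting metadata from nodes."""

    def __init__(self) -> None:
        self._index: dict[str, DatasetMetadata] = {}

    def harvest(self, node: NationalNode) -> int:
        """Copy metadata (never data) from a node; return records harvested."""
        records = node.metadata()
        for record in records:
            self._index[record.identifier] = record
        return len(records)

    def search(self, term: str) -> list[DatasetMetadata]:
        """Case-insensitive title search, ordered by identifier."""
        needle = term.lower()
        return sorted(
            (r for r in self._index.values() if needle in r.title.lower()),
            key=lambda r: r.identifier,
        )


# Illustrative usage with made-up datasets.
lu = NationalNode("LU")
lu.publish("lu-001", "Rare disease cohort")
fi = NationalNode("FI")
fi.publish("fi-007", "Population reference panel")
fi.publish("fi-008", "Rare disease trios")

catalogue = FederatedCatalogue()
catalogue.harvest(lu)
catalogue.harvest(fi)

for hit in catalogue.search("rare disease"):
    print(hit.identifier, "held in", hit.country)
```

Each search hit names the node that holds the dataset, so an access request can be directed to that node rather than to a central copy of the data.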

Technical foundation

  • Federated architecture: Data stays in its original location while metadata is shared
  • FAIR principles: Data is Findable, Accessible, Interoperable, and Reusable
  • Built on CKAN: Uses proven open-source data portal technology
  • European standards: Complies with GDPR and European genomic data sharing frameworks
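
Because the catalogue is built on CKAN, dataset metadata may be queryable through CKAN's standard Action API. The sketch below assumes that the Data Catalogue exposes the usual `package_search` action under `/api/3/action/` at catalogue.portal.gdi.lu; this page does not confirm that, so treat the base URL and the example query as assumptions. The helper names `build_search_url` and `extract_titles` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: the catalogue exposes CKAN's standard Action API at this host.
CATALOGUE_URL = "https://catalogue.portal.gdi.lu"


def build_search_url(query: str, rows: int = 10,
                     base_url: str = CATALOGUE_URL) -> str:
    """Build a CKAN package_search URL using the `q` and `rows` parameters."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_titles(response_body: str) -> list[str]:
    """Pull dataset titles out of a CKAN package_search JSON response.

    CKAN wraps results as {"success": ..., "result": {"results": [...]}}.
    Falls back to the dataset's `name` when a title is missing or empty.
    """
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [pkg.get("title") or pkg["name"]
            for pkg in payload["result"]["results"]]


# Offline illustration with a made-up response body.
sample = json.dumps({
    "success": True,
    "result": {"count": 1,
               "results": [{"name": "example-cohort", "title": "Example cohort"}]},
})
print(build_search_url("rare disease", rows=5))
print(extract_titles(sample))
```

To run a live query, the URL from `build_search_url` could be fetched with `urllib.request.urlopen` and the decoded body passed to `extract_titles`, provided the endpoint is publicly reachable.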