Data repositories#
When choosing a repository for research data, some criteria to consider include the type of repository, the level of data access, the availability of a DOI, and the option of an embargo (or grace) period.
Types of repositories#
There are at least five types of research data repositories:
Disciplinary repositories, focused on a specific research field or discipline (e.g., OpenNeuro)
Institutional repositories, managed by academic institutions or research organizations to store and share data produced by their researchers (e.g., Harvard Dataverse)
General-purpose repositories, which accept data from a wide range of disciplines and are not field-specific (e.g., Zenodo)
Governmental repositories, hosted by government agencies to provide access to publicly funded research data (e.g., NASA EarthData)
Project-specific repositories, created for a specific research project or consortium to manage and share project data (e.g., Osteoarthritis Initiative)
Data access#
The level of data protection in research data repositories can vary based on the repository’s policies, the sensitivity of the data, and legal or ethical requirements. Generally, there are three main levels of data protection [University of Bristol, 2017]:
Open Access Repositories: Data does not contain sensitive information and is freely available to anyone. Participants have given consent for their data to be shared openly. Any identifiers have been removed. Example: OpenNeuro
Restricted Access Repositories: Data may require additional safeguards due to privacy concerns, confidentiality, or ethical considerations. Participants may have given limited consent specifying that only researchers should have access to the data, or some risk of re-identifying a participant may remain. Users have to register for access, agree to terms and conditions, or sign a data use agreement. Example: Osteoarthritis Initiative
Controlled Access Repositories: Data includes personally identifiable information or protected health information, such as medical records, genomic data, or clinical trial data. Each access request is reviewed individually before a decision is made. Strict access controls are in place, often requiring institutional approval, user authentication, and data use agreements. Only authorized researchers may access the data under specific conditions, and data may be encrypted or anonymized. Example: UK Biobank
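The three access levels above can be read as a rough decision rule. The following Python sketch is only an illustration of that reading: the function name `suggest_access_level` and its three yes/no parameters are hypothetical, and a real decision would also depend on the repository's policies and on legal or ethical review.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    CONTROLLED = "controlled"


def suggest_access_level(
    contains_pii_or_phi: bool,
    identifiers_removed: bool,
    open_sharing_consent: bool,
) -> AccessLevel:
    """Suggest an access level from three simplified yes/no questions.

    Hypothetical helper: the parameters are an assumption made for this
    sketch and do not replace an ethics or legal assessment.
    """
    # Personally identifiable or protected health information
    # calls for the strictest level.
    if contains_pii_or_phi:
        return AccessLevel.CONTROLLED
    # Open access requires both de-identified data and consent
    # for open sharing.
    if identifiers_removed and open_sharing_consent:
        return AccessLevel.OPEN
    # Anything in between (limited consent, residual re-identification
    # risk) falls under restricted access.
    return AccessLevel.RESTRICTED


if __name__ == "__main__":
    level = suggest_access_level(
        contains_pii_or_phi=False,
        identifiers_removed=True,
        open_sharing_consent=False,
    )
    print(level.value)
```

In this example, the data is de-identified but participants consented only to limited sharing, so the sketch suggests restricted access.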
DOI#
Some repositories assign a digital object identifier (DOI) to each deposited dataset. A DOI is a unique string of numbers, letters, and symbols used to identify digital content such as research papers, datasets, and reports. Because a DOI resolves to a persistent web address, the content remains easy to access and cite even if its location or URL changes. An example of a repository that provides a DOI is Zenodo.
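As an illustration of how a DOI becomes a citable link, the Python sketch below turns a DOI string into its `https://doi.org/` resolver URL. The function name `doi_to_url` is hypothetical, the regular expression is a common heuristic rather than a full validation of DOI syntax, and the Zenodo record number in the example is a placeholder, not a real dataset.

```python
import re
from urllib.parse import quote

# Heuristic pattern (an assumption of this sketch): "10.", a registrant
# code of 4 to 9 digits, a slash, and a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that people often paste together with the DOI itself.
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI string."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separator.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI; 10.5281 is the prefix used by Zenodo.
    print(doi_to_url("doi:10.5281/zenodo.1234567"))
```

The resulting URL can be used in a reference list or data availability statement instead of the repository's landing-page address.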
Embargo or grace period#
Sometimes researchers want to share their data only after publishing several papers based on the same dataset. In such cases, selecting a repository that offers an embargo or grace period can be advantageous. An embargo (or grace) period is a defined time during which access to deposited research data is restricted or delayed. During this time, the data is stored in the repository but is not publicly available. Researchers may use this period to complete their analyses, publish findings, or secure intellectual property rights before allowing others to access their data. Examples of repositories that provide an embargo or grace period are Zenodo and OpenNeuro.
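The logic of an embargo can be sketched in a few lines of Python: the data exists in the repository from the deposit date, but becomes public only once the embargo ends. The `Deposit` class, its field names, and the dates below are hypothetical and do not reflect how any particular repository implements embargoes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    """Hypothetical record of a dataset deposited in a repository."""

    title: str
    deposited_on: date
    embargo_until: Optional[date] = None  # None means no embargo

    def is_public(self, on: date) -> bool:
        """Return True if the data is publicly accessible on the given day."""
        # Before the deposit date there is nothing to access.
        if on < self.deposited_on:
            return False
        # Without an embargo, the data is public from the deposit date.
        if self.embargo_until is None:
            return True
        # With an embargo, the data is stored but stays closed until
        # the embargo end date is reached.
        return on >= self.embargo_until


if __name__ == "__main__":
    deposit = Deposit(
        title="Example dataset",
        deposited_on=date(2024, 1, 1),
        embargo_until=date(2025, 1, 1),
    )
    print(deposit.is_public(date(2024, 6, 1)))
    print(deposit.is_public(date(2025, 1, 1)))
```

Depositing early with an embargo date means the dataset is already archived during the embargo, and it is released once that date passes.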