Author: Heinz Pampel

How to find an appropriate research data repository?

As more and more funders and journals adopt data policies that require researchers to deposit the underlying research data in a data repository, the question of how to choose a repository is becoming increasingly important. Heinz Pampel is one of the people behind re3data.org, an Open Science tool that helps researchers identify a suitable repository for their data and thus comply with the requirements set out in data policies.

Background

The debate on open access to research data is gaining relevance. This February, the Office of Science and Technology Policy (OSTP) directed the U.S. federal agencies to maximize access to data from publicly funded research. In June, the G8 science ministers published a set of principles for open scientific research data. The ministers declared that, if possible, “publicly funded scientific research data should be open”. And as early as last year, the European Commission announced a pilot framework in Horizon 2020, the upcoming EU framework programme for research and innovation, to promote open access to research data.

Although scientists agree on the potential benefits of data sharing for scientific progress, the majority remain hesitant when it comes to putting it into practice. One reason for this reluctance is a lack of reliable “systems that make it quick and easy to share data” (Tenopir et al. 2011).

The current landscape of data repositories is heterogeneous. Some initiatives, like the Data Seal of Approval (DSA) and the World Data System (WDS), are working on the standardization of data repositories, and certification and auditing procedures for data repositories already exist. Two examples are the DIN 31644 and ISO 16363 standards, but these are not yet widely used. Research data repositories and their services are mostly shaped by the scientific discipline they serve. They store a wide variety of file formats under different conditions for access and reuse. In many cases it is difficult for researchers to find an appropriate repository for storing their data. To overcome these shortcomings we started re3data.org – Registry of Research Data Repositories.

re3data.org – Registry of Research Data Repositories

Launched in 2012, re3data.org provides an overview of existing research data repositories. As of September 2013, re3data.org lists 600 research data repositories; 400 of these are described in detail using a comprehensive vocabulary. The registry covers data repositories from all academic disciplines.

In re3data.org, researchers can easily see each data repository's terms of access and use, along with other characteristics. Information icons help researchers quickly identify an adequate repository for the storage and reuse of their data.

 

Aspects of a Research Data Repository with the corresponding icons used in re3data.org.


re3data.org covers the following aspects of a research data repository:

  • general information (e.g. short description of the repository, content types, keywords),
  • responsibilities (e.g. institutions responsible for funding, content or technical issues),
  • policies (e.g. guidelines and policies of the repository),
  • legal aspects (e.g. licenses of the database and datasets),
  • technical standards (e.g. APIs, versioning of datasets, software of the repository),
  • quality standards (e.g. certificates, audit processes).
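To make the six aspects more concrete, here is a minimal sketch of how such a record could be modelled in code. It is not the re3data.org vocabulary or schema: the class name, all field names and the sample repository are assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepositoryRecord:
    """Hypothetical record grouping the six aspects listed above."""
    # general information
    name: str
    description: str = ""
    content_types: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # responsibilities
    responsible_institutions: List[str] = field(default_factory=list)
    # policies
    policy_urls: List[str] = field(default_factory=list)
    # legal aspects
    database_license: str = ""
    dataset_licenses: List[str] = field(default_factory=list)
    # technical standards
    apis: List[str] = field(default_factory=list)
    supports_versioning: bool = False
    software: str = ""
    # quality standards
    certificates: List[str] = field(default_factory=list)

    def described_aspects(self) -> Dict[str, bool]:
        """Report which of the six aspects carry any information."""
        return {
            "general information": bool(
                self.description or self.content_types or self.keywords
            ),
            "responsibilities": bool(self.responsible_institutions),
            "policies": bool(self.policy_urls),
            "legal aspects": bool(self.database_license or self.dataset_licenses),
            "technical standards": bool(
                self.apis or self.supports_versioning or self.software
            ),
            "quality standards": bool(self.certificates),
        }


# Invented repository, for illustration only.
example = RepositoryRecord(
    name="Example Geo Archive",
    description="Hypothetical archive for geoscience datasets.",
    content_types=["raw data", "images"],
    responsible_institutions=["Example University"],
    dataset_licenses=["CC BY"],
    apis=["OAI-PMH"],
)

# Aspects for which the sample record holds no information yet.
missing = [name for name, present in example.described_aspects().items() if not present]
print(missing)
```

A structure like this makes it easy to see at a glance which aspects of a repository still lack a description.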

The re3data.org portal offers two ways to search: (1) free text search through a simple search box, and (2) filters for more specific searches. In the list of results, each record includes the name of the repository, the subjects covered, a brief description of the content and a set of icons visualizing key properties of the repository. Clicking on the name of a repository in the search results opens the full descriptive record. It is also possible to simply browse through the list of indexed data repositories.
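The combination of free text search and filters can be sketched in a few lines. This is only an illustration of the idea, not the portal's implementation: the `search` function, the record fields and the three sample repositories are all invented for this example.

```python
from typing import Dict, List, Optional

# Invented sample records; names and values are placeholders, not registry data.
RECORDS: List[Dict] = [
    {"name": "Example Geo Archive", "subjects": ["geosciences"],
     "description": "Seismic and climate observations.",
     "persistent_identifiers": True},
    {"name": "Example Bio Bank", "subjects": ["life sciences"],
     "description": "Genomic sequences and protein structures.",
     "persistent_identifiers": True},
    {"name": "Example Rock Store", "subjects": ["geosciences"],
     "description": "Measurements of drill core samples.",
     "persistent_identifiers": False},
]


def search(records: List[Dict], text: str = "", subject: Optional[str] = None,
           persistent_identifiers: Optional[bool] = None) -> List[str]:
    """Return names of records matching the free text and every filter that is set."""
    needle = text.strip().lower()
    hits = []
    for record in records:
        # Free text: case-insensitive substring match over name, description, subjects.
        haystack = " ".join(
            [record["name"], record["description"], *record["subjects"]]
        ).lower()
        if needle and needle not in haystack:
            continue
        # Filters: applied only when the caller sets them.
        if subject is not None and subject not in record["subjects"]:
            continue
        if (persistent_identifiers is not None
                and record["persistent_identifiers"] != persistent_identifiers):
            continue
        hits.append(record["name"])
    return hits


# Filter search: geosciences repositories that use persistent identifiers.
geo_with_pids = search(RECORDS, subject="geosciences", persistent_identifiers=True)
# Free text search.
text_hits = search(RECORDS, text="rock")
print(geo_with_pids, text_hits)
```

Leaving a filter at `None` means "do not restrict on this property", so a call with no arguments corresponds to browsing the full list.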


Example screenshot of search results for geosciences data repositories using persistent identifiers.

Operators of data repositories can propose their infrastructure for listing in re3data.org by filling in an online application form. A repository is indexed once it meets the minimum requirements for inclusion in re3data.org, which are described in the re3data.org vocabulary. The project team reviews each repository, and reviewed repositories are marked with a green check mark.

The project cooperates with other Open Science initiatives like BioSharing, DataCite and OpenAIRE. Some publishers already refer to re3data.org in their Editorial Policies as a tool for the identification of suitable data repositories.

Next Steps

In the upcoming project phase the focus will be on improving usability and implementing new features. Among other things, the dialog with repository operators will be supported by a workflow system. Beyond the development of the registry, the project will promote the standardization of research data repositories.

re3data.org is funded by the German Research Foundation (DFG). The project partners are the GFZ German Research Centre for Geosciences, Humboldt-Universität zu Berlin and the Karlsruhe Institute of Technology (KIT). These three partners, with their expertise in information infrastructures, guarantee the sustainability of the registry.

Further information on re3data.org can be found in a recently published article in PLOS ONE:

Pampel, H., et al. (2013). Making Research Data Repositories Visible: The re3data.org Registry. PLOS ONE. doi: 10.1371/journal.pone.0078080
