How to find an appropriate research data repository?

As more and more funders and journals adopt data policies that require researchers to deposit underlying research data in a data repository, the question of how to choose a repository becomes increasingly important. Heinz Pampel is one of the people behind re3data.org, an Open Science tool that helps researchers easily identify a suitable repository for their data and thus comply with the requirements set out in data policies.

Background

The debate on open access to research data is gaining relevance. This February, federal agencies in the U.S. were told by the Office of Science and Technology Policy (OSTP) to maximize access to data from publicly funded research. In June, the G8 science ministers published a set of principles for open scientific research data. The ministers declared that, if possible, “publicly funded scientific research data should be open”. And as early as last year, the European Commission announced a pilot framework in Horizon 2020, the coming EU framework programme for research and innovation, to promote open access to research data.

Although scientists agree on the potential benefit of data sharing for scientific progress, the majority are reserved when it comes to practical implementation. One reason for this reluctance is a lack of reliable “systems that make it quick and easy to share data” (Tenopir et al. 2011).

The current landscape of data repositories is heterogeneous. Some initiatives like the Data Seal of Approval (DSA) and the World Data System (WDS) are working on the standardization of data repositories. And there are already certification and auditing procedures for data repositories. Two examples are the DIN 31644 and the ISO 16363 standards. But these standards are not widely used yet. Research data repositories and their services are mostly characterized by the scientific discipline in which they work. They store a wide variety of file formats under different conditions for access and reuse. In many cases it is difficult for researchers to find an appropriate repository for the storage of their data. To overcome these shortcomings we started re3data.org – Registry of Research Data Repositories.

re3data.org – Registry of Research Data Repositories

Launched in 2012, re3data.org provides an overview of existing research data repositories. As of September 2013, re3data.org lists 600 research data repositories, 400 of which are described in detail using a comprehensive vocabulary. The registry covers data repositories from all academic disciplines.

In re3data.org, researchers can easily see the terms of access and use of each data repository, along with other characteristics. Information icons help researchers identify an adequate repository for the storage and reuse of their data.

 

Aspects of a Research Data Repository with the corresponding icons used in re3data.org.

re3data.org covers the following aspects of a research data repository:

  • general information (e.g. short description of the repository, content types, keywords),
  • responsibilities (e.g. institutions responsible for funding, content or technical issues),
  • policies (e.g. guidelines and policies of the repository),
  • legal aspects (e.g. licenses of the database and datasets),
  • technical standards (e.g. APIs, versioning of datasets, software of the repository),
  • quality standards (e.g. certificates, audit processes).
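As a rough illustration of how these six aspects might sit together in one descriptive record, here is a minimal Python sketch. The class name, the field names and the sample values are hypothetical and are not taken from the re3data.org vocabulary; the real schema is considerably richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepositoryRecord:
    """Hypothetical record grouping the six aspects listed above."""

    # general information
    name: str
    description: str = ""
    content_types: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # responsibilities
    institutions: List[str] = field(default_factory=list)
    # policies
    policies: List[str] = field(default_factory=list)
    # legal aspects
    data_licenses: List[str] = field(default_factory=list)
    # technical standards
    apis: List[str] = field(default_factory=list)
    versioning: bool = False
    # quality standards
    certificates: List[str] = field(default_factory=list)

    def documented_aspects(self) -> List[str]:
        """Name the aspects for which this record holds any information."""
        aspects = {
            "general information": bool(
                self.description or self.content_types or self.keywords
            ),
            "responsibilities": bool(self.institutions),
            "policies": bool(self.policies),
            "legal aspects": bool(self.data_licenses),
            "technical standards": bool(self.apis or self.versioning),
            "quality standards": bool(self.certificates),
        }
        return [label for label, present in aspects.items() if present]


# Invented sample entry, for illustration only.
example = RepositoryRecord(
    name="Example Geoscience Archive",
    description="Seismic and geodetic datasets",
    keywords=["geosciences"],
    data_licenses=["CC BY"],
    certificates=["Data Seal of Approval"],
)
print(example.documented_aspects())
```

A record like this makes it easy to see at a glance which aspects of a repository are documented and which are still missing.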

The re3data.org portal offers two search options: (1) free text search through a simple search box, and (2) filters for more specific searches. In the list of results, each record includes the name of the repository, the subjects covered, a brief description of the content and a set of icons visualizing key properties of the repository. Clicking on the name of a repository in the search results opens the full descriptive record. It is also possible to simply browse through the list of indexed data repositories.
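The combination of free text search and filters can be sketched in a few lines of Python. This is only an illustration of the idea, not the portal's implementation: the `search` function, the dictionary keys and the three sample records are all invented for this example.

```python
def search(records, text="", **filters):
    """Return records that contain `text` and match every filter exactly.

    `text` is compared case-insensitively against name, description and
    subjects; each keyword argument must equal the record's value.
    """
    needle = text.lower()
    results = []
    for record in records:
        haystack = " ".join(
            [record["name"], record["description"], " ".join(record["subjects"])]
        ).lower()
        if needle and needle not in haystack:
            continue  # free text does not match
        if all(record.get(key) == value for key, value in filters.items()):
            results.append(record)
    return results


# Invented sample records, for illustration only.
records = [
    {"name": "Example Geo Archive", "description": "Seismic and geodetic datasets",
     "subjects": ["geosciences"], "persistent_identifiers": True, "open_access": True},
    {"name": "Example Geo Samples", "description": "Rock sample measurements",
     "subjects": ["geosciences"], "persistent_identifiers": False, "open_access": True},
    {"name": "Example Bio Bank", "description": "Genome sequences",
     "subjects": ["life sciences"], "persistent_identifiers": True, "open_access": False},
]

# Geosciences repositories that use persistent identifiers.
hits = search(records, text="geosciences", persistent_identifiers=True)
print([record["name"] for record in hits])
```

Calling `search(records)` without arguments returns every record, which corresponds to browsing the full list.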

Example screenshot of search results for geosciences data repositories using persistent identifiers.

Operators of data repositories can suggest their infrastructure for listing in re3data.org by filling in an online application form. A repository is indexed when it meets the minimum requirements for inclusion, which are described in the re3data.org vocabulary. The project team reviews each repository; reviewed repositories are marked with a green check mark.

The project cooperates with other Open Science initiatives like BioSharing, DataCite and OpenAIRE. Some publishers already refer to re3data.org in their Editorial Policies as a tool for the identification of suitable data repositories.

Next Steps

In the upcoming project phase, the focus will be on improving usability and implementing new features. Among other things, the dialog with repository operators will be supported by a workflow system. Beyond the development of the registry, the project will promote the standardization of research data repositories.

re3data.org is funded by the German Research Foundation (DFG). Project partners are the GFZ German Research Centre for Geosciences, Humboldt-Universität zu Berlin and the Karlsruhe Institute of Technology (KIT). These three partners, with their expertise in information infrastructures, guarantee the sustainability of the registry.

Further information on re3data.org can be found in a recently published article in PLOS ONE:

Pampel, H., et al. (2013). Making Research Data Repositories Visible: The re3data.org Registry. PLOS ONE. doi: 10.1371/journal.pone.0078080


8 Responses to How to find an appropriate research data repository?

  4. Re3data is fantastic and one of the main things I link to when talking to researchers about publishing their data. The main new feature I’d like at the moment is the ability to search by acceptance policy. Currently it’s great for finding repositories where you can find open data, but some/many of these repositories seem to just be publishing the output of a particular research group/consortium.

    At a minimum, just a “Yes, this repository is open to public submissions” would be great; even better would be to fine-tune that: is the submission process moderated/reviewed? is there a deposit fee? etc.

    Looking forward to seeing the other new features in due course.

    • Thanks for the compliments, Deborah.
      Concerning your suggestions I am happy to tell you that we already implemented some of these.
      The issue of open submission is solved with the open access icon. At the end of our vocabulary you can find a diagram showing how we define an “open” research data repository (RDR)[1]. But you are definitely right, we should point this out more clearly, maybe in the FAQ section.
      Concerning the review or moderation of submissions, we tried to tackle this issue by showing whether an RDR has quality management. Does that answer your question?
      Finally, the issue of the deposit fee is actually addressed in our vocabulary. Try searching for feeRequired and re3data.org should list all those RDRs. Would it help if we introduced a check box with which you could exclude those RDRs requiring a deposit fee?
      We really appreciate your feedback and look forward to hearing more from you.
      [1] http://dx.doi.org/10.2312/re3.002

      • Thanks for your reply, Paul, and the link to your vocabulary. It looks like you gather a lot more metadata about repositories than I’d realised, so the question is just making that metadata searchable by users.

        Of course a related question is how much to make searchable – functionality vs feature bloat. People who want really detailed information will happily delve into an API if that provides granular access to all the fields in your vocab. But the search interface should focus on the average user, and would too much specificity be desirable here? You wouldn’t want to run the risk of someone trying to search for a repository about grasshoppers that’s open in all respects and requires no fees – then getting zero results and concluding there’s nothing available for them. Still, I think a researcher wanting to find out where they can publish their dataset is a natural use case.

        I’m just not sure if “open access” conveys “open to submissions” to the average user. (Explaining in the FAQ will reach us librarians but may not reach the too-busy-to-explore researcher.) And despite your comment and the vocabulary explanation – as currently implemented (eg http://service.re3data.org/repository/r3d100010232#termsTab, http://service.re3data.org/repository/r3d100010156#termsTab, which both show an OA icon but are closed to submissions) and defined in the FAQ (“The research data repository provides open access to its data”) it doesn’t seem to have any reference to whether a repository is open to submissions.

        So maybe a separate search option should be looked into? I don’t honestly see the use of conflating open access vs open deposit: I think users would want the ability to search differently depending on whether they want to find data for reuse vs wanting to share their own data.

        Separately, the search you suggest for “feeRequired” seems like it might be ambiguous between repositories requiring fees for access to data and those requiring fees for submission of data. All the ones I’ve checked (eg at random http://service.re3data.org/repository/r3d100010251#termsTab) have it in the data access slot. I can’t figure out how to search for something that only requires a fee to upload data. Dryad comes to mind as an example, but the fact that this was a recent change could as easily explain why this isn’t (yet?) listed in its re3data record: http://service.re3data.org/repository/r3d100000044#termsTab

        Hope some of this is useful, and thanks again, this is already giving me a better handle on what I can find with re3data.
