Data Preservation, Access, and Associated Timelines

NIH Expectations

Give plans and timelines for data preservation and access, including:

  • The name of the repository(ies) where scientific data and metadata arising from the project will be archived. See Selecting a Data Repository for information on selecting an appropriate repository.
  • How the scientific data will be findable and identifiable, i.e., via a persistent unique identifier or other standard indexing tools.
  • When the scientific data will be made available to other users and for how long. Identify any differences in timelines for different subsets of scientific data to be shared.

Adapted from: Writing a Data Management & Sharing Plan

Why is this being asked?

To ensure data can be shared and reused effectively, there needs to be a plan for when and how it will be shared. This section allows you to describe in detail how you plan to make your data available, if applicable.

What to Include

Data Repositories

  • A data repository is a type of large database built specifically to store data. It differs from other storage options in that it is typically geared towards capturing more specific information during deposit, which is needed for data discovery and storage. Data repositories also typically let data owners set different levels of access privileges for users attempting to access their data.
  • Many data repositories are available for depositing your data, and choosing one can come down to a variety of factors, such as affiliation with a funding agency section, possible limitations related to your data, or simply personal preference. The NIH has compiled a list of desirable characteristics to consider when selecting a repository. Below are some additional options to consider, in decreasing order of preference for complying with the NIH Policy.
    • Option 1: Use a repository accepted in your field. If you do not know of one, browse NIH Repositories for Sharing Scientific Data to find an applicable repository. (NOTE: This should always be your first option. Most datasets affiliated with NIH funding can be deposited in a data repository supported by their funding Institute or Center.)
    • Option 2: Use re3data to check for potential field-specific repositories not funded by NIH. Visit this page for additional recommendations on choosing a repository for this option.
    • Option 3: Use an acceptable generalist repository. These are repositories that don't focus on any particular field of study, but are reputable, safe places to store your data. Visit this page for additional information on choosing a generalist repository.
    • Option 4: Use eCommons, Cornell's institutional repository. This is not specifically built to host datasets, but it can host data in a pinch.
    • Option 5: If you can't find an appropriate repository using any of the other options, or if you will not be sharing your data due to data restrictions, you can archive your data in the WCM Institutional Data Repository for Research (WIDRR). (NOTE: If you are sharing your data, this should only be done as a last resort. WIDRR is currently not set up to allow sharing with outside entities. As a result, if you can share your data, putting it in WIDRR does not make you compliant with the NIH DMS Policy.)
  • For foreign sub-awardees, LabArchives can be used to share data. For more information on using LabArchives, please visit this guide, and for more information on the NIH data policy for foreign sub-awardees, please visit the NIH Subawards page.

Data Identifiers

  • Indicate whether persistent identifiers will be made available by the repository of choice. The most common identifier used to link to datasets is the DOI, or 'Digital Object Identifier'. Not every repository will generate a DOI for each dataset, but all repositories should generate some form of unique identifier. 

Data Availability Timeline

  • Indicate when data will first be available, how long it will remain available, and any milestones that could trigger a data sharing event. The WCM Data Retention Policy dictates that data be made available within three years of project closeout or upon publication, and that it remain available for at least six years, with an additional six years if self-cited. The NIH dictates that data be made available at the end of the performance period or upon publication, so the two policies overlap considerably on when to start sharing. In summary:
    • Start sharing: three years from closeout of project, end of the grant performance period, or publication of funded research paper
    • Stop sharing: at least six years, plus an additional six years if self-cited
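
As a rough illustration of the timeline arithmetic above, the following Python sketch estimates a sharing window from a project's key dates. It is only a planning aid under stated assumptions, not an official calculator: the function names (add_years, sharing_window) are hypothetical, the start date is taken as publication or the end of the performance period (whichever comes first), and the six-year retention clock is assumed to begin when sharing begins. Confirm the actual dates against the NIH and WCM policies.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only raised for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def sharing_window(performance_end: date,
                   publication: Optional[date] = None,
                   self_cited: bool = False) -> tuple[date, date]:
    """Return (date sharing should start, earliest date sharing may stop)."""
    # Assumption: sharing starts at publication or at the end of the
    # performance period, whichever comes first.
    start = min(publication, performance_end) if publication else performance_end
    # Assumption: retention is counted from the start of sharing;
    # six years, plus six more if the data is self-cited.
    years = 12 if self_cited else 6
    return start, add_years(start, years)


# Hypothetical dates, for illustration only.
start, stop = sharing_window(date(2027, 6, 30), publication=date(2026, 3, 15))
print(f"Start sharing by {start}; keep available until at least {stop}")
```

Stating concrete dates like these in your plan, rather than only the policy language, makes the timeline easier for reviewers to check.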

Data Requests

  • Indicate the process/workflow by which a user will access the data. For NIH or generalist repositories, data will typically be accessed or requested directly by the user, with minimal or no intervention necessary from you. Each data repository handles this differently, however, so be sure to familiarize yourself with the policies of the repository you choose.

Sample Responses 

  • All data resulting from this project will be deposited as a batch in Zenodo within 60 days of the closeout of the project and made available for at least six years. Additionally, data for each publication for this project will be deposited in Zenodo upon publication. Zenodo issues DOIs, uses the DataCite Metadata Schema, and adheres to FAIR Principles.
  • This study is approved by the Weill Cornell Medicine Institutional Review Board (Protocol 98734768), and the approved informed consent form does not allow for sharing of data collected during the study. As a result, data will not be shared at the conclusion of the study. However, per the WCM Data Retention Policy, all data and supporting documentation will be archived in the WCM Institutional Data Repository for Research at publication or at the end of the grant performance period.