Research Data Management, Retention, and Sharing

As described by the National Institute of Standards and Technology (NIST), “it is widely recognized that data, specifically research data, are of growing importance and impact to the economy and society”. The NIST diagram below illustrates the stages of the data lifecycle – from planning to managing to retaining & archiving.

[Figure: NIST data lifecycle diagram]

Reflecting this growing importance, Cornell University updated the Research Data Retention Policy in July 2022, and the NIH is revising its Data Management and Sharing Policy in January 2023.

General recommendations for research data storage

Complying with the Cornell University Data Retention Policy and the NIH Data Management and Sharing Policy (2023)

The following section offers guidance on how to comply with Cornell University Policy 4.21 on data retention and the new NIH Data Management and Sharing Policy.

Who is the custodian of the research data and responsible for answering the following questions?

The Principal Investigator.

The Cornell University Policy 4.21 on Research Data Retention specifies that principal investigators are the custodians of their research data and responsible for the proper use, access, security, and control of any research data under their management or supervision, including the data used in scholarly publications or presentations.

According to CU policy, when do I have to create a WCM Institutional Data Repository for Research (WIDRR) entry for my data?

1. Are your research data referenced in a publication?

Yes: Create a data retention record in the data retention tool upon publication (60 days after publication at the latest)
No: No action required

2. Are your research data a result of a grant that has just ended?

Yes: Create a record for your dataset in the data retention tool after grant closure (60 days after closure at the latest)
No: No action required

3. Are you leaving WCM or retiring?

Yes: Create a record for your dataset in the data retention tool before leaving (60 days before departure at the latest)
No: No action required
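
The three questions above reduce to simple date arithmetic. The following is a minimal sketch, not part of the data retention tool itself; the function name `record_deadlines` and its parameters are hypothetical, and the 60-day window is taken from the answers above.

```python
from datetime import date, timedelta
from typing import Optional

# The 60-day window stated in the three questions above.
WINDOW = timedelta(days=60)


def record_deadlines(
    publication: Optional[date] = None,
    grant_closure: Optional[date] = None,
    departure: Optional[date] = None,
) -> dict:
    """Return the latest date for creating a data retention record, per trigger.

    Hypothetical helper: a trigger passed as None does not apply,
    so it is omitted from the result (no action required).
    """
    deadlines = {}
    if publication is not None:
        deadlines["publication"] = publication + WINDOW      # 60 days after publication
    if grant_closure is not None:
        deadlines["grant_closure"] = grant_closure + WINDOW  # 60 days after closure
    if departure is not None:
        deadlines["departure"] = departure - WINDOW          # 60 days before departure
    return deadlines
```

For example, `record_deadlines(publication=date(2023, 3, 1))` yields a single deadline of April 30, 2023, and an empty result means no record is due.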

How long does the CU Policy require that I retain my data?

  • Six years after publication OR after grant close-out
  • An additional six years each time you cite your paper referencing the research data
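
As a rough illustration, the retention period can be estimated as follows. This sketch assumes one possible reading of the second bullet, namely that each citation restarts a six-year clock from the citation date; the policy text itself is authoritative, and the function names here are hypothetical.

```python
from datetime import date
from typing import Iterable

RETENTION_YEARS = 6  # six years, as stated in the bullets above


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def retention_end(start: date, citation_dates: Iterable[date] = ()) -> date:
    """Estimate the earliest date on which retention could end.

    `start` is the publication or grant close-out date.
    Assumption (not confirmed by the policy text): every later citation
    of the paper restarts the six-year period from the citation date.
    """
    latest_event = max([start, *citation_dates])
    return add_years(latest_event, RETENTION_YEARS)
```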

Where should I deposit my data? Which data repository should I use?

Remember that any repositories you choose must also be able to share your data.

1. Does your funding agency or your journal require you to use a specified data repository?
Yes: Deposit data in the specified repository

No: Do researchers who work with similar data share their data in a specific repository?

Yes: Deposit in the repository used by your research community
No: Contact the Wood Library for guidance on using a generalist repository. You can also use this NIH resource to help you choose an appropriate repository: NIH-Supported Data Sharing Resources.
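
The decision sequence above can be written as a short function. This is only a sketch of the order in which the questions are asked; the function and parameter names are hypothetical, and the repository names are supplied by the caller.

```python
from typing import Optional


def choose_repository(
    required_repository: Optional[str] = None,
    community_repository: Optional[str] = None,
) -> str:
    """Walk the questions above in order and return the recommended action.

    required_repository: repository mandated by the funder or journal, if any.
    community_repository: repository commonly used for similar data, if any.
    """
    if required_repository:
        return f"Deposit data in the specified repository: {required_repository}"
    if community_repository:
        return (
            "Deposit in the repository used by your research community: "
            f"{community_repository}"
        )
    return "Contact the Wood Library for guidance on using a generalist repository"
```

A funder or journal requirement always takes precedence, so a community repository is considered only when no repository is mandated.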

Please remember: Once the data are deposited in a repository (that allows sharing if the data need to be shared), do not forget to create a record in the data retention tool to indicate the location of your dataset(s). 

Creating a record of data retention in WIDRR alone, without depositing data in an NIH-recommended repository, will not meet the NIH sharing requirements for dataset(s) that need to be shared.

If data are removed from the public repository, this will jeopardize compliance with both NIH and Cornell University policies. Any changes in data deposition must be versioned.

If I have followed the steps above, have I complied with the NIH data sharing policy effective January 25, 2023?

Yes, for publications and grant close-outs, if your data are in a repository that supports sharing.

But if you want to initiate a grant after January 25, 2023, you must also submit a Data Management and Sharing (DMS) plan.

Please remember the term Scientific Data is defined in the NIH policy as "The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens."

What are the differences in requirements between Cornell University and the new NIH Data Management and Sharing Policies?

NIH policy requirements:

The NIH policy requires investigators to share any scientific data to replicate or validate findings.

These do NOT include the following:

  • Lab notebooks
  • Preliminary analyses
  • Completed case report forms
  • Drafts of scientific papers
  • Plans for future research
  • Peer reviews
  • Communications with colleagues
  • Lab specimens or other physical objects

NIH recommends keeping data for at least three years after grant closeout, although this period is different for a contract. The data should include the methodology and procedures (including software) used to collect the data, data labels, definitions of the variables, and any other information needed to reproduce and understand the data. NIH also advises using naming conventions that result in unique identifiers, favors the use of Common Data Elements, and suggests thinking in advance about the data storage format and its impact on the research budget, about version control, and about backing up generated data.

Cornell University policy requirements:

The CU policy requires investigators to record the location of the following data in WIDRR:

  • Scientific raw data from publication (DOI of the publication must be provided)
  • Scientific raw data from any work not published before the investigator’s grant ends (ex.: Preliminary analyses). The funder grant ID must be provided.
  • Scientific raw data from any work not published before the investigator leaves WCM. The Service Now ticket related to the offboarding of the investigator must be provided.
  • Metadata associated with the raw dataset OR instruction on how to access the same raw dataset from the same data provider
  • Lab Notebooks
  • A methods file that details all the analytical steps performed on the raw data until their final published form. This includes software and code used. 

IMPORTANT: To be compliant with the CU policy, investigators must retain in WIDRR any data that cannot be shared. For data that need to be shared according to NIH policy and according to submitted DMS plans, investigators should use an NIH-approved repository and create a record in WIDRR to indicate the location of their dataset.

What do I need to do for new NIH grant applications submitted after January 25, 2023?

You must complete a data management and sharing plan (DMSP) of no more than two pages, which will be evaluated by NIH.

1. Review a checklist for researchers and NIH guidance before drafting your DMSP. The DMSP must include how data will be managed and shared, and identify the institutional process for confirming the plan is actually followed. Once the DMSP is accepted, it becomes part of the legal Terms and Conditions of the Notice of Award by incorporation. The DMSP can be updated at any time via a letter of prior approval from the Principal Investigator to the funding agency.

Best Practices for secure data storage

  • ITS provides several options for dataset storage. WCM recommends the use of one of these three options for dataset storage:

2. Determine appropriate data to manage and share. What data need to be managed and by whom? According to the definition of scientific data above, all scientific data need to be managed (backed up, version controlled, and assigned unique identifiers), but not all scientific data need to be shared. The PI is responsible for management and sharing according to NIH policy.

What data need to be shared under the NIH policy? The NIH policy expects researchers to maximize appropriate data sharing when developing DMSPs.

For human subjects research data, NIH recommends that Principal Investigators:

  • Share according to federal, state/local, tribal, and institutional rules or laws
  • Share the DMSP with study participants as early as possible during the informed consent process
  • Outline steps to protect privacy, rights, and confidentiality
  • Share the limitations on data usage with the person preserving and sharing the data (at WCM these limitations should be shared with the Library)
  • Determine whether controlled access is necessary for these datasets, even in the case of de-identified or non-limited datasets

All limitations on sharing and steps to protect privacy, rights, and confidentiality for sensitive data should be documented in the DMSP.

3. Document the following in your DMSP:

4. Write the DMSP

What do I need to do for grant renewals?

  • Compare your existing data practices with what is required for the renewal
  • Identify gaps in your existing data management plans and practices
  • Address how you will begin sharing this data
  • Consider things the new policy may require, such as Data Use Agreements (DUAs), data de-identification before sharing, data documentation, and upload into a data repository
  • Write your DMSP according to the guidance above

What do I need to submit as part of my funding proposal?

Budget

Costs to execute a DMSP are allowable as a line item in the budget. A summary of the DMSP must be provided in the budget justification.

What are the allowable costs?

Allowable costs include any reasonable, justifiable costs required to comply with the DMSP.

Some examples are:

  • Labor for data curation (e.g., formatting data, de-identifying data, preparing metadata to foster discoverability, interpretation, and reuse)
  • Preserving and sharing data through established repositories, formatting data for transmission to and storage at a selected, established repository for long-term preservation and access (if fees apply)
  • Developing supporting documentation
  • De-identifying data
  • Local data management considerations, such as unique infrastructure necessary to provide local management and preservation (before being deposited in an established repository)
  • Other costs

What are the unallowable costs?

  • Infrastructure costs that are included in institutional overhead (e.g., Facilities and Administration costs)
  • Costs associated with the routine conduct of research, including costs associated with collecting or gaining access to research data
  • Costs that are double charged or inconsistently charged as both direct and indirect costs

Where are the costs represented?

The costs must be included in Section F (Other Direct Costs) of the SF 424 R&R budget form; for Modular Budgets, they can be included in the PHS 398 form. There will be a new Budget Line Item labeled “Data Management and Sharing.” The costs must also be included in Section L (Budget Justification).

Who reviews the budget?

The Center for Scientific Review (CSR) will check DMSPs for completeness and viability. The Peer Review Committee (PRC) will assess the budget and the budget justification for feasibility. The PRC will not see the DMSP, and the DMSP will not affect scoring.

More information on budgeting for data management and sharing can be found here.

What tools are available for compliance purposes during my grant award period?

Storage, Backups, Security:

Generalized Storage:

Specialized Storage:

To choose an appropriate repository we recommend the following steps:

[Figure: steps for choosing a data repository]

This flowchart aims to guide investigators in decisions about their data retention and sharing duties for Cornell University and NIH policy compliance.

[Figure: data repository flowchart]

NIH has classified their repositories by funding agencies to help researchers locate the public repositories available under a specific funding Institute or Center. The link below shows lists of repositories that include the Institute or Center, Repository Name, Description, Submission Policy, and How to Access the Data. For guidance on the best repository for your data, contact the Wood Library.

NIH-recommended generalist repositories

The NIH has endorsed nine generalist repositories that house data regardless of type, format, content, or subject matter. The NIH-recommended generalist repositories are listed at this link: https://www.nlm.nih.gov/NIHbmic/generalist_repositories.html.

For guidance on the best repository for your data, contact the Wood Library.

Other data repositories

Other resources to help researchers find the right repositories can be found on the Samuel J. Wood Library Data Preservation, Access and Associated Timeframes site or the Arizona University website under the Tools for Finding Repository section.

Data Sharing

The new NIH policy requires a plan to maximize data sharing, while acknowledging factors (legal, ethical, or technical) that may affect the extent of data sharing. The policy requires human subjects research to have consent forms for data sharing, including for de-identified data. The policy also requires that tribal authorities give appropriate approvals to share data of indigenous peoples.

Where do I share my data?

You share via the same established data repositories in which you chose to deposit your data, such as:

When do I share my data?

The rule of thumb is: as soon as possible.

Consider relevant expectations such as data repository policies, record retention requirements, or journal policies.

NIH states that you must share your data when you publish your work or before your performance period ends, whichever comes first.
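
The "whichever comes first" rule is simply the earlier of two dates. A minimal sketch, with a hypothetical function name and parameters:

```python
from datetime import date
from typing import Optional


def sharing_deadline(
    performance_period_end: date,
    publication: Optional[date] = None,
) -> date:
    """Return the date by which data must be shared.

    Implements the rule stated above: at publication or before the end of
    the performance period, whichever comes first. If nothing has been
    published, the end of the performance period applies.
    """
    if publication is None:
        return performance_period_end
    return min(publication, performance_period_end)
```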

How do I share my data?

  • Address the NIH’s goal of making data as accessible as possible. The NIH expects all sharable data to be made available, whether associated with a publication or not.
  • All data used or generated as part of a grant must be managed, but not all data should be shared. You should not share data if doing so would violate privacy protections or applicable laws. If your data are not shareable, you must justify it when writing your DMSP.
  • You may share human subjects-related data as long as your plan addresses how data sharing will be communicated in the consent process, and patients have given informed consent. See NIH sample consent language.


Before submitting your data to a repository, you will need to: 

1. Bundle data together in logical groups for citation and reuse with assigned persistent identifiers (e.g., dataset DOIs)

2. De-identify your data, if appropriate

3. Convert your data to an open, machine-readable file format, such as .csv, when possible

4. Use data and metadata standards if appropriate to your field. FAIRsharing.org is a database of such standards.

5. Document the dataset in a separate readme.txt file, and/or create metadata required by your chosen repository or discipline. Refer to the Data Documentation and Metadata Page for more.
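
Steps 3 and 5 above can be scripted. The sketch below, using only the Python standard library, writes records to a .csv file and generates a readme.txt next to it. The function name, the file names `data.csv` and `readme.txt`, and the readme layout are illustrative assumptions; your chosen repository may require different metadata fields.

```python
import csv
from pathlib import Path


def package_dataset(rows, fieldnames, out_dir, title, variable_notes):
    """Write rows to data.csv and a plain-text readme.txt describing them.

    rows: list of dicts, one per record.
    fieldnames: column order for the CSV.
    variable_notes: dict mapping column name to a short definition.
    Returns the paths of the two files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Step 3: an open, machine-readable format (.csv).
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Step 5: document the dataset in a separate readme.txt file.
    readme_lines = [f"Title: {title}", f"Files: {data_path.name}", "Variables:"]
    readme_lines += [
        f"  {name}: {variable_notes.get(name, 'undocumented')}"
        for name in fieldnames
    ]
    readme_path = out / "readme.txt"
    readme_path.write_text("\n".join(readme_lines) + "\n", encoding="utf-8")
    return data_path, readme_path
```

Columns without an entry in `variable_notes` are marked "undocumented" in the readme, which makes gaps in the documentation easy to spot before deposit. De-identification (step 2) is deliberately not handled here and must be done before packaging.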

What do I need to do for compliance and institutional oversight?

NIH Compliance and Monitoring:

  • You must document your compliance with your DMSP in your annual Research Performance Progress Report (RPPR).  Non-compliance may result in NIH enforcement action such as:
    • Addition of special terms or conditions to the award
    • Termination of the award
    • Non-compliance may also affect future funding decisions
  • If you make changes to your current DMSP, your new plan must be approved by NIH, but the process varies depending on whether the change is made pre-award or post-award.

Institutional Oversight:

  • PIs will ultimately be responsible for ensuring the DMSP is executed
  • The IRB will be responsible for ensuring that the sharing of data pertaining to human subjects is consistent between the DMSP and informed consent
  • PIs will be responsible for ensuring Data Use Agreements are in place before sharing sensitive data
  • Before sharing any data from the data core, data curators will ensure that the data have been de-identified, and will work with the PI and IRB to ensure that proper consents and permissions have been obtained to share the data