As described by the National Institute of Standards and Technology (NIST), “it is widely recognized that data, specifically research data, are of growing importance and impact to the economy and society”. The NIST diagram below illustrates the stages of the data lifecycle – from planning to managing to retaining & archiving.
Complying with the Cornell University Data Retention Policy and the NIH Data Management and Sharing Policy (2023)
Who is the custodian of the research data and responsible for answering the following questions?
The Principal Investigator.
The Cornell University Policy 4.21 on Research Data Retention specifies that principal investigators are the custodians of their research data and responsible for the proper use, access, security, and control of any research data under their management or supervision, including the data used in scholarly publications or presentations.
According to CU policy, when do I have to create a WCM Institutional Data Repository for Research (WIDRR) entry for my data?
1. Are your research data referenced in a publication?
Yes: Create a data retention record in the data retention tool upon publication (60 days after publication at the latest)
No: No action required
2. Are your research data a result of a grant that has just ended?
Yes: Create a record for your dataset in the data retention tool after grant closure (60 days after closure at the latest)
No: No action required
3. Are you leaving WCM or retiring?
Yes: Create a record for your dataset in the data retention tool before leaving (60 days before departure at the latest)
No: No action required
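The three timing rules above amount to simple date arithmetic. The sketch below is illustrative only: it assumes the 60-day windows are counted in calendar days, and the function name `widrr_record_deadline` and the event labels are hypothetical, not part of the data retention tool.

```python
from datetime import date, timedelta

# Assumption: the 60-day windows in the policy are calendar days.
WINDOW = timedelta(days=60)


def widrr_record_deadline(event: str, event_date: date) -> date:
    """Return the latest date on which the retention record should exist.

    `event` is one of the hypothetical labels "publication",
    "grant_closure", or "departure", matching the three questions above.
    """
    if event in ("publication", "grant_closure"):
        # Record is due within 60 days AFTER the event.
        return event_date + WINDOW
    if event == "departure":
        # Record is due at least 60 days BEFORE leaving or retiring.
        return event_date - WINDOW
    raise ValueError(f"unknown event: {event!r}")
```

For example, `widrr_record_deadline("departure", date(2024, 3, 1))` returns `date(2024, 1, 1)`.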
How long does the CU Policy require that I retain my data?
- Six years after publication OR after grant close-out
- An additional six years each time you cite your paper referencing the research data
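As a rough sketch, the retention rule can be expressed as a date calculation. This reflects one possible reading of the policy, namely that each citation restarts a six-year clock from the citation date; confirm the intended interpretation against the policy text. The helper names `add_years` and `retain_until` are hypothetical.

```python
from datetime import date

RETENTION_YEARS = 6


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


def retain_until(start: date, citation_dates=()) -> date:
    """Earliest date on which retention could end.

    `start` is the publication date or the grant close-out date.
    Assumption: each later citation of the paper restarts the
    six-year period from the date of that citation.
    """
    latest = max([start, *citation_dates])
    return add_years(latest, RETENTION_YEARS)
```

Under this reading, data published on 2020-05-01 and cited again on 2023-02-01 would be retained until at least 2029-02-01.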
Where should I deposit my data? Which data repository should I use?
Remember that any repositories you choose must also be able to share your data.
1. Does your funding agency or your journal require you to use a specified data repository?
Yes: Deposit data in the specified repository
No: Do researchers who work with similar data share their data in a specific repository?
Yes: Deposit in the repository used by your research community
No: Contact the Wood Library for guidance on using a generalist repository. You can also use this NIH resource to help you choose an appropriate repository: NIH-Supported Data Sharing Resources.
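The decision sequence above can be summarized as a short function. This is only an illustration of the order in which the questions are asked; the function name and its parameters are hypothetical.

```python
from typing import Optional


def choose_repository(required: Optional[str], community: Optional[str]) -> str:
    """Pick a repository following the order of the questions above.

    `required`  - repository mandated by the funder or journal, if any
    `community` - repository commonly used for similar data, if any
    """
    if required:
        # A funder or journal requirement takes priority.
        return required
    if community:
        # Otherwise follow the practice of the research community.
        return community
    # No specific repository applies: fall back to a generalist one.
    return "generalist repository (contact the Wood Library for guidance)"
```

Whatever the outcome, the chosen repository must still be able to share the data.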
Please remember: Once the data are deposited in a repository (that allows sharing if the data need to be shared), do not forget to create a record in the data retention tool to indicate the location of your dataset(s).
Creating a record of data retention in WIDRR alone, without depositing data in an NIH-recommended repository, will not meet the NIH sharing requirements for dataset(s) that need to be shared.
If data are removed from the public repository, this will jeopardize compliance with both NIH and Cornell University policies. Any changes in data deposition must be versioned.
If I have followed the steps above, have I complied with the NIH data sharing policy effective January 25, 2023?
Yes, for publications and grant close-outs, if your data are in a repository that supports sharing.
However, those who want to initiate grants after January 25, 2023 must also submit a Data Management and Sharing (DMS) plan.
Please remember the term Scientific Data is defined in the NIH policy as "The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens."
What are the differences in requirements between Cornell University and the new NIH Data Management and Sharing Policies?
NIH policy requirements:
The NIH policy requires investigators to share any scientific data needed to replicate or validate findings.
Scientific data do NOT include the following:
- Lab notebooks
- Preliminary analyses
- Completed case report forms
- Manuscript drafts
- Plans for future research
- Peer reviews
- Communication with colleagues
- Lab specimens or other physical objects
NIH recommends keeping data for at least three years after grant closeout, but this is different for a contract. The data should include the methodology and procedures (including software) used to collect the data, data labels, definitions of the variables, and any other information needed to reproduce and understand the data. NIH also advises the use of naming conventions that result in unique identifiers, favors the use of Common Data Elements, and suggests thinking in advance about the data storage format and its impact on the research budget, about version control, and about the backup of generated data.
Cornell University policy requirements:
The CU policy requires investigators to record the location of the following data in WIDRR:
- Scientific raw data from publication (DOI of the publication must be provided)
- Scientific raw data from any work not published before the investigator’s grant ends (e.g., preliminary analyses). The funder grant ID must be provided.
- Scientific raw data from any work not published before the investigator leaves WCM. The Service Now ticket related to the offboarding of the investigator must be provided.
- Metadata associated with the raw dataset OR instruction on how to access the same raw dataset from the same data provider
- Lab Notebooks
- A methods file that details all the analytical steps performed on the raw data until their final published form. This includes software and code used.
IMPORTANT: To be compliant with the CU policy, investigators must retain in WIDRR any data that cannot be shared. For data that need to be shared according to NIH policy and according to submitted DMS plans, investigators should use an NIH-approved repository and create a record in WIDRR to indicate the location of their dataset.
What do I need to do for new NIH grant applications submitted after January 25, 2023?
You must complete a maximum two-page data management and sharing plan (DMSP) that will be evaluated by NIH.
1. Review a checklist for researchers and NIH guidance before drafting your DMSP. The DMSP must include how data will be managed and shared, and identify the institutional process for confirming the plan is actually followed. Once the DMSP is accepted, it becomes part of the legal Terms and Conditions of the Notice of Award by incorporation. The DMSP can be updated at any time via a letter of prior approval from the Principal Investigator to the funding agency.
Best Practices for secure data storage
- ITS provides several options for dataset storage. WCM recommends the use of one of these three options for dataset storage:
- For HIPAA-protected data: the Data Core service is recommended for secure storage and computing
- Best practices for data sharing and archiving can be found here.
- Review Data Repositories
- Example data management plans
2. Determine appropriate data to manage and share. What data need to be managed and by whom? According to the definition of scientific data above, all scientific data need to be managed (data need to be backed up, version controlled, and given unique identifiers), but not all scientific data need to be shared. The PI is responsible for management and sharing according to NIH policy.
What data need to be shared under the NIH policy? The NIH policy expects researchers to maximize appropriate data sharing when developing DMSPs.
For Human Subject research data, NIH recommends that Principal Investigators:
- Share according to federal, state/local, tribal, and institutional rules or laws
- Share the DMSP with study participants as early as possible during the informed consent process
- Outline steps to protect privacy, rights, and confidentiality
- Share the limitations on data usage with the person preserving and sharing the data (at WCM these limitations should be shared with the Library)
- Determine whether controlled access is necessary for these datasets, even in the case of de-identified or non-limited datasets
All limitations on sharing and steps to protect privacy, rights, and confidentiality for sensitive data should be documented in the DMSP.
3. Document the following in your DMSP:
- If datasets are subject to additional privacy controls, who will manage the controls and who will have access?
- For Data Use Agreements, ensure they permit de-identified data sharing and/or sharing of derived data sets, if possible.
- If your research involves human subjects, ensure that consent forms have language to allow for de-identified data sharing.
- If your research involves indigenous peoples, ensure that the appropriate tribal leaders/groups have approved your plan for data collection and use.
4. Write the DMSP
What do I need to do for grant renewals?
- Compare your existing data practices with what is required for the renewal
- Identify gaps in your existing data management plans and practices
- Address how you will begin sharing this data
- Consider things the new policy may require, such as Data Use Agreements (DUAs), data de-identification before sharing, data documentation, and upload into a data repository
- Write your DMSP according to the guidance above
What do I need to submit as part of my funding proposal?
- If your NIH-sponsored research will generate scientific data, you must submit a DMSP as part of the Budget Justification Section
- This plan should be two pages or fewer and include the following information:
- Data Type
- Related Tools/Software and/or Code
- Data Preservation, Access, and Associated Timelines
- Access, Distribution, or Reuse Considerations
- Oversight of Data Management and Sharing
- To Get Started Writing a DMSP, use the NIH Guidance here
- Review NIMH sample DMSPs
Costs to execute a DMSP are allowable as a line item in the budget. A summary of the DMSP must be provided in the budget justification.
What are the allowable costs?
Allowable costs include any reasonable, justifiable costs required to comply with the DMSP.
Some examples are:
- Labor for data curation (e.g., formatting data, de-identifying data, preparing metadata to foster discoverability, interpretation, and reuse)
- Preserving and sharing data through established repositories, formatting data for transmission to and storage at a selected, established repository for long-term preservation and access (if fees apply)
- Developing supporting documentation
- De-identifying data
- Local data management considerations, such as unique infrastructure necessary to provide local management and preservation (before being deposited in an established repository)
- Other costs
What are the unallowable costs?
- Infrastructure costs that are included in institutional overhead (e.g., Facilities and Administration costs)
- Costs associated with the routine conduct of research, including costs associated with collecting or gaining access to research data
- Costs that are double charged or inconsistently charged as both direct and indirect costs
Where are the costs represented?
The costs must be included in Section F (Other Direct Costs) of the SF 424 R&R budget form; for Modular Budgets, they can be included in the PHS 398 form. There will be a new Budget Line Item labeled “Data Management and Sharing.” The costs must also be included in Section L (Budget Justification).
Who reviews the budget?
The Center for Scientific Review (CSR) will check DMSPs for completeness and viability. The Peer Review Committee (PRC) will assess the budget and the budget justification for feasibility. The PRC will not see the DMSP, and the DMSP will not affect the scoring.
More information on budgeting for data management and sharing can be found here.
What tools are available for compliance purposes during my grant award period?
Storage, Backups, Security:
- High Performance Computing
- Data Core (required for grants with IRB protocols)
- Data Repositories
To choose an appropriate repository we recommend the following steps:
This flowchart aims to guide investigators in decisions about their data retention and sharing duties for Cornell University and NIH policy compliance.
NIH has classified their repositories by funding agencies to help researchers locate the public repositories available under a specific funding Institute or Center. The link below shows lists of repositories that include the Institute or Center, Repository Name, Description, Submission Policy, and How to Access the Data. For guidance on the best repository for your data, contact the Wood Library.
- https://www.nlm.nih.gov/NIHbmic/domain_specific_repositories.html The repositories on this list host data of a specific type or related to a specific discipline.
- For data that require limitations in access or for repositories and knowledge bases with limitations on submission, the NIH provides the following link: https://www.nlm.nih.gov/NIHbmic/other_data_resources.html
NIH-recommended generalist repositories
The NIH has endorsed nine generalist repositories that house data regardless of type, format, content, or subject matter. The NIH-recommended generalist repositories are available through this link: https://www.nlm.nih.gov/NIHbmic/generalist_repositories.html.
For guidance on the best repository for your data, contact the Wood Library.
Other data repositories
Other resources to help researchers find the right repositories can be found on the Samuel J. Wood Library Data Preservation, Access and Associated Timeframes site or the Arizona University website under the Tools for Finding Repository section.
The new NIH policy requires a plan to maximize data sharing, while acknowledging factors (legal, ethical, or technical) that may affect the extent of data sharing. The policy requires human subjects research to have consent forms that cover data sharing, including sharing of de-identified data. The policy also requires that tribal authorities give appropriate approvals before data of indigenous peoples are shared.
When do I share my data?
The rule of thumb is: as soon as possible.
Consider relevant expectations such as data repository policies, record retention requirements, or journal policies.
NIH states that you must share your data when you publish your work or before your performance period ends, whichever comes first.
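The "whichever comes first" rule above reduces to taking the earlier of two dates. The following minimal sketch assumes the publication date may not yet be known; the function name `sharing_deadline` is hypothetical.

```python
from datetime import date
from typing import Optional


def sharing_deadline(performance_period_end: date,
                     publication_date: Optional[date] = None) -> date:
    """Latest date by which data should be shared.

    If the work has not been published (publication_date is None),
    the end of the performance period is the deadline.
    """
    if publication_date is None:
        return performance_period_end
    # Whichever comes first: publication or end of performance period.
    return min(publication_date, performance_period_end)
```

Repository, journal, or record retention requirements may call for an earlier date than the one computed here.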
How do I share my data?
- Address the NIH’s goal of making data as accessible as possible. The NIH expects all sharable data to be made available, whether associated with a publication or not.
- All data used or generated as part of a grant must be managed, but not all data should be shared. You should not share data if doing so would violate privacy protections or applicable laws. If your data are not shareable, you must justify it when writing your DMSP.
- You may share human subjects-related data as long as your plan addresses how data sharing will be communicated in the consent process, and patients have given informed consent. See NIH sample consent language.
Before submitting your data to a repository, you will need to:
1. Bundle data together in logical groups for citation and reuse with assigned persistent identifiers (e.g., dataset DOIs)
2. De-identify your data, if appropriate
3. Convert your data to an open, machine-readable file format, such as .csv, when possible
4. Use data and metadata standards if appropriate to your field. Fairsharing.org is a database of such standards.
5. Document the dataset in a separate readme.txt file, and/or create metadata required by your chosen repository or discipline. Refer to the Data Documentation and Metadata Page for more.
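Steps 3 and 5 above can be illustrated with a short script that writes tabular data to a .csv file and generates a plain-text readme alongside it. This is a minimal sketch using only the Python standard library; the function name `write_dataset`, the file name `data.csv`, the readme layout, and the example variables are all hypothetical, and your repository may require different or additional metadata.

```python
import csv
import tempfile
from pathlib import Path


def write_dataset(folder: Path, rows: list, description: str,
                  variables: dict) -> None:
    """Write rows to data.csv and document them in readme.txt.

    `variables` maps each column name to its definition; its keys
    also determine the column order in the CSV file.
    """
    folder.mkdir(parents=True, exist_ok=True)

    # Open, machine-readable format: comma-separated values.
    with open(folder / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(variables))
        writer.writeheader()
        writer.writerows(rows)

    # Separate readme describing the files and defining each variable.
    lines = [description, "", "Files:", "  data.csv", "", "Variables:"]
    lines += [f"  {name}: {definition}" for name, definition in variables.items()]
    (folder / "readme.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical example data, written to a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        write_dataset(
            Path(tmp) / "example_dataset",
            rows=[{"sample_id": "S1", "weight_kg": "70.2"}],
            description="Example dataset (illustrative only).",
            variables={
                "sample_id": "De-identified sample identifier",
                "weight_kg": "Body weight in kilograms",
            },
        )
```

De-identification (step 2) is not handled by this sketch and must be completed before any file is written for deposit.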
What do I need to do for compliance and institutional oversight?
NIH Compliance and Monitoring:
- You must document your compliance with your DMSP in your annual Research Performance Progress Report (RPPR). Non-compliance may result in NIH enforcement action such as:
- Addition of special terms or conditions to the award
- Termination of the award
- Non-compliance may also affect future funding decisions
- If you make changes to your current DMSP, your new plan must be approved by NIH, but the process varies depending on whether the change is made pre-award or post-award.
- PIs will ultimately be responsible for ensuring the DMSP is executed
- The IRB will be responsible for ensuring that the sharing of data pertaining to human subjects is consistent between the DMSP and informed consent
- PIs will be responsible for ensuring Data Use Agreements are in place before sharing sensitive data
- Before sharing any data from the data core, data curators will ensure that the data have been de-identified, and will work with the PI and IRB to ensure that proper consents and permissions have been obtained to share the data