Finding Statistics & Data Sets

Finding Statistics in the Literature

Statistics and information about data sets appear in the biomedical literature. In MEDLINE, two subheadings are often applied to articles containing statistics. For articles about diseases or conditions, use the subheading EP - Epidemiology. For everything else, use the subheading SN - Statistics and Numerical Data. To find articles in which large databases were used, search on the MeSH term "Databases" along with your subject.
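The subheading strategy above can be sketched as a pair of small query builders. This is a minimal illustration only: it assumes PubMed-style syntax ("Heading/subheading" followed by the [MeSH Terms] field tag), the function names are hypothetical, and the headings "Asthma" and "Hospitals" are examples chosen for this sketch. Other MEDLINE interfaces use different syntax, and the exact form of the "Databases" heading should be checked in the MeSH browser before searching.

```python
# Sketch only: builds search strings, does not contact any search service.
# Assumes PubMed-style "Heading/subheading"[MeSH Terms] syntax.

def subheading_query(mesh_heading: str, is_disease: bool) -> str:
    """Attach the EP (Epidemiology) or SN (Statistics and Numerical Data)
    subheading to a MeSH heading, following the rule in the guide."""
    subheading = "epidemiology" if is_disease else "statistics and numerical data"
    return f'"{mesh_heading}/{subheading}"[MeSH Terms]'


def database_query(subject_heading: str, database_heading: str = "Databases") -> str:
    """Combine a subject heading with a database heading to find articles
    in which large databases were used. The default heading is the term as
    written in the guide; verify the current MeSH form before relying on it."""
    return f'"{database_heading}"[MeSH Terms] AND "{subject_heading}"[MeSH Terms]'


if __name__ == "__main__":
    # Disease or condition: use the Epidemiology subheading.
    print(subheading_query("Asthma", is_disease=True))
    # Everything else: use Statistics and Numerical Data.
    print(subheading_query("Hospitals", is_disease=False))
    # Articles where large databases were used for the subject.
    print(database_query("Asthma"))
```

The resulting strings can be pasted into the search box and refined from there, for example by adding date limits.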

Once you know the source of the data, you can search the web to see whether the data set has been updated or expanded. Many statistical publications are updated yearly.

Internet Search and Manipulation Hints

Two questions to consider before beginning your statistical search:

  • Who cares about or has a mandate to study the topic?
  • Who has the resources and staff to collect data in this topic area?
Knowing the answers to these questions will help direct you to appropriate resources.

Look for information about the:

  • File Format (HTML, PDF, Excel, text, etc.)
  • Dates of Data (not the same as the publication date of the document or page)
  • Sources of Data
  • Contact Person
  • Suggested Citation
  • Availability of Documentation
  • Data Use Limitations
  • Anything special about the data?

Statistical Resources and Publications

Projects Presenting Spatial Data (Geographic Information Systems)

Projects Consolidating Data Sets

WCMC/NYP/CU Data and Statistical Expertise

Web Directories of Data Sets

  • Directory of Health and Human Services Data Resources - Compilation of collection systems sponsored by the U.S. Department of Health and Human Services (HHS). Databases from continuing departmental data projects or program administrative and evaluation activities that met the criterion of broad utility were included. Such data projects and systems included recurring surveys and disease registries either maintained or sponsored by HHS. Databases from one-time studies or data collections were also included when the data may have broad interest.
  • Health Services & Sciences Research Resources (HSRR) - HSRR is a searchable database of information about datasets and instruments/indices employed in Health Services Research, Behavioral and Social Sciences and Public Health with links to PubMed.
  • Health and Medical Care Archive - Data sets sponsored by the Robert Wood Johnson Foundation and held at the Inter-University Consortium for Political and Social Research (ICPSR). More data sets are available at http://www.icpsr.umich.edu/. Cornell is an ICPSR member.

Data Sets

Tools and Software for Data Acquisition and Analysis

  • Epi Info / Epi Map - Epi Info and Epi Map are public domain software designed to provide for easy database construction, data entry, and analysis with epidemiologic statistics, maps, and graphs.
  • DataFerrett - DataFerrett, a collaborative effort between the National Center for Health Statistics and the Bureau of the Census, is a unique data mining and extraction tool. It lets you select a databasket full of variables, recode those variables as needed, and then develop and customize tables and charts. DataFerrett helps you locate and retrieve the data you need across the Internet to your desktop or system, regardless of where the data resides.
  • SAS - SAS (Statistical Analysis System) is loaded on PCs in the Library Computer Room.
  • Free Statistical Analysis Tools - Compiled by David Lane, Rice University

Background Reading & Additional Training in Finding and Using Data