I. Finding and Using Healthcare Data
14. How do I find non-VA health care data sets?
Many data sets that are potentially useful for health care research are available for research use. This list includes some of the more important ones, but it is not comprehensive. The Inter-university Consortium for Political and Social Research (ICPSR), based at the University of Michigan, is a source for many data sets. Researchers should check this source before purchasing data, as the data may be available for free from the ICPSR. There is related information in other FAQs in this section (I. Finding and using healthcare data) of the HERC web site.
Medicare Patient Level Data
Medicare is a source for many different types of data. The main limitation of Medicare data is that they only include Medicare patients, who are predominantly those aged 65 and over. Data are available for inpatient and outpatient care, and for providers (physicians) and facilities. A complete list of the data available for purchase is on the CMS web site. Because of the complexity of the Medicare data, CMS supports a university-based service that provides free assistance to researchers.
A linkage of VA and Medicare data is available to VA researchers through the Veterans Affairs Information Resource Center (VIREC).
Other Sources of Patient Level Data
Most states have hospital discharge data that they make available to researchers. The Healthcare Cost and Utilization Project (HCUP) at the Agency for Healthcare Research and Quality (AHRQ) has assembled many of these. From these, AHRQ has created the Nationwide Inpatient Sample (NIS), which has a 20% sample of inpatient discharges. AHRQ also acts as a clearinghouse for researchers who wish to purchase state discharge data in a common format. The state data are also available directly from those states that make the data available to researchers. Many states have data elements that are not available on the NIS. Many states are now linking their discharge data to other data, such as death files. Researchers need to contact the individual states, as most states put additional restrictions on access to linked data sets. States are continually expanding the scope of their health data collection. For example, California now requires hospitals to report all outpatient surgeries and emergency department visits.
A major limitation of the state discharge data is that they do not include physician costs or any outpatient costs. There are sources of these data, but they are not population-based. In addition to Medicare, information on physician charges and ambulatory care can be obtained from state Medicaid data and from private sector vendors that have compiled claims data. The private sector vendors include Ingenix and MedStat.
Provider Payment Data
Medicare provides downloadable data sets with information about all of its payments. Links are available to a comprehensive list of the Medicare public use data downloads and to the information for the Medicare Hospital Outpatient Prospective Payment System (APCs).
Medicare does not pay for all health care services. Ingenix has compiled a more comprehensive list of provider payment relative value units that are based on the Medicare provider payment methodology.
Other Sources of Data
The Center for Studying Health System Change's Community Tracking Study (CTS) is a unique resource. The CTS is compiling extensive longitudinal data about 12 metropolitan areas selected to be representative of the entire country.
The American Medical Association (AMA) and the American Hospital Association (AHA) are good sources of data about physicians and hospitals. These data are based on surveys, but the very high response rates result in data that are more like population data. The one disadvantage of these data is that they are fairly expensive.
The links that follow contain additional information on the AMA Annual Survey of Physicians and the AHA Annual Survey of Hospitals.
The Area Resource File (ARF) has extensive county-level data that are compiled from a wide range of sources. The ARF is a longitudinal file; the current version of the file contains all previous years' data. These time series vary in length. Other government offices, such as the Census Bureau and the Bureau of Labor Statistics, are also useful sources of data.
Many types of population-based data are available from the National Center for Health Statistics. These include vital statistics data such as the Multiple Cause of Death and Mortality Detail. A complete list of the data available is on the NCHS web page.
When looking for data, another useful source is the Department of Health and Human Services Directory of Federal data bases.
Survey Data
Many surveys are conducted for various purposes. Some are population-based, while others focus on specific groups. This section lists, without annotation, the name of each survey and the web link for additional information.
- Current Population Survey
- Survey of Income and Program Participation
- Panel Study of Income Dynamics
- Medical Expenditure Panel Survey
- Medicare Current Beneficiary Survey
- Health and Retirement Study
- Asset and Health Dynamics Among the Oldest Old
- National Employer Health Insurance Survey
- Robert Wood Johnson/RAND Employer Health Insurance Survey
- National Health Interview Survey
- Behavioral Risk Factor Surveillance System
- Robert Wood Johnson Family Health Insurance Survey
- National Health and Nutrition Examination Survey, I, II, and III
- Youth Risk Behavior Survey
- National Ambulatory Medical Care Survey

