Veterans Affairs banner with U.S. FlagVeterans Affairs banner with U.S. Flag
HEALTH ECONOMICS RESOURCE CENTERSpacer

F. Average Cost Estimates

10. How do I find the HERC cost of outpatient visits by cohort of patients?

There are two different methods listed in this FAQ, one for Fiscal Years 1998-2002, and the second for data starting FY2003. The reason for this is that starting in FY2003 a new variable was added that provides a unique identifier for each record in the VA outpatient events file (also known as the outpatient SAS extract of the National Patient Care Database, or the SE file). In merging HERC cost estimates to the SE file, the user must be careful to use the correct method.

The following information is directed towards the problem of finding the cost of outpatient care of VA patients who are participating in a research study. To find the cost of care for these patients, you will need their unique VA medical record number, i.e., the patient's social security number. It is strongly advised that you have an additional identifier, preferably the patient's date of birth, to confirm that you have data for the correct patient. Be sure to observe the necessary information security precautions to protect the confidentiality of all research data.

There are three steps to find the HERC cost of outpatient visits for a cohort of patients: (1) define your cohort, (2) create a file of their visits from the outpatient events file, (3) combine your extract from the event file with HERC cost data.

  1. Define your cohort

    The VA Information Resource Center (VIReC) can provide you with instructions on how to convert a true social security number (the VA medical record number) into a scrambled social security number. Your cohort file might include other key variables: the patient's birth date, the date they enrolled in your study, and the date that they completed the study.

  2. Create a file of their visits from the outpatient events file for Fiscal Years 1998-2002

    The next step is to identify visits to VA providers by your cohort members. These visits are recorded in the VA outpatient events file.

    Use SAS to merge your cohort list with the outpatient events file. You will combine files based on patient scrambled social security number (SCRSSN). Since social security numbers are sometimes transcribed incorrectly, you should confirm that you have identified the correct patients by checking that the birth date that you obtained when the subject enrolled in your study is the same as the birth date recorded in the events file (the variable named DOB).

    You must also create a new variable, LINK2SE, in order to find the HERC cost estimate. LINK2SE is the record number in the outpatient events file. The following SAS code shows how to select visits from the NPCD and define LINK2SE.

    PROC SORT DATA=COHORT NODUPKEY;
    BY SCRSSN;
    RUN;
    DATA OUT1.COHEVENT;
    MERGE COHORT (IN=INCOHORT) IN.SE00 (IN=INEVENT); BY SCRSSN;
    IF INCOHORT AND INEVENT;
    IF INEVENT THEN DO;
    IF LINK2SE=. THEN LINK2SE=1;
    ELSE LINK2SE=LINK2SE+1;
    END;
    RETAIN LINK2SE;
    RUN;

    The program starts by sorting the cohort file by the scrambled social security number (SCRSSN). The events file is already sorted by this variable. Do not sort the events file. It is a very large file, and it is quite costly to sort it.

    The SAS data step merges the two files based on SCRSSN. The variable INCOHORT takes a value of true (numeric value of 1) if the record is in the cohort file. The variable INEVENT takes a value of true if the record is in the events file. The statement "IF INCOHORT AND INEVENT" will select the event file records of all members of the cohort, and none of the records of any other patient.

    The LINK2SE variable is defined only if the data step involves a record in the events file. When the first record in the NPCD visit dataset is encountered, LINK2SE doesn't have a value (LINK2SE=missing). The program assigns it a value of 1. LINK2SE is retained for the next and subsequent SAS data steps. For all subsequent times an NPCD record is encountered, the value of LINK2SE is incremented by 1. If there is patient in the cohort file who is not found in the NPCD dataset, the value of LINK2SE is simply carried forward unchanged.

    When selecting records from the events file using a cohort file, it is best not to use the SAS variable _N_ to define LINK2SE. If _N_ is used, and there is a patient in your list who is not found in the visits file, LINK2SE will be incorrect. The SAS variable _N_ is a count of the iterations of the data set. When SAS reads the record of the patient who is not in the NPCD outpatient file, a data step occurs, and _N_ is incremented. For all subsequent records in the NPCD file, the value of _N_ will be incorrect.

  3. Combine your extract from the event file with HERC cost data for Fiscal Years 1998-2002

    DATA OUT2.SECOST00 EXCLUDED;
    MERGE
    IN1.OPCSE00 (RENAME=(STA5A=HCSTA5A SCRSSN=HCSCRSSN VIZDAY=HCVIZDAY CL=HCCL) IN=INHERC)
    IN2.COHEVENT (IN=INSE);
    BY LINK2SE;
    IF INSE AND INHERC THEN OUTPUT OUT2.SECOST00;
    ELSE IF INSE=1 THEN OUTPUT EXCLUDED;
    RUN;

    This data set merges your the outpatient events file extract (IN2.COHEVENT) with the HERC cost file (IN1.OPCSE00), using the LINK2SE variable. Both datasets are already sorted by this variable, so it is not necessary to sort them. Both files contain the variables station identifier (STA5A), scrambled social security number (SCRSSN), visit day (VISDAY), and clinic stop (CL). These variables from the HERC cost file are renamed so that, in a subsequent step, we can confirm that the merge was done correctly. The file EXCLUDED contains records that appear in your cohort visits file but not in the HERC file. In certain years, there are a few records that are excluded from the HERC data set; see Chapter 5 of the HERC outpatient cost manual to see whether these exclusions are as expected.

    DATA CHECK1;
    SET OUT2.SECOST00;
    IF HCSCRSSN NE SCRSSN
    OR CL NE HCCL OR VIZDAY NE HCVIZDAY OR HCSTA5A NE STA5A;
    ****NOTHING SHOULD PRINT HERE;
    PROC PRINT DATA=CHECK1;

    This data step determines whether the HERC cost records have matched the correct records from the events file. The file CHECK1 should not have any records.

  4. Combine your extract from the event file with HERC cost data for Fiscal Years 2003 and later

    Starting in FY 2003, a new variable, ENCOUNTER_ID, was added to the SE data that provides a unique identifier for each record in the SE file.

    As a result, HERC has changed the recommended method for linking the HERC Outpatient Cost File to the Outpatient Events (SE) File. This section describes the new method, including example SAS code.

    As in the previous years linking program, the program starts by sorting the cohort file by the key variable of scrambled social security number(SCRSSN)-checking for and removing any duplicate values.

    PROC SORT DATA=COHORT NODUPKEY;
    BY SCRSSN;
    RUN;

    The SAS DATA Step merges the cohort file and Austin SE 03 file as done previously. Scrambled Social Security matches are outputed to a match file-in this case COHEVENT--by the Boolean flags of InCohort and InEvent. Observations found only in the cohort and not in the SE event file are outputed to Excluded01. With the inclusion of the unique Encounter_ID variable, the LINK2SE steps are no longer necessary.

    DATA OUTPUT1.COHEVENT EXCLUDED01;
    MERGE
    COHORT (IN=InCohort) IN.SE03 (IN=InEvent);
    By SCRSSN;
    IF InCohort AND InEvent THEN OUTPUT OUTPUT1.COHEVENT;
    ELSE IF InCohort THEN OUTPUT EXCLUDED01;
    RUN;

    Though a precautionary measure that maybe omitted, the SORT procedure may avoid an Out of Sort Order error in the following merge step.

    PROC SORT DATA=IN1.COHEVENT;
    By SCRSSN VIZDAY STA5A ENCOUNTER_ID;
    RUN:

    The DATA Step merges the outpatient events file extract (In2.COHEVENT) with the HERC cost file (IN1.OPCSE03), using the key variables of Scrambled Social Security Numbers (SCRSSN), Day of Visit (VIZDAY), Station identifier (STA5A), and unique Encounter Identification (ENCOUNTER_ID). The additional key variable of Encounter_ID eliminates the need and ability for post merge validation.

    DATA OUTPUT2.SECOST03 EXCLUDED02;
    MERGE
    IN1.COHEVENT (IN=InSE)
    IN1.OPCSE03 (IN=InHERC);
    By SCRSSN VIZDAY STA5A ENCOUNTER_ID;
    IF InSE AND InHERC THEN OUTPUT OUTPUT2.SECOST03;
    ELSE IF InSE THEN OUTPUT EXCLUDED02;
    RUN;

Paul Barnett, PhD
Reviewed/Updated Date: October 23, 2007