Technical Report 30: Risk Adjustment: Guide to the V21 and Nosos Risk Score Programs
Wagner T, Stefos T, Moran E, Cashy J, Shen ML, Gehlert E, Asch S, Almenoff P. Risk Adjustment: Guide to the V21 and Nosos Risk Score Programs. Technical Report 30. Menlo Park, CA. VA Palo Alto, Health Economics Resource Center; February 2016.
All tables are stored in an Excel file. Download the tables here.
Many URLs are not live because they are VA intranet-only. Researchers with VA intranet access can access these sites by copying and pasting the URLs into their browser.
For a list of VA acronyms, please visit the VA acronym checker on the VA intranet at https://vaww.va.gov/Acronyms/fulllist.cfm.
This document provides directions for creating the Centers for Medicare and Medicaid Services (CMS) Hierarchical Condition Categories (HCC) version 21 (V21) risk scores and Nosos risk scores from the VA MedSAS and CDW data. (Note: Nosos is the Greek word for 'chronic disease'.)
The science behind Nosos is documented in the paper "Risk Adjustment Tools for Learning Health Care Systems: A Comparison of DxCG and CMS-HCC V21" (Wagner et al., 2016, Health Services Research; DOI: 10.1111/1475-6773.12454).
The purpose of the Nosos program is to create risk scores for VA patients, so that researchers may adjust for risk when making comparisons of treatments or outcomes. The HCC risk scores primarily use the patients’ diagnoses (ICD-9 codes), age and gender. The Nosos score builds on this, adding pharmacy records as well as VA-specific items such as VA priority status and VA-computed costs.
The Nosos scores are computed by first computing the HCC risk scores using the V21 program. These risk scores, along with the additional factors, are then used as predictors in a regression model to model the annual VA cost for each patient. Estimates are then rescaled so that the mean Nosos score for the population will always equal one.
In order to make it easier for researchers to customize the risk scores for their own purposes, we have attempted to present the scoring algorithm as a series of modular SAS macros, with minimal dependence between different tasks. For example, if a researcher wishes to use different diagnoses than what we have presented (such as excluding certain types of visits) it will only be necessary to modify the macro that extracts ICD-9 codes. A researcher who does not want to use pharmacy codes can skip the pharmacy extract macro and use a modified version of the regression macro that does not include pharmacy codes. See Appendix A for an example of pulling data and scoring Nosos for a complete fiscal year.
SAS datasets have been created with computed Nosos scores for all patients in the VA system for FY2006 and forward. There are two files per fiscal year with the same variable list: one file of risk scores calculated with pharmacy data (PHA) and one file of risk scores without pharmacy (NOPHA). A brief description of each variable is listed in Appendix B. The instructions for requesting access to the V21/Nosos data are listed below. For questions regarding V21/Nosos, please contact Elizabeth Gehlert (Elizabeth.Gehlert@va.gov).
Access depends on the intent of the project. For more information, see the VHA Data Portal page on Operations and Quality Improvement projects data access (Intranet-only: http://vaww.vhadataportal.med.va.gov/DataAccess/OperationsAccess.aspx). This information was last updated on February 3, 2016.
Researchers who wish to access the V21 or Nosos risk scores should apply for access using the VA Data Request Tracker (DART) process (Intranet-only: http://vaww.vhadataportal.med.va.gov/DataAccess/DARTRequestProcess.aspx). Select at least the following:
- DART Data Sources Page > Requested Datasets > Other Data: Health Economics Resource Center (HERC) V21 and Nosos Risk Scores Data
- DART Data Sources Page > Identifiers: Scrambled SSN
- Research Request Memo: In the memo body denote that you will use V21/Nosos risk score data
Operations users who wish to access the V21 or Nosos risk scores should apply for access following the Operations instructions on the VHA Data Portal (Intranet-only: http://vaww.vhadataportal.med.va.gov/DataAccess/HealthcareOperationsRequestProcess.aspx) by submitting an ePAS “VHA NDS Access Form for Health Operations” and selecting the following:
- Request tab > Data Sources: Corporate Data Warehouse
- Corporate Data Warehouse (CDW) tab: select 'CDW SAS Datasets' and 'Medical SAS Files'
Once you receive NDS approval, email Kevin Martin (Kevin.Martin2@va.gov) for access to the V21/Nosos data.
The first program in the process flow (ICD9ExtractNoForward) will load a macro that creates two datasets: the diagnosis file and the person file.
The diagnosis file has two fields per fiscal year (FY): SCRSSN and DIAG. SCRSSN is the patient’s scrambled social security number. DIAG is the ICD-9 diagnosis code. The diagnoses are obtained from the VA Patient Treatment File (PTF) and National Patient Care Database (NPCD). The inpatient ICD-9 codes are pulled from the main and bedsection files, while the outpatient ICD-9 codes are pulled from the VA outpatient (SE) files. We also include purchased care from the Fee Basis data. Appendix C provides the VINCI SAS libraries and corresponding Austin Information Technology Center (AITC) mainframe locations of the input data.
The program extracts diagnostic codes for all VA users in a fiscal year. We exclude any patient without valid ICD-9 diagnosis codes. The data are then converted into a long format so that each row represents a single diagnostic code per person; a person with more than one diagnosis in the year will have multiple rows of data. See Table 1 for a summary of the ICD-10 extract program.
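The reshaping step can be illustrated with a short Python sketch. This is not the SAS extract macro: the record layout, the field names dx1 to dx3 and the de-duplication of repeated codes are assumptions made for illustration. Each visit record carries several diagnosis slots; the output has one row per person and code, and a patient with no valid code does not appear at all.

```python
# Illustrative only: wide visit records -> long (SCRSSN, DIAG) rows.
# The field names dx1..dx3 are assumptions, not the MedSAS layout.
visits = [
    {"scrssn": 1001, "dx1": "4280", "dx2": "25000", "dx3": ""},
    {"scrssn": 1001, "dx1": "25000", "dx2": None, "dx3": None},
    {"scrssn": 1002, "dx1": "", "dx2": None, "dx3": None},  # no valid code
]

def to_long(records, slots=("dx1", "dx2", "dx3")):
    """Return sorted, de-duplicated (scrssn, diag) pairs."""
    pairs = set()
    for rec in records:
        for slot in slots:
            code = (rec.get(slot) or "").strip()
            if code:  # skip blank or missing diagnosis slots
                pairs.add((rec["scrssn"], code))
    return sorted(pairs)

diag_long = to_long(visits)
```

Patient 1001 ends up with two rows and patient 1002 is excluded, which mirrors the exclusion of patients without valid diagnosis codes.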
Note: As of FY16, V21 includes ICD-10 codes from the inpatient main and bedsection files, as well as the ICD-10 codes from the VA outpatient (SE) files and Fee Basis data.
The person file lists all patients who had any diagnosis during the indicated period (fiscal year), along with their gender and date of birth. We obtain date of birth and sex from the vital.mini.table, if available. Otherwise, we obtain them from the MAIN or ENCOUNTER files.
We have included two variations of this step. The macro icd9p0extr(fy,icddata,person) will pull all diagnoses for the fiscal year. The variation icd9p0extr_qtr(startdt,enddt,icddata,person) will pull diagnoses for a time period other than a complete fiscal year, such as the previous 12 months from a given date or a period of less than 12 months.
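The difference between the two variations comes down to the date window. As a minimal Python sketch (not the SAS macros), the window for a complete federal fiscal year runs from October 1 of the prior calendar year through September 30, while a custom window is simply a start and end date. The function names and the inclusive bounds are assumptions for illustration.

```python
from datetime import date

def fiscal_year_window(fy):
    """Start and end dates of a federal fiscal year given as a 4-digit year."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)

def in_window(service_date, startdt, enddt):
    """True when the service date falls inside the inclusive window."""
    return startdt <= service_date <= enddt

start08, end08 = fiscal_year_window(2008)

# A custom 12-month window: FY13 Q4 through FY14 Q3.
custom = (date(2013, 7, 1), date(2014, 6, 30))
```

A diagnosis dated September 15, 2013 falls inside the custom window, while one dated July 1, 2014 does not.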
Other variations are available from HERC, such as versions that search two years forward in the Fee Basis tables for services provided during the fiscal year of interest. For example, if you are creating risk scores for FY05, you will want to search for services provided in FY05 in tables from FY06-07 in addition to the FY05 table. Versions that only pull certain types of encounters (such as ‘face to face’) can also be created for research use.
The V21 scoring routine (v2112h1p_nosos_version) is a SAS program provided by CMS. We have made no modifications to the scoring algorithm and have only adjusted parameters for directories and formats.
The program supplies parameters to a main macro (%V2112H1M) that calls other external macros specific to V21 HCCs:
- %AGESEXV2: Create age/sex, originally disabled, disabled variables.
- %V20EDIT1: Perform edits to diagnosis.
- %V21H87M1: Assign one ICD-9 code to multiple condition categories.
- %V20H87L1: Assign labels to HCCs.
- %V20H87H1: Set HCC=0 according to hierarchies.
- %SCOREVAR: Calculate a score variable.
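The hierarchy step can be sketched in Python as follows. This is an illustration of the idea only, not the CMS macro: the hierarchy table below uses placeholder category numbers and is not the V21 hierarchy definition.

```python
# Illustrative hierarchy table: each key outranks the listed categories.
# The numbers are placeholders, not the CMS V21 hierarchy definitions.
HIERARCHY = {8: [9, 10], 9: [10]}

def apply_hierarchy(hcc_flags, hierarchy=HIERARCHY):
    """Zero out lower-ranked categories when a higher-ranked one is set."""
    result = dict(hcc_flags)
    for higher, lowers in hierarchy.items():
        if hcc_flags.get(higher):
            for lower in lowers:
                result[lower] = 0
    return result

flags = {8: 1, 9: 1, 10: 1, 22: 1}
cleaned = apply_hierarchy(flags)
```

Only the most severe category in a hierarchy keeps its flag, so a patient is not counted twice for related conditions; unrelated categories (22 in this example) are untouched.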
Prior to running this program, it is necessary to set the SAS library references to the location of the external macros and datasets on the user’s machine.
The arguments of the V2112H1M macro are:
- INP: SAS PERSON dataset created in step 1.
- IND: SAS diagnoses dataset created in step 1.
- OUTDATA: Name of the file to be created with HCC scores.
- IDVAR: Name of patient identifier. We expect this to be SCRSSN. If using other data sources it could be patientSID or real SSN.
- KEEPVAR: List of variables that should be retained in the output set. The default values should be used.
- DATE_ASOF: Date that will be used to compute age.
- FMNAME, AGEFMT, SEXFMT: Formats.
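To show what DATE_ASOF controls, here is a minimal Python sketch of age in completed years on a reference date. It is an assumption about the general approach; the CMS macro may handle edge cases differently.

```python
from datetime import date

def age_as_of(dob, date_asof):
    """Completed years of age on date_asof."""
    had_birthday = (date_asof.month, date_asof.day) >= (dob.month, dob.day)
    return date_asof.year - dob.year - (0 if had_birthday else 1)

age_before_birthday = age_as_of(date(1950, 3, 15), date(2008, 2, 1))
age_on_birthday = age_as_of(date(1950, 2, 1), date(2008, 2, 1))
```

Changing DATE_ASOF can move a patient across an age band, which is why the same reference date should be used for everyone in a scoring run.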
For each patient, three scores are computed: New Enrollee score, Community score and Institutional score. See Table 2 for a summary of the V2112h1p program.
Basic V21 scoring is now complete.
Note: The V21 program is complete after the V2112h1p program. The following programs are needed only if computing Nosos.
The program nosos_psych works similarly to the V2112 program in that it takes a list of patients and ICD-9 diagnosis codes and creates 47 indicators for mental health conditions. The categories are based on the Sloan et al. (2006) article “Development and Validation of a Psychiatric Case-Mix System.” We have updated the code to account for new ICD-9 codes created after publication of the original paper and added a 47th category for Pervasive Developmental Disorder (see Appendix D for mapping scheme). See Table 3 for a summary of the nosos_psych program.
In 2016 we updated the Psychiatric Case-Mix System (PsyCMS) for ICD-10 diagnosis codes. PsyCMS now includes 62 mental health and substance use categories. Details on the transition from ICD-9 to ICD-10 are documented at http://www.herc.research.va.gov/include/page.asp?id=technical-report-psycms-icd10.
Sloan KL, Montez-Rath ME, Spiro A 3rd, et al. Development and validation of a psychiatric case-mix system. Med Care. 2006;44(6):568-580.
The dssrx_cdw program creates indicators for 25 VA drug class categories from the VA Corporate Data Warehouse (CDW) Managerial Cost Accounting (MCA, formerly Decision Support System (DSS)) pharmacy table, [CDWWork].[dss].[PHA]. See Table 4 for a summary of the dssrx_cdw program. See Appendix E for the VA drug class mapping scheme. We have also created an Excel file that shows the crosswalk of National Drug Codes (NDC) to VA drug class categories.
The dssrx_cdw macro requires the parameter &fy for fiscal year to be set. If creating Nosos scores for periods other than a complete FY, this macro must be modified. See 'Computing Nosos scores when cost data are not available or for periods other than complete fiscal year' for more information. We have also created a variation, dssrx_cdw_dates, which can be used for periods other than a complete FY. This version will take two extra parameters, &startdt and &enddt, and pulls all drug indicators for patients between the two dates.
The encounter-level drug costs have been cleaned of exorbitant prescription costs. Encounter-level costs that exceed a set threshold were replaced by the mean cost of the clinic stop or treating specialty.
Demographic information is added using the risk_insure, risk_priority and risk_registry macros.
Insurance, race and marital status are obtained from the SAS SF file, using the values for the most recent visit day (VIZDAY) in the OUTP library of the VA Informatics and Computing Infrastructure (VINCI), in the risk_insure program. See Table 5 for a summary.
Missing values of insurance, race or marital status are replaced with their most common values (insured, white, married). We noticed that missing values occur in Fee Basis-only patients, and that this group had higher costs than the rest of the population. If any of these three variables is missing, the variable missing_demog is set to 1 (1=yes); otherwise it is set to 0 (0=no).
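The imputation rule can be sketched in Python as follows. The field names insured, white and married are hypothetical stand-ins for the SAS variables; only missing_demog comes from the text above.

```python
# Most common values used as fill-ins; the keys are hypothetical field names.
DEFAULTS = {"insured": 1, "white": 1, "married": 1}

def impute_demographics(row):
    """Fill missing demographics and flag the row with missing_demog."""
    out = dict(row)
    missing = [name for name in DEFAULTS if out.get(name) is None]
    for name in missing:
        out[name] = DEFAULTS[name]
    out["missing_demog"] = 1 if missing else 0
    return out

complete = impute_demographics({"insured": 0, "white": 1, "married": 0})
partial = impute_demographics({"insured": None, "white": 0, "married": None})
```

Keeping the missing_demog flag alongside the filled-in values lets a model distinguish imputed records, which matters here because the group with missing demographics had higher costs.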
VA priority (1-9) is obtained from the ADUSH Enrollment file, for which VINCI has the SAS libref ENROLL. We use the final dataset for each completed year, which has the name ENONEPER_SEP20%fy for 2014 and prior. See Table 6 for a summary of the risk_priority program.
The risk_registry program pulls an indicator of whether the patient was enrolled in one of 16 registries (see Table 7 for a summary of the risk_registry program and Table 8 for a list of VA registries). These tables were obtained from the Allocation Resource Center (ARC) and placed on the SAS grid server for this project. These data are obtained from ARC each year.
The program risk_dsscost pulls the cost data for patients (dsscost_&fy) from datasets stored in the VINCI project folder. These datasets were created by programmers at HERC and exist on the local VA Palo Alto Health Care System server. These files need to be obtained each year from HERC.
The three variables totdss, phar_cst_dss and fee_cost_total are necessary for the Nosos regression program.
This macro can be set to include concurrent cost (dsscost_&fy where the ICD-9 codes are from the same &fy) or prospective cost (dsscost_&fyp1 where &fyp1 is the following fiscal year).
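The choice between concurrent and prospective cost amounts to picking the dataset for the same or the following fiscal year. A minimal Python sketch of that naming logic, with zero-padded two-digit years, is shown below. The function names are hypothetical and the wrap-around at 100 is an assumption; the SAS programs derive the year suffix with macro variables.

```python
def two_digit_fy(fy, offset=0):
    """Zero-padded two-digit fiscal year, e.g. 8 -> '08' and 8 + 1 -> '09'."""
    return f"{(fy + offset) % 100:02d}"

def cost_dataset(fy, prospective=False):
    """Name of the cost dataset paired with diagnoses from fiscal year fy."""
    return "dsscost_" + two_digit_fy(fy, 1 if prospective else 0)

concurrent_name = cost_dataset(8)
prospective_name = cost_dataset(8, prospective=True)
```

For FY08 diagnoses, the concurrent model pairs with dsscost_08 and the prospective model with dsscost_09.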
The model may also be changed to run with either incurred or paid Fee Basis costs. Currently, the copies of the DSS cost data in the VINCI folder are based on paid cost (dsscost_fenpaid_fy&fy), but could be replaced with incurred cost (dsscost_fenincurred_fy&fy). See Table 9 for a summary of the risk_dsscost program.
The nosos_combine program takes the list of patients and HCC scores and merges in the demographics, priority, mental health indicators and drug indicators. In addition, the number of days spent in long-term care is used to determine which V21 score (community or institutional) is appropriate for each patient; that score is assigned to the variable score_cms. For patients with 90 or more days in long-term care, the institutional risk score is used; for patients with fewer than 90 days in long-term care, the community risk score is used.
The fields for determining length of time in nursing home are los_3 and feelos_ip_cltc from DSS Cost (step 6). If los_3 + feelos_ip_cltc > 90, the institutional score is used.
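The selection rule can be sketched in Python as follows. The function and argument names are hypothetical, and the sketch makes two assumptions: it follows the "90 or more days" wording (the formula as written uses a strict inequality, so the boundary case should be checked against the SAS code), and it treats missing length-of-stay values as zero days.

```python
def choose_score_cms(community, institutional, los_3, feelos_ip_cltc, threshold=90):
    """Pick the V21 score to use, based on total long-term care days."""
    ltc_days = (los_3 or 0) + (feelos_ip_cltc or 0)  # missing counts as 0 days
    return institutional if ltc_days >= threshold else community

long_stay = choose_score_cms(1.0, 2.0, los_3=60, feelos_ip_cltc=30)
short_stay = choose_score_cms(1.0, 2.0, los_3=60, feelos_ip_cltc=29)
no_cost_record = choose_score_cms(1.0, 2.0, los_3=None, feelos_ip_cltc=None)
```

A patient with no long-term care information falls back to the community score.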
The nosos_regression program will load the macro %nososreg that computes the Nosos score. Some basic cleaning of data (removing negative costs) is performed and the variable total_cost is created by summing total DSS, DSS pharmacy and Fee Basis costs.
The dependent variable in the regression is the square root of total cost. Predictors include score_cms, drug indicators, mental health indicators, priority, white race, no insurance and registry indicators. A simple ordinary least squares (OLS) regression is performed and a predicted square root of cost (xb_sqrtls) is obtained for each patient. The mean squared error (MSE) of the regression model is saved into the macro variable &mse.
The predicted total cost for each patient (with an additive smearing correction) is then obtained by squaring the predicted square root of cost (xb_sqrtls) and adding the MSE. Finally, the mean value of the predicted total cost for the FY is obtained. Each predicted total cost is divided by the overall mean to obtain the Nosos risk score.
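The sequence of regression, retransformation and rescaling can be illustrated with a small Python sketch. It is not the %nososreg macro: it uses a single predictor instead of the full predictor list, made-up data, and a residual mean square with n minus 2 degrees of freedom as the MSE. All names are hypothetical.

```python
import math

def fit_sqrt_ols(x, cost):
    """OLS of sqrt(cost) on one predictor; returns intercept, slope, MSE."""
    y = [math.sqrt(c) for c in cost]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # residual mean square with 2 estimated parameters
    return intercept, slope, mse

def rescaled_scores(x, intercept, slope, mse):
    """Square the prediction, add the MSE, then divide by the mean."""
    predicted_cost = [(intercept + slope * xi) ** 2 + mse for xi in x]
    mean_cost = sum(predicted_cost) / len(predicted_cost)
    return [p / mean_cost for p in predicted_cost]

score_cms = [0.5, 1.0, 1.5, 2.0, 3.0]                     # made-up predictor
total_cost = [900.0, 2500.0, 3600.0, 8100.0, 14400.0]     # made-up costs
b0, b1, mse = fit_sqrt_ols(score_cms, total_cost)
scores = rescaled_scores(score_cms, b0, b1, mse)
```

Dividing by the mean predicted cost is what guarantees that the scores average to one in the scored population.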
The program will also write the coefficients for each regression equation into a dataset, so that the regression model may be reused on another dataset if desired (see 'Computing Nosos scores when cost data are not available or for periods other than complete fiscal year'). See Table 11 for a summary of the nosos_regression program.
A variation on the Nosos regression without pharmacy data has been created, nosos_regression_nopharm.sas. This takes the same inputs as the Nosos regression (nosos_regression) created by nosos_combine, but it does not include pharmacy in the dependent variable and does not include drug class indicators in the independent variables of the regression model or in the final output.
Computing Nosos scores when cost data are not available or for periods other than complete fiscal year
The previous directions are for computing the Nosos score for complete fiscal years, when all ICD-9 codes and cost data are available for the entire year. The following modifications are necessary for periods other than a complete fiscal year or when cost data are not available:
- Modify the program Icd9ExtractNoForward so that the ICD-9 codes are pulled for the appropriate time period.
- V21 scoring is unchanged; only the names of the input datasets must be revised.
- Mental health scoring is unchanged.
- Use the dssrxdt macro to pull pharmacy records for the relevant time period.
- Insurance, priority and registry information can be pulled from the most recent dataset available.
- The field cms_score was determined in step 7 using length of stay in nursing home settings from step 6, which requires the DSSCOST dataset to determine whether to use the institutional or community score. If this is not available, we propose using the most recent available DSSCOST data to obtain the fields los_3 and feelos_ip_cltc for step 7. If a patient is not in the most recent dataset, assign the patient to the community group for determining cms_score.
- As cost data will not be available, no new regression coefficients can be computed. We propose storing the regression coefficients and MSE from the most recent complete FY data and scoring the available data with those coefficients.
The program scoring_risk.sas calls one macro, %scorerisk(inputset,regcoeffs,outputset), with the following parameters:
- Inputset is an existing SAS dataset containing all of the necessary fields (except cost) to complete a Nosos score.
- Regcoeffs is an existing SAS dataset, produced by a previous execution of nosos_regression (or nososreg_nopha) through its %params parameter, containing the regression parameters from the most recent available regression model.
- Outputset is the SAS dataset that will be created, containing the results of the scoring macro.
Since this scoring program predicts the square root of cost, the estimate is squared; the MSE from the original regression (stored with the coefficients in regcoeffs) is then added; and the resulting costs are divided by the group mean to obtain a standardized Nosos score with a mean of 1.
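Scoring with stored coefficients can be sketched in Python as follows. This is an illustration rather than the %scorerisk macro: the coefficient values, the predictor names other than score_cms, and the dictionary layout are all made up for the example.

```python
def score_with_stored_model(rows, coeffs, mse):
    """Apply stored coefficients, retransform, and rescale to mean 1.

    coeffs maps predictor name -> coefficient; 'Intercept' is the constant.
    """
    predicted = []
    for row in rows:
        xb = coeffs.get("Intercept", 0.0)
        xb += sum(b * row.get(name, 0.0)
                  for name, b in coeffs.items() if name != "Intercept")
        predicted.append(xb ** 2 + mse)  # square, then add the stored MSE
    group_mean = sum(predicted) / len(predicted)
    return [p / group_mean for p in predicted]

# Hypothetical coefficients saved from an earlier regression run.
coeffs = {"Intercept": 20.0, "score_cms": 30.0, "registry": 10.0}
rows = [
    {"score_cms": 0.5, "registry": 0},
    {"score_cms": 1.0, "registry": 1},
    {"score_cms": 2.0, "registry": 0},
]
nosos = score_with_stored_model(rows, coeffs, mse=400.0)
```

The coefficients and MSE come from the earlier model, but the mean used for rescaling comes from the group being scored, so the resulting scores still average to one.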
You can compute either concurrent or prospective scores.
Concurrent Nosos scores
Compute the concurrent Nosos scores using diagnoses (0 years forward) and preliminary DSS costs for FY13. Everything else remains as in previous years.
Prospective Nosos scores
Prospective Nosos scores require full-year FY14 costs, which are not yet available at the time of this publication. Instead, use all of the available data for FY13 (everything except DSS costs) and apply the regression coefficients from the most recent available complete prospective Nosos model.
- Run the %nosos_regression macro for FY12. The %parms parameter gives the name of the dataset where the regression coefficients will be saved. In the program this is currently set to nc_&fy._pros_params. Change this location to save to a permanent library.
- The %nosos_combine macro puts together the dataset that will be used for the regression model. Use the same output that is created by %nosos_combine for concurrent FY13 (nc_13_conc). It will have extra fields (like DSS cost) that are not needed.
- The %nosos_score macro will take the data from nc_13_conc and score it using the regression coefficients from nc_12_pros_params. The MSE from nc_12_pros_params will also be used in the smearing correction.
Full FY14 diagnoses and cost were not yet available when FY14 Nosos scores were computed. Computing the Nosos scores for FY14 used the last quarter of FY13 (FY13 Q4) and the first 3 quarters of FY14 (FY14 Q1, Q2 and Q3).
- The Icd9ExtractNoForward macro will be modified to pull data from the time period.
- The mental health macro will be unchanged, as it just uses the results from Icd9ExtractNoForward.
- Insurance, demographics and priority results can be pulled for FY14. You will need to check the Priority database to get the most recent information. These are updated monthly; however, we have been using the September final datasets for previous years.
- We need to check whether there are patients who were treated in FY13 quarter 4 (FY13 Q4) but not treated in FY14. If so, and if they should be included in our Nosos datasets for FY14, we may need to get their priority, insurance and demographics from FY13.
- There is a slight modification to dssrx so it can pull prescription data from the appropriate time period.
- Use the dsscost from FY13. We will not use the cost variable here, but we will get the number of days in a nursing home to decide which HCC score becomes the cms_score.
- Combine the above datasets using the %nosos_combine macro.
- Use the %nosos_score_macro with the most recent parameters: nc_13_conc_parms for concurrent; nc_12_pros_params for prospective.
We expect that the Nosos scoring algorithm will be modified and adapted for future needs. Below we list some areas that we have already identified for further research.
Currently our ‘registry’ indicator only indicates that a patient was in one of the 16 registries, giving equal weight to hepatitis C and ESRD, while ignoring the possibility that a patient may be in more than one registry. It may be desirable to use all 16 registry indicators or a subset of them, instead of one catch-all registry variable.
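The difference between the catch-all indicator and per-registry indicators can be shown with a short Python sketch. The function and field names are hypothetical, and only two registries are listed for illustration.

```python
def registry_features(memberships, registries):
    """One 0/1 flag per registry, plus the catch-all flag and a count."""
    flags = {f"reg_{name}": int(name in memberships) for name in registries}
    count = sum(flags.values())
    flags["registry_any"] = int(count > 0)
    flags["registry_count"] = count
    return flags

REGISTRIES = ["hepatitis_c", "esrd"]  # only 2 of the 16, for illustration
both = registry_features({"hepatitis_c", "esrd"}, REGISTRIES)
none = registry_features(set(), REGISTRIES)
```

With the catch-all flag alone, a patient in both registries looks identical to a patient in only one; the separate flags and the count preserve that distinction.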
Race is currently set as ‘White’ or ‘Not white.’ As collection of race/ethnicity data at the VA improves, it may be desirable to use a different reference category or include indicators for several races or ethnicities in the model.
Move to ICD-10
The transition to ICD-10 required all programs to be updated. In FY16 all programs were updated for ICD-10.
CDW replaces MedSAS
Diagnoses are pulled from VINCI copies of MedSAS data. We expect that MedSAS tables will not be created in future fiscal years and data will need to be pulled from CDW. We have not yet found a way to replicate our MedSAS comorbidity results from CDW diagnosis data.
Rank order concerns
The Nosos score uses a square root regression. It is possible that the square root regression will yield negative predictions that, when squared, become positive and change the rank order of the data. To date, we have examined this issue and have found no rank order issues. This issue should be monitored in future years.
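The concern is easy to reproduce with made-up numbers. In this Python sketch (an illustration, not part of the Nosos programs), one patient has a negative predicted square root of cost; after squaring, that patient moves from the lowest rank to the highest. Counting negative predictions is one simple way to monitor the issue.

```python
def squared_prediction(xb, mse):
    """Retransformed prediction: square the estimate and add the MSE."""
    return xb ** 2 + mse

xb = [-3.0, 1.0, 2.0]  # predicted sqrt(cost); the first value is negative
costs = [squared_prediction(v, 0.5) for v in xb]

# Patient indices ordered from lowest to highest, before and after squaring.
rank_before = sorted(range(len(xb)), key=lambda i: xb[i])
rank_after = sorted(range(len(costs)), key=lambda i: costs[i])

negative_count = sum(1 for v in xb if v < 0)
```

If negative_count is zero for a scoring run, squaring preserves the order of the predictions.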
%let fym1 =%sysfunc(putn(&fy-1,z2));
%let fyp1 =%sysfunc(putn(&fy+1,z2));
/* Example of pulling the data and scoring NOSOS for complete FY08 */
/* Pull ICD9 DATA */ ;
/* V21 macro - */;
KEEPVAR =scrssn &INPUTVARS &SCOREVARS &DEMVARS
; &HCCV21_list87 &CCV21_list87,
/* V21 is complete. Everything below is for NOSOS only ****/;
/* Get mental health CC s */;
/* Get insurance data */;
/* Get priority data */;
/* Combine insurance and priority data */;
%demogs(insure=insure_08 ,priority=priority_08 ,outset=hercdemogfy08);
/*Get registry data
This libname is where Kevin Martin stored VERA data. This may change as the VINCI team upgrades or modifies the SAS Grid on VINCI */
libname vera08 "/smb/vhacdwsasrds01/VERA/2008/";
/* Get concurrent and prospective DSS cost
libname dsscost is pointing to the project folder where copies of dsscost data are stored */
libname dsscost '/smb/vhacdwfpcfs02/Projects/OPES_CMSHCCv21/ ' ;
/* pull FY 09 data for prospective model */
/*pull FY08 data for concurrent model */
%dssc(inset=dsscost.dsscost_08 ,outset=dsscost_08 );
/* Get pharmacy CC s */
/* combine all sets for prospective Nosos...note that cost data for FY09 is used */
/* combine all sets for concurrent Nosos...note that cost data for FY08 is used */
/*regression for concurrent NOSOS */
%nososreg(inset=nc_08_conc,regin=temp, regout=nosos_08_conc ,regparms=nc_08_conc_parms,nosscore=nosos_c );
/*regression for prospective NOSOS */
%nososreg(inset=nc_08_prosp,regin=temp, regout=nosos_08_prosp ,regparms=nc_08_prosp_parms,nosscore=nosos_p );
/* join the two datasets */
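proc sort data=nosos_08_conc; by scrssn; run;
proc sort data=nosos_08_prosp; by scrssn; run;
data nosos_08; /* assumed: the DATA statement and BY variable are missing from the source */
merge nosos_08_conc nosos_08_prosp (keep=scrssn nosos_p ); by scrssn; run;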
Nosos was jointly developed by the Health Economics Resource Center (HERC) and the Office of Productivity, Staffing and Efficiency (OPES). Funding for HERC’s efforts was provided by Operational Analytics and Reporting (OAR). This would not have been possible without major contributions from the following individuals:
- OAR: Peter Almenoff
- OPES: Eileen Moran, Theodore Stefos, Mei-Ling Shen and Jim Campbell
- Ci2i: Steven Asch
- HERC: Todd Wagner, Anjali Updahyay, John Cashy, Elizabeth Gehlert (Cowgill) and Winifred Scott
Last Updated: March 21, 2022