HERC: Ask an Economist: February 2018

Ask an Economist: February 2018

Q: How do I evaluate the uncertainty of an incremental cost-effectiveness ratio?

A: Here, we will define an incremental cost-effectiveness ratio (ICER), discuss how to evaluate statistical uncertainty, and present a method for finding the variation in the ICER using SAS.

Definition of the incremental cost-effectiveness ratio

The incremental cost-effectiveness ratio is a way of determining whether an intervention yields sufficient value to justify its cost. We compare the treated group to the control group and find the difference in cost and the difference in effectiveness. The ratio of the difference in cost to the difference in effectiveness is the incremental cost-effectiveness ratio (ICER).
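Written as a formula, the definition reads as follows. The symbols are notation introduced here for illustration, not taken from the original text: C and E denote mean cost and mean effectiveness, and the subscripts T and C denote the treated and control groups.

```latex
\mathrm{ICER} = \frac{\bar{C}_{T} - \bar{C}_{C}}{\bar{E}_{T} - \bar{E}_{C}} = \frac{\Delta C}{\Delta E}
```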


Statistical uncertainty of the ICER

Although the ICER appears to be a continuous variable whose uncertainty can be represented by a 95% confidence interval, this is not correct. The same value of the ICER can arise from two different points in the plot of cost vs. effectiveness. For example, an ICER of $100,000 per Quality-Adjusted Life Year (QALY) results if the intervention costs $100,000 and yields 1 QALY, and also if the intervention saves $100,000 at a loss of 1 QALY. To learn more about the QALY, visit the Cost Effectiveness Analysis page on the HERC website.

The statistical uncertainty of an ICER must instead be represented by a confidence region, an ellipse plotted in two-dimensional space with cost on the Y axis and effectiveness on the X axis. Each estimate of incremental cost and incremental effectiveness is a point in that space.

We can find the variation in the ICER by randomly sampling the source dataset many times. Each sample yields one point that can be plotted in the two-dimensional space, and we evaluate the distribution of these points over the region. In clinical trials, we can use bootstrap sampling to find these points. (Methods for doing this using SAS are described below.) For medical decision models, probabilistic sensitivity analysis generates these points.

Evaluating points in the confidence region

Consider the following plot of the incremental effect of the intervention on cost (Y axis) and its incremental effect on effectiveness (on the X axis).


At point A, and all points in the lower right quadrant, the intervention dominates the control because it is more effective and less costly.

At point B, and all points in the upper left quadrant, the intervention is dominated by the control because it is less effective and more costly.

For the upper right and lower left quadrants, we can determine if a point is cost-effective only if we have a willingness to pay threshold. In the U.S., thresholds of $50,000/QALY or $100,000/QALY are often used.

At points C and D, the intervention is more costly and more effective, but only point C is cost-effective. This is because the cost per unit increase in effectiveness is less than the willingness to pay threshold. Point D is not cost-effective, because it is too costly per unit gain in effectiveness.

At points E and F, the intervention is less costly and less effective. Only point E is cost-effective because the reduction in costs per unit reduction in effectiveness is sufficiently high. In other words, the resources saved by the study intervention are more than the societally accepted level (the willingness to pay) per unit decrease in effectiveness.
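The quadrant rules above can be sketched as a small SAS DATA step. The six points and their values are invented for illustration and are not read from the plot; the variable names icost, verdict, and the dataset name points are likewise assumptions, and the threshold of $50,000/QALY is just one of the commonly used values mentioned above.

```sas
/* Illustrative only: the coordinates below are made-up values chosen so that */
/* each point falls where the text places it, not values taken from the plot. */
data points;
  input label $ ieffect icost;
  length verdict $ 40;
  wtpt = 50000;                           /* willingness to pay threshold, $/QALY */
  if ieffect ne 0 then icer = icost / ieffect;

  if ieffect > 0 and icost < 0 then verdict = 'dominates control';
  else if ieffect < 0 and icost > 0 then verdict = 'dominated by control';
  else if ieffect > 0 and icost > 0 then do;
    /* more costly, more effective: cost per unit gained must be below WTPT */
    if icer < wtpt then verdict = 'cost-effective';
    else verdict = 'not cost-effective';
  end;
  else if ieffect < 0 and icost < 0 then do;
    /* less costly, less effective: savings per unit lost must exceed WTPT */
    if icer > wtpt then verdict = 'cost-effective';
    else verdict = 'not cost-effective';
  end;
  else verdict = 'on an axis';
  datalines;
A  0.5 -20000
B -0.5  20000
C  1.0  30000
D  0.2  40000
E -0.2 -40000
F -1.0 -30000
;
run;

proc print data=points noobs;
  var label ieffect icost icer verdict;
run;
```

With these made-up values, points C and F both have an ICER of $30,000 and points D and E both have an ICER of $200,000, yet only C and E are classified as cost-effective. This is the ambiguity that makes a simple confidence interval on the ICER misleading.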

Bootstrap sampling of the ICER with SAS

Bootstrap sampling is a method used in clinical trials to find the variation in the ICER. PROC SURVEYSELECT is a SAS procedure that can be used to select bootstrap samples.

The syntax is:

proc surveyselect data=mycohort out=bootsample method=urs samprate=1 outhits reps=1000 noprint;

* Data= specifies the source dataset being sampled. The source data has person-level observations with cost, outcomes, and indicator of treatment group.

* Out= defines the new dataset being created.

* Method= URS specifies an unrestricted sample, i.e. sampling with replacement.

* Samprate= 1 sets the sampling rate to 100%: units are drawn with equal probability, and each replicate has the same number of observations as the source dataset.

* Outhits writes a separate observation for each selection when a unit is selected more than once, which is what sampling with replacement requires.

* Reps= specifies the number of replicates.

* The different replicates are distinguished by the value of the new variable REPLICATE.
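Putting these options together, a complete call might look like the sketch below. The SEED= option is not part of the original syntax line; it is added here as an assumption, with an arbitrary value, so that the same replicates can be drawn again on a later run.

```sas
/* Sketch: the same call as above, laid out one option per line.            */
/* SEED= and its value are assumptions added for reproducibility.           */
proc surveyselect data=mycohort out=bootsample
    method=urs       /* unrestricted random sampling, i.e. with replacement */
    samprate=1       /* each replicate is as large as the source dataset    */
    outhits          /* one output observation per selection                */
    reps=1000        /* number of bootstrap replicates                      */
    seed=20180201    /* arbitrary positive integer; fixes the random stream */
    noprint;
run;
```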

Evaluating the bootstrap samples

The first step in evaluating the bootstrap samples is to find the mean value of cost and outcomes for each treatment group for each replicate. This can be done by creating separate variables for cost and outcomes for each treatment group, and assigning these variables a value of missing for observations that are not in that treatment group. Then it becomes a simple matter of finding the mean value of cost and outcomes for each treatment group in each replicate:

proc means data=bootsample noprint;

by replicate;

output out=bootresult mean=;

run;
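The step that creates group-specific variables, which has to run before PROC MEANS, could be sketched as follows. The variable names cost, qaly, and treat (1 = intervention, 0 = control), and the derived names with the _trt and _ctl suffixes, are assumptions for illustration; the source dataset may use different names.

```sas
/* Sketch only: cost, qaly, and treat are assumed variable names.          */
data bootsample;
  set bootsample;
  /* Each new variable is assigned only within its own treatment group     */
  /* and is left missing otherwise, so PROC MEANS averages it over that    */
  /* group alone.                                                          */
  if treat = 1 then do;
    cost_trt = cost;
    qaly_trt = qaly;
  end;
  else if treat = 0 then do;
    cost_ctl = cost;
    qaly_ctl = qaly;
  end;
run;
```

Because PROC MEANS ignores missing values when computing a mean, the mean of cost_trt within a replicate is the mean cost of the intervention group in that replicate, and similarly for the other three variables.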


Now we have a dataset that has 1,000 replicates, with mean costs and mean outcomes for each treatment group. We then find the incremental cost and the incremental outcomes by finding the difference between treatment groups, for each replicate.
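A minimal sketch of this differencing step is shown below. It assumes the group means in BOOTRESULT are stored in variables named cost_trt, cost_ctl, qaly_trt, and qaly_ctl; these names, and ICOST, are hypothetical, while IEFFECT and ICER follow the names used later in this article.

```sas
/* Sketch only: cost_trt, cost_ctl, qaly_trt, qaly_ctl are assumed names   */
/* for the per-replicate group means produced by PROC MEANS.               */
data bootresult;
  set bootresult;
  icost   = cost_trt - cost_ctl;     /* incremental cost                   */
  ieffect = qaly_trt - qaly_ctl;     /* incremental effectiveness          */
  /* The ratio is undefined when the incremental effect is exactly zero,   */
  /* so ICER is left missing in that case.                                 */
  if ieffect ne 0 then icer = icost / ieffect;
run;
```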

We want to determine the percentage of replicates that are cost-effective at a given willingness to pay threshold (WTPT).

We can create an indicator variable that takes a value of 1 if this replicate was cost-effective at that value of WTPT.
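The statement itself is not reproduced on this page. A sketch that is consistent with the logic described in this section is given below; it should be read as a reconstruction under assumptions, not as the authors' exact code. It assumes BOOTRESULT already contains IEFFECT and ICER for each replicate, and the threshold value of 100,000 is only an example.

```sas
/* Reconstruction under assumptions: bootresult is assumed to hold one     */
/* observation per replicate with IEFFECT and ICER already computed.       */
data bootresult;
  set bootresult;
  wtpt = 100000;                     /* example threshold, $/QALY          */
  if (ieffect > 0  and icer < wtpt) or
     (ieffect le 0 and icer > wtpt) then cost_effective = 1;
  else cost_effective = 0;
run;

/* The mean of a 0/1 indicator is the proportion of replicates that are    */
/* cost-effective at this threshold.                                       */
proc means data=bootresult mean;
  var cost_effective;
run;
```

One caveat of this form: when IEFFECT is exactly zero the ICER is undefined (missing in SAS), and a missing value never compares as greater than WTPT, so such a replicate is counted as not cost-effective even if it saves money.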


Here is the logic behind this statement:

If the intervention is effective (IEFFECT >0) then it is cost-effective if ICER < WTPT. This means that as long as the cost per unit increase in effectiveness is below WTPT, it is cost-effective. This can include replicates where the intervention is cost saving, and replicates where the intervention results in higher cost.

If the intervention is not effective (IEFFECT LE 0) then it is cost-effective only if the cost saving is sufficiently great. This means that ICER > WTPT (the reduction in cost is very high relative to the reduction in outcomes).

This is the same thing as saying that the control did not yield enough improvement in effectiveness to justify its extra cost, that is, the intervention is cost-effective and the control is not.

It is also possible to evaluate the replicates as follows:
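The alternative statement is not reproduced on this page, so what it contained is unknown. One formulation that gives the same classification, offered here as an assumption rather than as the authors' code, is net monetary benefit: a replicate is cost-effective when WTPT times the incremental effect exceeds the incremental cost. The names ICOST and NMB are hypothetical.

```sas
/* Assumed alternative (net monetary benefit); not the original statement. */
/* bootresult is assumed to hold ICOST and IEFFECT for each replicate.     */
data bootresult;
  set bootresult;
  wtpt = 100000;                     /* example threshold, $/QALY          */
  nmb  = wtpt * ieffect - icost;     /* net monetary benefit               */
  cost_effective = (nmb > 0);        /* 1 if true, 0 otherwise             */
run;
```

This form avoids dividing by the incremental effect, so it needs no separate rule for positive and negative effects and remains defined when the incremental effect is zero.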


Note that the indicator variable for COST_EFFECTIVE depends on WTPT.

Bootstrapping very large datasets

The SAS code below was developed to sample a dataset of 1 million observations 1,000 times.

Since it would be difficult to create a dataset of 1 billion observations, HERC researchers created 10 replicates at a time, found the means, saved them, and then sampled another 10 replicates, copying over the first 10 bootstrap samples.

HERC researchers invoked PROC SURVEYSELECT inside a SAS macro, and sampled 10 replicates at a time.

The team then found the mean values for those 10 replicates, saved the result, and looped back to sample again, replacing the 10 bootstrap samples with 10 new ones.

The team repeated this process 100 times, creating 1,000 replicates. Using this method, the largest dataset had only 10 replicates of N observations (10 x N), not 1,000 x N observations.

The final dataset had 1,000 observations with the mean value from each bootstrap replicate.




/* The macro wrapper was not shown in the original; the macro name is illustrative. */
%macro bootstrap_loop;

/* Bootstrap sample */
/* Create bootstrap samples from the dataset of predicted risks. */
/* Method=URS is an unrestricted sample, i.e. sampling with replacement. */
/* Samprate=1 means each sample is the same size as the input dataset. */
/* Outhits creates one observation for each selection when a unit is selected more than once. */
/* Reps= specifies the number of replicates. */
/* This is nested in a macro do loop so that each batch of bootstrap replicates replaces the last and we do not create an untenably large file. */
%do bootstrap_no=1 %to &boot_tot;

proc surveyselect data=wide_risk out=bootsample method=urs samprate=1 outhits reps=10 noprint;
run;

/* Characterize distribution of means from bootstrap samples */
/* Find means from these bootstrap samples; find difference in mean risk. */
/* The output from this step yields a dataset with one observation for each bootstrap sample times the number of imputations. */
proc means data=bootsample noprint;
by replicate;
output out=bootresult mean=;
run;

data bootresult;
set bootresult;
length _Imputation_ 8;
drop _freq_ _type_;
run;

/* Append the means from this batch to the accumulated results */
proc append base=bootresult_all data=bootresult;
run;

%end;
%mend bootstrap_loop;

/* 100 batches of 10 replicates give the 1,000 replicates described above */
%let boot_tot=100;
%bootstrap_loop;