2017 Econometrics Course
Winter/Spring 2019 schedule to be announced. Archived 2017 course materials listed below.
- February 1, 2017
Econometrics Course: Introduction and Identification
Todd Wagner, Ph.D. | Slides | Video
The objective of this class is to introduce participants to the course. We start by briefly describing a randomized trial and the leverage of experimental design to understand causation. We then transition into understanding causal pathways when experimentation isn't possible. We introduce the concept of endogeneity and walk participants through the elements of an equation, as these equations often come up in other classes. Finally, we discuss the five main assumptions underlying the classic linear model, setting the stage for future classes.
This lecture will provide a conceptual framework for research design. We will review the linear regression model and define the concepts of exogeneity and endogeneity. We will then discuss three forms of endogeneity: omitted variable bias, sample selection, and simultaneous causality. The discussion will include examples and a brief overview of possible solutions.
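Omitted variable bias can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the data, the coefficients `beta_x` and `beta_z`, and the helper `slope` are all hypothetical and do not come from the lecture. It shows that when a confounder `z` is left out of a regression of `y` on `x`, the estimated slope equals the true coefficient plus the confounder's coefficient times the slope of `z` on `x`.

```python
def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: z is an unobserved confounder that is correlated with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
z = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]

# Assumed true model (no noise, to keep the arithmetic exact): y = 1 + 2x + 3z.
beta_x, beta_z = 2.0, 3.0
y = [1.0 + beta_x * a + beta_z * b for a, b in zip(x, z)]

short_slope = slope(x, y)            # regression that omits z
delta = slope(x, z)                  # how strongly z moves with x
predicted_slope = beta_x + beta_z * delta

print(f"slope with z omitted:       {short_slope:.3f}")
print(f"beta_x + beta_z * delta:    {predicted_slope:.3f}")
print(f"true effect of x (assumed): {beta_x:.3f}")
```

Because `z` rises with `x` in this made-up data, the short regression attributes part of the effect of `z` to `x` and overstates the slope.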
Understanding causation with observational data is often more dependent on what we don't observe than what we do observe. Multivariate techniques can be very useful for understanding observed characteristics. Propensity scores have emerged over the past 20 years as another way to control for observables. We describe the concepts behind propensity scores and how they have been used (and misused) in practice. Finally, we work through an example using propensity scores.
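As a rough illustration of how a propensity score controls for an observed characteristic, the sketch below uses inverse probability weighting with a single binary confounder. Everything here is an assumption made for illustration: the records are invented, the propensity score is simply the share treated within each stratum (applied work would usually estimate it with a logit or probit model), and weighting is only one of several ways to use the score. It is not the example worked in the lecture.

```python
# Hypothetical records: (confounder c, treatment t, outcome y).
# Outcomes follow y = 1*t + 4*c, so the assumed true treatment effect is 1.
records = [
    (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0), (1, 0, 4.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Naive comparison ignores that treated patients mostly have c = 1.
naive_effect = (mean(y for c, t, y in records if t == 1)
                - mean(y for c, t, y in records if t == 0))

# Propensity score: probability of treatment given the observed confounder.
propensity = {
    level: mean(t for c, t, y in records if c == level)
    for level in {c for c, t, y in records}
}

# Inverse probability weighting: weight each unit by 1 / P(its own treatment status).
n = len(records)
weighted_treated = sum(t * y / propensity[c] for c, t, y in records) / n
weighted_control = sum((1 - t) * y / (1 - propensity[c]) for c, t, y in records) / n
ipw_effect = weighted_treated - weighted_control

print(f"naive difference in means: {naive_effect:.2f}")
print(f"IPW estimate:              {ipw_effect:.2f}")
```

The weighting only balances what is observed. If an unobserved factor also drives both treatment and outcome, the estimate remains biased.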
- March 1, 2017
Natural Experiments and Difference-in-Differences
Christine Pal-Chee, Ph.D. | Slides | Video
Natural experiments have been increasingly utilized by researchers in recent years. In this lecture, we will define what a natural experiment is and describe different types of natural experiments. We will also provide an overview of the difference-in-differences estimator and discuss how it can be used to evaluate treatment effects in natural experiments. Finally, we discuss potential threats to validity when evaluating natural experiments.
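In its simplest two-group, two-period form, the difference-in-differences estimator is the change in the treated group minus the change in the comparison group. The sketch below uses invented outcome values and a hypothetical helper named `diff_in_diff`. It assumes the two groups would have followed parallel trends without the intervention, which is the key condition behind the estimator.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes before and after a policy change.
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"difference-in-differences estimate: {estimate:.1f}")
```

The comparison group's change stands in for what would have happened to the treated group anyway, so any shock that hits only one group at the time of the intervention threatens validity.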
This lecture will provide an introduction to instrumental variables (IV) regression. We will discuss necessary conditions for valid instruments, the intuition for how and why IV regression works, examples, and limitations.
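The intuition for IV can be sketched with a binary instrument and the Wald estimator: the instrument's effect on the outcome divided by its effect on the treatment. The data below are invented, with an assumed true effect of `x` on `y` equal to 2 and an unobserved factor `u` that affects both. The instrument `z` is constructed to be relevant (it shifts `x`) and to be unrelated to `u`, which are the conditions a valid instrument needs.

```python
from statistics import mean

# Hypothetical units: (instrument z, unobserved confounder u).
units = [(0, -1.0), (0, 0.0), (0, 1.0), (1, -1.0), (1, 0.0), (1, 1.0)]

z = [zi for zi, ui in units]
x = [zi + ui for zi, ui in units]                    # treatment depends on z and u
y = [2.0 * xi + 3.0 * ui for xi, (zi, ui) in zip(x, units)]  # assumed true effect of x is 2

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def group_mean(values, z, level):
    return mean(v for v, zi in zip(values, z) if zi == level)

# Wald estimator: reduced-form difference divided by first-stage difference.
reduced_form = group_mean(y, z, 1) - group_mean(y, z, 0)
first_stage = group_mean(x, z, 1) - group_mean(x, z, 0)
iv_estimate = reduced_form / first_stage

ols_estimate = ols_slope(x, y)
print(f"OLS estimate (confounded by u): {ols_estimate:.2f}")
print(f"IV (Wald) estimate:             {iv_estimate:.2f}")
```

OLS picks up the influence of `u` and overstates the effect, while the IV estimate uses only the variation in `x` that comes from `z`. A weak first stage (a small denominator) would make the IV estimate unstable, which is one of its main limitations.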
This is an overview of mixed effects models. The models go by many names: multi-level, random effects, mixed, random coefficient, hierarchical, and repeated measures. We will begin by describing how mixed effects models are related to other statistical models. Real-world applications will be used as examples to demonstrate model fitting, estimation, and interpretation of estimates. Finally, we will address how statisticians think about mixed effects models and how this can differ from an economist's perspective.
Standard introductions to the ordinary least squares (OLS) model pay limited attention to the right-hand-side variables. Several strong assumptions are made about the independent variables, including linearity and independence, that don't always hold in health applications. This lecture will address some of the common problems with right-hand-side variables and introduce methods to test for and correct these problems. Issues to be addressed include non-linearity and functional form, multicollinearity, clustering, and robust standard errors.
The ordinary least squares (OLS) model is based on a continuous dependent variable. This lecture will introduce some of the methods available to treat other forms of dependent variables. Topics will include dichotomous (yes/no) outcomes, count data models, and choice models.
Statistical analysis of health care cost is made difficult by two data problems. Some patients incur disproportionate costs, a statistical property called skewness. Other patients incur no cost at all; the distribution is truncated. As a result of these problems, it is rarely a good idea to analyze cost using the classic linear statistical model, ordinary least squares (OLS). Transforming cost by taking its log results in a variable that is more normally distributed, allowing use of an OLS regression. The parameters from this regression have a natural interpretation as the proportionate effect of a unit change in the independent variable on cost. Care must be used when predicting costs from a model based on the log of costs, because simply exponentiating the predicted log cost understates mean cost. Log models have other limitations. The most important of these is that they should not be used when there are many zero cost observations in the data.
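The caution about predicting costs from a log model can be illustrated with a small calculation. The sketch below uses invented cost values and an intercept-only log model, and it applies a smearing correction (the mean of the exponentiated residuals) as one common way to retransform. Whether the lecture uses this particular correction is an assumption. All costs are positive because the log of zero is undefined.

```python
import math

# Hypothetical, right-skewed costs (all positive).
costs = [100.0, 200.0, 400.0, 800.0, 6400.0]

# Intercept-only "regression" on the log scale: the fitted value is the mean log cost.
log_costs = [math.log(c) for c in costs]
mean_log = sum(log_costs) / len(log_costs)
residuals = [v - mean_log for v in log_costs]

# Naive retransformation: exponentiate the fitted log value.
naive_prediction = math.exp(mean_log)

# Smearing correction: scale by the average exponentiated residual.
smearing_factor = sum(math.exp(r) for r in residuals) / len(residuals)
smeared_prediction = naive_prediction * smearing_factor

arithmetic_mean = sum(costs) / len(costs)
print(f"naive exp(mean log cost): {naive_prediction:.1f}")
print(f"smeared prediction:       {smeared_prediction:.1f}")
print(f"arithmetic mean cost:     {arithmetic_mean:.1f}")
```

The naive prediction is the geometric mean, which falls well below the arithmetic mean when costs are skewed. In this intercept-only case the smeared prediction reproduces the arithmetic mean.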
Health care cost can be difficult to analyze. In addition to skewness and truncation, the variance in cost data may be correlated with one of the predictor (independent) variables, a problem called heteroscedasticity. As a result of these problems, ordinary least squares (OLS) regression models may generate biased regression parameters and inaccurate predictions. Generalized linear models (GLM) are an important alternative. A GLM includes a link function and a variance structure; specific tests are used to identify the appropriate choice of each. Another alternative is a two-part model, which can be used to analyze data with many observations in which no cost was incurred. Non-parametric tests can be used to compare the cost incurred by two or more groups. Although they have the advantage of not requiring any assumptions about the statistical properties of the cost variable, they can be too conservative, and they do not allow the analyst to control for the effect of other factors.
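The logic of a two-part model can be sketched without covariates: expected cost is the probability of incurring any cost multiplied by the mean cost among those who incur it. The values below are invented. In applied work the first part is typically a logit or probit model and the second a GLM or log-scale regression fitted to the positive-cost observations, so this shows only the decomposition, not a full model.

```python
# Hypothetical costs, half of which are zero.
costs = [0.0, 0.0, 0.0, 150.0, 250.0, 400.0, 1200.0, 0.0]

# Part 1: probability that any cost is incurred.
positive_costs = [c for c in costs if c > 0]
prob_any_cost = len(positive_costs) / len(costs)

# Part 2: mean cost conditional on incurring cost.
mean_if_positive = sum(positive_costs) / len(positive_costs)

# Combine the parts to get unconditional expected cost.
expected_cost = prob_any_cost * mean_if_positive

print(f"P(cost > 0):          {prob_any_cost:.2f}")
print(f"E[cost | cost > 0]:   {mean_if_positive:.1f}")
print(f"expected cost:        {expected_cost:.1f}")
```

With no covariates the product equals the plain sample mean. The value of splitting the problem appears once each part is modeled with predictors, since a factor can affect whether cost is incurred differently from how it affects the amount.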