Misclassification of a Dependent Variable in a Discrete Response Setting

By Jerry Hausman and Fiona Morton
1994 | Working Paper No. 1388

When the dependent variable in a discrete response model is misclassified, the estimated coefficients of a probit or logit model are inconsistent. By 'misclassification' we mean that the response is reported or recorded in the wrong category; for example, a variable is recorded as a one when it should have the value zero. This mistake can easily happen in an interview setting where the respondent misunderstands the question or the interviewer simply checks the wrong box. Other data sources where the researcher suspects measurement error, such as historical data, certainly exist as well. We show that when a dependent variable is misclassified in a probit or logit setting, the resulting coefficients are biased and inconsistent. However, the researcher can correct the problem by employing the likelihood function we derive below, and can explicitly estimate both the extent of misclassification and the unknown slope coefficients. We also consider a semiparametric estimator, which does not depend on an assumed error distribution and is also robust to misclassification of the dependent variable. Each of these departures from the usual qualitative response model specifications creates inconsistent estimates.

We apply our methodology to a commonly used data set, the Current Population Survey, where we consider the probability of individuals changing jobs. Responses to this type of question are well known to be prone to misclassification. Both our parametric and semiparametric estimates demonstrate conclusively that significant misclassification exists in the sample. Furthermore, the probability of misclassification is not the same across observed response classes: it is much higher for individuals reported to have changed jobs than for individuals reported not to have changed jobs.
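To make the idea of a misclassification-corrected likelihood concrete, the sketch below uses one common way of writing it for a probit with constant misclassification rates: Pr(y = 1 | x) = α0 + (1 − α0 − α1) Φ(x′β), where α0 is the probability that a true zero is recorded as a one and α1 the probability that a true one is recorded as a zero. This is an assumed formulation for illustration (probit link, rates constant across observations, α0 + α1 < 1), not a reproduction of the paper's derivation, and every function name here is hypothetical. Only the standard library is used, and no optimizer is included.

```python
import math
import random
from typing import List, Sequence


def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_observed_one(index: float, alpha0: float, alpha1: float) -> float:
    """Pr(recorded y = 1 | x) when the true response is probit with index x'beta.

    Assumption: alpha0 = Pr(recorded 1 | true 0) and
    alpha1 = Pr(recorded 0 | true 1) are constants with alpha0 + alpha1 < 1.
    With alpha0 = alpha1 = 0 this reduces to the ordinary probit Phi(x'beta).
    """
    return alpha0 + (1.0 - alpha0 - alpha1) * norm_cdf(index)


def log_likelihood(
    beta: Sequence[float],
    alpha0: float,
    alpha1: float,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
) -> float:
    """Log-likelihood of the recorded 0/1 responses y given regressors X."""
    total = 0.0
    for xi, yi in zip(X, y):
        index = sum(b * v for b, v in zip(beta, xi))
        p = prob_observed_one(index, alpha0, alpha1)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def misclassify(
    y_true: Sequence[int], alpha0: float, alpha1: float, rng: random.Random
) -> List[int]:
    """Simulate recording errors: flip 0 -> 1 w.p. alpha0 and 1 -> 0 w.p. alpha1."""
    recorded = []
    for yi in y_true:
        u = rng.random()
        if yi == 0:
            recorded.append(1 if u < alpha0 else 0)
        else:
            recorded.append(0 if u < alpha1 else 1)
    return recorded


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0]]  # hypothetical rows: intercept plus one regressor
    y = [1, 0]
    print(log_likelihood([0.0, 0.0], 0.1, 0.2, X, y))
```

Maximizing `log_likelihood` over (β, α0, α1) with a numerical optimizer would yield estimates of both the slopes and the misclassification rates; fixing α0 = α1 = 0 gives the ordinary probit log-likelihood, which is the misspecified model when recording errors are present. The `misclassify` helper can be used to generate such errors in simulated data.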