Frequently, analysts or evaluators gather information on workers, children, programs, participants and so on, and later realize that information on some important variables is missing for several respondents in the sample. For example, in a survey we have administered to artists in communities around the country, some artists choose not to report their interest in specific services or their income level. Missing data is an issue in nearly every study, and the evaluator has to decide which methods are most appropriate for dealing with this complex issue.
In this context, it is essential to first understand the nature of the data in order to identify potential problems such as attrition, skip patterns or random data collection issues. Once the overall data set is understood, the next step is to examine the missing-data patterns to see whether certain groups or certain responses are more likely to have missing values. These checks help the evaluator judge whether the data is missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). If the data is MCAR, we can simply delete the incomplete observations: the estimates will not be biased or inconsistent, although some precision is lost because the sample is smaller. However, missing data is more problematic when the missingness is not random, that is, when it depends on other characteristics of the respondents (MAR) or on the missing values themselves (MNAR). In these situations, obtaining unbiased estimates requires a procedure that accounts for the missing data. Thus, the consequences of missing observations depend on the assumptions we make about the mechanism behind the missing information. The following table provides a brief description of some methods for dealing with missing values, along with the advantages and disadvantages of each approach.
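To make the difference between these mechanisms concrete, here is a small simulation. It is an illustrative sketch only: the regions, income distribution and skip rates are invented assumptions, not results from our artist survey. It compares listwise deletion under MCAR and under MAR, then applies one simple adjustment (averaging observed incomes within each region and reweighting by each region's share of the full sample, which works because region is known for everyone) as an example of a procedure that accounts for the missing data.

```python
import random
import statistics

# Hypothetical setup (an assumption for illustration): each artist belongs to an
# observed region, and income (in thousands) depends on region. Region is always
# reported; only income can be missing.
SKIP_RATE_MAR = {"urban": 0.10, "rural": 0.60}


def simulate(n=20_000, seed=1):
    """Return the full-sample mean and three estimates from incomplete data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        region = rng.choice(["urban", "rural"])
        income = rng.gauss(60 if region == "urban" else 40, 10)
        records.append((region, income))

    true_mean = statistics.fmean([inc for _, inc in records])

    # MCAR: every artist has the same 30% chance of skipping the income question.
    mcar_obs = [inc for _, inc in records if rng.random() >= 0.30]

    # MAR: the chance of skipping depends only on the observed region.
    mar_obs = [(reg, inc) for reg, inc in records
               if rng.random() >= SKIP_RATE_MAR[reg]]

    # Complete-case (listwise deletion) means.
    mcar_cc = statistics.fmean(mcar_obs)
    mar_cc = statistics.fmean([inc for _, inc in mar_obs])

    # Simple MAR adjustment: average observed incomes within each region,
    # then weight each region by its share of the full sample.
    mar_adjusted = 0.0
    for region in SKIP_RATE_MAR:
        share = sum(1 for reg, _ in records if reg == region) / n
        region_mean = statistics.fmean([inc for reg, inc in mar_obs if reg == region])
        mar_adjusted += share * region_mean

    return true_mean, mcar_cc, mar_cc, mar_adjusted


true_mean, mcar_cc, mar_cc, mar_adjusted = simulate()
print(f"full-sample mean:        {true_mean:.2f}")
print(f"MCAR, listwise deletion: {mcar_cc:.2f}")       # close to the full-sample mean
print(f"MAR, listwise deletion:  {mar_cc:.2f}")        # pulled toward urban incomes
print(f"MAR, region-adjusted:    {mar_adjusted:.2f}")  # close to the full-sample mean again
```

Under MAR, deleting incomplete cases over-represents urban artists and inflates the mean, while the region-weighted estimate recovers it. This adjustment works only because the missingness depends on a variable we observe; under MNAR, reweighting on observed variables is not guaranteed to remove the bias.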
Analysts and evaluators must treat missing data properly, because poor strategies for dealing with it can produce estimates that are biased and imprecise, leading to invalid conclusions. There is no single recipe for dealing with this problem; however, many researchers and practitioners recommend starting by avoiding it as much as possible, minimizing missing values during the data collection process. It is also important to inspect the patterns of missing values carefully and to keep track of why each value is missing. Additionally, it is necessary to report the number of cases dropped from the analysis and the reason each was dropped. Finally, it is important to determine whether the missing values are likely to bias the findings, in order to select an appropriate method. Approaches to dealing with missing values are not an end in themselves; rather, they are one of many tools that help analysts and evaluators report results and methods clearly and honestly, so that the audience can draw accurate conclusions.
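Keeping track of missing values and of dropped cases does not require special software. The sketch below is a minimal illustration: the records, field names (`region`, `income`, `services`, `why_missing`) and reason codes are hypothetical, and in practice the reasons would come from the data collection log. The helpers count missing values per variable, compare missing rates across groups, and perform listwise deletion while recording each dropped case and its reason.

```python
from collections import Counter

# Hypothetical survey records: None marks a missing answer, and "why_missing"
# holds the reason recorded during data collection (None if nothing is missing).
RECORDS = [
    {"id": 1, "region": "urban", "income": 52.0, "services": 3, "why_missing": None},
    {"id": 2, "region": "rural", "income": None, "services": 2, "why_missing": "refused"},
    {"id": 3, "region": "rural", "income": None, "services": None, "why_missing": "dropped out"},
    {"id": 4, "region": "urban", "income": 61.5, "services": None, "why_missing": "skip pattern"},
    {"id": 5, "region": "rural", "income": 38.0, "services": 4, "why_missing": None},
    {"id": 6, "region": "urban", "income": 70.2, "services": 5, "why_missing": None},
]
ANALYSIS_VARS = ["income", "services"]


def missing_counts(records, variables):
    """Number of missing values for each variable."""
    return {v: sum(1 for r in records if r[v] is None) for v in variables}


def missing_rate_by_group(records, variable, group_key):
    """Share of records missing `variable` within each level of `group_key`."""
    totals, missing = Counter(), Counter()
    for r in records:
        totals[r[group_key]] += 1
        if r[variable] is None:
            missing[r[group_key]] += 1
    return {g: missing[g] / totals[g] for g in totals}


def listwise_delete(records, variables):
    """Keep complete cases and log every dropped case with its recorded reason."""
    kept, dropped = [], []
    for r in records:
        if any(r[v] is None for v in variables):
            dropped.append((r["id"], r["why_missing"] or "unknown"))
        else:
            kept.append(r)
    return kept, dropped


print(missing_counts(RECORDS, ANALYSIS_VARS))  # {'income': 2, 'services': 2}
print(missing_rate_by_group(RECORDS, "income", "region"))
kept, dropped = listwise_delete(RECORDS, ANALYSIS_VARS)
print(f"{len(dropped)} of {len(RECORDS)} cases dropped from the analysis:")
for case_id, reason in dropped:
    print(f"  case {case_id}: {reason}")
```

In this toy data every missing income comes from a rural respondent. In a real data set, a group difference like that would suggest the values are not MCAR and that simply deleting those cases could bias the results.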
Have you had to deal with missing values in the past? What approaches have you taken to solve this problem? Which approaches work better than others for you? Please feel free to share your thoughts on this issue!
Strong introductory readings for this topic include:
Acock, A. (2005). Working with missing values. Journal of Marriage and Family, 67(4), 1012-1028.
Afifi, A. A., & Elashoff, R. M. (1966). Missing observations in multivariate statistics I. Review of the literature. Journal of the American Statistical Association, 61(315), 595-604.
Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576.
Pigott, T. (2001). A review of methods for missing data. Educational Research and Evaluation, 7(4), 353-383.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.
Scheffer, J. (2002). Dealing with missing data. Research Letters in the Information and Mathematical Sciences, 3(1), 153-160.