Yesterday I attended “Multiple Imputation for missing data: State of the art and new developments”. It definitely lived up to its title. The presenters (James Carpenter, Jonathan Bartlett, Rachael Hughes, Ofer Harel and Shaun Seaman) described the latest developments in the field in a manner I could easily follow. I am now very interested in trying out chained imputation, Fully Conditional Specification (FCS) and the combination of Inverse Probability Weighting with chained imputation. The latter makes a lot of sense to me, as it provides a two-stage approach to imputation: the first stage deals with records that are missing completely (or mostly, in my own words) and the second with partially missing records.
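To make the FCS / chained-imputation idea concrete, here is a deliberately bare-bones sketch of one chained pass in NumPy. It is my own illustration, not anything the presenters showed: each variable with missing values is regressed in turn on all the other variables, and its gaps are filled with the predictions, cycling until things settle. Real MI software additionally draws the imputed values with noise (and draws the regression parameters themselves) and repeats the whole process m times to produce multiple completed datasets.

```python
import numpy as np

def chained_impute(X, sweeps=10):
    """One completed dataset via a bare-bones FCS / chained-equations pass.

    Illustrative sketch only: each column with missing entries is regressed
    on all the others and its gaps are replaced by the predictions. Proper
    multiple imputation would add random draws and produce m datasets.
    """
    X = X.copy()
    miss = np.isnan(X)
    # Initialise the gaps with column means, then refine by cycling.
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(sweeps):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            # Regress column j on all other columns (plus an intercept),
            # using only the rows where column j was actually observed.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            obs = ~miss[:, j]
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            # Overwrite the missing entries with the fitted predictions.
            X[miss[:, j], j] = A[miss[:, j]] @ beta
    return X
```

In practice one would reach for tested implementations (e.g. the `mice` package in R, PROC MI in SAS, or scikit-learn's `IterativeImputer` in Python) rather than roll this by hand, but the column-by-column loop above is the essence of FCS.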
The discussion really brought home to me the importance of understanding the mechanism of ‘missingness’. Yes, we all learnt that at university, but it does no harm to be reminded. It is not just about mastering the technology to run the imputation (SAS has a node in EM and PROC MI), but about really, truly understanding what you are doing. That is achieved by talking to the people who gathered the information, and by investigating the reasons for the missing data and the assumptions that could (and should) be made.
One of the key questions from the audience was whether there is a measure or methodology to indicate how useful the imputation is, and whether it is required in the first place. You guessed it – there is not. The key consideration is not missing data but missing information (Ofer Harel presented an interesting approach to getting closer to this). For example, if the data are missing completely at random, the complete records already contain all the information about the correlations, so there is no need to impute. Using the percentage of missing data is not indicative either: when analysing a rare event, the 0.5% of missing observations might be exactly the ones that hold the key to understanding.
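The standard way to get at "missing information" rather than "missing data" is through Rubin's rules: after analysing each of the m imputed datasets, the between-imputation spread of the estimates tells you how much the missingness actually cost. A small sketch, using the textbook formulas (the fraction of missing information here is the common large-m approximation, not the degrees-of-freedom-adjusted version):

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns the pooled estimate, total variance, and the approximate
    fraction of missing information (FMI).
    """
    m = len(estimates)
    q_bar = np.mean(estimates)        # pooled point estimate
    w = np.mean(variances)            # within-imputation variance
    b = np.var(estimates, ddof=1)     # between-imputation variance
    t = w + (1 + 1 / m) * b           # total variance
    fmi = (1 + 1 / m) * b / t         # approximate fraction of missing info
    return q_bar, t, fmi
```

A near-zero FMI says the estimates barely vary across imputations, i.e. the missing records held little information about this particular quantity; a large FMI says the conclusions lean heavily on the imputation model, which is exactly when the "talk to the people who gathered the data" advice matters most.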