I just finished my second day of conducting a webinar (eStudy) for AEA on data cleaning. I just covered outliers...one of my favorite topics! As I was preparing the activity for the attendees, I was manipulating variables in a bunch of datasets and comparing what happens when you ignore outliers (leave them as raw values), delete outliers (drop any outlying cases from your dataset), winsorize outliers (change the outlying value to the value that is 3 SD from the mean), or modify the outliers (change the outlying value to one unit lower or higher than 3 SD from the mean...as discussed in Tabachnick & Fidell). I will say, when I look at all this output and see how even a handful of outliers can greatly change the values of my variable, it is a powerful argument for how important it is not to skip this step of the data cleaning process.
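For anyone who wants to poke at the comparison themselves, here is a minimal sketch of the four strategies using the 3 SD rule described above. This is my own illustration (Python with NumPy; the function name, the toy data, and the choice to compute the cutoffs from the full sample are my assumptions, not anything from the eStudy materials):

```python
import numpy as np

def clean_outliers(x, method="modify", z=3.0):
    """Apply one of the four outlier strategies to a 1-D numeric array.

    Cutoffs are the naive mean +/- z*SD computed on the full sample.
    """
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std(ddof=1)
    lo, hi = mu - z * sd, mu + z * sd
    if method == "ignore":
        return x                              # leave the raw values alone
    if method == "delete":
        return x[(x >= lo) & (x <= hi)]       # toss outlying cases (shrinks n)
    if method == "winsorize":
        return np.clip(x, lo, hi)             # pull values back to the 3 SD cutoff
    if method == "modify":
        # Per the description above (after Tabachnick & Fidell): move the value
        # to one unit beyond the 3 SD cutoff, so the case stays in the dataset
        # and stays most extreme, but with far less pull on the mean.
        out = x.copy()
        out[x > hi] = hi + 1.0
        out[x < lo] = lo - 1.0
        return out
    raise ValueError(f"unknown method: {method}")

# One extreme score among twenty otherwise ordinary ones:
raw = list(range(1, 20)) + [500]
for m in ("ignore", "delete", "winsorize", "modify"):
    print(m, round(clean_outliers(raw, m).mean(), 2))
```

Running this on the toy sample makes the debate concrete: ignoring leaves the mean inflated, deleting shrinks the sample, and winsorizing/modifying keep all twenty cases while reining the mean back in.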
I've gotten into many debates with colleagues about what to do with outliers. Some are adamant about leaving them alone...keeping them as is. Others are quick to toss them out of the dataset, no matter what it does to their sample size or the mean value of the variable. I tend to modify my outliers so I can keep those individuals' values in the dataset, but with less of an impact on my mean.
Students always ask me, "What is the best/correct way to deal with outliers?" I give them my famous "it depends." There really is no agreed-upon way to deal with outliers. You can find a citation to support any of the approaches mentioned above. THIS CAN BE FRUSTRATING! My goal is always to present to my client the most accurate, reliable data I can.
So...what does everyone else do with their outlying values? What is common in the evaluation field? What do we teach our novice evaluators/students to do? I'd love to hear others' opinions on this.