Although we often worry about bias when interpreting data, particularly when looking at individual-level data, bias can just as easily creep in at the outset of data collection. Selection bias is one of the most common unintentional problems. It occurs when the individuals in your dataset are not truly representative of the population you want to learn about.
For example, suppose we want to learn about people participating in a job training program. We don’t want to survey just 5 people, since they might not represent the broader group, so we decide to survey all 100 participants to prevent unintentional selection bias. However, suppose only 50 respond, with the rest selecting themselves out of our dataset. Now selection bias may have made its way into our dataset without our knowing what form it takes. Perhaps the non-respondents face language barriers, or don’t have phones or email. Without their data, we simply don’t know.
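To make the effect concrete, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration: it assumes that 50 of the 100 participants are easy to reach and all respond, that the other 50 never respond, and that the two groups have different employment outcomes after the program. None of these figures come from a real survey.

```python
def make_group(size, employed_count, responded):
    """Build a list of hypothetical participants (illustrative data only)."""
    return [
        {"employed": i < employed_count, "responded": responded}
        for i in range(size)
    ]


def employment_rate(people):
    """Share of people in the list who are employed."""
    return sum(p["employed"] for p in people) / len(people)


# Assumption: 50 reachable participants, 35 of whom found work, all respond.
# Assumption: 50 hard-to-reach participants, 15 of whom found work, none respond.
participants = make_group(50, 35, True) + make_group(50, 15, False)
respondents = [p for p in participants if p["responded"]]

true_rate = employment_rate(participants)      # what we want to know
observed_rate = employment_rate(respondents)   # what the survey shows

print(f"Employment rate, all {len(participants)} participants: {true_rate:.0%}")
print(f"Employment rate, {len(respondents)} respondents only: {observed_rate:.0%}")
# Employment rate, all 100 participants: 50%
# Employment rate, 50 respondents only: 70%
```

Under these assumptions the survey overstates the program's success by 20 percentage points, even though everyone was invited to respond. In practice we would not know the outcomes of the non-respondents, which is exactly why the size and direction of the bias stay hidden.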
When looking at data, be sure to think about who isn’t participating, and why that might be.