In our quest to find what works, let's not forget that context matters


(Stacey McDonald) #1

Last week I met with my team to look over results from some of our grants and to explore the value of using Mission Measurement’s Impact Genome Project to help us learn what works. The Project quantifies the “genes” of nonprofit programs and academic research to discover what works: it breaks down the characteristics of each project to understand which ‘genes’ are more likely to be present in successful programs, in the hope of eventually being able to predict which programs are more likely to succeed.

As we were looking over the analysis they provided us for a small subset of our grants, I saw a lot of value in this approach, but I also saw much that worries me.

The whole approach seems to pull together results and context by identifying demographic characteristics of participants and setting, but it actually separates them. It assumes that the reasons for a program’s success or failure are linked to specific ‘genes’ – but what if they’re not? What if it’s all (or in large part) about the context? Who’s running the program? Do they have lived experience that lets them relate to the program participants? What if the implementation matters just as much as the design? I’m not saying that learning more about the design isn’t important – and I do think their approach might help us do that. I just worry that in the quest to understand what works, we might oversimplify things and forget that context matters.

This brings me to my bone to pick with their evidence hierarchy, and how it suggests that program statistics are better than qualitative research. Statistics on their own are not better! You could have pre/post-survey results, but on their own they tell you nothing about why those results happened. Only with supplementary research and inquiry do you learn why the results are what they are. No change between pre and post results doesn’t necessarily mean the program design (with that combination of ‘genes’) has no value. Just as a course with the same length, place, and syllabus can be great one year and awful the next, what makes a course great often comes down to the teacher’s skill.

What I’d like to see, instead of ranking one kind of inquiry above another, is an acknowledgement that both are valuable and both are needed. If you want to read more about how context matters, I suggest you check out Data Feminism, specifically chapter 5, “The numbers don’t speak for themselves.”

(Susan Ramey) #2

Stacey, you make a good point, and context is just one of those things that make evaluation such a challenging part of programming. One avenue to explore is preparing programming staff to do effective evaluation, recognizing the who, what, where, when, and how of the program delivery and its outcomes. Programming staff are obviously best positioned to perform evaluation, but I think they need training to understand goals and outcome requirements. Context is part of that process. Thanks for the article. Susan

(Paul Bakker) #3

Stacey, I love the idea of moving away from a hierarchy to describe the value of different evaluation designs. Traditional evidence hierarchies are rated against “internal validity,” and they do a good job of ranking designs according to that metric. The problem is that internal validity is not the only, and often not the most important, indicator of evaluation quality. There are other criteria, such as feasibility, usefulness (including generalizability), and ethics.

Program evaluation standards require evaluators to consider all of these dimensions, not just internal validity. Most often, evaluators deem randomized controlled trials (which sit at the top of the evidence hierarchies) not the best design because they are not feasible, generalizable, or ethical.

If you ask a (good) evaluator what the best evaluation design is, they will tell you it depends on the context and the purpose of the evaluation.

Now, if the question is which design can give the most precise and accurate estimate of a program’s impact (if done properly), then yes, traditional evidence hierarchies do a good job. That said, I do agree that a single-group pre-post study lacking qualitative data can produce misleading findings, and those findings may be more inaccurate than those of a purely qualitative study.

Trying to precisely measure a program’s impact is a noble cause, but it may not always be the most feasible, useful, or ethical option. Evaluators have come to understand how those standards apply to their individual practice, but I think funders are struggling with the complexity when they try to demonstrate that funding was used effectively, or to figure out which programs are best to fund.

In short, if funders want to know if they got the “best” evaluation evidence, then they should be using an evidence matrix rather than a hierarchy. Maybe CES has a role in drafting and endorsing such a matrix.

(Stacey McDonald) #4

Thank you, Paul, for sharing your thoughts. As always, very insightful. I love the idea of an evidence matrix and would love to work with CES to develop one. Is this something you could propose, or is there someone else you could connect me with?