The Problem with Randomized Controlled Trials in EdTech Research

And why the "gold standard" assumption needs to be rethought


We recently shared a post about the efficacy fetish in EdTech. One of the arguments I made was that the methods used in EdTech efficacy research are often flawed, and, as a result, the research often doesn’t tell us what we think it does.

The most common research design in EdTech efficacy studies is the randomized controlled trial (RCT), where participants are divided into two groups: one receives an intervention or treatment, while the other (the control group) does not. By comparing outcomes across the groups, researchers claim they can determine the effectiveness of an EdTech tool or practice.
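To make the mechanics concrete, here is a minimal sketch in Python (my own illustration, not code from any particular study) of what that design amounts to: randomly assign students to treatment or control, generate hypothetical outcomes, and compare the group means with a t-test. Every number in it, group size, effect size, noise, is an assumption chosen purely for illustration.

    # Minimal sketch of the basic RCT logic: random assignment plus a
    # difference-in-means comparison. All numbers are illustrative assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_students = 200

    # Randomly assign half of the students to the treatment group
    treated = rng.permutation(np.repeat([True, False], n_students // 2))

    # Hypothetical post-test scores: a baseline around 70, a modest
    # +3 point treatment effect, and plenty of student-level noise
    scores = 70 + 3 * treated + rng.normal(0, 12, n_students)

    effect = scores[treated].mean() - scores[~treated].mean()
    t_stat, p_value = stats.ttest_ind(scores[treated], scores[~treated])
    print(f"Estimated treatment effect: {effect:.2f} points (p = {p_value:.3f})")

In the simulation we know the “true” effect is 3 points; the whole difficulty in real EdTech studies is that the data-generating process is never this clean.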

But RCTs in EdTech research can be deeply problematic and should not be applied blindly. In many cases, they’re poorly executed, and it’s often difficult to isolate the treatment effect. Learning is messy.

Yet the issues run deeper than implementation: they stem from the a priori choice of the method itself. By emphasizing the use of RCTs regardless of subject or context, we risk promoting weak research and slowing meaningful progress in the design and application of EdTech. In fact, this gold standard assumption around RCTs gets the process backward, as the method is chosen before the problem has been defined and understood.

RCTs as the “gold standard” in EdTech research

These days, you can’t toss an EdTech tool without hitting a study that uses an RCT to demonstrate its efficacy, as a recent journal article on AI in education points out.

Widely considered the gold standard for measuring the efficacy of interventions, RCT (Randomized Controlled Trial) is a type of experiment that encompasses randomly assigning subjects into control and experimental groups to compare the effects of given interventions.

This is just one example of how RCTs are promoted as the gold standard for EdTech research. We see this not only in academic literature but also in the trade press and among EdTech thought leaders. For example, Ryan Craig critiques higher education’s reluctance to base EdTech purchasing decisions on evidence of impact, which is a fair point, but he conflates the demand for evidence with a demand for RCTs specifically.

According to edtech companies, schools don’t ask for evidence-based outcomes like correlational or quasi-experimental studies let alone (the research gold standard) randomized controlled trials (RCTs). What do schools ask for? Case studies and customer references. [snip]

It’s not surprising that only 7% of edtech companies report investing in RCTs. [snip]

Instead of investing in RCTs, what seems to be happening across the industry is short cuts – passing off case studies and surveys as research – and free-riding.

Organizations and philanthropies that support EdTech research also contribute to the idealization of RCTs. For example, earlier this year, Kirk Walters, vice president of mathematics education and improvement science at WestEd, argued that:

RCTs remain the gold standard for effective research for good reason. They reduce sources of bias that plague other designs [snip] Most important, RCTs answer the key question about any ed tech product: Does it work?

For years, Arnold Ventures had a strong preference for the use of RCTs in the projects it supported.

Cover of an RFP from Arnold Ventures titled Strengthening Evidence: support for RCTs to evaluate social programs and policies

Whenever possible, Arnold Ventures has a preference for funding randomized controlled trials (RCTs). We will also consider certain rigorous quasi-experimental designs that can credibly demonstrate a causal relationship when random assignment is not feasible.

These are just a few examples of the way that RCTs are promoted by researchers, opinion leaders, organizations, and foundations in EdTech.

The problem with RCTs - it’s more than just execution

Many other people have criticized the use of RCTs in EdTech research, both because of their inherent limitations and because they are often poorly executed. These limitations include:

  • Novelty effects, where a temporary boost in outcomes occurs simply due to the introduction of a new tool or method;

  • Difficulty isolating the treatment from other contextual variables;

  • Short study durations that fail to capture the full impact of a tool (many EdTech RCTs last only a few weeks);

  • High costs associated with conducting RCTs rigorously; and

  • Ethical concerns, particularly around randomly assigning students to treatment or control groups, and the challenges of obtaining IRB approval in higher education settings.

But the problems with RCTs go beyond poor implementation. In a thoughtful, if jargon-heavy, article, a group of researchers from the Netherlands and Belgium (Coppe et al.) argue that RCTs in education are often flawed from the outset.

As stated by Thomas (2012), ‘scientific research is [. . .] fluid and multifarious. Assuming otherwise – taking an unyieldingly monistic view about the procedures characterizing the enterprise of inquiry – leads us on some epistemological wild goose chases’ (p. 28). We would argue, following Moon and Blackman (2014) that ‘When researchers fail to understand and recognize the principles and assumptions that are embedded in their disciplines, it can compromise the integrity and validity of their research design’ (p. 2).

They point out that the method was borrowed wholesale from medicine and is poorly suited to most educational applications, due to fundamental differences in subject matter (a point also raised by Nick Kind back in 2017). It’s far more difficult to measure learning than something like temperature or viral load, and causality in education is much more complex.

Coppe et al. further contend that the prominence of RCTs is rooted in the broader dominance of statistical methods in education research. RCTs rely on a simplified model of causality that essentially reduces the research question to “how much?” which, by design, demands quantitative methods.

Most importantly, RCTs are problematic because they’ve become the default gold standard. Too often, researchers begin with the method and design the study around it, rather than starting with the actual problem they want to investigate and selecting the most appropriate method. As the authors argue, many education researchers go into studies knowing they’ll use an RCT, because it’s the most persuasive and widely accepted approach in the field.

Evidence-based education is characterized by this type of monistic view. It somehow reverses the layers of the internal logic of science. Instead of starting with ontology and epistemology in order to reflect upon methodology and methods, it starts at the methods and its methodology – with statistics and RCTs as the starting points, transferred from the evidence-based movement in medicine. In this way, statistics and RCTs became an onto-epistemological stance without being connected with the nature of the objects investigated, in a form of ‘one size fits all’.

Once I had translated this into regular-person language, I found it a compelling argument against the a priori use of RCTs. But I also couldn’t help noting the irony: many smart people in EdTech who understand that we should start with the pedagogy, not the technology, are often the same ones most impressed by RCT-based studies.

Why the fascination with RCTs?

Coppe et al. attribute the dominance and a priori choice of RCTs in education research to political and ideological shifts, particularly the rise of New Public Management and a broader societal trend toward technocratic, data-driven governance. I’m not entirely convinced by the former, but I do see clear evidence of the latter. What the article lacks, however, is an account of how this shift actually took hold.

To me, the process seems fairly straightforward: there’s a deep frustration with, and perhaps even embarrassment about, the early and often underdeveloped state of EdTech research. The push for RCTs and complex statistical methods is, in part, a response to that discomfort. It’s an effort to lend credibility, to make EdTech impact studies look like “real science,” rather than the tentative gestures toward progress and partial insight that they more often are.

It’s an argument I’ve borrowed and adapted from the giant (and mic-dropper-in-chief) of sociology, Howard Becker. In Writing for Social Scientists, Becker argues that social scientists often write poorly because they’re unsure of their ideas, and so they bury that uncertainty in jargon. I see a similar dynamic at work in EdTech research. Some researchers are understandably daunted by the enormous complexity of studying the impact of technology on learning and uncertain about which methods are most appropriate. To mask that uncertainty and provide a safe label of credibility, they default to the so-called gold standard of the RCT and often overcompensate by layering on unnecessarily complex statistics.

Jacket of Writing for Social Scientists by Howard Becker

Why does this matter?

When RCTs are pre-selected and venerated as the ultimate marker of rigor, their use becomes a kind of intellectual shortcut, one that gives researchers and vendors a free pass. It allows them to:

  • Avoid thinking carefully about how best to measure impact in the specific context they’re studying;

  • Disregard best practices, such as running a two-week RCT that’s too short to detect meaningful change or relying on a sample size that’s far too small (see the power calculation sketch after this list); and

  • Skip the hard work of clarifying what exactly is being measured, and whether it truly serves as a valid proxy for learning.
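As a rough illustration of the sample-size point above, a standard power calculation shows just how unlikely a small two-arm trial is to detect a modest effect. This is my own back-of-the-envelope sketch, not an analysis from the post or from Coppe et al.; the effect size (d = 0.2, a conventionally “small” effect) and the 30-students-per-group trial are assumptions chosen for illustration.

    # Rough power calculation for a two-group trial. The effect size and
    # group sizes below are illustrative assumptions, not real study figures.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Chance of detecting a small effect (d = 0.2) with 30 students per group
    power_small = analysis.power(effect_size=0.2, nobs1=30, alpha=0.05)

    # Students per group needed to reach the conventional 80% power
    n_needed = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)

    print(f"Power with 30 students per group: {power_small:.0%}")
    print(f"Students per group needed for 80% power: {n_needed:.0f}")

Run the numbers and a trial that small has roughly a one-in-ten chance of detecting a small effect, while something on the order of 400 students per group would be needed to reach 80% power. That is the kind of basic check that gets skipped when the RCT label alone is treated as proof of rigor.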

As long as there’s an RCT attached, people tend to assume the research is sound and stop asking deeper questions. To mix metaphors, RCTs become less of a gold standard and more of a get-out-of-research-jail-free card. The result is a glut of poor-quality studies and much slower progress in EdTech than we should expect, or accept.

Parting thoughts

Understanding whether and how EdTech works is essential, no matter where you sit in the ecosystem, whether as a higher education institution, a vendor, or a funder. I agree with Coppe et al. that RCTs can have a place in education research, but only after careful thought has gone into research design, only after we understand the problem and the data limitations, and only when the starting point is the subject matter itself, paired with a willingness to mix and triangulate methods where necessary.

Interestingly, even Arnold Ventures appears to be shifting away from its strong emphasis on RCTs and moving toward a more flexible quasi-experimental design (QED) approach. I see this as a positive step. But once again, the key is to begin with the problem and let the research methods follow, not the other way around.

Let’s stop treating RCTs as the starting point and instead begin with the research problem rather than an a priori choice of method. RCTs should be just one of several potential methods, used only in specific and limited contexts, not the default for EdTech efficacy research.
