
This message comes from AppleCard.
Earn 2% daily cash back on everything you buy when you use your AppleCard with Apple Pay.
Subject to credit approval.
AppleCard issued by Goldman Sachs Bank USA Salt Lake City Branch.
Terms and more at apple.co slash benefits.
This is Planet Money from NPR.
Alexi Horowitz-Ghazi.
Mary Childs.
Yes, you and I took a little trip up to scenic Montreal,
one of the jewels of French Canada, for a little Planet Money mission.
Yes, we did. And even though it's a little bit sad that that mission did not entail
joining the maple harvest or infiltrating a poutine cartel (next time, dare I say, next time),
it did have much bigger implications for anybody and everybody whose life
is impacted by science, which I think is basically all of us.
I think that's right, yeah.
We were there to meet a guy named Abel Brodeur.
Abel's this very energetic economics professor in his late 30s at the University of Ottawa,
and we found him bounding around the halls of this modernist school building in downtown Montreal.
He was getting ready to host an event he's become sort of famous for,
something called the replication games.
It's getting exciting now.
How are you feeling?
I'm feeling good.
It's the beginning of the event, so this is a moment I'm full of energy and
full of enthusiasm.
In seven hours from now, it's going to be a different conversation.
Abel is going to be tired in seven hours because at a replication game,
he is running around between 16 teams of three to five people in a kind of hackathon.
People will work all day to replicate recently published social science papers
to reproduce the results and see if the findings hold up.
Because ever since technology has made it easy to crunch data,
we've been able to go back and check old research,
and turns out it wasn't great.
Re-running an old study today, a lot of the time,
does not yield the same result.
The research no longer proves its conclusion.
And the same thing often happens when we reconduct whole experiments.
Altogether, these problems have become known as the replication crisis.
A lot of people across academia have been trying to fix this,
so we can trust research, so we can actually know what we know.
And this event, the replication games,
it's part of Abel's attempt to help solve this crisis.
The idea is to change norms through monitoring.
And just giving a small percentage, a small chance that we will monitor,
can massively change the behavior of everyone.
You know, change the way people behave, change the way they code, change the way they do research.
So that's the goal.
After a few minutes, we head into a big lecture hall where Abel takes center stage.
All right, folks.
Welcome to the replication games.
Thanks for being here in Montreal with us.
Let's get started.
Today, we have 16 papers that are being reproduced.
A couple of small things.
Around the room, dozens of social scientists are gazing up at Abel,
looking a little bit nervous.
Most of them have come from across Canada,
and most of them are first timers who now have to undergo this kind of awkward initiation, right?
I'm going to put on some music, because I know you guys need, like, you know, a bit of motivation.
But you need to do the body movement.
Everybody has to do it.
All right, so does that sound good?
So we do it.
I need you to do it.
It's pretty easy.
Abel starts didactically clapping like an elder millennial camp counselor
and his audience joins in.
Guys, thank you so much for being here.
I hope you enjoyed.
It should be fun and thanks everyone.
Hello and welcome to Planet Money.
I'm Alexi Horowitz-Ghazi.
And I'm Mary Childs.
Over the past couple decades,
the world of science has been stuck in an existential crisis
over whether we know the things we think we know.
It started in psychology and spread to medicine and economics.
Now, people across disciplines are trying to figure out how to solve it.
Today on the show, the story of one economist,
how he set out to learn what exactly has broken in the way social scientists create new knowledge,
and how he came up with his own daring and kind of wacky way to help fix it.
By building an internationally crowdsourced surveillance system
to keep social scientists honest.
This message comes from Applecard.
You could be earning 2% daily cash back on that purchase.
That's because Applecard users earn 2% daily cash back on every purchase,
including everyday items you buy online or in store,
when using their Applecard with Apple Pay.
Not an Applecard customer?
You can apply in the wallet app on iPhone.
Subject to credit approval.
Applecard issued by Goldman Sachs Bank USA Salt Lake City Branch.
Terms and more at apple.co slash benefits.
This message comes from Wix.
Nothing beats seeing your ideas turn into cold, hard cash.
Well, if you use Wix Harmony, you better get used to it.
Wix Harmony makes it unbelievably easy to create a fancy new website that's built to sell.
Get the perfect blend of AI and drag-and-drop tools that puts you in control of every detail,
plus an AI agent to help you every step of the way.
Try it for free at Wix.com slash Harmony.
Okay, so the replication crisis has been a pretty big deal for almost 20 years at this point.
We've covered it on Planet Money before.
The story of how economist Abel Brodeur first encountered the problem
and why he set out to help fix it begins back in 2011.
Abel was getting his master's in economics, and he was writing a paper on whether
smoking bans in restaurants and workplaces actually made people smoke less.
He collected this huge data set.
I had like amazing data from the CDC, which is public.
I had smoking prevalence at the county level.
Abel says that all of the established research at the time indicated that smoking bans
were hugely effective, that they'd gotten lots of people to stop smoking.
But when Abel crunched his numbers?
I was finding absolutely no effect.
None.
It was like nobody stopped smoking.
I played with the data for six months and I found nothing.
And Abel was trying to make a name for himself in academia, which means getting his research
published in an academic journal.
And it's harder to get published if you find no effect,
especially given that the existing literature did show an effect.
So what Abel needed was something statistically significant.
For the statistically uninitiated,
significant means the result would be produced by chance less than 5% of the time.
So the probability of getting a result like this when there's actually no effect is 5% or less.
That is the cutoff for whether your findings count or not.
There's this 95%, 5% cutoff that really matters.
We're obsessed with these thresholds.
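To make that cutoff concrete, here is a minimal sketch, in Python with made-up numbers, of the calculation behind those stars: divide an estimated effect by its standard error to get a z-statistic, then ask how often chance alone would produce something that extreme. The estimate and standard error below are hypothetical, not from Abel's actual study.

```python
from scipy import stats

# Hypothetical regression output: an estimated effect and its standard error.
estimate = -1.2    # e.g., change in smoking prevalence after a ban
std_error = 0.58

# z-statistic: how many standard errors the estimate sits from zero.
z = estimate / std_error

# Two-sided p-value: the chance of a statistic this extreme if the true effect is zero.
p_value = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.2f}, p = {p_value:.3f}")
print("significant" if p_value < 0.05 else "not significant")
```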
So Abel kept tinkering with his data set,
changing his computer code to contort the data one way and then another.
Until eventually one day, he found a way to analyze one subset of his data
that gave him what he'd been looking for.
A result demonstrating that smoking bans had decreased smoking.
And a result that was significant.
He was like, there you go.
I was so happy.
I was in the library.
I was just, yeah, I was like, significant.
I was so happy.
Finding a significant result meant that if his paper was published,
he would get to put a little asterisk or star next to his results.
And the more statistically significant the result,
the more stars you got to claim.
But Abel's happiness did not last long.
Because the more he thought about how he'd gotten that significant result,
the more it started to seem like it was working against the whole goal of social science,
you know, to actually discover true new knowledge about human behavior.
For example, policymakers need to know whether smoking bans work
to make sound policy decisions.
But here he was torturing the data to match the preconceived hypothesis.
He thought, this is so stupid.
What am I doing?
I'm writing a piece saying that smoking bans
are decreasing smoking for some group,
because I managed to find one that worked.
I was like, that's, this is dumb.
I'm doing something wrong.
Abel ultimately decided not to use his tortured results.
He wrote up his paper showing that he'd found no effect,
even if it meant his paper was less exciting.
And at first, he thought what he'd done to his data
might have just been a one-off mistake on his part.
But then you start talking to other students,
and people were like, oh yeah, that's how you publish.
Abel started to see that this was a problem of incentives.
In order to advance their careers,
academics have to publish papers in peer-reviewed journals.
And the journals want to publish work that's statistically significant and novel.
These papers can win big prizes and define new research agendas for decades.
But because of all that,
people were doing what he had done,
trimming and squeezing and coaxing the data
towards significant results.
And that can easily cross over into a kind of data manipulation
called p-hacking, p as in probability.
And Abel says it can happen almost subconsciously.
Because the project took, like, three, four years of back and forth
between co-authors, discussion.
Then six months later, you go back, you exclude this group again,
or you do something different.
And over time, all these decisions, actually,
when you look at it from the outside,
it's like, this is crazy what you've done.
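Here is a small simulation sketch of why that drift is so dangerous. It is hypothetical, not Abel's analysis: the outcome below is pure noise with no real effect, yet testing twenty arbitrary subgroups will usually turn up at least one "significant" result.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Fake data with NO true effect: outcome is independent of treatment by construction.
n = 2000
treated = rng.integers(0, 2, n)
outcome = rng.normal(size=n)
subgroup = rng.integers(0, 20, n)   # 20 arbitrary ways to slice the sample

false_positives = []
for g in range(20):
    in_group = subgroup == g
    _, p = stats.ttest_ind(outcome[in_group & (treated == 1)],
                           outcome[in_group & (treated == 0)])
    if p < 0.05:
        false_positives.append((g, round(p, 3)))

# At a 5% threshold, twenty tests yield about one spurious "finding" on average.
print(false_positives)
```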
To figure out how widespread this problem might be,
Abel decided to research the research.
He and a couple of his colleagues scraped the significance data
from a bunch of the top academic journals,
the distribution of stars that published researchers had racked up.
And when they looked at the distribution,
they found a noticeable hump just above that 5% significance threshold.
Now, some of this could be because some people
whose research only hit 6% didn't bother submitting.
But it could also be because some researchers were tweaking
their data analysis to just barely get results
that would be more likely to get published.
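A rough sketch of the kind of check Abel's team could run, assuming the scraped results sit in a table with coefficient and standard-error columns (the file and column names here are hypothetical): convert each result to a z-statistic and compare counts just below and just above the 1.96 value that corresponds to the 5% threshold.

```python
import pandas as pd

# Hypothetical table scraped from journals: one row per published result.
df = pd.read_csv("scraped_results.csv")   # columns: coefficient, std_error
df["z"] = (df["coefficient"] / df["std_error"]).abs()

# Count results in equal-width bands on either side of the 1.96 cutoff.
just_below = ((df["z"] >= 1.66) & (df["z"] < 1.96)).sum()
just_above = ((df["z"] >= 1.96) & (df["z"] < 2.26)).sum()

# Without p-hacking these bands should look similar;
# a pile-up just above the line is the telltale hump.
print(f"just below: {just_below}, just above: {just_above}")
```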
But when Abel and his colleagues started submitting
the research for publication,
they got a resounding series of nos.
Academic publishing seemed hesitant
to open up an empirical reckoning.
After a few years, in 2016, they did manage to publish their paper.
They called it "Star Wars: The Empirics Strike Back."
Do you get it?
Oh, you definitely get it.
Thank you, Alexi.
So, Abel puts aside this whole idea of an empirical reckoning
and he moves on to other economic projects.
He gets tenure.
And eventually, he learns that his little paper
has become kind of a sleeper hit.
It took a long time before I realized, actually, the paper
was, like, well known.
People started talking to me at conferences,
like, are you the Stars guy?
At that moment, like, I needed someone senior
to tell me, like, no, this is really important, what you're doing.
There had been efforts to solve parts of the replication crisis.
Some of the top journals had started asking their contributors
to release replication packages with their papers.
That's basically the data and code they'd used to find their results.
And researchers were also starting to pre-register their hypotheses
before actually doing the research.
So that if the data didn't support it,
they couldn't fuss around and pretend
like they'd been looking for something else all along.
For his part, Abel wondered if there was anything he could do.
Like, not just study the problem, but actually help fix it.
How do I change the incentives?
How do I potentially have an impact on the norms,
on how people do research?
The second I think about the norms,
I think about, oh, it needs to be large scale.
Nobody's going to change their behavior
if it's a small scale thing.
So it needs to be big.
Journals do have peer review systems
where they try to poke holes in research.
But they didn't always totally get under the hood
to scrutinize all the code and data.
So researchers weren't necessarily worried
that their stuff would get checked.
A nice analogy, I think, is:
imagine you're going on a date.
You might shave.
You might take care of your body.
You might take care of yourself.
A bit of deodorant, you know,
perfume maybe if it's your thing.
You're going to make an effort to look prettier than you usually are.
The other person fully understands that
this is a nice version of you.
They're fully aware of that.
But they don't know by how much.
And perhaps it's not much.
Or maybe you made a massive effort,
and usually you're a disaster.
You never clean anything.
So when they go to the apartment,
it's like, oh my goodness, this is your apartment?
So research is a bit like this.
The published research is the cleaned up version.
So when I see a published paper,
I know it's been, you know,
it's beautiful.
It looks nice.
But there's an information asymmetry.
I don't know how dirty it is actually.
Abel thought one thing that might help this problem
was to make researchers care
as much about the cleanliness
of their data analysis
as the significance of their results.
And to do that,
you'd have to go full-on Room Raiders
on people's published papers
to shine a fluorescent spotlight
on the back rooms of their research.
If you could take all of the data
that somebody had gathered for a given paper
and meticulously retrace their coding steps,
you could see if it was possible
to replicate their findings.
You could make sure there weren't any errors,
conscious or unconscious,
in what they'd done.
But first, you'd have to get the code.
People weren't in the habit
then of publishing all their data and code.
And when he emailed researchers asking,
nobody responded.
So he decided to create an official-seeming institution.
It needs to be a big institution with a website
with tons of famous people on it.
And when you send the email,
people would be like,
what the hell is this thing?
I need to respond.
It's legit.
So in 2022, he creates a website
for a thing he starts calling
the Institute for Replication.
A friend of mine's wife
did the logo for free,
like a design,
like, you know what I mean?
Like, just bare bones.
He recruits some serious,
famous economists for the board
to put on his legit looking website.
And pretty soon,
he does start to get responses
to some of his emails.
He's able to get some data sets
and coding packages.
And he convinces some colleagues
and junior researchers
to start doing some replications
one by one in exchange
for a co-author credit on one big paper.
So Abel can get the data and the code.
But there's still a second problem,
which was the question of scale.
Replicating one paper at a time
was not going to do much
to change the system.
What he needed was to create the sense
within the academic community
that anybody's work could be checked at any time.
It's like an IRS for the ivory tower.
So now I have to, okay,
we need to mass-reproduce journals.
So then I was like, okay,
I need to get maybe a few hundred
replications or reproductions per year.
So now I'm thinking, how do you do that?
The answer, Abel says,
came to him kind of by accident.
Around the time he got his
Potemkin website up and running,
he got an unrelated invitation
to Oslo to give a couple of seminars.
He was planning his trip
about a month ahead of time
and he noticed that he had seminars
on a Wednesday and on a Friday.
And I thought, like, what the hell do I,
am I going to do on Thursday?
Like, I've never been to Oslo.
I'm sure it's pretty and nice.
But a full day, like, I'm going to walk around
and then I'm going to have like six,
eight hours just to relax.
So I just emailed the person who invited me
and I said, could we just like do a small workshop?
It would just be like 10, maybe 15 people.
And I posted about it on social media:
You can come to Oslo.
It should be fun.
If you come, you're going to get
co-authorship to a meta paper.
We're going to reproduce papers.
Let's have fun.
And then, I don't know, like, 70, 80 people
ended up registering really fast.
I closed registration because I have no money.
We wouldn't have food.
I didn't tell the guy it would be 80.
I said it would be 10.
So, Abel is sitting there
a couple of months before the conference
with this sudden unexpected surge of interest
and no plan.
I have 80 people.
Some coming from Ireland, others coming from Sweden,
others coming from France.
Like, what do I do with these people?
He starts collecting papers
that people could replicate.
And he puts everyone into teams
by their field.
Health economics, development economics.
The first time, I had no idea
what was going on. I was super stressed.
He had no idea what was going to happen,
what they would find.
Abel heads to Oslo
and convenes the first ever replication game
in October of 2022.
And when he checks in on one of the first
teams of replicators working on the first paper...
I go talk to them,
and they're like,
Abel, there's a problem.
Like, there's tons of duplicates.
I'm like, what?
He's like, yeah, one of the data sets,
there's tons of people with the same age.
And then I come back later on,
and it's like, okay, in 75% of one data set,
everybody's 60 years old,
all women, all living in the same village,
all doing the same thing.
It's the same duplicates.
And it's a paper about inequality.
If everybody's the same,
there's no inequality.
And that was driving some of the mechanism.
The underlying data
upon which this entire paper rested
had been merged improperly.
Like a big copy and paste error.
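Catching that kind of merge error is mechanically simple once someone thinks to look. A minimal sketch in Python; the file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical survey data from a replication package.
df = pd.read_csv("survey_data.csv")

# Share of rows that are exact copies of an earlier row.
print(f"{df.duplicated().mean():.0%} of rows are exact duplicates")

# A bad merge also shows up as suspiciously little variation.
print(df[["age", "gender", "village"]].nunique())
```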
To Abel, this was disconcerting.
And I was like, oh boy, that's the first paper.
That's the first game.
What did I create?
It's going to be like this all the time,
people finding crazy mistakes.
And did I just open a can of worms
that actually most papers are just, like,
terrible, full of crazy errors?
Abel was a little afraid.
He might be about to discover
that all papers were full of worms,
and that science wasn't real.
But luckily, by the end of the day,
like, many teams had had a good day,
everything was clean and so on.
And it was like, okay, it's not terrible.
He could relax.
It turns out most of the papers were not terrible.
And even better with that first event in Oslo,
Abel had found a way to crowdsource
this massive academic auditing project,
essentially for free.
If he could host enough replication games every year,
he just might be able to scare
the social sciences into acting right.
But what actually happens on the ground during these things?
After the break, we enter the 51st replication game.
Support for NPR and the following message come from Ethos.
Ethos makes getting life insurance fast and easy,
100% online.
You can get a quote in seconds,
apply in minutes, and get same day coverage.
There's no medical exam,
just a few simple health questions.
You can get up to $3 million in coverage.
Some policies are as low as $30 a month.
Protect your family's future in minutes
with life insurance through Ethos.
Get your free quote at ethos.com slash money.
Application times and rates may vary.
This message comes from Serval AI.
Save your IT team time on repetitive ticket requests.
The more your business grows,
the more requests pile up,
password resets, access requests, onboarding,
all pulling them away from meaningful work.
With Serval, you can cut 80% of your help desk tickets.
Serval powers some of the fastest growing companies in the world.
Get your team out of the help desk
and back to the work they enjoy.
Book your free pilot at serval.com slash money.
So we are at a replication game in real life, in Montreal.
Abel Brodeur says that the game part is a little bit of a branding exercise.
There are no winners or prizes.
It's more like an all-day hackathon.
The teams are mostly economists with a few groups of psychologists,
and they've already chosen the papers they'll focus on.
Using just what they have in the replication package,
they will have seven hours to check the code,
examine the decisions their paper's authors made,
and see if the results reproduce.
And then they'll report on whatever they find,
so it'll be out there on the record,
whether that's a nothing burger or a bombshell.
After everyone claps their rendition of
"We Will Replicate You,"
the researchers start streaming out of the lecture hall,
and we run after them.
Jolene, can I talk to you for a sec?
I'm Alexi.
Just set the scene for me, like...
So we just finished clapping a cheesy opening song,
and we're about to split up into rooms.
The groups are scattering into classrooms across the building
to start digging into their papers.
Economics PhD student Jolene Hunt and her team
are looking at a paper about education.
They're all education economists.
And so, Jolene has sort of a pedagogical view of the day.
In PhDs, you often don't get a chance to actually work together.
Yeah.
You're usually just kind of on your own, in your silo,
and then, like, you talk to each other when you're having problems.
But it'll be nice to actually work together
and see if my friends are actually any good at their jobs.
Rolling up their sleeves, getting down to the actual coding.
Because they're only going to have seven hours,
each group has a little list of the things they've decided
they're going to try to get through today.
There's one group led by a guy named Tibo Dupre,
who is sitting alert and ready to unpack a paper
about pensions in different countries.
Essentially, the paper focuses on 10-something countries,
but then the data set seems to have a few more countries in there.
So why were some countries included and others not?
What if you drop a few countries out of the data sets?
Maybe there's something to be explored there.
And we wanted to understand the stakes for the day.
Why people would attend this event to do a full day
of like manual economic labor for no dollars?
So we asked them.
What are you doing here today?
Well, we're trying to see if we can replicate the results from a paper
that took a look into the effects of negotiation.
I've started with a group in the lecture hall
huddled around their laptops.
Trail Lasuad is a researcher at the University of Saskatchewan
and she's in a group of economists focused on agriculture
with Chishia Hu from the University of Ottawa.
You want to find that the paper checks out.
Yes, you can think like that.
In terms of your personal incentives,
would it be cooler to find like, oh no, this paper is messed up?
Friel starts laughing, seemingly at the premise of the question.
You're laughing so hard. Why?
That's mean!
I don't know, like, how I want to answer it.
Okay, it seems bad for the ego, no?
Those are the authors of the paper.
Are they people you know?
No, you just have sympathy for them.
Yeah, because we've all been in their shoes.
Okay, fair.
But we go up to another group and they're kind of like,
duh.
Yeah, we are trying to find something.
That's Felix Fosu, a postdoc at Queen's University.
His group is digging into a paper about cartels in Mexico.
I tell him what the other researchers said, that maybe it isn't very nice
to want to find something terribly wrong in someone else's research.
But it seems like to Felix, I have now misunderstood things in the opposite direction.
No, we definitely want to find something.
Yeah, why?
I think replication is something that we have to take very seriously
in economics.
We need to make sure that our results are indeed what they claim to be.
We need to know what works and what does not work.
Now, regardless of their specific goals,
the actual work of replication is divided into two main phases.
Phase one is the same for every team, pure and simple replication.
They will all check the paper's code, the program instructions that take some raw data
and put it into a bunch of tables that comprise the foundations for the paper's conclusions.
So now, each team takes the original code,
copies and pastes it, and basically hits enter to see if it runs.
And one type of mistake that they might find is if the code is really broken.
They might find that when they push the button, the code just doesn't run.
The computer just says error.
Or another kind of mistake they might find.
Maybe the code runs great, but it spits out a different answer than what the authors wrote.
Not so great.
Or maybe the raw data is messed up in some way,
like cells merged or transposed or erased or accidentally filled down the whole column.
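In practice, phase one can be as simple as re-running the package's scripts and diffing the output against the published tables. Here is a minimal sketch of that loop in Python; the script and file names are hypothetical stand-ins for whatever a replication package ships with:

```python
import subprocess
import pandas as pd

# Step 1: run the package's analysis script exactly as the authors shipped it.
result = subprocess.run(["python", "analysis.py"], capture_output=True, text=True)

if result.returncode != 0:
    print("Code does not run:", result.stderr)   # failure type one
else:
    # Step 2: compare the regenerated numbers to the published table.
    ours = pd.read_csv("output/table1.csv")
    published = pd.read_csv("published/table1.csv")
    mismatches = (ours.round(3) != published.round(3)).sum().sum()
    print(f"{mismatches} cells differ from the published table")   # failure type two
```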
So we ask the agriculture team to show us exactly what they are doing.
So I can't code.
I don't know what I'm looking at.
What am I looking at?
Well, actually, it's kind of nothing here, but I just started.
This is Chishia again.
The paper her team picked by Diego and Juan Pablo
is about the price of eggs at big firms versus small firms,
how much pricing control they have.
I look at her laptop over her shoulder.
So what you can see here is already the variables they have.
We have the firms, we have the price, we have the day, months, and year.
Now, Chishia pulls out her iPad to scroll through the published paper.
So we're going to firstly check whether we can perfectly reproduce all the numbers
using the original data and code.
If I can run parts of this, maybe you can see it.
Okay, she's pushing a little blue arrow, a little play button.
So basically, if I run this code, you will see the results.
Oh, a little box appeared in a different window.
Yes, so if you check the numbers,
minus 18.11432, and I'm looking at the published version, it says
minus 18.114 star star star.
So they are basically exactly the same.
It's the same.
Yeah, it's the same.
That's good, you know.
So we have a win.
Yes, one, and we have more to check.
A lot more, but we got one.
That's great.
Chishia will keep plugging in all the data and checking the results.
Though so far, it looks like the paper is checking out.
And if the paper passes the whole first phase,
if the code does spit out all the answers that the authors said it would,
then the replicators move on to phase two.
Robustness checks.
For a robustness check, we kind of, like, change some parts of the model
to see whether the original conclusion still kind of makes sense.
This phase is less objective and requires more context and thought.
It requires the economists to consider the questions that the paper authors didn't think of
or didn't write about.
The decisions the authors made and the decisions they could have made but didn't.
It's like trying to see the negative space in and around the paper.
The kind of things they might find in this phase, you know:
did the authors say that this dataset represents something
it doesn't?
Did they use an appropriate dataset?
And did they use that data in a way that made sense?
Did they include or exclude certain specifications or factors
in order to have a result that looked exciting?
There are infinite potential choices that researchers make or don't make.
And the replicators have such limited time.
So they're not going to be able to consider and analyze everything.
They're just going to get through as much as they can.
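One workhorse check in this phase is leave-one-out: drop each country (or cartel, or region) from the data in turn, re-estimate, and see whether the headline result survives. A small sketch in Python; the data, variable names, and model are hypothetical, with statsmodels supplying the regression:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical cross-country panel from a replication package.
df = pd.read_csv("pension_panel.csv")   # columns: country, outcome, reform

for country in df["country"].unique():
    subset = df[df["country"] != country]            # leave one country out
    fit = smf.ols("outcome ~ reform", data=subset).fit()
    p = fit.pvalues["reform"]
    note = "" if p < 0.05 else "  <-- significance disappears"
    print(f"without {country}: p = {p:.3f}{note}")
```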
And as the hours start to tick by,
it becomes clear that most teams are not turning up major issues.
Until mid-afternoon,
when we check in with this one group looking at a paper about government policies.
The basic premise is when people trust the government,
do they tend to comply with policy more?
This is Simon Prevo.
He's an econ master's student and a public sector researcher.
The paper found that when people trust in government,
they comply with policies more readily.
So those policies cost the government less money.
And Simon and his teammates are now trying to unravel a mystery.
Because when they went to look at the raw data
that underlies the paper's findings,
it looked a little funny.
This is Scott Morier, another econ master's student on the team.
There was a folder called RAW for the raw data,
but the files were all labeled clean.
So we were a bit confused. It was counterintuitive, right?
So Florian downloaded the data straight from the source
and followed the instructions to create the one data set.
They recreated what should be the same data set
following the instructions that the authors left.
They ran a code.
And then that's when we started getting the errors
because variables were missing.
And then as we kept going through,
we kept finding more variables
that were being used in the regression,
but weren't necessarily included in
what is supposedly meant to be the raw data set.
Some variables are missing from the raw data set.
The authors seem to have used data in their analysis
that they did not account for.
Not good.
And then we visited the group looking at that paper
about cartel behavior in Mexico.
That group has found something too.
So in this paper, they look at the presence of different cartels.
They tell us the paper looks at 20 cartels
and data about what types of crimes were happening in one.
To see if cartels changed the types of crime they did
after the government ramped up a big war on drugs.
What we found so far is that if you exclude one of the cartels,
then the results become insignificant.
Well, yeah.
So it's just the one cartel making the results?
One cartel making the results.
So if you remove only one, then the results collapse, right?
Oh, no, you found something.
Yeah, we found something in the first test they tried.
Is that luck? Would you call that luck?
No, I think it's something that we thought about.
That's why we placed it number one on the list.
We thought it was a good place to search.
Partly luck, but partly because we thought about it
carefully.
That sounds like not luck.
They're going to keep investigating and depending on what they find,
this paper is maybe not passing this phase, the robustness check phase.
Can you draw a big sweeping conclusion about the effectiveness of a war on drugs
from a change in just one cartel?
They suspect this paper will not hold up.
Over lunch, the cartel team starts puzzling through,
like, how does this sort of thing even happen?
To be honest, for sure, when you do this kind of paper,
you do this kind of thing, right?
You check whether, when you have this,
you know, you do this type of robustness check.
David Benatia, a professor on the team,
says this is a robustness check that he would have tried if he had been the author.
At the end of the day, our researchers limped back into the auditorium
to present what they'd all found.
So the way we like to finish is to give each team about one minute
to tell us how your day went, the different challenges you face.
Maybe we can start from the beginning, move around.
We didn't find anything too major.
There was a lot of missing variables and attrition.
Everything ran fine.
We tried to poke holes in it, but we couldn't really do it.
For the 71 replicators in the Montreal game,
14 teams got to uphold science by double checking some published work.
They spent a day coding with their friends and peers,
learned some new coding hacks, and new ways to make choices in research.
And they'll get a little authorship credit on a metapaper in a real journal.
The other two teams, the group who discovered the missing numbers,
and the cartels group, they've gotten like a toxic golden ticket.
Now they'll get to write their report,
polite and formal, but nonetheless kind of a bombshell,
saying just how flawed the research is.
Maybe that makes a splash and everyone thinks they're brilliant,
or maybe it makes a splash and everyone hates them.
Next, Abel will write an email to the authors,
a somewhat standardized note saying,
hey, here's who we are and what we do,
we found some mistakes in your paper.
Would you like to respond?
He does not assume nefarious intentions,
and the authors get an opportunity to try to fix the problem
and prepare their formal response before anything goes public.
And because Abel handles it from his position at the Institute
for Replication, it doesn't feel so personal.
And the replicators have a little bit of insulation.
We asked Felix from the cartels group what this might mean for him
as a more junior person, a person earlier in his career.
It's kind of throwing rocks towards the top of the profession.
He'd wanted to find something and now he has.
I think it's a good work that we are doing,
but what the implications are, I don't know.
So after a few months, Abel sends his neutral-toned,
official email to the authors of the paper
that Felix and his team had replicated in Montreal,
saying that the code had worked,
but that they found the results don't hold up.
And for the authors of that paper, getting that email?
When we opened that email, we were actually happy,
because we actually read: your paper replicates.
This is Jacques-Omel Batistone, a researcher
at the Rockwool Foundation in Berlin
and one of the four co-authors of the paper.
He says they were thrilled to have their coding results
publicly validated.
And when it came to the bigger problem,
the fact that their results had fallen apart
when the replicators removed that one cartel,
we were not particularly worried about the content
because it was kind of self-evident
that this was not really challenging.
Not really challenging their findings
because they think the replicators misunderstood
the basic hypothesis of their study.
They say they started with this idea
that there was this one big new cartel in Mexico,
Los Zetas, and it had been doing a lot of crimes,
generating a lot of data points.
Here's another author, Marco Le Moglie,
a researcher at Bocconi University in Milan.
When we started to think about this project,
actually we had in mind the specific cartel of Los Zetas.
They say they set out to investigate
if the cartel, Los Zetas,
had changed the types of crimes they did
after the war on drugs.
And their paper succeeded at proving that.
What the Montreal replicators did,
in the opinion of the paper authors,
was to remove the main part of the data set
and then say the conclusion was broken.
You can do that, but why would you?
To be blunt, it doesn't make any sense.
That is Paolo Pinotti, a professor also at Bocconi University.
He said it was like doing a study
on the effect of spreadsheets on productivity
and then saying, oh, but the results don't hold up
if you exclude Microsoft Excel.
We looked at their paper.
And to be fair to the replicators,
the original paper does not say explicitly,
hey, it's just Los Zetas we're focusing on.
The data from Los Zetas is lumped in
with several other new cartels.
So if the paper authors meant to study
the behavior of just Los Zetas,
that was never quite spelled out.
Mary, when we first rocked up
to the replication games back in May,
I think we were both excited at the idea
that we might watch some junior economists
uncover some major problem
with a published paper in real time.
But Abel had a different take
when we asked him about the problems
that the teams there had uncovered.
Like the team, for example,
that had found issues in the government trust paper.
That seems like success.
But success, it depends how you define success.
Well, the process working as it's supposed to.
I mean, in a world in which science works,
I think this should have been picked up
before it's published, cited and disseminated.
So I don't think it's a success.
That's fair.
These papers they're replicating have been published,
meaning they got past journal referees,
professional economists who are supposed
to be gatekeeping the quality of what they publish.
Some of the top journals do check
that the code runs, they press play.
But in the government trust case,
the journal referees apparently
didn't catch that numbers were missing.
That when the paper said,
oh, the documentation is in the replication package,
it was pointing to nothing.
The journal declined to comment.
Though they said they have a robust process
to investigate concerns.
To me, this is a failure of the system,
which is fine.
There's always going to be failures.
I just think that the rate of failures
is higher than what a lot of people think.
Yeah.
And it shouldn't happen that often.
In every replication game so far,
they have found something.
Though not yet any career-ending fraud.
It's more like major data or coding errors
or robustness fails.
So the broader system is still broken,
even after putting on more than 50 games
and replicating about 300 papers.
Still, there are signs that the games
are having an effect.
Several replication gamers told us
their experience here will change how they do their research
because they know that their papers
too might someday end up under Abel's spotlight.
Abel says, the more games he can put on,
the more the rest of the academic world
will start to shift.
Because the evidence shows that people
don't actually change their behavior
based on the severity of the potential punishment,
like losing their job or public shaming or whatever.
They change behavior based on the odds of enforcement,
the odds of actually getting caught.
Just the idea that someone might walk through
their apartment one day,
that's enough of a threat to keep it clean.
Hey listeners, what are you doing
on the evening of Monday, April 6th?
Are you free?
Because if you are, I think you should come
to the 92nd Street Y to hang out with me
and some of my friends.
It is the debut stop on our 12-city book tour
to celebrate the publication
of our first ever book, Planet Money,
a guide to the economic forces that shape your life.
Every stop on this tour will be unique with different hosts and guests, and if you get
a ticket, you can get a tour-exclusive tote bag with your purchase while supplies last.
So at the 92nd Street Y, on Monday, April 6th, it'll be me, Amanda Aronczyk, Darian Woods,
book author Alex Meassie, and the economist Emily Oster, who is most famous, I think,
for letting pregnant women know that they can actually drink coffee.
So please come and bring your very best economic questions for us.
We can't wait to hang out.
Find the show nearest to you at the link in show notes or go to PlanetMoneyBook.com.
And thank you.
If you want to hear more about the replication crisis, we've done a few episodes about it
and the efforts to fix it, we'll link to those in the show notes.
If you want to support our work, you can donate at npr.org slash donate.
And thank you.
This episode was produced by Emma Peasley and James Sneed with help from Willa Rubin.
It was edited by Jess Jiang, fact-checked by Sam Yellowhorse Kessler, and engineered by
Ko Takasugi-Czernowin.
Alex Goldmark is our executive producer.
I'm Alexi Horowitz-Ghazi.
And I'm Mary Childs.
This is NPR.
Thanks for listening.