On pi and Step 1 Mania

What could an irrational number have to do with residency selection?

Lately, I’ve been talking a lot about USMLE. I’ve argued that the current culture of Step 1 Mania is destroying medical education, and I’ve asserted that the test should be made pass/fail.

Not everyone agrees with that. I’ve addressed some of the objections previously, but recently a new one has come up.

A couple of people have reached out to me and made the case that the fact that we have chosen to use Step 1 scores in residency selection proves that they are intrinsically valuable.

The logic goes something like this:

  • Medical students are not dumb.
  • Residency program directors are not dumb.
  • Residency selection is a competitive environment, where the best candidates and programs succeed.
  • If using Step 1 scores did not provide a benefit for candidates or programs, the idea would have died out, and we’d be using some other metric instead.
  • Therefore, the fact that Step 1 scores have emerged as the champion in this residency selection “survival of the fittest” competition means that it is empirically true that Step 1 scores are valuable and beneficial.

Hmmm….

This is a curious argument. In previous posts, I’ve tried to make my case using data and evidence.

But today, let’s do something a little different. Let’s do a little thought experiment.

Suppose that…

…we lived in a world where, in order to ensure the public safety and maintain standards within the profession, physicians were required to obtain a license.

And in this world, being a physician is hard. There’s a lot of information to learn, and there’s not always time to look it up.

“A good doctor must have a good memory,” someone says. “We could use a test of memory as one requirement for licensure.”

Heads nod in agreement.

“The test must be objective,” says someone else, “so that we can ensure fairness among applicants with different cultural or educational backgrounds.”

Heads nod again.

So they devise a test: memorizing digits of pi.

__

To administer the test, the National Board of Pi Examiners (NBPE) is formed. Medical students must pay a fee – let’s say, $630 – for the opportunity to stand before the board and recite as many consecutive digits of pi as possible.

To determine who passes the test, the Board sets a minimum passing score. To be licensed, a candidate must memorize at least – oh, I don’t know, maybe 194 digits – in order to receive a passing score.

However, let’s suppose that NBPE makes one very important decision regarding the reporting of their test results. Instead of simply reporting the results of their test as pass/fail, the NBPE chooses to report the total number of digits of pi that each examinee recites.

“Students will benefit from having scores reported,” says an NBPE executive. “Even a mediocre student is entitled to know how far he escaped the noose.”

Heads nod, and henceforth, NBPE score reports include the examinee’s score.

__

So far, is there anything wrong with this?

Probably not. I mean, maybe memorizing digits of pi isn’t the best test for physician licensure – but the test is objective and the standard isn’t too onerous. And the idea of giving students their scores seems fair and rational, right?

Well, I forgot to tell you one other thing about our hypothetical world…

Let’s also suppose that…

…in this completely hypothetical world, the number of medical school graduates and residency positions were not perfectly aligned. Therefore, graduating medical students were forced to compete with each other for a scarce commodity.

And finally, let’s suppose that in this made-up world that is totally unlike our own, overworked residency program directors want tools to separate good applicants from great, and great applicants from exceptional ones.

“Hey, what about using that NBPE test?” says a program director. “You know, the one where the students memorize digits of pi.”

“Well… it is objective,” says another.

“And numerically precise!” says a third. “It will be much easier to interpret than grades or letters of recommendation.”

And thus it comes to pass that the NBPE exam is used for residency selection.

Now, close your eyes and imagine…

What do you think will be the natural outcome of this system?

Here’s what I think will happen.

Residency programs will assure students that the number of digits of pi memorized is not the only factor that matters in residency selection. “It is but one factor among many,” the program directors will say.

Yet because it provides a relative advantage – however small – in an important competition, the most driven and dedicated students will choose to put in extra work to memorize as many digits of pi as possible.

A few residency programs – already overwhelmed by increasing applications – will decide to use a strict cutoff to filter applications and offer interviews.

And even at programs that do not use a strict cutoff, students will notice that applicants who memorized more digits of pi seem to have an advantage in gaining access to the most coveted residency positions.

NBPE researchers will identify a loose correlation between memorizing digits of pi and other similar tests of memorization such as the In-Training Exam of Memorization (ITEM) or the Dictionary Recitation Test (DiRT) used by some subspecialty boards for certification. Even though these other tests are essentially measuring the same isolated skill (one which may have little to do with the actual practice of medicine), these data will be widely cited as “validation” of the NBPE’s exam.

Reassured by the aforementioned studies, program directors feel less guilty about using the pi exam – and begin to lean more heavily on its results. “It may have nothing to do with being a good resident, but our residents must pass the Dictionary Recitation or our program will be shut down,” they note. (They fail to appreciate that the relationship between scores and passage is non-linear.)

From time to time, critics will point out that memorizing digits of pi has no relevance to the practice of medicine. “Silly critics,” the NBPE executives will reply. “We don’t learn digits of pi for pi’s sake – we learn them to learn focus and stamina and willpower.” (Few critics counter that these traits might also be learned through the study of relevant content.)

Because of its growing importance in residency selection, medical students will memorize more and more digits of pi with each passing application cycle.

In response to increasing student performance, the NBPE will progressively increase the minimum passing score in order to maintain a constant failure rate. “If everyone passes, our exam will appear unnecessary!” they cry with chagrin.

As students memorize more and more digits of pi each year, they must put increasingly more time into pi memorization to gain a relative advantage over their peers – even though there is no evidence that memorizing lots and lots of digits of pi makes one a better doctor than memorizing just 194.

For-profit companies will enter the marketplace, selling a variety of test prep items designed to assist in the memorization of pi. These will not be cheap – but because of the outsized role of the NBPE exam, even deeply indebted students will willingly pay for these resources (even if it means financing them at interest).

Upon observing the success of the for-profit entities, the non-profit NBPE will start selling its own practice exams to students. Eventually, the NBPE will derive more profit from sales of test-prep materials than from the test itself.

For displaying such fine business acumen and achieving consistent corporate growth, NBPE executives will earn seven-figure salaries and memberships to social clubs offering $45 tequila shots and dancing at eye-level with the city skyline.

The preclinical medical school curriculum will be compressed, since students need more and more “protected time” to study for the NBPE exam. Eventually, the average student will spend 11 hours per day over 35 days in a protected period devoted to dedicated pi memorization at the end of their preclinical classes. (They do not get a tuition refund during this time, however.)

Noting that their students are stressed and overworked by the competing demands of coursework and preparation for the NBPE exam, well-meaning deans and faculty change their schools’ curricula to pass/fail.

With fewer demands imposed by their classes, medical students use the extra time to memorize digits of pi. Prep time and mean scores rise even higher, though after a couple of years students are just as stressed and overworked as they were before.

Without grades to evaluate candidates, residency programs become even more dependent on the only objective criterion they have left: the NBPE exam.

As these cycles repeat, year after year, students will cry out that they are suffocating under the increasing pressure of pi memorization. Some students will write a well-articulated commentary on the “pi climate” in preclinical medical education, and suggest that NBPE exam results should be reported as pass/fail.

The financially-interested NBPE will defend the status quo, perhaps even going so far as to warn of the possibility that students unburdened with the NBPE exam might spend too much time binge-watching Netflix or incessantly updating their Instagram accounts.

Does any of this sound familiar?

It should. In this completely hypothetical world, things will play out exactly as they have in the real world with the USMLE.

Think I’m wrong? Tell me which of the things above are not a natural consequence of the system we created.

USMLE Step 1 does not dominate preclinical medical education because of the intrinsic value of the exam. Residency selection metrics are not finches in the Galapagos – they’re tools used by organisms with their own survival interests to worry about. And in the ecosystem we’ve created, Step 1 Mania is the expected byproduct after considering the real incentives and payoffs to those involved.

__

Now, if you found this thought experiment off-putting, it’s probably because the example I used is so far-fetched. Obviously, memorizing more and more digits of pi has absolutely nothing to do with practicing competently as a physician. Right?

Well, let me ask you this: what data do we have that the USMLE functions any better?

In the past, I pulled the references that NBME executives cited to support their claim that Step 1 scores predict meaningful outcomes. Those data were shaky at best.

Notice that I said “meaningful outcomes.” By that I mean “something that a patient cares about” or “anything a doctor does other than take another standardized MCQ test” – because practicing medicine isn’t a multiple choice test. This was best articulated here:

“Physicians do not report for work in hospitals and clinics and spend the day answering multiple-choice questions. Instead, they obtain patient histories and examine patients; exercise judgment; express compassion and caring; solve problems; communicate with patients, families, and colleagues; educate themselves and others; perform complex clinical procedures; and display professionalism in many ways. Continued reliance on USMLE scores to predict the ability of an individual trainee to perform these diverse tasks is unsupported by data.”

(n.b., the original article; the response from NBME executives; and the subsequent rebuttal, in which the above quote appears, are all worth reading in full.)

Conclusions from this thought experiment

USMLE Step 1 Mania is not driven by the value of the test material. Let’s stop pretending that it is. We need not worry that we’re giving up something of value by honoring the test’s purpose and making it pass/fail. (If you really need a multiple-choice test to triage applications for residency, then at least let it be one designed for and proven to function for that purpose.)

Medical students will achieve the standard that is set for them. Medical students are among the most hard-working, dedicated, and driven people on the planet. Tell them to memorize digits of pi (or Krebs cycle intermediates, or named skull foramina, or the strand sense of viruses, etc.) and they’ll do it. As educators, we owe it to them to set standards that are meaningful.

We need to measure what matters. Residency selection is a competitive environment. But it does not logically follow that the best ideas will naturally rise to the top. In any competitive environment, if you measure something, you’re gonna get it – so you’d better be sure that you’re measuring the things that really matter to you. I’m hard-pressed to believe that a given student learns much that matters to anyone in the final 25-50% of his or her Step 1 prep.

There is no natural end to Step 1 Mania. We are locked in a self-amplifying cycle of more test prep, higher scores, more stress, more revenue for the NBME, rinse, lather, repeat. Things will not naturally get better. If we want things to change, we have to have the courage to change them.