Why can’t I re-take USMLE Step 1?

Every few weeks, someone contacts me to ask some variation of this question:

Dear Sheriff,

I want to match in a competitive specialty – and my Step 1 score may be the difference between achieving my dream or having a second choice career for the rest of my life. That’s a lot of pressure for one test! Why can’t students re-take the USMLE to improve their scores, just like we could for the SAT or MCAT?

Sincerely,

A student

–

Good question.

So today, let’s mine this topic, with the goal of answering the question once and for all: why doesn’t the USMLE allow re-takes?

The case for re-takes

On the surface, the case for allowing re-takes is compelling. If we’re going to use USMLE scores to inform career-altering decision-making, why settle for a single point estimate of a candidate’s ability?

After all, USMLE scores aren’t all that precise – the 95% confidence interval for an individual candidate’s score spans 24 points. Allowing re-takes would allow program directors to get a more accurate assessment of a student’s abilities, and students wouldn’t have to worry so much about the consequences of having a bad test day. It’s a win-win, right?
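To make the precision argument concrete, here is a rough sketch in Python. It assumes the 24-point interval is symmetric and reflects normally distributed measurement error, and that the errors on repeat attempts are independent. Neither assumption comes from the NBME, and the function names are purely illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def standard_error_from_ci(ci_width, z=Z_95):
    """Back out the standard error implied by a symmetric CI of this total width."""
    return ci_width / (2 * z)


def ci_width_for_average(ci_width, attempts):
    """Width of the 95% CI around the mean of several independent attempts."""
    return ci_width / math.sqrt(attempts)


sem = standard_error_from_ci(24)
print(f"Implied standard error: {sem:.1f} points")
for n in (1, 2, 3):
    width = ci_width_for_average(24, n)
    print(f"{n} attempt(s): 95% CI spans about {width:.1f} points")
```

Under these assumptions, averaging two attempts would shrink the interval from 24 points to roughly 17, which is the statistical appeal of re-takes.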

Plus, we allow re-takes for every other standardized test that we use for admissions. If your SAT or ACT score isn’t high enough to get into the college of your dreams, it’s not the end of the world. Just try again. Same thing goes for the MCAT, the LSAT, the GMAT, the GRE, etc. Only the USMLE is a one-and-done deal.

But there’s a fundamental difference between the USMLE and those other tests – and this difference makes it extremely unlikely that the National Board of Medical Examiners (NBME) would ever allow multiple attempts for students who pass.

Standardized tests: norm-referenced vs. criterion-referenced

The admissions tests noted above are all norm-referenced tests. They are designed to demonstrate how test-takers perform relative to one another.

Take, for instance, the MCAT. You don’t “pass” the MCAT. You just get a numeric score. The score shows where you fall within the distribution of all MCAT test-takers, as shown below.

For the 2018-2019 admissions season, the average MCAT score for matriculating medical students was 506. (Back when I took the MCAT, the scale was different, and a 506 would have been a 29.)

So what does it mean to score a 506 on the MCAT?

It means you scored in the 73rd percentile of all test-takers. Nothing more, nothing less. See, the score on a norm-referenced test doesn’t tell you anything in an absolute sense.

If a student scores a 506, how much science do they know? More than someone who scored a 496, and less than someone who scored 516. But beyond that, who knows? If we want to know whether someone knows enough science to succeed in medical school, we need to design the test somewhat differently. We need a criterion-referenced test.

A criterion-referenced test evaluates how the test-taker’s performance compares to a predetermined standard.

Take, for instance, the test you took to get your driver’s license. The test doesn’t tell you how you compare to other drivers – it makes an assessment of whether you have enough knowledge and proficiency to operate a motor vehicle with reasonable safety.

To pass a criterion-referenced test, the test-taker must exceed the objective standard. If 100 teenagers took the driver’s license exam, and they were all competent drivers, then all 100 would pass. If they were all incompetent, then they would all fail. One person’s exam score has nothing to do with another’s.

On the other hand, if we had a norm-referenced driver’s education exam, only 1 driver could be in the 99th percentile – even if that driver was incompetent, as in the figure below.
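The difference between the two scoring approaches can be sketched in a few lines of Python. The scores and the competence standard here are made up for illustration, and the function names are hypothetical. This is not how any real licensing exam is scored.

```python
def percentile_rank(score, all_scores):
    """Norm-referenced: percent of test-takers who scored strictly below this score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


def passes(score, standard):
    """Criterion-referenced: compare the score to a fixed standard, ignoring everyone else."""
    return score >= standard


# Hypothetical data: 100 teenagers who all drive badly (scores 1 through 100),
# graded against a made-up competence standard of 150 points.
scores = list(range(1, 101))
STANDARD = 150

best = max(scores)
print(percentile_rank(best, scores))             # 99.0: top of the curve
print(passes(best, STANDARD))                    # False: still not a safe driver
print(sum(passes(s, STANDARD) for s in scores))  # 0: nobody passes
```

The best of the bad drivers lands in the 99th percentile on the norm-referenced scale, yet fails the criterion-referenced check along with everyone else.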

The USMLE is a criterion-referenced test

Believe it or not, Step 1 was never intended to be the Residency Aptitude Test (RAT). The primary purpose of the test is to assist state medical boards in making a determination regarding a candidate’s fitness for licensure. And to do that, the NBME uses an extensive standard-setting process to establish the minimum threshold of knowledge that a physician should possess. (In the past, I’ve been very critical of some of the data that go into that standard-setting process – but for the moment, let’s put those objections aside.)

In other words, unlike the norm-referenced MCAT, the USMLE is a criterion-referenced test. By passing the USMLE, a test-taker demonstrates that he or she possesses the minimum standard of knowledge needed for medical practice. Therefore, a candidate who fails Step 1 is required to re-take the exam (assuming he or she still wishes to engage in the lawful practice of medicine).

However, if candidates who had already passed were allowed to re-take the exam, it could lead to some sticky situations.

The problem with re-takes

Imagine a student who wants to enter a competitive specialty. She takes USMLE Step 1 and passes with a 215.

This is bad news. She knows that her score will result in her application being screened out at her dream programs, and she’s heartbroken.

But let’s say that we let her take Step 1 again.

So she studies hard, memorizes some facts with peripheral relevance to real medicine, and re-takes the test.

And this time, she scores a 239.

The program director at her dream program is impressed with her fund of knowledge and her perseverance, and invites her to interview. She rocks the interview, matches at her top program, and everyone lives happily ever after.

So what’s the problem?

Well…

Suppose another student with a 215 re-takes the test to improve his score. Only this time, he fails with a 191.

What should the medical board do with him? Does he get a license, or not?

See, if a candidate fails their initial exam, but then passes the re-take, medical boards can tell themselves, “Look, now he’s competent! He may not have had enough knowledge to safely practice medicine before, but now he does. We can safely grant him a license.”

But if a candidate passes the initial exam but fails the re-take, what do you do? Require another re-take? What if they fail that? It could become a very messy situation. And this is why the NBME is highly unlikely to ever allow re-takes for students who pass.

How to have cake – and eat it, too

The current system is the policy byproduct of the NBME’s desire to have their cake and eat it too.

On one hand, they want to enjoy the financial benefits of the USMLE being used as if it were a norm-referenced admissions test. (Did I mention that they generate more excess revenue by selling cold test questions than they do from the USMLE itself?)

On the other hand, they’re obligated to protect the primary purpose of their criterion-referenced exam, which is to assist in making licensure decisions. Because if that gets jeopardized, so does their market monopoly.

(For whatever it’s worth, I’m also opposed to re-takes. I’m very sympathetic to students who have a bad exam day – but allowing re-takes will only fan the flames of Step 1 Mania at a time when the house of medical education is already burning down.)

How to make things better

Make the USMLE pass/fail. If we need a test of basic science to inform binary decisions on licensure, fine – but report the results in a manner consistent with that mission.

And if you really need a test to screen or evaluate candidates for residency, fine. But design a test that actually measures the competencies you seek, or that identifies the deficiencies you’re trying to find. (Hint: a one-size-fits-all MCQ test of basic science trivia reported as a three-digit number doesn’t do either of these things very well.)

Until then, students, we’re stuck with the USMLE being a one-shot deal. So study up, and don’t walk underneath any ladders or let any black cats cross your path, because a bad test day will stick with you.