In discussing whether USMLE Step 1 results should be reported pass/fail, I’ve gotten to take part in some lively debates. I’ve heard from many who – for one reason or another – want to maintain the status quo. The “save the scored Step 1” camp is a varied coalition.
But you know what’s weird?
I never hear anyone argue that the USMLE is a great test, or that residency selection is working as well as it could be. In this debate, there are lots of voices – but the one missing is the unapologetic, full-throated defender of the status quo.
Instead, most defenses of a scored USMLE take the following form:
Step 1 isn’t perfect, but…
Look, #USMLEPassFail makes for some strange bedfellows, too. There are people who agree with my conclusion on test reporting policy, but for reasons that I don’t agree with. And there are also those who ultimately think a scored test is better, but whose priorities or logic I sympathize with.
So if you’re new to the #USMLEPassFail debate, I’d like to introduce you to the cast of characters you may meet along the way.
Warning: If you support a scored Step 1, some of the descriptions here may seem a little… personal. They’re not intended to be. In a first draft, I included screenshots of Tweets and DMs with sample arguments. I deleted those, because it felt too ad hominem. The point here, as usual, is to stimulate critical thinking about our medical education and residency selection systems.
The Oligarch
Sample Actual Argument: “If students reduce time and effort devoted to preparing for Step 1, they may indeed devote attention to other activities that will prepare them to be good physicians. . . However, if students were to devote more time to activities that make them less prepared to provide quality care, such as binge-watching the most recent Netflix series or compulsively updating their Instagram account, this could negatively impact residency performance and ultimately patient safety.”
Sympathy Rating (on a scale of 1-10): 1
The Oligarch is easily recognized by his position as a highly compensated executive in one of the organizations with a vested financial interest in maintaining a scored USMLE. The Oligarch prefers to avoid overt involvement in the #USMLEPassFail debate. Instead, messaging is subtle and targeted at accentuating the anxieties of the other characters enumerated below. While others argue, the Oligarch listens carefully, working to engage others to fight on his behalf, softly fanning the flames of Step 1 Mania.
Of all the characters in this drama, I have the least sympathy for the Oligarchs. Yet at the same time, I really have no counterargument to their position. They are completely informed on the issues. They’re just acting in their own interest.
The King’s Court
Sample Argument: “Step 1 isn’t perfect, but we’re making it better. I write questions and volunteer for the NBME’s such-and-such committee. Instead of just complaining about the test, you should get involved, too.”
Sympathy Rating: 6
The King’s Court includes folks who work with the NBME, writing questions, serving on committees, attending meetings, etc. Although their work products are monetized, the King’s Court are almost always unpaid. They volunteer because they care about medical education, and they want to make things better.
My only issue with The King’s Court is that they drink a little bit too much of the organizational Kool-Aid. They believe in the holy mission of the organization, but often struggle to see the downstream effects of the decisions they help make. Given their own sincere motives, many members of The King’s Court just can’t understand how financial conflict of interest could possibly impact the organization’s actions.
My sympathy rating would be higher except for the fact that some members of The King’s Court are in fact Hopeful Oligarchs. They’re less interested in changing the world than they are in being a loyal soldier – so that when the next $500,000+ salaried position opens up at the NBME, they’ll be on the short list to fill it.
The Stormtrooper
Sample Argument: “Step 1 isn’t perfect, but studies show it predicts board passage rates and other important measures of physician quality.”
Sympathy Rating: 8
The Stormtrooper lacks an affiliation with the NBME, but unquestioningly believes their messaging on the issues. In so doing, most Stormtroopers are unaware that they’ve been conscripted as soldiers in the Oligarchs’ proxy war.
I’m sympathetic to the Stormtroopers, because I used to accept the NBME’s logic myself. (I think that’s actually me in the back in the picture above, second from the left.)
Have you argued that we should use Step 1 scores for residency selection because they correlate with “valued measures” and “improved practice,” as claimed by the test’s sponsors? Me too! That is, until I checked their references.
Or how about the oft-repeated line above about how Step 1 scores predict board passage? They do… but probably not in the way you’ve been led to believe.
The Utilitarian
Sample Argument: “Step 1 isn’t perfect – but my residency program receives 100 applications for every position I’m trying to fill. I need something to screen applications.”
Sympathy Rating: 6
The Utilitarian is a program director overwhelmed by Application Fever – the phenomenon of increasing over-application by students that I’ve written (and Tweeted and podcasted) about before. To reduce the mountain of applications received to a manageable molehill, the Utilitarian uses the only universally available numeric metric (which can even be sorted in ERAS!).
I get the Utilitarian’s plight.
Both college and medical school applications are reviewed by full-time admissions officers. Their whole job is to review applications, and they have an entire office of staff to help them. But residency applications are reviewed by doctors – who have to balance patient care, teaching, the actual day-to-day functioning of the residency program, interminable GME red tape, and residency recruitment.
But… if the problem is that program directors are being buried in applications, is their time best spent defending a non-evidence-based metric with harmful downstream effects on medical education – or demanding application reform?
The Darwinist
Sample Argument: “Step 1 isn’t perfect, but nobody forces PDs to use it. The fact that they have chosen to use Step 1 scores instead of anything else proves it’s the best available metric.”
Sympathy Rating: 3
The Darwinist applies evolutionary concepts to residency selection metrics. In the ecosystem of residency selection, the Step 1 score has survived as the fittest metric. Its existence, therefore, proves its value.
But does it?
Have you ever wondered why we use scores from USMLE Step 1 instead of those from Step 2 in residency selection? After all, at least Step 2 tests material that is more clinically relevant than the basic science factoids on Step 1.
The answer? Convenience.
Historically, USMLE Step 1 was taken between the second and third years of medical school – meaning that every single student had taken Step 1 by the time that application season rolled around. On the other hand, Step 2 was taken after the completion of the third year clerkships – so many students did not have scores available at the time they submitted residency applications early in their fourth year.
Following the introduction of the Electronic Residency Application Service (ERAS) in 1996, it became possible for students to apply to a lot of residency programs with just the click of a mouse. Soon, program directors wanted something – anything – to thin the pile of applications in front of them, and the only piece of numeric data present in every single candidate’s file was the Step 1 score.
So let’s not kid ourselves. Step 1 didn’t emerge victorious as the fittest metric from some grand evolutionary competition. It’s a tool of convenience. Its hallowed place in medical education comes from the power we’ve given it – not because of the test’s intrinsic value.
The Grumpy Old Man
Sample Argument: “Step 1 may not be perfect, but why do millennials whine about it so much? Back in the day when we took the NBME Part 1 exam, we all just got on with it. Sure, it was hard, but you didn’t hear us complaining.”
Sympathy Rating: 5
The problem here is really just ignorance. Unless you’re a student, a very recent graduate, or involved in medical education, you just honestly don’t know how bad Step 1 Mania has gotten. The Grumpy Old Man doesn’t remember things being so bad – because they didn’t use to be.
Fun fact: when USMLE Step 1 started in 1992, a score of 200 would have put you right at the mean. Totally respectable score.
Know what a 200 gets you today? A score in the 9th percentile that gets you ruled out of the 64% of residency programs across all specialties that use numeric cutoffs for screening. (Remember, Step 1 is a criterion-referenced test. There’s no “recentering” like on the old SAT – so higher scores reflect better performance.)
Problem is, as scores rise higher and higher, students have to spend more and more time to learn material that’s less and less clinically relevant. It’s an arms race with no natural end that systematically devalues everything in medical education that’s not explicitly tested on the exam. (In fact, I have a hunch that some of The Grumpy Old Man’s beef with millennials would be mitigated if medical schools graduated students who were more junior clinician and less MCQ Assassin.)
The Basic Science Believer
Sample Argument: “Step 1 isn’t perfect – but science is the language of medicine! Without an in-depth understanding of basic science, physicians will be but mere technicians. How will MDs be any different than NPs or PAs?”
Sympathy Rating: 3
The best counter to the Basic Science Believer is fact.
Show them some of the real questions being asked on USMLE Step 1. Then ask them to describe a situation in which having said facts memorized would enhance the medical care of a real human being. I’m all for science – but if the science we’re teaching is poorly retained and minimally relevant to providing patient care, does the fact that we saw it on a test once really distinguish us from NP/PAs, anyhow?
Look, basic science is useful – when it’s taught to the right person at the right time. I think a pass/fail Step 1 is adequate to ensure that students have a scientific foundation upon which to build. After that, more detailed science should be taught in a specialty-specific way.
For example, once I’d entered my nephrology fellowship, I would have loved the opportunity to audit courses covering renal physiology, histology, pathology, and pharmacology. Reviewing the scientific underpinnings of kidney function in health and disease would have given me a deeper understanding of my field – and probably made me a better doctor.
Would the same review be useful to someone pursuing a career in OB/GYN? No. But some embryology, reproductive endocrinology, or high-level pelvic anatomy might.
A pass/fail Step 1 isn’t an attack on science. It’s a way to be more deliberate about what we teach, instead of force-feeding students facts that will be long forgotten by the time they could have been even remotely useful.
The MCQ Assassin
Sample Argument: (posted under a pseudonym on Reddit or Student Doctor Network) “Did you see this $*%! about Step 1 becoming pass/fail? These f*%& professors are just mad because we don’t go to class.”
Sympathy Rating: 8
MCQ Assassins are medical students who go all in on Step 1.
Assassins prepare for the test with single-minded determination and ruthless efficiency. Many spend more time in UWorld than the real world and know Dustyn Williams and Husain Sattar better than their own families. A scored USMLE will benefit them, and they don’t want it to go away until it has.
Believe it or not, I’m actually pretty sympathetic to the MCQ Assassins. I respect their work ethic. But for the sake of themselves and society, I just wish they channeled it into something else.
See, because of their drive and determination, most MCQ Assassins would succeed under any other system. So can we not use a “competition” that actually results in the competitors gaining useful skills instead of memorized minutiae?
And by the way, I do not support #USMLEPassFail because I’m trying to force students to come to lectures. Fifty years ago, the most efficient way to learn medical knowledge was in the lecture hall or from a dusty textbook. Times have changed. Students should use the best available resources to study – and these days, almost all of those exist outside the walls of the medical school.
However, medical school faculty have expertise and skills to teach that go beyond what’s in First Aid or Pathoma. Instead of trying to re-create content that’s better done elsewhere, faculty need to focus on areas where they can provide meaningful real world skills and education. A pass/fail test simply gives students the mental space to engage in that process.
The Non-Impostor
Sample Argument: “Step 1 isn’t perfect, but I went to a low-tier medical school – and the only way I got big-name programs to look at me was because I killed Step 1.”
Sympathy Rating: 7
The Non-Impostor is the second stage in the intellectual life cycle for many MCQ Assassins. The Non-Impostor worked hard and scored well on Step 1, and believes that their score was the only thing that opened the door to their residency spot or career path.
Many residents feel this way – and not all of them are Non-Impostors. The key personality trait of the Non-Impostor is that their high score is tied to their self-esteem – so they perceive criticism of the test as an attack on their own intelligence or work ethic. Some Non-Impostors defend Step 1 with an emotional vigor that seems to betray a secret worry that if we do away with Step 1 scores, they’ll be outed as a phony or thought to be undeserving of their position.
What I want to say to the Non-Impostor is this:
Maybe your Step 1 score opened a door for you. But the key ingredient wasn’t Step 1 – it was you.
Look, you did what you had to do. We told you to memorize basic science, and memorize it you did. We told you that Step 1 scores were the most important thing in your application, so you hit Step 1 with everything you had. You succeeded in the gauntlet that we put you through, and you should rightly be proud of the work you put in and the knowledge that you gained in doing so.
Making Step 1 pass/fail doesn’t take away your accomplishment. But it does make it possible to create a system in which future versions of you can distinguish themselves in challenges that will leave them better prepared for their future.
The Fish-Grabber
Sample Argument: “You mock Step 1 questions, but Step 1 scores aren’t useful because of the material! The scores are useful because the test is hard. It shows us which students have the perseverance, toughness, and grit to make it in medicine.”
Sympathy Rating: 2
The Fish-Grabber is what some Non-Impostors become once they’re in practice. They finally acknowledge that the material on Step 1 was next-to-worthless for preparing them for their real jobs, so they use a different argument to justify continued use of scores.
I call them Fish-Grabbers after the teachers in The Saber-Tooth Curriculum who insisted upon teaching students how to snatch fish out of streams with their bare hands – even years after the last catchable fish had been caught.
“But damn it,” exploded one of the radicals, “how can any person with good sense be interested in such useless activities? What is the point in trying to catch fish with the bare hands when it just can’t be done any more?”
“Don’t be foolish,” said the wise old men, smiling most kindly smiles. “We don’t teach fish-grabbing to grab fish; we teach it to develop a generalized agility which can never be developed by mere training.”
I have little sympathy for this argument, because it acts as if the content of the test doesn’t matter. Really?
How about this: let’s scrap Step 1 and instead ask medical students to memorize digits of pi.
Better yet, let’s see who can accurately count the most grains of sand in an hourglass. We could even make them do it in a dark room with 80% relative humidity and pumped-in Muzak.
If the test’s content doesn’t matter, then why not use the most boring, pointless task you can come up with? Then you’d really find out who’s ready for a dermatology residency!
Medical students will rise to meet whatever standard we set. As educators, we owe it to them to ensure that those standards are meaningful. Let medical students show their worth on tasks that actually leave them or the world better off. Otherwise, we’re just hazing.
The Residency Selection Libertarian
Sample Argument: “Step 1 isn’t perfect, but who are you to tell residency program directors what they’re allowed to use in picking their residents? If they find value in Step 1 scores, then they have the right to use them.”
Sympathy Rating: 3
Look, I’m all for liberty. And under normal circumstances, if a residency program director chose to use factual recall of the various RNA polymerase isoforms to determine entry to her residency program, I’d leave her to it.
Except we don’t live under normal circumstances – we live in the era of Step 1 Mania.
To quote legal scholar Zechariah Chafee, Jr., “Your right to swing your arms ends just where the other man’s nose begins.”
Remember, although PDs are not compelled to use Step 1 scores, medical students are compelled to take the test. So what about their rights? Do they have a right to engage in educational content that is not explicitly tested on Step 1 without suffering a competitive disadvantage?
Because as Step 1 scores rise higher and higher, that right is increasingly infringed upon, as students are required to memorize more and more material of less and less clinical utility just to ensure that their application doesn’t get thrown in the trash because of a three-digit score below an arbitrary cut point.
Arguing that PDs have an inalienable right to a compulsory assessment violates the very principle of liberty that this argument at first appears to espouse.
For the PDs who find value in a score from an MCQ test of basic science, I say: go make your own.
Let students engage in it or not based upon their desire to gain entry to your program. That’s the real libertarian, free market approach. (And guess what? If you do, I’ll bet you’ll solve your “I’ve got too many applications to read so I have to have a screening metric” problem, also.)
The Smarty Pants
Sample Argument: “Step 1 may not be perfect, but just yesterday I saw a patient with a cherry-red spot on the macula and correctly diagnosed him with Tay-Sachs disease!”
Sympathy Rating: 2
The Smarty Pants has anecdotes to share.
Tales of zebras caught by their Kayser-Fleischer rings or Birbeck granules; of lives saved through the identification of eponymous signs and recognition of triads. And it’s all because of Step 1!
The flaw in Smarty Pants’ argument is that it supposes that these clinically useful tidbits could only have been learned through Step 1 preparation. More importantly, it overlooks all the diagnoses that didn’t get made because they focused so much energy on memorizing facts with minimal clinical utility.
We idolize figures like Gregory House for their ability to blurt out a correct diagnosis with minimal information. It makes for good TV – and impresses people on rounds – but is it really the model for training our future physicians?
I say no. Should we encourage our students to look for currant jelly sputum, Charcot-Leyden crystals, and ground glass opacities in every patient who comes in with a cough?
The Scaredy Cat
Sample Argument: “Using Step 1 scores for residency selection isn’t perfect – but it’s better than the alternative! Without Step 1 scores, residency spots will be assigned by pass/fail preclinical grades, next-to-worthless letters of recommendation, and school prestige!”
Sympathy Rating: 5
More than anything else, the Scaredy Cat is a victim of NBME fearmongering.
See, if you work for a corporation that has a vested financial interest in maintaining a scored USMLE, one way to maintain support for your product is to make people fear the alternative.
Here’s an example of such messaging from the presidents of the NBME and FSMB:
Elimination of numeric Step 1 scores may also lead to the development of inferior, less psychometrically sound, and expensive alternative assessment in the information vacuum that would be created, with potentially increased costs for students and little guarantee of the reliability and validity of the substituted assessment.
Imagine that the NBME was selling peanut butter and jelly sandwiches instead of licensing examinations.
WAITER: Good evening, sir. Welcome to PB&J’s. For your dinner, would you like a peanut butter and jelly sandwich?
PATRON: Uh, not really. Is that the best thing you’ve got?
WAITER: But sir, this is PB&J’s. All we sell are PB&J’s.
PATRON: (to the rest of the party) What do you all think? Should we go to a different restaurant?
WAITER: You could go to another restaurant. But what if they serve you something worse? For all you know, they may bring you a half-eaten burrito covered in hair and broken glass!
PATRON: Uhhh… we’ll take the PB&J. It won’t be the best meal we’ve ever had, but it’s better than the alternative!
Do you see the false dichotomy? According to the NBME, these are our only choices:
1. Continue to use Step 1 for residency selection.
2. Enter a post-Step 1 apocalyptic wasteland, where the only available metrics are biased, unfair, and next-to-meaningless.
But is that really our choice set?
No, it’s not.
The real choice set is:
1. Continue to use Step 1 for residency selection.
2. Use the best selection metrics that we as a society can come up with.
Step 1 was never intended to function as the Residency Aptitude Test, and probably does so poorly. Why are we so sure that Step 1 is the best we can possibly do?
The Impossible Standardist
Sample Argument: “Step 1 may not be perfect, but it’s the best thing we’ve got. And until you have proof that another system works better for residency selection, we need to stick with Step 1.”
Sympathy Rating: 2
The Impossible Standardists float an argument cloaked in the trappings of evidence-based medicine. It has particular appeal to physicians for this reason. After all, the standard of care treatment for a given disease may not be perfect – but we don’t deviate from it until we have evidence that the new treatment is better.
Yet this argument has two important flaws.
First, it ignores a crucial aspect of the market space in which the USMLE operates.
See, most of medicine is a free market. You want to make a new treatment for a particular disease? Go for it. Better yet, there is a system in place for you to prove the efficacy of your idea to a disinterested body so that it can be used in practice.
Medical licensure testing, however, is a monopoly. For allopathic physicians in the U.S., taking the USMLE is the only way to get licensed. And thanks to the downstream effects of Step 1 Mania, it’s now the only objective measure that programs have.
Unfortunately, to prove that another system could work, we’d have to have another system to study. It’s naive at best – and frankly deceptive at worst – to demand proof that something is better when the NBME has the marketplace in a chokehold.
Second, the Impossible Standardists assume that there is sufficient proof that the Step 1-based system is functioning well. But what evidence is there that this is the case?
Follow my Twitter feed, and you’ll hear from many people who feel that Step 1 scores benefit them (see The Non-Impostors, above). But you’ll also hear from many others who feel that the current system doesn’t recognize their talents. So who’s right? How can we transcend these anecdotes and prove that one system works better than another?
The honest truth is, I don’t know. And I’ve been thinking about it a lot.
But I do know this:
Residency selection is a competitive process. And in any competition, some people will win; some will lose. Those who win will likely insist the game is fair; those who lose may cry foul.
But unlike some others in this debate, I didn’t get into this because of self-interest. I got into it because I know we are not doing as good a job educating our students as we could be. Whatever basic science factoids students gain in Step 1 Mania, they’re more than offset by a loss in real world skills, clinical problem solving, and critical reasoning. Going all in on a test of memorized basic science is not the best way to train doctors in 2019.
So on second thought – maybe I am arguing out of self-interest. Because I want the doctors who will care for me in the future to be as well-trained as they can possibly be.
The American Dreamer
Sample Argument: “Step 1 may not be perfect – but as an international medical graduate (IMG), the only way to get programs to look at me is by doing well on the test.”
Sympathy Rating: 10
Of all the arguments, this is the one to which I am most sympathetic.
The American Dreamer doesn’t have to be an IMG. It can be anyone who believes in the “American Dream” – the idea that with talent and hard work, anyone can do anything; that where you end up doesn’t have to depend on where you start. More than anything else, American Dreamers believe in the idea of an American meritocracy.
I’m sympathetic to this argument because I want the most deserving students to succeed. I just think we’re placing our faith in the wrong instrument.
Look, I grew up in a rural area. Most of the doctors who cared for me and my family were IMGs. And they were good Doctors, with a capital ‘D’. They worked hard; put their patients’ needs above their own; and showed compassion and caring that transcended cultural barriers. In no small part, they were the role models that inspired me to pursue a career in medicine myself.
But tell me – of all those admirable traits, which were measured by their Step 1 exam score?
I do not believe that every person with the capability to be a great doctor went to medical school in the United States. As Americans, we’re in an incredibly privileged position that we get to pick and choose among some of the best and brightest from the rest of the world to join us.
But why do we think that Step 1 is the best way to identify the best and brightest?
I appreciate the challenges that IMGs face, and that PDs face when evaluating them. But folks – we can do better than Step 1. If we need a test, why not use Step 2 CK? Or what about allowing IMGs the option of having their Step 2 CS scored? Either of these tests better assesses skills that PDs care about. (And I think we can do even better than that.)
Step 1 is not the Hogwarts Sorting Hat – even though that’s what we’ve asked it to be. It’s a multiple choice test of basic science. Nothing more, nothing less. It’s time to stop making apologies for it being ‘the best we can do’ – and do better.