InCUS: The Conversation Continues

I was a college student home for Christmas break when the Y2K panic reached fever pitch.

For the benefit of readers who may be too young to vividly remember, as the turn of the millennium approached, there was growing alarm about what would happen to computer systems when dates in the MM/DD/YY format ticked over from “99” to “00”. Would our computers – or the systems that depended on them – go haywire? Would planes fall out of the sky? Could a computer glitch trigger an accidental nuclear war? Nobody was quite sure what was going to happen, but we were all pretty certain that something would. So long before the ball dropped on New Year’s Eve, I dutifully unplugged all of our family’s computers and uneasily settled in to await whatever was coming.

And then… nothing happened.

I mean, I think there were some lottery machines in Delaware that stopped working for a few hours. But we somehow all struggled through it. And 20 years later, what seemed like it would be such an important event in human history has been relegated to a historical footnote.

Y2K has long ranked as the biggest non-event of my lifetime. That is, of course, until the release of the InCUS Summary Report and Preliminary Recommendations last week.

For years, there has been growing concern about the use of USMLE Step 1 scores in residency selection. And as the controversy built to a head, we were provided with InCUS – the Invitational Conference on USMLE Scoring, a “national conversation and exploration” that would determine the fate of Step 1 scoring.

Instead, we got a 20-page report without any strong recommendations – one that ultimately adds little to what we already knew.

It’s taken me a few days to overcome the ennui and try to break down what happened at InCUS. But I’m finally ready.

So as I did on the eve of the InCUS meeting, let’s break this thing down, Q&A style.

So what happened at InCUS?

Not much.

Forty-five people were invited to meet in Philadelphia with around 20 executives from organizations with a financial interest in the USMLE. They talked about things and participated in a “village fair.” And at the end, five executives from these financially interested entities wrote a paper cataloguing the problems with the current system and pledging further study.

Did they actually make any recommendations?

Sort of.

Okay, okay – yes, they did make four specific recommendations. But I use the word “specific” very loosely.

  1. Reduce the adverse impact of the current overemphasis on USMLE performance in residency screening and selection through consideration of changes such as pass/fail scoring.
  2. Accelerate research on the correlation of USMLE performance to measures of residency performance and clinical practice.
  3. Minimize racial demographic differences in USMLE performance.
  4. Convene a cross-organizational panel to create solutions for challenges in the UME-GME transition.

Did you get that?

In other words, we’re going to consider changes like pass/fail grading; try harder to disprove the null hypothesis that multiple-choice question tests do not predict clinical performance; do something, somehow, about racial differences in scores; and convene another meeting to talk about all of this again.

Did any new information emerge from InCUS?

Well, they did finally address whether attendees received any financial compensation.

For the record, InCUS invitees did not receive cash honoraria. But they did receive compensation for lodging and transportation. (Not sure why it was so hard to get a straight answer to that question a few months ago.)

Other than that, the problems and solutions are the same ones we’ve all been discussing for months. Nonetheless, the report is still worth a read, as many of the points raised by the invitees are very poignantly stated (even when translated through the pen of the corporate writing group).

What’s next?

We’re now in a period of public comment. (You can – and should – submit your thoughts here. The portal is open until July 26, 2019.)

After that, final recommendations will be prepared and released this fall.

No, I meant, like, what does the future hold for USMLE scoring? Is the test going to be pass/fail or not?

At first glance, the InCUS recommendations are so vague that they offer little insight into this question.

But…

…if you look past the generic recommendations and calls for more research and meetings, the authors do tip their hand. It happens at the very end of the document, and it suggests the likely roadmap forward. I’m just not sure I like where it’s going.

Check out the “Final Reflection on USMLE Numeric Scoring” on page 17, excerpted below.

“Many InCUS attendees acknowledged that the controversy over USMLE numeric scores might not exist if, for example, USMLE numeric scores were weighted 10%, or even as high as 25% in residency screening and selection. . . [M]any medical schools frequently place a 15-25% weight on subject examinations in the context of a clinical clerkship. . . If, for example, certain specialties were to develop consensus opinion on weighting of a numeric score or other performance result from USMLE, and justify this approach, applicants for residency might direct their efforts accordingly…”

This passage was particularly interesting to me due to a persistent rumor I began to hear a few weeks before the report was released.

From several sources, I was told that certain NBME executives were advocating the idea that ERAS should calculate a numeric score that program directors could use for screening applications. (The “ERAS Score,” of course, would be based in part on the USMLE score).

This section of the document – both in its language and its position as the “final reflection” of the paper – seems to confirm this as the most likely strategy.

The authors also suggest how this strategy will be implemented: by convincing a group of overworked program directors in a particular specialty to commit to a specific weight for Step 1 scores. That specialty could then ask ERAS to calculate the “ERAS Score” for them, and use that score in lieu of Step 1 scores to screen candidates for interview.

Let me just say, this is a frickin’ genius move on the part of the NBME.

Think about it: if we start using the ERAS Score for candidate screening (instead of Step 1 scores per se), then the NBME gets to wash their hands of this whole messy affair. In response to any criticism of the test, they can just point their finger at ERAS.

Hey, we’re just the maker of a necessary licensing examination. If you have a problem with the ERAS Score, take it up with the AAMC!

Whoa – we didn’t decide that Step 1 scores should be 25% of residency selection decisions – the program directors did that!

The USMLE isn’t racially discriminatory – it’s the ERAS Score that’s biased!

Step 1 Mania isn’t ruining medical education – ERAS Score Mania is!

However, if Step 1 scores account for 25% of the ERAS Score, do you think students will have any less desire for the NBME’s lucrative ancillary services (like their Self Assessment exams)? Probably not. So if this strategy comes to pass, the NBME’s bottom line will be protected, and the AAMC/ERAS will now take the blame for the system’s failings.

Wow. But why would ERAS agree to this?

At first, it seems crazy, right? Why would they volunteer to be the scapegoat?

But when you consider the revenue they generate from Application Fever, it’s a small price to pay.

Here, we need to be very clear: it’s the availability of Step 1 scores that enables Application Fever in the first place.

Replacing Step 1 scores with a different quantitative metric that can be used for hard cutoffs in candidate screening will allow program directors to keep their sanity – and will prevent them from organizing to demand an end to Application Fever. So yeah, I think this will seem like the least disruptive solution to the AAMC as well.

But wouldn’t an ERAS Score solve some problems?

I guess it depends what problem you’re trying to solve.

“Use Step 1 to compute an ERAS Score” is not an answer anyone would come up with to the question, “How do we make medical education and residency selection as good as they can be?”

It is, however, a pretty good answer to the question of, “How do we defuse the current controversy with a minimally-disruptive ‘solution’ that allows us all to continue business as usual?”

Come on – an ERAS Score would still be better than the current mess. Right?

Maybe.

I’d support almost anything as an alternative to using Step 1 scores in isolation for screening. Much of the material on Step 1 is irrelevant to the actual practice of medicine, and program directors’ reliance on scores leads students to over-focus on minutiae at the expense of the rest of their medical education. So the idea of minimizing the importance of Step 1 definitely has appeal.

But as with most things, the devil’s in the details.

If Step 1 scores are 25% of the ERAS Score, what would make up the remaining 75%? More importantly, how would those things be quantified?

For instance, some selective academic programs will be interested in research output as a measure of scholarly potential. So what are we going to measure? Number of publications or posters?

I’m more impressed by a candidate who did a single meaningful project than by a candidate who managed to get her name on a dozen papers and posters that were largely driven by someone else. A human being who actually reads the application can get a pretty good sense of which is which. A software algorithm, not so much.

You can go through the same exercise with any other activity that might be used to calculate a single numeric score, and you’ll find similar problems. Numeric measures are useful – but they are also highly susceptible to gaming. And outsourcing human decision making to the ERAS Score seems like a good way to favor quantity over quality; ignite multiple new arms races for students; and descend even deeper into the McNamara Fallacy that already plagues us.
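To make the concern concrete, here’s a minimal sketch of what a composite score like this might look like. To be clear, the components and weights below are purely hypothetical – nothing in the InCUS report specifies them beyond floating a 25% weight for the USMLE – but the exercise shows the core problem: every input has to be something countable, so the applicant with a dozen minor posters outscores the applicant with one meaningful project by construction.

```python
# Hypothetical sketch only: neither these components nor these weights appear
# anywhere in the InCUS report. The point is that any composite "ERAS Score"
# must reduce each part of an application to a number.

HYPOTHETICAL_WEIGHTS = {
    "step1_percentile": 0.25,  # the ~25% weighting floated in the report's final reflection
    "clerkship_honors": 0.25,  # fraction of core clerkships with honors
    "research_items":   0.25,  # publications + posters, capped and scaled
    "service_items":    0.25,  # volunteer/leadership entries, capped and scaled
}

def hypothetical_eras_score(step1_percentile, honors_fraction,
                            research_count, service_count):
    """Return a 0-100 composite. Counting-based inputs reward quantity:
    twelve minor posters beat one meaningful project automatically."""
    components = {
        "step1_percentile": step1_percentile,             # already on a 0-100 scale
        "clerkship_honors": honors_fraction * 100,        # 0-1 -> 0-100
        "research_items":   min(research_count, 10) * 10, # saturates at 10 items
        "service_items":    min(service_count, 10) * 10,
    }
    return sum(HYPOTHETICAL_WEIGHTS[k] * v for k, v in components.items())

# One meaningful project vs. a dozen small ones, all else equal:
print(hypothetical_eras_score(80, 0.5, research_count=1,  service_count=3))   # 42.5
print(hypothetical_eras_score(80, 0.5, research_count=12, service_count=3))   # 65.0
```

No matter how you tune the (again, hypothetical) weights, the formula can only see what it can count – which is exactly the quantity-over-quality incentive described above.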

Dude. You’ve gotten really cynical.

I don’t mean to be.

But I know we can do better.

As much as I’ve already talked about this, I’m still shocked when I reflect on the current state of affairs. How do we have a system that is so costly, misguided, and inefficient?

It’s our own fault.

We’ve permitted a system of medical education that prioritizes memorization of G-protein subunits over the acquisition of skills that matter to patients.

We tolerate medical schools charging $50,000+ for pre-clinical tuition even if they’re providing little more to their students than a physical venue to study First Aid and do UWorld questions.

We allow program directors to receive 100+ applications for every position in their program, encouraging them not to look at candidates as individuals, but as numeric scores – just so they can survive application season and the revenue from Application Fever isn’t disturbed.

We even assign interviews for competitive residency positions based in part upon which student can find a Wi-Fi hotspot the fastest. (Seriously. Is this residency selection or a Black Friday doorbuster at Walmart?)

The strange thing is, all of us want to do the right thing.

Students want to learn and to be prepared to take care of sick patients. Schools want to educate students and help maximize their potential. Program directors want to evaluate applicants as future colleagues and help turn them into real doctors. Even the corporate executives – who often get a disproportionate share of my wrath – have positive intent (along with financial COI).

But we all also have personal incentives to act against these pure motives, and the current system is the perverse outcome of each group acting only in their own self-interest.

Students want to get into the “most prestigious” residency program – even if it means they’ll be ill-prepared when they get there.

Schools want to see their students match, even if it means fudging a little on the Dean’s Letter and shying away from meaningful student assessment.

Program directors just want to survive the deluge of applications and come out with a decent class.

To make the system better, we all have to give up a little bit of those self-interested goals.

So how do you propose that we fix this?

First, we need to stop allowing our students to be plundered and our program directors pushed to the breaking point by Application Fever.

Toward that end, I say we need application caps.

But for caps to work, program directors have to provide standardized, high-quality information on their program’s competitiveness – even if it hurts to do so. It’s the only way candidates will know where they should apply.

Second, we need to refocus medical education on the skills that matter.

Medical schools need to teach the skills that will help their future patients – and commit to standardized, meaningful assessments of student skills. UME and GME need to learn to trust each other again.

And how are you going to accomplish that?

By making Step 1 pass/fail.

Huh? That doesn’t accomplish anything that you suggested.

Yeah, that’s a favorite argument of the InCUS authors, too.

They repeatedly note that Step 1 is not the cause of everything that’s wrong with medical education, and changing the score reporting policy doesn’t fix everything, either. And that argument, as far as it goes, is correct.

But it’s still not an excuse for inaction. USMLE Step 1 is the critical pillar for the half-rotten edifice that we’ve collectively constructed. Take it away, and the whole thing comes tumbling down – and we can build again.

But what about programs or specialties that still want a standardized test?

Then build a test that specifically measures or predicts something that truly matters to you in your program or specialty.

A good reason for using a test in residency selection is that the results are meaningful and improve the selection process. I have no objection to that.

A bad reason for using a test in residency selection is because you just need a number for screening purposes. If the problem is excess applications, advocate for application reform – not the use of convenience metrics.

And retrofitting a one-size-fits-all test of basic science to predict every competency in every specialty just because your program receives more applications than you want to review? Well, that’s taking a bad reason to use test scores and then Super-Sizing it.

Why don’t you make your changes first, then we’ll make Step 1 pass/fail? How about that?

Because as long as we have a Step 1 score (or an ERAS Score) to use as a crutch, we’ll all just keep limping along instead of working on real solutions. As the report notes, “Nothing happens unless someone takes the first step.”

Only if we all commit to making a serious change (like pass/fail grading or application caps) can we compel all of the parties to look beyond their immediate self-interest and make a system that ultimately works better.

But why should the first step be to make Step 1 pass/fail?

You know, there were some things in the report that I really liked.

My favorite was on page 9. One InCUS invitee recommended that the USMLE decision makers should “Always do what is best for patients.”

Right now, we are not doing that. Medical students are not being taught the way they should be. About that, there can be no argument. We should focus our time and resources on endowing our future physicians with skills that will actually help their future patients – not burden them with an all-consuming science trivia competition just so we don’t have to confront the problem of application inflation.

And that’s really my biggest worry with InCUS. If we acquiesce to an ERAS Score as being an acceptable solution, we may have missed the best opportunity in a generation to truly reform medical education, and do what is best for the patients of the future.

Let’s not settle for a Band-Aid when we need a tourniquet. We can do better.