Step 2 CS, Part One: How did we get here?

Back in the fall, a medical student sent me a link to this post on Reddit.

You should read the whole thing.

In just five short paragraphs, this Redditor captures much of the prevailing student sentiment on the USMLE’s Step 2 Clinical Skills (CS) examination.

Ever since I started speaking up about problems with USMLE Step 1, students have asked me to comment on Step 2 CS. Well, you’d better be careful what you ask for – because you just might get it.

But be forewarned: I have a lot to say, so this post will take place over three parts.

In the first, we’ll explore the history of the USMLE Step 2 CS examination, with the goal of answering the question, How did we get here? In the second, we’ll review opposition to Step 2 CS, and in Part Three, I’ll argue my case for what should be done about Step 2 CS.

Origins

USMLE Step 2 CS began in 2004. So you may think that’s where our story begins. Think again.

To tell the story properly, we need to go back to the beginning – to the very first time that the NBME offered a clinical skills exam.

This takes us all the way back to…

1916 – The original NBME exam

The first NBME exam was administered in Washington, D.C. from October 16-21, 1916. Importantly, when I say “October 16-21,” I don’t mean that you took the exam on one of those days. I mean that the exam took six days.

See, the original NBME exam was comprehensive. It covered almost everything that a doctor in those days could know, with half-day sessions on anatomy, chemistry, physiology, pathology, bacteriology, obstetrics, surgery, dermatology, medical jurisprudence, pharmacology, and hygiene.

The length wasn’t the only thing different about the original NBME exam. There were also no multiple choice questions. Instead, candidates’ knowledge was assessed using written or oral examinations, or by the real-world application of skills with laboratory or practical examinations.

The format of the exam is a point that deserves some discussion.

Today, if I were to tell a group of medical students that I was going to give them a test, they’d likely envision a multiple choice question test. This is what they’ve been conditioned to expect after years of experience in the American educational system.

But intuitively, is a multiple choice question test how you’d think of assessing a professional engaged in highly complex tasks and decision-making? Probably not. It seems natural that if you’re interested in evaluating doctors, you should probably evaluate them doing doctor stuff.

And in fact, this is what the NBME chose to do for their original exam. Remember, they could have used any test format that they could dream up – but they chose to use an exam that, for many of its portions, approximated the actual practice of medicine.

Thus, for the medicine portion of the exam, each candidate was assigned three patients at Garfield Memorial Hospital. They had to “present a written clinical history, including the physical examination,” then “stand an oral examination on the clinical history and diagnostic conclusions.”

Garfield Memorial Hospital, Washington, DC. Photo from c. 1919.

Unlike the “patients” on Step 2 CS today, those at Garfield weren’t standardized. They were real patients, with conditions including typhoid, pulmonary tuberculosis, acute endocarditis, aortic aneurysm, aortic insufficiency, exophthalmic goiter, pernicious anemia, cirrhosis, tabes dorsalis, bronchial asthma, diabetes mellitus, and motor hemiplegia, just to name a few.

Problems with the original NBME exams

These days, the NBME is a powerful monopoly in the medical licensing marketplace. But this wasn’t always the case.

In the beginning, the NBME was a fledgling corporation actively competing for market share with the exams offered by state licensing boards. Not all states accepted NBME certification, but for the states that did, physicians seeking licensure could choose which test they wanted to take.

Unfortunately, though their exam format offered face validity, the NBME faced two major hurdles in increasing their market share.

Convenience – If you think it sucks to have to travel to one of the five Step 2 CS sites today, imagine how much fun it would be to travel to a single test site in an era before interstate highways or air travel. Only 10 candidates took the original NBME exam in 1916 – and it’s easy to understand why.

Cost – Running a comprehensive, multifaceted medical examination was a logistical challenge – and pricey. For the first few years of its existence, the cost of the NBME exam was subsidized by nonprofit foundations. This allowed the NBME to keep registration fees low: the original examinees were charged only a $5 registration fee. However, by 1924, that funding dried up, and examinees had to pay $80 – the equivalent of $1200 today.

The combination of high cost and inconvenience severely constrained growth of the NBME exam. There just weren’t that many candidates willing to voluntarily take on the burden and expense of taking it. In fact, it’s possible that the NBME’s business model might have failed altogether… if not for the appearance of its psychometric savior.

1950-1954 – Enter the MCQ

The death of the original NBME clinical skills exam was due to the birth of a new type of test: the multiple choice question (MCQ) test.

In 1914, Frederick J. Kelly proposed the concept of multiple choice testing in his doctoral dissertation at Kansas State Teachers’ College. The MCQ, he argued, would fix two problems in student assessment. First, it would eliminate subjective judgment in how teachers marked papers. Second, since grading would be standardized, teachers wouldn’t have to spend so much time grading assignments.

Over time, the MCQ movement took hold in American education. And by the 1930s – thanks to Columbia professor Benjamin D. Wood’s collaboration with IBM – technology became available to rapidly machine-score MCQ punchcards.

In 1950, following a leadership change, the NBME began to consider changing its exams to an MCQ format. This generated an intense debate, and resulted in a three-year study of the NBME’s MCQs from 1951-1953.

The “pernicious influence” of our reliance on MCQ tests is seen in modern day Step 1 Mania.

Still, the benefits of MCQs were obvious. Beyond the lure of objectivity, MCQ tests were portable and could be machine graded. From a business standpoint, it was a no-brainer. And so, in 1955, the NBME began to use an MCQ format for its licensing exams.

From a business standpoint, the NBME’s decision to switch to an MCQ format in 1955 was a slam dunk.

In other words, the NBME abandoned its original clinical skills exams (and all of its other “subjective” exam formats) in favor of the business advantages provided by multiple choice question tests.

So why did the NBME go back to offering a clinical skills exam?

Origins

The modern USMLE Step 2 CS exam has its origins in a different test: the Clinical Skills Assessment that was provided by the Educational Commission for Foreign Medical Graduates, or ECFMG.

The ECFMG is the organization tasked with certifying the readiness of IMGs for residency training in the United States. And in the late 1970s, they had a problem.

Many IMGs who had passed the ECFMG’s written examinations of medical knowledge and English proficiency nonetheless arrived unprepared to start internship. Program directors complained that some IMGs – despite their ability to correctly answer MCQs – barely knew how to interview a patient or conduct anything more than a rudimentary physical examination.

In 1981, the ECFMG sponsored an invitational conference where this topic was discussed, and plans were made to create a new test to evaluate IMGs’ clinical skills – the Clinical Skills Assessment, or CSA.

The ECFMG was acutely aware of the burdens placed on IMGs by the existing certification process. And so, when they designed the CSA, they imposed some boundary conditions. The exam should take only one-half day. The resources needed to administer the exam should be those “available in most academic medical centers” to prevent examinees from having to travel to far-flung testing centers. Oh, and the total cost per examinee was to be kept at $200 or less (around $478 in 2019 dollars).

The pilot study for the CSA was conducted in 1985. The results seemed to confirm program directors’ concerns, as IMGs scored significantly lower than U.S. medical graduates.

Distribution of performance in the original pilot study of the ECFMG’s CSA. The difference in scores was primarily related to IMGs’ weaker performance in performing a physical examination and recognizing lab abnormalities.

A follow-up study in 1987 found similar results, and the CSA was ultimately validated in a large study that was published in JAMA in 1993. And so it came to pass that, on July 1, 1998, the CSA went live as a requirement for IMGs.

By most accounts, the CSA was a success.

Program directors seemed to like it. (Though the effect size was small, one study found that IMG residents who had taken the CSA were less likely to arrive at residency with deficiencies in their history taking and physical examination skills.)

Over the first two years of its administration, 96.9% of IMGs passed the CSA. In no small part, this reflected self-selection among the IMGs who chose to take it. But it almost certainly signified better clinical skills preparation, too.

See, all of the IMGs who failed the CSA between 1998 and 2000 failed due to poor performance in the doctor-patient communication component of the test. In contrast to the pilot studies of the CSA, few IMGs failed the exam due to deficiencies in physical examination, data gathering, decision-making, or composing a written note. In other words, just the presence of the CSA was sufficient to decrease the number of IMGs who arrived unprepared for residency training.

Of course, not everyone was happy with the CSA. As one member of the ECFMG leadership noted in 2000,

The most widely heard complaint made by graduates of foreign medical schools regarding the CSA is that it is unfair for U.S. medical graduates to be exempt from taking a similar national assessment examination.

-Gerald Whelan, MD

On fairness – and finances

This argument about fairness carried some weight.

Remember, many IMGs are U.S. citizens, entitled to the same rights and protections that all U.S. citizens enjoy. (As I’ve discussed previously, it was a similar argument – that the NBME’s decision to exclude IMGs from its exams was discriminatory – that led to the abolition of the old two-test licensing system and the birth of the USMLE.)

But the fairness argument had a ready counter.

For medical schools in the United States and Canada, accreditation by the Liaison Committee on Medical Education (LCME) provided a certain assurance of quality and a standardized educational experience. But international medical schools are not subject to the LCME’s review.

This, of course, is the very reason that ECFMG certification exists: to vouch for the educational outcomes of students whose education took place in institutions beyond the LCME’s purview. From this standpoint, asking the ECFMG to evaluate clinical skills was just as reasonable as asking the LCME to evaluate the quality of a medical school’s OSCE, and making USMGs take the CSA could be seen as duplicative.

Of course, fairness wasn’t the only thing on the minds of executives at the NBME. As usual, they were also concerned with their organization’s finances.

Here’s a quote from former NBME Chair Carol Aschenbrener in which she frankly discusses the organization’s strategic planning at the time all of this was going on (emphasis added).

[I]n 1999 senior management and the Executive Board recognized the need to take a long, deep and continuing look toward the future. Yes, the USMLE was, and continues to be, the “gold standard” for physician licensure, a product and service in which all involved can take immense pride. And yes, the NBME was continuing to expand its portfolio of client services. But, the bottom line depended excessively on the USMLE; the NBME, for practical purposes, is still dependent on a single product for its survival... Thus, the Executive Board began shifting attention and energy to incorporate strategic thinking as a central component of its work.

The thing is… when your organization depends excessively on the revenue from the USMLE, what do you do when that revenue drops?

And from that standpoint, the ECFMG’s introduction of the CSA was problematic.

In 1995, 33,597 IMGs took USMLE Step 1. In 1999 – after the CSA became a requirement – the number of IMG Step 1 test takers plummeted to just 9,844.

This was a big deal. In the boom years before the CSA went into effect, there were significantly more IMGs taking the USMLE than USMGs. Former NBME President Don Melnick commented upon the sudden drop in revenue in the organization’s 2001 Annual Report (highlighting added).

The “precipitous drop” in IMG test takers was a direct result of the ECFMG’s CSA.

To be fair, the NBME had at least given lip service to the idea of a clinical skills exam for decades. (The “need for and/or desirability of a test of clinical skills” was recommended in the proposal to merge the old NBME and FLEX exams into the USMLE.) But in these trying financial times, the Board’s plans seem to have accelerated… and evolved.

See, the original plan for Step 2 CS was to administer the test at medical schools. Thanks in part to previous emphasis from the LCME, many U.S. medical schools already had well-established standardized patient programs and permanent testing facilities, and offering the test locally would reduce scheduling difficulties and traveling expenses for students.

An excerpt from the 1998 Federation Bulletin describes the original plan for USMLE Step 2 CS. (Red box added.)

But after the September 2001 meeting of the NBME Executive Board, plans abruptly changed. Instead of allowing the new clinical skills examination to be offered locally, the NBME determined that the test could only be taken in one of five cities at test centers that would be operated jointly by the NBME and ECFMG. And oh yeah, the test would cost $1000 per student.

Despite an exhaustive search, I could find little explanation for this decision – just that their earlier implementation plan was found to be “not feasible.”

Hmmm.

In summary

Can you tell the story of USMLE Step 2 CS in a way that makes the NBME out to be heroes, standing up for patient safety and equity in licensure? Sure. (They often do.)

But doing so ignores the other motives that were clearly lurking beneath the surface. And as I reviewed the history myself, this much was apparent:

Evaluating the clinical skills of physicians is a pretty logical thing to do. The NBME did it – until it suited their business interests to stop doing so. And when it again suited their finances to provide a clinical skills exam, they did – making sure to offer it in such a way that it provided maximum financial benefit to their organization.

Ignoring this part of the Step 2 CS story seems wrong. You can feel it in the raw emotion expressed by the anonymous Redditor above. But if we are going to make Step 2 CS better, we need to honestly acknowledge where we’ve been – and how we got here.