Journal Club: COMLEX-USA and Physician Discipline

Ask a medical student what their LEAST favorite licensing examination was. (Go ahead, I’ll wait.)

Did they say USMLE Step 2 CS? I thought so! There’s just something about requiring a student to pay $1300 to travel across the country, eat some USDA Grade D lunchmeat sandwiches, and play doctor that really leaves a sour taste in the mouth.

Of course, if you asked an osteopathic medical student, you probably got a different answer: the COMLEX-USA Level 2-PE.

The Level 2-PE is Step 2 CS’s osteopathic doppelgänger, and I’ve been hearing a lot about it lately. This is mainly because, unlike the NBME (which sensibly cancelled the USMLE Step 2 CS exam for 12-18 months due to the COVID-19 pandemic), the National Board of Osteopathic Medical Examiners (NBOME) has stubbornly insisted that students still need to pass COMLEX-USA Level 2-PE for licensure.

In defending this decision, there’s been a lot of high-minded talk about the benefits that the Level 2-PE examination provides for patients and society. And recently, the NBOME’s position has been buttressed by an article that came out in Academic Medicine that purports to show that osteopathic physicians who scored higher on COMLEX-USA are less likely to receive disciplinary action from state medical boards.

In fact, the NBOME’s Twitter account has been downright sassy in touting the value of COMLEX based on this study.

Whoa – when did the NBOME hire the Wendy’s social media team?

So let’s take a closer look at this paper. Does it really show that patients benefit from the COMLEX-USA Level 2-PE?

It’s time for a little Sheriff of Sodium Journal Club.

The Article

Roberts WL, et al. An investigation of the relationship between COMLEX-USA licensure examination performance and state licensing board disciplinary actions. Acad Med 2020; 95: 925-30.

Source: PubMed

The idea here is simple.

You take all the osteopathic physicians who graduated between 2004 and 2013, then look to see if there’s any association between their COMLEX-USA scores and state board disciplinary action (i.e., license suspension, practice limitation, public reprimand, etc.). Easy enough, right?

It’s a reasonable premise. The outcome of interest – physician disciplinary action – is certainly one that matters to patients. We may argue about what makes a good doctor, but there’s little debate that doctors whose licenses are revoked are bad ones.

And indeed, the authors did find an association between higher performance on certain portions of the COMLEX-USA – namely, Level 3 and the Level 2-PE Biomedical/Biomechanical Domain (BD) subscore – and decreased odds of being disciplined by the state board.

So what’s the problem?

Unfortunately, there are several.

Problem #1: State boards don’t use COMLEX-USA scores.

To become licensed as an osteopathic physician, you must pass each exam in the COMLEX series (unless you want to practice in a state that allows osteopathic physicians to become licensed by taking the USMLE).

But the key word in the above sentence is “pass.” If you pass the COMLEX, you can be licensed, whether you pass it with a 400 or an 800. Residency program directors might be concerned with COMLEX scores, but state boards are not.

(And this Level 2-PE BD subscore analyzed in the paper? It’s completely hidden from state boards. Remember, the results of the Level 2-PE exam are only reported as pass/fail.)

The point is, the NBOME’s claim that “COMLEX-USA delivers useful information” doesn’t hold water if the boards don’t actually use the information it delivers when making their decisions.

Problem #2: Only adjusted odds ratios are presented.

Even if state boards did use COMLEX-USA scores in making licensure decisions, there’s another problem. The authors only present the association between COMLEX-USA scores and disciplinary action after multivariable adjustment (for gender and length of time in practice).

The people on state boards are smart – but I doubt that many of them are capable of performing multivariable logistic regression in their head.

Unfortunately, whether there is a relationship between COMLEX scores and disciplinary action without statistical adjustment is unknown – to readers of their paper, at least.

Now, it’s not that adjusting for gender and length of time in practice is unreasonable. In fact, men are twice as likely as women to receive state board disciplinary action, and in a mathematical sense, accounting for that is important.

Similarly, to justify including length of time in practice in the model, the authors cite a paper that shows that physician age was positively correlated with the likelihood of receiving disciplinary action. (Interestingly, however, the NBOME authors found the opposite relationship in their model, with lower odds of disciplinary action for greater time in practice. Hmmm…)

So why wouldn’t the authors provide the unadjusted odds ratios?

I don’t know. (I actually e-mailed the paper’s corresponding author two weeks ago with this question, among others… and got no response.)

But it stands to reason that if the unadjusted ORs helped tell the story they are telling in the paper, the authors would have included them. And if presenting the unadjusted ORs undermined the story they wanted to tell, they might have chosen to present only the results from the multivariable modeling.

My guess is that there is no relationship between COMLEX-USA scores and disciplinary action without multivariable adjustment. Or possibly, there’s a relationship that the authors don’t want to draw attention to.

If you look at Table 4, you may notice that the ORs for COMLEX-USA Level 1 scores and disciplinary action are all greater than 1. That is, those with higher Level 1 scores have increased odds of receiving discipline from the board. The 95% confidence intervals cross 1… but you do wonder if there is a signal in the data there.

Again, these are the ORs after multivariable adjustment. Is that OR of 1.43 (95% CI 0.95-2.15) for COMLEX-USA Level 1 and “other [disciplinary] action” stronger without adjustment? Does the p value drop from 0.09 to <0.05, demonstrating a significant relationship between higher Level 1 scores and more frequent disciplinary action? Who knows.

(Well, actually, the authors know, but they ain’t tellin’.)
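Readers can’t recompute the unadjusted ORs, but they can at least check that the reported numbers hang together. The short Python sketch below assumes the 95% CIs in Table 4 are standard Wald intervals on the log-odds scale (log OR ± 1.96 × SE). That is the usual convention for logistic regression, but the paper doesn’t state it, so treat it as an assumption. Under that assumption, the sketch backs out the standard error from each interval, recovers the p value, and asks how large the Level 1 OR would have to be to reach p < 0.05 if its standard error stayed the same. The function names are illustrative, not anything from the paper.

```python
import math
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(OR), assuming a Wald CI: log(OR) +/- Z_CRIT * SE."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_CRIT)


def p_value_from_or_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p value for an OR reported with its 95% CI."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def smallest_significant_or(lower: float, upper: float) -> float:
    """OR above 1 that would just reach p < 0.05 with the same standard error."""
    return math.exp(Z_CRIT * se_from_ci(lower, upper))


# Level 1 vs. "other action": OR 1.43 (95% CI 0.95-2.15), reported p = 0.09
p_level1 = p_value_from_or_ci(1.43, 0.95, 2.15)
# Level 2-PE BD vs. license revocation: OR 0.75 (95% CI 0.58-0.98), reported p = 0.03
p_bd = p_value_from_or_ci(0.75, 0.58, 0.98)

print(f"Level 1 p ~ {p_level1:.2f}")
print(f"Level 2-PE BD p ~ {p_bd:.2f}")
print(f"Level 1 OR needed for p < 0.05: {smallest_significant_or(0.95, 2.15):.2f}")
```

With these inputs, the recovered p values round to the reported 0.09 and 0.03. The Level 1 OR would need to exceed roughly 1.50 to reach significance at the same precision. An unadjusted model would have its own standard error, so treat that threshold as a rough guide only.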

Problem #3: The effect size is small.

This study included 26,196 osteopathic physicians. Only 187 of them received state board disciplinary action. That’s 0.7%.

How much does a high COMLEX-USA score shift that probability? Here again, the authors don’t provide much data to help us with the real-world interpretation of their findings. So we’ll have to estimate a little bit.

First, let’s pretend that the adjusted ORs that they provide are actually unadjusted ORs. Because if we do, we can approximate the absolute risk reduction. (As I suggested above, the adjusted ORs they present probably represent a ‘best case’ estimate of the value of COMLEX scores in predicting disciplinary action, so this is probably a charitable assumption.)

Because the event rate is <10%, the OR approximates the risk ratio (RR). And since we know the overall risk for disciplinary action, we can use the RR to estimate the risk for disciplinary action at various levels of COMLEX-USA performance.
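The “OR approximates RR” shortcut can be checked directly. A common conversion (Zhang and Yu, JAMA 1998) is RR = OR / (1 - p0 + p0 × OR), where p0 is the outcome risk in the reference group. The sketch below applies it to an OR of 0.75 at a few baseline risks. Plugging in this study’s overall rates (about 0.25% for revocation and 0.7% for any discipline) as p0 is an assumption made for illustration.

```python
def or_to_rr(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an odds ratio to a risk ratio (Zhang and Yu's formula).

    baseline_risk is the probability of the outcome in the reference group.
    """
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


# Rare outcomes (like the ~0.25% and ~0.7% rates in this study) vs. common ones
for p0 in (0.0025, 0.007, 0.10, 0.30):
    print(f"baseline risk {p0:6.2%}: OR 0.75 -> RR {or_to_rr(0.75, p0):.3f}")
```

At the rates seen in this study, the RR differs from the OR only in the third decimal place. The approximation drifts noticeably only once the outcome becomes common: at a 30% baseline risk, the same OR corresponds to an RR of about 0.81.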

In this paper, the authors standardized COMLEX scores, so each one-unit change corresponds to a change of one standard deviation in COMLEX score.
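Standardizing also explains why the risk estimates in this section multiply by the same factor for each standard deviation. In a logistic regression, the log-odds are linear in the standardized score, so the OR for a shift of k SD is the per-SD OR raised to the k-th power. Here is a small sketch. The mean of 500 and SD of 80 are hypothetical placeholders, not actual COMLEX-USA norms.

```python
HYPOTHETICAL_MEAN = 500.0  # placeholder, not a real COMLEX-USA norm
HYPOTHETICAL_SD = 80.0     # placeholder, not a real COMLEX-USA norm


def standardize(score: float, mean: float, sd: float) -> float:
    """Convert a raw score to a z-score (number of SDs from the mean)."""
    return (score - mean) / sd


def or_for_shift(or_per_sd: float, n_sd: float) -> float:
    """OR for an n-SD change when log-odds are linear in the standardized score."""
    return or_per_sd ** n_sd


z = standardize(660.0, HYPOTHETICAL_MEAN, HYPOTHETICAL_SD)  # 2.0 SD above the mean
print(f"z = {z:.1f}")
print(f"OR for +{z:.0f} SD: {or_for_shift(0.75, z):.4f}")   # 0.75 ** 2
print(f"OR for -{z:.0f} SD: {or_for_shift(0.75, -z):.4f}")  # 0.75 ** -2
```

This power rule applies to odds. The absolute-risk figures in this section additionally rely on treating the OR as an RR.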

With that background, let’s take a look at the OR for license revocation and COMLEX-USA Level 2-PE BD scores, so we can get a sense of the real-life effect size.

The authors report an adjusted OR of 0.75 (95% CI 0.58-0.98; p=0.03). As noted above, we’ll assume that approximates the RR.

The authors also tell us that 66 physicians had their licenses revoked, so the overall risk for license revocation for a DO with an average COMLEX-USA Level 2-PE BD score would be around 0.25%.

If we go up by one standard deviation in COMLEX score, the risk for license revocation changes by a factor of 0.75. In other words, the absolute risk decreases from 0.25% to 0.19%.

Go up another standard deviation, and we get to an absolute risk of just 0.14% for a DO who scored two SD above the mean.

We can do the same thing for low-scorers. If we do, we’ll find that even those who barely passed Level 2-PE with a BD subscore that was two SD below the mean still only have an absolute risk for license revocation of just 0.45%.

In other words, the absolute risk difference for physicians whose scores differed by four standard deviations is still only about 0.31% (0.45% minus 0.14%).

We can further define the magnitude of this difference using the number needed to treat. But if we do, we’ll sadly learn that we’d have to exclude from practice more than 300 physicians whose Level 2-PE BD performance was two standard deviations below the mean (around the 2nd percentile) in order to have one fewer license revocation than if we’d only allowed into practice physicians whose performance was two standard deviations above it (around the 98th percentile).

If COMLEX-USA is capable of identifying physicians at risk for disciplinary action, it sure seems like an inefficient means of doing so.
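The arithmetic in this section can be reproduced in a few lines. This Python sketch makes the same simplifying assumptions as the text. The adjusted OR of 0.75 per SD is treated as a risk ratio, the average scorer’s risk equals the overall revocation rate (66 of 26,196), and risk scales multiplicatively per SD. It also assumes scores are roughly normally distributed when translating SDs into percentiles, and it rounds the NNT up, which is the usual convention. Names such as `risk_at` and `nnt_between` are illustrative.

```python
import math
from statistics import NormalDist

REVOKED = 66          # physicians with license revocation (from the paper)
TOTAL = 26_196        # physicians in the study (from the paper)
RR_PER_SD = 0.75      # adjusted OR per SD of Level 2-PE BD, treated as an RR (assumption)

BASELINE_RISK = REVOKED / TOTAL  # assumed risk for an average scorer, about 0.25%


def risk_at(n_sd: float) -> float:
    """Projected revocation risk for a score n_sd standard deviations from the mean."""
    return BASELINE_RISK * RR_PER_SD ** n_sd


def nnt_between(low_sd: float, high_sd: float) -> int:
    """Low scorers to exclude (replaced by high scorers) to avoid one revocation."""
    absolute_risk_difference = risk_at(low_sd) - risk_at(high_sd)
    return math.ceil(1 / absolute_risk_difference)


for n in (-2, -1, 0, 1, 2):
    print(f"{n:+d} SD: risk {risk_at(n):.2%}")

# -2 vs. +2 SD (about the 2nd and 98th percentiles if scores are normal)
ard_2sd = risk_at(-2) - risk_at(2)
print(f"ARD, -2 vs +2 SD: {ard_2sd:.2%}; NNT = {nnt_between(-2, 2)}")

# 5th vs. 95th percentile, again assuming normally distributed scores
z95 = NormalDist().inv_cdf(0.95)  # about 1.645
print(f"NNT, 5th vs 95th percentile: {nnt_between(-z95, z95)}")
print(f"Share of scores below -2 SD: {NormalDist().cdf(-2):.1%}")
```

Under these assumptions, the gap between -2 SD and +2 SD works out to about 0.31% in absolute terms, or an NNT of a little over 320. Comparing the 5th and 95th percentiles instead gives an NNT of roughly 400. Either way, a very large number of low scorers would have to be excluded to prevent a single revocation.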

Problem #4: Lack of content validity.

Content (or logical) validity describes the extent to which a test accurately captures all of the elements of the construct that it is intended to assess.

In other words, if we’re going to use COMLEX-USA scores to predict physician discipline, we might want to know a little bit more about what particular issues lead to physician discipline, so we can decide if those things are adequately evaluated by the COMLEX-USA.

This 2004 paper from Archives of Internal Medicine catalogued the specific reasons that 890 physicians in California received disciplinary action over a 13-year period.

Here, in descending order, are the most common reasons physicians received board action:

  1. Negligence
  2. Unprofessional conduct
  3. Self use of drugs/alcohol
  4. Conviction of a crime
  5. Inappropriate prescribing
  6. Sexual misconduct
  7. Fraud
  8. Mental illness

Now, which of these things do we think is accurately measured by the Level 2-PE BD subscore?

Description of the Level 2-PE Biomedical/Biomechanical (BD) domain, from the NBOME website.

Problem #5: Lack of association for most of COMLEX-USA.

The authors focus on Level 3 scores and Level 2-PE BD subscores, because those were the portions of the COMLEX-USA that were associated with disciplinary action. But scores for Level 1, Level 2-CE, and the Level 2-PE Humanistic Domain (HD) subscore were not associated with disciplinary action. (The latter is particularly bothersome to me, given the issues with content validity noted in Problem #4 above. If Level 2-PE HD scores are meaningful, we might expect them to have the best predictive ability for disciplinary action.)

The authors weren’t bothered by this.

“Even though COMLEX-USA Level 1, Level 2-CE, and Level 2-PE HD scores did not show a significant relationship with disciplinary actions, it would be an overstatement to claim no relationship exists between these measures and the criterion when the null hypothesis cannot be rejected.”

Honestly, I’m surprised the reviewers and editor let that statement stand. In a purely mathematical sense, it’s true – but it still seems like an overreach.

If you can’t find a relationship between COMLEX scores and board discipline in a study that included >26,000 osteopathic physicians – every single DO who took the COMLEX-USA over an entire decade – then the onus should be on you to explain why you think a relationship still might exist, and what kind of study it would take to prove it.

Problem #6: Financial conflict of interest.

There’s one other problem with this paper.

Four of the paper’s authors are employed by the NBOME. And yet, despite this, the authors listed no conflict of interest.

A conflict of interest exists when a professional judgment about a primary interest (such as reporting scientific research) may be influenced by a secondary interest (such as financial gain).

Does the NBOME stand to gain financially from the way these data are presented and interpreted? I certainly think so.

Remember, as I mentioned above, many states allow DOs to become licensed by taking the USMLE. Because of this, there are many who believe the COMLEX-USA is superfluous. (Full disclosure: I’m one of them.) The NBOME can justify its existence and protect its business interests by demonstrating the value of the COMLEX-USA to medical boards – and papers like this are a big help.

Conflict of interest is a condition – not a behavior. If conditions exist in which a secondary interest is present, those conditions should be disclosed.

This is the standard we’d demand for a paper describing the use of a new drug written by authors employed by a pharmaceutical company. There is no reason to expect less from the NBOME.

And obviously, just because the authors might gain financially from the results of the paper doesn’t mean that the data presented are fraudulent.

But it does mean that those of us reading the paper need to be a little bit careful. It means the validity, applicability, effect size, and analytic decisions all need to be critically evaluated to be sure that bias hasn’t crept in and misled us. It means that we can’t stop at a snappy Tweet or even a catchy abstract – we need to dig into the paper and put the findings into a real-world context.

And that’s exactly what I hope readers of this paper will do.

