Last week, ProPublica released a study of medical outcomes, focused on the number of negative outcomes caused by each surgeon in its study population. While the report was hailed by popular media as a major step forward in health care quality transparency, it was also met with consternation by many in the medical community. We asked the Grand Rounds data science team to share its five most significant comments on the study. Here’s what they had to say:
1. Important Large-Scale Data Release: The ProPublica study is a landmark development in public disclosure of physician-level outcomes. While CMS (the Centers for Medicare & Medicaid Services) and other entities have provided hospital-level quality data, our research consistently shows that understanding individual quality is critical due to wide quality variations within practices and institutions. For example, it’s not uncommon for our analyses to suggest that selecting a top doctor at a mid-tier hospital will lead to better outcomes than picking a middling physician at a top-ranked hospital. Rating over 16,000 surgeons, ProPublica’s analysis is the largest single public study of risk-adjusted physician-level outcomes to date, and it will help patients understand the importance of individual performance in addition to institution performance. By comparison, the New York State Cardiac Surgery Reporting System (CSRS), generally lauded as the gold standard for public reporting on individual physicians, currently reports on fewer than 200 surgeons.
2. Pioneering Use of Claims Data: The fact that the ProPublica analysis is based on claims data represents a crucial departure from past efforts, such as the NY CSRS, which have typically relied on medical records or self-reported observations. Claims data has always been a promising source of health care insights, but the medical community has been skeptical of claims-based approaches to inferring physician quality due to the challenge of adjusting for patient risk. Claims data, in contrast to medical records, generally provides far less context around an individual patient, making it more difficult to assess whether a provider’s high rate of complications may simply be a product of the type of patients she is treating. In other words, a surgeon may have a high complication rate precisely because she is an excellent physician who sees only the most difficult, high-risk patients. ProPublica’s approach attempts to mitigate this problem by focusing on procedures where the associated patient populations are generally healthy. This is a big assumption. While it is encouraging to see claims data being used to evaluate provider quality, and in a public forum, the methodology applied will need to stand the test of time and endure close scrutiny.
3. Designed to Help Consumers: ProPublica’s approach stands out in being expressly structured to help consumers make decisions. Too often, academic studies and bulk data collections lack actionable impact. For example, numerous resource websites display raw procedural volumes based on Medicare billing data without providing context to help patients understand the data. Patients are often left wondering how to interpret procedural volume (is 20 cases too few? too many? just right?) and mix (how often does the surgeon perform spinal fusion surgery vs. standard spine surgery?). ProPublica could have simply displayed raw complication rates and left patients to interpret the data. Instead, they adjusted both for patient risk and for the fact that some variation in observed complication rates is likely due to random chance, particularly among low-volume surgeons. Physicians who have performed few procedures are statistically more likely to have very high or very low observed complication rates, regardless of skill level (think of a baseball player with one at-bat: his batting average will be either 1.000 or 0.000). ProPublica adjusted low-volume outliers to trend closer to the average rate of complications across surgeons. Though viewed with some controversy by the physician community, the adjustment reduces the chance that consumers will overreact to a particularly high or low complication rate that is most likely due to random chance. Will the adjustments be right for every physician? Surely not, but the logic behind the approach is well-founded.
4. But Still a Limited Scope: While this is a significant step forward, it should be noted that this data set only applies to a small fraction of patient needs. Among elective surgeries, many common procedures are covered. Yet in looking at all of the reasons patients seek care, these procedures account for less than 0.5% of physician visits annually. For the other 99.5% of visits, patients still have virtually zero public access to this type of information. The ProPublica scorecard is a big step forward, but we still have a long way to go.
5. And Does Not Address the Difference Between Procedural Skill and Clinical Judgment: Being a physician requires many different skills that are often cognitively distant: deductive reasoning, empathy, communication, manual dexterity, memorization of a vast number of conditions, etc. Just as a physician’s performance may vary across therapies, we consistently observe that the physicians who are best at making diagnoses aren’t always the best at recommending or performing treatments. What’s more, we see that skill at performing a surgery does not necessarily correlate with clinical judgment in determining whether surgery is appropriate in the first place. Most available outcomes metrics, including ProPublica’s, are squarely geared toward procedural skill. Just as standardized testing efforts in education have gravitated toward measuring “hard” math and science skills rather than more difficult-to-measure skills like writing and critical reasoning, physician assessment efforts have generally shied away from evaluating clinical judgment. Yet evidence suggests that U.S. health care is plagued far more by poor judgment than by poor procedural skill. Consider knee replacement surgery. According to ProPublica’s analysis, the top quartile of surgeons had an average complication rate of 1.9% while the bottom quartile had an average complication rate of 2.9%. Now compare that to the fact that studies show more than 50% of knee replacement surgeries may not even be clinically necessary. Measuring complication rates is important, but the most direct route to fewer complications is simply avoiding unwarranted surgeries in the first place.
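For readers curious about the low-volume adjustment described in point 3, here is a minimal Python sketch of the general idea of pulling a small-sample rate toward the overall average. It is not ProPublica’s actual model; the function name `shrunk_rate`, the `prior_strength` parameter, and the example numbers are all hypothetical, chosen only to show why a surgeon with a single case should not be scored at 0% or 100%.

```python
def shrunk_rate(complications, cases, average_rate, prior_strength=50):
    """Blend a surgeon's observed complication rate with the overall average.

    Illustrative assumption: `prior_strength` behaves like a number of
    "pseudo-cases" performed at the average rate. The fewer real cases a
    surgeon has, the more the result is pulled toward `average_rate`.
    """
    if cases < 0 or complications < 0 or complications > cases:
        raise ValueError("need 0 <= complications <= cases")
    return (complications + prior_strength * average_rate) / (cases + prior_strength)


# Hypothetical overall complication rate of 3%.
AVERAGE = 0.03

# One case, one complication: the raw rate is 100%, the adjusted rate is far lower.
one_bad_case = shrunk_rate(1, 1, AVERAGE)

# 1,000 cases, 60 complications: the raw rate is 6%, and the adjustment barely moves it.
high_volume = shrunk_rate(60, 1000, AVERAGE)

print(f"one case, one complication: {one_bad_case:.3f}")
print(f"1,000 cases, 60 complications: {high_volume:.3f}")
```

The choice of `prior_strength` controls how strongly low-volume surgeons are pulled toward the average, which is one reason an adjustment like this can be reasonable in aggregate yet still be wrong for an individual physician.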