If you walk into the office of any Silicon Valley tech company, you’ll inevitably hear the word “scale” thrown around in multiple contexts. Engineering teams want to build their systems for scale; marketing teams try to scale their company brand; inevitably a few employees are talking about the latest Yosemite wall they scaled over the weekend. One of the responsibilities of the Data Science team at Grand Rounds is determining the quality of a provider for a specific patient population’s health needs and preferences, and we’re continually looking for opportunities to scale these efforts.
With over 100 doctor specialties, thousands of medical procedures, tens of thousands of diagnosable conditions, and infinite degrees of severity, it’s not feasible to go through the checklist one-by-one to evaluate doctor quality across all possible metrics. Instead, we prioritize physician models using an internal framework that takes into account condition prevalence and severity, model feasibility, and, ultimately, impact to our members. One route to achieving high impact is to build models that scale to cover multiple specialties, procedures, and/or conditions.
Recently, we used a common methodology to build a collection of models that has the potential to impact over half of the adults in the U.S. As one of our staff physicians, Dr. Anthony Holbert, discussed in a recent blog post, the majority of Americans suffer from at least one chronic condition (e.g., diabetes or asthma), and 1 in 4 U.S. adults has two or more chronic conditions.[1] By definition, all chronic conditions share a common property: they can be managed and controlled, but not readily cured. It is largely the responsibility of a patient’s primary care physician (PCP) to coordinate their care, either through in-practice interventions or by referring them to specialists.
A measure of a PCP’s effectiveness in managing their patients’ chronic illnesses is their ability to minimize potentially avoidable complications (PACs). PACs are either (1) related to the patient’s chronic condition, or (2) resultant from patient safety shortfalls and health system failures. Some, but not all, PACs are preventable (hence the “potentially” qualifier), and the propensity for experiencing a PAC varies from patient-to-patient based on their medical history, condition severity, and demographic information. Because the propensity for patient panels to experience PACs varies substantially across physicians, we cannot directly compare rates of PACs across PCPs. PCPs whose patient panels have higher disease burdens will have higher PAC rates, regardless of that PCP’s ability to minimize PACs.
To control for varying patient panels across PCPs, we built a general risk modeling framework that predicts, based on their medical and demographic history, the propensity of a patient to experience a medical outcome, and we used that framework to calculate the probability that a patient with a specific chronic condition will experience a PAC. These probabilities were calculated for multiple chronic conditions, including (among others) diabetes, asthma, and hypertension. Using these probabilities, we can compute standardized metrics that evaluate a physician’s ability to minimize PACs for a given condition, regardless of the disease burden of that physician’s patient panel.
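One standard way to turn these per-patient probabilities into a panel-adjusted physician metric is an observed-to-expected (O/E) ratio. The post does not spell out the exact metric used, so the sketch below is an assumption illustrating the general idea with hypothetical data: the model’s predicted probabilities encode each patient’s disease burden, and dividing observed PACs by their sum yields a score that is comparable across panels.

```python
def oe_ratio(outcomes, predicted_probs):
    """Observed-to-expected PAC ratio for one physician's patient panel.

    outcomes: 0/1 indicators, 1 if the patient experienced a PAC.
    predicted_probs: model-predicted PAC probability per patient, which
    encodes that patient's disease burden and demographics.
    """
    observed = sum(outcomes)
    expected = sum(predicted_probs)
    return observed / expected


# Hypothetical panel: two PACs observed where the model expected 2.5,
# so this physician did somewhat better than their panel's risk predicts.
print(oe_ratio([1, 0, 1, 0, 0], [0.6, 0.4, 0.7, 0.5, 0.3]))  # 0.8
```

A ratio below 1.0 indicates fewer PACs than the panel’s risk profile predicts, so a high-burden panel no longer penalizes the physician.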
We implemented our risk modeling framework with scale in mind, as it is completely generalizable to other outcome models. Consider the timeline in the image below.
In general, there are two time windows associated with a medical outcome model. There is a risk assessment window, over which we determine a patient’s disease burden by looking at which conditions they have been diagnosed with during that window. Following the risk assessment window, there is an observation window, where we look for evidence of a particular outcome. The time scales associated with each of these windows can vary depending on the outcome we are observing. The chronic conditions models require a one-year observation window following the chronic condition diagnosis, and at least six months of risk assessment preceding that diagnosis. A model to predict whether a woman will have a C-section, on the other hand, would use a single day observation window (date of delivery), and the preceding nine months as the risk assessment window. In both cases, there is a clean separation between the risk assessment window and the observation window, which we have labeled as the “episode trigger.” For chronic conditions, the episode trigger is the chronic condition diagnosis; for C-sections, the episode trigger is the onset of labor.
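The two windows anchored at the episode trigger can be expressed as simple date arithmetic. The function below is a minimal sketch (the names and window sizes are illustrative assumptions, not the production implementation), showing how the chronic condition case maps onto the pattern:

```python
from datetime import date, timedelta

def episode_windows(trigger_date, risk_days, observation_days):
    """Return the (start, end) date pairs for the risk assessment window
    preceding an episode trigger and the observation window following it."""
    risk_window = (trigger_date - timedelta(days=risk_days), trigger_date)
    obs_window = (trigger_date, trigger_date + timedelta(days=observation_days))
    return risk_window, obs_window


# Chronic condition model: at least six months of risk assessment before
# the diagnosis, and a one-year observation window after it.
risk, obs = episode_windows(date(2019, 3, 1), risk_days=180, observation_days=365)
print(risk)  # (datetime.date(2018, 9, 2), datetime.date(2019, 3, 1))
```

The C-section model fits the same pattern with `risk_days=270` and `observation_days=1`; only the parameters change, not the framework.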
Given this pattern, our general risk modeling framework requires three inputs per patient:
- episode trigger date;
- risk assessment window (the size of this window is common to all patients); and
- outcome indicator.
Using the episode trigger date and risk assessment window, we can automatically query our claims database for all patient diagnoses that occurred before the episode trigger and within the risk assessment window. These diagnoses, along with some supplemental information (e.g., patient age, gender, expected outcome by zip code), are converted into features and used as inputs to a machine learning model, with the target being the outcome indicator. The model learns how much each diagnosis predisposes a patient to experience the medical outcome. Finally, using this model, we can go back and compute, for each patient, their probability (as of the episode trigger) of experiencing the medical outcome.
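The feature-construction step can be sketched as a one-hot encoding of the diagnoses that fall inside the risk assessment window. The claims rows, diagnosis codes, and vocabulary below are hypothetical stand-ins for the claims database described above:

```python
from datetime import date, timedelta

# Hypothetical claims rows: (patient_id, diagnosis_code, service_date).
CLAIMS = [
    ("p1", "E11", date(2018, 11, 4)),   # type 2 diabetes
    ("p1", "I10", date(2019, 2, 10)),   # hypertension
    ("p1", "J45", date(2017, 1, 2)),    # asthma, predates the risk window
    ("p2", "I10", date(2018, 12, 25)),
]

VOCAB = ["E11", "I10", "J45"]  # diagnosis feature vocabulary

def risk_features(patient_id, trigger_date, risk_days=180):
    """One-hot diagnosis features from claims inside the risk assessment
    window, i.e. on or after (trigger - risk_days) and before the trigger."""
    start = trigger_date - timedelta(days=risk_days)
    seen = {code for pid, code, d in CLAIMS
            if pid == patient_id and start <= d < trigger_date}
    return [1 if code in seen else 0 for code in VOCAB]


# p1's asthma claim falls outside the window, so only E11 and I10 fire.
print(risk_features("p1", date(2019, 3, 1)))  # [1, 1, 0]
```

These vectors, concatenated with the supplemental demographic features, would then be fed to a supervised model trained against the outcome indicator; the choice of learning algorithm is not specified in the post.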
The above risk assessment procedure is provider-agnostic. Indeed, this flavor of risk assessment is useful in contexts where it is necessary to control for patient factors when assessing the performance of providers. There is another context in which we at Grand Rounds assess the probability that a member will experience a medical outcome: matching patients with providers. By augmenting the patient features in the risk assessment model with features specific to the provider administering care, we learn how the choice of provider influences the patient’s probability of experiencing the medical outcome. These models are then used when evaluating the quality of a match between a patient and provider. For each provider considered for the match, we substitute features specific to that provider into the risk assessment model to calculate a unique per-patient, per-provider propensity score estimating the outcome probability if the patient were to select that provider. This flavor of member customization is implemented for each of our chronic condition models, allowing us to highly rank physicians that can minimize complications for a patient’s specific condition.
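The substitution step described above can be sketched as follows. Here `toy_model` is a hypothetical stand-in for the trained risk model (a weighted sum squashed through a sigmoid), and the provider features and weights are illustrative, not real model parameters:

```python
import math

def rank_providers(model_score, patient_features, candidate_providers):
    """Score each candidate provider by substituting their features into
    the risk model, then rank ascending by predicted outcome probability
    (lower predicted PAC probability = better match)."""
    scored = [
        (provider_id, model_score(patient_features + provider_features))
        for provider_id, provider_features in candidate_providers
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Stand-in for a trained model: the first two weights apply to patient
# features, the last two to provider features.
WEIGHTS = [0.8, 0.5, -1.2, 0.3]

def toy_model(features):
    z = sum(w * x for w, x in zip(WEIGHTS, features))
    return 1 / (1 + math.exp(-z))  # logistic squashing to (0, 1)

ranked = rank_providers(toy_model, [1, 1], [("dr_a", [1, 0]), ("dr_b", [0, 1])])
print([pid for pid, _ in ranked])  # ['dr_a', 'dr_b']
```

The same patient features are reused for every candidate, so the only thing varying across scores is the provider, which is exactly what makes the per-patient, per-provider comparison meaningful.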
At Grand Rounds, data science efforts continue to focus on projects that scale clinical impact to our members. Our data science and clinical teams have worked closely to extend these chronic condition models to the specialists treating each condition, and our opioid prescribing model has similarly been built out to cover other addictive medications across multiple physician specialties. Each time we develop or extend a physician model, it integrates quickly and seamlessly into our Matching Algorithm via a recommendation serving platform developed by the infrastructure experts on the Data Science team. Together, these efforts are giving Grand Rounds a comprehensive view of both the strengths and weaknesses of each physician, and we are using these insights to deliver the highest clinical impact for our members.
1. “About Chronic Diseases,” Centers for Disease Control and Prevention, November 2018, https://www.cdc.gov/chronicdisease/about/index.htm.