Grand Rounds Blog

Healthcare in America has been transitioning from the traditional fee-for-service reimbursement model to a value-based care delivery model for over a decade. In this brave new world, providers (physicians, hospitals, and others involved in care delivery within the healthcare system) are evaluated and reimbursed based not solely on the volume of services rendered, but also on patient outcomes$^1$. One obvious and commonly cited problem with assessing provider performance on a given patient outcome metric, e.g., mortality within 30 days after acute myocardial infarction (heart attack), is that the patient cases over which the metric is aggregated are not all alike. For example, a rural male smoker over 65 years of age with a history of high blood pressure likely faces a higher risk of 30-day mortality from a heart attack than a 35-year-old urban female non-smoker with no history of high blood pressure. To put it more plainly, physicians commonly reject such a system as unfair because some patients are sicker than others. Risk is the likelihood that any one patient experiences an outcome (e.g., mortality). Risk adjustment is the process whereby the outcomes of a provider are statistically adjusted to account for the characteristics of that provider’s patient population (or panel) that contribute to risk for a given outcome$^2$. If effective, risk-adjustment allocates more credit to providers when they handle cases where the expected outcome is unfavorable, and less when the expected outcome is favorable. Effective risk-adjustment thus allows a physician whose patients really are sicker than average to be compared fairly with a physician whose patients are healthier than average.

Risk-adjustment itself pre-dates the era of value-based care. When the Health Care Financing Administration (predecessor to CMS) and states such as New York began publishing provider outcomes from various registries in the late 1980s and 1990s, providers protested that these numbers obscured the challenging cases they encountered$^3$. Today, risk-adjustment is as important as ever. In this blog post, we will discuss the importance of risk-adjustment at Grand Rounds, different statistical approaches to risk-adjustment, and some of the challenges faced when using claims data.

At Grand Rounds, our uniquely comprehensive understanding of provider quality is one of the ways in which we help deliver better outcomes to members. Our many quality signals are utilized in our Matching Algorithm to surface the most relevant and highest quality providers to members. One of these signals regarding primary care physicians and specialists includes identifying the rate of potentially avoidable conditions of chronic disease (PACs) among a provider’s patient panel. Risk-adjustment is a key ingredient to appropriately assessing a physician on such a metric. Read this blog post by Senior Data Scientist Peyton Rose to understand the motivation for our PACs metric and how we implemented risk-adjustment in this use case.

## Statistical Approaches to Risk Adjustment

Given a sample of $n$ patients for whom $m$ risk variables $X$ have been measured, along with outcomes $Y$ (=1 if the outcome is observed, =0 otherwise), one can construct a risk model to estimate $\pi_i = Pr(Y_i = 1|X_i)$, where $Y_i$ is the outcome for the $i$th patient and $X_i$ are the risk factors for the same patient. Often, a linear (logistic) model is assumed:

$\pi_i = logit(\alpha + \beta X_i)$

where $logit(Z) = \frac{1}{1+\exp(-Z)}$ (strictly speaking, the inverse of the logit, also called the logistic function), $\alpha$ is the intercept, and $\beta$ represents the $m$ coefficients for the risk variables. The variables are chosen to represent endogenous risk factors, i.e., “fixed effects” such as demographics (age and sex) or clinical history. Each patient is seen by one of the providers, indexed by $p$, for whom the outcome of interest, e.g., 30-day mortality after a heart attack, is being aggregated. The risk-adjustment process entails incorporating a risk model such as the one described above into the reporting of the outcome. Several approaches and variants are used. Perhaps the most common is based on the observed-to-expected (OE) ratio: $OE_p = \overline{Y}\frac{\sum_{i\in S_p}Y_i}{\sum_{i\in S_p}\pi_i}$, where $\overline{Y}$ is the background rate at which the outcome is observed and $S_p$ is the set of patients (panel) seen by provider $p$$^{4,5}$. Scaling the ratio by $\overline{Y}$ expresses the result as a risk-adjusted rate that can be compared directly with the background rate.
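The OE calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the coefficient values, the two binary risk variables, the four-patient panel, and the 10% background rate are all made up for the example, and the function names are hypothetical.

```python
import math

def inv_logit(z):
    """Logistic function 1 / (1 + exp(-z)), written logit(Z) in the text."""
    return 1.0 / (1.0 + math.exp(-z))

def predicted_risk(alpha, beta, x):
    """pi_i for one patient whose risk variables are the vector x."""
    return inv_logit(alpha + sum(b * xi for b, xi in zip(beta, x)))

def risk_adjusted_rate(outcomes, risks, background_rate):
    """OE_p = Ybar * (sum of observed Y_i) / (sum of expected pi_i)."""
    expected = sum(risks)
    if expected == 0:
        raise ValueError("expected count is zero; the ratio is undefined")
    return background_rate * sum(outcomes) / expected

# ASSUMED values, for illustration only.
alpha, beta = -3.0, [0.8, 0.5]               # intercept; (age > 65, smoker)
panel_x = [[1, 1], [1, 0], [0, 0], [0, 1]]   # risk variables per patient
panel_y = [1, 0, 0, 0]                       # observed outcomes per patient

risks = [predicted_risk(alpha, beta, x) for x in panel_x]
oe_p = risk_adjusted_rate(panel_y, risks, background_rate=0.10)
print(f"expected events: {sum(risks):.3f}, risk-adjusted rate: {oe_p:.3f}")
```

A provider whose observed count equals the expected count gets exactly the background rate; more events than expected push the adjusted rate above it.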

One issue with this approach is that the size of a provider’s panel, $|S_p|$, can be small, resulting in variation due to sampling rather than underlying differences in performance. To illustrate, consider the following example: 30-day mortality for Condition I occurs at a rate of 1%; provider A has seen 12 cases of Condition I with 2 mortalities (16.7%) and provider B has seen 20 cases with 1 mortality (5%). Assuming equal risk between providers, these measurements would imply that provider A’s mortality rate is more than three times (>300% of) provider B’s, even though each rate rests on only one or two events.

One approach to systematically account for this undesirable noise is “shrinkage”. A shrinkage estimator is an estimator of a sample mean that incorporates prior information. In risk-adjustment, this often takes the form of drawing (or shrinking) the raw estimate of any provider’s measure towards the mean measure across providers. The Agency for Healthcare Research and Quality (AHRQ) reports quality metrics for hospitals and uses a shrinkage estimator (written in notation consistent with that above): $OE_p^s = w_pOE_p + (1-w_p)\overline{Y}$, where $OE_p^s$ represents the shrinkage estimator for provider $p$ and $w_p$, $0<w_p<1$, is a weight representing confidence in the risk-adjusted estimate $OE_p$$^6$. The lower the confidence, i.e., the lower the value of $w_p$, the greater the influence of $\overline{Y}$ on $OE_p^s$, and vice versa. The value of the weight for a provider is usually chosen based on the number of cases seen by that provider.

In fact, consider a hierarchical model of a provider’s metric $M_p$ in which the metric is assumed to be drawn from a Gaussian $N(\theta_p, \sigma^2_p)$ and the $\theta_p$’s are drawn from a Gaussian $N(\mu, \tau^2)$$^7$. The posterior distribution of $\theta_p$ is then $N(\hat{\theta}_p, \omega_p\sigma^2_p)$, where $\hat{\theta}_p = \omega_pM_p + (1-\omega_p)\mu$ and $\omega_p = \frac{\tau^2}{\tau^2 + \sigma^2_p}$. The posterior mean follows the same form as the one used by AHRQ. The factor $\tau^2$ quantifies the spread in the metric across providers and ultimately the extent of shrinkage in the face of noisy estimates of providers’ performance. In a fully Bayesian analysis, $\tau^2$ can be assumed to follow a prior distribution (e.g., $\text{Inv-}\chi^2$) and its posterior distribution estimated. Alternatively, in the Empirical Bayes framework, a single value of $\tau^2$ can be estimated from the data itself.
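To make the posterior-mean formula concrete, the sketch below applies it to providers A and B from the example. Two inputs are assumptions that do not come from the text: the between-provider variance $\tau^2$ (set here to a standard deviation of one percentage point) and the sampling variance $\sigma^2_p$, which is approximated by the binomial variance of a raw rate at the background rate. Under the equal-risk assumption of the example, the raw rate stands in for the provider’s metric $M_p$.

```python
def shrinkage_weight(tau2, sigma2_p):
    """omega_p = tau^2 / (tau^2 + sigma_p^2)."""
    return tau2 / (tau2 + sigma2_p)

def shrunk_estimate(m_p, sigma2_p, mu, tau2):
    """Posterior mean and variance of theta_p under the Gaussian hierarchy."""
    w = shrinkage_weight(tau2, sigma2_p)
    return w * m_p + (1.0 - w) * mu, w * sigma2_p

mu = 0.01      # background 30-day mortality rate from the example
tau2 = 0.0001  # ASSUMED between-provider variance (std. dev. of 0.01)

def sampling_variance(n_cases):
    # ASSUMPTION: binomial variance of a raw rate, evaluated at the background rate.
    return mu * (1.0 - mu) / n_cases

raw_a, raw_b = 2 / 12, 1 / 20
mean_a, var_a = shrunk_estimate(raw_a, sampling_variance(12), mu, tau2)
mean_b, var_b = shrunk_estimate(raw_b, sampling_variance(20), mu, tau2)

print(f"raw ratio A/B:    {raw_a / raw_b:.2f}")
print(f"shrunk ratio A/B: {mean_a / mean_b:.2f}")
```

With these assumed values, provider A’s smaller panel receives the smaller weight, so its estimate is pulled harder towards the background rate, and the ratio between the two providers falls from about 3.3 to roughly 1.6. A different choice of $\tau^2$ would change how strongly both estimates are shrunk.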

In addition to modeling the fixed effects upon risk, the variation introduced by providers can be modeled jointly, more systematically accounting for shrinkage. In such models, the variation in outcomes introduced by providers is represented with “random” effects rather than fixed effects. Because outcomes are measured for the same provider multiple times, the independent errors assumption of a fixed effects model is violated. Instead, the errors are “clustered” for each provider. Random effects are related to hierarchical models, often discussed in the Bayesian statistics literature. When building a risk-adjustment model with random effects, fixed effects are included as well (the latter representing clinical risk associated with patient cases); these models are referred to as mixed effects models. Using the same inverse-logit form as above for the probability of an adverse outcome, a mixed-effects model can be defined as follows:

$\pi_i = logit(\alpha + u_{0p} + \beta X_i)$,

where $u_{0p}$ is a random intercept estimated for each provider $p$ and is assumed to follow the distribution $u_{0p} \sim N(0,\sigma^2_p)$.
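One way to build intuition for the random intercept is to simulate data from the model. The sketch below draws one intercept per provider and then draws each patient’s outcome from the resulting probability. It only generates data and does not fit the model. The function name, the coefficient values, and the panels are hypothetical, and `sigma_u` denotes the standard deviation of the random intercept (the square root of the variance in the distribution above), assumed here to be the same for every provider.

```python
import math
import random

def inv_logit(z):
    """Logistic function 1 / (1 + exp(-z)), written logit(Z) in the text."""
    return 1.0 / (1.0 + math.exp(-z))

def simulate_outcomes(alpha, beta, sigma_u, panels, seed=0):
    """Simulate Y_i from pi_i = inv_logit(alpha + u_0p + beta . X_i).

    panels maps a provider id to a list of patient risk-variable vectors.
    Returns (random intercepts per provider, outcomes per provider).
    """
    rng = random.Random(seed)
    intercepts, outcomes = {}, {}
    for provider, patients in panels.items():
        u = rng.gauss(0.0, sigma_u)  # one draw of u_0p per provider
        intercepts[provider] = u
        ys = []
        for x in patients:
            pi = inv_logit(alpha + u + sum(b * xi for b, xi in zip(beta, x)))
            ys.append(1 if rng.random() < pi else 0)
        outcomes[provider] = ys
    return intercepts, outcomes

# ASSUMED values, for illustration only.
panels = {
    "provider_a": [[1, 1], [0, 1], [1, 0]],
    "provider_b": [[0, 0], [0, 1], [1, 1], [0, 0]],
}
intercepts, outcomes = simulate_outcomes(-2.0, [0.8, 0.5], 0.5, panels, seed=42)
```

Patients of the same provider share the same draw of `u`, which is what makes their outcomes correlated (“clustered”) even after conditioning on their own risk variables.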

Random effects can also be nested, e.g., physicians practice within hospitals, so that variation can be shared across levels when appropriate; such models are known as multilevel models. When more data is available for a given provider (or hospital), the effect of pooling is diminished, and vice versa. Pooling increases the bias in outcome estimates but decreases the variance, which stabilizes estimates made for providers with few observations$^8$.

One of the most salient challenges in risk-adjustment is acquiring and structuring sufficient data to build a predictive risk model. Claims data has the advantage of being structured and readily generated as part of clinical practice, but it can convey only as much clinical nuance as the coding systems (e.g., ICD-10 or HCPCS) allow, and it is biased towards clinical factors that are reimbursable. Because claims data is widely available, datasets for building risk models can be large, increasing statistical power to discover relationships between clinical factors and risk. While physical measurements are not recorded in claims, studies of risk scores computed from chart versus claims data for the same patients show comparable prognostic ability. By integrating multiple sources of information, we can build more powerful risk models.

Another challenge lies in varying sample sizes between providers. While shrinkage certainly attempts to address this, the problem of choosing the hyperparameters ($w_p$ or $\tau^2$ above) is still open. Frequentist (Maximum Likelihood) and Empirical Bayes approaches attempt to learn these from the data but can end up shrinking too much, i.e., high performers look more average, as do low performers. On the other hand, a fully Bayesian approach relies on a subjective prior. No panacea exists; each modeling decision must be considered within the context of the question being asked.
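The over-shrinking problem is easy to reproduce with a simple Empirical Bayes estimate. The sketch below uses a method-of-moments estimate of $\tau^2$ (the sample variance of the provider metrics minus the average sampling variance, truncated at zero) and the unweighted mean of the metrics as the estimate of $\mu$. Both are simplifying choices made for this illustration, not necessarily what any of the cited works use, and the input numbers are invented.

```python
from statistics import mean, variance

def moment_estimate_tau2(metrics, sampling_vars):
    """tau^2 estimate: sample variance of M_p minus mean sigma_p^2, floored at 0."""
    return max(0.0, variance(metrics) - mean(sampling_vars))

def empirical_bayes_shrink(metrics, sampling_vars):
    """Shrink each M_p towards the unweighted mean with the estimated tau^2.

    Assumes every sampling variance is strictly positive.
    """
    mu_hat = mean(metrics)
    tau2 = moment_estimate_tau2(metrics, sampling_vars)
    shrunk = []
    for m_p, sigma2_p in zip(metrics, sampling_vars):
        w = tau2 / (tau2 + sigma2_p)
        shrunk.append(w * m_p + (1.0 - w) * mu_hat)
    return tau2, shrunk

# ASSUMED inputs: three providers with noisy metrics (large sampling variances).
metrics = [0.02, 0.05, 0.08]
noisy_vars = [0.01, 0.01, 0.01]
tau2_noisy, shrunk_noisy = empirical_bayes_shrink(metrics, noisy_vars)

# Same metrics measured precisely (small sampling variances).
precise_vars = [0.0001, 0.0001, 0.0001]
tau2_precise, shrunk_precise = empirical_bayes_shrink(metrics, precise_vars)
```

In the noisy case the observed spread is smaller than the sampling noise alone would predict, so the estimate of $\tau^2$ is truncated to zero and every provider collapses onto the mean, erasing any real difference between them. In the precise case most of the spread survives.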

## Looking Forward

As we develop more quality signals, our risk-adjustment approaches will evolve and improve. In addition, as we deliver value to members, the Data Science team will demonstrate the causal impact of these signals used in our Matching Algorithm. To do this, we will use another statistical technique known as cohort matching, which, like risk-adjustment, tries to equalize the clinical risk factors when comparing populations of patients but is intended to assess the causal impact of a particular intervention rather than the clinical impact of a provider$^9$. Look out for an exposition in a subsequent blog post!

Risk adjustment is just one of the many challenges within Data Science for Healthcare that make this domain such an exciting one to work in. Given both the questions we ask and the nature of the data, we often can’t directly apply methods that work in other fields. But this is why we do what we do…we get to be creative to solve hard problems that stand to make a huge impact in patients’ lives.

Want to solve healthcare’s hardest problems with an award-winning team? Join us!

$^1$ Pham H. and Ginsburg P. “Payment and Delivery-System Reform — The Next Phase”, NEJM Catalyst, October 2018
$^2$ Shahian D. and Normand S. “Comparison of “Risk-Adjusted” Hospital Outcomes”, Health Services and Outcomes Research, November 2007
$^3$ Hannan E. et al. “The New York State Cardiac Registries: History, Contributions, Limitations, and Lessons for Future Efforts to Assess and Publicly Report Healthcare Outcomes”, Journal of American College of Cardiology, June 2012
$^4$ Bottle A. and Aylin P. “Risk-Adjustment Principles and Methods”. Statistical Methods for Healthcare Performance Monitoring 1st Edition, August 2016.
$^5$ Ryan A. et al. “What Is the Best Way to Estimate Hospital Quality Outcomes? A Simulation Approach”, Health Services Research, August 2012
$^6$ Applying the AHRQ Quality Indicators to Hospital Data
$^7$ Jones H. and Spiegelhalter D. “The Identification of “Unusual” Health-Care Providers From a Hierarchical Model”, The American Statistician, August 2011
$^8$ Gelman A. et al. Bayesian Data Analysis, November 2013
$^9$ Shahian D. and Normand S. “Comparison of “Risk-Adjusted” Hospital Outcomes”, Health Services and Outcomes Research, November 2007
