Targeted Superlearning
In IPD meta-analysis, univariate moderator analyses are commonly used to identify patients with lower or higher treatment effects. However, their ability to do so may often be limited. The true mechanism that determines how much patients benefit from a psychological intervention is usually unknown; but it will likely depend on a complex interplay of multiple variables that is not easily captured by a simple treatment-covariate interaction. In this sense, even the approaches used in Articles 3 and 4 to model individualized treatment benefits are still fairly restricted: they assume that HTE can be sufficiently explained by one pre-defined parametric model, or a single data-adaptive algorithm.
To this end, in Article 6 (“Estimating Effect Variability Using Targeted Superlearning”), I explore HTE with models that can detect more complex treatment-covariate interactions, based on the targeted superlearning framework (Naimi & Balzer, 2018; Phillips et al., 2023; Polley et al., 2011). I employ this approach in a comprehensive IPD database of preventive intervention trials for subthreshold depression, and examine effects on depressive symptom severity up to 24 months. This section provides some further background and technical details on this methodology, as well as on ancillary software developed as part of the study. The same technical descriptions may also be found, along with other details, in the preregistration of the analysis (Harrer, Sprenger, et al., 2023).
Superlearning (also known as stacked regression; Breiman, 1996) is an ensemble machine learning method that combines multiple prediction algorithms based on their performance in a data set, which is determined using V-fold cross-validation. Prediction algorithms employed in a superlearner typically include both simple linear regression models as well as more “aggressive”, data-adaptive algorithms (e.g., boosting models, support vector machines, neural nets). By combining multiple learners, a superlearner has been shown to asymptotically perform as well as or better than the best-performing individual algorithm included in the framework (“oracle” property; Van der Laan et al., 2007). This method has also been found to be robust across various contexts, and even in very small samples (Montoya et al., 2022; Polley et al., 2011). A key motivation behind superlearning is that the true data-generating process behind observed data is often unknown and may be complex; the framework ensures that the ensemble converges to the best possible solution for the learning task at hand.
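To make the mechanics concrete, here is a minimal, self-contained Python sketch of the superlearning idea: two illustrative candidate learners (a mean-only model and a univariate least-squares line) are evaluated with V-fold cross-validation, and a convex weight minimizing the cross-validated squared error is found by grid search. All function names and data are invented for illustration; the actual study used a much richer learner library.

```python
# Toy superlearner: V-fold cross-validation to weight two candidate learners.
# Everything here is illustrative, not the thesis' implementation.
import random

def mean_learner(train_x, train_y):
    # predicts the training mean regardless of input
    m = sum(train_y) / len(train_y)
    return lambda x: m

def linear_learner(train_x, train_y):
    # univariate least-squares line y = a + b*x
    n = len(train_x)
    mx = sum(train_x) / n
    my = sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x) or 1e-12
    b = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def superlearner_weights(xs, ys, learners, v=5):
    # partition indices into V folds
    folds = [list(range(i, len(xs), v)) for i in range(v)]
    # out-of-fold ("level-one") predictions for each candidate learner
    z = [[0.0] * len(xs) for _ in learners]
    for fold in folds:
        hold = set(fold)
        tx = [x for i, x in enumerate(xs) if i not in hold]
        ty = [y for i, y in enumerate(ys) if i not in hold]
        for j, make in enumerate(learners):
            fit = make(tx, ty)
            for i in fold:
                z[j][i] = fit(xs[i])
    # grid search over convex weights (w, 1-w) for the two learners
    best_w, best_err = 0.0, float("inf")
    for k in range(101):
        w = k / 100
        err = sum((w * z[0][i] + (1 - w) * z[1][i] - ys[i]) ** 2
                  for i in range(len(xs)))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

random.seed(1)
xs = [i / 10 for i in range(50)]
ys = [2 + 0.8 * x + random.gauss(0, 0.1) for x in xs]
# data are truly linear, so the mean-only learner should get ~zero weight
w_mean = superlearner_weights(xs, ys, [mean_learner, linear_learner])
print(w_mean)
```

In practice, the cross-validated weights are usually obtained via non-negative least squares rather than grid search, but the logic is the same: learners that predict held-out data poorly are down-weighted in the ensemble.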
In a treatment setting, the superlearner framework can also be combined with targeted maximum likelihood estimation (TMLE) (Van der Laan & Rubin, 2006) to estimate optimal dynamic treatment rules (ODTRs) (Murphy, 2003). An ODTR assigns each patient to the optimal (i.e., most effective) treatment based on their pre-test values. This is typically achieved by first estimating the “blip function” (Robins, 2004). This function uses a patient’s set of pre-test values as input and then returns an individualized treatment effect estimate (the “blip”), as derived by the superlearner. While often used as an intermediary step to derive the ODTR, blip distributions themselves can also be of substantive interest (Montoya et al., 2022). For instance, they allow one to examine how much patient benefits may vary within a study, who benefits most, and which patients may experience negative effects (i.e., fare worse under treatment than under control).
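As an illustration of the blip and ODTR logic, the following Python sketch uses a made-up fitted outcome regression (standing in for the superlearner fit) to compute blips and derive the corresponding treatment rule. All quantities and the functional form of `q_hat` are hypothetical.

```python
# Illustrative blip/ODTR computation with an invented outcome regression.

def q_hat(a, w):
    # hypothetical fitted regression E[Y | A=a, W=w]; in the study this
    # would come from the superlearner ensemble
    baseline_severity, age = w
    benefit = 0.5 - 0.01 * age + 0.02 * baseline_severity
    return baseline_severity - a * benefit

def blip(w):
    # predicted individualized effect; here, higher = larger symptom reduction
    return q_hat(0, w) - q_hat(1, w)

def odtr(w):
    # optimal dynamic treatment rule: treat iff the predicted benefit > 0
    return 1 if blip(w) > 0 else 0

# patients described by (baseline severity, age)
patients = [(12, 30), (5, 70), (20, 25)]
print([round(blip(w), 2) for w in patients])  # [0.44, -0.1, 0.65]
print([odtr(w) for w in patients])            # [1, 0, 1]
```

The second patient illustrates the substantive point made above: a negative blip flags someone who is predicted to fare worse under treatment than under control, and the ODTR therefore withholds treatment.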
For each study, targeted superlearning was therefore used to estimate the blip distribution across all patients in the trial. Since superlearners were constructed separately for each study, the maximum amount of baseline information included in the IPD database could be used. A broad selection of candidate algorithms was considered within the ensemble, ranging from linear models to more complex machine learning architectures (see Table 1 in Harrer, Sprenger, et al., 2023, for an overview). The distribution of blips, expressed as patient-specific standardized mean differences (SMD), was then presented visually to gauge how strongly individual effects may differ within and across studies. Based on the blips, I also calculated the meta-analytic proportion of patients who experience (i) negative effects, or (ii) only clinically negligible benefits of the preventive interventions (defined as individualized SMDs < 0.24; Cuijpers et al., 2014). Lastly, I constructed an interactive web app that uses the estimated blips to generate individualized meta-analytic effect estimates for specific populations or individuals (e.g., women under the age of 40 with PHQ-9 scores below 6, or any other combination of patient characteristics that were assessed across studies).
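Given a vector of estimated blips (expressed as individualized SMDs), the two summary proportions described above reduce to simple threshold counts. A toy example in Python, with invented blip values and the 0.24 cutoff cited in the text:

```python
# Proportion of patients with negative and clinically negligible effects,
# computed from hypothetical individualized SMDs (higher = greater benefit).
blips = [-0.10, 0.05, 0.18, 0.30, 0.45, 0.22, -0.02, 0.60, 0.15, 0.33]

p_negative = sum(b < 0 for b in blips) / len(blips)
p_negligible = sum(0 <= b < 0.24 for b in blips) / len(blips)

print(p_negative)    # 0.2
print(p_negligible)  # 0.4
```

In the study itself, these study-level proportions were then pooled meta-analytically rather than computed on a single vector as here.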
I now provide a more technical elaboration of the approach. With slight adaptations, I follow the notation used in Montoya et al. (2022, 2023). Assume that the individual participant data of a given trial consist of $n$ independent and identically distributed observations of

$$O = (W, A, Y) \sim P_0,$$

where $W$ denotes a patient's vector of pre-test (baseline) covariates, $A \in \{0, 1\}$ the treatment assignment, and $Y$ the observed outcome (here, depressive symptom severity). The blip function (Robins, 2004) is then defined as

$$b(W) = \mathbb{E}_0\left(Y^{1} - Y^{0} \mid W\right),$$

where $Y^{a}$ denotes the counterfactual outcome that would have been observed under treatment assignment $A = a$.
This definition shows that the blip is analogous to the conditional average treatment effect (CATE; Vegetabile, 2021): the counterfactual treatment effect for individuals with identical covariate values $W = w$.
For RCTs, this doubly robust estimator guarantees that effect estimates remain consistent even if the outcome regressions are misspecified, since the treatment assignment mechanism is known by design.
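The doubly robust idea can be illustrated with a small simulation: under a known randomization probability, an AIPW-type estimator recovers the true average effect even when the outcome regressions are deliberately misspecified (here, constants). This is a generic sketch of double robustness, not the study's TMLE implementation, and all data below are simulated.

```python
# Doubly robust (AIPW-style) estimation in a simulated RCT.
import random

random.seed(7)
n = 2000
g = 0.5  # known randomization probability (by trial design)
w = [random.gauss(0, 1) for _ in range(n)]
a = [1 if random.random() < g else 0 for _ in range(n)]
# the true treatment effect is exactly 0.3 for everyone
y = [0.3 * ai + wi + random.gauss(0, 0.5) for ai, wi in zip(a, w)]

# deliberately misspecified outcome regressions: constants that ignore W
q1 = sum(yi for yi, ai in zip(y, a) if ai == 1) / sum(a)
q0 = sum(yi for yi, ai in zip(y, a) if ai == 0) / (n - sum(a))

# AIPW estimator of the average treatment effect
psi = sum(
    ai / g * (yi - q1) - (1 - ai) / (1 - g) * (yi - q0) + (q1 - q0)
    for ai, yi in zip(a, y)
) / n
print(round(psi, 2))  # close to the true effect of 0.3
```

Because $g$ is known in an RCT, the inverse-probability terms correct any bias of the misspecified outcome regressions, which is the property the text refers to.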

I also estimated the proportion of patients with negative and clinically negligible effects, defined as

$$\theta_{\text{neg}} = \mathbb{E}_0\left\{\mathbb{1}\big(b(W) < 0\big)\right\},$$

where $\mathbb{1}(\cdot)$ denotes the indicator function, and

$$\theta_{\text{negl}} = \mathbb{E}_0\left\{\mathbb{1}\big(b(W) < 0.24\big)\right\},$$

where the cutoff of 0.24 corresponds to the minimally important individualized SMD (Cuijpers et al., 2014).