Estimating heterogeneous treatment effects (HTEs) from survival data is central to precision medicine and individualized policy. However, the complexities inherent in survival analysis (censoring, unobserved counterfactuals, and intricate identification assumptions) have led to inconsistent and fragmented evaluation practices for existing HTE estimation methods. This paper introduces SurvHTE-Bench, the first comprehensive benchmark designed to address this gap.
Bridging the Evaluation Chasm in Survival HTE
SurvHTE-Bench is a step toward standardizing how methods for estimating heterogeneous treatment effects are evaluated on censored survival data. Previously, survival HTE estimation lacked unified assessment protocols, which hindered direct comparison between methods and slowed progress. The benchmark provides a standardized framework spanning synthetic, semi-synthetic, and real-world datasets, enabling more rigorous and reproducible comparison of current and future survival HTE methods under diverse conditions and realistic assumption violations.
A Multi-faceted Benchmark for Robust Causal Inference
SurvHTE-Bench is organized as a modular suite with three components. The synthetic datasets systematically vary causal assumptions and survival dynamics and come with known ground truth. The semi-synthetic datasets combine real-world covariates with simulated treatments and outcomes. The real-world datasets come from a twin study, which has known ground truth, and from an HIV clinical trial. Together, these components allow the first rigorous comparison of established methods, such as Causal Survival Forests and survival meta-learners, across a range of challenging scenarios. The benchmark is intended as a shared tool for researchers and developers working on causal survival analysis.
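To make the idea of a synthetic survival dataset with known ground truth concrete, here is a minimal sketch in Python. It is not the benchmark's actual data-generating process: the function name `simulate_survival_hte`, the exponential event-time model, the coefficients, the censoring rate, and the choice of restricted mean survival time (RMST) as the estimand are all illustrative assumptions. The point is only that both potential event times are simulated, so the true effect for each unit is available even though the observed data are confounded and censored.

```python
import numpy as np


def simulate_survival_hte(n=1000, horizon=5.0, seed=0):
    """Toy generator (hypothetical, not SurvHTE-Bench's own) for censored
    survival data with a known heterogeneous treatment effect."""
    rng = np.random.default_rng(seed)

    # Two standard-normal covariates per unit.
    x = rng.normal(size=(n, 2))

    # Treatment assignment depends on x[:, 0], so the data are confounded.
    propensity = 1.0 / (1.0 + np.exp(-0.5 * x[:, 0]))
    w = rng.binomial(1, propensity)

    # Exponential hazards under control and treatment (assumed forms).
    # The treatment effect varies with x[:, 1], which makes it heterogeneous.
    rate0 = np.exp(-1.0 + 0.3 * x[:, 0])
    rate1 = rate0 * np.exp(-0.5 - 0.4 * x[:, 1])

    # Both potential event times are drawn; only one is observed per unit.
    t0 = rng.exponential(1.0 / rate0)
    t1 = rng.exponential(1.0 / rate1)
    t = np.where(w == 1, t1, t0)

    # Independent exponential censoring with mean 5 (assumed).
    c = rng.exponential(5.0, size=n)
    time = np.minimum(t, c)          # observed follow-up time
    event = (t <= c).astype(int)     # 1 = event observed, 0 = censored

    # Closed-form RMST up to `horizon` for an exponential event time:
    # E[min(T, h)] = (1 - exp(-rate * h)) / rate.
    def rmst(rate):
        return (1.0 - np.exp(-rate * horizon)) / rate

    true_cate = rmst(rate1) - rmst(rate0)

    return {"x": x, "w": w, "time": time, "event": event,
            "true_cate": true_cate, "rate0": rate0, "rate1": rate1}


data = simulate_survival_hte(n=1000)
print("fraction censored:", 1.0 - data["event"].mean())
print("mean true RMST effect:", data["true_cate"].mean())
```

An estimator would see only `x`, `w`, `time`, and `event`; its per-unit predictions could then be scored against `true_cate`, for example by mean squared error. Varying the propensity model or the censoring mechanism in such a generator is one way to probe assumption violations of the kind the benchmark targets.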