TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery
Introduced in: TimeGraph: Synthetic Benchmark Datasets for Robust Time-Series Causal Discovery (2025)
TimeGraph is a suite of synthetic datasets designed to benchmark causal discovery algorithms on time-series data. The datasets reproduce real-world complexities by incorporating temporal dynamics such as trends, seasonality, and nonstationarity, as well as sampling challenges including irregular time intervals and structured missingness. They feature diverse noise types, including Gaussian, heavy-tailed, and heteroskedastic noise, and support scenarios with latent confounding so that methods can be evaluated on partially observed systems. The underlying causal structures span both linear and nonlinear relationships, including polynomial and trigonometric forms.
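To make these ingredients concrete, the following is a minimal sketch of how a toy series with some of these properties could be generated. It is not the TimeGraph generator: the function name `simulate_series`, the three-variable graph, the coefficients, and the location of the missing block are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_series(n_steps=500, seed=0):
    """Hypothetical toy generator (not the TimeGraph code).

    Assumed causal graph: X0 -> X1 (linear, lag 1), X1 -> X2 (sine, lag 2).
    Returns the observed array and the set of true (cause, effect, lag) links.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)

    # Noise: heavy-tailed Student-t for X0, Gaussian for X1 and X2.
    noise = np.column_stack([
        rng.standard_t(df=3, size=n_steps),
        rng.normal(size=n_steps),
        rng.normal(size=n_steps),
    ])

    x = np.zeros((n_steps, 3))
    for i in range(2, n_steps):
        x[i, 0] = 0.5 * x[i - 1, 0] + noise[i, 0]
        # Linear lag-1 effect of X0 on X1.
        x[i, 1] = 0.4 * x[i - 1, 1] + 0.8 * x[i - 1, 0] + noise[i, 1]
        # Trigonometric lag-2 effect of X1 on X2, with heteroskedastic noise
        # whose scale grows with |X1| at the previous step.
        scale = 0.5 + 0.1 * abs(x[i - 1, 1])
        x[i, 2] = 0.3 * x[i - 1, 2] + np.sin(x[i - 2, 1]) + scale * noise[i, 2]

    # Add a linear trend and a seasonal cycle (period 50) to every variable.
    observed = x + 0.01 * t[:, None] + np.sin(2 * np.pi * t / 50)[:, None]

    # Structured missingness: a contiguous gap in X1.
    observed[100:120, 1] = np.nan

    # Ground-truth cross-variable links as (cause, effect, lag) tuples.
    truth = {(0, 1, 1), (1, 2, 2)}
    return observed, truth

data, truth = simulate_series()
print(data.shape, int(np.isnan(data).sum()))
```

Because the ground-truth links are returned alongside the data, any discovery method run on `data` can be scored directly against `truth`.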
The motivation behind TimeGraph is to address the current lack of robust and realistic benchmarks in time-series causal discovery. Existing datasets often overlook the intricate challenges observed in real-world domains such as Earth system science, healthcare, and economics.
TimeGraph serves as a unified testbed for four purposes: comparing linear and nonlinear causal discovery methods (e.g., PCMCI+, Granger causality, and NOTEARS); evaluating algorithmic robustness to missing data and irregular sampling; training and validating deep causal models and representation learning frameworks; and analyzing the sensitivity of methods to confounding and autocorrelation.
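Comparing methods on a synthetic benchmark usually comes down to scoring the links each method recovers against the known ground truth. The sketch below shows one common way to do this with precision, recall, and F1. The `(cause, effect, lag)` tuple encoding and the function name `score_edges` are assumptions made for illustration, not a format or metric prescribed by TimeGraph.

```python
def score_edges(predicted, truth):
    """Precision, recall, and F1 of predicted lagged links against the truth.

    Both arguments are iterables of hypothetical (cause, effect, lag) tuples.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative values only: one correct link and one spurious link.
true_links = {(0, 1, 1), (1, 2, 2)}
found_links = {(0, 1, 1), (0, 2, 1)}
precision, recall, f1 = score_edges(found_links, true_links)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

Scoring on exact `(cause, effect, lag)` matches is strict; a looser variant could ignore the lag and compare only cause-effect pairs, which is sometimes preferred when sampling is irregular.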