LSCD · Lomb–Scargle Conditioned Diffusion for Time Series Imputation

International Conference on Machine Learning (ICML 2025)
¹J.P. Morgan AI Research · ²University of Cambridge · ³UBA & CONICET
*equal contribution

Abstract

Time series imputation with missing or irregularly sampled data is a persistent challenge in machine learning. Most frequency-domain methods rely on the Fast Fourier Transform (FFT), which assumes uniform sampling and therefore requires interpolation or imputation prior to frequency estimation. We propose a novel diffusion-based imputation approach (LSCD) that leverages Lomb–Scargle periodograms to robustly handle missing and irregular samples, without requiring interpolation or imputation in the frequency domain. Our method trains a score-based diffusion model conditioned on the entire signal spectrum, enabling direct use of irregularly spaced observations. Experiments on synthetic and real-world benchmarks demonstrate that our method recovers missing data more accurately than purely time-domain baselines, while simultaneously producing consistent frequency estimates. Crucially, our framework paves the way for broader adoption of Lomb–Scargle methods in machine learning tasks involving irregular data.

Lomb-Scargle

FFT vs Lomb-Scargle: the FFT assumes uniform sampling, so missing or irregularly sampled data must first be interpolated onto a uniform grid. Lomb-Scargle operates on the irregular samples directly, avoiding interpolation and yielding more accurate frequency estimates.
Figure 1: Density of leading frequencies for FFT and Lomb-Scargle (LS) on a synthetic sine dataset. (Left) Fully observed time series. (Right) Time series with 75% missing data. The interpolation required by FFT significantly distorts the spectral distribution, whereas LS better preserves the original frequency structure.
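The contrast in Figure 1 can be reproduced with off-the-shelf tools. A minimal sketch using `scipy.signal.lombscargle` (not the paper's code): the FFT route must first interpolate the irregular observations back onto a uniform grid, while Lomb-Scargle consumes them as-is.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

# Irregularly sampled 5 Hz sine: keep ~25% of a uniform grid (75% missing).
f0 = 5.0
t_full = np.linspace(0.0, 10.0, 1000)
keep = rng.random(t_full.size) < 0.25
t, y = t_full[keep], np.sin(2 * np.pi * f0 * t_full[keep])

# FFT route: interpolate back onto the uniform grid first (the step that
# distorts the spectrum), then take the periodogram.
y_interp = np.interp(t_full, t, y)
fft_freqs = np.fft.rfftfreq(t_full.size, d=t_full[1] - t_full[0])
fft_power = np.abs(np.fft.rfft(y_interp)) ** 2
f_fft = fft_freqs[np.argmax(fft_power[1:]) + 1]  # skip the DC bin

# Lomb-Scargle route: works on (t, y) directly, no interpolation.
# scipy expects angular frequencies, so convert Hz -> rad/s.
ls_freqs = np.linspace(0.1, 10.0, 500)
ls_power = lombscargle(t, y, 2 * np.pi * ls_freqs)
f_ls = ls_freqs[np.argmax(ls_power)]

print(f"FFT peak: {f_fft:.2f} Hz, Lomb-Scargle peak: {f_ls:.2f} Hz")
```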

Model Architecture

Figure 2: Diagram of our Lomb–Scargle Conditioned Diffusion (LSCD) approach for time series imputation.

Synthetic-Sines Benchmark

| Miss type | Miss rate | Metric | Mean | Lerp | BRITS | GP-VAE | US-GAN | TimesNet | CSDI | SAITS | ModernTCN | LSCD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MCAR | 10% | MAE | 1.380 | 1.305 | 0.943 | 1.399 | 0.933 | 1.220 | 1.336 | 0.885 | 0.973 | 0.765 |
| | | RMSE | 1.947 | 1.991 | 1.657 | 1.986 | 1.636 | 1.803 | 1.889 | 1.569 | 1.727 | 1.453 |
| | | S-MAE | 0.081 | 0.081 | 0.052 | 0.082 | 0.053 | 0.069 | 0.008 | 0.043 | 0.049 | 0.003 |
| | 50% | MAE | 1.373 | 1.449 | 1.095 | 1.383 | 1.152 | 1.481 | 1.359 | 1.041 | 1.129 | 0.975 |
| | | RMSE | 1.930 | 2.070 | 1.759 | 1.950 | 1.845 | 2.017 | 1.922 | 1.699 | 1.817 | 1.658 |
| | | S-MAE | 0.264 | 0.324 | 0.170 | 0.266 | 0.191 | 0.239 | 0.027 | 0.159 | 0.173 | 0.014 |
| | 90% | MAE | 1.375 | 1.586 | 1.320 | 1.377 | 1.369 | 1.579 | 1.361 | 1.292 | 1.360 | 1.271 |
| | | RMSE | 1.935 | 2.142 | 1.899 | 1.938 | 1.970 | 2.143 | 1.925 | 1.878 | 1.963 | 1.870 |
| | | S-MAE | 0.439 | 0.572 | 0.383 | 0.439 | 0.407 | 0.462 | 0.044 | 0.375 | 0.406 | 0.036 |
| Sequence | 10% | MAE | 1.353 | 1.542 | 1.330 | 1.355 | 1.384 | 1.391 | 1.413 | 1.323 | 1.329 | 1.359 |
| | | RMSE | 1.905 | 2.092 | 1.915 | 1.908 | 1.995 | 1.959 | 1.988 | 1.890 | 1.931 | 1.962 |
| | | S-MAE | 0.055 | 0.075 | 0.056 | 0.054 | 0.061 | 0.062 | 0.006 | 0.055 | 0.056 | 0.005 |
| | 50% | MAE | 1.374 | 1.564 | 1.347 | 1.376 | 1.393 | 1.467 | 1.378 | 1.342 | 1.354 | 1.316 |
| | | RMSE | 1.934 | 2.115 | 1.928 | 1.936 | 1.999 | 2.038 | 1.943 | 1.917 | 1.960 | 1.913 |
| | | S-MAE | 0.271 | 0.369 | 0.269 | 0.271 | 0.297 | 0.321 | 0.028 | 0.268 | 0.277 | 0.026 |
| | 90% | MAE | 1.386 | 1.573 | 1.362 | 1.388 | 1.403 | 1.489 | 1.372 | 1.352 | 1.375 | 1.313 |
| | | RMSE | 1.946 | 2.127 | 1.941 | 1.949 | 2.007 | 2.062 | 1.943 | 1.929 | 1.982 | 1.913 |
| | | S-MAE | 0.288 | 0.389 | 0.286 | 0.288 | 0.305 | 0.338 | 0.029 | 0.283 | 0.292 | 0.027 |
| Block | 10% | MAE | 1.306 | 1.507 | 1.255 | 1.309 | 1.334 | 1.379 | 1.304 | 1.268 | 1.275 | 1.259 |
| | | RMSE | 1.807 | 2.014 | 1.786 | 1.811 | 1.885 | 1.898 | 1.804 | 1.785 | 1.825 | 1.774 |
| | | S-MAE | 0.105 | 0.146 | 0.100 | 0.104 | 0.116 | 0.124 | 0.011 | 0.103 | 0.106 | 0.010 |
| | 50% | MAE | 1.306 | 1.505 | 1.279 | 1.308 | 1.333 | 1.451 | 1.314 | 1.285 | 1.309 | 1.269 |
| | | RMSE | 1.815 | 2.014 | 1.806 | 1.817 | 1.881 | 1.978 | 1.835 | 1.804 | 1.852 | 1.810 |
| | | S-MAE | 0.287 | 0.383 | 0.278 | 0.286 | 0.306 | 0.344 | 0.029 | 0.285 | 0.296 | 0.027 |
| | 90% | MAE | 1.339 | 1.523 | 1.319 | 1.340 | 1.359 | 1.506 | 1.329 | 1.320 | 1.356 | 1.320 |
| | | RMSE | 1.868 | 2.052 | 1.862 | 1.869 | 1.927 | 2.054 | 1.874 | 1.859 | 1.916 | 1.870 |
| | | S-MAE | 0.359 | 0.473 | 0.351 | 0.358 | 0.376 | 0.439 | 0.036 | 0.358 | 0.374 | 0.035 |

Table 1: Imputation performance on the Synthetic-Sines benchmark with three missing-data patterns: MCAR (pointwise random), Sequence gaps, and Block outages, each tested at 10%, 50% and 90% missing rates. For every setting we report MAE↓ and RMSE↓ in the time domain and S-MAE↓ in the spectral domain.
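The S-MAE column measures imputation error in the spectral domain. A minimal sketch of such a metric, assuming S-MAE is the mean absolute error between normalized Lomb-Scargle spectra of the ground-truth and imputed signals; `spectral_mae` is a hypothetical helper, and the paper's exact normalization may differ.

```python
import numpy as np
from scipy.signal import lombscargle

def spectral_mae(t, y_true, y_imputed, freqs):
    """MAE between normalized Lomb-Scargle power spectra of two signals
    sampled at the same (possibly irregular) timestamps t."""
    w = 2 * np.pi * np.asarray(freqs)  # scipy expects angular frequencies
    p_true = lombscargle(t, y_true, w, normalize=True)
    p_imp = lombscargle(t, y_imputed, w, normalize=True)
    # Normalize each spectrum to sum to 1 so the metric compares shapes.
    p_true = p_true / p_true.sum()
    p_imp = p_imp / p_imp.sum()
    return np.mean(np.abs(p_true - p_imp))

# Sanity check: a perfect imputation gives S-MAE = 0.
t = np.sort(np.random.default_rng(0).uniform(0, 10, 200))
y = np.sin(2 * np.pi * 5.0 * t)
freqs = np.linspace(0.1, 10.0, 100)
print(spectral_mae(t, y, y, freqs))  # -> 0.0
```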

Real Imputation Benchmarks

| Dataset | Miss rate | Metric | Mean | Lerp | BRITS | GP-VAE | US-GAN | TimesNet | CSDI | SAITS | ModernTCN | LSCD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PhysioNet | 10% | MAE | 0.714 | 0.372 | 0.278 | 0.469 | 0.323 | 0.375 | 0.219 | 0.232 | 0.351 | 0.211 |
| | | RMSE | 1.035 | 0.708 | 0.693 | 0.783 | 0.662 | 0.690 | 0.545 | 0.583 | 0.697 | 0.494 |
| | | S-MAE | 0.032 | 0.020 | 0.016 | 0.026 | 0.020 | 0.022 | 0.013 | 0.014 | 0.020 | 0.012 |
| | 50% | MAE | 0.711 | 0.417 | 0.385 | 0.521 | 0.449 | 0.453 | 0.307 | 0.315 | 0.440 | 0.303 |
| | | RMSE | 1.091 | 0.840 | 0.833 | 0.907 | 0.852 | 0.840 | 0.672 | 0.735 | 0.803 | 0.664 |
| | | S-MAE | 0.111 | 0.087 | 0.064 | 0.083 | 0.076 | 0.076 | 0.052 | 0.055 | 0.071 | 0.052 |
| | 90% | MAE | 0.710 | 0.565 | 0.560 | 0.642 | 0.670 | 0.642 | 0.481 | 0.565 | 0.647 | 0.479 |
| | | RMSE | 1.097 | 0.993 | 0.975 | 1.038 | 1.060 | 1.031 | 0.834 | 0.971 | 1.026 | 0.832 |
| | | S-MAE | 0.148 | 0.189 | 0.104 | 0.124 | 0.125 | 0.131 | 0.093 | 0.108 | 0.137 | 0.093 |
| PM 2.5 | 10% | MAE | 50.685 | 15.363 | 16.519 | 23.941 | 32.999 | 22.685 | 9.670 | 15.424 | 24.089 | 9.069 |
| | | RMSE | 66.558 | 27.658 | 26.775 | 40.586 | 48.951 | 39.336 | 19.093 | 30.558 | 40.052 | 17.914 |
| | | S-MAE | 0.135 | 0.039 | 0.039 | 0.060 | 0.080 | 0.056 | 0.023 | 0.034 | 0.059 | 0.022 |

Table 2: Time- and frequency-domain imputation errors on two real-world datasets. PhysioNet is evaluated at 10%, 50% and 90% missingness rates, while PM 2.5 is evaluated at 10%. Metrics are MAE↓, RMSE↓ and S-MAE↓.

Lomb-Scargle Spectrum: Quick Start

Installation:
pip install git+https://github.com/asztr/LombScargle.git

Usage Example:
import torch
import math
import LombScargle

# Example time series with a single 5 Hz component
t = torch.linspace(0, 10.0, 200)  # timestamps
y = torch.sin(2 * math.pi * 5.0 * t)  # values

# Frequencies at which to evaluate the spectrum
freqs = torch.linspace(1e-5, 10.0, 100)

# Compute the normalized Lomb-Scargle spectrum
ls = LombScargle.LombScargle(freqs)
P = ls(t, y, fap=True, norm=True)  # [1, 100] tensor of power values
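For readers curious about what the library computes under the hood, the classic Lomb-Scargle periodogram (Scargle 1982) fits in a few lines of NumPy. This `lomb_scargle` function is an illustrative reference implementation, not the package's actual code.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram in pure NumPy.

    t, y  : irregular timestamps and values, shape (N,)
    freqs : frequencies in Hz, shape (F,)
    Returns power of shape (F,).
    """
    y = y - y.mean()                            # remove the mean (DC) term
    w = 2 * np.pi * np.asarray(freqs)[:, None]  # angular freqs, shape (F, 1)
    # Per-frequency offset tau that makes the sine/cosine terms orthogonal.
    tau = np.arctan2(np.sum(np.sin(2 * w * t), axis=1),
                     np.sum(np.cos(2 * w * t), axis=1)) / (2 * w[:, 0])
    arg = w * (t - tau[:, None])                # shape (F, N)
    c, s = np.cos(arg), np.sin(arg)
    return 0.5 * ((c @ y) ** 2 / np.sum(c * c, axis=1)
                  + (s @ y) ** 2 / np.sum(s * s, axis=1))

# Irregular samples of a 5 Hz sine: the periodogram peaks near 5 Hz.
t = np.sort(np.random.default_rng(1).uniform(0, 10, 300))
y = np.sin(2 * np.pi * 5.0 * t)
freqs = np.linspace(0.5, 10.0, 400)
P = lomb_scargle(t, y, freqs)
print(freqs[np.argmax(P)])  # peak near the true 5 Hz
```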

BibTeX

@inproceedings{lscd2025,
  title     = {LSCD: Lomb–Scargle Conditioned Diffusion for Time-Series Imputation},
  author    = {Elizabeth Fons and Alejandro Sztrajman and Yousef El-Laham and Luciana Ferrer and
               Svitlana Vyetrenko and Manuela Veloso},
  booktitle = {Proc. 42nd International Conference on Machine Learning},
  year      = {2025}
}