LSCD · Lomb–Scargle Conditioned Diffusion for Time Series Imputation

International Conference on Machine Learning (ICML 2025)
¹J.P. Morgan AI Research · ²University of Cambridge · ³UBA & CONICET
*equal contribution

Abstract

Time series imputation with missing or irregularly sampled data is a persistent challenge in machine learning. Most frequency-domain methods rely on the Fast Fourier Transform (FFT), which assumes uniform sampling and therefore requires interpolation or imputation prior to frequency estimation. We propose a novel diffusion-based imputation approach (LSCD) that leverages Lomb–Scargle periodograms to robustly handle missing and irregular samples, without requiring interpolation or imputation in the frequency domain. Our method trains a score-based diffusion model conditioned on the entire signal spectrum, enabling direct use of irregularly spaced observations. Experiments on synthetic and real-world benchmarks demonstrate that our method recovers missing data more accurately than purely time-domain baselines, while simultaneously producing consistent frequency estimates. Crucially, our framework paves the way for broader adoption of Lomb–Scargle methods in machine learning tasks involving irregular data.

Lomb-Scargle

FFT vs Lomb-Scargle: the FFT assumes uniform sampling, so missing or irregularly sampled data must first be interpolated onto a uniform grid. Lomb-Scargle operates on the irregular samples directly, avoiding interpolation and yielding more accurate frequency estimates.
Figure 1: Density of leading frequencies for FFT and Lomb-Scargle (LS) on a synthetic sine dataset. (Left) Fully observed time series. (Right) Time series with 75% missing data. The interpolation required by FFT significantly distorts the spectral distribution, whereas LS better preserves the original frequency structure.
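The contrast in Figure 1 can be reproduced with off-the-shelf tools. A minimal sketch using `scipy.signal.lombscargle` (not the paper's code): the FFT route must first interpolate the irregular observations back onto a uniform grid, while Lomb-Scargle consumes them as-is.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

# Irregularly sampled 5 Hz sine: keep ~25% of a uniform grid (75% missing).
f0 = 5.0
t_full = np.linspace(0.0, 10.0, 1000)
keep = rng.random(t_full.size) < 0.25
t, y = t_full[keep], np.sin(2 * np.pi * f0 * t_full[keep])

# FFT route: interpolate back onto the uniform grid first (the step that
# distorts the spectrum), then take the periodogram.
y_interp = np.interp(t_full, t, y)
fft_freqs = np.fft.rfftfreq(t_full.size, d=t_full[1] - t_full[0])
fft_power = np.abs(np.fft.rfft(y_interp)) ** 2
f_fft = fft_freqs[np.argmax(fft_power[1:]) + 1]  # skip the DC bin

# Lomb-Scargle route: works on (t, y) directly, no interpolation.
# scipy expects angular frequencies, so convert Hz -> rad/s.
ls_freqs = np.linspace(0.1, 10.0, 500)
ls_power = lombscargle(t, y, 2 * np.pi * ls_freqs)
f_ls = ls_freqs[np.argmax(ls_power)]

print(f"FFT peak: {f_fft:.2f} Hz, Lomb-Scargle peak: {f_ls:.2f} Hz")
```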

Model Architecture

Figure 2: Diagram of our Lomb–Scargle Conditioned Diffusion (LSCD) approach for time series imputation.

Synthetic-Sines Benchmark

| Miss type | Miss rate | Metric | Mean | Lerp | BRITS | GP-VAE | US-GAN | TimesNet | CSDI | SAITS | ModernTCN | LSCD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MCAR | 10% | MAE | 1.380 | 1.305 | 0.943 | 1.399 | 0.933 | 1.220 | 1.336 | 0.885 | 0.973 | 0.765 |
| | | RMSE | 1.947 | 1.991 | 1.657 | 1.986 | 1.636 | 1.803 | 1.889 | 1.569 | 1.727 | 1.453 |
| | | S-MAE | 0.081 | 0.081 | 0.052 | 0.082 | 0.053 | 0.069 | 0.008 | 0.043 | 0.049 | 0.003 |
| | 50% | MAE | 1.373 | 1.449 | 1.095 | 1.383 | 1.152 | 1.481 | 1.359 | 1.041 | 1.129 | 0.975 |
| | | RMSE | 1.930 | 2.070 | 1.759 | 1.950 | 1.845 | 2.017 | 1.922 | 1.699 | 1.817 | 1.658 |
| | | S-MAE | 0.264 | 0.324 | 0.170 | 0.266 | 0.191 | 0.239 | 0.027 | 0.159 | 0.173 | 0.014 |
| | 90% | MAE | 1.375 | 1.586 | 1.320 | 1.377 | 1.369 | 1.579 | 1.361 | 1.292 | 1.360 | 1.271 |
| | | RMSE | 1.935 | 2.142 | 1.899 | 1.938 | 1.970 | 2.143 | 1.925 | 1.878 | 1.963 | 1.870 |
| | | S-MAE | 0.439 | 0.572 | 0.383 | 0.439 | 0.407 | 0.462 | 0.044 | 0.375 | 0.406 | 0.036 |
| Sequence | 10% | MAE | 1.353 | 1.542 | 1.330 | 1.355 | 1.384 | 1.391 | 1.413 | 1.323 | 1.329 | 1.359 |
| | | RMSE | 1.905 | 2.092 | 1.915 | 1.908 | 1.995 | 1.959 | 1.988 | 1.890 | 1.931 | 1.962 |
| | | S-MAE | 0.055 | 0.075 | 0.056 | 0.054 | 0.061 | 0.062 | 0.006 | 0.055 | 0.056 | 0.005 |
| | 50% | MAE | 1.374 | 1.564 | 1.347 | 1.376 | 1.393 | 1.467 | 1.378 | 1.342 | 1.354 | 1.316 |
| | | RMSE | 1.934 | 2.115 | 1.928 | 1.936 | 1.999 | 2.038 | 1.943 | 1.917 | 1.960 | 1.913 |
| | | S-MAE | 0.271 | 0.369 | 0.269 | 0.271 | 0.297 | 0.321 | 0.028 | 0.268 | 0.277 | 0.026 |
| | 90% | MAE | 1.386 | 1.573 | 1.362 | 1.388 | 1.403 | 1.489 | 1.372 | 1.352 | 1.375 | 1.313 |
| | | RMSE | 1.946 | 2.127 | 1.941 | 1.949 | 2.007 | 2.062 | 1.943 | 1.929 | 1.982 | 1.913 |
| | | S-MAE | 0.288 | 0.389 | 0.286 | 0.288 | 0.305 | 0.338 | 0.029 | 0.283 | 0.292 | 0.027 |
| Block | 10% | MAE | 1.306 | 1.507 | 1.255 | 1.309 | 1.334 | 1.379 | 1.304 | 1.268 | 1.275 | 1.259 |
| | | RMSE | 1.807 | 2.014 | 1.786 | 1.811 | 1.885 | 1.898 | 1.804 | 1.785 | 1.825 | 1.774 |
| | | S-MAE | 0.105 | 0.146 | 0.100 | 0.104 | 0.116 | 0.124 | 0.011 | 0.103 | 0.106 | 0.010 |
| | 50% | MAE | 1.306 | 1.505 | 1.279 | 1.308 | 1.333 | 1.451 | 1.314 | 1.285 | 1.309 | 1.269 |
| | | RMSE | 1.815 | 2.014 | 1.806 | 1.817 | 1.881 | 1.978 | 1.835 | 1.804 | 1.852 | 1.810 |
| | | S-MAE | 0.287 | 0.383 | 0.278 | 0.286 | 0.306 | 0.344 | 0.029 | 0.285 | 0.296 | 0.027 |
| | 90% | MAE | 1.339 | 1.523 | 1.319 | 1.340 | 1.359 | 1.506 | 1.329 | 1.320 | 1.356 | 1.320 |
| | | RMSE | 1.868 | 2.052 | 1.862 | 1.869 | 1.927 | 2.054 | 1.874 | 1.859 | 1.916 | 1.870 |
| | | S-MAE | 0.359 | 0.473 | 0.351 | 0.358 | 0.376 | 0.439 | 0.036 | 0.358 | 0.374 | 0.035 |

Table 1: Imputation performance on the Synthetic-Sines benchmark with three missing-data patterns: MCAR (pointwise random), Sequence gaps, and Block outages, each tested at 10%, 50% and 90% missing rates. For every setting we report MAE↓ and RMSE↓ in the time domain and S-MAE↓ in the spectral domain.
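The S-MAE column measures imputation error in the spectral domain. A minimal sketch of such a metric, assuming S-MAE is the mean absolute error between normalized Lomb-Scargle spectra of the ground-truth and imputed signals; `spectral_mae` is a hypothetical helper, and the paper's exact normalization may differ.

```python
import numpy as np
from scipy.signal import lombscargle

def spectral_mae(t, y_true, y_imputed, freqs):
    """MAE between normalized Lomb-Scargle power spectra of two signals
    sampled at the same (possibly irregular) timestamps t."""
    w = 2 * np.pi * np.asarray(freqs)  # scipy expects angular frequencies
    p_true = lombscargle(t, y_true, w, normalize=True)
    p_imp = lombscargle(t, y_imputed, w, normalize=True)
    # Normalize each spectrum to sum to 1 so the metric compares shapes.
    p_true = p_true / p_true.sum()
    p_imp = p_imp / p_imp.sum()
    return np.mean(np.abs(p_true - p_imp))

# Sanity check: a perfect imputation gives S-MAE = 0.
t = np.sort(np.random.default_rng(0).uniform(0, 10, 200))
y = np.sin(2 * np.pi * 5.0 * t)
freqs = np.linspace(0.1, 10.0, 100)
print(spectral_mae(t, y, y, freqs))  # -> 0.0
```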

Real Imputation Benchmarks

| Dataset | Miss rate | Metric | Mean | Lerp | BRITS | GP-VAE | US-GAN | TimesNet | CSDI | SAITS | ModernTCN | LSCD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PhysioNet | 10% | MAE | 0.714 | 0.372 | 0.278 | 0.469 | 0.323 | 0.375 | 0.219 | 0.232 | 0.351 | 0.211 |
| | | RMSE | 1.035 | 0.708 | 0.693 | 0.783 | 0.662 | 0.690 | 0.545 | 0.583 | 0.697 | 0.494 |
| | | S-MAE | 0.032 | 0.020 | 0.016 | 0.026 | 0.020 | 0.022 | 0.013 | 0.014 | 0.020 | 0.012 |
| | 50% | MAE | 0.711 | 0.417 | 0.385 | 0.521 | 0.449 | 0.453 | 0.307 | 0.315 | 0.440 | 0.303 |
| | | RMSE | 1.091 | 0.840 | 0.833 | 0.907 | 0.852 | 0.840 | 0.672 | 0.735 | 0.803 | 0.664 |
| | | S-MAE | 0.111 | 0.087 | 0.064 | 0.083 | 0.076 | 0.076 | 0.052 | 0.055 | 0.071 | 0.052 |
| | 90% | MAE | 0.710 | 0.565 | 0.560 | 0.642 | 0.670 | 0.642 | 0.481 | 0.565 | 0.647 | 0.479 |
| | | RMSE | 1.097 | 0.993 | 0.975 | 1.038 | 1.060 | 1.031 | 0.834 | 0.971 | 1.026 | 0.832 |
| | | S-MAE | 0.148 | 0.189 | 0.104 | 0.124 | 0.125 | 0.131 | 0.093 | 0.108 | 0.137 | 0.093 |
| PM 2.5 | 10% | MAE | 50.685 | 15.363 | 16.519 | 23.941 | 32.999 | 22.685 | 9.670 | 15.424 | 24.089 | 9.069 |
| | | RMSE | 66.558 | 27.658 | 26.775 | 40.586 | 48.951 | 39.336 | 19.093 | 30.558 | 40.052 | 17.914 |
| | | S-MAE | 0.135 | 0.039 | 0.039 | 0.060 | 0.080 | 0.056 | 0.023 | 0.034 | 0.059 | 0.022 |

Table 2: Time- and frequency-domain imputation errors on two real-world datasets. PhysioNet is evaluated at 10%, 50% and 90% missingness rates, while PM 2.5 is evaluated at 10%. Metrics are MAE↓, RMSE↓ and S-MAE↓.

Lomb-Scargle Spectrum: Quick Start

Installation:
pip install git+https://github.com/asztr/LombScargle.git

Usage Example:
import torch
import math
import LombScargle

# Example time series with a single 5 Hz component
t = torch.linspace(0, 10.0, 200)  # timestamps
y = torch.sin(2 * math.pi * 5.0 * t)  # values

# Frequencies at which to evaluate the spectrum
freqs = torch.linspace(1e-5, 10.0, 100)

# Compute the normalized Lomb-Scargle spectrum
ls = LombScargle.LombScargle(freqs)
P = ls(t, y, fap=True, norm=True)  # [1, 100] tensor of power values
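For readers curious about what the library computes under the hood, the classic Lomb-Scargle periodogram (Scargle 1982) fits in a few lines of NumPy. This `lomb_scargle` function is an illustrative reference implementation, not the package's actual code.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram in pure NumPy.

    t, y  : irregular timestamps and values, shape (N,)
    freqs : frequencies in Hz, shape (F,)
    Returns power of shape (F,).
    """
    y = y - y.mean()                            # remove the mean (DC) term
    w = 2 * np.pi * np.asarray(freqs)[:, None]  # angular freqs, shape (F, 1)
    # Per-frequency offset tau that makes the sine/cosine terms orthogonal.
    tau = np.arctan2(np.sum(np.sin(2 * w * t), axis=1),
                     np.sum(np.cos(2 * w * t), axis=1)) / (2 * w[:, 0])
    arg = w * (t - tau[:, None])                # shape (F, N)
    c, s = np.cos(arg), np.sin(arg)
    return 0.5 * ((c @ y) ** 2 / np.sum(c * c, axis=1)
                  + (s @ y) ** 2 / np.sum(s * s, axis=1))

# Irregular samples of a 5 Hz sine: the periodogram peaks near 5 Hz.
t = np.sort(np.random.default_rng(1).uniform(0, 10, 300))
y = np.sin(2 * np.pi * 5.0 * t)
freqs = np.linspace(0.5, 10.0, 400)
P = lomb_scargle(t, y, freqs)
print(freqs[np.argmax(P)])  # peak near the true 5 Hz
```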

BibTeX

@inproceedings{lscd2025,
  title     = {LSCD: Lomb–Scargle Conditioned Diffusion for Time-Series Imputation},
  author    = {Elizabeth Fons and Alejandro Sztrajman and Yousef El-Laham and Luciana Ferrer and
               Svitlana Vyetrenko and Manuela Veloso},
  booktitle = {Proc. 42nd International Conference on Machine Learning},
  year      = {2025}
}