MIT ARCLab STORM-AI Competition
- Atilla Saadat
- May 30
- 3 min read

Real‑Time Thermospheric Density Forecasting with a BiGRU‑Attention Network
STORM‑AI Phase 2 Project Recap
Why I Tackled Thermospheric Density Forecasting:
Low‑Earth Orbit (LEO) is no longer roomy real estate: more than 5,500 operational spacecraft already circle Earth, and the mega‑constellations on the launch manifest will multiply that figure in the next few years. When the Sun hurls a geomagnetic storm our way, the thermosphere can swell by an order of magnitude, amplifying drag and scrambling conjunction assessments. Today’s options are unsatisfying:
First‑principles general‑circulation models (e.g., TIE‑GCM, GITM) track fine‑scale physics but demand hours on a supercomputer for a single 3‑day forecast.
Empirical climatologies (NRLMSISE‑00, JB2008) respond in milliseconds yet miss post‑storm cooling and routinely stray by more than 40 % during disturbances.
I wanted a real‑time model that keeps the speed of climatologies but inherits the storm awareness of physics codes. The MIT ARCLab STORM‑AI Challenge supplied the perfect testbed: predict orbit‑averaged density at 10‑min cadence for the next 72 h, and be judged by an exponentially time‑weighted OD‑RMSE skill score. That precise metric, plus a hard runtime envelope, drove every design choice in my solution.
A 60‑Day History In, a 72‑Hour Forecast Out:
Data buffet (181 features × 1,440 time‑steps)
Each training sample is a 60‑day slice of hourly drivers:
168 dynamic OMNI2 channels – solar‑wind plasma, IMF components, geomagnetic indices, plus explicit lags at {1, 2, 3, 4, 6, 12} h and a 3 h rolling mean to expose multi‑scale variability.
Orbit context – six Keplerian elements and derived latitude/longitude/altitude.
Temporal encodings – sin/cos of longitude, day‑of‑year, and sidereal time.
A sequence‑wise normalizer ρ₀ removes altitude trends so the network learns relative fluctuations transferable across satellites.
A two‑stage scaler (quantile → standard) maps every feature to 𝒩(0, 1), taming proton‑flux outliers and ensuring numeric stability.
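To make the feature plumbing concrete, here is a minimal sketch, assuming the hourly drivers live in a pandas DataFrame; the column names, the median choice for ρ₀, and the pipeline wiring are my own illustrative simplifications, not the competition code:

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler

def add_lag_features(df: pd.DataFrame, cols, lags=(1, 2, 3, 4, 6, 12)):
    """Explicit multi-scale lags plus a 3 h rolling mean per driver."""
    out = df.copy()
    for c in cols:
        for h in lags:
            out[f"{c}_lag{h}h"] = out[c].shift(h)  # hourly cadence: shift(h) = h-hour lag
        out[f"{c}_roll3h"] = out[c].rolling(3).mean()
    return out

def add_cyclic_encodings(df: pd.DataFrame, doy_col="day_of_year"):
    """sin/cos encodings keep periodic features continuous at wrap-around."""
    angle = 2 * np.pi * df[doy_col] / 365.25
    return df.assign(doy_sin=np.sin(angle), doy_cos=np.cos(angle))

def normalize_density(rho: np.ndarray):
    """Per-sequence normalizer: divide by a background level rho_0
    (the 60-day median here -- the exact statistic is an assumption)
    so the network learns relative, altitude-independent fluctuations."""
    rho0 = np.median(rho)
    return np.log(rho / rho0), rho0

# Two-stage scaler: the quantile map tames heavy-tailed channels such as
# proton flux, then standardization centers everything on N(0, 1).
scaler = make_pipeline(
    QuantileTransformer(output_distribution="normal"),
    StandardScaler(),
)
```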
| Stage | Layer | Output |
| --- | --- | --- |
| Encoder | 3 × BiGRU (h = 384/dir) | 1,440 × 768 hidden states |
| Attention | Additive attention | 768‑D context vector z |
| Head | LayerNorm → GELU → Linear | 432‑step log‑density residual |
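For readers who think in code, here is a minimal PyTorch sketch of that stack. The layer sizes come straight from the table; everything else (exact wiring, absence of dropout, naming) is illustrative rather than the competition implementation:

```python
import torch
import torch.nn as nn

class BiGRUAttention(nn.Module):
    def __init__(self, n_features=181, hidden=384, horizon=432):
        super().__init__()
        # Encoder: 3-layer bidirectional GRU, 384 units per direction
        self.encoder = nn.GRU(n_features, hidden, num_layers=3,
                              bidirectional=True, batch_first=True)
        # Additive (Bahdanau-style) attention over the 1,440 time steps
        self.att_proj = nn.Linear(2 * hidden, 2 * hidden)
        self.att_score = nn.Linear(2 * hidden, 1)
        # Head: LayerNorm -> GELU -> Linear to the 432-step residual
        self.head = nn.Sequential(
            nn.LayerNorm(2 * hidden),
            nn.GELU(),
            nn.Linear(2 * hidden, horizon),
        )

    def forward(self, x):                        # x: (batch, 1440, 181)
        h, _ = self.encoder(x)                   # (batch, 1440, 768)
        scores = self.att_score(torch.tanh(self.att_proj(h)))
        weights = torch.softmax(scores, dim=1)   # softmax over time
        z = (weights * h).sum(dim=1)             # 768-D context vector z
        return self.head(z)                      # (batch, 432) log-density residual
```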
The additive attention automatically spotlighted the handful of storm‑time hours that dominate drag errors, giving the model storm “situational awareness” without deep stacks.
Metric‑aligned training: the loss is a time‑weighted MSE whose exponential kernel is identical to the OD‑RMSE leaderboard weights, so 95 % of the gradient signal lives in the first ≈19 h of the horizon. AdamW, gradient clipping at 1.0, AMP, and early stopping land a converged model in ~4 h on a single RTX 3090 Ti; inference clocks in at a few milliseconds.
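As a sketch of that loss, assuming the kernel takes the form w(t) ∝ exp(−t/τ): the decay constant below is back-solved from the “95 % in ≈19 h” figure and is not an official competition value:

```python
import torch

def time_weighted_mse(pred, target, dt_min=10.0, tau_h=6.3):
    """Exponentially time-weighted MSE over the forecast horizon.
    With tau ~= 6.3 h (an assumed value), about 95% of the total
    weight falls within the first ~19 h of the 72 h horizon."""
    steps = pred.shape[-1]
    t = torch.arange(steps, device=pred.device) * dt_min / 60.0  # hours
    w = torch.exp(-t / tau_h)
    w = w / w.sum()                      # normalize so weights sum to 1
    return (w * (pred - target) ** 2).sum(dim=-1).mean()
```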
This end‑to‑end pipeline hits real‑time throughput, outperforms the transformer baseline, and ranks 4th on the Final leaderboard — all while running light enough for flight‑dynamics servers or even on‑board processors.
Results:
MIT STORM-AI Ranking: 4th / 18 validated participants
| Phase 1.1 Public (Medium) | Phase 1.1 Public (Hard) | Public Score | Private Score | Model Score | Normalized Model Score | Report Score (Q) |
| --- | --- | --- | --- | --- | --- | --- |
| 0.6784 | 0.5398 | 0.5675 | 0.1156 | 0.4998 | 0.7094 | 0.807 |

The model outperformed the official transformer baseline while keeping the parameter count and run‑time budget tiny.
Key Innovations
Metric‑aligned loss – maximizes leaderboard skill instead of generic RMSE.
Per‑sequence density normalization – each file is scaled by its own background density to generalize across altitudes.
Explicit multi‑scale lags (1–12 h) + moving averages – expose short‑term dynamics without increasing network depth.
Additive attention – automatically up‑weights the handful of storm‑time hours that dominate density variability.
Why It Matters
Operational speed – forecasts arrive fast enough for on‑ground collision‑risk screening or even on‑board drag compensation.
Storm‑time fidelity – exponential weighting plus attention concentrates accuracy where operators care most (the first 24 h).
Reproducibility – the entire pipeline (data loaders, scalers, PyTorch model) is open‑sourced and deterministic.
What’s Next
I plan to explore three extensions:
Hybrid residuals – add the learned correction on top of NRLMSIS 2.1 to blend physics priors with data‑driven agility (a rough sketch follows this list).
Uncertainty quantification – last‑layer Laplace + Monte‑Carlo dropout for calibrated confidence intervals.
Continual learning hooks – lightweight fine‑tunes so the model stays sharp through Solar Cycle 25.
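To make the first extension concrete: a rough sketch of the hybrid-residual idea, assuming the NRLMSIS 2.1 baseline density has already been evaluated along the orbit and that the correction is multiplicative in linear space (both assumptions are mine):

```python
import numpy as np

def hybrid_density(rho_msis: np.ndarray, log_residual: np.ndarray) -> np.ndarray:
    """Physics prior + learned correction: the network predicts a
    log-space residual on top of the NRLMSIS 2.1 baseline, so the
    hybrid forecast is rho_msis * exp(residual)."""
    return rho_msis * np.exp(log_residual)
```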