Case Study 01 — LagrangeBench
LagrangeBench (Toshev et al., NeurIPS 2023; tumaer/lagrangebench) is a JAX
benchmark that ships pretrained checkpoints for two architectures, GNS
(non-equivariant) and SEGNN (E(2)-equivariant by design), across seven
particle-based fluid datasets.
Substrate scope. physics-lint’s shipped CLI and GitHub Action ingest
structured-grid model data: .npz/.npy dumps and grid adapters.
LagrangeBench is particle-based (Lagrangian SPH), a substrate the
pip-installable physics-lint package does not yet ingest. This case study
therefore applies the physics-lint rule methodology (the PH-SYM-001
rotation and PH-SYM-002 reflection equivariance checks, the calibrated
bands, and the SARIF result schema) to particle-rollout data through a
dedicated research harness (external_validation/_rollout_anchors/_harness/)
rather than through the public physics-lint check CLI. That harness is not
part of the shipped package. CLI/Action loader integration for the mesh and
particle substrates is planned for v1.2.0 (see docs/backlog/v1.2.md). The
result below is therefore a methodology demonstration on a particle
substrate, and evidence that the rule design should transfer cleanly once
that integration lands.
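Both PH-SYM checks reduce to one comparison: run the model on transformed input and compare against the transformed output of the original run. The NumPy sketch below illustrates that comparison for a single step on a 2-D particle set. The function names, the relative-error normalization, and the two toy models are assumptions made for illustration; they are not the research harness's code or a physics-lint API. The sketch also ignores boundaries: on a periodic square domain only rotations by multiples of 90° and reflections across the box axes map the box onto itself, so a faithful check would restrict the group elements to those.

```python
import numpy as np


def rotation(theta: float) -> np.ndarray:
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


# Reflection across the x-axis: (x, y) -> (x, -y).
REFLECT_X = np.array([[1.0, 0.0], [0.0, -1.0]])


def equivariance_error(model, positions: np.ndarray, g: np.ndarray) -> float:
    """Relative equivariance error ||f(g.x) - g.f(x)|| / ||f(x)||.

    positions: (n_particles, 2) array, one particle per row.
    model: maps positions to an (n_particles, 2) vector field (e.g. accelerations).
    g: 2x2 orthogonal matrix (rotation or reflection); rows transform as x @ g.T.
    """
    out = model(positions)
    out_on_transformed = model(positions @ g.T)  # f(g . x)
    transformed_out = out @ g.T                  # g . f(x)
    num = np.linalg.norm(out_on_transformed - transformed_out)
    den = max(np.linalg.norm(out), np.finfo(out.dtype).tiny)  # avoid 0/0
    return float(num / den)


def equivariant_toy(x: np.ndarray) -> np.ndarray:
    """Hypothetical equivariant model: pull each particle toward the centroid."""
    return -(x - x.mean(axis=0))


def biased_toy(x: np.ndarray) -> np.ndarray:
    """Hypothetical non-equivariant model: same as above plus a fixed drift."""
    return equivariant_toy(x) + np.array([0.01, 0.02])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=(64, 2))
    for name, g in [("rot90", rotation(np.pi / 2)), ("reflect_x", REFLECT_X)]:
        print(name,
              equivariance_error(equivariant_toy, x, g),
              equivariance_error(biased_toy, x, g))
```

The equivariant toy lands at rounding level and the biased toy does not. Float32 machine epsilon is about 1.2×10⁻⁷, which is why an exactly equivariant network evaluated in float32 bottoms out in the 10⁻⁷ range rather than at zero.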
Headline result: equivariance gap detected on published checkpoints.
Across 20 trajectories per (rule, stack) at single-step inference
(n_rollout_steps = 1), SEGNN-TGV2D’s per-trajectory equivariance error sits
monomodally at the float32 noise floor (~2.3×10⁻⁷ to 3.4×10⁻⁷ across the
80 active-symmetry rows). Under the same rules, GNS-TGV2D’s errors are
bimodal, splitting roughly 50/50 between an APPROXIMATE band (~3.6×10⁻⁴ to
4.2×10⁻⁴) and a FAIL band concentrated at ~0.02. The ~3.2 order-of-magnitude
(OOM) gap between SEGNN’s monomodal floor and GNS’s lower (APPROXIMATE-band)
mode is the load-bearing cross-stack signature: SEGNN’s E(2)-equivariance is
exact by construction, whereas GNS’s is only approximate and acquired
through training, consistent with Helwig et al.’s data-augmentation
characterization. The full table and SARIF artifacts are linked under
“Validation harness” below.
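The OOM figure can be checked from the reported endpoints, and the three-way reading can be sketched as a threshold classifier. In the sketch below, the band endpoints are the ones quoted above. The lowest-band label and the two cut-offs are hypothetical placeholders, chosen only so that the reported modes fall into distinct bands; they are not physics-lint's calibrated values.

```python
import math

# Per-trajectory relative equivariance error, endpoints as reported above.
SEGNN_FLOOR = (2.3e-7, 3.4e-7)
GNS_APPROXIMATE = (3.6e-4, 4.2e-4)


def oom_gap_range(low_band, high_band):
    """Smallest and largest log10 ratio between the endpoints of two bands."""
    smallest = math.log10(high_band[0] / low_band[1])
    largest = math.log10(high_band[1] / low_band[0])
    return smallest, largest


# Hypothetical cut-offs for illustration only (NOT physics-lint's calibrated bands).
PASS_MAX = 1e-5
APPROXIMATE_MAX = 1e-2


def classify(err: float) -> str:
    """Map one per-trajectory error to a PASS / APPROXIMATE / FAIL label."""
    if err <= PASS_MAX:
        return "PASS"
    if err <= APPROXIMATE_MAX:
        return "APPROXIMATE"
    return "FAIL"


lo, hi = oom_gap_range(SEGNN_FLOOR, GNS_APPROXIMATE)
print(f"gap: {lo:.2f} to {hi:.2f} orders of magnitude")
# prints: gap: 3.02 to 3.26 orders of magnitude
```

Depending on which endpoints are paired, the gap spans roughly 3.0 to 3.3 orders of magnitude, which brackets the ~3.2 quoted above. With these placeholder cut-offs, a value from each reported mode (about 3×10⁻⁷, 4×10⁻⁴, and 0.02) maps to a different label.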
P2.1 multi-trajectory expansion. Per DECISIONS.md D0-26, the rung-4b table reports 20 trajectories per (rule, stack) on TGV2D for both GNS and SEGNN. The trajectory set is deterministic and reproducible from the committed audit JSON, and the selection was pre-registered before any Modal job was launched.
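One way to make a pre-registered trajectory set deterministic is to draw it once from a fixed seed and commit the resulting indices. The sketch below assumes that approach. `select_trajectories`, `audit_record`, the JSON field names, and the dataset size used in the usage line are all hypothetical and do not describe the project's actual audit JSON schema.

```python
import hashlib
import json

import numpy as np


def select_trajectories(n_available: int, n_select: int, seed: int) -> list[int]:
    """Draw a sorted, duplicate-free set of trajectory indices from a fixed seed."""
    rng = np.random.default_rng(seed)
    picked = rng.choice(n_available, size=n_select, replace=False)
    return sorted(int(i) for i in picked)


def audit_record(indices: list[int], seed: int) -> dict:
    """Store the indices themselves plus a digest, so later reproduction does
    not depend on NumPy's RNG stream staying identical across versions."""
    payload = json.dumps(indices, separators=(",", ":")).encode()
    return {
        "seed": seed,
        "trajectory_indices": indices,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


if __name__ == "__main__":
    # n_available=100 is a placeholder, not the real TGV2D split size.
    indices = select_trajectories(n_available=100, n_select=20, seed=2026)
    print(json.dumps(audit_record(indices, seed=2026), indent=2))
```

Committing the index list, rather than only the seed, is the design choice that keeps the set reproducible even if the sampler's output for a given seed were to change.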
P2.3 scope qualifier (structural-empirical link). Per DECISIONS.md
D0-28, the equivariance gap above is reported on the GNS-as-shipped
checkpoint, which is a single realization of the “non-equivariant
architecture under typical training” class. The project deliberately does
not extend rung-4b with a self-trained non-equivariant GNN as a
second architectural data point. The rollout-anchor portfolio rests on
published checkpoints (F3 borrowed-credibility framing; see
methodology/docs/2026-05-01-rollout-anchor-extension-design.md §1.1),
and a self-trained baseline would belong to a structurally weaker evidence
class. The structural-empirical-link argument is defeasible: a
non-equivariant architecture trained with full SO(2) augmentation could,
in principle, approximate equivariance to near-noise-floor accuracy. The
rung-4b reading on GNS-as-shipped is nonetheless consistent with the
architectural explanation.
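For concreteness, "full SO(2) augmentation" on a particle batch means rotating each training sample by an angle drawn uniformly from [0, 2π), with the same rotation applied to the inputs and the regression targets. The sketch below is illustrative only. The function name, the rotation about a chosen center, and the array layout are assumptions, and this is not LagrangeBench's training code. It also ignores periodic wrapping: on a periodic square, arbitrary-angle rotations are not symmetries of the domain.

```python
import numpy as np


def so2_augment(rng: np.random.Generator, positions: np.ndarray,
                targets: np.ndarray, center: np.ndarray):
    """Rotate inputs and targets by one shared, uniformly sampled angle.

    positions: (n_particles, 2) points, rotated about `center`.
    targets: (n_particles, 2) vector quantities (e.g. accelerations); as
    vectors they are rotated but not shifted by `center`.
    Returns the rotated positions, rotated targets, and the angle used.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    g = np.array([[c, -s], [s, c]])
    new_positions = (positions - center) @ g.T + center
    new_targets = targets @ g.T
    return new_positions, new_targets, theta
```

Training on such samples only pushes the network toward f(g·x) ≈ g·f(x) on the data it sees. Nothing in the architecture enforces it, which is why augmentation yields approximate rather than exact equivariance.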
What this case study does NOT cover. PH-BC-001 (no-slip on a body-surface
velocity field) is structurally inapplicable to LagrangeBench’s particle
representation, which has no mesh and no surface trace operator. PH-CON-001
(mass) and PH-CON-002/003 (energy/dissipation) are exercised in the rung-4a
sibling artifact (gns-tgv2d and segnn-tgv2d columns) rather than re-derived
here. The CS01 deliverables are scoped to particle-side equivariance
evidence on the two published checkpoints; broader architectural sweeps are
outside the v1.0 scope.