Case Study 01 — LagrangeBench

LagrangeBench (Toshev et al., NeurIPS 2023; tumaer/lagrangebench) ships pretrained JAX checkpoints for two architectures, GNS (non-equivariant) and SEGNN (E(2)-equivariant by design), across seven particle-based fluid datasets.

Substrate scope. physics-lint’s shipped CLI and GitHub Action ingest structured-grid model data (.npz/.npy dumps and grid adapters). LagrangeBench is particle-based (Lagrangian SPH), a substrate the pip-installable physics-lint package does not yet ingest. This case study therefore applies the physics-lint rule methodology to particle-rollout data: the PH-SYM-001 (rotation) and PH-SYM-002 (reflection) equivariance checks, the calibrated bands, and the SARIF result schema. It does so through a dedicated research harness (external_validation/_rollout_anchors/_harness/), not through the public physics-lint check CLI, and that harness is not part of the shipped package. CLI/Action loader integration for the mesh and particle substrates is planned for v1.2.0 (see docs/backlog/v1.2.md). The result below is a methodology demonstration on a particle substrate. It is also evidence that the rule design should transfer cleanly once that integration lands.
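The sketch below shows the kind of measurement a rotation or reflection equivariance check makes on particle data. It compares f(g·x) with g·f(x) for a single inference step. It assumes a model that maps a short position history to next positions, applies g about the origin, and ignores periodic boundaries. The helpers (`equivariance_error`, `constant_velocity_step`, `biased_step`) are hypothetical illustrations, not the research harness or the physics-lint API.

```python
import numpy as np

# Hypothetical helpers for illustration; not the harness or physics-lint API.

def rotation_matrix(theta: float) -> np.ndarray:
    """2x2 counter-clockwise rotation by theta radians (an element of SO(2))."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Reflection across the x-axis: an improper element of O(2), det = -1.
REFLECT_X = np.array([[1.0, 0.0], [0.0, -1.0]])

def equivariance_error(step_fn, history: np.ndarray, g: np.ndarray) -> float:
    """Relative error ||f(g.x) - g.f(x)|| / ||f(x)|| for a single step.

    history: (n_history, n_particles, 2) past particle positions.
    step_fn: maps a history to next positions of shape (n_particles, 2).
    g:       2x2 orthogonal matrix applied to every particle coordinate,
             about the origin (periodic boundaries are ignored here).
    """
    g = g.astype(history.dtype)                  # stay in the model's precision
    out = step_fn(history)
    out_of_transformed = step_fn(history @ g.T)  # f(g.x)
    transformed_out = out @ g.T                  # g.f(x)
    return float(np.linalg.norm(out_of_transformed - transformed_out)
                 / np.linalg.norm(out))

def constant_velocity_step(history: np.ndarray) -> np.ndarray:
    # x_{t+1} = 2 x_t - x_{t-1} is linear in the positions, so it commutes
    # with every rotation and reflection: equivariant by construction.
    return 2.0 * history[-1] - history[-2]

def biased_step(history: np.ndarray) -> np.ndarray:
    # Same update plus a fixed drift that does not transform with the input,
    # which breaks equivariance.
    drift = np.array([1e-2, 1e-2], dtype=history.dtype)
    return 2.0 * history[-1] - history[-2] + drift

rng = np.random.default_rng(0)
history = rng.random((2, 64, 2)).astype(np.float32)  # synthetic positions
for name, g in [("rot90", rotation_matrix(np.pi / 2)), ("reflect_x", REFLECT_X)]:
    print(name,
          equivariance_error(constant_velocity_step, history, g),
          equivariance_error(biased_step, history, g))
```

In float32 the linear toy model's error stays at the rounding level, orders of magnitude below the biased model's. The checks look for that contrast, not for anything specific to these toy models.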

Headline result: equivariance gap detected on published checkpoints. Across 20 trajectories per (rule, stack) at single-step inference (n_rollout_steps = 1), SEGNN-TGV2D’s per-trajectory equivariance error is monomodal at the float32 noise floor (~2.3×10⁻⁷ to 3.4×10⁻⁷ across the 80 active-symmetry rows). GNS-TGV2D’s signature under the same rules is bimodal. It splits roughly 50/50 between an APPROXIMATE band (~3.6×10⁻⁴ to 4.2×10⁻⁴) and a FAIL band quantized at ~0.02. The load-bearing cross-stack signature is the gap of ~3.2 orders of magnitude between SEGNN’s floor and GNS’s lower (APPROXIMATE) mode. SEGNN’s E(2)-equivariance is exact by construction, while GNS’s is only approximate and acquired through training, consistent with Helwig et al.’s data-augmentation characterization. The full table and SARIF artifacts are linked under “Validation harness” below.
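The sketch below maps per-trajectory errors onto bands and SARIF results, and works through the orders-of-magnitude arithmetic behind the gap. The band edges (`PASS_MAX`, `APPROX_MAX`) and the band-to-level mapping are hypothetical placeholders. They were chosen only so that the quoted ranges fall into the bands the text names, and they are not physics-lint's calibrated values. The result fields (`ruleId`, `kind`, `level`, `message.text`, `properties`) follow SARIF 2.1.0, but this is not the exact result shape physics-lint emits.

```python
import math

# Hypothetical band edges, chosen only so the ranges quoted above land in the
# bands the text names; they are NOT physics-lint's calibrated values.
PASS_MAX = 1e-5
APPROX_MAX = 1e-2

def classify(err: float) -> str:
    """Map a per-trajectory equivariance error to a band."""
    if err <= PASS_MAX:
        return "PASS"
    if err <= APPROX_MAX:
        return "APPROXIMATE"
    return "FAIL"

def sarif_result(rule_id: str, traj_id: str, err: float) -> dict:
    """Build a SARIF 2.1.0-style result object (assumed band-to-level mapping)."""
    band = classify(err)
    return {
        "ruleId": rule_id,
        "kind": "pass" if band == "PASS" else "fail",
        # SARIF requires level "none" whenever kind is not "fail".
        "level": {"PASS": "none", "APPROXIMATE": "warning", "FAIL": "error"}[band],
        "message": {"text": f"{traj_id}: equivariance error {err:.2e} ({band})"},
        "properties": {"band": band, "error": err},
    }

def oom_gap(low: float, high: float) -> float:
    """Orders of magnitude separating two error levels: log10(high / low)."""
    return math.log10(high / low)

# SEGNN lower edge -> GNS APPROXIMATE lower edge, then SEGNN upper edge -> same.
print(round(oom_gap(2.3e-7, 3.6e-4), 2))  # 3.19
print(round(oom_gap(3.4e-7, 3.6e-4), 2))  # 3.02
print(sarif_result("PH-SYM-001", "traj-07", 0.02)["level"])  # hypothetical id
```

Between the lower edges of the two quoted ranges the separation is ≈3.19 orders of magnitude, which matches the ~3.2 above. Measured from the top of SEGNN's range it is still ≈3.0, so the reading does not hinge on which edge is used.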

P2.1 multi-trajectory expansion. Per DECISIONS.md D0-26, the rung-4b table reports 20 trajectories per (rule, stack) on TGV2D for both GNS and SEGNN. The trajectory set is deterministic and reproducible from the committed audit JSON, and the selection was pre-registered before any Modal job was launched.
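The sketch below shows one way to make a trajectory set deterministic and checkable. Indices are drawn once from a fixed seed with NumPy's `default_rng` and stored with the seed and a digest in a JSON record. A later run re-derives them from that record to verify them. The record layout, the helper names, and the trajectory count (200) are hypothetical, since the actual audit JSON's schema is not shown in this section. Reproducing the draw also assumes a pinned NumPy version.

```python
import hashlib
import json

import numpy as np

def select_trajectories(n_available: int, n_select: int, seed: int) -> list:
    """Draw n_select distinct trajectory indices, deterministically from a seed."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(n_available, size=n_select, replace=False)
    return sorted(int(i) for i in picks)

def audit_record(dataset: str, n_available: int, n_select: int, seed: int) -> dict:
    """Hypothetical audit-record layout: inputs, selected indices, and a digest."""
    payload = {
        "dataset": dataset,
        "n_available": n_available,
        "seed": seed,
        "indices": select_trajectories(n_available, n_select, seed),
    }
    # A digest over a canonical serialization makes later edits detectable.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    payload["sha256"] = hashlib.sha256(canonical).hexdigest()
    return payload

def verify(record: dict) -> bool:
    """Re-derive the selection from the committed record and compare."""
    fresh = audit_record(record["dataset"], record["n_available"],
                         len(record["indices"]), record["seed"])
    return fresh == record

# Written once, before any compute job runs; re-checked on every later run.
committed = json.dumps(audit_record("tgv2d", 200, 20, seed=0), sort_keys=True)
print(verify(json.loads(committed)))
```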

P2.3 scope qualifier (structural-empirical link). Per DECISIONS.md D0-28, the equivariance gap above is reported on the GNS-as-shipped checkpoint, a single realization of the “non-equivariant architecture under typical training” class. The project deliberately does not extend rung-4b with a self-trained non-equivariant GNN as a second architectural data point. The rollout-anchor portfolio rests on published checkpoints (F3 borrowed-credibility framing; see methodology/docs/2026-05-01-rollout-anchor-extension-design.md §1.1), and a self-trained baseline would sit in a structurally weaker evidence class. The structural-empirical-link argument is defeasible. A non-equivariant architecture trained with full SO(2) augmentation could in principle approximate equivariance to near-noise-floor accuracy. Still, the rung-4b reading on GNS-as-shipped is consistent with the architectural explanation: equivariance is built into SEGNN and only approximated by GNS.
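The defeasibility clause depends on what "full SO(2) augmentation" would mean in practice. Each training example is shown under a freshly sampled rotation, applied identically to the inputs and the vector-valued target. The sketch below assumes 2D positions, rotation about the origin, and no periodic box. `augment` is a hypothetical helper, not LagrangeBench or GNS training code, and says nothing about how the shipped GNS checkpoint was trained. PH-SYM-002 probes reflections, which SO(2) does not contain, so the optional flag extends the sampling to O(2).

```python
import numpy as np

def augment(history, target, rng, with_reflection=False):
    """Apply one shared random orthogonal transform to inputs and target.

    history: (n_history, n_particles, 2) positions fed to the model.
    target:  (n_particles, 2) vector-valued label, e.g. an acceleration.
    Sampling theta ~ U[0, 2*pi) covers SO(2); the optional flip extends it to O(2).
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    g = np.array([[c, -s], [s, c]])
    if with_reflection and rng.random() < 0.5:
        g = g @ np.array([[1.0, 0.0], [0.0, -1.0]])  # det(g) = -1
    # Inputs and label must see the same g, or the model learns a wrong target.
    return history @ g.T, target @ g.T, g

rng = np.random.default_rng(1)
history = rng.random((3, 8, 2))            # synthetic positions
target = rng.standard_normal((8, 2))       # synthetic vector labels
h_aug, t_aug, g = augment(history, target, rng, with_reflection=True)
```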

What this case study does NOT cover. PH-BC-001 (no-slip on a body-surface velocity field) is structurally inapplicable to LagrangeBench’s particle representation, which has no mesh and no surface trace operator. PH-CON-001 (mass) and PH-CON-002/003 (energy/dissipation) are exercised in the rung-4a sibling artifact (gns-tgv2d + segnn-tgv2d columns) rather than re-derived here. The CS01 deliverables are scoped to particle-side equivariance evidence on the two published checkpoints; broader architectural sweeps are out of v1.0 scope.