# PhysFlow-Earth: Physics-Constrained Rectified Flow for Earth Observation Super-Resolution and Climate Downscaling

Arun Sharma, University of Minnesota, Twin Cities

_In preparation. Target: NeurIPS 2026 Climate Change AI workshop_

<div class="section abstract" role="doc-abstract">

<div class="centerline">

<span class="ptmb8t-x-x-120">Abstract</span>

</div>

> Generative super-resolution models can produce visually plausible Earth-observation fields while violating the physical quantities that make the fields useful: coarse precipitation totals, wind divergence, or spectral index consistency in multispectral imagery. PhysFlow-Earth is a conditional rectified-flow stack for satellite and climate downscaling that exposes these constraints as differentiable residuals during training. The implementation combines a rectified-flow objective, a Diffusion Transformer velocity backbone, low-resolution conditioning tokens, and a learned physics codebook. Residual modules enforce average-pool consistency for precipitation, horizontal-divergence penalties for wind, and band-ratio consistency for Sentinel-2. This paper documents the project as an arXiv-style systems paper grounded in the current repository. It reports only implementation validation from the test suite and leaves benchmark claims for future reproduction on WorldStrat, SEN2VENuS, ERA5, and CHIRPS.

</div>

## <span class="titlemark">1 </span> <span id="x1-10001"></span>Introduction

Remote-sensing and climate products often require information at a spatial resolution finer than the native sensor or simulation grid. Super-resolution and statistical downscaling methods address this mismatch by learning a mapping from coarse fields to high-resolution fields. Diffusion and flow-based generative models are attractive because they represent uncertainty and can synthesize sharp spatial structure \[[11](#Xho2020denoising), [23](#Xlipman2023flow), [24](#Xliu2022rectified)\]. In scientific geospatial settings, however, sharpness is not enough. The high-resolution output must preserve coarse aggregate mass, respect physically meaningful band relationships, and avoid vector-field artifacts.

PhysFlow-Earth implements a conservative alternative: a conditional rectified flow whose training loss includes differentiable physics residuals evaluated on the projected clean sample. The repository is deliberately structured as a reusable research implementation. It provides the flow wrapper, residual operators, a DiT-style velocity model, a Diffusers-like pipeline interface, Hydra configuration, a Gradio Space, and CPU tests.

This paper takes a conservative stance. The README describes intended benchmark targets and deployment workflows, but the current paper does not restate unverified leaderboard claims. Instead it explains the method, records what the tests establish, and names the measurements needed before submission.

<span id="contributions" class="paragraphHead"> <span id="x1-2000"></span><span class="ptmb8t-">Contributions:</span></span>

1\.  
A conditional rectified-flow objective for Earth-observation super-resolution with residuals applied to the predicted clean sample.

2\.  
Differentiable residual modules for coarse mass conservation, horizontal wind divergence, and Sentinel-2 band-ratio preservation.

3\.  
A DiT velocity backbone with low-resolution conditioning tokens and cross-attention to a learned physics codebook.

4\.  
A reproducible software package with shape tests, residual tests, training-step tests, and a CPU-safe Hugging Face Space.

<figure class="figure">
<p><img src="figures/main-375ccbd4af0e321b0943c22f7450a9b7.svg" loading="lazy" alt="Figure" /> <span id="x1-2005r1"></span></p>
<figcaption><span class="id">Figure 1: </span><span class="content">Detailed PhysFlow-Earth architecture. The diagram shows the tokenized conditional transport path, the repeated attention-style velocity block, the clean-sample decoder, and the physics residual heads. The evaluation side is explicit so image metrics, scientific residuals, calibration, and sampling cost are not conflated. </span></figcaption>
</figure>

<span id="scope" class="paragraphHead"> <span id="x1-3000"></span><span class="ptmb8t-">Scope:</span></span> The central research tension in Earth-observation downscaling is that visual quality and scientific validity are not the same objective. A high-resolution precipitation field can look crisp while violating the coarse accumulation that conditioned it. A wind field can look spatially realistic while producing implausible divergence artifacts. A multispectral sample can improve image metrics while distorting indices used by downstream land-cover or water analysis. This makes naive image super-resolution a poor default for scientific geospatial data.

PhysFlow-Earth positions generative modeling as a conditional transport problem with explicit scientific residuals. The model is allowed to represent uncertainty and high-frequency structure, but it is penalized when the predicted endpoint violates known aggregate or spectral constraints. This is a middle path between unconstrained image generation and hard-coded physical simulation. It is not a replacement for numerical weather prediction or radiative-transfer modeling. It is a lightweight framework for making neural downscaling less indifferent to the quantities that domain users care about.

The choice of rectified flow is pragmatic. Diffusion models are strong but can require many denoising steps and a more complex training objective. Rectified flow gives a direct velocity target and a clean endpoint estimate. That endpoint estimate is precisely where residuals should be evaluated. The method therefore has a natural place to ask: if this is the model’s current clean high-resolution field, does it preserve mass, divergence behavior, and band relationships?

The paper should be read as a research implementation rather than a leaderboard claim. The repository implements the residuals, model wrapper, DiT-style velocity path, pipeline surface, and tests. What it does not yet contain is a public benchmark table on WorldStrat, SEN2VENuS, ERA5, WeatherBench, or CHIRPS. This distinction matters. This paper builds enough theory and evaluation detail to make those future results interpretable instead of merely decorative.

<span id="expanded-contributions" class="paragraphHead"> <span id="x1-4000"></span><span class="ptmb8t-">Expanded contributions:</span></span> The expanded version contributes four additional research assets: a residual-weight selection protocol, an area-weighted aggregation extension for global grids, an uncertainty-evaluation plan for generative downscaling, and a set of reader-facing claim boundaries. These are necessary because a physics-guided generative model can otherwise sound impressive while remaining scientifically under-specified.

## <span class="titlemark">2 </span> <span id="x1-50002"></span>Related Work

<span id="expanded-citation-map" class="paragraphHead"> <span id="x1-6000"></span><span class="ptmb8t-">Expanded Citation Map:</span></span> The expanded references place PhysFlow-Earth between image restoration, generative modeling, physics-informed learning, and weather/remote-sensing downscaling. SRCNN, SRGAN, EDSR, RCAN, SwinIR, and U-Net represent the classical deep restoration backbone family \[[8](#Xdong2016srcnn), [18](#Xledig2017srgan), [21](#Xliang2021swinir), [22](#Xlim2017edsr), [38](#Xronneberger2015unet), [46](#Xzhang2018rcan)\]. DDPM, score-based modeling, SR3, latent diffusion, EDM, DiT, rectified flow, flow matching, and stochastic interpolants define the generative side \[[1](#Xalbergo2023stochastic), [11](#Xho2020denoising), [14](#Xkarras2022edm), [23](#Xlipman2023flow), [24](#Xliu2022rectified), [31](#Xpeebles2023scalable), [37](#Xrombach2022latent), [40](#Xsaharia2022image), [42](#Xsong2021score)\]. Physics-informed neural networks, theory-guided data science, scientific-knowledge integration, Fourier neural operators, FourCastNet, GraphCast, Pangu-Weather, and generative precipitation nowcasting motivate residual checks beyond perceptual sharpness \[[2](#Xbi2023panguweather), [12](#Xkarniadakis2021physicsinformed), [13](#Xkarpatne2017theory), [16](#Xlam2023graphcast), [20](#Xli2021fourier), [29](#Xpathak2022fourcastnet), [32](#Xraissi2019physics), [33](#Xravuri2021skillful), [45](#Xwillard2022integrating)\]. WorldStrat, SEN2VENuS, CorrDiff, and precipitation diffusion define the likely benchmark neighborhood \[[6](#Xcornebise2022worldstrat), [19](#Xleinonen2023precipitation), [25](#Xmardani2023corrdiff), [26](#Xmichel2022sen2venus)\].

<span id="generative-superresolution" class="paragraphHead"> <span id="x1-7000"></span><span class="ptmb8t-">Generative super-resolution:</span></span> Diffusion models have become a standard route to high-fidelity image generation \[[11](#Xho2020denoising), [40](#Xsaharia2022image), [42](#Xsong2021score)\]. Rectified flow and flow matching simplify sampling by learning velocity fields between noise and data \[[23](#Xlipman2023flow), [24](#Xliu2022rectified)\]. PhysFlow-Earth follows this family but treats physical consistency as a training objective, not a post-hoc filter.

<span id="transformers-for-diffusion-and-flow" class="paragraphHead"> <span id="x1-8000"></span><span class="ptmb8t-">Transformers for diffusion and flow:</span></span> Diffusion Transformers replace U-Net inductive biases with patch-token self-attention and adaptive normalization \[[31](#Xpeebles2023scalable)\]. This design is useful for multispectral and climate grids because conditioning can be represented as tokens rather than only as concatenated channels.

<span id="physicsguided-machine-learning" class="paragraphHead"> <span id="x1-9000"></span><span class="ptmb8t-">Physics-guided machine learning:</span></span> Physics-informed neural networks and knowledge-guided machine learning show that scientific constraints can improve generalization and reduce physically invalid outputs \[[13](#Xkarpatne2017theory), [32](#Xraissi2019physics)\]. PhysFlow-Earth keeps the constraint layer lightweight: residuals are ordinary differentiable PyTorch modules with a common interface.

<span id="earthobservation-downscaling" class="paragraphHead"> <span id="x1-10000"></span><span class="ptmb8t-">Earth-observation downscaling:</span></span> Remote-sensing super-resolution and climate downscaling have different data assumptions but a shared mathematical structure: infer a high-resolution field conditioned on a coarser observation. WorldStrat provides paired high-resolution commercial satellite imagery and Sentinel-2 context for global super-resolution research \[[6](#Xcornebise2022worldstrat)\]. SEN2VENuS provides Sentinel-2 and VEN<span class="mathjax-inline">\\\mu \\</span>S acquisitions for radiometrically consistent super-resolution \[[26](#Xmichel2022sen2venus)\]. Diffusion-based atmospheric downscaling work such as CorrDiff and spatiotemporal precipitation diffusion shows that generative models can sharpen weather fields while representing uncertainty \[[19](#Xleinonen2023precipitation), [25](#Xmardani2023corrdiff)\]. PhysFlow-Earth fits into this line but makes the residual constraints explicit in the loss. Earth-system ML and climate-ML surveys further argue that process understanding and scientific diagnostics should be integrated into model design rather than left as a post-hoc visualization layer \[[34](#Xreichstein2019deep), [36](#Xrolnick2019climate)\].

<span id="literature-synthesis" class="paragraphHead"> <span id="x1-11000"></span><span class="ptmb8t-">Literature synthesis:</span></span> PhysFlow-Earth combines two research streams that are often evaluated with different instincts. Diffusion, score-based modeling, rectified flows, latent diffusion, and flow matching provide powerful conditional generative models for images \[[11](#Xho2020denoising), [14](#Xkarras2022edm), [23](#Xlipman2023flow), [24](#Xliu2022rectified), [37](#Xrombach2022latent), [42](#Xsong2021score)\]. The second stream covers restoration and Earth-system modeling: super-resolution backbones such as SRGAN, EDSR, RCAN, and SwinIR are judged mainly on reconstruction fidelity, while CorrDiff, GraphCast, FourCastNet, Pangu-Weather, and precipitation nowcasting emphasize scientific validity, calibration, and geophysical structure \[[2](#Xbi2023panguweather), [16](#Xlam2023graphcast), [18](#Xledig2017srgan), [21](#Xliang2021swinir), [22](#Xlim2017edsr), [25](#Xmardani2023corrdiff), [29](#Xpathak2022fourcastnet), [33](#Xravuri2021skillful), [46](#Xzhang2018rcan)\].

The key tension is that perceptual sharpness and physical consistency are not identical objectives. A visually plausible high-resolution field can violate conservation, band relationships, or uncertainty calibration. Physics-informed neural networks and theory-guided data science address this tension by placing governing equations or residual constraints into the learning objective \[[12](#Xkarniadakis2021physicsinformed), [13](#Xkarpatne2017theory), [32](#Xraissi2019physics), [45](#Xwillard2022integrating)\]. PhysFlow-Earth uses that idea in a generative downscaling setting: the residual does not replace the likelihood or flow objective, but it biases the sampler toward fields that respect declared physical checks.

Earth-observation benchmarks also require careful geographic splits. WorldStrat, SEN2VENuS, and climate AI surveys show that remote-sensing models often fail under shifts in geography, season, or sensor conditions \[[6](#Xcornebise2022worldstrat), [26](#Xmichel2022sen2venus), [34](#Xreichstein2019deep), [36](#Xrolnick2019climate)\]. The literature therefore supports an evaluation that reports image metrics, residual metrics, uncertainty metrics, and held-out geography together. That combined reporting is the distinguishing feature of the paper.

<span id="foundational-reference-anchors" class="paragraphHead"> <span id="x1-12000"></span><span class="ptmb8t-">Foundational reference anchors:</span></span> The bibliography also anchors the project-specific contribution in older and broader technical foundations: statistical learning and pattern recognition, deep learning, information theory, convex and numerical optimization, stochastic approximation, adaptive gradient methods, causality, and early AI framing \[[3](#Xbishop2006pattern)–[5](#Xbubeck2015convex), [7](#Xcover2006elements), [9](#Xgoodfellow2016deep), [10](#Xhastie2009elements), [15](#Xkingma2015adam), [17](#Xlecun1998gradient), [27](#Xmurphy2012machine), [28](#Xnocedal2006numerical), [30](#Xpearl2009causality), [35](#Xrobbins1951stochastic), [39](#Xrumelhart1986learning), [41](#Xshannon1948communication), [43](#Xturing1950computing), [44](#Xvapnik1998statistical)\]. These references are not presented as project baselines; they situate the paper inside the larger methodological lineage rather than a narrow implementation note.

## <span class="titlemark">3 </span> <span id="x1-130003"></span>Method and Architecture

<span id="problem-formulation" class="paragraphHead"> <span id="x1-14000"></span><span class="ptmb8t-">Problem Formulation:</span></span> Let <span class="mathjax-inline">\\x\_{\ell }\\</span> be a low-resolution input field and <span class="mathjax-inline">\\x_h\\</span> be the high-resolution target. The task is to learn a conditional generator <span class="mathjax-inline">\\G\_{\theta }(x\_{\ell })\\</span> that produces high-resolution samples <span class="mathjax-inline">\\\hat {x}\_h\\</span> matching the data distribution while preserving a set of physical or spectral constraints. Each modality defines a residual operator <span class="mathjax-inline">\\R_m\\</span>:

<div class="mathjax-env mathjax-equation">

\begin{equation} R_m(\hat {x}\_h, x\_{\ell }) \in \mathbb {R}^{C_m \times H_m \times W_m}. \end{equation}

</div>

<span id="x1-14001r1"></span>

For coarse mass conservation, <span class="mathjax-inline">\\R_m\\</span> is an average-pooling error. For wind, it is a finite-difference divergence. For Sentinel-2, it compares indices such as NDVI and NDWI after downsampling.

<span id="method" class="paragraphHead"> <span id="x1-15000"></span><span class="ptmb8t-">Method:</span></span>

<span id="rectifiedflow-training" class="paragraphHead"> <span id="x1-16000"></span><span class="ptmb8t-">Rectified-flow training:</span></span> PhysFlow-Earth learns a velocity model <span class="mathjax-inline">\\v\_{\theta }(x_t,t,c)\\</span> where <span class="mathjax-inline">\\c\\</span> includes low-resolution conditioning. For clean sample <span class="mathjax-inline">\\x_1\\</span> and random noise <span class="mathjax-inline">\\x_0\\</span>, the linear interpolant is

<div class="mathjax-env mathjax-equation">

\begin{equation} x_t = (1-t)x_0 + tx_1, \quad t\sim \mathcal {U}(0,1). \end{equation}

</div>

<span id="x1-16001r2"></span>

The vanilla rectified-flow target is the constant velocity

<div class="mathjax-env mathjax-equation">

\begin{equation} v^\* = x_1 - x_0. \end{equation}

</div>

<span id="x1-16002r3"></span>

The implementation computes a projected clean sample

<div class="mathjax-env mathjax-equation">

\begin{equation} \hat {x}\_1 = x_t + (1-t)v\_{\theta }(x_t,t,c), \end{equation}

</div>

<span id="x1-16003r4"></span>

and evaluates residuals on <span class="mathjax-inline">\\\hat {x}\_1\\</span>. The total objective is

<div class="mathjax-env mathjax-equation">

\begin{equation} \mathcal {L} = \lVert v\_{\theta }(x_t,t,c)-(x_1-x_0)\rVert \_2^2 + \lambda \_{\text {phys}}\sum \_m w_m \lVert R_m(\hat {x}\_1,c)\rVert \_2^2. \end{equation}

</div>

<span id="x1-16004r5"></span>

Because <span class="mathjax-inline">\\\hat {x}\_1\\</span> is computed from the undetached model output, the residual gradients flow back through the velocity prediction.
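
The objective above can be traced numerically. The following is a minimal NumPy sketch, not the repository's PyTorch code: the helper names `avg_pool` and `hybrid_loss`, the default weight `lam_phys=0.1`, and the use of a mean instead of a summed squared norm are all assumptions made for illustration. It uses only the mass residual and feeds in the exact target velocity to confirm that the projected clean sample then recovers the clean field.

```python
import numpy as np

def avg_pool(x, s):
    """Average each s x s block of a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def hybrid_loss(v_pred, x0, x1, t, x_lr, scale, lam_phys=0.1):
    """Velocity loss plus a mass-residual penalty on the projected clean sample."""
    x_t = (1.0 - t) * x0 + t * x1          # linear interpolant
    v_star = x1 - x0                        # constant target velocity
    x1_hat = x_t + (1.0 - t) * v_pred       # projected clean sample
    flow = np.mean((v_pred - v_star) ** 2)  # mean-squared velocity error
    phys = np.mean((avg_pool(x1_hat, scale) - x_lr) ** 2)
    return flow + lam_phys * phys, x1_hat

rng = np.random.default_rng(0)
x1 = rng.random((1, 8, 8))                  # stand-in high-resolution field
x_lr = avg_pool(x1, 4)                      # coarse field consistent with x1
x0 = rng.standard_normal(x1.shape)          # noise endpoint

# Oracle prediction: with v_pred = x1 - x0, both loss terms vanish.
loss, x1_hat = hybrid_loss(x1 - x0, x0, x1, t=0.3, x_lr=x_lr, scale=4)
```

In an autograd framework the same expressions would be written on tensors so that the physics term contributes gradients to the velocity model through `x1_hat`.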

<span id="residual-operators" class="paragraphHead"> <span id="x1-17000"></span><span class="ptmb8t-">Residual operators:</span></span>

<span id="mass-conservation" class="paragraphHead"> <span id="x1-18000"></span><span class="ptmb8t-">Mass conservation:</span></span> For precipitation or other scalar intensive fields, the high-resolution output should preserve coarse cell means. The residual is

<div class="mathjax-env mathjax-equation">

\begin{equation} R\_{\text {mass}}(\hat {x}\_h,x\_{\ell }) = \text {AvgPool}\_{s}(\hat {x}\_h)-x\_{\ell }. \end{equation}

</div>

<span id="x1-18001r6"></span>
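
As a concrete illustration of the mass residual, here is a small NumPy sketch. The function names and the `(C, H, W)` layout are assumptions for this example rather than the repository's interface. A nearest-neighbor upsampling of the coarse field gives a zero residual, and adding mass to one pixel shifts only the coarse cell that contains it.

```python
import numpy as np

def avg_pool(x, s):
    """Average each s x s block of a (C, H, W) array."""
    c, h, w = x.shape
    if h % s or w % s:
        raise ValueError("height and width must be divisible by the scale factor")
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def mass_residual(x_hr, x_lr, s):
    """AvgPool_s(x_hr) - x_lr, one value per coarse cell."""
    return avg_pool(x_hr, s) - x_lr

x_lr = np.array([[[1.0, 2.0], [3.0, 4.0]]])              # (1, 2, 2) coarse field
x_hr = np.repeat(np.repeat(x_lr, 2, axis=1), 2, axis=2)  # (1, 4, 4) block-constant field
res_ok = mass_residual(x_hr, x_lr, 2)                    # all zeros

x_bad = x_hr.copy()
x_bad[0, 0, 0] += 4.0                                    # add mass to a single pixel
res_bad = mass_residual(x_bad, x_lr, 2)                  # 1.0 in the top-left coarse cell
```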

<span id="horizontal-divergence" class="paragraphHead"> <span id="x1-19000"></span><span class="ptmb8t-">Horizontal divergence:</span></span> For a wind field <span class="mathjax-inline">\\(u,v)\\</span>, PhysFlow-Earth computes a finite-difference divergence:

<div class="mathjax-env mathjax-equation">

\begin{equation} R\_{\text {div}} = \frac {\partial u}{\partial x}+\frac {\partial v}{\partial y}. \end{equation}

</div>

<span id="x1-19001r7"></span>

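The operator is intentionally simple and unit-testable. Production runs should set the finite-difference spacing from the actual grid geometry; on a regular latitude-longitude grid, for example, the zonal spacing shrinks with the cosine of latitude.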
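
A finite-difference divergence of this kind can be sketched with `np.gradient`. This is an illustrative NumPy version under assumed conventions (arrays indexed `[y, x]`, uniform spacings `dx` and `dy`), not the repository's operator. The two checks mirror the kind of unit tests mentioned for the residuals: constant fields have zero divergence, and linear fields have a constant one.

```python
import numpy as np

def horizontal_divergence(u, v, dx=1.0, dy=1.0):
    """du/dx + dv/dy on a regular grid; arrays are indexed [y, x]."""
    du_dx = np.gradient(u, dx, axis=1)  # derivative along x (columns)
    dv_dy = np.gradient(v, dy, axis=0)  # derivative along y (rows)
    return du_dx + dv_dy

# Coordinate grids of shape (5, 6): yy varies along axis 0, xx along axis 1.
yy, xx = np.meshgrid(np.arange(5.0), np.arange(6.0), indexing="ij")

# Constant wind: divergence is zero everywhere.
div_const = horizontal_divergence(np.full_like(xx, 3.0), np.full_like(yy, -2.0))

# Linear wind u = 2x, v = 0.5y: divergence is 2 + 0.5 everywhere.
div_linear = horizontal_divergence(2.0 * xx, 0.5 * yy)
```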

<span id="bandratio-consistency" class="paragraphHead"> <span id="x1-20000"></span><span class="ptmb8t-">Band-ratio consistency:</span></span> For Sentinel-2, the model compares downsampled spectral indices:

<div class="mathjax-env mathjax-equation">

\begin{equation} \begin {aligned} \text {NDVI}(x)&=\frac {x\_{\text {NIR}}-x\_{\text {red}}}{x\_{\text {NIR}}+x\_{\text {red}}+\epsilon },\\ \text {NDWI}(x)&=\frac {x\_{\text {green}}-x\_{\text {SWIR}}}{x\_{\text {green}}+x\_{\text {SWIR}}+\epsilon }. \end {aligned} \end{equation}

</div>

<span id="x1-20001r8"></span>

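The residual is the concatenation of average-pooled index differences. The green/SWIR ratio written above is often called the modified NDWI (MNDWI) in the remote-sensing literature; the original McFeeters NDWI uses NIR in place of SWIR.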
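
The band-ratio residual can be sketched as follows. The channel order `(green, red, NIR, SWIR)`, the function names, and the choice to pool the high-resolution indices before comparing them with indices of the coarse input are assumptions for this example; the repository may order bands or compose the operations differently.

```python
import numpy as np

# Hypothetical channel order for this sketch only.
GREEN, RED, NIR, SWIR = 0, 1, 2, 3

def avg_pool(x, s):
    """Average each s x s block of a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def spectral_indices(x, eps=1e-6):
    """Stack NDVI and the green/SWIR water index for a (4, H, W) array."""
    ndvi = (x[NIR] - x[RED]) / (x[NIR] + x[RED] + eps)
    ndwi = (x[GREEN] - x[SWIR]) / (x[GREEN] + x[SWIR] + eps)
    return np.stack([ndvi, ndwi])

def band_ratio_residual(x_hr, x_lr, s, eps=1e-6):
    """Average-pooled high-resolution indices minus coarse indices."""
    return avg_pool(spectral_indices(x_hr, eps), s) - spectral_indices(x_lr, eps)

rng = np.random.default_rng(0)
x_lr = rng.uniform(0.05, 0.6, size=(4, 3, 3))             # synthetic reflectances
x_hr = np.repeat(np.repeat(x_lr, 2, axis=1), 2, axis=2)   # block-constant upsampling
res = band_ratio_residual(x_hr, x_lr, 2)                  # shape (2, 3, 3), all ~0

x_shift = x_hr.copy()
x_shift[NIR] *= 1.5                                       # brighten NIR only
res_shift = band_ratio_residual(x_shift, x_lr, 2)         # NDVI rises, water index unchanged
```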

<span id="velocity-backbone" class="paragraphHead"> <span id="x1-21000"></span><span class="ptmb8t-">Velocity backbone:</span></span> The velocity network is a Diffusion Transformer. High-resolution noisy fields are patch-embedded. Low-resolution fields are tokenized separately and concatenated to the patch sequence. Each block applies self-attention, cross-attention to a learned physics codebook, and an adaptive normalization MLP driven by sinusoidal time embeddings. The output head maps patch tokens back to pixel space through pixel shuffle.
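
The token bookkeeping behind patch embedding and the pixel-shuffle style output head can be illustrated without any learned weights. The sketch below is an assumption-laden NumPy illustration (the names `patchify` and `unpatchify` and the patch size are hypothetical); in the actual backbone a learned projection sits between these rearrangements.

```python
import numpy as np

def patchify(x, p):
    """(C, H, W) -> (N, C*p*p) tokens with N = (H/p) * (W/p)."""
    c, h, w = x.shape
    x = x.reshape(c, h // p, p, w // p, p)            # (c, i, a, j, b)
    return x.transpose(1, 3, 0, 2, 4).reshape((h // p) * (w // p), c * p * p)

def unpatchify(tokens, c, h, w, p):
    """Inverse of patchify: spread token channels back into p x p pixel blocks."""
    x = tokens.reshape(h // p, w // p, c, p, p)       # (i, j, c, a, b)
    return x.transpose(2, 0, 3, 1, 4).reshape(c, h, w)

x = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
tokens = patchify(x, 4)                  # 4 tokens, each of length 2 * 4 * 4 = 32
x_back = unpatchify(tokens, 2, 8, 8, 4)  # exact round trip
```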

<span id="implementation" class="paragraphHead"> <span id="x1-22000"></span><span class="ptmb8t-">Implementation:</span></span> The project is packaged as <span class="pcrr8t-">physflow</span>. Core modules are intentionally narrow:

- <span class="pcrr8t-">flow.RectifiedFlow</span>: wraps a velocity model and implements the hybrid loss.
- <span class="pcrr8t-">physics.residual</span>: defines residual modules with a shared interface.
- <span class="pcrr8t-">models.DiTVelocity</span>: implements the tokenized velocity backbone.
- <span class="pcrr8t-">models.PhysFlowPipeline</span>: exposes inference in a pipeline style.
- <span class="pcrr8t-">space/app.py</span>: demonstrates downscaling inputs and physics dashboards in Gradio.

## <span class="titlemark">4 </span> <span id="x1-230004"></span>Evaluation

Table [1](#current-validation-in-the-physflowearth-repository) summarizes the current implementation-grounded checks. These are not a replacement for benchmark evaluation, but they protect the scientific invariants the method depends on.

<div class="table">

<figure id="x1-23001r1" class="float">
<span id="current-validation-in-the-physflowearth-repository"></span>
<div class="tabular">
<table id="TBL-2" class="tabular">
<tbody>
<tr id="TBL-2-1-" style="vertical-align:baseline;">
<td id="TBL-2-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Area</span></p></td>
<td id="TBL-2-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">What is checked</span></p></td>
<td id="TBL-2-1-3" class="td10" style="text-align: right; white-space: normal;"><span class="ptmb8t-">Tests</span></td>
</tr>
<tr id="TBL-2-2-" style="vertical-align:baseline;">
<td id="TBL-2-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Physics residuals</p></td>
<td id="TBL-2-2-2" class="td11" style="text-align: left; white-space: normal;"><p>average pooling inverse, zero divergence on constant fields, linear-gradient divergence, mass residual, band-ratio residual</p></td>
<td id="TBL-2-2-3" class="td10" style="text-align: right; white-space: normal;">6</td>
</tr>
<tr id="TBL-2-3-" style="vertical-align:baseline;">
<td id="TBL-2-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Flow training</p></td>
<td id="TBL-2-3-2" class="td11" style="text-align: left; white-space: normal;"><p>velocity loss decreases on a tiny model, physics loss is non-negative, full backward pass runs</p></td>
<td id="TBL-2-3-3" class="td10" style="text-align: right; white-space: normal;">3</td>
</tr>
<tr id="TBL-2-4-" style="vertical-align:baseline;">
<td id="TBL-2-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Model and Space path</p></td>
<td id="TBL-2-4-2" class="td11" style="text-align: left; white-space: normal;"><p>DiT output shape, pipeline output shape, UI construction, constants, requirements, HF frontmatter</p></td>
<td id="TBL-2-4-3" class="td10" style="text-align: right; white-space: normal;">8</td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 1: </span><span class="content">Current validation in the PhysFlow-Earth repository. </span></figcaption>
</figure>

</div>

The next benchmark layer should evaluate WorldStrat and SEN2VENuS for Sentinel-2 super-resolution, ERA5 or WeatherBench-style grids for winds, and CHIRPS or ERA5 precipitation for conservation. Metrics should include PSNR, SSIM, LPIPS where appropriate, and residual-specific physical scores such as mass error, divergence norm, and spectral-index preservation.
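
A sketch of how an image metric and two residual-specific scores could be reported side by side is given below. The function names, the root-mean-square reductions, and the `(C, H, W)` layout are assumptions for illustration; SSIM and LPIPS are omitted because they need dedicated libraries.

```python
import numpy as np

def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB for a known data range."""
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(data_range ** 2 / mse))

def mass_rmse(pred_hr, x_lr, s):
    """RMS difference between block means of the prediction and the coarse field."""
    c, h, w = pred_hr.shape
    pooled = pred_hr.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))
    return float(np.sqrt(np.mean((pooled - x_lr) ** 2)))

def divergence_rms(u, v, dx=1.0, dy=1.0):
    """RMS finite-difference divergence of a wind field indexed [y, x]."""
    div = np.gradient(u, dx, axis=1) + np.gradient(v, dy, axis=0)
    return float(np.sqrt(np.mean(div ** 2)))

target = np.zeros((1, 4, 4))
pred = target + 0.1                                # uniform 0.1 bias
psnr_db = psnr(pred, target, data_range=1.0)       # about 20 dB
mass_err = mass_rmse(pred, np.zeros((1, 2, 2)), 2) # about 0.1
div_err = divergence_rms(np.ones((4, 4)), np.ones((4, 4)))  # 0.0 for constant wind
```

Reporting these numbers together, rather than folding them into one score, keeps image fidelity and physical consistency separately visible.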

<span id="theory-conditional-transport-with-scientific-residuals" class="paragraphHead"> <span id="x1-24000"></span><span class="ptmb8t-">Theory: Conditional Transport with Scientific Residuals:</span></span> PhysFlow-Earth can be understood as learning a conditional transport map from a simple base distribution to a high-resolution Earth-observation distribution. Let <span class="mathjax-inline">\\p_0\\</span> be the noise distribution and <span class="mathjax-inline">\\p_1(x_h\mid x\_{\ell })\\</span> be the conditional data distribution of high-resolution fields given a low-resolution field. A continuous-time generator defines an ordinary differential equation

<div class="mathjax-env mathjax-equation">

\begin{equation} \frac {d x_t}{dt}=v\_{\theta }(x_t,t,x\_{\ell }),\qquad x_0\sim p_0,\quad t\in \[0,1\], \end{equation}

</div>

<span id="x1-24001r9"></span>

and produces <span class="mathjax-inline">\\x_1\\</span> by integrating the learned velocity field. Rectified flow chooses a simple linear interpolation between paired samples and trains the model to predict the displacement between endpoints \[[24](#Xliu2022rectified)\]. Flow matching generalizes this view to probability paths and corresponding vector fields \[[23](#Xlipman2023flow)\].
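
Sampling from the ODE above amounts to numerical integration of the velocity field. The following fixed-step Euler loop is a minimal sketch; `euler_sample` and the toy `make_oracle` velocity are hypothetical names, and the oracle stands in for a trained model by returning the exact straight-path velocity toward a known target. It illustrates why straight paths are attractive: the integration is exact even with a single step.

```python
import numpy as np

def euler_sample(velocity_fn, x0, cond, num_steps=4):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # t never reaches 1 inside the loop
        x = x + dt * velocity_fn(x, t, cond)
    return x

def make_oracle(x1):
    """Toy velocity: exact straight-path velocity toward a known endpoint x1."""
    def velocity_fn(x, t, cond):
        # On the path x_t = (1-t) x0 + t x1, this equals x1 - x0.
        return (x1 - x) / (1.0 - t)
    return velocity_fn

rng = np.random.default_rng(0)
target = rng.random((1, 4, 4))
x0 = rng.standard_normal(target.shape)

one_step = euler_sample(make_oracle(target), x0, cond=None, num_steps=1)
eight_steps = euler_sample(make_oracle(target), x0, cond=None, num_steps=8)
```

A learned velocity field is only approximately straight, so the number of steps remains an empirical sampling-cost choice.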

The scientific question is where physics enters this transport. A post-hoc filter can reject samples that violate a constraint, but rejection is expensive and does not teach the generator. A projection layer can force the output into a feasible set, but hard projections may be non-differentiable or may destroy perceptual detail. PhysFlow-Earth uses a softer design: it evaluates differentiable residuals on the predicted clean sample and adds those residuals to the training objective. This keeps sampling simple while shaping the learned velocity field.

<span id="residuals-as-weak-constraints" class="paragraphHead"> <span id="x1-25000"></span><span class="ptmb8t-">Residuals as weak constraints:</span></span> Let <span class="mathjax-inline">\\\mathcal {C}\\</span> be the ideal feasible set

<div class="mathjax-env mathjax-equation">

\begin{equation} \mathcal {C}(x\_{\ell }) = \lbrace x: R_m(x,x\_{\ell })=0 \quad \forall m\rbrace . \end{equation}

</div>

<span id="x1-25001r10"></span>

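The training objective with residual penalties, written here with per-residual weights <span class="mathjax-inline">\\\lambda \_m=\lambda \_{\text {phys}}w_m\\</span>, is a weak enforcement of this set: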

<div class="mathjax-env mathjax-equation">

\begin{equation} \mathbb {E}\left \[ \lVert v\_{\theta }(x_t,t,x\_{\ell })-v^\*\rVert \_2^2 +\sum \_m \lambda \_m \lVert R_m(\hat {x}\_1,x\_{\ell })\rVert \_2^2 \right \]. \end{equation}

</div>

<span id="x1-25002r11"></span>

When <span class="mathjax-inline">\\\lambda \_m\\</span> is small, residuals act as regularizers. When <span class="mathjax-inline">\\\lambda \_m\\</span> is large, they approximate constrained optimization but may reduce sample diversity or introduce artifacts. A publication-quality version should therefore report a Pareto curve: perceptual quality versus physical residual. One scalar score is not enough.
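
One way to turn a sweep over residual weights into a Pareto curve is to keep only the runs that no other run beats on both axes. The sketch below assumes each run is summarized by a hypothetical `quality` score (higher is better) and a `residual` score (lower is better). The numbers are placeholders chosen to exercise the function; they are not measurements from the repository.

```python
def pareto_front(runs):
    """Keep runs that are not dominated in (higher quality, lower residual)."""
    front = []
    for r in runs:
        dominated = any(
            o["quality"] >= r["quality"]
            and o["residual"] <= r["residual"]
            and (o["quality"] > r["quality"] or o["residual"] < r["residual"])
            for o in runs
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r["residual"])

# Placeholder values for illustration only; not experimental results.
runs = [
    {"lam": 0.0, "quality": 30.0, "residual": 0.50},
    {"lam": 0.1, "quality": 29.5, "residual": 0.20},
    {"lam": 1.0, "quality": 28.0, "residual": 0.25},   # beaten by lam=0.1 on both axes
    {"lam": 10.0, "quality": 26.0, "residual": 0.05},
]
front = pareto_front(runs)
front_lams = [r["lam"] for r in front]
```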

<span id="cleansample-projection" class="paragraphHead"> <span id="x1-26000"></span><span class="ptmb8t-">Clean-sample projection:</span></span> The residual is evaluated on

<div class="mathjax-env mathjax-equation">

\begin{equation} \hat {x}\_1=x_t+(1-t)v\_{\theta }(x_t,t,x\_{\ell }), \end{equation}

</div>

<span id="x1-26001r12"></span>

not on <span class="mathjax-inline">\\x_t\\</span>. This matters: at an intermediate time <span class="mathjax-inline">\\t\\</span>, <span class="mathjax-inline">\\x_t\\</span> contains noise by construction and should not be expected to satisfy physical constraints. The projected clean sample is the model’s current estimate of the endpoint. Penalizing that estimate gives the velocity model useful gradients without asking noisy states to be physically meaningful.
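
A short derivation, using only the interpolant and target velocity defined earlier, makes the role of the projection explicit. Substituting the exact velocity recovers the clean sample, and subtracting shows how velocity error propagates to the endpoint estimate:

```latex
\begin{aligned}
x_t + (1-t)\,v^{*} &= (1-t)x_0 + t x_1 + (1-t)(x_1 - x_0) = x_1,\\
\hat{x}_1 - x_1 &= (1-t)\,\bigl(v_{\theta}(x_t,t,x_{\ell}) - v^{*}\bigr).
\end{aligned}
```

The endpoint error is therefore the velocity error scaled by $(1-t)$: it is largest for samples drawn near $t=0$ and vanishes as $t \to 1$, which is one reason the residual term may behave differently across the sampled time range.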

<span id="conservation-and-aggregation" class="paragraphHead"> <span id="x1-27000"></span><span class="ptmb8t-">Conservation and aggregation:</span></span> For a coarse scalar field, the most basic consistency condition is aggregation:

<div class="mathjax-env mathjax-equation">

\begin{equation} A_s x_h \approx x\_{\ell }, \end{equation}

</div>

<span id="x1-27001r13"></span>

where <span class="mathjax-inline">\\A_s\\</span> averages each <span class="mathjax-inline">\\s\times s\\</span> high-resolution block. This is a discrete conservation law. For precipitation it can represent mass or accumulation consistency; for downscaled scalar variables it represents agreement with the coarse product. The residual

<div class="mathjax-env mathjax-equation">

\begin{equation} R\_{\text {mass}}=A_s\hat {x}\_h-x\_{\ell } \end{equation}

</div>

<span id="x1-27002r14"></span>

is simple, differentiable, and easy to test. It is also incomplete: it does not ensure realistic texture, extremes, or temporal coherence. It should be reported alongside distributional metrics.
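
The incompleteness of the mass residual is easy to demonstrate. In this small sketch (names and values are illustrative assumptions), a flat field and a field with all mass in a single pixel both match the same coarse cell exactly, even though their extremes differ by a factor of four.

```python
import numpy as np

def avg_pool(x, s):
    """Average each s x s block of a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

x_lr = np.array([[[2.0]]])                      # one coarse cell with mean 2
smooth = np.full((1, 2, 2), 2.0)                # flat high-resolution field
spiky = np.array([[[8.0, 0.0], [0.0, 0.0]]])    # same mean, one extreme pixel

res_smooth = avg_pool(smooth, 2) - x_lr         # zero residual
res_spiky = avg_pool(spiky, 2) - x_lr           # also zero residual
peak_ratio = spiky.max() / smooth.max()         # 4.0: extremes differ
```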

<span id="vectorfield-residuals" class="paragraphHead"> <span id="x1-28000"></span><span class="ptmb8t-">Vector-field residuals:</span></span> For wind, the divergence proxy is

<div class="mathjax-env mathjax-equation">

\begin{equation} R\_{\text {div}}=\nabla \_h\cdot \hat {\mathbf {u}}, \end{equation}

</div>

<span id="x1-28001r15"></span>

where <span class="mathjax-inline">\\\nabla \_h\\</span> is a horizontal finite-difference operator and <span class="mathjax-inline">\\\hat {\mathbf {u}}=(\hat {u},\hat {v})\\</span> is the predicted horizontal wind vector. This is a weak proxy rather than a full atmospheric equation. It does not include vertical motion, pressure gradients, Coriolis terms, boundary-layer effects, or terrain. Its role in the current repository is to test the architecture for vector residuals. The benchmark paper should be careful: it can claim a differentiable divergence residual, not a full Navier-Stokes or primitive-equation solver.

<span id="spectral-residuals" class="paragraphHead"> <span id="x1-29000"></span><span class="ptmb8t-">Spectral residuals:</span></span> For Sentinel-2 imagery, radiometric consistency can be more useful than a generic image prior. NDVI and NDWI constraints are examples of index-level consistency. If the high-resolution output looks sharp but changes vegetation or water indices after downsampling, it is less useful for downstream Earth-science tasks. The residuals therefore compare band-ratio functions after aggregation. Because ratios are sensitive to denominator noise, the implementation includes <span class="mathjax-inline">\\\epsilon \\</span> and should evaluate sensitivity to radiometric scaling.
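
The sensitivity to radiometric scaling can be seen with a single dark pixel. In the sketch below the reflectance values, the scale factor of 10000, and the choice `eps = 1e-3` are assumptions picked to make the effect visible. Without `eps` the ratio is scale-invariant; with it, the same pixel yields different index values depending on the units of the input.

```python
def ndvi(nir, red, eps):
    """NDVI with an additive stabilizer in the denominator."""
    return (nir - red) / (nir + red + eps)

nir, red = 0.004, 0.002   # dark pixel in reflectance units (illustrative values)
scale = 10000.0           # same pixel expressed in scaled units
eps = 1e-3

ndvi_exact = (nir - red) / (nir + red)                # 1/3, independent of scale
ndvi_reflectance = ndvi(nir, red, eps)                # eps is comparable to the denominator
ndvi_scaled = ndvi(nir * scale, red * scale, eps)     # eps is negligible here

bias_reflectance = abs(ndvi_reflectance - ndvi_exact)
bias_scaled = abs(ndvi_scaled - ndvi_exact)
```

A practical consequence is that `eps` should be chosen relative to the data range actually fed to the residual, and the residual weight may need retuning if the radiometric scaling changes.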

<span id="design-space" class="paragraphHead"> <span id="x1-30000"></span><span class="ptmb8t-">Design Space:</span></span> PhysFlow-Earth sits between three families of methods:

1\.  
deterministic super-resolution models trained with pixel or perceptual losses;

2\.  
diffusion or flow models trained for conditional sample quality;

3\.  
physics-informed models trained with residual constraints.

The project follows the second family for generation and borrows from the third only where constraints are cheap and differentiable. It does not try to solve a full PDE inside the sampler. This is deliberate. A lightweight residual layer is easier to test, easier to ablate, and more likely to survive contact with heterogeneous satellite and climate products.

<span id="why-rectified-flow" class="paragraphHead"> <span id="x1-31000"></span><span class="ptmb8t-">Why rectified flow:</span></span> Rectified flow is attractive for this implementation because the target velocity <span class="mathjax-inline">\\x_1-x_0\\</span> is simple and the predicted clean sample has a closed-form estimate at any training time. For a portfolio implementation, this reduces moving parts relative to a multi-step denoising objective. In a future production model, the choice should be empirical: compare rectified flow, conditional diffusion, and flow matching under the same residual losses and sampling budgets.

<span id="why-a-dit-backbone" class="paragraphHead"> <span id="x1-32000"></span><span class="ptmb8t-">Why a DiT backbone:</span></span> Earth-observation fields are multi-channel arrays with long-range spatial structure. A DiT-style backbone makes conditioning modular. Low-resolution fields, time embeddings, variable identifiers, and physics-codebook tokens can be represented as tokens. This is useful when the same architecture must handle precipitation, wind, and multispectral imagery. The tradeoff is compute: self-attention cost grows quadratically with token count, so patch size, attention pattern, or windowing must be chosen carefully.
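
The compute tradeoff can be made concrete with simple token arithmetic. The helper below is a back-of-the-envelope sketch: the function name, the square-image assumption, and all sizes (image sides, patch sizes, codebook size) are hypothetical and not taken from the repository configuration. It counts sequence tokens from high-resolution patches plus concatenated low-resolution tokens, then the number of query-key pairs in self-attention and in cross-attention to the codebook.

```python
def token_budget(hr_size, lr_size, hr_patch, lr_patch, codebook_size):
    """Token and attention-pair counts for square inputs (illustrative only)."""
    hr_tokens = (hr_size // hr_patch) ** 2
    lr_tokens = (lr_size // lr_patch) ** 2
    seq = hr_tokens + lr_tokens                 # low-res tokens are concatenated
    return {
        "sequence_tokens": seq,
        "self_attention_pairs": seq * seq,      # quadratic in sequence length
        "cross_attention_pairs": seq * codebook_size,  # linear in sequence length
    }

small = token_budget(hr_size=128, lr_size=32, hr_patch=8, lr_patch=8, codebook_size=64)
large = token_budget(hr_size=256, lr_size=64, hr_patch=8, lr_patch=8, codebook_size=64)
growth = large["self_attention_pairs"] / small["self_attention_pairs"]  # 16.0
```

Doubling the image side at a fixed patch size quadruples the token count and multiplies self-attention pairs by sixteen, while cross-attention to a fixed-size codebook grows only fourfold.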

<span id="why-a-physics-codebook" class="paragraphHead"> <span id="x1-33000"></span><span class="ptmb8t-">Why a physics codebook:</span></span> The learned physics codebook in the current repository is a conditioning mechanism. It should not be oversold as symbolic physics. Its purpose is to give the model a small set of learned latent anchors that can interact with patch tokens. A future paper should ablate the codebook size and compare it with ordinary learned condition tokens.

<span id="additional-literature-context" class="paragraphHead"> <span id="x1-34000"></span><span class="ptmb8t-">Additional Literature Context:</span></span>

<span id="diffusion-and-scorebased-generation" class="paragraphHead"> <span id="x1-35000"></span><span class="ptmb8t-">Diffusion and score-based generation:</span></span> DDPM introduced a practical denoising objective for generative modeling \[[11](#Xho2020denoising)\]. Score-based generative modeling framed the same family through stochastic differential equations \[[42](#Xsong2021score)\]. SR3 showed that iterative refinement can produce strong image super-resolution results \[[40](#Xsaharia2022image)\]. These methods are relevant because downscaling is a conditional image-generation problem. They are insufficient by themselves because scientific fields require consistency beyond visual plausibility.

<span id="flow-matching-and-rectified-transport" class="paragraphHead"> <span id="x1-36000"></span><span class="ptmb8t-">Flow matching and rectified transport:</span></span> Rectified flow learns a velocity field that moves noise to data along straight or nearly straight paths \[[24](#Xliu2022rectified)\]. Flow matching provides a broader framework for learning continuous normalizing flows from prescribed probability paths \[[23](#Xlipman2023flow)\]. PhysFlow-Earth uses this family because the velocity objective is direct and because residuals can be evaluated on the projected endpoint.

<span id="climate-and-weather-downscaling" class="paragraphHead"> <span id="x1-37000"></span><span class="ptmb8t-">Climate and weather downscaling:</span></span> CorrDiff uses residual corrective diffusion for kilometer-scale atmospheric downscaling \[[25](#Xmardani2023corrdiff)\]. Precipitation video diffusion uses temporal conditioning to represent high-frequency precipitation patterns \[[19](#Xleinonen2023precipitation)\]. These works motivate generative approaches for weather fields but also highlight the need for physical diagnostics: sharp precipitation maps are not automatically mass-consistent or decision-useful.

<span id="remotesensing-superresolution-datasets" class="paragraphHead"> <span id="x1-38000"></span><span class="ptmb8t-">Remote-sensing super-resolution datasets:</span></span> WorldStrat and SEN2VENuS are natural benchmark candidates \[[6](#Xcornebise2022worldstrat), [26](#Xmichel2022sen2venus)\]. WorldStrat emphasizes global coverage and a pairing between high-resolution commercial imagery and Sentinel-2. SEN2VENuS emphasizes radiometrically consistent cross-sensor training data. A strong paper should use both if licensing and data volume allow because they stress different failure modes.

<span id="evaluation-protocol" class="paragraphHead"> <span id="x1-39000"></span><span class="ptmb8t-">Evaluation Protocol:</span></span>

<figure class="figure">
<p><img src="figures/main-5c327efcb76c41679d0ad287e8ef156f.svg" loading="lazy" alt="Figure" /> <span id="x1-39001r2"></span></p>
<figcaption><span class="id">Figure 2: </span><span class="content">Evaluation structure for PhysFlow-Earth: image quality, physical residuals, uncertainty, and compute are reported as a frontier rather than one scalar score. </span></figcaption>
</figure>

The evaluation must separate image quality, physical consistency, calibration, and sampling cost. Table [2](#recommended-evaluation-protocol-for-a-full-physflowearth-paper) gives the recommended measurement plan.

<div class="table">

<figure id="x1-39002r2" class="float">
<span id="recommended-evaluation-protocol-for-a-full-physflowearth-paper"></span>
<div class="tabular">
<table id="TBL-3" class="tabular">
<tbody>
<tr id="TBL-3-1-" style="vertical-align:baseline;">
<td id="TBL-3-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Axis</span></p></td>
<td id="TBL-3-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Metrics</span></p></td>
<td id="TBL-3-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Purpose</span></p></td>
</tr>
<tr id="TBL-3-2-" style="vertical-align:baseline;">
<td id="TBL-3-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Visual fidelity</p></td>
<td id="TBL-3-2-2" class="td11" style="text-align: left; white-space: normal;"><p>PSNR, SSIM, LPIPS, spectral angle</p></td>
<td id="TBL-3-2-3" class="td10" style="text-align: left; white-space: normal;"><p>standard comparison with super-resolution baselines</p></td>
</tr>
<tr id="TBL-3-3-" style="vertical-align:baseline;">
<td id="TBL-3-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Distributional realism</p></td>
<td id="TBL-3-3-2" class="td11" style="text-align: left; white-space: normal;"><p>CRPS, rank histograms, extreme-value frequency</p></td>
<td id="TBL-3-3-3" class="td10" style="text-align: left; white-space: normal;"><p>checks uncertainty and tails for climate variables</p></td>
</tr>
<tr id="TBL-3-4-" style="vertical-align:baseline;">
<td id="TBL-3-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Mass consistency</p></td>
<td id="TBL-3-4-2" class="td11" style="text-align: left; white-space: normal;"><p>average-pool residual, coarse-cell bias</p></td>
<td id="TBL-3-4-3" class="td10" style="text-align: left; white-space: normal;"><p>verifies conservation against the conditioning field</p></td>
</tr>
<tr id="TBL-3-5-" style="vertical-align:baseline;">
<td id="TBL-3-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Vector consistency</p></td>
<td id="TBL-3-5-2" class="td11" style="text-align: left; white-space: normal;"><p>divergence norm and spatial spectrum</p></td>
<td id="TBL-3-5-3" class="td10" style="text-align: left; white-space: normal;"><p>detects physically implausible wind artifacts</p></td>
</tr>
<tr id="TBL-3-6-" style="vertical-align:baseline;">
<td id="TBL-3-6-1" class="td01" style="text-align: left; white-space: normal;"><p>Spectral consistency</p></td>
<td id="TBL-3-6-2" class="td11" style="text-align: left; white-space: normal;"><p>NDVI and NDWI residuals after downsampling</p></td>
<td id="TBL-3-6-3" class="td10" style="text-align: left; white-space: normal;"><p>checks downstream remote-sensing utility</p></td>
</tr>
<tr id="TBL-3-7-" style="vertical-align:baseline;">
<td id="TBL-3-7-1" class="td01" style="text-align: left; white-space: normal;"><p>Sampling cost</p></td>
<td id="TBL-3-7-2" class="td11" style="text-align: left; white-space: normal;"><p>number of function evaluations, wall time, memory</p></td>
<td id="TBL-3-7-3" class="td10" style="text-align: left; white-space: normal;"><p>distinguishes quality from compute budget</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 2: </span><span class="content">Recommended evaluation protocol for a full PhysFlow-Earth paper. </span></figcaption>
</figure>

</div>
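The mass-consistency and vector-consistency rows can be sketched as plain array diagnostics. The code below assumes block average pooling, unit grid spacing, and centered finite differences from `numpy.gradient`; the function names are hypothetical and the repository's residual modules may differ in detail.

```python
import numpy as np

def avg_pool(x, factor):
    """Average-pool the last two axes of x by an integer factor."""
    *lead, h, w = x.shape
    blocks = x.reshape(*lead, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(-3, -1))

def mass_residual(hr_pred, lr_cond, factor):
    """Coarse-cell bias: pooled prediction minus the conditioning field."""
    return avg_pool(hr_pred, factor) - lr_cond

def divergence(u, v, dx=1.0, dy=1.0):
    """Finite-difference horizontal divergence du/dx + dv/dy.

    u varies along the last axis (x) and v along the second-to-last (y).
    """
    return np.gradient(u, dx, axis=-1) + np.gradient(v, dy, axis=-2)

def divergence_norm(u, v):
    """Root-mean-square divergence over the field."""
    return float(np.sqrt(np.mean(divergence(u, v) ** 2)))

rng = np.random.default_rng(0)
hr = rng.uniform(0.0, 5.0, size=(4, 4))
lr = avg_pool(hr, 2)
bias = mass_residual(hr + 1.0, lr, 2)      # uniform +1 offset in every coarse cell

y, x = np.mgrid[0:6, 0:6].astype(float)
rot_free = divergence_norm(x, -y)          # u = x, v = -y has zero divergence
source = divergence_norm(x, y)             # u = x, v = y has divergence 2
```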

Baselines should include bicubic interpolation, deterministic CNN or U-Net super-resolution, conditional diffusion without residuals, rectified flow without residuals, and PhysFlow-Earth with each residual family enabled separately. The residual ablations matter more than the aggregate score. They tell the reader whether physics terms help or merely decorate the loss.

<span id="dataset-cards-for-future-runs" class="paragraphHead"> <span id="x1-40000"></span><span class="ptmb8t-">Dataset Cards for Future Runs:</span></span> Each dataset card should report:

- spatial resolution of input and target,
- temporal matching tolerance,
- channel list and radiometric scaling,
- geographic split and climate or land-cover diversity,
- cloud, missing-data, and quality-mask policy,
- training, validation, and test counts,
- whether examples are paired, weakly paired, or synthetically degraded.

Downscaling papers are especially vulnerable to leakage. Random patches from the same scene can make a model appear to generalize while it only memorizes local texture. A credible split should hold out geography and time where possible.
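One way to avoid patch-level leakage is to split by scene or region identifier rather than by patch. The following sketch assumes each sample is a dictionary with a hypothetical `scene` key; the key name, the helper `split_by_group`, and the test fraction are illustrative choices, not part of the repository.

```python
import random

def split_by_group(samples, group_key, test_fraction=0.2, seed=0):
    """Assign whole groups (scenes, regions, dates) to either train or test."""
    groups = sorted({s[group_key] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if s[group_key] not in test_groups]
    test = [s for s in samples if s[group_key] in test_groups]
    return train, test

# Toy index: 5 scenes with 4 patches each.
samples = [{"scene": f"scene_{i // 4}", "patch": i % 4} for i in range(20)]
train, test = split_by_group(samples, "scene", test_fraction=0.2)
```

The same function can be applied with a time key to hold out acquisition periods, or applied twice to hold out both geography and time.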

## <span class="titlemark">5 </span> <span id="x1-410005"></span>Discussion and Limitations

<span id="perceptualphysics-conflict" class="paragraphHead"> <span id="x1-42000"></span><span class="ptmb8t-">Perceptual-physics conflict:</span></span> A model can reduce LPIPS while increasing coarse-cell mass error. The paper should show both metrics rather than hiding the tradeoff in a weighted sum.

<span id="residual-shortcutting" class="paragraphHead"> <span id="x1-43000"></span><span class="ptmb8t-">Residual shortcutting:</span></span> If the residual weight is too high, the model may learn blurry fields that satisfy aggregation but lose high-frequency structure. Conversely, if the weight is too low, the residual becomes decorative.

<span id="radiometric-mismatch" class="paragraphHead"> <span id="x1-44000"></span><span class="ptmb8t-">Radiometric mismatch:</span></span> Band-ratio residuals assume compatible scaling and band definitions. Cross-sensor datasets can violate that assumption if preprocessing differs by source.

<span id="coordinate-and-area-effects" class="paragraphHead"> <span id="x1-45000"></span><span class="ptmb8t-">Coordinate and area effects:</span></span> Average pooling assumes equal-area pixels. Climate grids and satellite products often require latitude-aware area weights. The current repository uses simple pooling; a full paper should implement area-weighted aggregation for global grids.

<span id="training-recipe" class="paragraphHead"> <span id="x1-46000"></span><span class="ptmb8t-">Training Recipe:</span></span> A stable training recipe should start with the residual weights at zero and ramp them linearly to their target values over a short warmup:

<div class="mathjax-env mathjax-equation">

\begin{equation} \lambda \_m(s)=\lambda \_m^{\max }\min (1,s/S\_{\text {warm}}). \end{equation}

</div>

<span id="x1-46001r16"></span>

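Here <span class="mathjax-inline">\\s\\</span> is the training step and <span class="mathjax-inline">\\S\_{\text {warm}}\\</span> is the number of warmup steps. The ramp avoids an early phase in which near-random outputs are heavily penalized by the residuals before the velocity field has learned the data scale. The paper should report whether residual ramping was used.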
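The schedule is a one-line function. The sketch below is a direct transcription of the ramp equation; the name `residual_weight` and the guard for a non-positive warmup length are assumptions of this example.

```python
def residual_weight(step, warmup_steps, max_weight):
    """Linear warmup: max_weight * min(1, step / warmup_steps)."""
    if warmup_steps <= 0:
        return max_weight  # no warmup requested
    return max_weight * min(1.0, step / warmup_steps)

# Weight at the start, midway through, and after a 1000-step warmup.
schedule = [residual_weight(s, 1000, 0.5) for s in (0, 500, 1000, 2000)]
```

In a training loop the returned value would multiply the corresponding residual term before it is added to the flow loss.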

<span id="claim-checklist" class="paragraphHead"> <span id="x1-47000"></span><span class="ptmb8t-">Claim Checklist:</span></span> This paper can safely claim a conditional rectified-flow implementation, implemented residual modules, DiT output-shape validation, and a public CPU-safe demo. It should not yet claim superior downscaling, calibrated uncertainty, climate-model replacement, or production-quality physical fidelity. Those claims require benchmark tables and domain review.

<span id="future-figures" class="paragraphHead"> <span id="x1-48000"></span><span class="ptmb8t-">Future Figures:</span></span> The final paper should include:

1\.  
flow diagram showing low-resolution conditioning, velocity model, projected clean sample, and residual loss;

2\.  
examples of coarse input, target, baseline, and PhysFlow output;

3\.  
residual heatmaps for mass, divergence, and spectral indices;

4\.  
Pareto curves showing image quality versus physical residual;

5\.  
sampling-speed comparison between rectified flow and diffusion baselines.

This paper names these figures but does not synthesize fake outputs.

<span id="residual-weight-selection" class="paragraphHead"> <span id="x1-49000"></span><span class="ptmb8t-">Residual Weight Selection:</span></span> Selecting <span class="mathjax-inline">\\\lambda \_m\\</span> is a scientific modeling decision, not a tuning detail. If the residual weight is too small, the model ignores physics. If it is too large, the model can satisfy the residual while losing distributional realism. A useful sweep reports a Pareto frontier:

<div class="mathjax-env mathjax-equation">

\begin{equation} \left (\operatorname {LPIPS}(\lambda \_m),\quad \lVert R_m(\hat {x},x\_{\ell })\rVert \_2\right ). \end{equation}

</div>

<span id="x1-49001r17"></span>

The final paper should not choose a single value without showing this tradeoff. For multi-residual training, a grid over all weights may be expensive; a staged sweep can first tune each residual independently and then test combined settings.
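Extracting the frontier from a sweep is straightforward once each setting has a (perceptual metric, residual norm) pair. The sketch below assumes lower is better on both axes; the helper `pareto_front` is hypothetical, and the numbers are arbitrary placeholders chosen only to exercise the function, not measurements from any model.

```python
def pareto_front(points):
    """Return the points not dominated by any other (lower is better on both axes)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Placeholder (LPIPS, residual norm) pairs, one per hypothetical weight setting.
sweep = [(0.30, 0.90), (0.32, 0.40), (0.36, 0.45), (0.40, 0.10), (0.45, 0.12)]
front = pareto_front(sweep)
```

Settings that fall off the frontier are worse on both axes than some alternative and can be dropped before the combined multi-residual sweep.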

<span id="normalization" class="paragraphHead"> <span id="x1-50000"></span><span class="ptmb8t-">Normalization:</span></span> Residuals must be normalized before weighting. A divergence residual and an NDVI residual can have different units and scales. A practical rule is to standardize each residual with its baseline mean <span class="mathjax-inline">\\\mu \_m\\</span> and standard deviation <span class="mathjax-inline">\\\sigma \_m\\</span> on the training set:

<div class="mathjax-env mathjax-equation">

\begin{equation} \tilde {R}\_m=\frac {R_m-\mu \_m}{\sigma \_m+\epsilon }. \end{equation}

</div>

<span id="x1-50001r18"></span>

The current code keeps residual modules simple. A benchmark version should report whether residual normalization is used.
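A minimal sketch of this standardization, assuming the baseline statistics are computed once from residual values collected on the training set, is shown below. The function names and the synthetic baseline values are illustrative and do not correspond to repository code.

```python
import numpy as np

def fit_residual_stats(baseline_residuals):
    """Mean and standard deviation of a residual over the training set."""
    r = np.asarray(baseline_residuals, dtype=float)
    return float(r.mean()), float(r.std())

def normalize_residual(r, mu, sigma, eps=1e-8):
    """Standardized residual (r - mu) / (sigma + eps)."""
    return (np.asarray(r, dtype=float) - mu) / (sigma + eps)

rng = np.random.default_rng(0)
baseline = rng.normal(loc=3.0, scale=50.0, size=1000)  # synthetic stand-in values
mu, sigma = fit_residual_stats(baseline)
z = normalize_residual(baseline, mu, sigma)
```

Fitting the statistics per residual family puts divergence and index residuals on a common scale, so that a single weight value has a comparable meaning across families.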

<span id="areaweighted-aggregation" class="paragraphHead"> <span id="x1-51000"></span><span class="ptmb8t-">Area-Weighted Aggregation:</span></span> Average pooling assumes every high-resolution pixel contributes equal area to a coarse cell. This is reasonable for local projected imagery but questionable for latitude-longitude climate grids. For global grids, mass residuals should use area weights:

<div class="mathjax-env mathjax-equation">

\begin{equation} R\_{\text {area}} = \frac {\sum \_{i\in c} a_i\hat {x}\_i}{\sum \_{i\in c}a_i}-x\_{\ell ,c}. \end{equation}

</div>

<span id="x1-51001r19"></span>

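For regular lat-lon grids, <span class="mathjax-inline">\\a_i\\</span> is approximately proportional to <span class="mathjax-inline">\\\cos (\phi \_i)\\</span>, where <span class="mathjax-inline">\\\phi \_i\\</span> is the latitude of the center of pixel <span class="mathjax-inline">\\i\\</span>. Adding this option would make PhysFlow-Earth more defensible for global climate downscaling.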
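The area-weighted residual can be sketched for a single two-dimensional field on a regular lat-lon grid. The code assumes cosine-of-latitude weights at cell centers and an integer pooling factor; `area_weighted_pool` and `area_residual` are hypothetical names for an option that the current repository does not implement.

```python
import numpy as np

def area_weighted_pool(x, lat_deg, factor):
    """Area-weighted coarse-cell mean of x with shape (H, W).

    lat_deg holds the H cell-center latitudes in degrees.
    """
    h, w = x.shape
    a = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, w))  # (H, W) weights
    shape = (h // factor, factor, w // factor, factor)
    weighted_sum = (a * x).reshape(shape).sum(axis=(1, 3))
    weight_sum = a.reshape(shape).sum(axis=(1, 3))
    return weighted_sum / weight_sum

def area_residual(hr_pred, lr_cond, lat_deg, factor):
    """Area-weighted aggregate of the prediction minus the coarse field."""
    return area_weighted_pool(hr_pred, lat_deg, factor) - lr_cond

# Two rows at 0 and 60 degrees latitude: the equatorial row carries twice the weight.
x = np.array([[1.0, 1.0], [3.0, 3.0]])
pooled = area_weighted_pool(x, np.array([0.0, 60.0]), factor=2)
```

In this toy case the plain average is 2, while the area-weighted mean is 5/3, which shows how unweighted pooling over-counts high-latitude pixels.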

<span id="temporal-extension" class="paragraphHead"> <span id="x1-52000"></span><span class="ptmb8t-">Temporal Extension:</span></span> The current formulation is spatial. Climate and weather fields are temporal. A temporal extension would model <span class="mathjax-inline">\\x\_{1:T}\\</span> and include residuals over time:

<div class="mathjax-env mathjax-equation">

\begin{equation} \mathcal {L}\_{\text {temp}}=\sum \_t\mathcal {L}\_{\text {flow}}(x_t)+ \lambda \_{\Delta }\sum \_t\lVert \hat {x}\_{t+1}-\hat {x}\_{t}\rVert \_{\text {phys}}. \end{equation}

</div>

<span id="x1-52001r20"></span>

For precipitation, temporal accumulation constraints matter. For wind, temporal coherence and advection matter. The paper lists these as scoped extensions rather than implemented features.

<span id="uncertainty" class="paragraphHead"> <span id="x1-53000"></span><span class="ptmb8t-">Uncertainty:</span></span> Generative downscaling is valuable partly because multiple high-resolution states can correspond to one coarse field. Evaluation should therefore include uncertainty metrics. The continuous ranked probability score (CRPS), rank histograms, and prediction-interval coverage are more informative than PSNR alone. If the model is sampled <span class="mathjax-inline">\\K\\</span> times for the same input, the paper should report ensemble mean quality and ensemble spread.
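For an ensemble of samples, CRPS can be estimated per pixel from the standard energy form, the mean absolute error of the members minus half the mean absolute difference between members. The sketch below assumes samples stacked on the first axis; the function names are illustrative, and the pairwise term is quadratic in the ensemble size, so it suits small ensembles.

```python
import numpy as np

def ensemble_crps(samples, obs):
    """Per-pixel CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    samples has shape (K, ...) and obs has the trailing shape (...).
    """
    samples = np.asarray(samples, dtype=float)
    obs = np.asarray(obs, dtype=float)
    error_term = np.abs(samples - obs).mean(axis=0)
    spread_term = np.abs(samples[:, None] - samples[None, :]).mean(axis=(0, 1))
    return error_term - 0.5 * spread_term

def ensemble_spread(samples):
    """Per-pixel standard deviation across ensemble members."""
    return np.asarray(samples, dtype=float).std(axis=0)

# Two members at -1 and +1 around an observation of 0.
members = np.array([[-1.0], [1.0]])
score = ensemble_crps(members, np.array([0.0]))
spread = ensemble_spread(members)
```

For a single member the estimate reduces to the absolute error, which makes CRPS directly comparable with deterministic baselines.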

<span id="condensed-version-scope" class="paragraphHead"> <span id="x1-54000"></span><span class="ptmb8t-">Condensed Version Scope:</span></span> For a 10 to 12 page version, keep the conditional transport formulation, residual-on-clean-sample derivation, residual modules, DiT conditioning, evaluation protocol, and limitations. Move residual-weight sweeps, area-weighted aggregation, and temporal extensions to an appendix. The strongest final narrative is: visual quality is not enough for scientific downscaling, so train the generator with explicit differentiable residuals.

<span id="stresstest-questions" class="paragraphHead"> <span id="x1-55000"></span><span class="ptmb8t-">Stress-Test Questions:</span></span>

<span id="is-this-a-full-climate-model" class="paragraphHead"> <span id="x1-56000"></span><span class="ptmb8t-">Is this a full climate model?</span></span> No. It is a conditional generative downscaling implementation with lightweight physical residuals.

<span id="why-not-hardproject-samples-onto-constraints" class="paragraphHead"> <span id="x1-57000"></span><span class="ptmb8t-">Why not hard-project samples onto constraints?</span></span> Hard projection can be non-differentiable, expensive, or destructive to texture. Residual penalties provide a simple differentiable compromise.

<span id="what-evidence-is-missing" class="paragraphHead"> <span id="x1-58000"></span><span class="ptmb8t-">What evidence is missing?</span></span> WorldStrat, SEN2VENuS, ERA5, and precipitation benchmark runs; residual-weight sweeps; uncertainty evaluation; and checkpoint-backed demo outputs.

<span id="implementation-results-and-evaluation-profile" class="paragraphHead"> <span id="x1-59000"></span><span class="ptmb8t-">Implementation Results and Evaluation Profile:</span></span>

<span id="result-a-current-code-checks" class="paragraphHead"> <span id="x1-60000"></span><span class="ptmb8t-">Result A: current code checks:</span></span> In the current local run, <span class="pcrr8t-">uv run --extra dev pytest -q </span>reports 20 passing tests. The tests cover residual behavior, rectified-flow training steps, model shapes, pipeline outputs, and the public Space contract. This result supports the claim that the implemented residual modules and training loop execute correctly on small CPU-safe examples. It does not establish scientific downscaling accuracy.

<div class="table">

<figure id="x1-60001r3" class="float">
<span id="implementationgrounded-result-for-physflowearth"></span>
<div class="tabular">
<table id="TBL-4" class="tabular">
<tbody>
<tr id="TBL-4-1-" style="vertical-align:baseline;">
<td id="TBL-4-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Check family</span></p></td>
<td id="TBL-4-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Interpretation</span></p></td>
<td id="TBL-4-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Observed</span></p></td>
</tr>
<tr id="TBL-4-2-" style="vertical-align:baseline;">
<td id="TBL-4-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Residual modules</p></td>
<td id="TBL-4-2-2" class="td11" style="text-align: left; white-space: normal;"><p>mass, divergence, and band-ratio operators behave on test tensors</p></td>
<td id="TBL-4-2-3" class="td10" style="text-align: left; white-space: normal;"><p>passed</p></td>
</tr>
<tr id="TBL-4-3-" style="vertical-align:baseline;">
<td id="TBL-4-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Flow wrapper</p></td>
<td id="TBL-4-3-2" class="td11" style="text-align: left; white-space: normal;"><p>velocity loss and physics loss support backward passes</p></td>
<td id="TBL-4-3-3" class="td10" style="text-align: left; white-space: normal;"><p>passed</p></td>
</tr>
<tr id="TBL-4-4-" style="vertical-align:baseline;">
<td id="TBL-4-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Model path</p></td>
<td id="TBL-4-4-2" class="td11" style="text-align: left; white-space: normal;"><p>DiT and pipeline return expected tensor shapes</p></td>
<td id="TBL-4-4-3" class="td10" style="text-align: left; white-space: normal;"><p>passed</p></td>
</tr>
<tr id="TBL-4-5-" style="vertical-align:baseline;">
<td id="TBL-4-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Full local test suite</p></td>
<td id="TBL-4-5-2" class="td11" style="text-align: left; white-space: normal;"><p>repository unit and smoke tests</p></td>
<td id="TBL-4-5-3" class="td10" style="text-align: left; white-space: normal;"><p>20 passed</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 3: </span><span class="content">Implementation-grounded result for PhysFlow-Earth. </span></figcaption>
</figure>

</div>

<span id="result-b-benchmark-signature" class="paragraphHead"> <span id="x1-61000"></span><span class="ptmb8t-">Result B: benchmark signature:</span></span> The expected result is not simply higher PSNR. If PhysFlow-Earth works, it should reduce physical residuals at comparable visual quality, or improve visual quality at comparable residuals. The strongest evidence would be a Pareto frontier, not a single point. For precipitation, the model should preserve coarse totals better than unconstrained diffusion. For Sentinel-2, it should preserve index behavior after downsampling. For wind, it should avoid increasing divergence artifacts relative to the baseline.

<div class="table">

<figure id="x1-61001r4" class="float">
<span id="expected-result-patterns-to-test-not-claimed-outcomes"></span>
<div class="tabular">
<table id="TBL-5" class="tabular">
<tbody>
<tr id="TBL-5-1-" style="vertical-align:baseline;">
<td id="TBL-5-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Task</span></p></td>
<td id="TBL-5-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Expected pattern if method works</span></p></td>
<td id="TBL-5-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Diagnostic</span></p></td>
</tr>
<tr id="TBL-5-2-" style="vertical-align:baseline;">
<td id="TBL-5-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Precipitation</p></td>
<td id="TBL-5-2-2" class="td11" style="text-align: left; white-space: normal;"><p>lower coarse-cell accumulation error at similar sharpness</p></td>
<td id="TBL-5-2-3" class="td10" style="text-align: left; white-space: normal;"><p>mass residual</p></td>
</tr>
<tr id="TBL-5-3-" style="vertical-align:baseline;">
<td id="TBL-5-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Wind</p></td>
<td id="TBL-5-3-2" class="td11" style="text-align: left; white-space: normal;"><p>lower divergence proxy without excessive smoothing</p></td>
<td id="TBL-5-3-3" class="td10" style="text-align: left; white-space: normal;"><p>divergence norm and spectrum</p></td>
</tr>
<tr id="TBL-5-4-" style="vertical-align:baseline;">
<td id="TBL-5-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Sentinel-2</p></td>
<td id="TBL-5-4-2" class="td11" style="text-align: left; white-space: normal;"><p>better NDVI/NDWI preservation after aggregation</p></td>
<td id="TBL-5-4-3" class="td10" style="text-align: left; white-space: normal;"><p>index residual</p></td>
</tr>
<tr id="TBL-5-5-" style="vertical-align:baseline;">
<td id="TBL-5-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Sampling</p></td>
<td id="TBL-5-5-2" class="td11" style="text-align: left; white-space: normal;"><p>fewer steps than diffusion at similar residual-quality tradeoff</p></td>
<td id="TBL-5-5-3" class="td10" style="text-align: left; white-space: normal;"><p>NFE and wall time</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 4: </span><span class="content">Expected result patterns to test, not claimed outcomes. </span></figcaption>
</figure>

</div>

<span id="stresstest-questions1" class="paragraphHead"> <span id="x1-62000"></span><span class="ptmb8t-">Stress-Test Questions:</span></span>

<span id="q1-are-the-physics-residuals-physically-complete" class="paragraphHead"> <span id="x1-63000"></span><span class="ptmb8t-">Q1: Are the physics residuals physically complete?</span></span> No. They are lightweight residuals. Mass pooling, divergence proxies, and spectral ratios are useful constraints, but they are not full atmospheric dynamics or radiative-transfer models. The paper must present them as weak scientific regularizers.

<span id="q2-can-residuals-make-samples-blurry" class="paragraphHead"> <span id="x1-64000"></span><span class="ptmb8t-">Q2: Can residuals make samples blurry?</span></span> Yes. Strong residual weights can over-prioritize aggregate consistency and suppress high-frequency structure. That is why the evaluation must show Pareto curves.

<span id="q3-why-rectified-flow-instead-of-diffusion" class="paragraphHead"> <span id="x1-65000"></span><span class="ptmb8t-">Q3: Why rectified flow instead of diffusion?</span></span> Rectified flow gives a direct endpoint estimate and can reduce sampling complexity. The choice still needs empirical comparison against conditional diffusion under equal compute.

<span id="q4-are-averagepooling-residuals-valid-on-global-grids" class="paragraphHead"> <span id="x1-66000"></span><span class="ptmb8t-">Q4: Are average-pooling residuals valid on global grids?</span></span> Only if cell areas are comparable. For global lat-lon grids, area-weighted aggregation is needed. The paper specifies this extension explicitly, but the current repository uses simple pooling and does not yet implement it.

<span id="q5-how-does-the-method-represent-uncertainty" class="paragraphHead"> <span id="x1-67000"></span><span class="ptmb8t-">Q5: How does the method represent uncertainty?</span></span> Through generative sampling, but calibration is unproven. The paper should report CRPS, rank histograms, and interval coverage before making uncertainty claims.

<span id="q6-what-result-would-make-the-paper-credible" class="paragraphHead"> <span id="x1-68000"></span><span class="ptmb8t-">Q6: What result would make the paper credible?</span></span> A credible result would show that the method moves the quality-physics frontier, not merely that it improves one metric by sacrificing another. The benchmark table should include image metrics, residual metrics, and compute.

<span id="additional-derivation-residual-gradients" class="paragraphHead"> <span id="x1-69000"></span><span class="ptmb8t-">Additional Derivation: Residual Gradients:</span></span> For the mass residual <span class="mathjax-inline">\\R=A\hat {x}\_1-x\_{\ell }\\</span>, the residual loss is

<div class="mathjax-env mathjax-equation">

\begin{equation} \mathcal {L}\_{m}=\lVert A\hat {x}\_1-x\_{\ell }\rVert \_2^2. \end{equation}

</div>

<span id="x1-69001r21"></span>

Since <span class="mathjax-inline">\\\hat {x}\_1=x_t+(1-t)v\_{\theta }\\</span>, the gradient with respect to the velocity prediction is

<div class="mathjax-env mathjax-equation">

\begin{equation} \frac {\partial \mathcal {L}\_{m}}{\partial v\_{\theta }} =2(1-t)A^\top (A\hat {x}\_1-x\_{\ell }). \end{equation}

</div>

<span id="x1-69002r22"></span>

This shows why the clean-sample residual gives a useful training signal. At early times, the factor <span class="mathjax-inline">\\(1-t)\\</span> is large and residual corrections can influence the velocity. Near <span class="mathjax-inline">\\t=1\\</span>, <span class="mathjax-inline">\\x_t\\</span> is already close to the endpoint, the velocity has little remaining leverage on <span class="mathjax-inline">\\\hat {x}\_1\\</span>, and the residual gradient naturally weakens.

For a generic residual <span class="mathjax-inline">\\R_m(\hat {x}\_1,c)\\</span>, the chain rule gives

<div class="mathjax-env mathjax-equation">

\begin{equation} \frac {\partial \mathcal {L}\_{m}}{\partial v\_{\theta }} =2(1-t)\left (\frac {\partial R_m}{\partial \hat {x}\_1}\right )^\top R_m(\hat {x}\_1,c). \end{equation}

</div>

<span id="x1-69003r23"></span>

This compact expression is the reason residual modules can remain ordinary differentiable PyTorch functions. No special sampler modification is needed at training time.
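The analytic gradient of the mass residual can be verified against finite differences. The sketch below uses NumPy rather than PyTorch autograd to stay dependency-light, and it assumes a one-dimensional field with a small explicit average-pooling matrix `A`; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hi, n_lo, t = 8, 2, 0.25

# Explicit 1-D average-pooling operator: each coarse cell averages 4 fine cells.
A = np.kron(np.eye(n_lo), np.full((1, n_hi // n_lo), n_lo / n_hi))
x_t = rng.standard_normal(n_hi)   # intermediate state, held fixed
x_lo = rng.standard_normal(n_lo)  # coarse conditioning field
v = rng.standard_normal(n_hi)     # stand-in for the velocity prediction

def mass_loss(v):
    """Squared mass residual evaluated on the projected clean sample."""
    x1_hat = x_t + (1.0 - t) * v
    r = A @ x1_hat - x_lo
    return float(r @ r)

def analytic_grad(v):
    """Closed-form gradient 2 (1 - t) A^T (A x1_hat - x_lo)."""
    x1_hat = x_t + (1.0 - t) * v
    return 2.0 * (1.0 - t) * (A.T @ (A @ x1_hat - x_lo))

def numeric_grad(f, v, h=1e-6):
    """Central finite-difference gradient of a scalar function f at v."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2.0 * h)
    return g

max_gap = float(np.max(np.abs(analytic_grad(v) - numeric_grad(mass_loss, v))))
```

The same check applies to any differentiable residual by replacing `A` with the Jacobian of the residual with respect to the clean sample.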

<span id="additional-literature-integration" class="paragraphHead"> <span id="x1-70000"></span><span class="ptmb8t-">Additional Literature Integration:</span></span> The diffusion and score-model literature gives the generative foundation \[[11](#Xho2020denoising), [40](#Xsaharia2022image), [42](#Xsong2021score)\]. Rectified flow and flow matching give the velocity-field view \[[23](#Xlipman2023flow), [24](#Xliu2022rectified)\]. Physics-informed and theory-guided ML motivate residual constraints \[[13](#Xkarpatne2017theory), [32](#Xraissi2019physics)\]. Remote-sensing and weather downscaling datasets define the empirical target \[[6](#Xcornebise2022worldstrat), [19](#Xleinonen2023precipitation), [25](#Xmardani2023corrdiff), [26](#Xmichel2022sen2venus)\]. The paper’s niche is the connection: a rectified-flow implementation where scientific residuals are evaluated on the projected clean endpoint and reported as first-class metrics.

<span id="supplementary-technical-notes" class="paragraphHead"> <span id="x1-71000"></span><span class="ptmb8t-">Supplementary Technical Notes:</span></span>

<span id="literature-matrix" class="paragraphHead"> <span id="x1-72000"></span><span class="ptmb8t-">Literature matrix:</span></span>

<div class="table">

<figure id="x1-72001r5" class="float">
<span id="how-major-literature-threads-map-to-physflowearth"></span>
<div class="tabular">
<table id="TBL-6" class="tabular">
<tbody>
<tr id="TBL-6-1-" style="vertical-align:baseline;">
<td id="TBL-6-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Thread</span></p></td>
<td id="TBL-6-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">What it contributes</span></p></td>
<td id="TBL-6-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Gap addressed by this paper</span></p></td>
</tr>
<tr id="TBL-6-2-" style="vertical-align:baseline;">
<td id="TBL-6-2-1" class="td01" style="text-align: left; white-space: normal;"><p>DDPM and score models</p></td>
<td id="TBL-6-2-2" class="td11" style="text-align: left; white-space: normal;"><p>high-fidelity conditional generation</p></td>
<td id="TBL-6-2-3" class="td10" style="text-align: left; white-space: normal;"><p>scientific residuals are not central</p></td>
</tr>
<tr id="TBL-6-3-" style="vertical-align:baseline;">
<td id="TBL-6-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Rectified flow</p></td>
<td id="TBL-6-3-2" class="td11" style="text-align: left; white-space: normal;"><p>simple endpoint velocity learning</p></td>
<td id="TBL-6-3-3" class="td10" style="text-align: left; white-space: normal;"><p>residuals need clean-sample attachment</p></td>
</tr>
<tr id="TBL-6-4-" style="vertical-align:baseline;">
<td id="TBL-6-4-1" class="td01" style="text-align: left; white-space: normal;"><p>DiT models</p></td>
<td id="TBL-6-4-2" class="td11" style="text-align: left; white-space: normal;"><p>tokenized generative backbones</p></td>
<td id="TBL-6-4-3" class="td10" style="text-align: left; white-space: normal;"><p>Earth-observation conditioning design</p></td>
</tr>
<tr id="TBL-6-5-" style="vertical-align:baseline;">
<td id="TBL-6-5-1" class="td01" style="text-align: left; white-space: normal;"><p>PINNs and KGML</p></td>
<td id="TBL-6-5-2" class="td11" style="text-align: left; white-space: normal;"><p>physical constraints in learning</p></td>
<td id="TBL-6-5-3" class="td10" style="text-align: left; white-space: normal;"><p>lightweight residuals for generative downscaling</p></td>
</tr>
<tr id="TBL-6-6-" style="vertical-align:baseline;">
<td id="TBL-6-6-1" class="td01" style="text-align: left; white-space: normal;"><p>Climate downscaling</p></td>
<td id="TBL-6-6-2" class="td11" style="text-align: left; white-space: normal;"><p>task and dataset motivation</p></td>
<td id="TBL-6-6-3" class="td10" style="text-align: left; white-space: normal;"><p>metric suite combining quality and physics</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 5: </span><span class="content">How major literature threads map to PhysFlow-Earth. </span></figcaption>
</figure>

</div>

<span id="residual-taxonomy" class="paragraphHead"> <span id="x1-73000"></span><span class="ptmb8t-">Residual taxonomy:</span></span>

<div class="table">

<figure id="x1-73001r6" class="float">
<span id="residual-families-and-what-they-can-and-cannot-claim"></span>
<div class="tabular">
<table id="TBL-7" class="tabular">
<tbody>
<tr id="TBL-7-1-" style="vertical-align:baseline;">
<td id="TBL-7-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Residual</span></p></td>
<td id="TBL-7-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Useful claim</span></p></td>
<td id="TBL-7-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Claim to avoid</span></p></td>
</tr>
<tr id="TBL-7-2-" style="vertical-align:baseline;">
<td id="TBL-7-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Mass pooling</p></td>
<td id="TBL-7-2-2" class="td11" style="text-align: left; white-space: normal;"><p>preserves coarse aggregate statistics</p></td>
<td id="TBL-7-2-3" class="td10" style="text-align: left; white-space: normal;"><p>solves hydrology</p></td>
</tr>
<tr id="TBL-7-3-" style="vertical-align:baseline;">
<td id="TBL-7-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Divergence proxy</p></td>
<td id="TBL-7-3-2" class="td11" style="text-align: left; white-space: normal;"><p>discourages simple vector-field artifacts</p></td>
<td id="TBL-7-3-3" class="td10" style="text-align: left; white-space: normal;"><p>enforces atmospheric dynamics</p></td>
</tr>
<tr id="TBL-7-4-" style="vertical-align:baseline;">
<td id="TBL-7-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Band ratios</p></td>
<td id="TBL-7-4-2" class="td11" style="text-align: left; white-space: normal;"><p>preserves index-level spectral relationships</p></td>
<td id="TBL-7-4-3" class="td10" style="text-align: left; white-space: normal;"><p>guarantees radiometric correctness</p></td>
</tr>
<tr id="TBL-7-5-" style="vertical-align:baseline;">
<td id="TBL-7-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Area-weighted pooling</p></td>
<td id="TBL-7-5-2" class="td11" style="text-align: left; white-space: normal;"><p>handles global grid area variation</p></td>
<td id="TBL-7-5-3" class="td10" style="text-align: left; white-space: normal;"><p>replaces regridding validation</p></td>
</tr>
<tr id="TBL-7-6-" style="vertical-align:baseline;">
<td id="TBL-7-6-1" class="td01" style="text-align: left; white-space: normal;"><p>Temporal smoothness</p></td>
<td id="TBL-7-6-2" class="td11" style="text-align: left; white-space: normal;"><p>discourages frame-to-frame flicker</p></td>
<td id="TBL-7-6-3" class="td10" style="text-align: left; white-space: normal;"><p>solves advection or dynamics</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 6: </span><span class="content">Residual families and what they can and cannot claim. </span></figcaption>
</figure>

</div>

<span id="multiobjective-training" class="paragraphHead"> <span id="x1-74000"></span><span class="ptmb8t-">Multi-objective training:</span></span> The model should be interpreted as solving a multi-objective optimization problem:

<div class="mathjax-env mathjax-equation">

\begin{equation} \min \_{\theta }\left \[ \mathbb {E}\mathcal {L}\_{\text {flow}}, \mathbb {E}\mathcal {L}\_{\text {mass}}, \mathbb {E}\mathcal {L}\_{\text {div}}, \mathbb {E}\mathcal {L}\_{\text {index}}, \operatorname {NFE} \right \]. \end{equation}

</div>

<span id="x1-74001r24"></span>

A weighted sum chooses one point on this frontier. The paper should therefore report multiple points. If a model wins only at a particular residual weight and loses elsewhere, that is still useful information.
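
As a minimal sketch of why one weight yields one frontier point, the snippet below scalarizes three hypothetical candidates with a single physics weight and reports which one wins at each weight. The candidate names and loss values are invented for illustration and are not repository results.

```python
# Hypothetical candidates as (flow_loss, physics_residual); illustrative values only.
candidates = {
    "sharp": (0.10, 0.30),
    "balanced": (0.14, 0.12),
    "strict": (0.25, 0.05),
}


def scalarized(losses, lam):
    """Weighted sum of the flow loss and one physics residual."""
    flow, phys = losses
    return flow + lam * phys


def best_for_weight(lam):
    """Name of the candidate that minimizes the weighted sum at weight lam."""
    return min(candidates, key=lambda name: scalarized(candidates[name], lam))


for lam in (0.0, 1.0, 10.0):
    print(lam, best_for_weight(lam))
```

Each weight selects a different candidate here, which is the reason a single tuned weight cannot stand in for the whole frontier.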

<span id="areaweighted-mass-residual" class="paragraphHead"> <span id="x1-75000"></span><span class="ptmb8t-">Area-weighted mass residual:</span></span> For global fields with latitude <span class="mathjax-inline">\\\phi \_i\\</span>, an area-weighted pooling operator can be written as

<div class="mathjax-env mathjax-equation">

\begin{equation} A\_{cj}=\frac {a_j\mathbb {1}\[j\in c\]}{\sum \_{k\in c}a_k},\qquad a_j\propto \cos (\phi \_j). \end{equation}

</div>

<span id="x1-75001r25"></span>

The residual becomes <span class="mathjax-inline">\\R=A\hat {x}-x\_{\ell }\\</span>. This is the proper extension for climate grids where equal-degree cells do not have equal area.
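
The operator above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the repository implementation: the function names `area_weighted_pool` and `mass_residual` are hypothetical, the field is a nested list whose rows correspond to latitudes, and the grid dimensions are assumed to be divisible by the pooling factor.

```python
import math


def area_weighted_pool(fine, lats_deg, factor):
    """Pool a 2D field (rows = latitudes) by `factor` using cos(latitude) weights."""
    n_rows, n_cols = len(fine), len(fine[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            num = 0.0  # sum of a_j * x_j over the coarse cell
            den = 0.0  # sum of a_k over the coarse cell
            for r in range(r0, r0 + factor):
                for c in range(c0, c0 + factor):
                    num += weights[r] * fine[r][c]
                    den += weights[r]
            row.append(num / den)
        coarse.append(row)
    return coarse


def mass_residual(fine_pred, coarse_obs, lats_deg, factor):
    """Residual R = A x_hat - x_low on the coarse grid."""
    pooled = area_weighted_pool(fine_pred, lats_deg, factor)
    return [[p - o for p, o in zip(p_row, o_row)]
            for p_row, o_row in zip(pooled, coarse_obs)]


# Toy 2x2 field: the row at 60 degrees carries half the weight of the equatorial row.
pooled = area_weighted_pool([[1.0, 1.0], [3.0, 3.0]], [60.0, 0.0], 2)
print(pooled)  # close to 7/3, whereas an unweighted mean would give 2.0
```

The toy case shows the practical effect: the high-latitude row contributes less to the coarse value than it would under plain average pooling.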

<span id="uncertainty-decomposition" class="paragraphHead"> <span id="x1-76000"></span><span class="ptmb8t-">Uncertainty decomposition:</span></span> For <span class="mathjax-inline">\\K\\</span> generated samples <span class="mathjax-inline">\\\\\hat {x}^{(k)}\\\_{k=1}^{K}\\</span>, decompose error into bias, measured as the gap between the ensemble mean <span class="mathjax-inline">\\\bar {x}\\</span> and the target, and per-pixel spread:

<div class="mathjax-env mathjax-equation">

\begin{equation} \bar {x}=\frac {1}{K}\sum \_k\hat {x}^{(k)},\quad \operatorname {spread}(i)=\frac {1}{K-1}\sum \_k(\hat {x}\_i^{(k)}-\bar {x}\_i)^2. \end{equation}

</div>

<span id="x1-76001r26"></span>

The benchmark should ask whether high spread corresponds to genuinely uncertain regions, such as cloud boundaries, storm edges, or heterogeneous land cover.
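
A minimal sketch of the ensemble statistics follows, assuming each sample is a flattened list of pixel values of equal length. The helper names are hypothetical. The spread uses the same unbiased normalization as the equation, dividing by K minus 1.

```python
def ensemble_mean_and_spread(samples):
    """Per-pixel mean and unbiased variance over K samples (each a flat list)."""
    k = len(samples)
    if k < 2:
        raise ValueError("need at least two samples for an unbiased spread")
    n = len(samples[0])
    mean = [sum(s[i] for s in samples) / k for i in range(n)]
    spread = [sum((s[i] - mean[i]) ** 2 for s in samples) / (k - 1)
              for i in range(n)]
    return mean, spread


def ensemble_bias(mean, target):
    """Per-pixel gap between the ensemble mean and the reference field."""
    return [m - t for m, t in zip(mean, target)]


# Three toy samples of a two-pixel field: pixel 0 varies, pixel 1 does not.
mean, spread = ensemble_mean_and_spread([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
print(mean, spread)  # [3.0, 2.0] [4.0, 0.0]
```

Comparing `spread` against the absolute value of `ensemble_bias` per pixel is one simple way to check whether high spread lines up with high error.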

<span id="extended-experimental-recipe" class="paragraphHead"> <span id="x1-77000"></span><span class="ptmb8t-">Extended Experimental Recipe:</span></span>

<span id="experiment-1-residual-sanity-suite" class="paragraphHead"> <span id="x1-78000"></span><span class="ptmb8t-">Experiment 1: residual sanity suite:</span></span> Create synthetic fields where the exact residual is known: constant precipitation, linear wind, divergence-free toy flow, and fixed spectral ratios. This bridges unit tests and paper figures.
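
Two of these synthetic cases can be sketched without any dependencies. The snippet assumes a central-difference divergence on interior grid points, which may differ from the stencil used in the repository, and the helper names are hypothetical. A solid-body rotation should give zero divergence, and a linear wind should give a constant divergence equal to the sum of its two slopes.

```python
def central_divergence(u, v, dx=1.0, dy=1.0):
    """Interior-point divergence du/dx + dv/dy; fields are indexed [y][x]."""
    rows, cols = len(u), len(u[0])
    return [
        [
            (u[j][i + 1] - u[j][i - 1]) / (2 * dx)
            + (v[j + 1][i] - v[j - 1][i]) / (2 * dy)
            for i in range(1, cols - 1)
        ]
        for j in range(1, rows - 1)
    ]


def make_grid(n, fn):
    """Evaluate fn(x, y) on an n-by-n unit grid, indexed [y][x]."""
    return [[fn(float(i), float(j)) for i in range(n)] for j in range(n)]


n = 6
# Divergence-free toy flow: solid-body rotation (u, v) = (-y, x).
div_rot = central_divergence(make_grid(n, lambda x, y: -y),
                             make_grid(n, lambda x, y: x))
# Linear wind (u, v) = (0.5 x, 0.25 y) with exact divergence 0.75.
div_lin = central_divergence(make_grid(n, lambda x, y: 0.5 * x),
                             make_grid(n, lambda x, y: 0.25 * y))

print(max(abs(d) for row in div_rot for d in row))
print({round(d, 12) for row in div_lin for d in row})
```

Because central differences are exact for linear fields, any nonzero error in these two cases points to an indexing or sign bug rather than discretization error.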

<span id="experiment-2-superresolution-without-physics" class="paragraphHead"> <span id="x1-79000"></span><span class="ptmb8t-">Experiment 2: super-resolution without physics:</span></span> Train rectified flow without residuals. This isolates the generative backbone and shows whether residuals add value beyond model capacity.

<span id="experiment-3-residual-sweeps" class="paragraphHead"> <span id="x1-80000"></span><span class="ptmb8t-">Experiment 3: residual sweeps:</span></span> Train separate models with increasing <span class="mathjax-inline">\\\lambda \_{\text {phys}}\\</span>. Report visual metrics and residual metrics. The main figure should be a Pareto plot.

<span id="experiment-4-dataset-transfer" class="paragraphHead"> <span id="x1-81000"></span><span class="ptmb8t-">Experiment 4: dataset transfer:</span></span> Train on one geography or sensor subset and test on another. Physics residuals should help most under distribution shift if they encode stable constraints.

<span id="experiment-5-uncertainty-calibration" class="paragraphHead"> <span id="x1-82000"></span><span class="ptmb8t-">Experiment 5: uncertainty calibration:</span></span> Draw multiple samples per coarse input and compute CRPS, rank histograms, and interval coverage. A generative paper is incomplete without uncertainty diagnostics.
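
The diagnostics named above can be sketched per pixel as follows. This uses the standard ensemble estimator of CRPS, the mean absolute error to the observation minus half the mean absolute difference between members. The function names are hypothetical, and the min-max coverage check is a deliberately crude stand-in for a proper quantile interval.

```python
def crps_ensemble(samples, obs):
    """Ensemble CRPS estimate for a scalar observation: E|X - y| - 0.5 E|X - X'|."""
    k = len(samples)
    term_obs = sum(abs(x - obs) for x in samples) / k
    term_pair = sum(abs(a - b) for a in samples for b in samples) / (k * k)
    return term_obs - 0.5 * term_pair


def rank_of_observation(samples, obs):
    """Count of members strictly below obs (0..K); accumulate into a rank histogram."""
    return sum(1 for x in samples if x < obs)


def covered_by_range(samples, obs):
    """Crude coverage check: does obs fall inside the ensemble min-max range?"""
    return min(samples) <= obs <= max(samples)


members = [1.0, 2.0, 3.0]
print(crps_ensemble(members, 2.0))        # about 2/9
print(rank_of_observation(members, 2.5))  # 2
print(covered_by_range(members, 3.5))     # False
```

With a single member the estimator reduces to absolute error, which gives a direct way to compare deterministic baselines against sampled outputs.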

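<span id="evaluation-tables" class="paragraphHead"> <span id="x1-83000"></span><span class="ptmb8t-">Evaluation Tables:</span></span> <span class="ptmri8t-">The tables below lay out the evaluation profile for comparing model variants and operational stress cases. Because this paper reports only implementation validation, numeric entries illustrate the reporting format and are not benchmark claims.</span>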

<div class="table">

<figure id="x1-83001r7" class="float">
<span id="residual-sweep-evaluation-table"></span>
<div class="tabular">
<table id="TBL-8" class="tabular">
<tbody>
<tr id="TBL-8-1-" style="vertical-align:baseline;">
<td id="TBL-8-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Weight</span></p></td>
<td id="TBL-8-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">PSNR</span></p></td>
<td id="TBL-8-1-3" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">LPIPS</span></p></td>
<td id="TBL-8-1-4" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Physics residual</span></p></td>
</tr>
<tr id="TBL-8-2-" style="vertical-align:baseline;">
<td id="TBL-8-2-1" class="td01" style="text-align: left; white-space: normal;"><p>0</p></td>
<td id="TBL-8-2-2" class="td11" style="text-align: left; white-space: normal;"><p>28.4</p></td>
<td id="TBL-8-2-3" class="td11" style="text-align: left; white-space: normal;"><p>0.182</p></td>
<td id="TBL-8-2-4" class="td10" style="text-align: left; white-space: normal;"><p>0.112</p></td>
</tr>
<tr id="TBL-8-3-" style="vertical-align:baseline;">
<td id="TBL-8-3-1" class="td01" style="text-align: left; white-space: normal;"><p>low</p></td>
<td id="TBL-8-3-2" class="td11" style="text-align: left; white-space: normal;"><p>28.2</p></td>
<td id="TBL-8-3-3" class="td11" style="text-align: left; white-space: normal;"><p>0.180</p></td>
<td id="TBL-8-3-4" class="td10" style="text-align: left; white-space: normal;"><p>0.071</p></td>
</tr>
<tr id="TBL-8-4-" style="vertical-align:baseline;">
<td id="TBL-8-4-1" class="td01" style="text-align: left; white-space: normal;"><p>medium</p></td>
<td id="TBL-8-4-2" class="td11" style="text-align: left; white-space: normal;"><p>27.9</p></td>
<td id="TBL-8-4-3" class="td11" style="text-align: left; white-space: normal;"><p>0.187</p></td>
<td id="TBL-8-4-4" class="td10" style="text-align: left; white-space: normal;"><p>0.045</p></td>
</tr>
<tr id="TBL-8-5-" style="vertical-align:baseline;">
<td id="TBL-8-5-1" class="td01" style="text-align: left; white-space: normal;"><p>high</p></td>
<td id="TBL-8-5-2" class="td11" style="text-align: left; white-space: normal;"><p>27.1</p></td>
<td id="TBL-8-5-3" class="td11" style="text-align: left; white-space: normal;"><p>0.205</p></td>
<td id="TBL-8-5-4" class="td10" style="text-align: left; white-space: normal;"><p>0.031</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 7: </span><span class="content">Residual sweep evaluation table. </span></figcaption>
</figure>

</div>

<div class="table">

<figure id="x1-83002r8" class="float">
<span id="dataset-reporting-template"></span>
<div class="tabular">
<table id="TBL-9" class="tabular">
<tbody>
<tr id="TBL-9-1-" style="vertical-align:baseline;">
<td id="TBL-9-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Dataset</span></p></td>
<td id="TBL-9-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Conditioning</span></p></td>
<td id="TBL-9-1-3" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Target</span></p></td>
<td id="TBL-9-1-4" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Key residual</span></p></td>
</tr>
<tr id="TBL-9-2-" style="vertical-align:baseline;">
<td id="TBL-9-2-1" class="td01" style="text-align: left; white-space: normal;"><p>WorldStrat</p></td>
<td id="TBL-9-2-2" class="td11" style="text-align: left; white-space: normal;"><p>Sentinel-2 context</p></td>
<td id="TBL-9-2-3" class="td11" style="text-align: left; white-space: normal;"><p>high-resolution imagery</p></td>
<td id="TBL-9-2-4" class="td10" style="text-align: left; white-space: normal;"><p>spectral indices</p></td>
</tr>
<tr id="TBL-9-3-" style="vertical-align:baseline;">
<td id="TBL-9-3-1" class="td01" style="text-align: left; white-space: normal;"><p>SEN2VENuS</p></td>
<td id="TBL-9-3-2" class="td11" style="text-align: left; white-space: normal;"><p>Sentinel-2</p></td>
<td id="TBL-9-3-3" class="td11" style="text-align: left; white-space: normal;"><p>VEN<span class="mathjax-inline">\(\mu \)</span>S-like resolution</p></td>
<td id="TBL-9-3-4" class="td10" style="text-align: left; white-space: normal;"><p>band consistency</p></td>
</tr>
<tr id="TBL-9-4-" style="vertical-align:baseline;">
<td id="TBL-9-4-1" class="td01" style="text-align: left; white-space: normal;"><p>ERA5</p></td>
<td id="TBL-9-4-2" class="td11" style="text-align: left; white-space: normal;"><p>coarse climate grids</p></td>
<td id="TBL-9-4-3" class="td11" style="text-align: left; white-space: normal;"><p>fine climate fields</p></td>
<td id="TBL-9-4-4" class="td10" style="text-align: left; white-space: normal;"><p>mass and divergence</p></td>
</tr>
<tr id="TBL-9-5-" style="vertical-align:baseline;">
<td id="TBL-9-5-1" class="td01" style="text-align: left; white-space: normal;"><p>CHIRPS</p></td>
<td id="TBL-9-5-2" class="td11" style="text-align: left; white-space: normal;"><p>precipitation grids</p></td>
<td id="TBL-9-5-3" class="td11" style="text-align: left; white-space: normal;"><p>fine precipitation</p></td>
<td id="TBL-9-5-4" class="td10" style="text-align: left; white-space: normal;"><p>accumulation</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 8: </span><span class="content">Dataset reporting template. </span></figcaption>
</figure>

</div>

<span id="technical-supplement" class="paragraphHead"> <span id="x1-84000"></span><span class="ptmb8t-">Technical Supplement:</span></span>

<span id="expanded-literature-synthesis" class="paragraphHead"> <span id="x1-85000"></span><span class="ptmb8t-">Expanded literature synthesis:</span></span> Physics-guided generative downscaling sits at the meeting point of three literatures. The first is image and field super-resolution, where the goal is high-frequency reconstruction. The second is generative modeling, where the goal is sampling from a conditional distribution. The third is scientific machine learning, where the goal is consistency with known physical structure. PhysFlow-Earth is useful only if it respects all three. A sharp image that violates mass is scientifically weak. A physically consistent output that is blurry may be useless. A calibrated uncertainty estimate without spatial detail may not support downstream decisions.

This is why the paper emphasizes Pareto evaluation. A single metric cannot summarize downscaling quality. PSNR rewards average fidelity. LPIPS rewards perceptual texture. CRPS rewards probabilistic calibration. Mass residuals and divergence residuals reward scientific consistency. The research question is whether residual-guided rectified flow improves the tradeoff among these metrics.

The downscaling literature also forces careful dataset design. Random patch splits can leak geography. Sensor pairs can have subtle radiometric mismatch. Climate grids can have area effects. Precipitation fields have heavy-tailed extremes. A full paper must describe these details because the same model can look strong or weak depending on split policy.

<span id="mathematical-view-of-pareto-selection" class="paragraphHead"> <span id="x1-86000"></span><span class="ptmb8t-">Mathematical view of Pareto selection:</span></span> Let <span class="mathjax-inline">\\Q(\theta )\\</span> be an image-quality metric, <span class="mathjax-inline">\\P(\theta )\\</span> a physical residual metric, and <span class="mathjax-inline">\\C(\theta )\\</span> compute cost. A model <span class="mathjax-inline">\\\theta \_a\\</span> dominates <span class="mathjax-inline">\\\theta \_b\\</span> if

<div class="mathjax-env mathjax-equation">

\begin{equation} Q(\theta \_a)\ge Q(\theta \_b),\quad P(\theta \_a)\le P(\theta \_b),\quad C(\theta \_a)\le C(\theta \_b), \end{equation}

</div>

<span id="x1-86001r27"></span>

with at least one strict inequality. A useful paper should show that PhysFlow-Earth creates non-dominated points that baselines do not reach. This is stronger than reporting one tuned result.
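
The dominance rule translates directly into code. The sketch below assumes each model is summarized by one quality score (higher is better), one residual score, and one cost (both lower is better). The model names and numbers are hypothetical and only exercise the logic.

```python
def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (a["quality"] >= b["quality"]
                and a["residual"] <= b["residual"]
                and a["cost"] <= b["cost"])
    strictly_better = (a["quality"] > b["quality"]
                       or a["residual"] < b["residual"]
                       or a["cost"] < b["cost"])
    return no_worse and strictly_better


def pareto_front(models):
    """Return the subset of models that no other model dominates."""
    return {
        name: m for name, m in models.items()
        if not any(dominates(other, m)
                   for other_name, other in models.items() if other_name != name)
    }


# Hypothetical operating points; not measured results.
models = {
    "baseline_a": {"quality": 28.0, "residual": 0.10, "cost": 50},
    "baseline_b": {"quality": 27.5, "residual": 0.12, "cost": 50},
    "candidate": {"quality": 27.8, "residual": 0.04, "cost": 8},
}
print(sorted(pareto_front(models)))  # ['baseline_a', 'candidate']
```

In this toy case the candidate does not dominate `baseline_a`, since its quality is lower, but it adds a non-dominated point of its own, which is the kind of evidence the text asks for.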

<span id="two-example-result-narratives" class="paragraphHead"> <span id="x1-87000"></span><span class="ptmb8t-">Two example result narratives:</span></span>

<span id="example-result-1-repositorylocal" class="paragraphHead"> <span id="x1-88000"></span><span class="ptmb8t-">Example result 1: repository-local:</span></span> The local suite passes 20 tests. This supports implementation claims about residuals, flow training, model shape, and the Space implementation. It does not prove downscaling performance.

<span id="example-result-2-benchmark" class="paragraphHead"> <span id="x1-89000"></span><span class="ptmb8t-">Example result 2: benchmark:</span></span> On SEN2VENuS or WorldStrat, the useful result would be comparable perceptual quality to an unconstrained generator with lower downsampled spectral-index residual. On precipitation, the useful result would be sharper samples than interpolation with lower coarse accumulation error than unconstrained diffusion.

<span id="measurement-cards" class="paragraphHead"> <span id="x1-90000"></span><span class="ptmb8t-">Measurement cards:</span></span> Each downscaling experiment should report:

- input and target resolution;
- channels and radiometric scaling;
- split policy by geography and time;
- residual definitions and units;
- residual weights and normalization;
- number of generated samples per input;
- compute budget and number of function evaluations.

Without these details, residual and image-quality metrics are hard to compare.
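
One way to keep these fields from drifting between experiments is to record them in a small structured object. The sketch below is a hypothetical `MeasurementCard` dataclass with one field per list item, not a format defined by the repository, and every value shown is a placeholder.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MeasurementCard:
    """Hypothetical per-experiment record mirroring the reporting list."""
    input_resolution_m: float
    target_resolution_m: float
    channels: tuple
    radiometric_scaling: str
    split_policy: str
    residual_units: dict          # residual name -> units
    residual_weights: dict        # residual name -> weight
    residual_normalization: str
    samples_per_input: int
    nfe: int                      # number of function evaluations per sample
    compute_budget: str


# Placeholder values for illustration only.
card = MeasurementCard(
    input_resolution_m=10.0,
    target_resolution_m=5.0,
    channels=("B04", "B08"),
    radiometric_scaling="reflectance in [0, 1]",
    split_policy="held-out regions and held-out months",
    residual_units={"index": "dimensionless"},
    residual_weights={"index": 0.1},
    residual_normalization="divide by training-set residual std",
    samples_per_input=8,
    nfe=8,
    compute_budget="placeholder",
)
print(asdict(card))
```

Serializing the card with `asdict` next to each metrics file makes the comparison conditions explicit.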

<span id="additional-stress-questions" class="paragraphHead"> <span id="x1-91000"></span><span class="ptmb8t-">Additional Stress Questions:</span></span>

<span id="q7-does-the-model-preserve-extremes" class="paragraphHead"> <span id="x1-92000"></span><span class="ptmb8t-">Q7: Does the model preserve extremes?</span></span> That must be measured. Downscaling averages can look good while underestimating extremes. Report tail metrics.
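
A minimal tail metric is the difference between predicted and reference upper quantiles. The sketch below assumes a linear-interpolation empirical quantile and uses invented toy values. It shows two fields with the same total where the prediction still underestimates the extreme.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation, for 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def tail_quantile_error(pred, target, q=0.99):
    """Predicted minus reference quantile; negative means the tail is underestimated."""
    return quantile(pred, q) - quantile(target, q)


# Toy precipitation values with equal totals but different extremes.
target = [0.0, 0.0, 1.0, 2.0, 10.0]
pred = [1.0, 2.0, 2.0, 3.0, 5.0]
print(sum(pred) == sum(target))                  # True
print(tail_quantile_error(pred, target, q=1.0))  # -5.0
```

The totals agree, so a mass residual alone would not flag this prediction; the tail metric does.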

<span id="q8-does-the-model-generalize-geographically" class="paragraphHead"> <span id="x1-93000"></span><span class="ptmb8t-">Q8: Does the model generalize geographically?</span></span> Only held-out geography or climate-zone splits can answer this. Random patches are insufficient.

<span id="q9-are-residuals-differentiable-everywhere" class="paragraphHead"> <span id="x1-94000"></span><span class="ptmb8t-">Q9: Are residuals differentiable everywhere?</span></span> Most are, but ratio indices require <span class="mathjax-inline">\\\epsilon \\</span> guards and radiometric sanity checks.
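
A sketch of a guarded ratio index follows, using a normalized difference of near-infrared and red reflectance as the example. The function name, the default guard value, and the rejection of negative reflectance are assumptions made for illustration. The guard keeps the ratio finite when both bands are near zero, and the sanity check rejects inputs for which the denominator could still cross zero.

```python
def normalized_difference(nir, red, eps=1e-6):
    """Elementwise (nir - red) / (nir + red + eps) for non-negative reflectances."""
    out = []
    for n, r in zip(nir, red):
        if n < 0.0 or r < 0.0:
            # Radiometric sanity check: negative reflectance breaks the guard's assumption.
            raise ValueError("reflectance values must be non-negative")
        out.append((n - r) / (n + r + eps))
    return out


index = normalized_difference([0.5, 0.0], [0.1, 0.0])
print(index)  # second entry is exactly 0.0 instead of a division by zero
```

The same guard applies to the downsampled index used in a residual, where dark pixels such as water or shadow are the usual source of near-zero denominators.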

<span id="q10-how-should-cloud-masks-be-handled" class="paragraphHead"> <span id="x1-95000"></span><span class="ptmb8t-">Q10: How should cloud masks be handled?</span></span> Cloud and missing-data masks should enter both loss and metrics. Penalizing clouds as errors can mislead training.
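
As a minimal sketch of mask-aware scoring, the function below averages squared error only over pixels marked valid. The name `masked_mse` and the convention of returning 0.0 when no pixel is valid are assumptions. The same mask should be applied when computing metrics.

```python
def masked_mse(pred, target, valid):
    """Mean squared error over pixels where valid is True; 0.0 if none are valid."""
    total = 0.0
    count = 0
    for p, t, m in zip(pred, target, valid):
        if m:
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# The third pixel is cloudy: it is excluded instead of being counted as a large error.
print(masked_mse([1.0, 2.0, 9.0], [1.0, 3.0, 0.0], [True, True, False]))  # 0.5
```

Without the mask the same inputs would average to roughly 27.3, dominated by a pixel whose reference value is not trustworthy.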

<span id="q11-does-the-physics-codebook-encode-real-physics" class="paragraphHead"> <span id="x1-96000"></span><span class="ptmb8t-">Q11: Does the physics codebook encode real physics?</span></span> Not directly. It is a learned conditioning mechanism. The paper should evaluate it as such.

<span id="q12-what-should-a-reader-demand" class="paragraphHead"> <span id="x1-97000"></span><span class="ptmb8t-">Q12: What should a reader demand?</span></span> Residual sweeps, uncertainty metrics, held-out geography, and visual examples with residual heatmaps.

<span id="figure-captions" class="paragraphHead"> <span id="x1-98000"></span><span class="ptmb8t-">Figure Captions:</span></span>

<span id="figure-1" class="paragraphHead"> <span id="x1-99000"></span><span class="ptmb8t-">Figure 1:</span></span> Training diagram showing rectified-flow interpolation, velocity prediction, projected clean sample, and residual modules.

<span id="figure-2" class="paragraphHead"> <span id="x1-100000"></span><span class="ptmb8t-">Figure 2:</span></span> Pareto frontier of image quality versus physical residual for interpolation, diffusion, rectified flow, and PhysFlow-Earth.

<span id="figure-3" class="paragraphHead"> <span id="x1-101000"></span><span class="ptmb8t-">Figure 3:</span></span> Residual heatmaps for mass, divergence, and spectral-index consistency.

<span id="figure-4" class="paragraphHead"> <span id="x1-102000"></span><span class="ptmb8t-">Figure 4:</span></span> Uncertainty map showing sample spread and error alignment.

<span id="figure-5" class="paragraphHead"> <span id="x1-103000"></span><span class="ptmb8t-">Figure 5:</span></span> Examples of failure modes: blurry residual-dominated outputs, sharp physically inconsistent outputs, and calibrated tradeoff outputs.

<span id="table-map" class="paragraphHead"> <span id="x1-104000"></span><span class="ptmb8t-">Table Map:</span></span>

<div class="table">

<figure id="x1-104001r9" class="float">
<span id="comprehensive-table-map-for-physflowearth"></span>
<div class="tabular">
<table id="TBL-10" class="tabular">
<tbody>
<tr id="TBL-10-1-" style="vertical-align:baseline;">
<td id="TBL-10-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Table</span></p></td>
<td id="TBL-10-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Purpose</span></p></td>
<td id="TBL-10-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Status</span></p></td>
</tr>
<tr id="TBL-10-2-" style="vertical-align:baseline;">
<td id="TBL-10-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Dataset card</p></td>
<td id="TBL-10-2-2" class="td11" style="text-align: left; white-space: normal;"><p>describes channels, splits, and masks</p></td>
<td id="TBL-10-2-3" class="td10" style="text-align: left; white-space: normal;"><p>specified</p></td>
</tr>
<tr id="TBL-10-3-" style="vertical-align:baseline;">
<td id="TBL-10-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Residual sweep</p></td>
<td id="TBL-10-3-2" class="td11" style="text-align: left; white-space: normal;"><p>reports quality-physics tradeoff</p></td>
<td id="TBL-10-3-3" class="td10" style="text-align: left; white-space: normal;"><p>specified</p></td>
</tr>
<tr id="TBL-10-4-" style="vertical-align:baseline;">
<td id="TBL-10-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Baseline comparison</p></td>
<td id="TBL-10-4-2" class="td11" style="text-align: left; white-space: normal;"><p>compares interpolation, diffusion, and flow</p></td>
<td id="TBL-10-4-3" class="td10" style="text-align: left; white-space: normal;"><p>needs runs</p></td>
</tr>
<tr id="TBL-10-5-" style="vertical-align:baseline;">
<td id="TBL-10-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Uncertainty metrics</p></td>
<td id="TBL-10-5-2" class="td11" style="text-align: left; white-space: normal;"><p>reports CRPS and coverage</p></td>
<td id="TBL-10-5-3" class="td10" style="text-align: left; white-space: normal;"><p>defined</p></td>
</tr>
<tr id="TBL-10-6-" style="vertical-align:baseline;">
<td id="TBL-10-6-1" class="td01" style="text-align: left; white-space: normal;"><p>Ablation</p></td>
<td id="TBL-10-6-2" class="td11" style="text-align: left; white-space: normal;"><p>removes each residual and codebook</p></td>
<td id="TBL-10-6-3" class="td10" style="text-align: left; white-space: normal;"><p>defined</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 9: </span><span class="content">Comprehensive table map for PhysFlow-Earth. </span></figcaption>
</figure>

</div>

<span id="extended-study-design" class="paragraphHead"> <span id="x1-105000"></span><span class="ptmb8t-">Extended Study Design:</span></span>

<span id="core-evidence-criteria" class="paragraphHead"> <span id="x1-106000"></span><span class="ptmb8t-">Core Evidence Criteria:</span></span> The final PhysFlow-Earth study must show that residual-guided rectified flow improves the tradeoff between sample quality and scientific consistency. It is not enough to show that a residual metric improves when the residual weight is increased. The paper must show that the model reaches useful points on the Pareto frontier that baselines do not reach.

<span id="failure-cases" class="paragraphHead"> <span id="x1-107000"></span><span class="ptmb8t-">Failure Cases:</span></span> Several negative outcomes would be valuable. If strong mass residuals make precipitation fields too smooth, report the failure. If divergence penalties help synthetic winds but not realistic weather grids, report the gap. If band-ratio constraints are sensitive to radiometric scaling, include that sensitivity. If rectified flow is faster but lower quality than diffusion at equal compute, report the tradeoff.

<span id="reproducibility-artifacts" class="paragraphHead"> <span id="x1-108000"></span><span class="ptmb8t-">Reproducibility Artifacts:</span></span> A reproducible release should include:

- dataset manifests with geographic and temporal splits;
- channel scaling, masks, and preprocessing code;
- residual definitions and normalization constants;
- residual weight schedules;
- random seeds and checkpoint ids;
- sample count per input for uncertainty metrics;
- metric scripts for PSNR, SSIM, LPIPS, CRPS, and residual scores.

These details are not administrative. They determine whether a downscaling result is meaningful.

<span id="additional-expected-outcomes" class="paragraphHead"> <span id="x1-109000"></span><span class="ptmb8t-">Additional expected outcomes:</span></span> The expected positive outcome is not that every residual improves every metric. A realistic result may show that mass residuals help precipitation but hurt texture at high weights, while spectral residuals help Sentinel-2 indices with smaller perceptual cost. The paper should present this as a controlled tradeoff rather than a universal win.

<span id="longform-discussion-points" class="paragraphHead"> <span id="x1-110000"></span><span class="ptmb8t-">Long-form discussion points:</span></span> The discussion should argue that scientific generative modeling requires reporting the variables users care about, not only image metrics. A visually plausible output that violates known aggregate constraints is a weak scientific product. PhysFlow-Earth’s value is that it makes those constraints explicit in training and evaluation.

<span id="cutting-plan" class="paragraphHead"> <span id="x1-111000"></span><span class="ptmb8t-">Cutting plan:</span></span> For a shorter version, keep the rectified-flow formulation, the clean-sample residual gradient, the residual modules, the repository results, and the Pareto evaluation. Move area-weighted aggregation, temporal extensions, figure-caption planning, and reader checklists to the supplement.

<span id="final-technical-addendum" class="paragraphHead"> <span id="x1-112000"></span><span class="ptmb8t-">Final Technical Addendum:</span></span>

<span id="additional-ablation-details" class="paragraphHead"> <span id="x1-113000"></span><span class="ptmb8t-">Additional ablation details:</span></span> The final study should include three ablation axes: residual family, residual weight, and sampling budget. Residual family asks which scientific prior matters. Residual weight asks where the quality-physics tradeoff changes. Sampling budget asks whether rectified flow provides practical speed advantages over diffusion. These axes should be crossed only where compute allows; otherwise the paper should clearly state which comparisons are partial.

<span id="expected-qualitative-examples" class="paragraphHead"> <span id="x1-114000"></span><span class="ptmb8t-">Expected qualitative examples:</span></span> The first qualitative example should show a coarse precipitation field, target, unconstrained generative output, and PhysFlow output with mass residual heatmaps. The second should show a Sentinel-2 crop where an unconstrained model sharpens texture but changes NDVI after downsampling, while the residual-guided model better preserves the index.

<span id="additional-evaluation-table" class="paragraphHead"> <span id="x1-115000"></span><span class="ptmb8t-">Additional evaluation table:</span></span>

<div class="table">

<figure id="x1-115001r10" class="float">
<span id="sampling-budget-evaluation-table"></span>
<div class="tabular">
<table id="TBL-11" class="tabular">
<tbody>
<tr id="TBL-11-1-" style="vertical-align:baseline;">
<td id="TBL-11-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Method</span></p></td>
<td id="TBL-11-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">NFE</span></p></td>
<td id="TBL-11-1-3" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Wall time</span></p></td>
<td id="TBL-11-1-4" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Residual score</span></p></td>
</tr>
<tr id="TBL-11-2-" style="vertical-align:baseline;">
<td id="TBL-11-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Bicubic</p></td>
<td id="TBL-11-2-2" class="td11" style="text-align: left; white-space: normal;"><p>0</p></td>
<td id="TBL-11-2-3" class="td11" style="text-align: left; white-space: normal;"><p>0.04 s</p></td>
<td id="TBL-11-2-4" class="td10" style="text-align: left; white-space: normal;"><p>0.138</p></td>
</tr>
<tr id="TBL-11-3-" style="vertical-align:baseline;">
<td id="TBL-11-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Diffusion</p></td>
<td id="TBL-11-3-2" class="td11" style="text-align: left; white-space: normal;"><p>50</p></td>
<td id="TBL-11-3-3" class="td11" style="text-align: left; white-space: normal;"><p>2.80 s</p></td>
<td id="TBL-11-3-4" class="td10" style="text-align: left; white-space: normal;"><p>0.061</p></td>
</tr>
<tr id="TBL-11-4-" style="vertical-align:baseline;">
<td id="TBL-11-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Rectified flow</p></td>
<td id="TBL-11-4-2" class="td11" style="text-align: left; white-space: normal;"><p>8</p></td>
<td id="TBL-11-4-3" class="td11" style="text-align: left; white-space: normal;"><p>0.48 s</p></td>
<td id="TBL-11-4-4" class="td10" style="text-align: left; white-space: normal;"><p>0.066</p></td>
</tr>
<tr id="TBL-11-5-" style="vertical-align:baseline;">
<td id="TBL-11-5-1" class="td01" style="text-align: left; white-space: normal;"><p>PhysFlow</p></td>
<td id="TBL-11-5-2" class="td11" style="text-align: left; white-space: normal;"><p>8</p></td>
<td id="TBL-11-5-3" class="td11" style="text-align: left; white-space: normal;"><p>0.55 s</p></td>
<td id="TBL-11-5-4" class="td10" style="text-align: left; white-space: normal;"><p>0.043</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 10: </span><span class="content">Sampling budget evaluation table. </span></figcaption>
</figure>

</div>

<span id="benchmark-protocol" class="paragraphHead"> <span id="x1-116000"></span><span class="ptmb8t-">Benchmark Protocol:</span></span> The first complete benchmark should be intentionally small but multi-objective. Use one satellite super-resolution dataset, one climate-grid variable, and one precipitation task. For each, train an interpolation baseline, an unconstrained generative baseline, rectified flow without residuals, and PhysFlow-Earth with residual sweeps. Report the same metrics across all tasks: image quality, residual consistency, uncertainty, and compute. This makes the method comparison coherent even when datasets differ.

<div class="table">

<figure id="x1-116001r11" class="float">
<span id="minimal-benchmark-grid-for-the-first-complete-physflowearth-run"></span>
<div class="tabular">
<table id="TBL-12" class="tabular">
<tbody>
<tr id="TBL-12-1-" style="vertical-align:baseline;">
<td id="TBL-12-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Axis</span></p></td>
<td id="TBL-12-1-2" class="td11" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Values</span></p></td>
<td id="TBL-12-1-3" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Reason</span></p></td>
</tr>
<tr id="TBL-12-2-" style="vertical-align:baseline;">
<td id="TBL-12-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Task</p></td>
<td id="TBL-12-2-2" class="td11" style="text-align: left; white-space: normal;"><p>satellite, climate, precipitation</p></td>
<td id="TBL-12-2-3" class="td10" style="text-align: left; white-space: normal;"><p>tests all residual families</p></td>
</tr>
<tr id="TBL-12-3-" style="vertical-align:baseline;">
<td id="TBL-12-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Model</p></td>
<td id="TBL-12-3-2" class="td11" style="text-align: left; white-space: normal;"><p>interpolation, diffusion, RF, PhysFlow</p></td>
<td id="TBL-12-3-3" class="td10" style="text-align: left; white-space: normal;"><p>isolates method contribution</p></td>
</tr>
<tr id="TBL-12-4-" style="vertical-align:baseline;">
<td id="TBL-12-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Metric</p></td>
<td id="TBL-12-4-2" class="td11" style="text-align: left; white-space: normal;"><p>PSNR, LPIPS, CRPS, residual</p></td>
<td id="TBL-12-4-3" class="td10" style="text-align: left; white-space: normal;"><p>avoids one-metric claims</p></td>
</tr>
<tr id="TBL-12-5-" style="vertical-align:baseline;">
<td id="TBL-12-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Compute</p></td>
<td id="TBL-12-5-2" class="td11" style="text-align: left; white-space: normal;"><p>NFE, wall time, memory</p></td>
<td id="TBL-12-5-3" class="td10" style="text-align: left; white-space: normal;"><p>captures practical tradeoff</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 11: </span><span class="content">Minimal benchmark grid for the first complete PhysFlow-Earth run. </span></figcaption>
</figure>

</div>
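
The grid in Table 11 can be enumerated mechanically so that every run reports the same columns. The following is a minimal sketch, not the repository's experiment launcher: the axis values are copied from the table, while the dictionary layout, the key names, and the run-naming scheme are assumptions made for illustration.

```python
from itertools import product

# Axis values copied from the benchmark grid table. The key spellings
# (for example "wall_time_s") are hypothetical, chosen for illustration.
TASKS = ["satellite", "climate", "precipitation"]
MODELS = ["interpolation", "diffusion", "rf", "physflow"]
METRICS = ["psnr", "lpips", "crps", "residual"]
COMPUTE = ["nfe", "wall_time_s", "memory_mb"]


def benchmark_runs():
    """Enumerate one run per (task, model) pair with a shared report schema."""
    runs = []
    for task, model in product(TASKS, MODELS):
        runs.append({
            "name": f"{task}-{model}",
            "task": task,
            "model": model,
            # Every run reports the same columns so tasks stay comparable.
            # Values start as None and are filled in only by measurement.
            "report": {key: None for key in METRICS + COMPUTE},
        })
    return runs


runs = benchmark_runs()
print(len(runs), "runs, each reporting", len(runs[0]["report"]), "columns")
```

Keeping the report schema identical across runs means a missing measurement shows up as an empty cell rather than as a silently dropped metric.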

<span id="acceptance-criteria" class="paragraphHead"> <span id="x1-117000"></span><span class="ptmb8t-">Acceptance Criteria:</span></span> PhysFlow-Earth also needs an explicit acceptance rule for the quality-physics frontier. The first publication-grade benchmark should not ask whether every metric improves at once. It should ask whether the residual-guided model moves the operating frontier in a measurable way. Let <span class="mathjax-inline">\\q(\theta )\\</span> denote an image-quality score where larger is better, let <span class="mathjax-inline">\\r_k(\theta )\\</span> denote the normalized physical residual for constraint <span class="mathjax-inline">\\k\\</span>, and let <span class="mathjax-inline">\\c(\theta )\\</span> denote inference cost. A model is useful when it is non-dominated under the following dominance relation, in which <span class="mathjax-inline">\\\theta \_i \prec \theta \_j\\</span> means that <span class="mathjax-inline">\\\theta \_i\\</span> dominates <span class="mathjax-inline">\\\theta \_j\\</span>:

<div class="mathjax-env mathjax-equation">

\begin{equation} \begin {aligned} \theta \_i \prec \theta \_j\Longleftrightarrow {}& q(\theta \_i) \ge q(\theta \_j),\\ & r_k(\theta \_i) \le r_k(\theta \_j)\quad \forall k,\\ & c(\theta \_i) \le c(\theta \_j), \end {aligned} \end{equation}

</div>

<span id="x1-117001r28"></span>

with at least one strict inequality. This framing is more honest than reporting only a best visual metric or only a best physical metric, because the method is explicitly designed to manage a tradeoff.
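
The dominance test above is straightforward to check in code. The sketch below is one possible implementation under stated assumptions: the `Candidate` container and the function names are hypothetical, and the three candidates carry placeholder values chosen only to exercise the logic. They are not measurements.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One evaluated model. Field names are illustrative assumptions."""
    name: str
    quality: float                 # q(theta), larger is better
    residuals: Tuple[float, ...]   # r_k(theta) for each constraint k, smaller is better
    cost: float                    # c(theta), smaller is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True when a is no worse than b on every axis and strictly better on one."""
    if len(a.residuals) != len(b.residuals):
        raise ValueError("candidates must report the same constraints")
    pairs = list(zip(a.residuals, b.residuals))
    no_worse = (
        a.quality >= b.quality
        and all(ra <= rb for ra, rb in pairs)
        and a.cost <= b.cost
    )
    strictly_better = (
        a.quality > b.quality
        or any(ra < rb for ra, rb in pairs)
        or a.cost < b.cost
    )
    return no_worse and strictly_better


def non_dominated(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Placeholder values for illustration only.
candidates = [
    Candidate("A", quality=30.0, residuals=(0.10, 0.20), cost=1.0),
    Candidate("B", quality=29.0, residuals=(0.12, 0.25), cost=1.0),
    Candidate("C", quality=31.0, residuals=(0.30, 0.40), cost=0.5),
]
frontier = non_dominated(candidates)
print([c.name for c in frontier])  # ['A', 'C']
```

In this toy set, B is removed because A matches or beats it on every axis, while A and C both survive because each wins on a different axis.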

The same idea can be written as a scalar selection rule after the frontier is plotted:

<div class="mathjax-env mathjax-equation">

\begin{equation} S(\theta ) = z(q(\theta )) - \sum \_k \alpha \_k z(r_k(\theta )) - \beta z(c(\theta )), \end{equation}

</div>

<span id="x1-117002r29"></span>

where <span class="mathjax-inline">\\z(\cdot )\\</span> denotes standardization with statistics computed on the validation set, which puts quality, residuals, and cost on a comparable scale. The weights <span class="mathjax-inline">\\\alpha \_k\\</span> and <span class="mathjax-inline">\\\beta \\</span> should be declared before testing and should not be tuned after seeing the final benchmark table.
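
As a rough illustration of the scalar rule, the sketch below computes the score for a small set of candidates. It assumes that standardization means a z-score of each metric across the candidates' validation-set values, which is one reading of the text and not a confirmed implementation detail. The function names, the weights, and the input numbers are placeholders.

```python
import numpy as np


def standardize(values):
    """z-score a 1-D array of validation-set metric values across candidates."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    # Guard against a constant column, which would otherwise divide by zero.
    return (values - values.mean()) / (std if std > 0 else 1.0)


def selection_score(q, r, c, alpha, beta):
    """S = z(q) - sum_k alpha_k * z(r_k) - beta * z(c), one score per candidate.

    q: (n_models,) quality, r: (n_models, n_constraints) residuals,
    c: (n_models,) cost, alpha: (n_constraints,) weights, beta: scalar weight.
    """
    q = np.asarray(q, dtype=float)
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Standardize each constraint's residual column separately.
    z_r = np.stack([standardize(r[:, k]) for k in range(r.shape[1])], axis=1)
    return standardize(q) - z_r @ alpha - beta * standardize(c)


# Weights are fixed up front, before any test-set result is inspected.
ALPHA = [1.0]
BETA = 1.0

# Placeholder inputs for three candidates and one constraint.
scores = selection_score(
    q=[1.0, 2.0, 3.0],
    r=[[3.0], [2.0], [1.0]],
    c=[3.0, 2.0, 1.0],
    alpha=ALPHA,
    beta=BETA,
)
best = int(np.argmax(scores))
```

Declaring `ALPHA` and `BETA` as constants before the scoring call mirrors the requirement that the weights are set before testing.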

<div class="table">

<figure id="x1-117003r12" class="float">
<span id="acceptance-criteria-for-the-first-physflowearth-benchmark"></span>
<div class="tabular">
<table id="TBL-13" class="tabular">
<tbody>
<tr id="TBL-13-1-" style="vertical-align:baseline;">
<td id="TBL-13-1-1" class="td01" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Criterion</span></p></td>
<td id="TBL-13-1-2" class="td10" style="text-align: left; white-space: normal;"><p><span class="ptmb8t-">Interpretation</span></p></td>
</tr>
<tr id="TBL-13-2-" style="vertical-align:baseline;">
<td id="TBL-13-2-1" class="td01" style="text-align: left; white-space: normal;"><p>Residual frontier improves</p></td>
<td id="TBL-13-2-2" class="td10" style="text-align: left; white-space: normal;"><p>physical guidance changes the operating set</p></td>
</tr>
<tr id="TBL-13-3-" style="vertical-align:baseline;">
<td id="TBL-13-3-1" class="td01" style="text-align: left; white-space: normal;"><p>Sharpness remains competitive</p></td>
<td id="TBL-13-3-2" class="td10" style="text-align: left; white-space: normal;"><p>constraints do not collapse image quality</p></td>
</tr>
<tr id="TBL-13-4-" style="vertical-align:baseline;">
<td id="TBL-13-4-1" class="td01" style="text-align: left; white-space: normal;"><p>Uncertainty is calibrated</p></td>
<td id="TBL-13-4-2" class="td10" style="text-align: left; white-space: normal;"><p>samples represent conditional ambiguity</p></td>
</tr>
<tr id="TBL-13-5-" style="vertical-align:baseline;">
<td id="TBL-13-5-1" class="td01" style="text-align: left; white-space: normal;"><p>Held-out geography behaves similarly</p></td>
<td id="TBL-13-5-2" class="td10" style="text-align: left; white-space: normal;"><p>gains are not only regional memorization</p></td>
</tr>
<tr id="TBL-13-6-" style="vertical-align:baseline;">
<td id="TBL-13-6-1" class="td01" style="text-align: left; white-space: normal;"><p>Compute remains practical</p></td>
<td id="TBL-13-6-2" class="td10" style="text-align: left; white-space: normal;"><p>residual terms do not make sampling unusable</p></td>
</tr>
</tbody>
</table>
</div>
<figcaption><span class="id">Table 12: </span><span class="content">Acceptance criteria for the first PhysFlow-Earth benchmark. </span></figcaption>
</figure>

</div>

<span id="limitations" class="paragraphHead"> <span id="x1-118000"></span><span class="ptmb8t-">Limitations:</span></span> The present implementation validates operators and model shapes but does not yet provide a full public checkpoint. The divergence residual is a horizontal finite-difference proxy and should be adapted to actual grid spacing and coordinates. Band-ratio constraints are meaningful only when bands are radiometrically compatible. Finally, physical residuals can conflict with perceptual sharpness; selecting <span class="mathjax-inline">\\\lambda \_{\text {phys}}\\</span> requires a validation protocol that reports both visual and scientific metrics.
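
To make the grid-spacing caveat concrete, the sketch below computes a horizontal divergence residual with explicit spacings. It is an illustration in NumPy and not the repository's differentiable operator: the function names are hypothetical, the grid is assumed to be regular and Cartesian with spacings `dx` and `dy` in meters, and the velocity field is synthetic. A latitude-longitude grid would need latitude-dependent spacing and a spherical metric term, which this sketch omits.

```python
import numpy as np


def horizontal_divergence(u, v, dx, dy):
    """Finite-difference du/dx + dv/dy on a regular grid.

    u, v: arrays of shape (H, W), with axis 0 as y and axis 1 as x.
    dx, dy: grid spacings in meters (assumed constant over the domain).
    """
    du_dx = np.gradient(u, dx, axis=1)
    dv_dy = np.gradient(v, dy, axis=0)
    return du_dx + dv_dy


def divergence_residual(u, v, dx, dy):
    """Mean squared divergence, usable as a scalar penalty."""
    return float(np.mean(horizontal_divergence(u, v, dx, dy) ** 2))


# Synthetic field: u grows with x and v shrinks with y at the same rate,
# so the analytic divergence is zero.
H, W = 6, 8
dx, dy = 1000.0, 2000.0
x = np.arange(W)[None, :] * dx
y = np.arange(H)[:, None] * dy
u = 1e-3 * x + 0.0 * y
v = -1e-3 * y + 0.0 * x
residual = divergence_residual(u, v, dx, dy)
```

Passing the spacings explicitly keeps the residual in physical units, so a weight chosen on one grid does not silently change meaning on another.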

## <span class="titlemark">6 </span> <span id="x1-1190006"></span>Conclusion and Outlook

PhysFlow-Earth is an arXiv-ready research implementation for physics-constrained generative downscaling. Its current value is not a claimed leaderboard number; it is the clean separation between flow learning, physical residuals, and deployment surfaces. The next step is to run reproducible benchmarks and replace the baseline claims with measured tables.

## <span id="x1-120000"></span>References

<div class="section thebibliography" role="doc-bibliography">

\[1\]  
<span id="Xalbergo2023stochastic"></span>Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions, 2023.

\[2\]  
<span id="Xbi2023panguweather"></span>Kaifeng Bi et al. Accurate medium-range global weather forecasting with 3D neural networks. <span class="ptmri8t-">Nature</span>, 2023.

\[3\]  
<span id="Xbishop2006pattern"></span>Christopher M. Bishop. <span class="ptmri8t-">Pattern Recognition and Machine Learning</span>. Springer, 2006.

\[4\]  
<span id="Xboyd2004convex"></span>Stephen Boyd and Lieven Vandenberghe. <span class="ptmri8t-">Convex Optimization</span>. Cambridge University Press, 2004.

\[5\]  
<span id="Xbubeck2015convex"></span>Sébastien Bubeck. Convex optimization: Algorithms and complexity. <span class="ptmri8t-">Foundations and Trends in Machine Learning</span>, 8(3–4):231–357, 2015.

\[6\]  
<span id="Xcornebise2022worldstrat"></span>Julien Cornebise, Ivan Orsolic, and Freddie Kalaitzis. Open high-resolution satellite imagery: The WorldStrat dataset – with application to super-resolution. In <span class="ptmri8t-">Advances in Neural Information Processing Systems Datasets and Benchmarks Track</span>, 2022.

\[7\]  
<span id="Xcover2006elements"></span>Thomas M. Cover and Joy A. Thomas. <span class="ptmri8t-">Elements of Information Theory</span>. Wiley, second edition, 2006.

\[8\]  
<span id="Xdong2016srcnn"></span>Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. <span class="ptmri8t-">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>, 38(2):295–307, 2016.

\[9\]  
<span id="Xgoodfellow2016deep"></span>Ian Goodfellow, Yoshua Bengio, and Aaron Courville. <span class="ptmri8t-">Deep Learning</span>. MIT Press, 2016.

\[10\]  
<span id="Xhastie2009elements"></span>Trevor Hastie, Robert Tibshirani, and Jerome Friedman. <span class="ptmri8t-">The Elements of Statistical Learning</span>. Springer, second edition, 2009.

\[11\]  
<span id="Xho2020denoising"></span>Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In <span class="ptmri8t-">Advances in Neural Information Processing Systems</span>, 2020.

\[12\]  
<span id="Xkarniadakis2021physicsinformed"></span>George Em Karniadakis et al. Physics-informed machine learning. <span class="ptmri8t-">Nature Reviews Physics</span>, 2021.

\[13\]  
<span id="Xkarpatne2017theory"></span>Anuj Karpatne, Gowtham Atluri, James H. Faghmous, Michael Steinbach, Arindam Banerjee, Auroop R. Ganguly, Shashi Shekhar, Nagiza Samatova, and Vipin Kumar. Theory-guided data science: A new paradigm for scientific discovery from data. <span class="ptmri8t-">IEEE Transactions on Knowledge and Data Engineering</span>, 29(10):2318–2331, 2017.

\[14\]  
<span id="Xkarras2022edm"></span>Tero Karras et al. Elucidating the design space of diffusion-based generative models. In <span class="ptmri8t-">NeurIPS</span>, 2022.

\[15\]  
<span id="Xkingma2015adam"></span>Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In <span class="ptmri8t-">International Conference on Learning Representations</span>, 2015.

\[16\]  
<span id="Xlam2023graphcast"></span>Remi Lam et al. Learning skillful medium-range global weather forecasting. <span class="ptmri8t-">Science</span>, 2023.

\[17\]  
<span id="Xlecun1998gradient"></span>Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. <span class="ptmri8t-">Proceedings of the IEEE</span>, 86(11):2278–2324, 1998.

\[18\]  
<span id="Xledig2017srgan"></span>Christian Ledig et al. Photo-realistic single image super-resolution using a generative adversarial network. In <span class="ptmri8t-">CVPR</span>, 2017.

\[19\]  
<span id="Xleinonen2023precipitation"></span>Jussi Leinonen, David Nerini, and Alexis Berne. Precipitation downscaling with spatiotemporal video diffusion, 2023.

\[20\]  
<span id="Xli2021fourier"></span>Zongyi Li et al. Fourier neural operator for parametric partial differential equations. In <span class="ptmri8t-">ICLR</span>, 2021.

\[21\]  
<span id="Xliang2021swinir"></span>Jingyun Liang et al. SwinIR: Image restoration using Swin Transformer. In <span class="ptmri8t-">ICCV Workshops</span>, 2021.

\[22\]  
<span id="Xlim2017edsr"></span>Bee Lim et al. Enhanced deep residual networks for single image super-resolution. In <span class="ptmri8t-">CVPR Workshops</span>, 2017.

\[23\]  
<span id="Xlipman2023flow"></span>Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In <span class="ptmri8t-">International Conference on Learning Representations</span>, 2023.

\[24\]  
<span id="Xliu2022rectified"></span>Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow, 2022.

\[25\]  
<span id="Xmardani2023corrdiff"></span>Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, Karthik Kashinath, Jan Kautz, and Mike Pritchard. Residual corrective diffusion modeling for km-scale atmospheric downscaling, 2023.

\[26\]  
<span id="Xmichel2022sen2venus"></span>Julien Michel, Juan Vinasco-Salinas, Jordi Inglada, and Olivier Hagolle. SEN2VENµS, a dataset for the training of Sentinel-2 super-resolution algorithms. <span class="ptmri8t-">Data</span>, 7(7):96, 2022.

\[27\]  
<span id="Xmurphy2012machine"></span>Kevin P. Murphy. <span class="ptmri8t-">Machine Learning: A Probabilistic Perspective</span>. MIT Press, 2012.

\[28\]  
<span id="Xnocedal2006numerical"></span>Jorge Nocedal and Stephen J. Wright. <span class="ptmri8t-">Numerical Optimization</span>. Springer, second edition, 2006.

\[29\]  
<span id="Xpathak2022fourcastnet"></span>Jaideep Pathak et al. FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators, 2022.

\[30\]  
<span id="Xpearl2009causality"></span>Judea Pearl. <span class="ptmri8t-">Causality: Models, Reasoning, and Inference</span>. Cambridge University Press, second edition, 2009.

\[31\]  
<span id="Xpeebles2023scalable"></span>William Peebles and Saining Xie. Scalable diffusion models with transformers. In <span class="ptmri8t-">IEEE/CVF International Conference on Computer Vision</span>, 2023.

\[32\]  
<span id="Xraissi2019physics"></span>Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. <span class="ptmri8t-">Journal of Computational Physics</span>, 378:686–707, 2019.

\[33\]  
<span id="Xravuri2021skillful"></span>Suman Ravuri et al. Skilful precipitation nowcasting using deep generative models of radar. <span class="ptmri8t-">Nature</span>, 2021.

\[34\]  
<span id="Xreichstein2019deep"></span>Markus Reichstein et al. Deep learning and process understanding for data-driven Earth system science. <span class="ptmri8t-">Nature</span>, 2019.

\[35\]  
<span id="Xrobbins1951stochastic"></span>Herbert Robbins and Sutton Monro. A stochastic approximation method. <span class="ptmri8t-">The Annals of Mathematical Statistics</span>, 22(3):400–407, 1951.

\[36\]  
<span id="Xrolnick2019climate"></span>David Rolnick et al. Tackling climate change with machine learning, 2019.

\[37\]  
<span id="Xrombach2022latent"></span>Robin Rombach et al. High-resolution image synthesis with latent diffusion models. In <span class="ptmri8t-">CVPR</span>, 2022.

\[38\]  
<span id="Xronneberger2015unet"></span>Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In <span class="ptmri8t-">MICCAI</span>, 2015.

\[39\]  
<span id="Xrumelhart1986learning"></span>David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. <span class="ptmri8t-">Nature</span>, 323:533–536, 1986.

\[40\]  
<span id="Xsaharia2022image"></span>Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. <span class="ptmri8t-">IEEE Transactions on Pattern Analysis and Machine Intelligence</span>, 2022.

\[41\]  
<span id="Xshannon1948communication"></span>Claude E. Shannon. A mathematical theory of communication. <span class="ptmri8t-">Bell System Technical Journal</span>, 27(3):379–423, 1948.

\[42\]  
<span id="Xsong2021score"></span>Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In <span class="ptmri8t-">International Conference on Learning Representations</span>, 2021.

\[43\]  
<span id="Xturing1950computing"></span>A. M. Turing. Computing machinery and intelligence. <span class="ptmri8t-">Mind</span>, 59(236):433–460, 1950.

\[44\]  
<span id="Xvapnik1998statistical"></span>Vladimir N. Vapnik. <span class="ptmri8t-">Statistical Learning Theory</span>. Wiley, 1998.

\[45\]  
<span id="Xwillard2022integrating"></span>Jared Willard et al. Integrating scientific knowledge with machine learning for engineering and environmental systems. <span class="ptmri8t-">ACM Computing Surveys</span>, 2022.

\[46\]  
<span id="Xzhang2018rcan"></span>Yulun Zhang et al. Image super-resolution using very deep residual channel attention networks. In <span class="ptmri8t-">ECCV</span>, 2018.

</div>
