MARS Policy

Generative policies are expressive — but slow. Deterministic ones are fast — but mode-averaging.

Generative

Flow-matching and diffusion policies model behaviour as a stochastic process. They capture multi-modal action distributions, but every inference call pays for stochastic noise initialisation and an iterative denoising loop.
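To make the per-call cost concrete, here is a minimal sketch of the sampling loop such a policy runs. It assumes a fixed-step Euler integrator, and a toy velocity field with a known straight-line solution stands in for the learned network; `euler_sample` and `CountingVelocity` are illustrative names, not part of MARS.

```python
import random

def euler_sample(velocity_fn, dim, num_steps, rng):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    # Stochastic noise initialisation: every call starts from a fresh Gaussian draw.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)  # one network evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

class CountingVelocity:
    """Toy stand-in for a learned velocity network; counts how often it is called."""
    def __init__(self, target):
        self.target = target
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        # Straight-line flow toward a fixed target action (assumption for illustration).
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(self.target, x)]

field = CountingVelocity(target=[0.5, -0.25])
action = euler_sample(field, dim=2, num_steps=10, rng=random.Random(0))
print(field.calls, action)
```

The number of network evaluations equals `num_steps`, so a ten-step sampler costs roughly ten forward passes per action where a single-pass regressor costs one.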

Deterministic

Action-to-action regressors are an order of magnitude cheaper, yet they collapse to the mean whenever the demonstration data covers several equally valid behaviours, silently degrading rollout quality.
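A small numeric illustration of that collapse, assuming a squared-error regression objective and a hypothetical one-dimensional action with two balanced modes (steer left at -1, steer right at +1):

```python
def mse(prediction, actions):
    """Mean squared error of a single predicted action against all demonstrations."""
    return sum((prediction - a) ** 2 for a in actions) / len(actions)

def mse_optimal_action(actions):
    """The constant prediction that minimises squared error is the sample mean."""
    return sum(actions) / len(actions)

# Hypothetical demonstrations at one state: half go left, half go right.
demo_actions = [-1.0] * 500 + [1.0] * 500

best = mse_optimal_action(demo_actions)
print(best, mse(best, demo_actions), mse(1.0, demo_actions))
```

Under this objective the regressor prefers the midpoint over either demonstrated action, even though no expert ever chose it. In a navigation task that midpoint can be the obstacle itself.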

MARS: inject the right amount of noise, only at the right time.

Concept figure: expert trajectories, generative policy, deterministic policy, MARS policy on a 2D navigation task

FIG. 01 · 2D Navigation

An illustrative 2D navigation task. (a) Expert trajectories exhibit two valid paths around the obstacle. (b) A generative policy retains both modes but pays a heavy denoising cost everywhere. (c) A deterministic policy collapses the two modes into a single, often colliding path. (d) MARS keeps multimodality only where it matters — the colour map shows the per-step multimodality degree rising near the branch point and falling to zero elsewhere.

Latency ↓ vs. flow-matching at inference

Success ↑ average gain on real-world manipulation tasks

Tasks tested: 8 simulated · 4 real-world · Franka & Galaxea R1 Lite

§ 02 — Real-World

Four manipulation tasks. Two robots.

Pick Cup

A coffee mug exposes two equally valid grasps — by the handle or by the rim. MARS samples each grasp in turn.

Galaxea R1 Lite

MARS (ours)

MARS commits to either grasp from the same start.

Flow Matching

Generative, but jitterier.

A2A Deterministic — commits to a single grasp every rollout.

Pick Vegetable

Carrot and daikon sit on opposite sides of a mango at every held-out evaluation; either pick is a success. MARS preserves the modal balance.

Galaxea R1 Lite

MARS (ours)

MARS commits to either pick, run to run.

Flow Matching

Generative, but jitterier.

A2A Deterministic — always the same pick.

Push-T

From a fixed start pose, the T-shaped block can be aligned by going around it from either side. MARS samples a different path on each rollout while still landing the T inside the target.

Franka

MARS · Upper route: Sweeps over the upper edge of the T before nudging it in.

MARS · Lower route: Same policy, same start; approaches from below and rotates the T in.

Block Push

From an identical initial cube pose, two distinct goal regions are equally valid. The dataset is roughly balanced; only a policy that preserves modal balance can land in each.

Franka

MARS · Upper goal: Pushes the cube toward the upper target region.

MARS · Lower goal: Same start state, but picks the lower target instead. A2A would average both and miss.

Real-world success rates, modal balance γ, and inference latency analysis.

Eight tasks, two environments.

We separate simulated tasks by whether the demonstrations are genuinely multimodal or essentially unimodal. MARS wins on both, and on unimodal tasks it trains faster than the deterministic baseline because it models the residual action diversity that strict regression discards.

Push Cube benchmark — bimodal left/right push directions; MARS keeps both modes while the deterministic baseline collapses to one.

Grasp Eyeglass benchmark — bimodal grasp poses; flow-matching drifts toward a single pose while MARS stays expressive.

Collision Avoidance benchmark — bimodal slow/fast speeds; MARS sustains both strategies where baselines oscillate or collapse.

FIG. 04 · Multimodal benchmarks

Strategically multimodal tasks.

(a) Push Cube · bimodal directions. (b) Grasp Eyeglass · bimodal grasp poses. (c) Collision Avoidance · bimodal speeds. Each panel: success rate (top) and modal balance γ (bottom) vs training epochs.

Legend: Generative, Deterministic, MARS.

Learning curves on strategically unimodal simulated tasks.

FIG. 05 · Unimodal benchmarks

Strategically unimodal tasks.

(a) Close Box · RLBench. (b) Stack Cube · ManiSkill. (c) Pick Cube · ManiSkill. (d) Close Drawer · LIBERO. Even on tasks that appear strategically unimodal yet contain subtle trajectory variations, the MARS policy trains more efficiently than the deterministic one.

How MARS works.

A small modal scheduling network reads the recent action history and predicts where multimodality is needed. Its output then governs both the source distribution of the flow and the number of denoising steps at inference.

Modal scheduling

A lightweight head predicts a multimodality weight from the recent action context, one weight per action dimension. The weight is high where demonstrations branch and near zero where they don't.
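As a rough sketch of what such a head could look like, the snippet below maps a flattened action history to one value in (0, 1) per action dimension. The single linear layer, the sigmoid squashing, the random initial weights and the name `ModalScheduler` are all assumptions made for illustration; the page does not specify the architecture.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ModalScheduler:
    """Hypothetical linear head: action history -> one weight in (0, 1) per action dimension."""
    def __init__(self, history_len, action_dim, rng):
        in_dim = history_len * action_dim
        # Untrained random weights; a real head would be learned from demonstrations.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(action_dim)]
        self.bias = [0.0] * action_dim

    def __call__(self, history):
        # Flatten the list of past actions into a single feature vector.
        flat = [a for step in history for a in step]
        return [sigmoid(sum(w * x for w, x in zip(row, flat)) + b)
                for row, b in zip(self.weights, self.bias)]

scheduler = ModalScheduler(history_len=4, action_dim=2, rng=random.Random(0))
history = [[0.0, 0.1], [0.1, 0.1], [0.2, 0.0], [0.3, -0.1]]
modal_weight = scheduler(history)
print(modal_weight)
```

After training, a head like this would be expected to output values near zero along unambiguous stretches of a trajectory and larger values near branch points.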

Adaptive source flow

Instead of pure Gaussian noise, the flow starts from a hybrid source distribution governed by the modal schedule.
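One way to picture a hybrid source is a per-dimension blend between a deterministic anchor and Gaussian noise. The sketch below assumes the anchor is the previous action and the blend is convex; both choices, and the name `hybrid_source`, are illustrative assumptions and not the formula used by MARS.

```python
import random

def hybrid_source(prev_action, modal_weight, rng):
    """Assumed blend per dimension: (1 - w) * previous action + w * Gaussian noise."""
    return [
        (1.0 - w) * a + w * rng.gauss(0.0, 1.0)
        for a, w in zip(prev_action, modal_weight)
    ]

prev_action = [0.2, -0.4]
# Weight 0 everywhere: the flow starts exactly at the previous action (no noise).
unimodal_start = hybrid_source(prev_action, [0.0, 0.0], random.Random(0))
# Weight 1 everywhere: the flow starts from pure Gaussian noise.
branching_start = hybrid_source(prev_action, [1.0, 1.0], random.Random(0))
print(unimodal_start, branching_start)
```

Under this assumed form, a weight of zero recovers a deterministic action-to-action start, and a weight of one recovers the standard flow-matching source.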

Diversity-aware loss & adaptive steps

Training jointly minimises a flow-matching loss, a reconstruction loss and a diversity term that matches the source spread to the target spread. At inference, the ODE step budget scales with the predicted modal weight.
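The two ideas can be sketched in a few lines. Both functions below are hypothetical: the step rule (ceiling of the largest weight times a maximum budget) and the spread measure (population standard deviation) are assumptions chosen for illustration, since the page does not give the exact expressions.

```python
import math
import statistics

def adaptive_num_steps(modal_weight, max_steps=10, min_steps=1):
    """Assumed rule: scale the ODE step count with the largest per-dimension weight."""
    peak = max(modal_weight)
    return max(min_steps, math.ceil(peak * max_steps))

def diversity_penalty(source_samples, target_samples):
    """Assumed diversity term: squared gap between source and target spread."""
    gap = statistics.pstdev(source_samples) - statistics.pstdev(target_samples)
    return gap ** 2

print(adaptive_num_steps([0.0, 0.0]))   # unimodal stretch: minimum budget
print(adaptive_num_steps([0.5, 0.1]))   # moderate branching: partial budget
print(diversity_penalty([0.0, 0.0], [-1.0, 1.0]))
```

With a rule of this shape, unimodal stretches of a rollout cost a single integration step, and the full budget is spent only where the predicted weight is high.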

BibTeX

@misc{jia2026marspolicymultimodalitymatters,
  title         = {MARS Policy: Multimodality Only When It Matters},
  author        = {Jindou Jia and Tuo An and Yuxuan Hu and Gen Li and
                   Jingliang Li and Bohan Hou and Xiangyu Chen and Jiaqi Bai and
                   Bofan Lyu and Jianfei Yang},
  year          = {2026},
  eprint        = {2605.29766},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2605.29766}
}

Eight policies, one navigation task.

Deterministic regression averages into walls.

Adding reconstruction loss kills expressivity.

Why pure Gaussian noise is the wrong default.

Modal balance under stress.
