Any Resolution Any Geometry: From Multi-View To Multi-Patch

CVPR 2026

Wenqing Cui1, Zhenyu Li1,*, Mykola Lavreniuk2,*, Jian Shi1, Ramzi Idoughi1, Xiangjun Tang1, Peter Wonka1
1KAUST, 2Space Research Institute NASU-SSAU
*Equal contribution

On in-the-wild high-resolution images, our method produces depth and normal maps with sharp boundaries and globally consistent geometry, preserving fine detail while keeping the depth and normal predictions consistent with each other.

Input RGB · Our Depth Prediction · Our Normal Prediction (Scene 1, 8K)

👀 Interactive Comparison




In-the-Wild Samples

Depth Estimation



RGB · Ours · Depth-Anything V2

Surface Normal Estimation



RGB · Ours · Metric3D V2



In-Domain Samples from UnrealStereo4K

Depth Estimation



RGB · Ours · Depth-Anything V2


Surface Normal Estimation



RGB · Ours · Metric3D V2


Method


We introduce a multi-patch framework for high-resolution monocular geometry estimation, delivering sharp and globally consistent depth and surface normals at any resolution (e.g., 2K, 4K, 8K) from a single RGB image.
The main ideas are:

  1. Reformulating high-resolution prediction as a multi-patch refinement task: we divide the input image into spatial patches, augment each patch with coarse depth and normal priors, and process all patches jointly with a unified transformer backbone.
  2. Employing cross-patch attention with global positional encoding to propagate information across distant regions, enforcing seamless boundaries and coherent geometry across the entire image.
  3. Introducing a Variable Multi-Patch Training (GridMix) strategy that samples different patch-grid configurations during training, improving robustness to image resolution and spatial layout and yielding strong zero-shot performance on real-world benchmarks.
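Idea 1 can be illustrated with a minimal patch-splitting sketch. This is an assumption-laden illustration, not the paper's actual code: the names `Patch` and `split_into_patches` are hypothetical, and the real pipeline additionally attaches coarse depth and normal priors to each patch before the transformer processes them jointly. The key detail shown here is that every patch records its global pixel offset, which is what a global positional encoding can later be built from.

```python
# Illustrative sketch (hypothetical names): tile an image into a grid of
# patches while recording each patch's global offset in the full image.
from dataclasses import dataclass
from typing import List

@dataclass
class Patch:
    row: int     # grid row index
    col: int     # grid column index
    top: int     # global pixel offset (y) in the full image
    left: int    # global pixel offset (x) in the full image
    height: int
    width: int

def split_into_patches(height: int, width: int,
                       grid_rows: int, grid_cols: int) -> List[Patch]:
    """Partition a height x width image into grid_rows x grid_cols patches.

    Remainder pixels are absorbed by the last row/column so the patches
    tile the image exactly, with no gaps or overlap.
    """
    patches = []
    base_h, rem_h = divmod(height, grid_rows)
    base_w, rem_w = divmod(width, grid_cols)
    top = 0
    for r in range(grid_rows):
        h = base_h + (rem_h if r == grid_rows - 1 else 0)
        left = 0
        for c in range(grid_cols):
            w = base_w + (rem_w if c == grid_cols - 1 else 0)
            patches.append(Patch(r, c, top, left, h, w))
            left += w
        top += h
    return patches
```

In the framework described above, each such patch (concatenated with its coarse geometry priors) becomes one input to the shared transformer backbone.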
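For idea 2, the essential ingredient is that tokens from all patches share one global coordinate frame: a token's position is its patch offset plus its local position, and the encoding is computed from those global coordinates before tokens from different patches attend to each other. The sketch below shows a standard sinusoidal encoding over global (y, x) coordinates; the function name, dimension, and `max_extent` scale are assumptions for illustration, not the paper's actual encoding.

```python
# Illustrative global positional encoding (hypothetical parameters):
# encode a token's *global* image coordinates so that tokens from
# different patches are directly comparable under cross-patch attention.
import math

def global_position_encoding(y: float, x: float,
                             dim: int = 8, max_extent: float = 8192.0):
    """Sinusoidal encoding of global coordinates (y, x) into `dim` values.

    `dim` must be divisible by 4 (sin/cos for each of y and x per band);
    `max_extent` bounds the lowest frequency, roughly the largest image
    side expected (e.g. 8K inputs).
    """
    enc = []
    for i in range(dim // 4):
        freq = 1.0 / (max_extent ** (4 * i / dim))
        enc += [math.sin(y * freq), math.cos(y * freq),
                math.sin(x * freq), math.cos(x * freq)]
    return enc
```

Because the encoding depends only on global position, two tokens adjacent across a patch boundary receive neighboring encodings, which is what lets cross-patch attention enforce seamless boundaries.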
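Idea 3 (GridMix) can be sketched as sampling a different patch-grid configuration at every training step, so the model sees many resolutions and spatial layouts. The candidate grid list and function name below are illustrative assumptions; the paper's actual set of grid configurations is not specified here.

```python
# Illustrative GridMix-style sampler (assumed grids, not the paper's
# training configuration): draw one grid layout per training step.
import random

CANDIDATE_GRIDS = [(1, 1), (2, 2), (2, 3), (3, 3), (4, 4)]

def sample_step_layout(height: int, width: int, rng: random.Random):
    """Pick a grid for this step; return it with the base patch size."""
    rows, cols = rng.choice(CANDIDATE_GRIDS)
    return (rows, cols), (height // rows, width // cols)
```

Varying the grid this way exposes the backbone to patches of many sizes and positions, which is credited above for the robustness to image resolution and the zero-shot transfer to real-world benchmarks.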


Framework diagram