A Real-world Display Inverse Rendering Dataset¶
Conference: ICCV 2025
arXiv: 2508.14411
Code: https://michaelcsj.github.io/DIR/
Area: Computer Vision / Inverse Rendering
Keywords: Inverse Rendering, Display-Camera System, OLAT Illumination, Polarization Imaging, Photometric Stereo
TL;DR¶
This paper presents the first real-world inverse rendering dataset built upon an LCD display-camera system, comprising stereo polarization images of 16 objects with diverse materials, captured under one-light-at-a-time (OLAT) illumination patterns, together with high-precision geometric ground truth. A simple yet effective display inverse rendering baseline is also proposed that outperforms existing inverse rendering methods.
Background & Motivation¶
- Background: Inverse rendering aims to recover geometry and reflectance from images. Existing methods rely on different imaging systems — light stages offer high-quality multi-light samples but are costly and bulky; flash photography requires repeated camera movement; displays, by contrast, are programmable, high-resolution, and compact.
- Unique Advantages of Display-Camera Systems: Each pixel can serve as a programmable point light source; LCD panels emit polarized light, naturally enabling diffuse/specular separation.
- Key Challenge: Despite their clear advantages, no publicly available real-world dataset exists for display-based systems. All existing inverse rendering datasets are collected using light stages, robotic rigs, or natural illumination, making it impossible to evaluate challenges unique to display systems (near-field lighting, low SNR, limited light-view angle sampling, etc.).
- Goal: To fill this gap by constructing the system, collecting data, providing benchmarks, and validating methods.
Method¶
Overall Architecture¶
The work consists of four components: (1) constructing and calibrating an LCD display plus stereo polarization camera system; (2) capturing polarization images of 16 objects under 144 OLAT illumination conditions; (3) providing geometric ground truth from structured-light scanning; (4) proposing a baseline inverse rendering method and evaluating existing approaches.
Key Designs¶
- Display-Camera Imaging System:
- Function: Constructs an imaging system consisting of a Samsung Odyssey Ark LCD display and two FLIR polarization RGB cameras.
- Core Parameters: Maximum display brightness is 600 cd/m²; per-pixel output is only 0.06 mcd. Display pixels are grouped into 144 superpixels (\(16 \times 9\)), each comprising \(240 \times 240\) display pixels.
- Calibration: (a) Spatially varying backlight \(B_i\) modeling; (b) nonlinear response \(\gamma\) calibration; (c) camera intrinsic and extrinsic calibration; (d) superpixel relative position estimation.
- Radiance Model: \(L_i = s(P_i + B_i)^\gamma\), where \(s\) is a global scaling factor.
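The radiance model \(L_i = s(P_i + B_i)^\gamma\) can be sketched as a small numpy function. This is a minimal illustration, not the paper's calibration code; the concrete values of \(s\), \(B_i\), and \(\gamma\) used below are placeholders:

```python
import numpy as np

def display_radiance(P, B, s, gamma):
    """Per-superpixel emitted radiance L_i = s * (P_i + B_i)**gamma.

    P     : commanded pattern value(s), e.g. in [0, 1]
    B     : spatially varying backlight term (light leaked when P = 0)
    s     : global radiometric scale factor
    gamma : display nonlinearity exponent
    """
    return s * (np.asarray(P) + B) ** gamma

# Even a fully "off" superpixel emits s * B**gamma because of backlight
# leakage, which is why B_i must be calibrated per superpixel.
```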
- Data Acquisition and Processing:
- Function: Captures polarization images of 16 objects with diverse materials under OLAT illumination and obtains geometric ground truth.
- Material Coverage: Resin, ceramic, metallic paint, wood, clay, plastic, bronze, plaster, etc.
- Polarization Processing: Converts four-angle polarization images into Stokes vectors \(s_0, s_1, s_2\), separating specular reflection \(I_{\text{specular}} = \sqrt{s_1^2 + s_2^2}\) and diffuse reflection \(I_{\text{diffuse}} = s_0 - I_{\text{specular}}\).
- Geometric Ground Truth: Obtained using an EinScan SP V2 high-precision 3D scanner (accuracy 0.05 mm); scanned meshes are aligned with images via mutual information.
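The Stokes-based diffuse/specular split above follows directly from Malus's law for the four polarizer angles. A minimal numpy sketch (function and variable names are my own, not from the paper's code):

```python
import numpy as np

def separate_polarization(i0, i45, i90, i135):
    """Diffuse/specular separation from four polarizer-angle images.

    Linear Stokes components:
      s0 = total intensity, s1 = 0°/90° contrast, s2 = 45°/135° contrast.
    The linearly polarized part sqrt(s1^2 + s2^2) is treated as specular
    reflection; the unpolarized remainder s0 - specular as diffuse.
    """
    s0 = 0.5 * ((i0 + i90) + (i45 + i135))
    s1 = i0 - i90
    s2 = i45 - i135
    specular = np.sqrt(s1 ** 2 + s2 ** 2)
    diffuse = s0 - specular
    return diffuse, specular
```

This works per pixel and per color channel, so the inputs can be full `(H, W, 3)` arrays.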
- Image Formation Model and Arbitrary Illumination Synthesis:
- Function: Exploits the linear superposition property of incoherent light transport to support image synthesis under arbitrary display patterns.
- Core Formula: \(I(\mathcal{P}) = \text{clip}(\sum_{i=1}^{N} I_i \cdot s(P_i + B_i)^\gamma + \epsilon)\)
- Design Motivation: Researchers can synthesize images under arbitrary illumination patterns and adjust noise levels without re-capturing data.
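The superposition formula above can be sketched as follows. This is an assumed numpy implementation of the stated formula, with the OLAT stack weighted by each superpixel's radiance, plus optional synthetic noise and clipping; the interface is my own:

```python
import numpy as np

def synthesize(olat_images, pattern, B, s, gamma, noise_sigma=0.0):
    """Relight a scene under an arbitrary display pattern by linearity.

    olat_images : (N, H, W) stack of captured OLAT images, one per superpixel
    pattern     : (N,) commanded pattern values P_i
    Each OLAT image is weighted by its superpixel radiance s*(P_i+B_i)**gamma,
    the weighted stack is summed, optional Gaussian noise epsilon is added,
    and the result is clipped to the valid intensity range.
    """
    weights = s * (np.asarray(pattern) + B) ** gamma   # per-superpixel radiance
    img = np.tensordot(weights, olat_images, 1)        # weighted sum over N
    img = img + noise_sigma * np.random.randn(*img.shape)
    return np.clip(img, 0.0, 1.0)
```

Because capture and synthesis are linked only through this linear model, researchers can sweep `pattern` and `noise_sigma` without re-capturing data, as the design motivation states.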
- Baseline Inverse Rendering Method:
- Function: Proposes a simple and effective baseline for display inverse rendering.
- Pipeline: (a) Estimate normal maps using analytical RGB photometric stereo; (b) estimate depth maps using RAFT Stereo; (c) iteratively optimize normals and reflectance via differentiable rendering with a Cook-Torrance BRDF basis representation.
- Key Technique: Models spatially varying BRDFs as weighted sums of basis BRDFs to handle limited light-view angle sampling.
- Runtime: Optimization completes in only 150 seconds.
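Step (a) of the pipeline, analytical photometric stereo, can be sketched for a single pixel. Note this is the classic far-field least-squares formulation (Woodham-style) used only as an initialization; the paper's full method additionally handles near-field display lighting, which would require per-pixel light directions and attenuation:

```python
import numpy as np

def lambertian_normals(I, L):
    """Least-squares photometric stereo for one pixel's intensity stack.

    I : (K,) observed intensities under K known lights
    L : (K, 3) unit light directions
    Solves I = L @ g for the albedo-scaled normal g = albedo * n,
    then splits g into a unit normal and a scalar albedo.
    """
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    n = g / max(albedo, 1e-12)
    return n, albedo
```

With 144 OLAT observations per pixel, the system is heavily overdetermined, which is why the analytical estimate is a reliable starting point for the subsequent differentiable-rendering refinement.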
Loss & Training¶
- Baseline Method: Minimizes RMSE between rendered and input images.
- Display Calibration: Optimizes global scalar \(s\), spatially varying backlight \(B_i\), and nonlinearity exponent \(\gamma\) to match rendered OLAT images with captured images.
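Both objectives reduce to the same image-space error. A one-line sketch of the RMSE criterion (hypothetical helper, not the paper's code):

```python
import numpy as np

def rmse_loss(rendered, captured):
    """Root-mean-square error between rendered and captured images,
    used both for BRDF/normal optimization and for fitting the display
    calibration parameters (s, B_i, gamma) to the OLAT captures."""
    return np.sqrt(np.mean((np.asarray(rendered) - np.asarray(captured)) ** 2))
```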
Key Experimental Results¶
Main Results (Inverse Rendering Evaluation)¶
| Method | Illumination | PSNR ↑ | SSIM ↑ | MAE (Normal) ↓ |
|---|---|---|---|---|
| SRSH | OLAT | 41.28 | 0.9895 | 25.25° |
| DPIR | OLAT | 34.30 | 0.9790 | 41.09° |
| IIR | OLAT | 38.20 | 0.9850 | 38.38° |
| Ours | OLAT | 39.33 | 0.9821 | 20.94° |
| Ours | Multiplexed | 37.27 | 0.9766 | 23.97° |
Ablation Study (Photometric Stereo Evaluation — Normal Reconstruction MAE)¶
| Method | Type | Elephant | Owl | Cat | Pig | Mean |
|---|---|---|---|---|---|---|
| Woodham | Calibrated | 27.02 | 26.60 | 21.05 | 17.02 | 22.92° |
| PS-FCN | Calibrated | 20.26 | 15.17 | 10.61 | 15.80 | 15.46° |
| SDM-UniPS | Uncalibrated | 18.83 | 14.37 | 9.70 | 15.33 | 14.56° |
| UniPS | Uncalibrated | 25.14 | 17.34 | 19.69 | 25.77 | 21.99° |
Key Findings¶
- SDM-UniPS achieves the best performance under OLAT illumination; 144 OLAT images provide sufficient information for normal reconstruction.
- The proposed baseline achieves the best normal accuracy among the evaluated inverse rendering methods (20.94° MAE vs. 25.25° for the next best) while remaining competitive in relighting PSNR, effectively handling near-field lighting and backlight challenges.
- Using only 2 multiplexed illumination patterns yields reasonable normal reconstruction, though inverse rendering accuracy remains inferior to that achieved with 144 OLAT images.
- Applying polarization-based diffuse/specular separation improves normal reconstruction accuracy, though the degree of improvement varies across methods.
- Modeling light attenuation is critical — omitting it causes PSNR to drop from 39.78 to 37.43.
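The attenuation finding follows from basic near-field radiometry: a display superpixel is close enough to the object that the inverse-square falloff varies noticeably across the surface. A minimal illustrative sketch (the paper's full model may also include source foreshortening, which is omitted here):

```python
import numpy as np

def nearfield_irradiance(x, light_pos, radiance):
    """Inverse-square falloff from a near-field point source (a superpixel).

    A surface point twice as far from the superpixel receives 4x less
    light, so omitting this term systematically biases the rendered
    intensities, consistent with the reported PSNR drop.
    """
    d = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(light_pos, dtype=float))
    return radiance / (d ** 2)
```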
Highlights & Insights¶
- Pioneering Contribution: The first real-world inverse rendering dataset targeting display-camera systems, filling an important research gap.
- Thorough System Calibration: Comprehensive calibration of backlight, nonlinearity, and geometry makes the dataset highly usable.
- Polarization Separation: Leveraging LCD polarization properties to separate diffuse and specular reflectance provides a unique advantage for downstream research.
- Light-View Sampling Analysis: Analysis in Rusinkiewicz coordinates reveals that the display system provides sufficient sampling along \(\theta_h\) but limited coverage along \(\theta_d\).
- Synthesis Capability: Based on the linearity of light transport, the dataset supports image synthesis under arbitrary illumination patterns and noise levels.
Limitations & Future Work¶
- Per-pixel display luminance is extremely low (0.06 mcd), necessitating superpixel grouping at the cost of reduced illumination resolution.
- When superpixels are smaller than \(240 \times 240\) pixels, captured images are too dark to be usable.
- Light-view angle sampling coverage is limited, particularly along the \(\theta_d\) direction.
- The current setup supports only single-viewpoint OLAT capture; simultaneous multi-view acquisition is not supported.
- The number of objects (16) and material diversity can be further expanded.
Related Work & Insights¶
- vs. Light Stage Datasets: Light stages provide far-field uniform illumination and denser angular sampling but are expensive and bulky; display systems are compact and low-cost yet face near-field effects.
- vs. DiLiGenT and Similar Photometric Stereo Datasets: Existing datasets use point light sources and do not address display-specific challenges such as backlight and near-field illumination.
- vs. DDPS (Differentiable Display Photometric Stereo): DDPS uses 3D-printed objects with limited material diversity; this work captures real objects.
Rating¶
- Novelty: ⭐⭐⭐⭐ — First display inverse rendering dataset, filling an important gap.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Evaluates a broad range of photometric stereo and inverse rendering methods with multi-dimensional ablations.
- Writing Quality: ⭐⭐⭐⭐ — System construction and dataset description are thorough and detailed.
- Value: ⭐⭐⭐⭐ — Provides a standardized benchmark for display-based inverse rendering research.