ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering

Conference: ICML 2025
arXiv: 2506.13814
Code: https://ubc-aamodt-group.github.io/reframe-layer-caching/
Area: Image Generation
Keywords: Layer Caching, Real-Time Rendering, U-Net, Temporal Consistency, Inference Acceleration

TL;DR

Extends intermediate layer caching (as in DeepCache) from diffusion models to U-Net/U-Net++ networks in real-time rendering pipelines. A frame-difference adaptive caching strategy yields an average 1.4× inference speedup with negligible loss of image quality.

Background & Motivation

Background: Real-time rendering (e.g., DLSS 4.0) heavily relies on U-Net-style neural networks for tasks such as denoising, super-resolution, and frame extrapolation, with network inference accounting for a significant portion of the rendering pipeline.

Limitations of Prior Work: (a) Rendered frames are highly correlated in time, yet full inference is performed for every frame; (b) methods that exploit inter-frame differences, such as DeltaCNN, rely on sparse computation, which is difficult to accelerate on current GPUs; (c) DeepCache is designed solely for multi-step inference in diffusion models.

Key Challenge: In rendering, every frame must individually yield high-quality outputs (unlike diffusion models, which can tolerate approximations in intermediate steps), even though inter-frame features change slowly.

Goal: Leverage temporal redundancy while meeting the strict per-frame quality constraints of real-time rendering.

Key Insight: Cache deep intermediate features of U-Net to skip most of the encoder-decoder computation, recomputing only the shallow layers (which are sensitive to changes in new inputs).

Core Idea: Slow temporal variations in features during real-time rendering \(\rightarrow\) cache deep features \(\rightarrow\) adaptive refresh strategy \(\rightarrow\) training-free acceleration.

Method

Overall Architecture

In U-Net, the deep feature \(C_t\) is cached during full inference; subsequent frames only compute the first layer \(X^0\) and the final layer \(X^n\), replacing intermediate layers with \(C_t\). For U-Net++, all skip connection branches except the first layer are cached.

Key Designs

Layer Caching Mechanism:
- Function: Skip deep computation of the encoder-decoder
- Mechanism: Cache \(C_t = X^{n-1}(\text{concat}(\ldots))\), and for subsequent frames, compute \(O = X^n(\text{concat}(C_t, X^0(I)))\)
- Design Motivation: Deep features capture high-level semantics and vary the slowest across frames, while shallow features capture low-level details and are most sensitive to new inputs
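The caching mechanism above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: feature maps are plain Python lists, the class name `CachedUNet` and the three callables `first`, `deep`, and `last` are hypothetical stand-ins for \(X^0\), the deep encoder-decoder stack that produces \(C_t\), and \(X^n\), and the `refresh` flag is assumed to come from whichever refresh policy is in use.

```python
from typing import Callable, List, Optional

Tensor = List[float]  # stand-in for a feature map (assumption for this sketch)


def concat(a: Tensor, b: Tensor) -> Tensor:
    """Concatenate two feature maps (here: simple list concatenation)."""
    return a + b


class CachedUNet:
    """Toy network split into first layer X^0, deep stack, and final layer X^n."""

    def __init__(self,
                 first: Callable[[Tensor], Tensor],
                 deep: Callable[[Tensor], Tensor],
                 last: Callable[[Tensor], Tensor]) -> None:
        self.first = first
        self.deep = deep
        self.last = last
        self.cache: Optional[Tensor] = None  # holds C_t between frames
        self.deep_calls = 0                  # counts full (uncached) passes

    def forward(self, frame: Tensor, refresh: bool) -> Tensor:
        shallow = self.first(frame)          # X^0(I): always recomputed
        if refresh or self.cache is None:
            self.cache = self.deep(shallow)  # full inference: recompute C_t
            self.deep_calls += 1
        # Cached frames reuse C_t and only run the final layer X^n.
        return self.last(concat(self.cache, shallow))


# Toy layers chosen only so the arithmetic is easy to follow.
net = CachedUNet(first=lambda x: [2.0 * v for v in x],
                 deep=lambda x: [sum(x)],
                 last=lambda x: [sum(x)])

full = net.forward([1.0, 2.0], refresh=True)     # [12.0], deep stack executed
cached = net.forward([1.0, 2.5], refresh=False)  # [13.0], deep stack skipped
```

On the second frame the deep stack is not executed, so the output is built from the stale \(C_t\) and the fresh shallow features. A full pass on that frame would give `[14.0]` in this toy, which is the kind of approximation error a refresh policy has to keep small.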
Frame-Difference Adaptive Strategy (Frame Deltas):
- Function: Determine whether to refresh the cache based on the degree of change in the input
- Mechanism: Calculate the SMAPE between the current input and the cached frame, refreshing the cache if it exceeds a threshold \(\tau\). Two tiers are provided: high sensitivity (Delta_H) and low sensitivity (Delta_L)
- Design Motivation: A fixed Every-N strategy cannot adapt to unpredictable scene changes in rendering (e.g., rapid camera movement vs. static scenes)
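A minimal sketch of the frame-delta policy described above, under explicit assumptions: frames are flat lists of pixel values, SMAPE is computed in one common variant, \(|a-b| / (|a|+|b|+\epsilon)\) averaged over pixels, and the reference is the input frame at which the cache was last refreshed. The exact SMAPE variant, the value of \(\epsilon\), and the thresholds used for Delta_H and Delta_L are not stated here, so the names and numbers below (`smape`, `FrameDeltaPolicy`, `tau=0.1`) are illustrative only.

```python
from typing import List, Optional, Sequence


def smape(a: Sequence[float], b: Sequence[float], eps: float = 1e-2) -> float:
    """Symmetric mean absolute percentage error (one common variant)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    total = sum(abs(x - y) / (abs(x) + abs(y) + eps) for x, y in zip(a, b))
    return total / len(a)


class FrameDeltaPolicy:
    """Refresh the cache when the input drifts too far from the reference."""

    def __init__(self, tau: float) -> None:
        self.tau = tau                                  # refresh threshold
        self.reference: Optional[List[float]] = None    # input at last refresh

    def should_refresh(self, frame: Sequence[float]) -> bool:
        if self.reference is None or smape(frame, self.reference) > self.tau:
            self.reference = list(frame)  # this frame becomes the new reference
            return True
        return False


policy = FrameDeltaPolicy(tau=0.1)  # hypothetical threshold
decisions = [policy.should_refresh(f) for f in (
    [0.5, 0.5],    # first frame: no cache yet, so refresh
    [0.5, 0.52],   # small change: reuse the cache
    [1.0, 1.0],    # large change: refresh
)]
```

Under this reading, a high-sensitivity tier such as Delta_H would presumably correspond to a smaller \(\tau\) (more frequent refreshes) and Delta_L to a larger one.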
Motion Vector Threshold Strategy:
- Function: Utilize existing motion vectors from the rendering pipeline to determine whether to refresh
- Mechanism: Refresh the cache when the average motion exceeds a threshold \(\tau\)
- Design Motivation: No additional storage overhead, since motion vectors are already by-products of the rendering pipeline
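The motion-vector variant can be sketched in the same spirit. The assumptions here are that motion vectors arrive as per-pixel `(dx, dy)` pairs in pixel units and that "average motion" means the mean vector magnitude of the current frame; whether the original accumulates motion since the last refresh is not stated. The function names and threshold values are hypothetical.

```python
import math
from typing import Sequence, Tuple

MotionVector = Tuple[float, float]  # (dx, dy) in pixels, assumed layout


def mean_motion(motion_vectors: Sequence[MotionVector]) -> float:
    """Mean Euclidean magnitude of the per-pixel motion vectors."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)


def should_refresh(motion_vectors: Sequence[MotionVector], tau: float) -> bool:
    """Refresh the cache when the average motion exceeds the threshold tau."""
    return mean_motion(motion_vectors) > tau


vectors = [(3.0, 4.0), (0.0, 0.0)]      # magnitudes 5.0 and 0.0, mean 2.5
fast = should_refresh(vectors, tau=2.0)  # True: 2.5 > 2.0
slow = should_refresh(vectors, tau=3.0)  # False: 2.5 <= 3.0
```

Compared with the frame-delta sketch, this policy needs no stored reference frame, which matches the stated motivation of avoiding extra storage.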

Loss & Training

Fully training-free: Network weights are not modified; caching logic is added solely during inference.
Can be combined with orthogonal techniques such as quantization and pruning.

Key Experimental Results

Main Results

Task	Scene	Strategy	Speedup ↑	FLIP ↓	SSIM ↑
Frame Extrapolation	Sun Temple	Delta_H	1.42×	0.017	0.994
Frame Extrapolation	Sun Temple	Delta_L	1.72×	0.033	0.984
Super-Resolution	Sun Temple	Delta_H	1.30×	0.049	0.970
Super-Resolution	Sun Temple	Delta_L	1.85×	0.118	0.930
Image Synthesis	Garden Chair	Delta_H	1.05×	0.001	1.000

Ablation Study

Strategy	Avg. Frame Skip Rate	Avg. Speedup	FLIP	Description
Every-2	50%	~1.4×	Medium	Fixed interval
Every-4	75%	~1.7×	High	Quality drops drastically during fast motion
Delta_H	30-50%	1.1-1.4×	Lowest	Adaptive, preserving quality
Delta_L	60-80%	1.5-1.9×	Low	Adaptive balance
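The relationship between skip rate and speedup in the table can be sanity-checked with a back-of-the-envelope cost model. This model is an assumption introduced here, not something taken from the paper: a cached frame is taken to cost a constant fraction `cached_cost` of a full frame, and the overhead of the refresh policy itself is ignored.

```python
def speedup(skip_rate: float, cached_cost: float) -> float:
    """Average speedup when a fraction skip_rate of frames uses the cache.

    cached_cost is the cost of a cached frame relative to a full frame (0..1).
    """
    return 1.0 / ((1.0 - skip_rate) + skip_rate * cached_cost)


def cached_cost_from(observed_speedup: float, skip_rate: float) -> float:
    """Invert the model: relative cached-frame cost implied by a measurement."""
    return (1.0 / observed_speedup - (1.0 - skip_rate)) / skip_rate


# Calibrate on the Every-2 row (50% skipped, ~1.4x), then predict Every-4.
c = cached_cost_from(1.4, 0.5)    # about 0.43 of a full frame
predicted_every_4 = speedup(0.75, c)  # about 1.75x
```

Calibrating on the Every-2 row implies that a cached frame costs roughly 43% of a full one, and the model then predicts about 1.75× for Every-4, in line with the ~1.7× listed. The same model suggests why the speedup saturates: even if every frame were cached, the bound would be `1 / c`, about 2.3× under this calibration.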

Key Findings

FLIP < 0.05 is considered an acceptable quality loss in the rendering domain (reference values range from 0.05 to 0.28).
Frame extrapolation tasks benefit the most (as continuous frame transitions are smoothest), followed by super-resolution.
Adaptive strategies prevent the drastic image quality drop that fixed strategies experience during fast camera movements.

Highlights & Insights

Training-free + Universality: No network retraining is required; it can be applied to any encoder-decoder network possessing skip connections.
Extension to U-Net++: Layer caching is extended from U-Net to U-Net++ for the first time.
Reinvesting Saved Computation into Rendering: Saved inference time can be spent on a higher ray-tracing sample rate, which may improve overall rendering quality.

vs DeepCache: DeepCache targets the fixed-step inference of diffusion models, while ReFrame addresses the single-frame output requirement of rendering; adaptive refreshing is the key difference.
vs DeltaCNN: DeltaCNN utilizes pixel-level differences to sparsify computations, but existing GPU hardware struggles to accelerate sparse operations; the layer caching in ReFrame is fully compatible with existing hardware.
vs DLSS: DLSS natively utilizes U-Net, and ReFrame can serve as a component to further accelerate it.
The caching strategy and DeltaCNN can theoretically be combined: on cached frames only the shallow layers are computed, and those layers could in turn be sparsified using deltas.

Limitations & Future Work

Evaluated on an RTX 2080 Ti; verification of benefits on the latest GPUs (e.g., RTX 4090/5090) is lacking.
The thresholds for the adaptive strategy require task-specific tuning, and no automated method for choosing them is provided.
Only three rendering tasks and five scenes were evaluated, which limits the scope.
Handling of rapid scene cuts (e.g., teleportation in games) is not discussed.
Challenges regarding cache consistency in multiplayer synchronized rendering are not considered.

Rating

Novelty: ⭐⭐⭐ The transfer of DeepCache to rendering is relatively direct; the adaptive strategy is an incremental contribution.
Experimental Thoroughness: ⭐⭐⭐ The number of tasks and scenes is relatively small.
Writing Quality: ⭐⭐⭐⭐ Clear and systematic, with rich illustrations.
Value: ⭐⭐⭐⭐ Holds practical significance for optimizing rendering pipelines.