What the fovea can't see, the bitstream doesn't need to carry.
Fovea is a Just-Noticeable Difference (JND) model that predicts, for every pixel, the minimum luminance change detectable by the human visual system (HVS). Coefficients whose reconstruction error falls below this threshold are perceptually invisible and can be discarded by a codec — targeting 20–30 % bitrate savings at equal perceptual quality.
The name references the fovea centralis: the 1.5 mm central pit of the retina where cone density peaks at ~150 000 cells/mm². Fovea-the-library models what that pit (and the surrounding visual pathway) actually notices.
For a frame, Fovea computes a per-pixel visibility threshold combining luminance adaptation (Weber's law), texture masking, and temporal masking:

$$T(x,y) = \max\big(T_L(x,y),\, T_C(x,y)\big) \cdot B_M(x,y)$$
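As a concrete reference point, the combine stage is small enough to sketch in a few lines of Go (the flat-slice layout and function name here are illustrative, not Fovea's internal API):

```go
package main

import (
	"fmt"
	"math"
)

// combineJND merges the three masks per pixel: the spatial threshold is the
// max of the luminance and texture thresholds, scaled by the temporal boost.
func combineJND(tl, tc, bm []float64) []float64 {
	out := make([]float64, len(tl))
	for i := range tl {
		out[i] = math.Max(tl[i], tc[i]) * bm[i]
	}
	return out
}

func main() {
	// One pixel: luminance threshold 3.0, texture threshold 5.5, motion boost 2.0.
	t := combineJND([]float64{3.0}, []float64{5.5}, []float64{2.0})
	fmt.Println(t[0]) // max(3.0, 5.5) * 2.0 = 11
}
```

Taking the max (rather than the sum) of the two spatial masks is the conservative choice: an error is hidden only if the *stronger* masking effect covers it.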
Pure Weber behaviour ($\Delta I / I \approx \mathrm{const}$) holds only at mid luminance; the extremes deviate:

- Dark (DeVries–Rose regime): shot noise dominates, so $\Delta I \propto \sqrt{I}$ and the threshold decreases as $I$ rises out of black.
- Bright (saturation): receptor response flattens; thresholds rise again.
Fovea therefore uses a piecewise fit calibrated to Chou & Li [1]:

$$T_L(x,y) = \begin{cases} 17\left(1 - \sqrt{\bar I(x,y)/127}\,\right) + 3, & \bar I(x,y) \le 127 \\ \dfrac{3}{128}\left(\bar I(x,y) - 127\right) + 3, & \bar I(x,y) > 127 \end{cases}$$

Background luminance $\bar I(x,y)$ is the mean over a local window around the pixel, computed in $O(1)$ per pixel from an integral image. The curve passes through 20 at black, bottoms out at 3 at mid-grey (127), and rises back to 6 at white (255).
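A minimal Go sketch of this stage, pairing the Chou & Li piecewise curve with an integral image for the $O(1)$ window mean (function names are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// chouLiThreshold returns the luminance JND threshold for a background
// luminance bg in [0, 255], per the piecewise fit of Chou & Li [1].
func chouLiThreshold(bg float64) float64 {
	if bg <= 127 {
		return 17*(1-math.Sqrt(bg/127)) + 3
	}
	return 3*(bg-127)/128 + 3
}

// integralImage builds a summed-area table with one extra zero row and
// column, so any later box sum needs no boundary branches.
func integralImage(px []float64, w, h int) []float64 {
	sat := make([]float64, (w+1)*(h+1))
	for y := 1; y <= h; y++ {
		for x := 1; x <= w; x++ {
			sat[y*(w+1)+x] = px[(y-1)*w+(x-1)] +
				sat[(y-1)*(w+1)+x] + sat[y*(w+1)+x-1] - sat[(y-1)*(w+1)+x-1]
		}
	}
	return sat
}

// boxMean returns the mean over the rectangle [x0,x1)×[y0,y1) in O(1):
// four table lookups, regardless of window size.
func boxMean(sat []float64, w, x0, y0, x1, y1 int) float64 {
	s := sat[y1*(w+1)+x1] - sat[y0*(w+1)+x1] - sat[y1*(w+1)+x0] + sat[y0*(w+1)+x0]
	return s / float64((x1-x0)*(y1-y0))
}

func main() {
	// The three anchor points of the curve: black, mid-grey, white.
	fmt.Println(chouLiThreshold(0), chouLiThreshold(127), chouLiThreshold(255)) // 20 3 6
	px := []float64{10, 20, 30, 40} // a 2×2 "frame"
	sat := integralImage(px, 2, 2)
	fmt.Println(boxMean(sat, 2, 0, 0, 2, 2)) // 25
}
```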
Texture masking reflects the HVS's reduced sensitivity to error inside high-frequency, high-contrast regions. Fovea operates in the 8×8 DCT domain (matching H.26x coefficient blocks).
The type-II DCT of an 8×8 pixel block $p$ is

$$C(u,v) = \tfrac{1}{4}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{7}\sum_{y=0}^{7} p(x,y)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16},$$

with $\alpha(0) = 1/\sqrt{2}$ and $\alpha(k) = 1$ for $k > 0$. The block's texture energy $E_{\mathrm{AC}}$ is the sum of squared AC coefficients (everything except the DC term $C(0,0)$). Following the power-law fit of Yang et al. [2]:

$$T_C = \beta \cdot E_{\mathrm{AC}}^{\,\gamma}$$

The exponent $\gamma < 1$ makes masking compressive: doubling the texture energy less than doubles the tolerable error, consistent with contrast-masking measurements (Legge & Foley [6]).
Moving content hides spatial error: the HVS integrates over ~100 ms, so high inter-frame motion reduces effective acuity. Fovea estimates motion per 4×4 block $b$ via the Mean Absolute Difference (MAD) between consecutive frames:

$$\mathrm{MAD}(b) = \frac{1}{16}\sum_{(x,y)\in b}\left|I_t(x,y) - I_{t-1}(x,y)\right|$$

The multiplicative boost follows a saturating exponential (cf. Girod [3]):

$$B_M = 1 + \alpha\left(1 - e^{-\mathrm{MAD}/\tau}\right)$$

Asymptotic behaviour: $B_M \to 1$ for static content (no masking boost) and $B_M \to 1 + \alpha$ at high motion. The half-saturation motion is $\mathrm{MAD} = \tau \ln 2$, where the boost reaches $1 + \alpha/2$.
Given per-pixel reconstruction errors $e(x,y)$, a pixel is perceptually lossless whenever $|e(x,y)| \le T(x,y)$. A first-order bitrate-savings estimate (assuming bits scale linearly with $|e|$) is the fraction of total error magnitude falling below threshold:

$$\widehat{S} = \frac{\sum_{|e(x,y)| \le T(x,y)} |e(x,y)|}{\sum_{x,y} |e(x,y)|}$$

**Caveat.** This is a proxy. Real H.264/H.265 cost is entropy-coded (CABAC / CAVLC) with context modelling, so $|e|\mapsto\text{bits}$ is sub-linear and context-dependent. A calibrated-rate version is on the roadmap.
Let $N$ be the number of pixels per frame; every stage is linear in $N$:

| Component | Work | Notes |
|---|---|---|
| Luminance | Integral-image build + box means | $O(N)$ |
| Texture | 8×8 type-II DCT per block | $O(N)$, large constant |
| Temporal | One MAD pass | $O(N)$ |
| Combine | Max + multiply | SIMD-friendly |
| Total | $O(N)$ | Texture dominates ≈ 56 % of wall time |
At 1080p (1920×1080), single-threaded:
| Model | ms/frame | fps |
|---|---|---|
| LuminanceMap | 8.6 | ~116 |
| TextureMap | 30.1 | ~33 |
| TemporalMap | 6.9 | ~145 |
| CombinedJND | 53.9 | ~18.5 |
DCT is the bottleneck. The in-tree C header (`csrc/fovea.h`) sketches an AVX2 path via `_mm256_madd_epi16` on the row/column butterflies and `_mm256_sad_epu8` for the 4×4 MAD, giving roughly 4–6× headroom to reach real-time 1080p60 without a GPU.
```sh
# Build
CGO_ENABLED=0 go build ./cmd/fovea/
CGO_ENABLED=0 go test ./...

# Analyze a single frame (static JND, no temporal boost)
fovea analyze frame.jpg

# Compare two successive frames (full model, with motion)
fovea compare frame1.jpg frame2.jpg

# Benchmark at 1080p
fovea bench

# HTTP API (POST /analyze, POST /compare, GET /healthz)
fovea serve :8080
curl -X POST -T frame.jpg http://localhost:8080/analyze
```

Source layout:

```
internal/
  luminance/    Weber–Fechner piecewise, integral-image box mean
  texture/      8×8 type-II DCT, AC-variance power law
  temporal/     4×4 MAD + saturating exponential + history decay
  model/        Combine: max(T_L, T_C) * B_M
  api/          HTTP API
cmd/fovea/      CLI
csrc/fovea.h    C reference implementation with SSE2/AVX2 hooks
```
Ordered by expected impact on model quality or throughput:

- **SIMD DCT + parallel blocks**: AVX2 row/column butterflies with `_mm256_madd_epi16`; goroutine per block row. Target 4–6× on the texture path, i.e. real-time 1080p60 on CPU.
- **Empirical 20–30 % bitrate claim**: end-to-end pipeline on UVG / Netflix open content; encode with and without sub-JND pruning in x265; report BD-rate vs. VMAF at matched quality.
- **VMAF-calibrated constants**: refit $(\alpha, \beta, \gamma, \tau)$ by minimising $\| \text{VMAF}_{\text{pruned}} - \text{VMAF}_{\text{ref}} \|_2$ on LIVE / CSIQ / BVI-HD.
- **Chroma JND**: separate sensitivity curves for Cb / Cr, exploiting the HVS chromatic-acuity asymmetry (≈ 0.5× of luma in the mid-frequencies).
- **Scene-cut-aware temporal**: reset the history $H_t$ on SAD jumps above a threshold; prevents spurious masking on the first frame of a shot.
- **Proper entropy-coded bit model**: replace the linear $|e|\mapsto\text{bits}$ proxy with a CABAC-context look-up.
- **Foveated JND**: centre-weighted $T$ for PTZ cameras, drones, and VR, where gaze is constrained.
- **Codec integration**: an x265 patch exposing `--fovea-prune` as a post-quantisation coefficient killer.
1. Chou, C.-H. & Li, Y.-C. (1995). A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile. *IEEE Trans. Circuits Syst. Video Technol.*, 5(6):467–476.
2. Yang, X.-K., Ling, W.-S., Lu, Z.-K., Ong, E.-P. & Yao, S.-S. (2005). Just noticeable distortion model and its applications in video coding. *Signal Processing: Image Communication*, 20(7):662–680.
3. Girod, B. (1993). What's wrong with mean-squared error? In: Watson, A. B. (Ed.), *Digital Images and Human Vision*, MIT Press, pp. 207–220.
4. Watson, A. B. (1993). DCTune: A technique for visual optimization of DCT quantization matrices for individual images. *SID Digest of Technical Papers*, pp. 946–949.
5. Daly, S. (1993). The Visible Differences Predictor: an algorithm for the assessment of image fidelity. In *Digital Images and Human Vision*, MIT Press, pp. 179–206.
6. Legge, G. E. & Foley, J. M. (1980). Contrast masking in human vision. *Journal of the Optical Society of America*, 70(12):1458–1471.
7. Weber, E. H. (1834). *De pulsu, resorptione, auditu et tactu*. Köhler, Leipzig. (Original statement of the Weber ratio.)
8. Fechner, G. T. (1860). *Elemente der Psychophysik*. Breitkopf & Härtel, Leipzig.
Apache-2.0 — see LICENSE and NOTICE. Patent grant included, which matters in the heavily-patented video-coding space.
