Implementation of Google DeepMind's D4RT - a unified transformer model for 4D scene reconstruction from video.
D4RT is a single model that handles multiple 4D vision tasks:
| Task | Description |
|---|---|
| Point Tracking | Track sparse points through video in 3D |
| Dense Tracking | Track all pixels with full scene flow |
| Depth Estimation | Reconstruct per-frame depth maps |
| Point Cloud | Generate 3D point clouds with colors/normals |
| Long-term Prediction | Predict positions beyond video length |
```shell
git clone https://github.com/MasahiroOgawa/D4RT_MasImpl.git
cd D4RT_MasImpl

# Using uv (recommended)
uv sync
source .venv/bin/activate

# Or using pip
pip install -e .
```

Training:

```shell
python scripts/train.py --config configs/training/train_paper_arch.yaml
```

Example usage:

```python
import torch

from d4rt.models import build_d4rt_model
from d4rt.inference import PointTracker, DepthReconstructor

# Load model
model = build_d4rt_model(config)
model.load_state_dict(torch.load("checkpoint.pth"))

# Track points
tracker = PointTracker(model)
trajectories = tracker.track_points(video, query_points)

# Reconstruct depth
depth_recon = DepthReconstructor(model)
depth = depth_recon.reconstruct(video, frame_idx=0)
```

Evaluation:

```shell
python scripts/evaluate.py \
    --config configs/model/vit_b_d4rt.yaml \
    --checkpoint checkpoints/checkpoint.pth \
    --data_dir data/kubric/val
```

Project structure:

```
d4rt/
├── models/       # Model architectures
├── losses/       # Loss functions
├── inference/    # Tracking, depth, point cloud
├── data/         # Dataset loaders
└── training/     # Training loop
configs/          # Configuration files
scripts/          # Training and evaluation scripts
tests/            # Unit and integration tests
doc/              # Detailed documentation
```
| Document | Description |
|---|---|
| Architecture | Detailed model architecture (Figure 7) |
| Training Guide | Training loop, loss functions, hyperparameters |
| Inference Guide | Point tracking, depth, pose estimation |
| Implementation Notes | Differences from paper, fixes, lessons learned |
This implementation combines the paper's loss formulation with practical improvements for depth learning.
| Loss | Weight | Description |
|---|---|---|
| 3D L1 | 1.0 | Primary 3D position loss (DUSt3R-style normalization) |
| 2D L1 | 0.1 | Image-space coordinate loss |
| Visibility | 0.1 | Binary cross-entropy for occlusion |
| Confidence | 0.2 | Penalty term -log(c) for honest confidence |
| Normal | 0.5 | Surface normal cosine loss |
| Motion | 0.1 | Temporal motion consistency |
| Depth | 1.0 | Direct L1 depth loss (see below) |
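The total training objective is the weighted sum of these terms. A minimal sketch (weights taken from the table above; the dictionary-based interface is illustrative, not the repo's actual loss API):

```python
# Loss weights as listed in the table above.
LOSS_WEIGHTS = {
    "3d_l1": 1.0,
    "2d_l1": 0.1,
    "visibility": 0.1,
    "confidence": 0.2,
    "normal": 0.5,
    "motion": 0.1,
    "depth": 1.0,
}

def total_loss(terms):
    # Combine per-term scalar losses into the single training objective.
    return sum(LOSS_WEIGHTS[name] * value for name, value in terms.items())
```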
Two normalization modes are available (configurable via `norm_mode`):

| Mode | Formula | Description |
|---|---|---|
| `dust3r` (default) | Joint normalization by combined 3D distance | Both pred and GT normalized by the same scale factor |
| `paper` | `pred / pred_mean`, `gt / gt_mean` + log transform | Paper's scale-invariant formulation |
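To make the two modes concrete, here is a minimal sketch using 1-D depth values (the real implementation operates on full 3D point maps; the function names are illustrative, not the repo's API):

```python
import math

def dust3r_normalize(pred, gt):
    # Joint normalization: one shared scale, computed from the combined
    # prediction and ground-truth values, divides both sides.
    scale = sum(abs(x) for x in pred + gt) / len(pred + gt)
    return [x / scale for x in pred], [x / scale for x in gt]

def paper_normalize(pred, gt):
    # Paper-style: each side is divided by its own mean, then a log
    # transform is applied, giving a scale-invariant comparison.
    pred_mean = sum(abs(x) for x in pred) / len(pred)
    gt_mean = sum(abs(x) for x in gt) / len(gt)
    return ([math.log(x / pred_mean) for x in pred],
            [math.log(x / gt_mean) for x in gt])
```

Note that under the `paper` mode, scaling every prediction by a constant leaves the normalized values unchanged, which is exactly why a separate absolute depth term is needed.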
To prevent depth variance collapse (the model predicting near-constant depth), we add a direct L1 depth loss:

`L_depth = λ_depth * |pred_z - gt_z|`
| Property | Description |
|---|---|
| Weight | 1.0 (configurable via `depth` in the loss config) |
| Not scale-invariant | Provides absolute depth supervision |
| Prevents variance collapse | Penalizes when predictions cluster near mean |
This is critical because scale-invariant losses alone allow the model to minimize loss by predicting all depths near the mean value.
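A toy illustration of this failure mode (pure-Python sketch; `l1_depth_loss` is an illustrative name, not the repo's API):

```python
def l1_depth_loss(pred_z, gt_z, weight=1.0):
    # Direct L1 depth loss: weight * mean |pred_z - gt_z|.
    # Not scale-invariant, so it supplies absolute depth supervision.
    n = len(gt_z)
    return weight * sum(abs(p - g) for p, g in zip(pred_z, gt_z)) / n

gt_depth = [1.0, 2.0, 3.0, 4.0]
collapsed = [2.5, 2.5, 2.5, 2.5]   # variance collapse: everything at the mean
faithful = [1.1, 2.0, 2.9, 4.1]    # tracks the true depth structure

# The collapsed prediction, which a scale-invariant loss alone can leave
# nearly unpunished, incurs a much larger direct L1 penalty.
assert l1_depth_loss(collapsed, gt_depth) > l1_depth_loss(faithful, gt_depth)
```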
| Model | Parameters | Config |
|---|---|---|
| ViT-B | ~230M | configs/model/vit_b_d4rt.yaml |
| ViT-L | ~451M | configs/model/vit_l.yaml |
| ViT-g | ~1.1B | configs/model/vit_g.yaml |
```shell
pytest tests/ -v
```

- Paper: D4RT: Unified, Fast 4D Scene Reconstruction & Tracking
- Project Page: https://d4rt-paper.github.io/
- Original Authors: Google DeepMind
GNU General Public License v3.0 - see LICENSE for details.