Dataset / Reconstructions / Evaluation Artifacts: https://drive.switch.ch/index.php/s/WNluDrafwA0cZp1
This repository provides an integrated, reproducible and extensible framework for reconstructing 3D pollen geometry from multi-view photographic and holographic inputs. It consolidates classical geometric methods (Visual Hull) with modern neural reconstruction paradigms (Pix2Vox, PixelNeRF, Pixel2Mesh++, diffusion / generative methods via Hunyuan3D-2) and offers scalable Blender- and VTK-based preprocessing pipelines. Experiment management is orchestrated through Hydra; training and evaluation leverage PyTorch Lightning with consistent metric reporting.
- Objective and Scope
- Core Contributions
- Directory Overview and Rationale
- Model Families and Training Interfaces (Visual Hull | Pix2Vox (Hydra) | PixelNeRF | Pixel2Mesh++ | Hunyuan3D-2 | Optional External Methods)
- Data Architecture and Preprocessing Pipelines
- Hydra Configuration & Experiment Design
- Training Modalities (Containers / SLURM Recommended; Local Development Optional)
- Evaluation and Metrics Infrastructure
- Reproducibility, Logging and Experiment Tracking
- Curated Script Index
- Planned Extensions
- Minimal Local Quickstart (Fallback Only)
- FAQ
- Citation and Attribution
- Contact
Sequoia aims to establish a rigorous comparative substrate for heterogeneous 3D reconstruction techniques applied to pollen morphology. Emphasis is placed on methodological comparability, dataset modularity (synthetic multi-view renders, holographic acquisitions) and transparent experiment lifecycle management. The design priorities are: (i) structural clarity, (ii) deterministic preprocessing, (iii) orthogonal configuration overrides, and (iv) separation of concerns between data generation, model definition, and evaluation.
- Unified training abstraction (Lightning + Hydra) spanning volumetric, implicit, voxel-grid, mesh refinement and generative paradigms.
- Visual Hull baseline (deterministic boolean volume carving) for geometric reference.
- Pix2Vox integration with granular control over pretrained loading, staged module activation (merger/refiner kick-in), and selective freezing.
- PixelNeRF integration (multi-view conditional radiance field) via dedicated SLURM scripts and container isolation.
- Pixel2Mesh++ multiâstage mesh refinement pipeline backed by bespoke preprocessing conforming to its data contract.
- Hunyuan3D-2 accelerated inference suite (octree / step budgets) for generative comparison.
- Two complementary augmentation frameworks (broad vs. compact deformation sets) enabling structural and morphological perturbations.
- Cohesive evaluation layer sharing metric mixins across models for directly comparable quantitative outputs.
configs/ Hydra configurations (data, model, experiment, trainer, callbacks)
core/ Training orchestration, metric mixins, model registry
data/ Dataset classes, augmentation entry points, utilities
data/preprocessing/ Blender & VTK pipelines (ShapeNet style renders, Pixel2Mesh converters, augmentation variants)
Pixel_Nerf/ Upstream / adapted PixelNeRF implementation and scripts
Pixel2MeshPlusPlus/ Pixel2Mesh++ codebase (cfgs, modified templates)
Hunyuan3D-2/ Hunyuan3D-2 inference utilities and examples
notebooks/ Exploratory analysis, ablation studies, qualitative comparisons
scripts/ SLURM / submission / sweep scripts grouped by model family
evaluation_pipeline/ Programmatic runners for standardized evaluation (incl. holography)
evaluation_results/ Persisted quantitative and qualitative outputs
appendix/ Auxiliary external methods (InstantMesh, SparseFusion, SplatterImage)
docker/ Container build assets
gen_images/ Diagrams and illustrative assets
Key artefacts:
- `configs/train.yaml` – primary Hydra entrypoint (default composition; model switched via override).
- `configs/model/*.yaml` – model-level hyperparameters and pretrained weight references.
- `configs/experiment/*.yaml` – composable scenario definitions (e.g. `vh_2img.yaml`, `pix2vox_aug_4img.yaml`).
- `core/train.py` – canonical training pipeline (freezing logic, pretrained weight injection, metrics bootstrap, W&B logging).
- `core/models/visual_hull.py` – reference implementation using structured back-projection carving.
- `core/models/pix2vox/` – modular encoder / decoder / merger / refiner components.
- `data/augmentation.py` & `data/preprocessing/create_augmentations/augmentation.py` – broad vs. compact augmentation frameworks.
- `data/preprocessing/pixel2mesh/*.py` – conversion, normalization and multi-view rendering for Pixel2Mesh++ datasets.
- `data/preprocessing/blender_pipeline/Shape_Net_Pipeline/` – ShapeNet-style camera sphere rendering (PixelNeRF preparation).
- `notebooks/animated_mesh_comparison.ipynb` – consolidated qualitative benchmarking across model families.
- `scripts/` (`*.sbatch`) – curated SLURM launchers isolating variant hyperparameters.
- `Hunyuan3D-2/fast_shape_gen_pollen_orthogonal_*.py` – inference acceleration footprints (varying step count and octree depth).
Deterministic baseline producing a boolean occupancy volume through multi-view silhouette intersection. Configured via configs/model/visual_hull.yaml and experiment overrides configs/experiment/vh_*img.yaml (1–6 views). Example:
```bash
python -m core.train experiment=vh_2img model=visual_hull data=default n_images=2
```

No optimizer state is persisted (pure geometric operator); only metrics are logged.
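The carving itself reduces to intersecting back-projected silhouettes in a shared voxel grid. Below is a minimal numpy sketch of that idea, assuming orthographic views and binary masks at the configured azimuths; the function and argument names are illustrative, not the repository's `VisualHull` API.

```python
import numpy as np

def carve_visual_hull(silhouettes, angles_deg, resolution=64):
    """Intersect back-projected silhouettes into a boolean occupancy volume."""
    lin = np.linspace(-0.5, 0.5, resolution)
    xs, ys, zs = np.meshgrid(lin, lin, lin, indexing="ij")
    occupancy = np.ones((resolution,) * 3, dtype=bool)

    for mask, angle in zip(silhouettes, angles_deg):
        h, w = mask.shape
        theta = np.deg2rad(angle)
        # Orthographic camera on the equator at azimuth `theta`:
        # the horizontal image axis is perpendicular to the viewing direction.
        u = -xs * np.sin(theta) + ys * np.cos(theta)
        v = zs
        cols = np.clip(np.round((u + 0.5) * (w - 1)).astype(int), 0, w - 1)
        rows = np.clip(np.round((0.5 - v) * (h - 1)).astype(int), 0, h - 1)
        # A voxel survives only if every silhouette covers its projection.
        occupancy &= mask[rows, cols]
    return occupancy
```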
Configuration at configs/model/pix2vox.yaml supports: learning rate, pretrained weights, staged activation thresholds (merger_kickin, refiner_kickin), dropout and explicit module freezing. Experiments differentiate augmentation usage and view cardinality (e.g. pix2vox_aug_1img.yaml ... pix2vox_aug_6img.yaml, holography transfer variants). Example:
```bash
python -m core.train experiment=pix2vox_aug_4img model=pix2vox data=default n_images=4 data.include_augmentations=true
```

On-the-fly freezing: add model.frozen="[encoder,decoder]" to the command line.
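The staging and freezing semantics can be pictured roughly as in the sketch below. This is illustrative only, not the actual core/train.py logic; the names `frozen`, `merger_kickin` and `refiner_kickin` simply mirror the config keys mentioned above.

```python
import torch.nn as nn

class Pix2VoxStaging:
    """Illustrative helper mirroring the staged-activation / freezing config keys."""

    def __init__(self, model: nn.Module, frozen=(), merger_kickin=10, refiner_kickin=20):
        self.model = model
        self.merger_kickin = merger_kickin
        self.refiner_kickin = refiner_kickin
        # Freeze whole sub-modules named in the config, e.g. ["encoder", "decoder"].
        for name in frozen:
            for param in getattr(model, name).parameters():
                param.requires_grad = False

    def use_merger(self, epoch: int) -> bool:
        # The merger only contributes to the forward pass after its kick-in epoch.
        return epoch >= self.merger_kickin

    def use_refiner(self, epoch: int) -> bool:
        return epoch >= self.refiner_kickin
```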
Resides under Pixel_Nerf/. Launch is containerized to stabilize dependency stacks and CUDA alignment. SLURM submissions (scripts/slurm_sbatch_experiments/ train_pixelnerf*.sbatch and scripts/pixelnerf/) expose variations in encoder depth, view counts, fine/coarse sampling and loss choices. Example invocation segment:
```bash
singularity exec --nv \
  --bind /path/checkpoints:/container/checkpoints \
  --bind /path/sequoia/Pixel_Nerf/:/code \
  --pwd /code pixelnerf_new.sif \
  python3 train/org_train.py -n pollen_256_4_4 -c conf/exp/pollen.conf -D /code/pollen --nviews 4 --resume
```

Hosted in Pixel2MeshPlusPlus/ with modified configuration templates (Pixel2MeshPlusPlus/cfgs/). Preprocessing scripts (data/preprocessing/pixel2mesh/to_pixel2mesh_dir_original.py, data/preprocessing/pixel2mesh/to_pixel2mesh_dir_original_augments.py) produce canonical multi-view inputs (8 fixed azimuths at 30° elevation) and normalized meshes. SLURM scripts in scripts/pixel2mesh/ encode a matrix of (view count × prior type × training regime), including mean/special/spherical priors and freeze vs. fine-tune regimes.
Inference-centric generative pipeline (no training loop here). Scripts parameterize octree resolution and sampling steps (fast_shape_gen_pollen_orthogonal_{5,10,50}steps.py, _octree32.py, _octree128.py). Provides comparative generative baselines and rapid prototyping for diffusion-based reconstructions.
Supplementary approaches located in appendix/ (e.g. SparseFusion, InstantMesh, SplatterImage). These are included for qualitative or contextual benchmarking and may maintain independent requirement sets. Containerization is strongly advised to prevent cross-contamination of dependencies.
Raw STL Mesh → (Cleaning / Repair via data/mesh_cleaner.py, data/mesh_analyzer.py) → Normalization (centering + scale invariance) → Multi-view rendering (Blender) producing RGB + silhouettes → (Optional) augmentation (geometric + morphological) → Model-specific dataset assembly (voxel grids, multi-image tuples, mesh refinement inputs) → Training & evaluation artifacts.
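The normalization step (centering plus scale invariance) amounts to mapping each mesh into a canonical unit bounding box. A minimal numpy sketch, assuming an (N, 3) vertex array; this is illustrative, not the exact logic of data/mesh_cleaner.py.

```python
import numpy as np

def normalize_vertices(vertices: np.ndarray) -> np.ndarray:
    """Center the mesh at the origin and scale it to fit a unit bounding box."""
    center = (vertices.min(axis=0) + vertices.max(axis=0)) / 2.0
    centered = vertices - center
    extent = (vertices.max(axis=0) - vertices.min(axis=0)).max()
    return centered / extent if extent > 0 else centered
```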
Location: data/preprocessing/blender_pipeline/Shape_Net_Pipeline/ and variant under data/preprocessing/pixelnerf/Shape_Net_Pipeline/. Principal scripts: shapenet_spherical_renderer.py, parallel.py, augmentation.py, blender_interface.py. Example (Windows, legacy Blender 2.7):
"C:\\Program Files\\Blender2.7\\blender.exe" --background --python shapenet_spherical_renderer.py --addons io_mesh_stl -- \
--mesh_dir C:/path/meshes_obj/ --output_dir C:/out/shapenet_views --num_observations 128Outputs consistent camera pose enumerations consumed by PixelNeRF (imageâpose pairing).
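Conceptually, the renderer spreads `num_observations` camera centers over a sphere around the normalized mesh, all looking at the origin. The sketch below shows one deterministic way to do that (a Fibonacci lattice); the actual enumeration and pose convention live in shapenet_spherical_renderer.py / blender_interface.py and may differ.

```python
import numpy as np

def fibonacci_sphere_cameras(num_observations: int, radius: float = 1.5) -> np.ndarray:
    """Deterministically spread camera centers over a sphere (Fibonacci lattice)."""
    i = np.arange(num_observations)
    golden = (1 + 5 ** 0.5) / 2
    z = 1 - 2 * (i + 0.5) / num_observations   # heights uniform in (-1, 1)
    azimuth = 2 * np.pi * i / golden            # golden-angle azimuth increments
    r_xy = np.sqrt(1 - z ** 2)
    points = np.stack([r_xy * np.cos(azimuth), r_xy * np.sin(azimuth), z], axis=-1)
    return radius * points                      # (num_observations, 3) camera centers
```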
Silhouettes are derived from the same multi-view rendering stage. Visual Hull uses fixed azimuth angles embedded in VisualHull.angles_deg (see core/models/visual_hull.py); ensure renderer azimuth parity. Pix2Vox leverages identical view subsets controlled through n_images in the data configuration or experiment overrides.
Augmented and base variants are differentiated through experiments (pix2vox_aug_*.yaml) that toggle data.include_augmentations=true. Pretrained weights (Pix2Vox-A-ShapeNet.pth) are specified in configs/model/pix2vox.yaml.
Scripts:
- Original: `data/preprocessing/pixel2mesh/to_pixel2mesh_dir_original.py`
- Augmented: `data/preprocessing/pixel2mesh/to_pixel2mesh_dir_original_augments.py`

Processing stages: VTK normalization, mesh simplification (quadric decimation), canonical orientation, multi-view rendering (8 fixed viewpoints) and persistence of mesh + normals + auxiliary attributes. The output directory structure mirrors instance granularity (`pixel2mesh_original/`, `pixel2mesh_original_augmented/`).
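The VTK portion of these stages (reading, quadric decimation, normal recomputation) can be sketched as follows; the target reduction is illustrative, and the repository scripts additionally handle canonical orientation, rendering and persistence.

```python
import vtk

def load_and_decimate(stl_path: str, target_reduction: float = 0.9) -> vtk.vtkPolyData:
    """Read an STL mesh and simplify it via quadric decimation."""
    reader = vtk.vtkSTLReader()
    reader.SetFileName(stl_path)

    decimate = vtk.vtkQuadricDecimation()
    decimate.SetInputConnection(reader.GetOutputPort())
    decimate.SetTargetReduction(target_reduction)   # remove ~90% of the triangles

    normals = vtk.vtkPolyDataNormals()              # recompute normals after decimation
    normals.SetInputConnection(decimate.GetOutputPort())
    normals.Update()
    return normals.GetOutput()
```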
- Comprehensive (root: `data/augmentation.py`) – deformations: swelling, shriveling, twisting, stretching, spikify, groove, wrinkle, asymmetry, full_combo. Progress is recorded via `progress.json`, enabling resumable execution.
- Compact (`data/preprocessing/create_augmentations/augmentation.py`) – deformations: twisting, stretching, groove, asymmetry, full_combo, radical_reshape, irregular. Designed for rapid, orthogonally composable perturbations.

Example:
"C:\\Program Files\\Blender2.7\\blender.exe" --background --python data/augmentation.py --addons io_mesh_stl -- \
--mesh_dir data/processed/meshes_repaired --output_dir data/processed/augmentation --num_augmentations 5Augmented outputs feed downstream into Pixel2Mesh++ or Pix2Vox (if augmentation inclusion is activated via Hydra).
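Each deformation is essentially an analytic vertex displacement. As an illustration, a twist about the vertical axis could look like the numpy sketch below; the actual scripts apply equivalent operators through the Blender API rather than this standalone function.

```python
import numpy as np

def twist(vertices: np.ndarray, max_angle_deg: float = 45.0) -> np.ndarray:
    """Rotate each vertex about the z axis proportionally to its height."""
    z = vertices[:, 2]
    # Normalize height to [0, 1] so the twist grows from bottom to top.
    t = (z - z.min()) / max(z.max() - z.min(), 1e-8)
    theta = np.deg2rad(max_angle_deg) * t
    x, y = vertices[:, 0], vertices[:, 1]
    twisted = vertices.copy()
    twisted[:, 0] = x * np.cos(theta) - y * np.sin(theta)
    twisted[:, 1] = x * np.sin(theta) + y * np.cos(theta)
    return twisted
```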
Holographic preprocessing and domain adaptation support are configured via configs/data/holo.yaml (includes a ripple removal transform sequence). Experiments such as pix2vox_aug_holo_test.yaml and vh_2img_holo_test.yaml facilitate zero-shot or adaptation studies.
Layered composition:
- Data layer (`configs/data/*.yaml`): `PollenDataModule`, `HolographicPolenoDataModule` with parameters (n_images, augmentation toggles, batch size, transforms).
- Model layer (`configs/model/*.yaml`): hyperparameters, pretrained asset paths, module scheduling.
- Experiment layer (`configs/experiment/*.yaml`): curated override bundles (e.g. model swap, number of images, holography variants).

Execution example:
```bash
python -m core.train experiment=pix2vox_aug_3img model.pix2vox.lr=5e-5 seed=1234
```

Hyperparameter sweeps employ Optuna through scripts/submit_optuna_sweep.sh, which invokes Hydra multi-run mode (-m +sweep=pix2vox_optuna).
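Internally, the composed layers resolve into a single DictConfig inside the training entrypoint. A simplified sketch of that pattern follows; it is not the verbatim contents of core/train.py, and the instantiation calls are indicated only as comments.

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="../configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # cfg is the fully composed data/model/experiment/trainer configuration,
    # including command-line overrides such as model.pix2vox.lr=5e-5.
    print(OmegaConf.to_yaml(cfg, resolve=True))
    # datamodule = hydra.utils.instantiate(cfg.data)
    # model = hydra.utils.instantiate(cfg.model)
    # trainer = hydra.utils.instantiate(cfg.trainer)
    # trainer.fit(model, datamodule=datamodule)

if __name__ == "__main__":
    main()
```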
Authoritative images: https://hub.docker.com/repositories/etiir
Justification: deterministic dependency graphs, GPU driver abstraction, trivial SLURM integration, and avoidance of cross-library ABI mismatches (Torch / VTK / Open3D / Blender).
Pix2Vox (4 views augmented):
```bash
sbatch --job-name=pix2vox_aug_4img <<'EOF'
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=64G
#SBATCH --cpus-per-task=4
export WANDB_API_KEY=YOUR_KEY
singularity exec --nv pix2vox.sif \
  python -m core.train experiment=pix2vox_aug_4img
EOF
```

Visual Hull (lightweight, CPU/GPU optional):
```bash
python -m core.train experiment=vh_4img data=default n_images=4
```

PixelNeRF, Pixel2Mesh++ and Hunyuan3D-2 rely on their respective SLURM scripts under scripts/ for consistent binding of code, checkpoints and environment.
Potential friction points: legacy Blender 2.7 API, VTK wheel platform specificity, CUDA / Torch ABI alignment. Containers remain the canonical path for reproducible results.
Unified metric registration occurs via core/metrics.py and MetricsMixin, automatically invoked in Lightning training_step / validation_step / test flows. Analytical notebooks (animated_mesh_comparison.ipynb, exp_5_number_of_views.ipynb, holo_explorer.py) provide qualitative triangulation. Programmatic evaluation orchestrators: evaluation_pipeline/runner.py, evaluation_pipeline/runner_holo.py. Outputs are archived in evaluation_results/ segregated by domain context.
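Conceptually, the mixin registers a metric collection once and logs it from the shared Lightning hooks. The sketch below assumes torchmetrics; the metrics actually registered by core/metrics.py may differ.

```python
import torch
import torchmetrics

class MetricsMixin:
    """Shared metric registration/logging, mixed into each pl.LightningModule."""

    def setup_metrics(self) -> None:
        # ModuleDict so metric state moves with the model across devices.
        self.val_metrics = torch.nn.ModuleDict({
            "iou": torchmetrics.JaccardIndex(task="binary"),
            "f1": torchmetrics.F1Score(task="binary"),
        })

    def log_metrics(self, prefix: str, pred: torch.Tensor, target: torch.Tensor) -> None:
        # self.log is provided by LightningModule once this mixin is combined with it.
        for name, metric in self.val_metrics.items():
            self.log(f"{prefix}/{name}", metric(pred, target), prog_bar=True)
```

A concrete LightningModule would call `setup_metrics()` in its constructor and `log_metrics("val", pred, target)` inside `validation_step`, which is what keeps the reported numbers directly comparable across model families.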
- WandB integration (`WandbLogger`), project namespace `reconstruction`.
- Full Hydra configuration (resolved) persisted with each run for auditability.
- Determinism seeded via `pl.seed_everything(cfg.seed, workers=True)` if `seed` is defined (see the sketch after this list).
- Checkpoints stored under `checkpoints/` (override capable).
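A minimal sketch of this bookkeeping, assuming a Hydra cfg object and the WandbLogger; the helper name and output file name are illustrative.

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf
from pytorch_lightning.loggers import WandbLogger

def setup_run(cfg):
    if "seed" in cfg and cfg.seed is not None:
        pl.seed_everything(cfg.seed, workers=True)   # deterministic dataloader workers
    logger = WandbLogger(project="reconstruction")
    # Persist the fully resolved configuration alongside the run for auditability.
    logger.experiment.config.update(OmegaConf.to_container(cfg, resolve=True))
    OmegaConf.save(cfg, "resolved_config.yaml")
    return logger
```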
| Purpose | Path |
|---|---|
| Visual Hull configurations | configs/experiment/ (vh_*img.yaml) |
| Pix2Vox view count variants | configs/experiment/ (pix2vox_aug_{1,3,4,5,6}img.yaml) |
| Pix2Vox frozen enc/dec | configs/experiment/pix2vox_aug_frozen_encdec.yaml |
| Holography transfer tests | configs/experiment/pix2vox_aug_holo_test.yaml, configs/experiment/vh_2img_holo_test.yaml |
| Pixel2Mesh++ preprocessing (original) | data/preprocessing/pixel2mesh/to_pixel2mesh_dir_original.py |
| Pixel2Mesh++ preprocessing (augmented) | data/preprocessing/pixel2mesh/to_pixel2mesh_dir_original_augments.py |
| Augmentation (comprehensive) | data/augmentation.py |
| Augmentation (compact) | data/preprocessing/create_augmentations/augmentation.py |
| ShapeNet rendering | data/preprocessing/blender_pipeline/Shape_Net_Pipeline/shapenet_spherical_renderer.py |
| Pix2Vox model modules | core/models/pix2vox/ |
| Visual Hull model | core/models/visual_hull.py |
| Hydra training entry | core/train.py |
| Optuna sweep submission | scripts/submit_optuna_sweep.sh |
- Unified CLI façade spanning all model families.
- Automated camera pose JSON export collocated with rendered images.
- Crossâmodel evaluation CLI for batch metric synthesis across checkpoints.
- Mesh quality metrics (watertightness, genus, selfâintersection) integrated into training loop.
- Preprocessing unit tests (normalization invariants, face count thresholds, scaling correctness).
Containers and provided SLURM scripts remain the authoritative path; local setup increases variance and failure surface.
- Python 3.11 (root project) / Python 3.12 (Hunyuan3D-2 sub-environment)
- CUDA-capable GPU with compatible drivers
- (Legacy augmentation) Blender 2.7x; consider migrating to ≥3.x
Root environment:
```bash
uv sync
```

Activation (PowerShell):

```powershell
./.venv/Scripts/Activate.ps1
```

Hunyuan3D-2 (separate lock + Python 3.12):
```bash
cd Hunyuan3D-2
uv sync
```

PixelNeRF (isolated virtual environment example):
```bash
cd Pixel_Nerf
python -m venv .venv
source .venv/bin/activate   # or .venv/Scripts/Activate.ps1 on Windows
pip install -r requirements.txt
```

Pixel2Mesh++ (Pixel2MeshPlusPlus/) follows the same isolation principle.
```bash
python -m core.train experiment=vh_1img
```

Expected: metrics logged (WandB optional), rapid termination without GPU memory pressure.
Ensure data/processed/ contains requisite mesh / augmentation outputs. Absent structures should trigger reruns of the relevant preprocessing scripts above.
| Issue | Cause | Mitigation |
|---|---|---|
| Missing Blender Python modules | Incorrect binary path | Supply absolute Blender path, validate --python target |
| Torch / torchvision ABI mismatch | Divergent CUDA builds | Prefer container; else install matched wheel set |
| VTK ImportError | Platform wheel unavailability | Favor container execution |
| Headless Open3D rendering | No GUI backend | Use pyglet<2, fall back to off-screen rendering, or disable visualization |
For benchmark-quality results always rely on provided Docker / Singularity images and SLURM scripts; local variance undermines comparability.
Why containers? Heterogeneous dependency surface (Blender 2.7 API, VTK, Open3D, PyTorch) produces brittle local stacks; containers encode stable, shareable environments.
Which view azimuths does Visual Hull assume? [0, 90, 180, 270, 45, 135, 225, 315] degrees (see VisualHull.angles_deg in core/models/visual_hull.py).
How do I freeze Pix2Vox encoder / decoder? Add model.frozen="[encoder,decoder]" to the Hydra command line.
Augmentation framework differences? Root variant offers a broader morphological spectrum (swelling / spikify / wrinkle), compact variant emphasizes composable transformations for rapid diversification.
Please acknowledge upstream projects (PixelNeRF, Pix2Vox, Pixel2Mesh++, Hunyuan3D-2, and others) in accordance with their respective licenses. This repository serves as an integrative orchestration layer and does not supersede original intellectual property claims.
For clarification, feature proposals, or issue reporting, file a ticket in the issue tracker or reach out to the maintainers listed in the project metadata.
Last Updated: 2025-08-13