Egocentric 3D Reconstruction

From egocentric video to a renderable 3D scene

This walkthrough follows the reconstruction preparation path: sample frames, export calibration, align poses, generate COLMAP commands, then move toward NeRF or 3D Gaussian Splatting.

Sample Outputs

8 frames extracted
200 SLAM poses exported
4,089 SLAM points

Camera Path Intuition

Figure: contact sheet of extracted frames linked to their nearest SLAM poses.
Figure: SLAM path + sparse points.

Pose

A camera position and orientation at one timestamp.

Point cloud

Sparse 3D points reconstructed from visual tracking.

Novel view

A synthesized camera view rendered from the learned scene.
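Linking each extracted frame to its nearest SLAM pose, as in the contact sheet above, amounts to a nearest-timestamp lookup. The following is a minimal sketch, assuming that frames and poses both carry timestamps in the same clock and units and that the pose timestamps are sorted. The function names are hypothetical and are not taken from the project's code.

```python
from bisect import bisect_left


def nearest_pose_index(pose_times, frame_time):
    """Return the index of the pose whose timestamp is closest to frame_time.

    Assumes pose_times is sorted in ascending order.
    """
    if not pose_times:
        raise ValueError("pose_times is empty")
    i = bisect_left(pose_times, frame_time)
    if i == 0:
        return 0
    if i == len(pose_times):
        return len(pose_times) - 1
    before, after = pose_times[i - 1], pose_times[i]
    # On a tie, prefer the earlier pose.
    return i - 1 if frame_time - before <= after - frame_time else i


def link_frames_to_poses(frame_times, pose_times):
    """Map every frame timestamp to the index of its nearest pose."""
    return [nearest_pose_index(pose_times, t) for t in frame_times]


if __name__ == "__main__":
    poses = [0, 100, 200, 300]      # hypothetical pose timestamps
    frames = [40, 160, 500]         # hypothetical frame timestamps
    print(link_frames_to_poses(frames, poses))
```

A binary search keeps the lookup cheap even when there are far more poses than frames. A real pipeline may also want to reject matches whose time gap exceeds some tolerance.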

Frames

Frame extraction controls the tradeoff between runtime and visual overlap. Too few frames make matching brittle; too many frames slow down downstream optimization.

python scripts/reconstruction_demo.py --frame-stride 180 --max-frames 24
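To make the tradeoff concrete, here is a small sketch of how a stride and a frame cap could select frame indices. It assumes that --frame-stride keeps every Nth frame starting at index 0 and that --max-frames truncates the result. The script's actual behavior may differ, and select_frame_indices is a hypothetical helper.

```python
def select_frame_indices(total_frames: int, frame_stride: int, max_frames: int) -> list:
    """Pick every frame_stride-th index, keeping at most max_frames of them."""
    if frame_stride < 1:
        raise ValueError("frame_stride must be at least 1")
    if max_frames < 0:
        raise ValueError("max_frames must be non-negative")
    # Slicing a range is lazy, so only the kept indices are materialized.
    return list(range(0, total_frames, frame_stride)[:max_frames])


if __name__ == "__main__":
    # Hypothetical 5400-frame video (3 minutes at an assumed 30 fps).
    # At 30 fps, a stride of 180 means one kept frame every 6 seconds.
    indices = select_frame_indices(5400, frame_stride=180, max_frames=24)
    print(len(indices), indices[:3], indices[-1])
```

Lowering the stride increases overlap between neighboring frames, while the cap bounds the cost of the matching and optimization steps that follow.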

Command Map

Prepare

Extract frames and export calibration from the sample episode.

Reconstruct

Run COLMAP feature extraction, matching, mapping, and undistortion.

Render

Use Nerfstudio or 3DGS templates after poses and images are ready.
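The Reconstruct step can be sketched as a function that builds the four COLMAP command lines in order. This is an illustration only. The workspace layout, the choice of exhaustive_matcher, and the function name colmap_commands are assumptions, and the project may generate different commands or options.

```python
import shlex
from pathlib import Path


def colmap_commands(image_dir, workspace):
    """Build COLMAP commands for extraction, matching, mapping, and undistortion."""
    ws = Path(workspace)
    db = ws / "database.db"   # assumed database location
    sparse = ws / "sparse"    # the mapper writes models to sparse/0, sparse/1, ...
    dense = ws / "dense"      # assumed output folder for undistorted images
    images = str(image_dir)
    return [
        ["colmap", "feature_extractor",
         "--database_path", str(db), "--image_path", images],
        # Assumption: exhaustive matching, which is feasible for a few dozen frames.
        ["colmap", "exhaustive_matcher", "--database_path", str(db)],
        # The mapper expects the output directory to exist already.
        ["colmap", "mapper",
         "--database_path", str(db), "--image_path", images,
         "--output_path", str(sparse)],
        ["colmap", "image_undistorter",
         "--image_path", images, "--input_path", str(sparse / "0"),
         "--output_path", str(dense), "--output_type", "COLMAP"],
    ]


if __name__ == "__main__":
    for cmd in colmap_commands("outputs/sample_demo/frames", "outputs/sample_demo/colmap"):
        print(shlex.join(cmd))
```

Returning argument lists rather than one shell string keeps paths with spaces safe when the commands are later passed to subprocess.run.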

Concept Lab

These are the main ideas behind the reconstruction stack. Build the intuition for each one before running heavy tools.

Frame extraction

Frame extraction turns a long egocentric video into a set of images. The spacing matters: too sparse means weak overlap; too dense means slower reconstruction.

Why First-Person Reconstruction Fails

Motion blur

Fast head or hand movement weakens feature matches.

Dynamic objects

The hands, kettle, dripper, and water move independently of the room.

Fisheye distortion

A wide field of view needs the correct camera model or undistortion step.

Sparse overlap

Frames may not share enough common texture for stable matching.

Bundle adjustment drift

Bad matches can pull camera poses away from the real trajectory.

Reflective surfaces

Glass, water, and metal can create inconsistent appearance.
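The fisheye item above can be illustrated numerically. The sketch below compares where a ray at angle theta from the optical axis lands under an ideal pinhole model (r = f * tan(theta)) and under an ideal equidistant fisheye model (r = f * theta). Both are textbook idealizations, and the focal length is a made-up value. A real lens usually needs a calibrated model with distortion terms, such as COLMAP's OPENCV_FISHEYE camera model.

```python
import math


def pinhole_radius(theta_rad: float, focal_px: float) -> float:
    """Image radius of a ray under an ideal pinhole projection."""
    return focal_px * math.tan(theta_rad)


def equidistant_radius(theta_rad: float, focal_px: float) -> float:
    """Image radius of a ray under an ideal equidistant fisheye projection."""
    return focal_px * theta_rad


if __name__ == "__main__":
    focal_px = 300.0  # hypothetical focal length in pixels
    for deg in (10, 40, 70, 85):
        theta = math.radians(deg)
        print(deg,
              round(pinhole_radius(theta, focal_px), 1),
              round(equidistant_radius(theta, focal_px), 1))
```

Near the image center the two models nearly agree, but toward the edge of a wide field of view they diverge sharply, so treating fisheye frames as pinhole images places features at the wrong bearing and corrupts matching and pose estimation.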

Run This Locally

1. Set DATA_ROOT.
2. Install the package with pip install -e ".[dev]".
3. Prepare frames and poses.
4. Parse the COLMAP model when available.

ego-recon-demo --data-root "$DATA_ROOT" --output-dir outputs/sample_demo --frame-stride 180 --max-frames 24
ego-recon-demo --parse-colmap-only --colmap-model outputs/sample_demo/colmap/sparse/0 --output-dir outputs/sample_demo

COLMAP Summary Preview

{
  "num_registered_images": 24,
  "num_sparse_points": 1200,
  "mean_reprojection_error": 0.62
}
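A summary with these three keys could be derived from a COLMAP model exported in text form (images.txt and points3D.txt, which colmap model_converter can produce with --output_type TXT). The sketch below is one possible approach and is not the project's parser. In particular, it averages the per-point error column, whereas the project may weight by observations or read the binary model directly. The function name is hypothetical.

```python
import json


def _data_lines(text: str) -> list:
    """Drop comment lines but keep blank ones, since they carry structure."""
    return [line for line in text.splitlines() if not line.startswith("#")]


def summarize_colmap_text(images_txt: str, points3d_txt: str) -> dict:
    """Summarize a COLMAP text model given the contents of its two files."""
    # images.txt uses two lines per image: the pose line, then the 2D points
    # line, which is empty when the image has no observations.
    pose_lines = _data_lines(images_txt)[0::2]
    num_images = sum(1 for line in pose_lines if line.strip())

    # points3D.txt fields: POINT3D_ID X Y Z R G B ERROR TRACK[]...
    errors = []
    for line in _data_lines(points3d_txt):
        fields = line.split()
        if len(fields) >= 8:
            errors.append(float(fields[7]))

    # Assumption: unweighted mean over points; 0.0 for an empty model.
    mean_error = sum(errors) / len(errors) if errors else 0.0
    return {
        "num_registered_images": num_images,
        "num_sparse_points": len(errors),
        "mean_reprojection_error": round(mean_error, 2),
    }


if __name__ == "__main__":
    images = "# Image list\n1 1 0 0 0 0 0 0 1 frame_000.jpg\n10.0 20.0 1\n"
    points = "# 3D point list\n1 0.1 0.2 0.3 255 255 255 0.5 1 0\n"
    print(json.dumps(summarize_colmap_text(images, points), indent=2))
```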