Synthetic Data Pipeline
Build a production-grade pipeline that procedurally generates driving scenes, renders multi-sensor data, auto-labels every frame with pixel-perfect annotations, and exports nuScenes-compliant datasets ready for perception model training.
Track: B -- Synthetic Data & Sensor Sim | Level: Intermediate | Total Time: ~20 hours
Overview
In this project you will build an end-to-end synthetic data generation pipeline for autonomous driving perception. Starting from a programmable scene graph, you will compose realistic driving environments with roads, vehicles, pedestrians, and traffic infrastructure. You will then render those scenes through simulated cameras and lidar sensors, automatically extract ground-truth labels (3D bounding boxes, 2D projections, semantic segmentation, depth maps), and export everything in the nuScenes dataset format so it can be consumed directly by standard perception training frameworks.
Synthetic data is one of the highest-leverage tools available to autonomous driving teams. Real-world data collection costs $5--10 per labeled frame, takes weeks to annotate, and still suffers from class imbalance -- rare actors like scooters, construction workers, or animals appear in less than 2% of drives. Simulation flips every one of those constraints: frames cost fractions of a cent in GPU time, labels are instantaneous and mathematically perfect, and you can spawn any object in any configuration on demand. Companies like Waymo, Cruise, and Applied Intuition generate billions of synthetic frames per year to supplement real data, stress-test perception models on long-tail scenarios, and validate safety-critical behaviors that would be dangerous to stage on public roads.
By the end of this project you will have a working Python package that takes a scene specification (or generates one randomly), renders multi-camera and lidar data, produces a complete set of annotations, and writes a valid nuScenes dataset to disk. You will also build a domain randomization engine that varies lighting, weather, textures, and actor placement so the generated data is diverse enough to improve real-world model generalization. The final deliverable is a CLI tool that can batch-generate hundreds of annotated scenes with a single command.
Learning Objectives
After completing this project, you will be able to:
- Design and manipulate a 3D scene graph -- create hierarchical object trees, attach transforms, and query spatial relationships between actors in a driving environment.
- Implement sensor simulation from first principles -- build a pinhole camera projection pipeline and a raycasting-based lidar simulator, understanding intrinsics, extrinsics, and noise models.
- Generate pixel-perfect auto-labels -- extract 3D bounding boxes, 2D projected boxes, semantic segmentation maps, instance segmentation, and depth maps directly from the scene graph without any manual annotation.
- Write nuScenes-compliant dataset exports -- produce the full relational database structure (scenes, samples, sample_data, ego_pose, calibrated_sensor, annotations) that loads cleanly with the official nuscenes-devkit.
- Apply domain randomization -- systematically vary environmental parameters (lighting, weather, textures, actor density) to maximize dataset diversity and reduce the sim-to-real domain gap.
- Validate synthetic data quality -- build checks for annotation coverage, class distribution balance, calibration consistency, and format compliance.
- Orchestrate batch generation -- combine all components into a configurable pipeline that scales from single-scene debugging to large-scale dataset production.
Prerequisites
Required
- Python proficiency -- comfortable with classes, file I/O, NumPy array operations, and building CLI tools.
- 3D coordinate systems -- understanding of 3D Cartesian coordinates, rotation representations (Euler angles, quaternions, rotation matrices), and rigid-body transforms.
- JSON and data formats -- ability to read/write structured data, work with nested schemas, and validate against expected structures.
Recommended
- Basic rendering concepts -- familiarity with projection, rasterization, and the difference between a mesh and a point cloud.
- nuScenes format familiarity -- having browsed the nuScenes schema documentation or loaded a mini dataset.
- Open3D basics -- prior exposure to Open3D for 3D visualization helps but is not strictly necessary.
Deep Dive Reading
Before starting, read the companion deep dive for theoretical background:
- Synthetic Data for AD Perception Training -- covers domain randomization theory, domain adaptation techniques (FDA, CyCADA), mixed training strategies, and cost/benefit analysis.
Key Concepts
Scene Composition
A driving scene is built from four layers of content:
- Road layout -- lane geometry, road boundaries, intersections, and drivable surfaces. In a minimal pipeline this can be a textured ground plane or a set of parameterized road segments.
- Static infrastructure -- buildings, trees, traffic signs, lane markings, barriers. These objects are placed once and do not move between frames.
- Dynamic actors -- vehicles, pedestrians, cyclists. Each has a pose trajectory over time (position, heading per timestep).
- Traffic signals -- lights with state (green/yellow/red) that change according to a programmed schedule.
Asset placement follows spatial constraints: vehicles stay in lanes, pedestrians on sidewalks, signs at road edges. A good scene composition system enforces these constraints while still allowing randomization within valid bounds.
Scene Graph Structure:
=====================
SceneRoot
+-- RoadNetwork
| +-- Lane_0 (geometry, markings)
| +-- Lane_1
| +-- Intersection_0
+-- StaticObjects
| +-- Building_0 (mesh, pose)
| +-- Tree_0
| +-- TrafficSign_0
+-- DynamicActors
| +-- Vehicle_0 (mesh, trajectory, class="car")
| +-- Vehicle_1 (mesh, trajectory, class="truck")
| +-- Pedestrian_0 (mesh, trajectory, class="pedestrian")
+-- TrafficSignals
| +-- Light_0 (state_schedule, pose)
+-- EgoPose
+-- frame_0: (x, y, z, qw, qx, qy, qz)
+-- frame_1: ...
Coordinate Systems and Transforms
Three coordinate frames matter in the pipeline:
| Frame | Origin | Convention | Use |
|---|---|---|---|
| World | Arbitrary map origin | Right-handed, Z-up | Scene graph, object placement |
| Ego | Center of ego vehicle rear axle | X-forward, Y-left, Z-up | Sensor mounting, annotations |
| Sensor | Sensor optical center | Varies by sensor type | Rendering, raw data |
Converting between frames requires rigid-body transforms (rotation + translation). In homogeneous coordinates:
import numpy as np
def make_transform(rotation_matrix: np.ndarray, translation: np.ndarray) -> np.ndarray:
"""Build a 4x4 homogeneous transform from R (3x3) and t (3,)."""
T = np.eye(4)
T[:3, :3] = rotation_matrix
T[:3, 3] = translation
return T
# Transform a point from sensor frame to world frame:
# p_world = T_world_ego @ T_ego_sensor @ p_sensor
T_world_ego = make_transform(R_ego, t_ego) # ego pose in world
T_ego_sensor = make_transform(R_sensor, t_sensor) # sensor extrinsics
p_sensor_h = np.array([x, y, z, 1.0]) # homogeneous point
p_world_h = T_world_ego @ T_ego_sensor @ p_sensor_h
Camera intrinsics map 3D points in the camera frame to 2D pixel coordinates:
K = [[fx, 0, cx],
[ 0, fy, cy],
[ 0, 0, 1]]
[u, v, 1]^T = (1/z) * K @ [x, y, z]^T
Camera extrinsics define where the camera sits relative to the ego vehicle: the camera's rotation and translation expressed in the ego frame (the camera-to-ego transform, matching the nuScenes calibrated_sensor convention).
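As a quick numeric sanity check of the projection equation (using the focal lengths and image size adopted later in this project; the point coordinates are illustrative):
import numpy as np
# Assumed intrinsics: 1600x900 image, fx = fy = 1266.4, principal point near the center.
K = np.array([[1266.4,    0.0, 816.3],
              [   0.0, 1266.4, 491.5],
              [   0.0,    0.0,   1.0]])
# A point 2 m right, 1 m down, 20 m ahead of the camera (camera frame, z forward).
p_cam = np.array([2.0, 1.0, 20.0])
uvw = K @ p_cam                        # homogeneous pixel coordinates
u, v = uvw[:2] / uvw[2]                # perspective divide
print(f"pixel = ({u:.1f}, {v:.1f})")   # -> approximately (942.9, 554.8)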
Sensor Rendering
Camera rendering follows the classical pinhole model. For each 3D point visible to the camera: (1) transform from world to camera frame, (2) project through the intrinsic matrix, (3) check that the point lies within the image bounds and in front of the camera (z > 0). For image rendering, you can use Open3D's offscreen renderer or a simple Z-buffer rasterizer.
Lidar rendering works via raycasting. For each beam direction in the lidar's scan pattern, cast a ray from the sensor origin into the scene. The first intersection with a mesh surface gives the range measurement. Typical configurations:
| Parameter | Velodyne VLP-32C | Typical Sim Config |
|---|---|---|
| Beams | 32 | 32--64 |
| Horizontal FoV | 360 deg | 360 deg |
| Vertical FoV | -25 to +15 deg | -30 to +10 deg |
| Range | 200 m | 100 m |
| Points/frame | ~70,000 | 50,000--100,000 |
After raycasting, add realistic noise: Gaussian range noise (sigma ~ 0.02 m), random dropouts (1--5% of points), and intensity values derived from surface material properties.
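A minimal sketch of that noise model applied to a vector of ray-cast ranges (the sigma and dropout values match the suggestions above; intensity here is a crude distance-based stand-in):
import numpy as np
rng = np.random.default_rng(0)
ranges = np.array([12.3, 47.8, 95.1])                    # clean ray-cast ranges in meters
noisy = ranges + rng.normal(0.0, 0.02, ranges.shape)     # Gaussian range noise, sigma = 0.02 m
keep = rng.random(ranges.shape) > 0.02                   # ~2% random dropout
noisy = noisy[keep]
intensity = 1.0 - noisy / 100.0                          # placeholder for material-based intensity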
Auto-Labeling
This is where synthetic data shines. Because you control the scene graph, every label is derived analytically:
- 3D bounding boxes: Read directly from each object's pose (center x, y, z), dimensions (length, width, height), and heading. No estimation needed.
- 2D bounding boxes: Project the 8 corners of each 3D box through the camera intrinsics/extrinsics, take the axis-aligned bounding rectangle of the projected points.
- Semantic segmentation: During rendering, assign each pixel the class ID of the object it belongs to. If using a rasterizer, this comes from the material/object ID buffer.
- Instance segmentation: Same as semantic segmentation but with unique per-object IDs instead of per-class IDs.
- Depth maps: Store the Z-buffer value (distance from camera plane) for each pixel during rendering.
def project_3d_box_to_2d(box_corners_3d, T_cam_world, K):
"""Project 8 corners of a 3D box to get a 2D bounding box."""
# box_corners_3d: (8, 3) array of corner positions in world frame
corners_h = np.hstack([box_corners_3d, np.ones((8, 1))]) # (8, 4)
corners_cam = (T_cam_world @ corners_h.T).T # (8, 4) in camera frame
corners_cam = corners_cam[:, :3] # drop homogeneous
# Filter points behind camera
valid = corners_cam[:, 2] > 0
if not np.any(valid):
return None
corners_2d = (K @ corners_cam[valid].T).T # (N, 3)
corners_2d = corners_2d[:, :2] / corners_2d[:, 2:3] # perspective divide
x_min, y_min = corners_2d.min(axis=0)
x_max, y_max = corners_2d.max(axis=0)
return [x_min, y_min, x_max, y_max]
Domain Randomization
Domain randomization is a technique for training models on synthetic data so that they transfer well to the real world. The core insight, formalized by Tobin et al. (2017), is: if a model sees enough variation during training, the real world becomes "just another variation."
Key axes of randomization for driving scenes:
| Axis | Parameters | Range |
|---|---|---|
| Lighting | Sun elevation, azimuth, intensity | 10--80 deg elevation, full azimuth |
| Weather | Rain droplets, fog density, wet road reflections | Clear / light rain / heavy rain / fog |
| Time of day | Ambient light color temperature, shadow length | Dawn / noon / dusk / night |
| Textures | Road surface, building facades, vehicle paint | Texture pool with 10+ variants each |
| Actor density | Number of vehicles, pedestrians per scene | 3--30 vehicles, 0--20 pedestrians |
| Actor placement | Lane position offset, lateral jitter | +/- 0.3 m lateral, +/- 2 m longitudinal |
The randomization should be controlled by a configuration file so experiments are reproducible:
randomization_config = {
"lighting": {
"sun_elevation_range": [10, 80], # degrees
"sun_azimuth_range": [0, 360], # degrees
"intensity_range": [0.6, 1.0], # normalized
},
"weather": {
"options": ["clear", "light_rain", "heavy_rain", "fog"],
"weights": [0.5, 0.2, 0.15, 0.15],
},
"actors": {
"num_vehicles_range": [3, 30],
"num_pedestrians_range": [0, 20],
"lateral_jitter_m": 0.3,
},
"textures": {
"road_variants": 12,
"vehicle_color_pool": ["white", "black", "silver", "red", "blue", "grey"],
},
}
Data Formats (nuScenes)
The nuScenes dataset format uses a relational database stored as JSON files. Understanding its schema is essential for writing compliant exports.
nuScenes Schema (simplified):
==============================
scene 1 --- N sample (a scene contains N keyframes)
sample 1 --- N sample_data (each keyframe has data from each sensor)
sample_data N --- 1 calibrated_sensor (each data record links to a sensor config)
sample_data N --- 1 ego_pose (each data record has an ego pose)
sample 1 --- N sample_annotation (each keyframe has object annotations)
sample_annotation N --- 1 instance (annotations track objects over time)
instance N --- 1 category (each object has a class label)
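Once a dataset in this format is loaded with the nuscenes-devkit, those one-to-many links are traversed with token lookups. A short illustration (the dataroot is a placeholder, and it assumes the first sample has at least one annotation):
from nuscenes.nuscenes import NuScenes
nusc = NuScenes(version="v1.0-synth", dataroot="output/nuscenes_synth", verbose=False)
scene = nusc.scene[0]
sample = nusc.get("sample", scene["first_sample_token"])   # scene -> sample
ann = nusc.get("sample_annotation", sample["anns"][0])     # sample -> annotation
inst = nusc.get("instance", ann["instance_token"])         # annotation -> instance
cat = nusc.get("category", inst["category_token"])         # instance -> category
print(cat["name"])                                          # e.g. "vehicle.car"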
Core JSON tables you need to produce:
| Table | Key Fields | Purpose |
|---|---|---|
| scene.json | name, description, first/last sample token | Top-level scene metadata |
| sample.json | timestamp, scene token, prev/next links | Individual keyframes (2 Hz typical) |
| sample_data.json | sample token, ego_pose token, calibrated_sensor token, filename | Sensor data file reference |
| ego_pose.json | timestamp, translation, rotation (quaternion) | Vehicle pose at each sensor reading |
| calibrated_sensor.json | sensor token, translation, rotation, camera_intrinsic | Sensor mounting and calibration |
| sensor.json | channel name, modality (camera/lidar) | Sensor definition |
| sample_annotation.json | sample token, instance token, category, translation, size, rotation | 3D object annotations |
| instance.json | category token, number of annotations | Object identity across frames |
| category.json | name (e.g., "vehicle.car") | Class taxonomy |
| attribute.json | name (e.g., "vehicle.moving") | Annotation attributes |
A minimal nuScenes-compliant annotation record looks like:
{
"token": "a1b2c3d4e5f6...",
"sample_token": "f6e5d4c3b2a1...",
"instance_token": "1a2b3c4d5e6f...",
"attribute_tokens": ["moving_token_123"],
"visibility_token": "4",
"translation": [100.5, 200.3, 1.2],
"size": [4.5, 1.9, 1.6],
"rotation": [0.707, 0.0, 0.0, 0.707],
"prev": "",
"next": "next_annotation_token",
"num_lidar_pts": 342,
"num_radar_pts": 5,
"category_name": "vehicle.car"
}
Note that translation is in the global frame in meters, size follows the nuScenes [width, length, height] convention, and rotation is a [w, x, y, z] quaternion; category_name is a convenience field the devkit resolves through the instance and category tables.
Step-by-Step Implementation Guide
Step 1: Environment Setup (45 min)
Goal: Set up dependencies, project structure, and verify everything works.
1.1 Create the project
mkdir -p synthetic-data-pipeline/{synth_data,configs,assets,output,notebooks,tests}
cd synthetic-data-pipeline
python -m venv .venv
source .venv/bin/activate
1.2 Install dependencies
pip install numpy scipy open3d Pillow pyquaternion nuscenes-devkit matplotlib tqdm
| Package | Purpose |
|---|---|
| numpy, scipy | Linear algebra, spatial transforms |
| open3d | 3D mesh loading, rendering, point cloud ops |
| Pillow | Image I/O |
| pyquaternion | Quaternion math for rotations |
| nuscenes-devkit | Validate exported datasets |
| matplotlib | Visualization |
| tqdm | Progress bars for batch generation |
1.3 Project structure
synthetic-data-pipeline/
synth_data/
__init__.py
scene_graph.py # Scene graph and asset management
transforms.py # 3D transform utilities
camera.py # Camera model and rendering
lidar.py # Lidar simulation
auto_label.py # Label generation
nuscenes_export.py # nuScenes format writer
randomization.py # Domain randomization engine
pipeline.py # Orchestration
cli.py # Command-line interface
configs/
default.yaml # Default generation parameters
assets/
meshes/ # 3D mesh files (.obj, .ply)
textures/ # Surface textures
output/ # Generated datasets go here
notebooks/ # Jupyter notebooks for exercises
tests/ # Unit tests
1.4 Verify setup
# verify_setup.py
import numpy as np
import open3d as o3d
from pyquaternion import Quaternion
import PIL
from PIL import Image
print(f"NumPy: {np.__version__}")
print(f"Open3D: {o3d.__version__}")
print(f"Pillow: {Image.__version__}")
# Quick transform test
q = Quaternion(axis=[0, 0, 1], angle=np.pi / 4)
print(f"Quaternion: {q}")
print(f"Rotation mat:\n{q.rotation_matrix}")
# Quick mesh test
mesh = o3d.geometry.TriangleMesh.create_box(4.5, 1.9, 1.6)
print(f"Box mesh: {len(mesh.vertices)} vertices, {len(mesh.triangles)} triangles")
print("\nAll checks passed.")
Step 2: Scene Graph and Asset Management (2.5 hours)
Goal: Build the data structures that represent a 3D driving scene.
2.1 Transform3D utility class
# synth_data/transforms.py
import numpy as np
from pyquaternion import Quaternion
from typing import Optional
class Transform3D:
"""Rigid-body transform (rotation + translation) in SE(3)."""
def __init__(
self,
        translation: Optional[np.ndarray] = None,
        rotation: Optional[Quaternion] = None,
    ):
        # Avoid a shared mutable default: fall back to the origin per instance.
        self.translation = (np.zeros(3) if translation is None
                            else np.asarray(translation, dtype=np.float64))
self.rotation = rotation or Quaternion()
@property
def matrix(self) -> np.ndarray:
"""Return 4x4 homogeneous transform matrix."""
T = np.eye(4)
T[:3, :3] = self.rotation.rotation_matrix
T[:3, 3] = self.translation
return T
@property
def inverse(self) -> "Transform3D":
"""Return the inverse transform."""
R_inv = self.rotation.inverse
t_inv = -(R_inv.rotation_matrix @ self.translation)
return Transform3D(translation=t_inv, rotation=R_inv)
def __matmul__(self, other: "Transform3D") -> "Transform3D":
"""Compose two transforms: T_a @ T_b = T_ab."""
new_rotation = self.rotation * other.rotation
new_translation = (
self.rotation.rotation_matrix @ other.translation + self.translation
)
return Transform3D(translation=new_translation, rotation=new_rotation)
def apply(self, points: np.ndarray) -> np.ndarray:
"""Transform an (N, 3) array of points."""
return (self.rotation.rotation_matrix @ points.T).T + self.translation
def __repr__(self):
return f"Transform3D(t={self.translation}, q={self.rotation})"
2.2 Scene graph nodes
Design a SceneNode base class and specialized subclasses:
# synth_data/scene_graph.py
from dataclasses import dataclass, field
from typing import List, Dict, Optional
import numpy as np
import open3d as o3d
from .transforms import Transform3D
@dataclass
class SceneNode:
"""Base node in the scene graph."""
name: str
transform: Transform3D = field(default_factory=Transform3D)
children: List["SceneNode"] = field(default_factory=list)
def world_transform(self, parent_transform: Optional[Transform3D] = None) -> Transform3D:
"""Compute world-frame transform by chaining parent transforms."""
if parent_transform is None:
return self.transform
return parent_transform @ self.transform
@dataclass
class MeshObject(SceneNode):
"""A node with associated 3D geometry."""
mesh: Optional[o3d.geometry.TriangleMesh] = None
class_name: str = "unknown"
instance_id: int = 0
dimensions: np.ndarray = field(default_factory=lambda: np.array([1.0, 1.0, 1.0]))
@dataclass
class DynamicActor(MeshObject):
"""An object with a trajectory over time."""
trajectory: List[Transform3D] = field(default_factory=list)
def pose_at_frame(self, frame_idx: int) -> Transform3D:
"""Get pose at a specific frame, clamping to available range."""
idx = min(frame_idx, len(self.trajectory) - 1)
return self.trajectory[idx]
@dataclass
class SceneGraph:
"""Top-level container for a driving scene."""
name: str
static_objects: List[MeshObject] = field(default_factory=list)
dynamic_actors: List[DynamicActor] = field(default_factory=list)
ego_trajectory: List[Transform3D] = field(default_factory=list)
num_frames: int = 20
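A minimal usage sketch tying these classes together (poses, dimensions, and speeds are arbitrary example values):
from pyquaternion import Quaternion
car = DynamicActor(
    name="Vehicle_0",
    class_name="vehicle.car",
    instance_id=1,
    dimensions=np.array([4.5, 1.9, 1.6]),
    trajectory=[
        Transform3D(translation=np.array([20.0 + 8.0 * 0.5 * i, 3.5, 0.0]),
                    rotation=Quaternion(axis=[0, 0, 1], angle=0.0))
        for i in range(20)                  # constant 8 m/s along +x, 0.5 s per frame
    ],
)
scene = SceneGraph(name="demo_scene", dynamic_actors=[car], num_frames=20)
print(scene.dynamic_actors[0].pose_at_frame(5))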
2.3 Asset loading and management
Write helpers to load OBJ/PLY meshes and create simple procedural assets (box cars, cylinder trees) for testing when you do not have a full asset library:
def create_box_vehicle(length=4.5, width=1.9, height=1.6, color=[0.3, 0.3, 0.8]):
"""Create a simple box mesh representing a vehicle."""
mesh = o3d.geometry.TriangleMesh.create_box(length, width, height)
mesh.translate([-length / 2, -width / 2, 0]) # center at base
mesh.paint_uniform_color(color)
mesh.compute_vertex_normals()
return mesh
def create_cylinder_tree(radius=0.3, trunk_height=3.0, canopy_radius=1.5, canopy_height=2.0):
"""Create a simple tree from cylinder trunk + sphere canopy."""
trunk = o3d.geometry.TriangleMesh.create_cylinder(radius, trunk_height)
trunk.paint_uniform_color([0.4, 0.25, 0.1])
canopy = o3d.geometry.TriangleMesh.create_sphere(canopy_radius)
canopy.translate([0, 0, trunk_height + canopy_radius * 0.5])
canopy.paint_uniform_color([0.1, 0.6, 0.1])
return trunk + canopy
2.4 Randomized scene generation
Build a function that generates a random scene given constraints:
def generate_random_scene(config: dict, rng: np.random.Generator) -> SceneGraph:
"""Generate a randomized driving scene."""
scene = SceneGraph(name=f"scene_{rng.integers(0, 100000):05d}")
scene.num_frames = config.get("num_frames", 20)
# Generate ego trajectory (straight road with slight curvature)
ego_speed = rng.uniform(5.0, 15.0) # m/s
dt = 0.5 # seconds between frames
for i in range(scene.num_frames):
x = ego_speed * dt * i
y = 0.5 * np.sin(0.05 * x) # gentle curve
yaw = np.arctan2(np.cos(0.05 * x) * 0.5 * 0.05, 1.0)
pose = Transform3D(
translation=np.array([x, y, 0.0]),
rotation=Quaternion(axis=[0, 0, 1], angle=yaw),
)
scene.ego_trajectory.append(pose)
# Place random vehicles
num_vehicles = rng.integers(
config["actors"]["num_vehicles_range"][0],
config["actors"]["num_vehicles_range"][1],
)
for v in range(num_vehicles):
# ... (place in nearby lanes with trajectories)
pass # Implementation details in notebook
return scene
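One way to fill in the vehicle-placement placeholder above is a helper that drops each vehicle into a lane roughly parallel to the ego path with a constant-velocity trajectory. A sketch, assuming a 3.5 m lane width and the box-vehicle asset from 2.3 (all ranges are illustrative):
def place_random_vehicle(scene: SceneGraph, rng: np.random.Generator,
                         lateral_jitter_m: float = 0.3) -> DynamicActor:
    """Place one vehicle in a lane roughly parallel to the ego path (sketch)."""
    lane_offset = rng.choice([-3.5, 0.0, 3.5])            # assumed 3.5 m lane width
    lateral = lane_offset + rng.uniform(-lateral_jitter_m, lateral_jitter_m)
    start_x = rng.uniform(5.0, 80.0)                      # longitudinal spawn position
    speed = rng.uniform(0.0, 15.0)                        # m/s; 0 = parked
    dt = 0.5                                              # matches the ego trajectory step
    trajectory = [
        Transform3D(translation=np.array([start_x + speed * dt * i, lateral, 0.0]),
                    rotation=Quaternion(axis=[0, 0, 1], angle=0.0))
        for i in range(scene.num_frames)
    ]
    actor = DynamicActor(
        name=f"Vehicle_{len(scene.dynamic_actors)}",
        mesh=create_box_vehicle(),
        class_name="vehicle.car",
        instance_id=len(scene.dynamic_actors) + 1,
        dimensions=np.array([4.5, 1.9, 1.6]),
        trajectory=trajectory,
    )
    scene.dynamic_actors.append(actor)
    return actor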
Step 3: Camera Rendering Pipeline (3 hours)
Goal: Render RGB images from the 3D scene using a pinhole camera model.
3.1 Camera model
# synth_data/camera.py
import numpy as np
from dataclasses import dataclass
from .transforms import Transform3D
@dataclass
class CameraIntrinsics:
"""Pinhole camera intrinsic parameters."""
fx: float # focal length x (pixels)
fy: float # focal length y (pixels)
cx: float # principal point x (pixels)
cy: float # principal point y (pixels)
width: int # image width
height: int # image height
@property
def matrix(self) -> np.ndarray:
return np.array([
[self.fx, 0.0, self.cx],
[0.0, self.fy, self.cy],
[0.0, 0.0, 1.0],
])
@dataclass
class Camera:
"""A camera with intrinsics and extrinsic pose."""
name: str
intrinsics: CameraIntrinsics
    extrinsics: Transform3D  # Camera pose in the ego frame (camera-to-ego transform)
def project_points(self, points_world: np.ndarray, T_world_ego: Transform3D) -> np.ndarray:
"""Project (N, 3) world points to (N, 2) pixel coordinates.
Returns (N, 3) array where columns are [u, v, depth].
Points behind camera or outside image get depth = -1.
"""
# World -> ego -> camera
T_cam_world = self.extrinsics.inverse @ T_world_ego.inverse
points_cam = T_cam_world.apply(points_world)
# Filter behind camera
mask = points_cam[:, 2] > 0.1
result = np.full((len(points_world), 3), -1.0)
if not np.any(mask):
return result
pts = points_cam[mask]
K = self.intrinsics.matrix
proj = (K @ pts.T).T
uv = proj[:, :2] / proj[:, 2:3]
# Bounds check
in_bounds = (
(uv[:, 0] >= 0) & (uv[:, 0] < self.intrinsics.width) &
(uv[:, 1] >= 0) & (uv[:, 1] < self.intrinsics.height)
)
valid = np.where(mask)[0][in_bounds]
result[valid, :2] = uv[in_bounds]
result[valid, 2] = pts[in_bounds, 2]
return result
3.2 Multi-camera setup
Define a standard six-camera rig (similar to nuScenes):
from typing import List
from pyquaternion import Quaternion

def create_nuscenes_camera_rig() -> List[Camera]:
    """Create a six-camera rig matching the nuScenes sensor layout."""
    intrinsics = CameraIntrinsics(fx=1266.4, fy=1266.4, cx=816.3, cy=491.5,
                                  width=1600, height=900)
    # Fixed rotation from the optical frame (x right, y down, z forward) to the
    # ego-style frame (x forward, y left, z up). Composing it with each camera's
    # yaw ensures that z in the camera frame points along the viewing direction,
    # which is what project_points assumes when it divides by depth.
    R_optical_to_ego = Quaternion(matrix=np.array([
        [0.0, 0.0, 1.0],
        [-1.0, 0.0, 0.0],
        [0.0, -1.0, 0.0],
    ]))
    cameras = []
    configs = [
        ("CAM_FRONT", [1.7, 0.0, 1.5], 0.0),
        ("CAM_FRONT_LEFT", [1.5, 0.5, 1.5], 55.0),
        ("CAM_FRONT_RIGHT", [1.5, -0.5, 1.5], -55.0),
        ("CAM_BACK", [-0.3, 0.0, 1.5], 180.0),
        ("CAM_BACK_LEFT", [-0.3, 0.5, 1.5], 110.0),
        ("CAM_BACK_RIGHT", [-0.3, -0.5, 1.5], -110.0),
    ]
    for name, translation, yaw_deg in configs:
        yaw = np.radians(yaw_deg)
        rotation = Quaternion(axis=[0, 0, 1], angle=yaw) * R_optical_to_ego
        extrinsics = Transform3D(
            translation=np.array(translation),
            rotation=rotation,
        )
        cameras.append(Camera(name=name, intrinsics=intrinsics, extrinsics=extrinsics))
    return cameras
3.3 Rendering with Open3D
Use Open3D's offscreen renderer to produce RGB images:
import open3d as o3d
def render_camera_image(scene_meshes, camera, ego_pose, width=1600, height=900):
"""Render an RGB image from the given camera viewpoint."""
renderer = o3d.visualization.rendering.OffscreenRenderer(width, height)
renderer.scene.set_background([0.6, 0.75, 0.9, 1.0]) # sky blue
# Add all scene meshes
for i, mesh in enumerate(scene_meshes):
mat = o3d.visualization.rendering.MaterialRecord()
mat.shader = "defaultLit"
renderer.scene.add_geometry(f"obj_{i}", mesh, mat)
# Set camera from intrinsics/extrinsics
T_cam_world = camera.extrinsics.inverse @ ego_pose.inverse
# ... configure renderer camera from T_cam_world and intrinsics
img = renderer.render_to_image()
return np.asarray(img)
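One way to fill in the camera-configuration placeholder is Open3D's setup_camera, which can take a 3x3 intrinsic matrix and a 4x4 world-to-camera extrinsic; the exact overloads vary across Open3D versions, so treat this as a sketch:
def configure_renderer_camera(renderer, camera, ego_pose):
    """Point the offscreen renderer at the scene using our intrinsics/extrinsics (sketch)."""
    # World -> camera (optical frame) as a 4x4 matrix.
    T_cam_world = (camera.extrinsics.inverse @ ego_pose.inverse).matrix
    renderer.setup_camera(
        camera.intrinsics.matrix,        # 3x3 K
        T_cam_world,                     # 4x4 extrinsic, world -> camera
        camera.intrinsics.width,
        camera.intrinsics.height,
    )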
Step 4: Lidar Simulation (2.5 hours)
Goal: Generate synthetic lidar point clouds by raycasting into the scene.
4.1 Lidar configuration
# synth_data/lidar.py
from dataclasses import dataclass
from typing import Optional
import numpy as np
@dataclass
class LidarConfig:
"""Configuration for a spinning lidar sensor."""
num_beams: int = 32
horizontal_fov: float = 360.0 # degrees
vertical_fov_up: float = 10.0 # degrees above horizon
vertical_fov_down: float = 30.0 # degrees below horizon
horizontal_resolution: float = 0.2 # degrees between horizontal samples
max_range: float = 100.0 # meters
min_range: float = 0.5 # meters
range_noise_std: float = 0.02 # meters, Gaussian noise
dropout_rate: float = 0.02 # fraction of points to drop
    mount_position: Optional[np.ndarray] = None  # [x, y, z] in ego frame
def __post_init__(self):
if self.mount_position is None:
self.mount_position = np.array([0.0, 0.0, 1.8]) # roof-mounted
4.2 Ray generation
def generate_ray_directions(config: LidarConfig) -> np.ndarray:
"""Generate unit ray direction vectors for all lidar beams.
Returns: (N, 3) array of ray directions in sensor frame.
"""
v_angles = np.linspace(
np.radians(-config.vertical_fov_down),
np.radians(config.vertical_fov_up),
config.num_beams,
)
num_h = int(config.horizontal_fov / config.horizontal_resolution)
h_angles = np.linspace(0, np.radians(config.horizontal_fov), num_h, endpoint=False)
directions = []
for v in v_angles:
for h in h_angles:
dx = np.cos(v) * np.cos(h)
dy = np.cos(v) * np.sin(h)
dz = np.sin(v)
directions.append([dx, dy, dz])
return np.array(directions)
4.3 Raycasting with Open3D
def simulate_lidar(scene_mesh, config: LidarConfig, ego_pose: Transform3D) -> np.ndarray:
"""Cast rays into scene and return point cloud in ego frame.
Returns: (N, 4) array of [x, y, z, intensity].
"""
# Create Open3D raycasting scene
ray_scene = o3d.t.geometry.RaycastingScene()
mesh_t = o3d.t.geometry.TriangleMesh.from_legacy(scene_mesh)
ray_scene.add_triangles(mesh_t)
# Generate rays in world frame
directions = generate_ray_directions(config)
sensor_pos_world = ego_pose.apply(
config.mount_position.reshape(1, 3)
).flatten()
origins = np.tile(sensor_pos_world, (len(directions), 1))
dirs_world = ego_pose.rotation.rotation_matrix @ directions.T
dirs_world = dirs_world.T
rays = np.hstack([origins, dirs_world]).astype(np.float32)
rays_tensor = o3d.core.Tensor(rays)
# Cast rays
result = ray_scene.cast_rays(rays_tensor)
t_hit = result['t_hit'].numpy()
# Filter valid hits
valid = (t_hit > config.min_range) & (t_hit < config.max_range)
# Add noise
rng = np.random.default_rng()
noise = rng.normal(0, config.range_noise_std, t_hit.shape)
t_hit_noisy = t_hit + noise
# Apply dropout
dropout_mask = rng.random(len(t_hit)) > config.dropout_rate
valid = valid & dropout_mask
# Compute hit points in world frame
hit_points = origins[valid] + dirs_world[valid] * t_hit_noisy[valid, np.newaxis]
# Transform to ego frame
hit_ego = ego_pose.inverse.apply(hit_points)
# Intensity (simplified: based on distance)
intensity = 1.0 - (t_hit_noisy[valid] / config.max_range)
return np.column_stack([hit_ego, intensity])
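nuScenes stores each lidar sweep as a flat binary .pcd.bin file containing five float32 values per point (x, y, z, intensity, ring index), so the exporter needs a small writer for the clouds produced above; a sketch:
def save_nuscenes_pointcloud(points: np.ndarray, filepath: str) -> None:
    """Write an (N, 4) [x, y, z, intensity] cloud as a nuScenes-style .pcd.bin file.

    The ring index column is filled with zeros because this simulator does not
    track which beam produced each point.
    """
    ring = np.zeros((points.shape[0], 1), dtype=np.float32)
    data = np.hstack([points.astype(np.float32), ring])
    data.tofile(filepath)  # raw float32 records, readable by LidarPointCloud.from_file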
Step 5: Auto-Labeling System (3 hours)
Goal: Generate ground-truth annotations automatically from the scene graph.
5.1 3D bounding boxes
# synth_data/auto_label.py
from dataclasses import dataclass
from typing import List, Optional
import numpy as np
@dataclass
class BoundingBox3D:
"""3D bounding box annotation."""
center: np.ndarray # (3,) center in world frame
dimensions: np.ndarray # (3,) [length, width, height]
rotation: "Quaternion" # orientation
class_name: str
instance_id: int
num_lidar_pts: int = 0
def extract_3d_boxes(scene: "SceneGraph", frame_idx: int) -> List[BoundingBox3D]:
"""Extract 3D bounding boxes for all actors at a given frame."""
boxes = []
for actor in scene.dynamic_actors:
pose = actor.pose_at_frame(frame_idx)
box = BoundingBox3D(
center=pose.translation + np.array([0, 0, actor.dimensions[2] / 2]),
dimensions=actor.dimensions,
rotation=pose.rotation,
class_name=actor.class_name,
instance_id=actor.instance_id,
)
boxes.append(box)
for obj in scene.static_objects:
if obj.class_name in ("building", "ground"):
continue # skip non-annotatable static objects
box = BoundingBox3D(
center=obj.transform.translation + np.array([0, 0, obj.dimensions[2] / 2]),
dimensions=obj.dimensions,
rotation=obj.transform.rotation,
class_name=obj.class_name,
instance_id=obj.instance_id,
)
boxes.append(box)
return boxes
5.2 2D box projection and visibility
def project_boxes_to_2d(
boxes_3d: List[BoundingBox3D],
camera: "Camera",
ego_pose: "Transform3D",
) -> List[Optional[List[float]]]:
"""Project 3D boxes to 2D axis-aligned bounding boxes in camera image.
Returns list of [x_min, y_min, x_max, y_max] or None if not visible.
"""
results = []
for box in boxes_3d:
corners_3d = get_box_corners(box.center, box.dimensions, box.rotation)
projected = camera.project_points(corners_3d, ego_pose)
        visible = projected[:, 2] > 0  # depth > 0 means in front of the camera and inside the image bounds
if not np.any(visible):
results.append(None)
continue
uv = projected[visible, :2]
x_min, y_min = np.clip(uv.min(axis=0), 0,
[camera.intrinsics.width - 1, camera.intrinsics.height - 1])
x_max, y_max = np.clip(uv.max(axis=0), 0,
[camera.intrinsics.width - 1, camera.intrinsics.height - 1])
# Minimum box size filter
if (x_max - x_min) < 5 or (y_max - y_min) < 5:
results.append(None)
continue
results.append([float(x_min), float(y_min), float(x_max), float(y_max)])
return results
def get_box_corners(center, dimensions, rotation):
"""Compute 8 corners of a 3D bounding box."""
l, w, h = dimensions / 2
corners_local = np.array([
[ l, w, -h], [ l, -w, -h], [-l, -w, -h], [-l, w, -h], # bottom
[ l, w, h], [ l, -w, h], [-l, -w, h], [-l, w, h], # top
])
corners_world = (rotation.rotation_matrix @ corners_local.T).T + center
return corners_world
5.3 Semantic and instance segmentation
During rendering, maintain a parallel "label buffer" alongside the RGB buffer. Each pixel stores the class ID and instance ID of the rendered object:
def generate_segmentation_maps(
scene: "SceneGraph",
camera: "Camera",
ego_pose: "Transform3D",
frame_idx: int,
) -> tuple:
"""Render semantic and instance segmentation maps.
Returns:
semantic_map: (H, W) uint8 array of class IDs
instance_map: (H, W) int32 array of instance IDs
"""
H, W = camera.intrinsics.height, camera.intrinsics.width
semantic_map = np.zeros((H, W), dtype=np.uint8)
instance_map = np.zeros((H, W), dtype=np.int32)
depth_map = np.full((H, W), np.inf, dtype=np.float32)
CLASS_TO_ID = {
"vehicle.car": 1, "vehicle.truck": 2, "vehicle.bus": 3,
"human.pedestrian": 4, "vehicle.bicycle": 5,
"static.traffic_sign": 6, "static.tree": 7,
}
# For each object, project its mesh triangles and fill the label buffer
# (simplified -- production code would use GPU rasterization)
for actor in scene.dynamic_actors:
pose = actor.pose_at_frame(frame_idx)
class_id = CLASS_TO_ID.get(actor.class_name, 0)
# ... rasterize mesh, fill semantic_map and instance_map
# where depth < existing depth_map value
return semantic_map, instance_map
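One practical way to fill in the rasterization placeholder without a GPU pipeline is to reuse Open3D's raycasting: add each labeled object's mesh as its own geometry, cast one ray per pixel (the ray construction mirrors the depth-map code in 5.4), and map the returned geometry IDs back to class and instance IDs. A sketch:
def label_maps_from_raycasting(meshes, class_ids, instance_ids, rays, H, W):
    """Build semantic/instance maps by casting one ray per pixel (sketch).

    meshes:       list of legacy TriangleMesh, one per labeled object
    class_ids:    class ID for each mesh (same order)
    instance_ids: instance ID for each mesh
    rays:         (H*W, 6) float32 array of [origin, direction] rows, one per pixel
    """
    ray_scene = o3d.t.geometry.RaycastingScene()
    geom_to_idx = {}
    for i, mesh in enumerate(meshes):
        geom_id = ray_scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))
        geom_to_idx[geom_id] = i
    result = ray_scene.cast_rays(o3d.core.Tensor(rays))
    geom_ids = result["geometry_ids"].numpy().reshape(H, W)
    semantic_map = np.zeros((H, W), dtype=np.uint8)
    instance_map = np.zeros((H, W), dtype=np.int32)
    hit = geom_ids != o3d.t.geometry.RaycastingScene.INVALID_ID
    for geom_id, idx in geom_to_idx.items():
        mask = hit & (geom_ids == geom_id)
        semantic_map[mask] = class_ids[idx]
        instance_map[mask] = instance_ids[idx]
    return semantic_map, instance_map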
5.4 Depth map generation
def generate_depth_map(
scene_mesh,
camera: "Camera",
ego_pose: "Transform3D",
) -> np.ndarray:
"""Generate a dense depth map via raycasting from the camera.
Returns: (H, W) float32 array of depths in meters. Inf where no hit.
"""
H, W = camera.intrinsics.height, camera.intrinsics.width
K_inv = np.linalg.inv(camera.intrinsics.matrix)
# Generate pixel ray directions
u, v = np.meshgrid(np.arange(W), np.arange(H))
pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
ray_dirs_cam = (K_inv @ pixels.T).T # (H*W, 3) in camera frame
# Transform to world frame
T_world_cam = ego_pose @ camera.extrinsics
ray_dirs_world = T_world_cam.rotation.rotation_matrix @ ray_dirs_cam.T
ray_dirs_world = ray_dirs_world.T
ray_dirs_world /= np.linalg.norm(ray_dirs_world, axis=1, keepdims=True)
cam_pos_world = T_world_cam.translation
origins = np.tile(cam_pos_world, (len(ray_dirs_world), 1))
# Cast rays using Open3D
ray_scene = o3d.t.geometry.RaycastingScene()
mesh_t = o3d.t.geometry.TriangleMesh.from_legacy(scene_mesh)
ray_scene.add_triangles(mesh_t)
rays = np.hstack([origins, ray_dirs_world]).astype(np.float32)
result = ray_scene.cast_rays(o3d.core.Tensor(rays))
t_hit = result['t_hit'].numpy().reshape(H, W)
return t_hit
Step 6: nuScenes Format Export (3 hours)
Goal: Write a complete nuScenes-compliant dataset to disk.
6.1 Token generation
nuScenes uses 32-character hex tokens as primary keys:
import hashlib
import uuid
def generate_token() -> str:
"""Generate a unique 32-char hex token."""
return uuid.uuid4().hex
def deterministic_token(seed_string: str) -> str:
"""Generate a deterministic token from a string (for reproducibility)."""
return hashlib.md5(seed_string.encode()).hexdigest()
6.2 NuScenesExporter class
# synth_data/nuscenes_export.py
import json
import os
from pathlib import Path
from typing import List, Dict
import numpy as np
class NuScenesExporter:
"""Exports synthetic data to nuScenes format."""
def __init__(self, output_dir: str, version: str = "v1.0-synth"):
self.output_dir = Path(output_dir)
self.version = version
self.tables = {
"scene": [],
"sample": [],
"sample_data": [],
"sample_annotation": [],
"instance": [],
"category": [],
"attribute": [],
"sensor": [],
"calibrated_sensor": [],
"ego_pose": [],
"log": [],
"map": [],
"visibility": [],
}
self._setup_directories()
self._init_static_tables()
def _setup_directories(self):
"""Create nuScenes directory structure."""
dirs = [
self.output_dir / self.version,
self.output_dir / "samples" / "CAM_FRONT",
self.output_dir / "samples" / "CAM_FRONT_LEFT",
self.output_dir / "samples" / "CAM_FRONT_RIGHT",
self.output_dir / "samples" / "CAM_BACK",
self.output_dir / "samples" / "CAM_BACK_LEFT",
self.output_dir / "samples" / "CAM_BACK_RIGHT",
self.output_dir / "samples" / "LIDAR_TOP",
self.output_dir / "sweeps",
]
for d in dirs:
d.mkdir(parents=True, exist_ok=True)
def _init_static_tables(self):
"""Initialize category, attribute, visibility, sensor tables."""
# Categories
categories = [
"vehicle.car", "vehicle.truck", "vehicle.bus",
"human.pedestrian.adult", "human.pedestrian.child",
"vehicle.bicycle", "vehicle.motorcycle",
"movable_object.barrier", "movable_object.trafficcone",
]
for cat in categories:
self.tables["category"].append({
"token": deterministic_token(cat),
"name": cat,
"description": "",
})
# Visibility levels
for level, desc in [(1, "0-40%"), (2, "40-60%"), (3, "60-80%"), (4, "80-100%")]:
self.tables["visibility"].append({
"token": str(level),
"level": desc,
"description": f"Visibility {desc}",
})
# Sensors
sensor_configs = [
("CAM_FRONT", "camera"), ("CAM_FRONT_LEFT", "camera"),
("CAM_FRONT_RIGHT", "camera"), ("CAM_BACK", "camera"),
("CAM_BACK_LEFT", "camera"), ("CAM_BACK_RIGHT", "camera"),
("LIDAR_TOP", "lidar"),
]
for channel, modality in sensor_configs:
self.tables["sensor"].append({
"token": deterministic_token(channel),
"channel": channel,
"modality": modality,
})
def add_scene(self, scene: "SceneGraph", cameras, lidar_config, rendered_data):
"""Export a complete scene with all frames and annotations."""
scene_token = generate_token()
sample_tokens = []
for frame_idx in range(scene.num_frames):
sample_token = generate_token()
timestamp = int(frame_idx * 0.5 * 1e6) # microseconds
# Create ego_pose record
ego = scene.ego_trajectory[frame_idx]
ego_pose_token = generate_token()
self.tables["ego_pose"].append({
"token": ego_pose_token,
"timestamp": timestamp,
"translation": ego.translation.tolist(),
"rotation": [ego.rotation.w, ego.rotation.x,
ego.rotation.y, ego.rotation.z],
})
# Create sample_data for each sensor
# ... (camera images, lidar point clouds)
# Create sample_annotation for each visible object
# ... (3D boxes in ego frame)
sample_tokens.append(sample_token)
# Link samples with prev/next
        for i, token in enumerate(sample_tokens):
            pass  # ... set prev/next pointers on each sample record
# Write scene record
self.tables["scene"].append({
"token": scene_token,
"name": scene.name,
"description": "Synthetically generated scene",
"log_token": generate_token(),
"nbr_samples": scene.num_frames,
"first_sample_token": sample_tokens[0],
"last_sample_token": sample_tokens[-1],
})
def save(self):
"""Write all JSON table files to disk."""
version_dir = self.output_dir / self.version
for table_name, records in self.tables.items():
filepath = version_dir / f"{table_name}.json"
with open(filepath, "w") as f:
json.dump(records, f, indent=2)
print(f"Saved {sum(len(v) for v in self.tables.values())} "
f"records across {len(self.tables)} tables")
6.3 Validation with nuscenes-devkit
After export, verify the dataset loads correctly:
from nuscenes.nuscenes import NuScenes
def validate_export(output_dir: str, version: str = "v1.0-synth"):
"""Load the exported dataset with nuscenes-devkit and run checks."""
nusc = NuScenes(version=version, dataroot=output_dir, verbose=True)
print(f"Scenes: {len(nusc.scene)}")
print(f"Samples: {len(nusc.sample)}")
print(f"Annotations: {len(nusc.sample_annotation)}")
# Verify we can traverse the data
scene = nusc.scene[0]
sample_token = scene["first_sample_token"]
while sample_token:
sample = nusc.get("sample", sample_token)
# Check each sensor has data
for channel in ["CAM_FRONT", "LIDAR_TOP"]:
assert channel in sample["data"], f"Missing {channel} data"
sample_token = sample.get("next", "")
if not sample_token:
break
print("Validation passed.")
Step 7: Domain Randomization Engine (2 hours)
Goal: Make generated scenes diverse enough to improve model generalization.
7.1 Randomization manager
# synth_data/randomization.py
import numpy as np
from dataclasses import dataclass, field
from typing import Dict, Any, List
@dataclass
class RandomizationConfig:
"""Full configuration for domain randomization."""
seed: int = 42
lighting: Dict[str, Any] = field(default_factory=lambda: {
"sun_elevation_range": [10, 80],
"sun_azimuth_range": [0, 360],
"intensity_range": [0.6, 1.0],
})
weather: Dict[str, Any] = field(default_factory=lambda: {
"options": ["clear", "light_rain", "heavy_rain", "fog"],
"weights": [0.5, 0.2, 0.15, 0.15],
})
actors: Dict[str, Any] = field(default_factory=lambda: {
"num_vehicles_range": [3, 30],
"num_pedestrians_range": [0, 20],
"lateral_jitter_m": 0.3,
})
textures: Dict[str, Any] = field(default_factory=lambda: {
"vehicle_colors": [
[0.9, 0.9, 0.9], # white
[0.1, 0.1, 0.1], # black
[0.7, 0.7, 0.7], # silver
[0.8, 0.1, 0.1], # red
[0.1, 0.2, 0.7], # blue
],
})
class DomainRandomizer:
"""Applies domain randomization to scene generation."""
def __init__(self, config: RandomizationConfig):
self.config = config
self.rng = np.random.default_rng(config.seed)
def sample_lighting(self) -> Dict[str, float]:
cfg = self.config.lighting
return {
"sun_elevation": self.rng.uniform(*cfg["sun_elevation_range"]),
"sun_azimuth": self.rng.uniform(*cfg["sun_azimuth_range"]),
"intensity": self.rng.uniform(*cfg["intensity_range"]),
}
def sample_weather(self) -> str:
cfg = self.config.weather
return self.rng.choice(cfg["options"], p=cfg["weights"])
def sample_vehicle_color(self) -> List[float]:
colors = self.config.textures["vehicle_colors"]
idx = self.rng.integers(0, len(colors))
# Add slight per-channel jitter
color = np.array(colors[idx]) + self.rng.normal(0, 0.03, 3)
return np.clip(color, 0, 1).tolist()
def apply_weather_effects(self, image: np.ndarray, weather: str) -> np.ndarray:
"""Apply post-processing weather effects to a rendered image."""
if weather == "clear":
return image
elif weather == "fog":
fog_density = self.rng.uniform(0.3, 0.7)
fog_color = np.array([200, 200, 210], dtype=np.float32)
blended = image.astype(np.float32) * (1 - fog_density) + fog_color * fog_density
return np.clip(blended, 0, 255).astype(np.uint8)
elif weather in ("light_rain", "heavy_rain"):
# Add rain streaks and darken image
factor = 0.85 if weather == "light_rain" else 0.65
darkened = (image.astype(np.float32) * factor).astype(np.uint8)
# Add rain streaks (simplified)
num_streaks = 200 if weather == "light_rain" else 800
for _ in range(num_streaks):
x = self.rng.integers(0, image.shape[1])
y = self.rng.integers(0, image.shape[0] - 20)
length = self.rng.integers(10, 30)
darkened[y:y+length, x] = np.minimum(
darkened[y:y+length, x].astype(int) + 40, 255
).astype(np.uint8)
return darkened
return image
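Wiring the randomizer into generation is then a matter of sampling once per scene and threading the values through rendering; a short usage sketch:
randomizer = DomainRandomizer(RandomizationConfig(seed=7))
lighting = randomizer.sample_lighting()      # e.g. {"sun_elevation": ..., "sun_azimuth": ..., "intensity": ...}
weather = randomizer.sample_weather()        # e.g. "light_rain"
color = randomizer.sample_vehicle_color()    # jittered RGB triple for one vehicle
# After rendering each camera frame:
# image = randomizer.apply_weather_effects(image, weather)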
7.2 Configuration files
Store randomization parameters in YAML for reproducibility:
# configs/default.yaml
pipeline:
num_scenes: 50
frames_per_scene: 20
frame_rate: 2.0 # Hz
randomization:
seed: 42
lighting:
sun_elevation_range: [10, 80]
sun_azimuth_range: [0, 360]
intensity_range: [0.6, 1.0]
weather:
options: [clear, light_rain, heavy_rain, fog]
weights: [0.5, 0.2, 0.15, 0.15]
actors:
num_vehicles_range: [3, 30]
num_pedestrians_range: [0, 20]
lateral_jitter_m: 0.3
textures:
vehicle_color_pool: [white, black, silver, red, blue, grey, green]
road_surface_variants: 8
sensors:
cameras:
width: 1600
height: 900
fx: 1266.4
fy: 1266.4
lidar:
num_beams: 32
max_range: 100.0
horizontal_resolution: 0.2
    range_noise_std: 0.02   # key must match LidarConfig.range_noise_std
dropout_rate: 0.02
Step 8: Pipeline Orchestration and Validation (2 hours)
Goal: Wire everything together into a batch generation tool with quality checks.
8.1 Pipeline orchestrator
# synth_data/pipeline.py
import yaml
import time
from pathlib import Path
from tqdm import tqdm
class SyntheticDataPipeline:
"""End-to-end synthetic data generation pipeline."""
def __init__(self, config_path: str, output_dir: str):
with open(config_path) as f:
self.config = yaml.safe_load(f)
self.output_dir = Path(output_dir)
self.randomizer = DomainRandomizer(
RandomizationConfig(**self.config.get("randomization", {}))
)
self.cameras = create_nuscenes_camera_rig()
self.lidar_config = LidarConfig(**self.config.get("sensors", {}).get("lidar", {}))
self.exporter = NuScenesExporter(str(self.output_dir))
self.stats = GenerationStats()
def generate(self):
"""Run the full generation pipeline."""
num_scenes = self.config["pipeline"]["num_scenes"]
print(f"Generating {num_scenes} scenes...")
for scene_idx in tqdm(range(num_scenes)):
# 1. Compose scene
scene = generate_random_scene(self.config["randomization"], self.randomizer.rng)
weather = self.randomizer.sample_weather()
# 2. Render all frames
rendered = self._render_scene(scene, weather)
# 3. Generate labels
labels = self._label_scene(scene)
# 4. Export to nuScenes format
self.exporter.add_scene(scene, self.cameras, self.lidar_config, rendered)
# 5. Update stats
self.stats.update(scene, labels)
# Finalize
self.exporter.save()
self._run_validation()
self._write_report()
def _render_scene(self, scene, weather):
"""Render all sensors for all frames in a scene."""
rendered_data = {"cameras": {}, "lidar": []}
for frame_idx in range(scene.num_frames):
ego_pose = scene.ego_trajectory[frame_idx]
# Render cameras
for cam in self.cameras:
img = render_camera_image(
self._build_scene_meshes(scene, frame_idx),
cam, ego_pose,
)
img = self.randomizer.apply_weather_effects(img, weather)
rendered_data["cameras"].setdefault(cam.name, []).append(img)
# Render lidar
pc = simulate_lidar(
self._build_combined_mesh(scene, frame_idx),
self.lidar_config, ego_pose,
)
rendered_data["lidar"].append(pc)
return rendered_data
def _label_scene(self, scene):
"""Generate all labels for a scene."""
all_labels = []
for frame_idx in range(scene.num_frames):
boxes_3d = extract_3d_boxes(scene, frame_idx)
boxes_2d = {}
for cam in self.cameras:
boxes_2d[cam.name] = project_boxes_to_2d(
boxes_3d, cam, scene.ego_trajectory[frame_idx]
)
all_labels.append({"boxes_3d": boxes_3d, "boxes_2d": boxes_2d})
return all_labels
8.2 Quality validation checks
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class GenerationStats:
"""Track statistics across the generation run."""
total_scenes: int = 0
total_frames: int = 0
total_annotations: int = 0
class_counts: Dict[str, int] = field(default_factory=dict)
weather_counts: Dict[str, int] = field(default_factory=dict)
def update(self, scene, labels):
self.total_scenes += 1
self.total_frames += scene.num_frames
for frame_labels in labels:
for box in frame_labels["boxes_3d"]:
self.total_annotations += 1
self.class_counts[box.class_name] = (
self.class_counts.get(box.class_name, 0) + 1
)
def run_quality_checks(output_dir: str, stats: GenerationStats) -> Dict[str, bool]:
"""Run quality validation on the generated dataset."""
checks = {}
# 1. Format compliance -- does nuscenes-devkit load it?
try:
nusc = NuScenes(version="v1.0-synth", dataroot=output_dir, verbose=False)
checks["format_valid"] = True
    except Exception:
        checks["format_valid"] = False
        return checks  # the remaining checks require a loadable dataset
# 2. Annotation coverage -- every frame has at least one annotation
empty_frames = 0
for sample in nusc.sample:
        _, anns, _ = nusc.get_sample_data(sample["data"]["LIDAR_TOP"])  # (path, boxes, intrinsic)
if len(anns) == 0:
empty_frames += 1
checks["annotation_coverage"] = empty_frames / len(nusc.sample) < 0.05
# 3. Class distribution -- no single class > 80% of annotations
total = sum(stats.class_counts.values())
max_fraction = max(stats.class_counts.values()) / total if total > 0 else 0
checks["class_balance"] = max_fraction < 0.8
# 4. Calibration consistency -- intrinsics match across samples
checks["calibration_consistent"] = True # verify programmatically
return checks
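The _write_report step called by the pipeline can be as simple as dumping the stats object plus a class-distribution histogram; a sketch (file names and plot styling are arbitrary):
import json
import matplotlib
matplotlib.use("Agg")   # headless-safe backend for batch runs
import matplotlib.pyplot as plt

def write_diversity_report(stats: GenerationStats, output_dir: str) -> None:
    """Write a JSON summary and a class-distribution bar chart (sketch)."""
    report = {
        "total_scenes": stats.total_scenes,
        "total_frames": stats.total_frames,
        "total_annotations": stats.total_annotations,
        "class_counts": stats.class_counts,
        "weather_counts": stats.weather_counts,
    }
    with open(f"{output_dir}/diversity_report.json", "w") as f:
        json.dump(report, f, indent=2)

    classes = list(stats.class_counts.keys())
    counts = [stats.class_counts[c] for c in classes]
    plt.figure(figsize=(8, 4))
    plt.bar(classes, counts)
    plt.ylabel("annotations")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.savefig(f"{output_dir}/class_distribution.png")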
8.3 CLI interface
# synth_data/cli.py
import argparse

# Module paths below assume the package layout from Step 1.3.
from .pipeline import SyntheticDataPipeline
from .nuscenes_export import validate_export
from .randomization import DomainRandomizer, RandomizationConfig
def main():
parser = argparse.ArgumentParser(description="Synthetic Data Generation Pipeline")
parser.add_argument("--config", type=str, default="configs/default.yaml",
help="Path to generation config YAML")
parser.add_argument("--output", type=str, default="output/nuscenes_synth",
help="Output directory for generated dataset")
parser.add_argument("--num-scenes", type=int, default=None,
help="Override number of scenes to generate")
parser.add_argument("--validate-only", action="store_true",
help="Only run validation on existing dataset")
parser.add_argument("--seed", type=int, default=None,
help="Override random seed")
args = parser.parse_args()
if args.validate_only:
validate_export(args.output)
return
pipeline = SyntheticDataPipeline(args.config, args.output)
    if args.num_scenes is not None:
pipeline.config["pipeline"]["num_scenes"] = args.num_scenes
    if args.seed is not None:
pipeline.randomizer = DomainRandomizer(
RandomizationConfig(seed=args.seed)
)
pipeline.generate()
if __name__ == "__main__":
main()
Usage:
# Generate 50 scenes with default config
python -m synth_data.cli --config configs/default.yaml --output output/nuscenes_synth
# Quick test with 3 scenes
python -m synth_data.cli --num-scenes 3 --seed 123
# Validate an existing export
python -m synth_data.cli --validate-only --output output/nuscenes_synth
Notebook Exercises
| # | Notebook | Focus | Time |
|---|---|---|---|
| 1 | 01_scene_composition.ipynb | Build a scene graph from scratch. Place vehicles and static objects. Visualize the scene in 3D with Open3D. Experiment with randomized placement. | 60 min |
| 2 | 02_sensor_rendering.ipynb | Set up a pinhole camera and render images from the scene. Configure a lidar sensor and generate point clouds. Visualize multi-camera and lidar outputs side by side. | 60 min |
| 3 | 03_auto_labeling.ipynb | Extract 3D bounding boxes from the scene graph. Project to 2D for each camera. Generate semantic segmentation and depth maps. Overlay labels on rendered images for visual validation. | 60 min |
| 4 | 04_nuscenes_export.ipynb | Export a generated scene to nuScenes format. Load and browse the dataset with nuscenes-devkit. Apply domain randomization and export a batch of diverse scenes. Run quality validation checks. | 60 min |
Each notebook includes:
- Step-by-step code cells with detailed comments
- Visualization cells showing intermediate results
- Challenge exercises for deeper exploration
- A "check your work" cell comparing output against reference values
Expected Deliverables
- Python package (synth_data/) -- modular library with scene graph, sensors, labeling, export, and randomization modules. Installable via pip install -e .
- CLI tool -- python -m synth_data.cli for batch dataset generation with configurable parameters.
- nuScenes-compliant dataset -- at least 50 scenes (1000 frames) that load and browse correctly with nuscenes-devkit.
- Visualization toolkit -- scripts/notebooks to render annotated images, point clouds with bounding boxes, and segmentation overlays.
- Diversity report -- generated summary showing class distribution histograms, weather/lighting variation coverage, and per-scene statistics.
- Unit tests -- test suite covering transform math, projection correctness, label consistency, and export format validity.
Evaluation Criteria
| Criteria | Weight | Description |
|---|---|---|
| Pipeline Completeness | 25% | The pipeline runs end-to-end: scene composition through sensor rendering through labeling through nuScenes export. All eight implementation steps are functional. |
| Label Accuracy | 25% | Auto-generated 3D boxes match scene graph object poses exactly. 2D projections are geometrically correct. Segmentation maps assign the correct class and instance IDs. |
| Format Compliance | 20% | Exported datasets load without errors in nuscenes-devkit. All required JSON tables are present and correctly linked. Sensor data files exist at referenced paths. |
| Randomization | 15% | Generated datasets show meaningful variation in weather, lighting, actor count, and object appearance. Randomization is controlled by config and reproducible with a fixed seed. |
| Code Quality | 15% | Code is modular with clear separation of concerns. Functions have docstrings and type hints. Key operations have unit tests. Configuration is externalized to YAML. |
Related Deep Dives
- Synthetic Data for AD Perception Training -- the theoretical foundation for this project: domain randomization, domain adaptation (FDA, CyCADA), mixed training strategies, and cost-benefit analysis of synthetic data.
- Sensor Simulation for Autonomous Driving -- advanced sensor modeling: physically-based camera simulation, lidar beam models, radar cross-section, and sensor fusion considerations.
Next Steps
After completing this project, consider these follow-up tracks:
- Domain Adaptation Benchmark -- Train a perception model (e.g., PointPillars or CenterPoint) on your synthetic dataset, evaluate on real nuScenes data, then implement domain adaptation techniques (feature-level alignment, adversarial training) to close the sim-to-real gap. Measure mAP improvement from each technique.
- Minority Class Augmentation -- Use your pipeline to specifically generate rare actors (motorcycles, construction vehicles, animals, wheelchairs) and inject them into real training datasets. Evaluate whether targeted synthetic augmentation improves per-class recall on the long tail.
- Physics-Based Sensor Simulator -- Upgrade from simplified rendering to physically-based simulation: ray-traced camera images with realistic materials and lighting, physics-based lidar models with beam divergence and material reflectance (BRDF), and radar simulation with doppler and multi-path effects.
- Scenario-Driven Generation -- Instead of random scenes, build a scenario specification language (e.g., "vehicle cuts in from left lane at 20 m ahead") and generate targeted scenes for safety validation and testing. Integrate with the Waymax scenario format for closed-loop evaluation.