Why Bimanual Data Collection Is Harder

In single-arm data collection, a bad demonstration affects only one arm's trajectory. You record 50 demos, discard 5 bad ones, and train on 45. In bimanual data collection, a mistake at the handoff point invalidates both arms' trajectories for that demo simultaneously. The failure modes are coupled.

This coupling has two practical implications. First, you need more demonstrations — 100 instead of 50 — because bimanual tasks have higher variance and the policy needs more examples to learn the coordination structure. Second, you need stricter consistency per demonstration. A single-arm demo that's 80% consistent trains reasonably well. A bimanual demo where one arm is consistent and the other varies teaches the policy nothing useful about coordination timing.

The workspace coverage challenge is also greater: you need both arms in frame, and the handoff point — the highest-complexity moment — must be reliably captured by at least one camera. Check your camera angles before starting and adjust if the handoff occurs outside the workspace camera's field of view.

LeRobot Bimanual Dataset Format

The DK1 integration with LeRobot extends the standard single-arm format with dual joint-state arrays. Each timestep in the dataset contains:

    # Bimanual dataset observation keys per timestep:
    observation.joint_states.left    # shape (6,): left follower joint angles in radians
    observation.joint_states.right   # shape (6,): right follower joint angles in radians
    observation.gripper.left         # shape (1,): left gripper position [0=open, 1=closed]
    observation.gripper.right        # shape (1,): right gripper position
    observation.images.workspace     # shape (H, W, 3): workspace overhead/front camera
    observation.images.wrist         # shape (H, W, 3): primary wrist camera
    action.joint_states.left         # shape (6,): target left joint angles
    action.joint_states.right        # shape (6,): target right joint angles
    action.gripper.left              # shape (1,)
    action.gripper.right             # shape (1,)

The key difference from single-arm: the action space is 14-dimensional (6+6 joints + 2 grippers). ACT handles this natively — you specify the action dimension in the training config and no other changes are required.
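To make the 14-dimensional count concrete, here is a minimal sketch that flattens one timestep's action entries into a single vector. The key names come from the layout listed above. The concatenation order, the `flatten_action` helper, and the use of plain Python lists (a real dataset would hold arrays or tensors) are assumptions for illustration, not the DK1 integration's actual code.

```python
from typing import Dict, List

# Key names follow the per-timestep layout listed above.
# The concatenation order (left joints, right joints, left gripper,
# right gripper) is an assumption made for this illustration.
ACTION_LAYOUT = [
    ("action.joint_states.left", 6),
    ("action.joint_states.right", 6),
    ("action.gripper.left", 1),
    ("action.gripper.right", 1),
]

# 6 + 6 + 1 + 1 = 14
ACTION_DIM = sum(size for _, size in ACTION_LAYOUT)


def flatten_action(step: Dict[str, List[float]]) -> List[float]:
    """Concatenate the per-arm action entries into one flat vector."""
    flat: List[float] = []
    for key, size in ACTION_LAYOUT:
        values = list(step[key])
        if len(values) != size:
            raise ValueError(f"{key}: expected {size} values, got {len(values)}")
        flat.extend(values)
    return flat


# Hypothetical timestep with placeholder values.
example_step = {
    "action.joint_states.left": [0.0] * 6,
    "action.joint_states.right": [0.1] * 6,
    "action.gripper.left": [0.0],
    "action.gripper.right": [1.0],
}
print(ACTION_DIM, len(flatten_action(example_step)))  # prints: 14 14
```

The length check is the useful part: a missing or mis-shaped arm entry raises an error instead of silently producing a shorter action vector.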

Recording Workflow

    source ~/dk1-env/bin/activate

    # Start a recording session: 100 episodes for the cube handoff task
    python -m lerobot.scripts.record \
        --robot-path ~/dk1-config.yaml \
        --robot-type dk1_bimanual \
        --fps 50 \
        --root ~/dk1-datasets \
        --repo-id cube-handoff-v1 \
        --num-episodes 100 \
        --warmup-time-s 3 \
        --episode-time-s 30 \
        --reset-time-s 5

    # --warmup-time-s: time after pressing record before capture starts (use this to position the cube)
    # --episode-time-s: max demo length; cube handoff should complete in under 20s, so 30s gives buffer
    # --reset-time-s: time between episodes to return arms to home and reposition the cube
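For planning purposes, the flag values above imply a rough budget for frames and session length. The sketch below is simple arithmetic under two assumptions that the recorder itself may not share: the warmup runs once at the start of the session, and every episode uses its full time limit. The `session_budget` helper is hypothetical.

```python
def session_budget(fps, num_episodes, warmup_s, episode_s, reset_s):
    """Estimate frame counts and session length from the recording flags.

    Assumes warmup happens once per session and every episode runs to
    its time limit; real demos that finish early will produce less.
    """
    max_frames_per_episode = fps * episode_s
    max_total_frames = max_frames_per_episode * num_episodes
    session_seconds = warmup_s + num_episodes * (episode_s + reset_s)
    return max_frames_per_episode, max_total_frames, session_seconds


frames_per_ep, total_frames, seconds = session_budget(
    fps=50, num_episodes=100, warmup_s=3, episode_s=30, reset_s=5
)
print(frames_per_ep)           # 1500 frames per episode at most
print(total_frames)            # 150000 frames across the session at most
print(round(seconds / 60, 1))  # 58.4 minutes
```

Under these assumptions a full session takes about an hour, which is worth knowing before you start, since the demos are reviewed only after all 100 are recorded.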

Run 10–15 practice demos before starting the recording session to warm up your motor memory for the task. The first 5–10 recorded demos will be your worst — that's expected. Do not stop to review them during the session; review and cull bad demos after the full 100 are recorded.

Quality Checklist for Bimanual Data

Review every demo after recording using LeRobot's replay viewer. Discard any demo that fails two or more of these criteria:

Arm sync at handoff: Both arms must be within 3 cm of the intended handoff point simultaneously. Async handoffs where one arm waits for the other teach the policy to pause, which transfers poorly.
Consistent start position: The cube must start within 2 cm of the same position for every demo. Use the tape marks from Unit 1. Variance in start position forces the policy to generalize before it has learned the core task.
Clean grasp (both arms): Each arm must achieve a stable grasp before moving to the next phase. A slipping grasp mid-transfer creates a trajectory that is impossible for the policy to replicate reliably.
Home pose return: Both arms must return cleanly to the home pose at the end of each demo. Demos that end mid-motion create a dataset where episode boundaries are ambiguous.
Workspace camera coverage: The handoff moment must be visible in the workspace camera frame. If the robot bodies occlude the view, adjust the camera angle before continuing.
Consistent timing: Episode duration should vary by no more than ±5 seconds across demos. Large timing variance indicates inconsistent execution and produces a dataset with high action-space entropy.

Target dataset size: 100 demos is the recommended minimum for bimanual ACT training. Research results suggest that bimanual tasks require roughly 2× the data of comparable single-arm tasks because the joint coordination structure is more complex and the action space is larger. If after training in Unit 5 your success rate is below 40%, collecting another 50 targeted demos is the first thing to try.
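The discard rule (two or more failed criteria) can be applied mechanically once each demo has been reviewed. The sketch below is one way to do that bookkeeping. The `DemoReview` record and its field names are hypothetical values you would fill in by hand while watching the replay, and measuring timing as distance from the median episode duration is an assumed interpretation of the ±5 second rule.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class DemoReview:
    """Reviewer notes for one demo (all field names are hypothetical)."""
    handoff_error_cm: float  # worst arm distance from the intended handoff point
    start_offset_cm: float   # cube distance from the taped start position
    clean_grasp: bool        # both arms achieved a stable grasp
    returned_home: bool      # both arms ended in the home pose
    handoff_visible: bool    # handoff visible in the workspace camera
    duration_s: float        # episode length in seconds


def failed_criteria(demo: DemoReview, typical_duration_s: float) -> List[str]:
    """Return the names of the checklist criteria this demo fails."""
    failures = []
    if demo.handoff_error_cm > 3.0:
        failures.append("arm sync at handoff")
    if demo.start_offset_cm > 2.0:
        failures.append("consistent start position")
    if not demo.clean_grasp:
        failures.append("clean grasp")
    if not demo.returned_home:
        failures.append("home pose return")
    if not demo.handoff_visible:
        failures.append("workspace camera coverage")
    # Assumption: timing is judged against the median episode duration.
    if abs(demo.duration_s - typical_duration_s) > 5.0:
        failures.append("consistent timing")
    return failures


def cull(demos: List[DemoReview]) -> List[bool]:
    """True means keep; a demo is discarded at two or more failures."""
    typical = median(d.duration_s for d in demos)
    return [len(failed_criteria(d, typical)) < 2 for d in demos]


reviews = [
    DemoReview(1.5, 0.5, True, True, True, 18.0),   # passes everything
    DemoReview(4.0, 1.0, True, True, True, 19.0),   # one failure: kept
    DemoReview(4.0, 3.0, True, False, True, 27.0),  # four failures: discarded
]
print(cull(reviews))  # [True, True, False]
```

Keeping the list of failed criteria, rather than only a pass/fail flag, also shows which criterion fails most often, which helps when deciding what to target in any follow-up demos.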

Unit 4 Complete When...

You have 100 recorded demonstrations in LeRobot format at ~/dk1-datasets/cube-handoff-v1/.
After reviewing and culling, at least 90 demos pass the quality checklist.
Both joint state arrays are present at 50 Hz for every episode.
Both camera feeds are present and show the full task sequence including the handoff moment.
You have run python -m lerobot.scripts.visualize_dataset --repo-id cube-handoff-v1 and confirmed the dataset structure is valid.
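The 50 Hz criterion can be spot-checked from per-frame timestamps. The sketch below assumes you can obtain one episode's timestamps as a list of seconds; how they are stored and loaded depends on the dataset format and is not shown. The `check_sample_rate` helper and its 10% tolerance are illustrative choices.

```python
from typing import Sequence


def check_sample_rate(
    timestamps: Sequence[float],
    expected_hz: float = 50.0,
    tolerance: float = 0.1,
) -> bool:
    """Return True if every frame interval is within `tolerance`
    (as a fraction) of the expected period 1 / expected_hz."""
    if len(timestamps) < 2:
        return False
    expected_dt = 1.0 / expected_hz
    for earlier, later in zip(timestamps, timestamps[1:]):
        dt = later - earlier
        if abs(dt - expected_dt) > tolerance * expected_dt:
            return False
    return True


# Hypothetical episodes: a clean 50 Hz one, and one with a dropped frame.
clean = [i / 50 for i in range(100)]
dropped = clean[:40] + clean[41:]

print(check_sample_rate(clean))    # True
print(check_sample_rate(dropped))  # False
```

Checking every interval, rather than only the average rate, catches dropped frames that an average over a long episode would hide.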