RBE 3002: Unified Robotics IV: Navigation, Mapping & Planning
Worcester Polytechnic Institute
2015
01 Background
RBE 3002: Unified Robotics IV[1] was the fourth and final course in WPI's unified robotics sequence, and the one that shifted the whole model of how we worked. The first three unified courses were about putting hardware together and making it respond to a commanded setpoint — the robot knew where it was, what it was doing, and what it was supposed to do next. RBE 3002 removed those luxuries. The robot now operated in an unknown environment, with noisy sensors, dead-reckoning drift, and no prior map. The central question became: given only what the robot can observe from here, how does it figure out where it is, what's around it, where it can go, and how to get there?
Recommended background was RBE 3001, plus ES 3011 (systems) and MA 2621 or MA 2631 for the probability theory that underlies every estimator on a real robot. Labs ran as a single open-ended, term-long project culminating in an autonomous exploration-and-navigation challenge in an unknown poster-board room. This was, hands-down, my favorite class of the sequence.
02 Platform: The TurtleBot
The class platform was the first-generation WPI TurtleBot — an iRobot Create (a Roomba with the vacuum internals stripped out) as the drive base, with a stack of laser-cut deck plates carrying an onboard Ubuntu laptop, a battery pack, and a Microsoft Xbox Kinect rigidly mounted at the top. The vacuum was gone, but everything else the Roomba had to offer came out over a serial connection: bump sensors, cliff-detection IR pairs, wheel encoders for odometry, and motor current and voltage telemetry. Before the Kinect ever opened its eyes, we already had a rich low-level telemetry stream from the drive base.
The Kinect was the real perception package. It gave us an RGB color stream and a structured-light depth channel aligned with it, so each frame was a dense point cloud of the scene in front of the robot. We used the previous-generation Kinect (Kinect v1) for this course — I cover the sensor's depth-sensing architecture in more detail on my Computer Vision page, where I later worked with the newer, time-of-flight Kinect v2. For RBE 3002, the Kinect's job was simpler: turn a forward-facing cone of space into a stream of “there is something at distance d in direction θ” observations that the mapping stack could fold into a world model. Unlike newer TurtleBots that ship with a 360° LIDAR and see the full surround in one scan, our first-gen Kinect only saw what was directly in front of the robot — the platform had to physically rotate to check its flanks and behind itself, which added a real coverage concern every exploration plan had to account for.
03 Ubuntu 14.04 & the ROS Node Graph
The other big departure from the earlier courses was the software environment. RBE 3001 had us writing bare-metal C against an Atmel microcontroller, managing interrupts and peripherals by hand. RBE 3002 ran on Ubuntu 14.04 with the Robot Operating System (ROS) — a publisher/subscriber middleware that turns a robot into a graph of independent processes (“nodes”) exchanging typed messages on named channels (“topics”). Sensor data, command velocities, coordinate-frame transforms, and map updates all flowed through this bus. Writing code meant writing a new node that subscribed to the topics it needed, did its work, and published its own results for the next node downstream.
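The publish/subscribe pattern itself is easy to sketch outside ROS. This toy bus is plain Python, not rospy — the topic name and message shape below are made up for illustration — but it shows the shape of the node graph: a subscriber registers a callback on a named topic, and a publisher fires every callback registered there.

```python
from collections import defaultdict

class TopicBus:
    """Toy publish/subscribe bus illustrating the ROS node-graph pattern."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A node declares interest in a named channel.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber on the channel gets the message.
        for callback in self._subscribers[topic]:
            callback(message)

bus = TopicBus()
received = []
bus.subscribe("/cmd_vel", lambda msg: received.append(msg))
# A planner node publishes a velocity command; the driver node receives it.
bus.publish("/cmd_vel", {"linear": 0.2, "angular": 0.0})
```

In real ROS the bus is distributed across processes and messages are typed, but the decoupling is the same: the planner never knows who consumes its commands.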
The mental model shift was significant. Instead of reasoning about a single super-loop with interrupt handlers for each peripheral, we reasoned about a distributed system of asynchronous processes, all running at their own rates, coordinating through shared coordinate frames managed by the tf tree (map → odom → base_link → camera_link, each frame reconstructed from its parent by the appropriate transform). Getting the tf tree right turned out to be the difference between maps that closed loops cleanly and maps that spiraled into nonsense.
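The tf chain is just composed rigid-body transforms. A minimal 2D (SE(2)) sketch of chaining map → odom → base_link, with made-up numbers standing in for a SLAM correction and an odometry reading:

```python
import math

def compose(a, b):
    """Compose two 2D transforms (x, y, theta): parent∘child -> grandchild in parent."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + bx * math.cos(ath) - by * math.sin(ath),
            ay + bx * math.sin(ath) + by * math.cos(ath),
            ath + bth)

# Hypothetical values: SLAM's drift correction, and the robot 1 m forward in odom.
map_to_odom  = (0.5, 0.0, math.pi / 2)
odom_to_base = (1.0, 0.0, 0.0)
map_to_base = compose(map_to_odom, odom_to_base)
# The robot's pose in the map frame falls out of the chain.
```

A broken link anywhere in this chain corrupts every pose downstream of it, which is why a wrong tf tree shows up as maps that "spiral into nonsense."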
04 SLAM & Occupancy Grid Mapping
The umbrella name for what the robot is doing here is SLAM — Simultaneous Localization and Mapping. It is classically framed as a chicken-and-egg problem[3]: to localize, the robot needs a map; to build a map, the robot needs to know where it is. The resolution is to run both processes together and let each correct the other every update cycle. Odometry proposes a pose, the proposed pose projects the Kinect scan into the world, the scan refines the map and (by matching against what the map already says) the pose itself. Neither mapping nor localization is ever “done” in isolation — they advance together.
With depth frames streaming in and wheel odometry giving a rough pose, the first problem was representing what the robot had seen. We used the classical occupancy grid: discretize the world into a fixed-resolution 2D grid, and mark each cell with one of three states — free (the robot has observed space pass through this cell; value 0 in ROS's nav_msgs/OccupancyGrid), occupied (a sensor ray terminated in this cell against a solid surface; value 100), or unknown (no observation yet; value -1). Each new Kinect frame, projected through the current pose estimate, updates the cells along each depth ray: cells between the robot and the hit point become free, the cell at the hit point becomes occupied.
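The per-ray update can be sketched directly from that description: trace the grid cells under the ray (Bresenham's line algorithm here), mark the intermediate cells free and the terminal cell occupied. A minimal version using the ROS cell values — a sketch, not the gmapping implementation:

```python
FREE, OCCUPIED, UNKNOWN = 0, 100, -1   # nav_msgs/OccupancyGrid conventions

def bresenham(x0, y0, x1, y1):
    """Integer grid cells along the line from (x0, y0) to (x1, y1)."""
    cells = []
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx, sy = (1 if x1 > x0 else -1), (1 if y1 > y0 else -1)
    err = dx - dy
    x, y = x0, y0
    while True:
        cells.append((x, y))
        if (x, y) == (x1, y1):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy
    return cells

def integrate_ray(grid, robot, hit):
    """Cells between robot and hit become free; the hit cell becomes occupied."""
    cells = bresenham(robot[0], robot[1], hit[0], hit[1])
    for (x, y) in cells[:-1]:
        grid[y][x] = FREE
    hx, hy = cells[-1]
    grid[hy][hx] = OCCUPIED

grid = [[UNKNOWN] * 5 for _ in range(5)]
integrate_ray(grid, (0, 0), (4, 0))   # one depth ray straight down a row
```

Real systems accumulate log-odds per cell rather than overwriting, so a single noisy ray can't flip a well-observed cell; the hard three-state version above is the simplest correct form of the same idea.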
The grid representation makes a deliberate trade: you lose the exact geometry and continuous position of each obstacle in exchange for an enormous simplification downstream. The robot doesn't have to model a wall as a parametric line segment or a polygon; it just has to check the cell under its foot and the cells along a candidate path. Every planner, estimator, and visualizer in the stack reads from the same grid.
The heavy lifting of scan alignment and incremental map construction was handled by ROS's gmapping package — a laser-based SLAM implementation that consumed the Kinect's depth stream as a planar scan. We watched the map paint itself live in RViz, the ROS 3D visualizer, running on the TurtleBot's onboard laptop — occupancy grid growing, robot pose updating, planned paths and laser rays drawn over the top. That feedback loop was essential for debugging both the algorithms and the physical behavior of the robot during test runs.
05 Kalman Filter Localization
A Kalman filter is the canonical sensor-fusion algorithm for robot state estimation. It combines two (or more) noisy, imperfect measurements of the same quantity with a motion model of how the robot is expected to move, producing an estimate with lower uncertainty than either source alone — provably optimal when the motion and sensor models are linear and the noise is Gaussian. It acts as a predictor-corrector, shrinking uncertainty with every measurement update, and it's the standard way to do real-time pose estimation in robotics and the backbone of ROS's localization stack.
On the TurtleBot, the two sources being fused were wheel-encoder odometry and the Kinect depth sensor. Odometry gave a dead-reckoning estimate off the drive base whose error grew without bound as the robot moved — small heading errors in particular levered into large position errors — accurate in the short term but unreliable over a lap around a room. The Kinect gave an independent observation by scan-matching each depth frame against the map-so-far — noisier per-frame, but not subject to drift because it was grounded in actual geometry. Neither source was trustworthy alone.
Every update cycle, the filter ran two steps. A prediction step advanced the pose estimate using the commanded motion through the robot's motion model, inflating the uncertainty ellipse as odometry drift accumulated. A correction step then pulled the prediction toward the Kinect observation, shrinking the ellipse back down. The fused output — a corrected pose — was the estimate the rest of the stack read from[2].
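In one dimension the two steps reduce to a few lines. A sketch with illustrative variances — not the TurtleBot's actual noise parameters:

```python
def kf_predict(x, P, u, Q):
    """Prediction: advance the state by commanded motion u; uncertainty P grows by Q."""
    return x + u, P + Q

def kf_correct(x, P, z, R):
    """Correction: pull the prediction toward measurement z; uncertainty shrinks."""
    K = P / (P + R)                    # Kalman gain: how much to trust the measurement
    return x + K * (z - x), (1 - K) * P

# Hypothetical numbers: odometry says we moved 1.0 m with drift variance 0.04;
# the scan-match against the map reads 1.1 m with variance 0.01.
x, P = kf_predict(0.0, 0.0, u=1.0, Q=0.04)
x, P = kf_correct(x, P, z=1.1, R=0.01)
```

Note the fused variance P ends up below both input variances — the "better than either alone" property, in miniature. The real filter runs the same two steps over a multivariate (x, y, θ) state with matrix gains.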
Kalman-filter pose fusion in one diagram — odometry prediction (top, dashed) drifts along its own curve; observation (bottom, dashed) gives an independent read against the map; the corrected pose (center) is the weighted fusion, better than either alone. [3]
06 Frontier Exploration
With a pose and a map-in-progress, the robot can start making decisions about where to go. In an unknown environment there is no goal cell yet — only frontiers, the boundary cells between known-free space and unknown space[3]. We identified frontier cells using a breadth-first search (an expanding-wavefront sweep) outward from the robot's current position: any unknown cell adjacent to at least one observed-free cell is a frontier candidate. Adjacent frontier cells were then clustered into contiguous groups, and the robot picked one to visit.
The picking strategy weighted two factors — size (bigger frontier clusters reveal more unknown territory per visit) and distance (closer frontiers are cheaper to reach). The robot plans a path to the chosen frontier, drives there, and lets the Kinect fill in the newly observed region. The edges of what it just saw become new frontiers; old ones are consumed. Iterate. When no frontiers remain, exploration is complete — the robot has built a map of everything reachable from its starting pose.
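The decision core of that loop — find frontier cells, cluster the adjacent ones, score clusters by size versus distance — fits in a short sketch. The weights below are illustrative, not the values we tuned in the course:

```python
from collections import deque

FREE, UNKNOWN = 0, -1   # nav_msgs/OccupancyGrid cell values

def find_frontiers(grid):
    """Unknown cells adjacent to at least one observed-free cell."""
    h, w = len(grid), len(grid[0])
    frontiers = set()
    for y in range(h):
        for x in range(w):
            if grid[y][x] != UNKNOWN:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == FREE:
                    frontiers.add((x, y))
                    break
    return frontiers

def cluster(frontiers):
    """Group adjacent frontier cells (8-connected) with a breadth-first flood fill."""
    clusters, seen = [], set()
    for cell in frontiers:
        if cell in seen:
            continue
        group, queue = [], deque([cell])
        seen.add(cell)
        while queue:
            cx, cy = queue.popleft()
            group.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in frontiers and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(group)
    return clusters

def score(clusters, robot, w_size=1.0, w_dist=0.5):
    """Pick the cluster maximizing size minus distance-to-centroid (illustrative weights)."""
    def value(c):
        cx = sum(x for x, _ in c) / len(c)
        cy = sum(y for _, y in c) / len(c)
        dist = ((cx - robot[0]) ** 2 + (cy - robot[1]) ** 2) ** 0.5
        return w_size * len(c) - w_dist * dist
    return max(clusters, key=value)

# A strip of free space with unexplored cells below it:
grid = [[FREE, FREE, FREE],
        [UNKNOWN, UNKNOWN, UNKNOWN]]
frontiers = find_frontiers(grid)
best = score(cluster(frontiers), robot=(0, 0))
```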
We also compared frontier exploration against a drunkard's walk (random wander with occasional goal resets) — useful for quantifying how much the structured approach bought us in coverage speed. The frontier strategy was markedly better, especially in rooms with non-rectangular walls where random exploration wasted huge fractions of its motion doubling back on itself.
Frontier cells (purple arrows) — the boundary between known-free space and unexplored unknown. The robot picks one to visit, drives there, and the edges of what it discovers become new frontiers. [3]
07 Path Planning: A*, C-Space & Cost Maps
Before the robot can plan paths, it has to deal with the fact that it is not a point. The TurtleBot has a physical radius, a turning circle, and a couple of sensor extents that stick out past its drive base. If the planner treats the robot as a point and routes a path through a gap narrower than the robot, the robot gets wedged.
The classical workaround is configuration-space dilation: take the occupancy grid, inflate each obstacle cell by the robot's footprint radius, and let the planner continue to treat the robot as a point — but on an expanded map where it can't accidentally route through a gap its body wouldn't fit through. A further refinement layers a graduated cost on top of the hard inflation — cells close to walls are allowed but expensive, so the planner naturally prefers to center itself in hallways rather than skim one wall. That's the cost map.
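The hard-inflation step is a simple dilation over the grid: every cell within the robot's footprint radius of an obstacle becomes lethal. A sketch (the graduated cost ring would layer a smaller value over a wider radius in exactly the same way):

```python
OCCUPIED = 100   # nav_msgs/OccupancyGrid obstacle value

def inflate(grid, radius_cells):
    """C-space dilation: mark every cell within radius of an obstacle as lethal."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]          # leave the original map untouched
    for y in range(h):
        for x in range(w):
            if grid[y][x] != OCCUPIED:
                continue
            for dy in range(-radius_cells, radius_cells + 1):
                for dx in range(-radius_cells, radius_cells + 1):
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < w and 0 <= ny < h
                            and dx * dx + dy * dy <= radius_cells ** 2):
                        out[ny][nx] = OCCUPIED
    return out

# One obstacle cell in the middle of an open 5x5 patch, robot radius of 1 cell:
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = OCCUPIED
expanded = inflate(grid, 1)
```

After inflation the planner can go back to treating the robot as a point: any cell still free in `expanded` is a cell the robot's whole body fits in.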
On that expanded, cost-weighted grid, finding a path from the robot's pose to a goal cell (either a frontier chosen by the exploration strategy or an explicit user waypoint) is a graph-search problem. We implemented A* (A-star), the standard heuristic-guided shortest-path algorithm, with cell-to-cell traversal cost combining the hard “is this cell free in the expanded grid” test, the graduated hallway-centering cost, and Euclidean distance to the goal as the heuristic. A* produced an optimal path in the cost-weighted sense, which then got handed to the motion-control layer to actually drive it.
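A grid A* in exactly that shape — hard blocking from the expanded grid, a per-cell traversal cost, Euclidean distance as the heuristic — is compact. A sketch, not our course code:

```python
import heapq, itertools, math

def astar(cost, start, goal):
    """A* over a 2D grid: cost[y][x] is extra traversal cost, None means blocked."""
    heuristic = lambda p: math.hypot(p[0] - goal[0], p[1] - goal[1])
    tie = itertools.count()              # tiebreaker so the heap never compares tuples
    open_heap = [(heuristic(start), next(tie), start)]
    g_best = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, _, cur = heapq.heappop(open_heap)
        if cur == goal:                  # reconstruct the path by walking parents
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            x, y = cur[0] + dx, cur[1] + dy
            if not (0 <= y < len(cost) and 0 <= x < len(cost[0])) or cost[y][x] is None:
                continue                 # off-grid, or blocked in the expanded grid
            ng = g_best[cur] + 1.0 + cost[y][x]   # step cost plus hallway-centering cost
            if ng < g_best.get((x, y), float("inf")):
                g_best[(x, y)] = ng
                parent[(x, y)] = cur
                heapq.heappush(open_heap, (ng + heuristic((x, y)), next(tie), (x, y)))
    return None                          # goal unreachable

# A 3x3 free grid with the center cell blocked:
cost = [[0, 0, 0], [0, None, 0], [0, 0, 0]]
path = astar(cost, (0, 0), (2, 2))
```

Because the Euclidean heuristic never exceeds the true remaining cost, the first time the goal is popped the path is optimal in the cost-weighted sense.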
An A* path (solid black line) from the robot to a chosen goal (purple dot), traced through the expanded occupancy grid. [3]
08 Pure Pursuit & Local Obstacle Avoidance
Following an A* path means turning a sequence of grid cells into smooth, continuous wheel commands. We used pure-pursuit steering: pick a look-ahead point some fixed distance ahead of the robot along the planned path, steer the wheels toward it, slide the look-ahead point forward as the robot catches up, repeat. The look-ahead distance is a single tuning knob with a clean tradeoff — too long, and the robot cuts around sharp corners; too short, and it oscillates side-to-side as it over-corrects for tiny path errors[3].
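The steering law itself is short once the look-ahead point is chosen: the classic pure-pursuit curvature is 2·sin(α)/L, where α is the bearing to the look-ahead point and L the look-ahead distance. A sketch with an illustrative forward speed:

```python
import math

def pure_pursuit_step(pose, path, lookahead, v=0.2):
    """One control tick: pick the look-ahead point, return (linear, angular) command.
    v is an illustrative forward speed, not a course-tuned value."""
    x, y, theta = pose
    target = path[-1]                        # fall back to the path's end
    for px, py in path:
        if math.hypot(px - x, py - y) >= lookahead:
            target = (px, py)                # first point at least `lookahead` away
            break
    alpha = math.atan2(target[1] - y, target[0] - x) - theta
    alpha = math.atan2(math.sin(alpha), math.cos(alpha))   # wrap to [-pi, pi]
    curvature = 2.0 * math.sin(alpha) / lookahead          # pure-pursuit steering law
    return v, v * curvature                                # angular rate = v * kappa

# Robot at the origin facing +x; the planned path bends up and to the left:
v, omega = pure_pursuit_step((0.0, 0.0, 0.0),
                             [(0.5, 0.0), (1.0, 0.5), (2.0, 1.0)],
                             lookahead=1.0)
```

The look-ahead tradeoff is visible in the formula: a large L divides down the curvature (gentle, corner-cutting arcs), while a small L amplifies every bearing error into a sharp correction.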
A local obstacle-avoidance layer ran on top of pure-pursuit for safety: it swept the Kinect's forward field of view for obstacles that had entered the planned path since A* last ran, and nudged the steering away from them in real time. A*'s plan represented everything the robot knew at plan time, but the world could change between plans — this layer closed the gap between the map-at-plan-time and the world-at-drive-time.
Pure-pursuit navigation following an A* path on the cost-map-weighted grid. The path stays centered in open space instead of skimming walls — the graduated hallway-centering cost at work. [3]
Local obstacle avoidance in the Kinect's forward field of view — nearby obstacle cells (outlined) push the steering away from them in real time. [3]
09 Monte Carlo Localization
Kalman filters work well when you already have a rough estimate of where the robot is — they're local, incremental, and assume a Gaussian posterior. But not every situation fits that profile. If the robot just booted up and has no idea where it is on its map, or if the posterior is multimodal because several parts of the map look similar, a single Gaussian ellipse will just collapse onto the wrong answer.
Monte Carlo Localization (MCL) is the particle-filter approach to the same problem. It estimates a robot's 2D pose — position (x, y) and orientation (θ) — within a known map, without assuming the posterior has any particular shape. Hundreds of candidate poses are scattered across the map, each one a guess at where the robot might be. Every iteration, each particle is updated by the robot's commanded motion (roughly where each particle would now be if it were the true pose), then weighted by how well the current Kinect scan would match the map from that pose. The filter resamples — preferentially keeping high-weight particles and dropping low-weight ones — and over a few seconds of driving, the particle cloud collapses onto the robot's actual position and orientation.
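One MCL iteration — move every particle, weight by scan fit, resample — in sketch form. The scan-likelihood function here is made up (a Gaussian bump around a known true pose) standing in for real scan matching against a map:

```python
import math, random

def mcl_step(particles, motion, scan_likelihood):
    """One MCL iteration: propagate, weight, resample."""
    dx, dy, dth = motion
    moved = []
    for (x, y, th) in particles:
        # Apply the commanded motion in each particle's own frame, plus noise.
        nx = x + dx * math.cos(th) - dy * math.sin(th) + random.gauss(0, 0.01)
        ny = y + dx * math.sin(th) + dy * math.cos(th) + random.gauss(0, 0.01)
        moved.append((nx, ny, th + dth + random.gauss(0, 0.02)))
    # Weight each particle by how well the scan would match from that pose,
    # then resample: high-weight particles survive, low-weight ones are dropped.
    weights = [scan_likelihood(p) for p in moved]
    if sum(weights) == 0:
        return moved
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(0)   # seeded so the demo is reproducible
truth = (2.0, 3.0)   # hypothetical true pose the fake likelihood peaks at
lik = lambda p: math.exp(-((p[0] - truth[0]) ** 2 + (p[1] - truth[1]) ** 2))

# Global localization: particles scattered uniformly across a 5x5 m map.
particles = [(random.uniform(0, 5), random.uniform(0, 5), 0.0) for _ in range(300)]
for _ in range(10):
    particles = mcl_step(particles, (0.0, 0.0, 0.0), lik)
```

After a handful of iterations the cloud's mean sits near the likelihood peak — the same collapse you watch happen in RViz, with scan matching playing the role of `lik`.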
The “kidnapped robot” problem falls out of this for free: if the pose estimate drifts until it is inconsistent with incoming observations, the filter re-scatters and re-localizes automatically. In production systems running the ROS navigation stack against a pre-built map, MCL (or its adaptive variant, AMCL) is the primary localization layer — not just an edge-case recovery tool. In the SLAM / exploration phase itself, scan-matching inside the SLAM package (such as gmapping) handles the continuous pose update.
Monte Carlo Localization in action — hundreds of candidate pose particles scatter across the map, then collapse onto the robot's true pose after a few seconds of driving and scan-matching. [3]
10 The Final Challenge at WPI Robot Pits
The final project tied everything together in the WPI Robot Pits, the open bay where the robotics department ran its labs. Our TA built an abstract “room” out of three-fold poster boards — deliberately non-rectangular, with internal walls and cul-de-sacs, so the rectangular corridors that any sensible student would have optimized for were out of the question. The robot started at a known pose with no prior map, and the challenge ran in three phases: first, autonomously explore and map the entire unknown space using frontier exploration; second, return to the starting pose using the map it had just built; third, navigate to a new random goal inside the mapped space using A* on the final cost map.
My teammates for this were Beth Martino and Will Manning — the two best people I could have asked for on this project. Long debug sessions on the pit floor, laptop in lap, watching the occupancy grid paint itself in RViz while the TurtleBot whirred around our poster-board walls. It's the class that made probabilistic robotics stick for me, and the class that's stayed closest to how I think about autonomy problems ever since.
Panoramic view of the WPI Robot Pits — our test room and working area for the term-long project.
11 Further Watching
Kai Nakamura's five-minute walkthrough of the same RBE 3002 project in a later cohort — a clean recap of the whole concept stack.