Sensor Fusion


Localization is the process of placing a car’s position and pose in a shared coordinate system (a map). It is important in autonomous driving because an accurately localized car can get context on the relevant fixed infrastructure, such as regulatory signs, signal lights, and lane markings, and understand how they relate to the car. It is important to localize the car in all six dimensions {x, y, z, roll, pitch, yaw}. Civil Maps projects the map data into the field of view of a real-time sensor such as a camera or LiDAR to help the decision engine correctly contextualize 3D features such as stop signs, signal lights, and lane markings.

Traditionally, localization techniques have used differential Global Positioning System (GPS) and an Inertial Measurement Unit (IMU) to place a car in the environment. GPS localization relies on the travel time of signals from multiple positioning satellites.

The GPS signal has to travel more than 12,000 miles through the atmosphere to reach the small GPS receiver on the car. Since atmospheric conditions change constantly, this introduces error between GPS readings of the same place taken at different times.

GPS error is especially substantial in urban environments, where buildings prevent the satellite signals from reaching the GPS receiver directly, a problem commonly referred to as multipath. The IMU, in turn, starts drifting significantly if it does not receive corrections from an orthogonal source such as GPS.
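
To make the travel-time idea above concrete, the sketch below solves for a receiver position from satellite ranges. It is a simplified illustration only: the satellite coordinates and measurements are made up, and receiver clock bias (which real GPS must also solve for) is ignored.

```python
import numpy as np

# Simplified GPS-style trilateration: recover a receiver position from the
# travel time of signals from known satellite positions. All values are made
# up, and receiver clock bias is ignored for brevity.
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def trilaterate(sat_positions, ranges, iterations=10):
    """Gauss-Newton solve for the position that best explains the measured ranges."""
    estimate = np.zeros(3)                           # start at the origin
    for _ in range(iterations):
        offsets = estimate - sat_positions           # vectors from each satellite to the estimate
        predicted = np.linalg.norm(offsets, axis=1)  # ranges the current estimate would produce
        jacobian = offsets / predicted[:, None]      # d(range)/d(position)
        delta, *_ = np.linalg.lstsq(jacobian, ranges - predicted, rcond=None)
        estimate = estimate + delta
    return estimate

satellites = np.array([[15e6, 10e6, 18e6], [-12e6, 14e6, 17e6],
                       [8e6, -16e6, 19e6], [-5e6, -9e6, 21e6]])
true_position = np.array([1.2e6, -2.3e6, 4.5e6])
travel_times = np.linalg.norm(satellites - true_position, axis=1) / SPEED_OF_LIGHT
print(trilaterate(satellites, travel_times * SPEED_OF_LIGHT))  # ~ true_position
```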

Other mapping providers mainly focus on lateral (x) and longitudinal (y) localization. Their maps are also two-dimensional and do not allow for 6 degrees of freedom {x, y, z, roll, pitch, yaw}. A few companies do provide 6 degrees of freedom in localization using SLAM. However, SLAM on raw sensor data is not the right approach because the data volumes are very large. Performing real-time spatial transforms between two coordinate systems on such large data sets (gigabytes per kilometer) is computationally expensive and requires significant memory. In an attempt to solve this, many providers have used multiple GPUs, but even then they are limited to speeds of less than 40 kph.

 

Current vision-based localization techniques rely on pre-cached sensor data of the environment. As the car enters a pre-mapped area, it compares what its sensors see to the cached sensor data, and based on the section that matches, it can correct its position. However, storing and comparing raw sensor data is computationally expensive: one kilometer of raw sensor data is roughly 3-5 gigabytes. To compare such high volumes of data, the autonomous car needs 7-10 GPUs to localize at a speed of 40 kph. To support these GPUs, the car needs roughly 1200 W of dedicated power.

At Civil Maps, we have developed a technique that uses signatures, reducing gigabytes of data to kilobytes. Using this approach, we can localize a car with six degrees of freedom in real time on an off-the-shelf Arm Cortex processor. The same class of Arm Cortex processor is used in many iPhone and iPad devices, and it consumes 5-10 W of dedicated power.

Civil Maps’ signature-based localization in 6 dimensions relies on compressing raw sensor data into lightweight signatures. The compression converts 3-5 gigabytes of raw sensor data per kilometer into 100 kilobytes of signature data. The small data footprint of signatures allows us to render and compare them rapidly and localize a car using a single consumer-grade CPU. Beyond the reduced compute load, signature-based localization also significantly reduces the power required for localization. In other words, Civil Maps does SLAM on fingerprints (kilobytes per kilometer) while other mapping providers do SLAM on raw point cloud data (gigabytes per kilometer).
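
Civil Maps’ fingerprinting algorithm itself is proprietary, so the following is only a generic stand-in to illustrate the idea of collapsing a large point cloud into a small, easily compared signature; here a coarse occupancy histogram plays the role of the fingerprint.

```python
import numpy as np

# Generic stand-in for a signature: a coarse 2D occupancy histogram of a LiDAR
# point cloud. A 64x64 float32 grid is ~16 KB, versus gigabytes of raw points.

def make_signature(points, cell_size=1.0, grid=64):
    """Compress an (N, 3) point cloud into a small, normalized occupancy grid."""
    ix = np.clip((points[:, 0] / cell_size + grid // 2).astype(int), 0, grid - 1)
    iy = np.clip((points[:, 1] / cell_size + grid // 2).astype(int), 0, grid - 1)
    hist = np.zeros((grid, grid), dtype=np.float32)
    np.add.at(hist, (ix, iy), 1.0)
    return hist / (np.linalg.norm(hist) + 1e-9)

def best_match(live_signature, stored_signatures):
    """Find the stored map signature most similar to the live one (cosine similarity)."""
    scores = [float(np.sum(live_signature * s)) for s in stored_signatures]
    return int(np.argmax(scores)), max(scores)
```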

Map Projections

Once an autonomous car is locked on to its position and pose, it can use a priori information such as a map to annotate or place bounding boxes in the real sensor space. In other words, we can take map data that exists in a UTM coordinate system and project it into the pixel space of a camera, or into the vehicle reference frame of the LiDAR. The ability to move back and forth between the geo-reference frame and the vehicle reference frame allows a priori information to be used to drastically reduce the computational overhead on an autonomous car.
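
As a minimal sketch of that round trip, the functions below move a single point between a UTM-style geo-reference frame and the vehicle (IMU) frame, given the localized six-degree-of-freedom pose. The yaw-pitch-roll convention and axis ordering are assumptions chosen for illustration, not Civil Maps’ published conventions.

```python
import numpy as np

# Move a point between the geo-reference frame (UTM easting, northing, up) and
# the vehicle/IMU frame using the localized pose {x, y, z, roll, pitch, yaw}.
# The Z-Y-X (yaw-pitch-roll) rotation convention here is an assumption.

def rotation_from_rpy(roll, pitch, yaw):
    """3x3 rotation matrix mapping vehicle-frame vectors into the geo frame."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def utm_to_vehicle(point_utm, pose_xyz, pose_rpy):
    """Express a UTM map point in the vehicle/IMU reference frame."""
    R = rotation_from_rpy(*pose_rpy)
    return R.T @ (np.asarray(point_utm) - np.asarray(pose_xyz))

def vehicle_to_utm(point_vehicle, pose_xyz, pose_rpy):
    """Inverse transform: vehicle/IMU frame back into the geo-reference frame."""
    R = rotation_from_rpy(*pose_rpy)
    return R @ np.asarray(point_vehicle) + np.asarray(pose_xyz)
```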

In our first example, Civil Maps takes map data and localizes the car. Since the front-facing camera is mounted on the rigid body of the car, the relative distance between the IMU and the camera is known. The angular differences between the camera and the IMU are known for the same reason. Using a 4×4 translation-and-rotation matrix, map vectors in the IMU coordinate system can be projected into the camera’s field of view. By annotating the camera’s pixel values, the map data can be leveraged to create regions of interest around traffic signals, speed limit signs, and other signage.
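
A rough sketch of that projection step is shown below. The intrinsic matrix, the IMU-to-camera extrinsic, and the example map points are placeholder values; in a real system both matrices would come from the calibration of the rigidly mounted camera.

```python
import numpy as np

# Project map points expressed in the IMU/vehicle frame into camera pixels using
# a 4x4 rigid transform (rotation + translation) and the camera intrinsics.
# All calibration values below are placeholders, not real calibration data.

K = np.array([[1000.0, 0.0, 640.0],      # focal lengths and principal point (pixels)
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

T_cam_from_imu = np.eye(4)                # 4x4 extrinsic (identity translation here)
T_cam_from_imu[:3, :3] = np.array([[0.0, -1.0, 0.0],   # camera x = -IMU y (right)
                                   [0.0, 0.0, -1.0],   # camera y = -IMU z (down)
                                   [1.0, 0.0, 0.0]])   # camera z =  IMU x (forward)

def project_to_pixels(points_imu):
    """Project (N, 3) IMU-frame points into (u, v) pixel coordinates."""
    homogeneous = np.hstack([points_imu, np.ones((len(points_imu), 1))])
    cam = (T_cam_from_imu @ homogeneous.T)[:3]   # points in the camera frame
    cam = cam[:, cam[2] > 0.1]                   # keep only points in front of the lens
    pix = K @ cam
    return (pix[:2] / pix[2]).T

# Two corners of a mapped sign, 12 m ahead of the IMU; pixels around them
# become a region of interest for the perception stack.
print(project_to_pixels(np.array([[12.0, -1.5, 2.0], [12.0, -0.5, 2.0]])))
```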

In the future, Civil Maps can also audit the maps in real time using this technology with LiDARs and cameras. Using the real-time projections into camera and LiDAR space, Civil Maps can compare what it sees in real-time sensor space with what is stored in the a priori map. If there is a discrepancy, an update can be sent to the cloud and shared with other cars. If there is no discrepancy, cellular bandwidth is saved because the car doesn’t need to transfer any information.
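
A sketch of that auditing decision might look like the following; the tolerance and the update payload are illustrative placeholders rather than Civil Maps’ actual change-detection pipeline.

```python
import numpy as np

# Compare a live detection against the a priori map and only produce an update
# when they disagree; if nothing changed, nothing is sent over the cellular link.
# Threshold and payload format are illustrative placeholders.

def audit_feature(map_position, detected_position, tolerance_m=0.5):
    """Return an update payload if the observation disagrees with the map, else None."""
    error = np.linalg.norm(np.asarray(map_position) - np.asarray(detected_position))
    if error > tolerance_m:
        return {"type": "feature_moved", "error_m": float(error),
                "new_position": [float(v) for v in detected_position]}
    return None  # map confirmed: no cellular bandwidth used
```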