Training Data-Hungry AI Algorithms
Large-scale data refinement is key to bringing more sophisticated automated-driving functions to series production.
Training the algorithms of Artificial Intelligence (AI)-based systems for autonomous or highly automated driving requires enormous volumes of data to be captured and processed. The algorithms must be able to master numerous challenges so that self-driving cars can detect all essential details of their environment, make the right decisions and safely take people to their destination.
Why does training require this much data? AI-based systems make rapid progress at first, but this progress slows down after a certain point, because the systems must also handle rare events sensibly and reliably. Bringing sophisticated AI-based driving functions to the road safely therefore requires a growing amount of ever higher-quality data.
In general, AI functions must cross a very high reliability threshold before they can be used in production systems. This particularly applies to automated driving because the associated safety risks are extremely high. The tragic accidents involving Tesla and Uber vehicles are sobering reminders of this. Consequently, registration authorities require ever stricter validation measures for sophisticated driving functions to ensure that they work correctly. Since these validation measures are often based on accumulating actual road mileage, they require large volumes of data.
A practical example
The following calculation example illustrates the data volumes that must be processed to validate a combined radar + camera sensor system. In the example, the correct functioning of the sensor system must be demonstrated over 300,000 km (about 186,400 mi) with a predefined mix of driving scenarios. The sensor system provides an object list that describes the vehicle's environment.
To check if the sensor system object list is correct, a comparison list called “ground truth” is required. This list is generated using a reference sensor set consisting of a camera and lidar sensor. The reference sensor set is installed in the vehicle as a rooftop box in addition to the sensors to be checked. The data stream of the reference sensor set is transferred to the ground truth object list by means of an annotation process. By systematically comparing the two object lists, deviations are detected and corrective measures can be derived.
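The systematic comparison of the two object lists can be pictured with a small sketch. It assumes that each object is reduced to an axis-aligned 2-D box in the ground plane and that two objects count as the same if their intersection over union (IoU) reaches a threshold. The box format, the names Box, iou and compare_object_lists, and the 0.5 threshold are all illustrative assumptions, not the method actually used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2-D box in the ground plane (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 if they do not overlap."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    return inter / (area_a + area_b - inter)


def compare_object_lists(ground_truth, detected, min_iou=0.5):
    """Greedy one-to-one matching of detected objects to ground truth.

    Returns (matches, missed, spurious):
      matches  - list of (ground_truth_index, detected_index, iou)
      missed   - ground-truth indices without a matching detection
      spurious - detected indices without a matching ground-truth object
    """
    # All candidate pairs, best overlap first.
    candidates = sorted(
        ((iou(g, d), gi, di)
         for gi, g in enumerate(ground_truth)
         for di, d in enumerate(detected)),
        reverse=True,
    )
    used_gt, used_det, matches = set(), set(), []
    for score, gi, di in candidates:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if gi in used_gt or di in used_det:
            continue
        used_gt.add(gi)
        used_det.add(di)
        matches.append((gi, di, score))
    missed = [gi for gi in range(len(ground_truth)) if gi not in used_gt]
    spurious = [di for di in range(len(detected)) if di not in used_det]
    return matches, missed, spurious
```

The missed and spurious lists are the deviations from which corrective measures would be derived. A real validation would compare 3-D boxes, object classes and motion attributes frame by frame.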
Annotation is still done primarily by humans. The estimate of the annotation effort is based on the following assumptions about the reference sensor setup:
- Lidar frequency: 10 frames/second
- Achieved average speed: 45 km/h
- Average number of objects (vehicles, pedestrians, cyclists, motorcycles) per frame: 30 objects/frame
Travel time in seconds: 300,000 km / 45 km/h ≈ 6,666 h = 23,997,600 s
Number of frames: 23,997,600 s * 10 f/s = 239,976,000 f
Number of objects: 239,976,000 f * 30 o/f = 7,199,280,000 o
This results in an annotation effort of about 7 billion objects. State-of-the-art systems achieve a throughput of approximately 2.5 s per object in 3-D space (Figure 4). The time required is measured end-to-end, i.e., across all annotation, review, and correction steps, from import to export.
Time in seconds: 7,199,280,000 o * 2.5 s/o = 17,998,200,000 s
Work: 17,998,200,000 s = 4,999,500 h = 624,937 working days (at 8 h per day)
Assuming a project duration of 3 years (at 220 working days per year), an annotation team with an average size of 950 people would be required.
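The estimate can be reproduced in a few lines of Python. The sketch below follows the numbers of the example; the function and parameter names are illustrative, the 8-hour working day is an assumption inferred from the conversion of hours into working days, and the travel time is truncated to whole hours as in the calculation above.

```python
def annotation_effort(
    distance_km=300_000,
    avg_speed_kmh=45,
    lidar_hz=10,
    objects_per_frame=30,
    seconds_per_object=2.5,
    hours_per_day=8,            # assumption: one working day = 8 hours
    working_days_per_year=220,
    project_years=3,
):
    """Rough annotation effort for a validation drive (illustrative sketch)."""
    travel_hours = distance_km // avg_speed_kmh   # truncated to whole hours
    travel_seconds = travel_hours * 3600
    frames = travel_seconds * lidar_hz
    objects = frames * objects_per_frame
    annotation_hours = objects * seconds_per_object / 3600
    working_days = annotation_hours / hours_per_day
    team_size = working_days / (working_days_per_year * project_years)
    return {
        "travel_seconds": travel_seconds,
        "frames": frames,
        "objects": objects,
        "annotation_hours": annotation_hours,
        "working_days": working_days,
        "team_size": team_size,
    }


if __name__ == "__main__":
    for name, value in annotation_effort().items():
        print(f"{name}: {value:,.0f}")
```

With the default values, the team size comes out at about 947 people, which the example rounds to 950. Changing a single parameter, such as the time per object, shows how sensitive the team size is to each assumption.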
The challenge therefore lies in providing a data annotation system that is capable of continuously providing 950 people with sufficient work while enabling an average processing time of 2.5 s per object for the specified target quality – across all employees and skill levels. In addition, the average processing time per object (and thus the costs) must be further reduced in the course of the project.
Production line for annotations
To master this challenge for training and validation projects, the annotation work has to run like a fully automated production line. The following critical process steps must be carried out without breaks between tools or data formats:
Data Collection: A sufficient number of vehicles must collect data on the road so that the required data volumes can be brought in and made available for further processing. The AUTERA system offered by dSPACE allows for the reliable recording of sensor and telemetry data in the vehicle and seamless data transfer.
Data Preparation: Before the recorded data can be further processed, it must be anonymized in accordance with the data protection regulation applicable to the respective region. This means that identifying features such as license plates and faces, and sometimes GPS data, must be made unrecognizable. Ideally, anonymization is fully automated and carried out with sufficient speed so it does not become a bottleneck for the entire project. For this task, understand.ai offers the UAI Anonymizer™, a solution that meets these requirements and has proven reliable in numerous projects.
Data Selection: For training projects, sensible data reduction is essential. This means selecting exactly those scenarios that offer the greatest learning gain for the algorithms to be trained. This process is relatively simple at the beginning of the development phase, but Fig. 1 shows how the achieved progress slows down considerably after a certain point. At the beginning of the training, many of the recorded scenarios are new for the algorithm or are not yet available in sufficient numbers.
Rapid progress is being made at this stage. However, as the algorithms mature, it becomes increasingly difficult to ‘find’ interesting data. Valuable new scenarios may then turn up only every few thousand kilometers, which requires ever larger test fleets. At this point at the latest, the common approach of using dedicated test fleets with their own test drivers becomes the limiting factor. Tesla is going a different way: Thanks to sensors that are installed by default, customers gather the data with their vehicles. This allows Tesla to access an almost unlimited amount of data, including the ‘rare events’ mentioned at the beginning.
Data Annotation: In the next step, the selected data has to be annotated. The result: high-precision, detailed object lists. Although offline annotation (as opposed to real-time detection in the vehicle while driving) can make use of a variety of tools and state-of-the-art technology with almost unlimited computing power, a relevant part of the work still has to be done manually. In the above example, even with 95% automation, 359,964,000 objects (5% of 7,199,280,000) still have to be annotated manually. Various automation strategies are used to reduce the level of manual work as much as possible and to achieve the necessary throughput for the specified quality requirements.
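How strongly the manual workload depends on the automation rate can be checked with a short sketch. The function name and the list of rates are illustrative assumptions; the total number of objects is taken from the calculation example.

```python
def manual_objects(total_objects: int, automation_rate: float) -> int:
    """Objects left for manual annotation at a given automation rate (0..1)."""
    if not 0.0 <= automation_rate <= 1.0:
        raise ValueError("automation_rate must be between 0 and 1")
    return round(total_objects * (1.0 - automation_rate))


if __name__ == "__main__":
    total = 7_199_280_000  # objects in the calculation example
    for rate in (0.0, 0.50, 0.90, 0.95, 0.99):
        print(f"{rate:>4.0%} automation: {manual_objects(total, rate):>13,} manual objects")
```

Even small gains at the upper end of the automation range remove tens of millions of objects from the manual queue, which is why the automation strategies described next matter so much.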
Regression, a term used in annotation, refers to the accuracy of fit (“box tightness”) of the annotation. Common labeling requirements assume a tolerance of 2-4 pixels. understand.ai uses deep-learning-based techniques to adjust boxes automatically and precisely.
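A tolerance of this kind can be checked automatically. The following sketch assumes 2-D boxes given as (x_min, y_min, x_max, y_max) in pixels and measures tightness as the largest deviation of any box edge from a reference box. Both the box format and this per-edge definition are assumptions for illustration; actual labeling specifications may define the tolerance differently.

```python
def max_edge_deviation(annotated, reference):
    """Largest absolute difference between corresponding box edges, in pixels.

    Both boxes are (x_min, y_min, x_max, y_max) tuples.
    """
    return max(abs(a - r) for a, r in zip(annotated, reference))


def is_tight(annotated, reference, tolerance_px=2.0):
    """True if every edge of the annotated box lies within the tolerance."""
    return max_edge_deviation(annotated, reference) <= tolerance_px


if __name__ == "__main__":
    reference = (100, 50, 200, 150)
    print(is_tight((101, 50, 198, 151), reference, tolerance_px=2))  # within 2 px
    print(is_tight((100, 50, 205, 150), reference, tolerance_px=4))  # right edge off by 5 px
```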
Automatic object detection can help save time in the localization and identification of objects. Especially in a 3-D space, localizing objects far away from the sensor with only low lidar coverage can take a long time. However, this technology is not yet robust enough to do without manual review. For certain object properties, such as turn signals, brake lights, traffic signs, hoods etc., understand.ai also uses AI-based systems to save a considerable amount of time and improve quality.
Interpolation, extrapolation, propagation
A high degree of automation can be achieved by using interpolation, in which only keyframes are annotated manually; intermediate frames are annotated automatically. Objects that appear or disappear on intermediate frames are a particular challenge. Standard interpolation methods are of limited use here because they are too imprecise. Model-based interpolation, extrapolation and propagation methods are one approach to a solution. With these methods, the algorithms learn typical movement patterns for object classes and derive natural movement sequences from them. The interpolation rates can thus be increased significantly.
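To make the keyframe idea concrete, the sketch below shows the simplest standard method: linear interpolation of box coordinates between manually annotated keyframes. It is exactly the kind of baseline described above as too imprecise, and it does not represent the model-based methods, which are not shown here. The data layout (a dictionary mapping frame numbers to coordinate tuples) and the function name are illustrative assumptions.

```python
def interpolate_boxes(keyframes):
    """Linearly interpolate box coordinates between annotated keyframes.

    keyframes maps frame number -> tuple of box coordinates,
    e.g. (x_min, y_min, x_max, y_max). Returns a dict that covers every
    frame from the first to the last keyframe.
    """
    frames = sorted(keyframes)
    result = {}
    for start, end in zip(frames, frames[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        for frame in range(start, end):
            t = (frame - start) / (end - start)  # 0.0 at start, approaches 1.0 at end
            result[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    if frames:
        # The last keyframe is not covered by the loop above.
        result[frames[-1]] = tuple(float(v) for v in keyframes[frames[-1]])
    return result


if __name__ == "__main__":
    # Two keyframes, four frames apart: only 2 of 5 frames are annotated manually.
    boxes = interpolate_boxes({0: (0, 0, 10, 10), 4: (8, 0, 18, 10)})
    for frame, box in sorted(boxes.items()):
        print(frame, box)
```

Linear interpolation assumes constant motion between keyframes, so it drifts off the object whenever a vehicle brakes, accelerates or turns. That is the gap the model-based methods are meant to close, allowing keyframes to be spaced further apart.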
A crucial component for an annotation project is the workflow management system. It must be sufficiently versatile to cover a wide range of project requirements. understand.ai uses individual modules to create workflows of any complexity. These modules can either result in manual work packages or perform tasks automatically.
The system must also be highly available to ensure the uninterrupted flow of data through the annotation production line. Any system downtime would result in idle time for the 950 labeling experts in the example. In addition, it must have mechanisms to distribute the work to the labeling crew as intelligently as possible. This ensures that the team is working on the right tasks at the right time to move the project forward.
Increasingly complex driving functions require more and more sophisticated systems to process the associated wave of data promptly, cost-effectively and at the highest quality. Sophisticated driving functions can be brought to series production only with the right data in the right quality and quantity. Systems for large-scale data refinement can help master this challenge.
Daniel Rödler is product manager at understand.ai, a dSPACE company launched in 2017 to make artificial intelligence more accessible for real-world applications.