Accelerating AV Training Data and Testing

Decoupling from real-time data collection saves time and cost while adding flexibility and quality.

rFpro has developed colored segmentation that enables AI to learn to identify complex street furniture. (rFpro)

Generating reliable training data to support deep learning for autonomous vehicle (AV) artificial intelligence (AI) via real-world recorded scenarios can be expensive, time-inefficient and inflexible. Driving simulation specialist rFpro wants that to change. The company has developed a new approach, decoupled from real time, that it claims delivers more effective, cost-efficient and accelerated AV training and testing.

True motion blur simulation of multi-exposure images by rFpro. (rFpro)

The new approach significantly reduces hardware costs, said Matt Daley, rFpro’s managing director. He said the industry needs to generate high-quality, simulated training data that can complement existing real-world recorded data. “Achieving the necessary quality is notoriously challenging,” Daley told SAE International. “Delivering dependable training data requires several key components, such as the vehicle model, sensor models, traffic simulator and the digital world content to work together, plus a robust simulation process to harness the available computing power.”

Matt Daley said his company’s new approach provides a cost-effective way of creating the same data with “zero errors and 10,000 times quicker than manual annotation.” (rFpro)

He explained that the company’s new system, referred to as “data farming,” removes the established industry dependence on manual, frame-by-frame annotation of test data, which is both time-consuming and error-prone. Many companies in the AV industry, Daley added, use a veritable “army” of people to manually annotate each frame of a video, lidar point or radar return, to identify objects in the scene (other vehicles, pedestrians, road markings, traffic signals) to create training data.

“Manual annotation takes around 30 minutes for each frame and may incur a 10% error rate,” he said. “Our new approach provides a cost-effective way of creating the same data with zero errors and 10,000 times quicker than manual annotation.”
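To put those quoted figures in perspective, the arithmetic can be sketched in a few lines of Python. The 30 minutes per frame, 10% error rate and 10,000-times speedup are the numbers Daley cited; the clip length and frame rate are hypothetical values chosen only for illustration, and reading the error rate as "one frame in ten mislabeled" is an assumption, since the article does not say how it is measured.

```python
# Figures quoted by Daley in the article.
MANUAL_MINUTES_PER_FRAME = 30
MANUAL_ERROR_RATE = 0.10
CLAIMED_SPEEDUP = 10_000


def manual_hours(frames: int) -> float:
    """Hours of manual annotation effort for the given number of frames."""
    return frames * MANUAL_MINUTES_PER_FRAME / 60


def automated_seconds(frames: int) -> float:
    """Seconds needed if annotation really is CLAIMED_SPEEDUP times quicker."""
    return frames * MANUAL_MINUTES_PER_FRAME * 60 / CLAIMED_SPEEDUP


def expected_manual_errors(frames: int) -> float:
    """Frames expected to carry an error, ASSUMING the rate applies per frame."""
    return frames * MANUAL_ERROR_RATE


if __name__ == "__main__":
    # Hypothetical clip: one minute of video from a single camera at 30 frames/s.
    frames = 60 * 30
    print(f"frames:                 {frames}")
    print(f"manual effort (hours):  {manual_hours(frames):.0f}")
    print(f"automated (seconds):    {automated_seconds(frames):.0f}")
    print(f"expected manual errors: {expected_manual_errors(frames):.0f}")
```

On those assumptions, a single minute of single-camera footage represents about 900 person-hours of manual work, against roughly five and a half minutes at the claimed speedup, or about 0.18 seconds per frame.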

Complete datasets

An example of rFpro’s scalable sensor simulation synchronized across channels. (rFpro)
rFpro describes this image as representing a highly accurate 3D model of a real-world location with complex traffic and detailed sensor models. (rFpro)

Daley said that not constraining the simulation to run in real time enables users to build complete datasets that cover the full vehicle system where every sensor is simulated simultaneously using “only limited” hardware. The data is fully synchronized across all sensors, even with what he terms “the most complex hardware designs.” He regards this as essential where sensor fusion is employed to bring together data, “such as that from multiple 8K HDR stereo cameras, lidar and radar sensors at the same time.”

Decoupling the processing activities from real-time operation also removes the need for large numbers of high-end processors, enabling the user to reset the balance between cost, speed and data quality to suit particular priorities. “For engineers, this puts it within a typical departmental budget, rather than requiring senior approval, making high-quality training and test data far more accessible,” Daley explained.

The company’s new approach has been designed to facilitate full scalability, allowing expansion across multiple hardware resources when users are ready to accelerate their data production. The conventional approach of using real vehicles to provide training data for a neural network invariably limits the available data to the weather and traffic conditions prevailing at the time of testing.

But rFpro, which has strong motorsport links via its professional driver-in-the-loop simulator software, states that its system enables the simulation of all possible combinations of weather, lighting, traffic and pedestrians. This would not be possible with physical testing alone. The process also avoids the considerable logistical expense incurred in gathering data from different real-world locations. rFpro’s “digital twin” library delivers to development teams a variety of off-the-shelf models, including city streets, highways, rural and mountain roads.

According to Daley, there are benefits from running either faster or slower than real time where there is no requirement for driver- or hardware-in-the-loop operation. “Running slower than real time permits fewer processors of lower specification to produce the same high-quality data,” he said. Alternatively, simple data, such as that from a single sensor, could be run much faster than real time to produce results more quickly.
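The trade-off Daley describes can be illustrated with a simple back-of-envelope model. The sketch below is not rFpro's software or API: the function name, the real-time factors and the machine counts are all hypothetical, and it assumes throughput scales perfectly linearly with the number of machines, which real systems rarely achieve.

```python
def wall_clock_hours(sim_seconds: float, real_time_factor: float, machines: int = 1) -> float:
    """Estimate the wall-clock hours needed to generate `sim_seconds` of data.

    real_time_factor is the simulated seconds produced per wall-clock second
    on ONE machine: below 1.0 is slower than real time, above 1.0 is faster.
    ASSUMPTION: adding machines scales throughput linearly.
    """
    if sim_seconds < 0:
        raise ValueError("sim_seconds must be non-negative")
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return sim_seconds / (real_time_factor * machines) / 3600


if __name__ == "__main__":
    scenario = 3600.0  # one hour of simulated driving

    # Hypothetical full sensor suite on one modest machine, 20x slower than real time.
    print(f"full suite, 1 machine:   {wall_clock_hours(scenario, 0.05):.2f} h")

    # Same workload spread across ten such machines.
    print(f"full suite, 10 machines: {wall_clock_hours(scenario, 0.05, 10):.2f} h")

    # Hypothetical single sensor running 4x faster than real time.
    print(f"single sensor:           {wall_clock_hours(scenario, 4.0):.2f} h")
```

With these illustrative numbers, one modest machine produces an hour of full-suite data in about 20 hours, ten machines bring that down to about two, and a single simple sensor finishes in 15 minutes. The user, in other words, chooses where to sit between hardware cost and turnaround time.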