Sharper Testing Delivers Better AV Performance

As autonomous vehicle development programs grow in complexity, testing complexity and efficiency have become focal points. The Certus project offers a data-driven approach to scenario selection and risk analysis, increasing developers’ confidence in the performance of the vehicle, reducing testing time, and accelerating validation.

Certus’ progression metrics include the progress rate (PR), mission time efficiency (MTE) and rule compliance (RC) (HORIBA MIRA)

As self-driving car technology moves towards commercial deployment, engineers encounter a scaling challenge that mirrors issues seen in other complex safety-critical systems: validation. Demonstrating that autonomous vehicles operate safely and robustly across diverse real-world conditions requires testing capabilities that challenge conventional automotive validation methods.

Project Certus delivers a targeted response to this challenge. Established in 2023 with funding from the Centre for Connected & Autonomous Vehicles, this research and development project focuses on enhancing both the effectiveness and depth of autonomous vehicle testing while maintaining rigorous safety standards. The collaboration brings together HORIBA MIRA as the lead organization, working alongside Polestar, IPG Automotive, the Connected Places Catapult, and Coventry University. Recent developments have centred on advancing how confidence in system performance is quantified and integrated into engineering decision-making. Certus’ core principles include extracting greater understanding from reduced test quantities, facilitating superior engineering judgment, and minimizing both development timelines and expenses associated with deploying reliable autonomous vehicle technology.

The problem of coverage in scenario-based testing

Aaron Mandalia, HORIBA MIRA’s technical sales lead for connected and autonomous vehicles (CAV). (HORIBA MIRA)

Validating autonomous systems is not simply a matter of clocking more and more miles. Unlike conventional vehicles, where safety-critical features are relatively well-contained, automated driving functions must demonstrate dependable performance across a highly variable operational design domain (ODD). This includes everything from complex traffic patterns to shifting weather conditions and nuanced driver interactions.

Given the number of permutations involved, it becomes impractical to rely on large-scale scenario sampling alone. Even well-resourced teams can struggle to generate meaningful test coverage using traditional methods.
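The scale of that permutation problem is easy to illustrate. The sketch below is purely illustrative: the ODD factors and the number of levels per factor are invented for this example, not drawn from the Certus project, but they show how coverage requirements grow multiplicatively.

```python
# Illustrative only: hypothetical ODD factors and level counts, invented
# for this sketch. Scenario permutations grow as the product of levels.
from math import prod

odd_factors = {
    "weather": 6,            # e.g. clear, rain, fog, snow, ...
    "lighting": 4,           # day, dusk, night, low-sun glare
    "road_type": 5,          # motorway, urban, rural, junction, roundabout
    "traffic_density": 5,
    "actor_behaviours": 10,  # cut-ins, hard braking, jaywalking, ...
    "speed_bands": 6,
}

combinations = prod(odd_factors.values())
print(f"{combinations:,} discrete scenario permutations")  # 36,000
```

Even this toy parameterization, with only six coarse factors, yields tens of thousands of discrete scenarios before any continuous parameters (speeds, gaps, trigger timings) are varied within each one.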

Certus introduces a more targeted strategy. Instead of attempting to test every possible scenario, it focuses testing on areas where there is high uncertainty in system performance. The project has built a toolchain that can identify gaps in scenario coverage, flag areas where uncertainty remains high, and generate new scenarios that are most likely to reveal meaningful performance insights.

Improving efficiency through scenario prioritization

During recent evaluations, the Certus methodology was compared with a conventional statistical approach that generated a broad spread of 2,500 scenarios. This conventional model delivered a 56% confidence rating in the performance of the system under test. Using the Certus toolchain, a sequence of just 550 intelligently selected scenarios raised that confidence rating to 77%.

The approach begins with a randomized sample to establish an initial understanding of system performance. Subsequent test batches are then shaped by the results of earlier iterations, with the algorithm selecting scenarios that target the most uncertain areas of system behavior. This process ensures that every additional test makes a meaningful contribution to the process of evaluating system robustness.
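The seed-then-refine loop described above can be sketched in a few lines. This is a minimal illustration, not the Certus algorithm: it uses distance to the nearest already-tested point as a crude stand-in for model uncertainty, and all names (`run_scenario`, the parameter range) are hypothetical.

```python
# A minimal sketch of uncertainty-targeted test selection. Distance to
# the nearest tested point serves as a crude proxy for uncertainty; a
# real toolchain would use a statistical surrogate model instead.
import random

random.seed(0)

def run_scenario(x):
    """Stand-in for executing one test scenario at parameter x."""
    return (x - 0.5) ** 2  # pretend performance score

# 1. A randomized seed batch establishes an initial picture of performance.
tested = [random.random() for _ in range(5)]
results = {x: run_scenario(x) for x in tested}

candidates = [i / 100 for i in range(101)]  # candidate scenario parameters

# 2. Each subsequent test targets the region we currently know least about,
#    so every added scenario contributes new information.
for _ in range(10):
    uncertainty = {c: min(abs(c - t) for t in tested) for c in candidates}
    x_next = max(uncertainty, key=uncertainty.get)  # least-explored point
    results[x_next] = run_scenario(x_next)
    tested.append(x_next)
```

The key design point is the feedback loop: selection of batch *n+1* depends on everything learned from batches 1 through *n*, which is what lets a few hundred scenarios outperform thousands of random ones.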

For developers, the result is significant. With fewer test scenarios, they can achieve a higher level of confidence in system performance. In practical terms, this can reduce total test time by around 40%.

Moving beyond simple performance metrics

Isolated key performance indicators (KPIs) are not enough on their own. To make these results actionable, Certus introduces a set of evaluation tools that look beyond single performance indicators. These tools, referred to as oracles, bring together multiple key performance metrics to assess how the system is behaving from both a safety and operational standpoint.

The safety oracle combines multiple metrics, such as Time To Collision, Time Headway and Lane Offset, to evaluate and provide contextual understanding around the performance of a vehicle system. By using oracles in this way, developers can look beyond simple 'pass' and 'fail' criteria and harness fresh insights that allow them to assess whether a system's response was suitable for the specific circumstances it encountered. In some cases, it may be physically impossible to avoid an incident, but the vehicle system can still mitigate the impact as much as possible, meaning it behaved in the safest way available to it.

Oracles that assess progression, on the other hand, focus on how efficiently a vehicle completes its journey and its compliance with the rules of the road. Progression metrics include the progress rate (PR) towards a goal before the termination of a scenario, mission time efficiency (MTE), which is based on when a vehicle arrives at its destination and how efficient it is in doing so, and rule compliance (RC), which measures how safely the vehicle follows the rules of the road. These oracles help evaluate how well an autonomous system meets the demands of its intended operation.

Instead of operating on a binary pass-fail basis against one benchmark, the system receives evaluations across multiple performance criteria. This approach facilitates more sophisticated analysis and empowers engineers to identify specific improvement opportunities while maintaining balance across different operational characteristics.
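The shape of such a safety oracle can be sketched as follows. The thresholds and the aggregation rule here are assumptions invented for illustration; the article does not specify how Certus weights or combines its metrics.

```python
# A hedged sketch of a safety oracle combining TTC, headway and lane
# offset. All thresholds are assumed values for illustration only.
from dataclasses import dataclass

@dataclass
class ScenarioTrace:
    min_ttc_s: float          # minimum time-to-collision observed (s)
    min_thw_s: float          # minimum time headway observed (s)
    max_lane_offset_m: float  # peak lateral deviation from lane centre (m)

def safety_oracle(trace: ScenarioTrace) -> dict:
    """Return per-metric findings rather than a single pass/fail bit."""
    findings = {
        "ttc": trace.min_ttc_s >= 1.5,            # assumed threshold
        "thw": trace.min_thw_s >= 1.0,            # assumed threshold
        "lane_keeping": trace.max_lane_offset_m <= 0.5,
    }
    findings["overall_safe"] = all(findings.values())
    return findings

verdict = safety_oracle(ScenarioTrace(2.1, 0.8, 0.3))
# Fails on headway alone, so engineers can see *which* criterion
# needs attention rather than just that the scenario "failed".
```

Because the verdict is broken out per metric, a failing run immediately points at the criterion that drove the failure, which is the practical difference from a single pass/fail benchmark.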

Understanding residual risk

A further strand of the Certus project involves determining residual risk, defined as the uncertainty that remains after a given set of tests. This aspect is especially important for developers who need to make decisions about whether to release a system into the market, invest in further development, or alter the scope of its intended use.

Project Certus brings together HORIBA MIRA, Polestar, IPG Automotive, the Connected Places Catapult, and Coventry University. (HORIBA MIRA)

The system approaches this by comparing the outcomes of similar scenarios. When minor variations in scenario parameters trigger significant shifts in system response, this can indicate inconsistency in performance and suggest additional development or validation work is necessary. On the other hand, when performance demonstrates stability across comparable test conditions, engineers can establish that the system operates predictably, giving confidence in the system’s performance.
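The consistency check described above can be sketched as a simple perturbation test. This is a minimal illustration under stated assumptions: the 10% perturbation size, the instability threshold, and the helper names are all invented for this example, not taken from the Certus toolchain.

```python
# A minimal sketch of checking outcome stability across near-identical
# scenarios. Perturbation size and threshold are assumed values.
def outcome_spread(base_params, run_scenario, delta=0.1):
    """Run slightly perturbed variants of a scenario; return the spread
    between the best and worst outcome scores."""
    scores = []
    for factor in (1 - delta, 1.0, 1 + delta):
        perturbed = {k: v * factor for k, v in base_params.items()}
        scores.append(run_scenario(perturbed))
    return max(scores) - min(scores)

def needs_more_validation(spread, threshold=0.2):
    # A large swing from small parameter changes signals inconsistent
    # behaviour and a need for further development or testing.
    return spread > threshold

# Toy example: a scenario whose score varies smoothly with approach speed,
# so small perturbations produce only a small, stable spread.
def score(params):
    return params["approach_speed"] / 100

spread = outcome_spread({"approach_speed": 50.0}, score)
stable = not needs_more_validation(spread)
```

A smooth score-versus-parameter relationship yields a small spread and a "stable" verdict; a cliff-edge response, where a 10% speed change flips the outcome, would produce a large spread and flag the region for deeper investigation.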

This concept is useful not only for simulation-based validation but also for managing physical test programs. In both contexts, Certus brings automation that helps engineers to identify the 'high value' scenarios that need to be targeted. The toolchain also identifies appropriate scenarios for correlation activity to ensure engineers gain confidence in the simulation results.

Knowing what isn’t already known

One of the most valuable insights for developers isn’t what a system can do, but where limitations remain. Quantifying residual risk enables engineers to identify where uncertainty may still remain within the system after a series of tests. This is the insight that enables ‘system readiness’ evaluations to be made.

Certus introduces a set of evaluation tools, called oracles, that look beyond single performance indicators to assess how the system is behaving from both a safety and operational standpoint. (HORIBA MIRA)

Certus also uncovers how predictably a system behaves when exposed to similar, but not identical, scenarios. If small changes in traffic flow, road layout or vehicle dynamics lead to inconsistent responses, that signals a need for further investigation. If outcomes are stable, confidence that the system will perform as expected in the real world grows.

This approach supports a more targeted engineering approach. Teams can decide where to focus development, where additional testing is justified, or where limitations should be enforced within the ODD. In regulated environments, it also gives developers a defensible position backed by evidence.

Benefits for developers

From a commercial perspective, Certus offers a way to accelerate development without increasing risk. By providing clearer insights into where a system is performing well and where uncertainty remains, developers can make more strategic decisions about test planning, resource allocation and product readiness.

There is also a benefit in how Certus enables teams to better justify their decisions. When a product is released, stakeholders – from technical leads to certification bodies – need a defensible position on system safety. Confidence ratings and residual risk metrics provide the evidence base for these conversations.

Driving into the autonomous future

As vehicle autonomy becomes more ambitious, the tools required to support the technology’s development must evolve. The Certus toolchain is designed to scale with those ambitions. Its modular architecture and focus on adaptability make it well suited to support next-generation autonomous systems, including those operating at higher levels of autonomy.

Aaron Mandalia is HORIBA MIRA’s technical sales lead for connected and autonomous vehicles (CAV) and wrote this article for SAE Media.




This article first appeared in the September 2025 issue of Automotive Engineering Magazine (Vol. 12 No. 7).
