Validation Methods for Automated Driving (VMAD) Informal Working Group: Key Concepts and Work Plan for Subgroup 1a – Traffic Scenarios
Submitted for consideration by Canada
Building on the scenarios work undertaken to date by VMAD, Canada has prepared this paper to help inform discussions by VMAD and SG 1a on the development of a scenarios catalogue for validating automated vehicle (AV) safety. Specifically, the paper seeks to clarify key concepts and issues associated with scenarios development. It also outlines a number of considerations for developing a scenarios catalogue work plan, for further discussion by VMAD and SG 1a.
In order for the international community to maximize the potential safety benefits of AVs, a robust safety validation framework that can be adopted by parties to both the 1958 and the 1998 UN vehicle regulations agreements must be established. Such a framework must provide clear direction for assessing the safety of AVs in a manner that is repeatable, objective and evidence-based, while remaining technology neutral and flexible enough to foster ongoing innovation by the automotive industry.
At this relatively early stage in the development of AVs, much of the existing literature that assesses the current state of AV development uses metrics such as miles/kilometers travelled in real-world test situations without a collision, a legal infraction, or a disengagement by the vehicle’s automated driving system (ADS).
Simple metrics such as kilometers travelled without a collision, legal infraction, or disengagement can be helpful for informing public dialogue about the general progress being made to develop AVs. On their own, however, such measurements do not provide sufficient evidence to the international regulatory community that an AV will be able to safely navigate the vast array of different situations a vehicle could reasonably be expected to encounter.
In fact, some observers have suggested that an AV would have to drive billions of miles in the real world, experiencing an adequate number of situations without an incident, to prove that it has a significantly better safety performance than a human driver (Kalra & Paddock, 2016). Safety validation through such testing would be neither cost-effective nor time-effective, nor would it be feasible to replicate the testing later on.
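The scale of this problem can be illustrated with a simple zero-failure bound. The sketch below is a simplified illustration rather than a reproduction of the analysis in Kalra & Paddock (2016). It assumes that failures occur independently at a constant rate per mile, and the benchmark of 1.09 fatalities per 100 million miles is an assumed human-driver figure used here only for illustration. The function name is hypothetical.

```python
import math


def failure_free_miles(max_failure_rate_per_mile: float, confidence: float) -> float:
    """Miles that must be driven with zero failures to claim, at the given
    confidence, that the true failure rate is below max_failure_rate_per_mile.

    Assumes independent failures at a constant per-mile rate, so that the
    probability of no failure in n miles is approximately exp(-rate * n).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if max_failure_rate_per_mile <= 0.0:
        raise ValueError("failure rate must be positive")
    return -math.log(1.0 - confidence) / max_failure_rate_per_mile


# Assumed benchmark, for illustration only: 1.09 fatalities per 100 million miles.
ASSUMED_HUMAN_FATALITY_RATE = 1.09 / 100_000_000

miles = failure_free_miles(ASSUMED_HUMAN_FATALITY_RATE, confidence=0.95)
print(f"{miles / 1e6:.0f} million failure-free miles")
```

Under these assumptions, merely matching the benchmark already requires several hundred million failure-free miles. Demonstrating that an AV is better than the benchmark by some margin requires comparing observed failure rates, which pushes the requirement higher still.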
What is scenario-based validation testing?
A scenario-based approach, by contrast, can help to systematically organize safety validation activities in an economical, objective, repeatable, and scalable manner.
Scenario-based testing, as it applies to AVs, involves reproducing specific real-world situations that exercise and challenge the capabilities of an AV to safely operate in a given operational design domain (ODD)/operational domain (OD). Scenarios include a dynamic driving task (DDT) or sequence of DDTs. The DDT can be planned (e.g., make a left turn) or unplanned (e.g., in response to another vehicle cutting in). Scenarios can also involve a wide range of elements, such as different roadway layouts; interactions with a variety of different types of road users and objects exhibiting static or diverse dynamic behaviours; and diverse environmental conditions (among many other factors).
The use of scenarios can be applied to different testing methodologies, such as virtual simulation, closed track, and real-world testing. Together these methodologies provide a multifaceted testing architecture, with each methodology possessing its own strengths and weaknesses. As a result, some scenarios may be more appropriately tested using certain test methodologies over others.
For example, real-world testing provides a high degree of environmental fidelity. However, as previously noted, a scenario-based testing methodology using only real-world testing could be costly, time consuming and difficult to replicate, and depending on the scenario and state of the AV’s development, could pose safety risks. Track testing may be a more appropriate method to test new prototype systems and to run higher risk scenarios without exposing other road users to potential harm. Test scenarios can also be more easily replicated in a closed track environment compared to the real world. That said, track scenarios can be difficult to develop and implement, especially if there are numerous or complex scenarios involving a variety of scenario elements.
Virtual simulation testing, by contrast, is more scalable, cost-effective and efficient compared to track or real-world testing, allowing a test administrator to easily create a wide range of scenarios and permutations, including complex scenarios where a diverse range of elements are examined. Simulations, however, may have lower environmental fidelity than the other methodologies. Simulation software may also vary in quality, and tests could be difficult to replicate across different simulation platforms.
Refer to Annex A for additional information regarding the strengths and weaknesses for each test methodology.
Developing Scenarios: Key Concepts & Issues
How are scenarios identified?
Scenario-based validation methods must include an adequate representation of relevant, critical, and complex scenarios to effectively validate an AV. There are a number of approaches for identifying which scenarios could be used to validate the safety of an AV. These include identifying safety-critical scenarios based on an analysis of human driver behaviour, including evaluating naturalistic driving data, analyzing collision data such as law enforcement and insurance companies’ crash databases, as well as analyzing traffic patterns in specific operational design domains (e.g., by recording and analyzing road user behaviour at intersections). Recognizing that situations that challenge a human might not challenge an AV, and vice versa, AV scenario identification also requires assessing the behavioural competencies of an AV in real-world situations (Fraade-Blanar et al., 2018). Real-world testing is also important for identifying unexpected edge cases – scenarios that may be uniquely challenging to that vehicle’s specific ADS.
Classifying scenarios according to their level of abstraction
The amount of information that is included in a scenario can be extensive. For example, the description of a scenario could contain information specifying a wide range of different actions, characteristics and elements, such as how the vehicle interacts with objects (e.g., vehicles, pedestrians), roadways, and environments, as well as pre-planned courses of action and major events that should occur during the scenario. It is therefore critical that a standardized and structured language for describing scenarios is established so that AV stakeholders understand the goal of a scenario, each other’s objectives, and the capabilities of an ADS.
Depending on the context in which a scenario is being described, the level of detail/abstraction may differ. For example, stakeholders involved during the concept phase of identifying a scenario likely only require a high-level description of the situation (e.g., that an ego vehicle will merge onto a two-lane highway). Whereas, during the implementation/testing phase of a scenario, a stakeholder would likely require more specific measurements (e.g., the precise width of the highway, the speed of the vehicle).
One approach that researchers have established for developing a standardized and structured language for describing scenarios, which also incorporates different levels of abstraction/detail, is to classify scenarios into three categories: functional, logical, and concrete scenarios.
Functional Scenario: Scenarios with the highest level of abstraction, outlining the core concept of the scenario, such as a basic description of: the ego vehicle’s actions; the interactions of the ego vehicle with other road users and objects; roadway geometry, and other elements that compose the scenario (e.g. environmental conditions etc.). This approach uses accessible language to describe the situation and its corresponding elements/parameters (Menzel, Bagschik, & Maurer, 2018). Refer to figure 1 for an example of a functional scenario.
Logical Scenario: Building off the elements/parameters identified within the functional scenario, developers generate a logical scenario by selecting value ranges or probability distributions for each element/parameter within a scenario (e.g., the possible width of a lane in meters). The logical scenario description covers all elements and technical requirements necessary to implement a system that solves these scenarios (Menzel, Bagschik, & Maurer, 2018). Refer to figure 1 for an example of a logical scenario.
Concrete Scenarios: Concrete scenarios are established by selecting specific values for each element/parameter. This step ensures that a specific test scenario is reproducible. In addition, for each logical scenario with continuous ranges, any number of concrete scenarios can be developed, helping to ensure a vehicle is exposed to a wide variety of situations (Menzel, Bagschik, & Maurer, 2018). Refer to figure 1 for an example of a concrete scenario.
Figure 1. Examples of a scenario at different stages of its development (Pegasus, 2018).
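The relationship between the three levels can be sketched in code. The example below is illustrative only: the scenario description, parameter names, and value ranges are hypothetical, and uniform sampling is an assumption (a logical scenario may equally specify other probability distributions).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterRange:
    """Closed interval of admissible values for one scenario parameter."""
    low: float
    high: float
    unit: str

    def sample(self, rng: random.Random) -> float:
        # Assumption: values are drawn uniformly from the range.
        return rng.uniform(self.low, self.high)


# Functional scenario: plain-language description with no numeric values.
functional = "Ego vehicle follows a slower lead vehicle in its lane on a two-lane highway."

# Logical scenario: one value range per parameter (all ranges are hypothetical).
logical = {
    "lane_width": ParameterRange(3.0, 3.75, "m"),
    "ego_speed": ParameterRange(80.0, 130.0, "km/h"),
    "lead_vehicle_speed": ParameterRange(0.0, 80.0, "km/h"),
}


def make_concrete(logical_scenario: dict, seed: int) -> dict:
    """Select one specific value per parameter. A fixed seed makes the
    resulting concrete scenario reproducible."""
    rng = random.Random(seed)
    return {name: round(rng_range.sample(rng), 2)
            for name, rng_range in logical_scenario.items()}


concrete_a = make_concrete(logical, seed=1)
concrete_b = make_concrete(logical, seed=2)
print(functional)
print(concrete_a)
print(concrete_b)
```

Each seed yields a different concrete scenario from the same logical scenario, while re-using a seed reproduces the same test case exactly.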
What information should be included in a scenario?
In order to ensure that AV stakeholders are considering including the appropriate elements/parameters in a scenario, researchers have proposed models for identifying and organizing the information. For instance, Pegasus (2018) have grouped these elements according to the two entities of traffic: the vehicle with the ADS and the traffic environment. The traffic environment or the traffic scenario contains several characterizing factors that can be split into six layers of a scenario (Pegasus, 2018):
1) Street layout and condition of the surface;
2) Traffic guidance infrastructure (e.g. signs, barriers and markings);
3) Overlay of topology and geometry for temporary construction sites;
4) Road users and objects, including interactions based on maneuvers;
5) Environment conditions (e.g. weather and daytime), including their influence on layers 1 to 4; and
6) Digital information (e.g. vehicle to everything information, digital map).
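The six layers listed above can serve as a simple checklist when drafting a scenario description. The following is a minimal sketch under stated assumptions: the identifier names and the example scenario content are hypothetical, and only the layer numbering follows the list above.

```python
from enum import IntEnum


class ScenarioLayer(IntEnum):
    """The six scenario layers, numbered as in the list above."""
    STREET_LAYOUT = 1
    TRAFFIC_INFRASTRUCTURE = 2
    TEMPORARY_OVERLAY = 3
    ROAD_USERS_AND_OBJECTS = 4
    ENVIRONMENT = 5
    DIGITAL_INFORMATION = 6


# Hypothetical draft scenario: layers 3 and 6 have not been described yet.
draft_scenario = {
    ScenarioLayer.STREET_LAYOUT: ["two-lane highway", "dry asphalt"],
    ScenarioLayer.TRAFFIC_INFRASTRUCTURE: ["lane markings", "speed limit sign"],
    ScenarioLayer.ROAD_USERS_AND_OBJECTS: ["lead vehicle cutting in"],
    ScenarioLayer.ENVIRONMENT: ["daytime", "light rain"],
}


def missing_layers(elements: dict) -> list:
    """Return the layers for which the description lists no elements."""
    return [layer for layer in ScenarioLayer if not elements.get(layer)]


for layer in missing_layers(draft_scenario):
    print(f"Layer {layer.value} ({layer.name}) is not described")
```

A check of this kind does not judge whether a layer is relevant to a given scenario; it only flags layers that the author has not yet addressed.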
Considerations for establishing a scenarios database work plan
To support the development of a global regulatory framework for AVs, the safety of AVs must be validated using a common, minimum baseline of objective, reproducible and repeatable safety scenarios, derived from real-world situations that an AV could foreseeably encounter during its operational lifespan.
For this reason, VMAD, and specifically the SG 1a group, has been tasked with developing a methodology to create a universal catalogue of scenarios for validating AV safety.
For VMAD to move forward with the development of this catalogue, a number of outstanding questions will need to be considered by the working group to develop a formal work plan to achieve this goal. This includes:
1) Determining the scope of the scenarios outlined in the catalogue,
2) Establishing the phases of work to be undertaken, as well as the development of supporting components of the methodology, and
3) Creating a dictionary of terms for describing scenarios and their various elements in a consistent manner.
These and other considerations are discussed in greater detail below.
1) Determining the scope/level of abstraction of the scenarios outlined in the catalogue
As noted earlier in this paper, one method of classifying scenarios is according to three levels of abstraction (functional, logical, and concrete). Functional scenarios are those with the highest level of abstraction outlining the core concept of the scenario, including a basic description of: the ego vehicle’s actions; the interactions of the ego vehicle with other road users and objects; roadway geometry, and other elements that compose the scenario (e.g. environmental conditions etc.).
At this early stage in the development/implementation of the VMAD safety validation methodology, SG 1a may wish to consider focusing its attention on developing functional scenarios for the universal catalogue. By limiting the scope of work to functional scenarios at this time, SG 1a would not define specific parameter ranges (logical scenarios) or specific values for the scenario elements (concrete scenarios) that are tested via simulation, track and real-world testing.
2) Taking a phased approach to developing the scenarios catalogue
It can be difficult to capture and organize the scenarios that need to be considered when validating the safety of a vehicle. To help tackle this problem, VMAD should consider developing the scenarios catalogue in phases, such as organizing its work plan according to different ODD/OD types, while also considering complex situations and edge cases.
Operational Design Domains/Operational Domains:
The market deployment of ADS is expected to involve a diverse range of systems, with different ODDs, such as highway driving, urban driving, low-speed driving in controlled settings, etc.
As a result, VMAD could develop the scenarios catalogue in phases, organizing its work plan according to different OD types. For example, VMAD could consider focusing its attention at first on limited-access highway scenarios. This would also allow VMAD to leverage and build upon existing scenarios work undertaken to date by UNECE for L2 (ALKS) systems, now adapting and expanding upon these in the context of L3-5 systems. Other categories of scenarios could include ODs such as other highway types (e.g., rural highways), urban driving environments, automated valet parking systems, and controlled environments (e.g., for low-speed vehicle use cases), among others (Nowakowski et al., 2014).
In line with the OD approach, the group could also consider other categories of scenarios such as complex situations that are reasonably foreseeable for an AV to encounter, including
- construction zones,
- accident scenes,
- interactions with law enforcement and first responders, and
- encountering objects or other road users on the road, including those exhibiting unexpected behaviours that may violate traffic rules.
It should be noted that some literature on AVs refers to these more complex scenarios as edge cases. However, as these are reasonably foreseeable events that are known to be challenging for a wide range of ADS under development, it is arguably more appropriate to treat them as their own scenario subsets within a VMAD catalogue that AV developers should be expected to thoroughly address as part of the development of their systems.
In addition to examining categories of complex scenarios, VMAD should also consider what role it might play in providing guidance on truly rare “edge cases” and incorporating known edge case scenarios experienced by developers into a VMAD catalogue. An edge case is a rare situation that still requires specific design attention for it to be dealt with by the AV in a reasonable and safe way. The quantification of “rare” is relative, and generally refers to situations or conditions that will occur often enough in a full-scale deployed fleet to be a problem but may not have been captured in the design process. Edge cases can be individual unexpected events, such as the appearance of a unique road sign, or an unexpected animal type on a highway.
Any scenarios, guidance, or procedures for considering edge cases would of course need to recognize that many edge cases would be unique to specific ADS. However, sharing experiences with such edge cases, where possible, may still provide a useful point of reference for validating the safety of other systems.
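One way to picture a phased work plan is as a catalogue whose entries are tagged by OD and by category, then grouped into phases. The sketch below is purely illustrative: the entry titles, the category labels ("nominal", "complex", "edge case"), and the phase order are hypothetical and do not represent decisions by VMAD.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    title: str
    operational_domain: str
    category: str  # hypothetical labels: "nominal", "complex", "edge case"


# Hypothetical phase order; the actual sequencing would be decided by VMAD.
PHASE_ORDER = ["limited-access highway", "rural highway", "urban"]

entries = [
    CatalogueEntry("Merge onto highway", "limited-access highway", "nominal"),
    CatalogueEntry("Construction zone with lane closure", "limited-access highway", "complex"),
    CatalogueEntry("Police officer directing traffic", "urban", "complex"),
    CatalogueEntry("Unexpected animal type on roadway", "rural highway", "edge case"),
]


def plan_phases(catalogue, phase_order):
    """Group entries by operational domain and return them in phase order."""
    by_domain = defaultdict(list)
    for entry in catalogue:
        by_domain[entry.operational_domain].append(entry)
    return [(domain, by_domain[domain]) for domain in phase_order if domain in by_domain]


for phase_number, (domain, items) in enumerate(plan_phases(entries, PHASE_ORDER), start=1):
    print(f"Phase {phase_number}: {domain}")
    for item in items:
        print(f"  [{item.category}] {item.title}")
```

Tagging complex situations and edge cases within each OD, rather than as a separate phase, is one possible design choice; treating them as their own subsets across ODs would be an equally valid alternative.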
3) Developing a common dictionary of scenarios terms
As previously noted, developing a common dictionary of scenario terms is a critical step towards ensuring AV stakeholders have a consistent understanding of the goal of a scenario, each other’s objectives, and the capabilities of an ADS.
In order to facilitate sharing and replicating scenarios internationally, VMAD should consider developing a common language for describing them. For instance, VMAD could establish common terminology to describe scenarios at their various levels of abstraction/detail, starting with terminology for functional scenarios, then moving toward logical and concrete scenarios. For each of these levels of abstraction, the dictionary should include terms for the various parameters/elements, including the technical terminology and metrics. Throughout this process, SG 1a should work with the other sub working groups as well as FRAV to determine best practices. For instance, the simulation/audit group may identify best practices that existing simulation software uses to describe scenarios at various levels of abstraction.
4) Developing best practices to inform and maintain the scenarios catalogue
Given the many approaches that can be used to identify scenarios, VMAD could consider documenting best practices and/or developing a formal strategy to inform how it will go about incorporating scenarios within the catalogue as well as updating it, based on experiences observed through real-world deployments of AVs.
Specific subsets of scenarios, such as those dealing with construction zones, accident scenes, and interactions with law enforcement and emergency personnel, as well as practices for identifying edge cases that should be accounted for in the catalogue, may require concerted attention from the group. In some cases, this may require developing special procedures or coordinating research where there are evidence gaps.
5) Developing a VMAD engagement strategy to inform catalogue development
Although VMAD is composed of a diverse membership of 1958 and 1998 agreement parties and stakeholders that are active in AV development, an engagement strategy that facilitates VMAD collaboration with external groups (such as international standards setting bodies, academia etc.) who are active in developing scenarios should also be considered. This could help to ensure that global expertise and existing scenarios work is leveraged where appropriate, while also minimizing the workload burden placed on individual VMAD members to support the catalogue’s development.
Conclusion/ Next Steps
As outlined throughout this paper, there are a number of questions for VMAD to consider as part of the development of a formal work plan for establishing a scenarios catalogue, including:
1) What is the scenario catalogue’s scope?
2) What are the phases of work to be undertaken (e.g., develop scenarios in phases based on ODs, complex situations, and edge cases)?
3) How are scenarios identified?
4) What are the supporting components of the methodology (e.g., a dictionary of terms for describing scenarios and their various elements in a consistent manner)?
5) What are the best practices for informing/updating the scenarios catalogue?
Throughout the process of addressing these questions and establishing a work plan, SG 1a should work with the other sub working groups and FRAV to determine how the approach will support VMAD’s goal of developing a new assessment/test method for automated driving.
Annex A: Strengths and weaknesses of each test methodology
Simulations: use a virtual environment with virtual agents to generate knowledge about an ADS’s behaviour without the need for a physical vehicle in the real world (Thorn et al., 2018).
Strengths:
• Controllability – Simulation affords an unmatched ability to control many aspects of a test.
• Predictability – Simulation is designed to run as specified, so there is little uncertainty as to how the test will run.
• Repeatability – Simulation allows a test to be run many times in the same fashion, with the same inputs and initial conditions.
• Scalability – Simulation allows for generation of a large number and type of scenarios.
• Efficiency – Simulation includes a temporal component, which allows it to be sped up faster than real time so that many tests can be run in a relatively short amount of time.
• Cost effectiveness – The costs of running a virtual simulation using a computer are less than using a physical vehicle and objects from the real world, which could be damaged during testing.
• Safety benefits – Simulation allows a test administrator to virtually evaluate how an AV will react in situations with objects and individuals without putting them at risk.
• Can be leveraged to inform testing requirements and prioritize test scenarios for additional testing using other techniques.
• Can easily allow fault injection to test failure modes and the system’s responses to those failures.
Weaknesses:
• It is difficult to model systems and physical properties with full fidelity.
• It is difficult to capture all the situations that a vehicle may be exposed to. Therefore, its environmental validity is lower.
Test Tracks: use a closed-access testing ground with real obstacles or obstacle surrogates to test a production-level vehicle using actual sensors and software running on target platforms (Thorn et al., 2018).
Strengths:
• Controllability – Track testing allows for control over many of the test variables, including certain aspects of ODD and object and event detection and response (OEDR).
• Improved fidelity – Track testing involves functional, physical ADS and lifelike obstacles and environmental conditions.
• Reproducibility – Track testing scenarios can be replicated in different locations.
• Repeatability – Track testing allows for multiple iterations of tests to be run in the same fashion, with the same inputs and initial conditions.
• Road testing is an inefficient way to observe rare events manifesting by chance. Closed-course testing can accelerate exposure to known rare events by setting them up as explicitly designed test scenarios.
Weaknesses:
• Prolonged and costly – Track testing can take a significant amount of time to set up and execute, resulting in elevated costs.
• Limited variability – Track testing facility infrastructure and conditions may be difficult to modify to account for a wide variety of test variables (e.g., ODD conditions).
• Personnel and equipment needs – Track testing may need specialized test equipment (e.g., obstacle objects, measurement devices, safety driver).
• Potentially hazardous – Track testing with physical vehicles and real obstacles presents a potentially uncertain and hazardous environment to the test participants (e.g., safety driver and experiment observers).
• Conditions, such as weather and ambient lighting, cannot necessarily be controlled.
Real World/Open Road Testing: uses public roads to support testing and evaluation of ADS (Thorn et al., 2018).
Strengths:
• The situations are extendable (i.e., there is a wide variety of scenarios with diverse conditions).
• High ecological validity.
Weaknesses:
• Lack of controllability – Public-road scenarios do not afford much, if any, control over OD and OEDR conditions.
• Lack of reproducibility – Public-road scenarios are difficult to replicate exactly in different locations.
• Lack of repeatability – Public-road scenarios are difficult to repeat exactly over multiple iterations.
• Limited scalability – Public-road scenarios may not scale up sufficiently.
• It is not practical to certify connected and automated vehicles through road tests alone. Road testing may serve best as a final step in the safety validation process.
• Expensive and time-consuming.
References
Fraade-Blanar, L., Blumenthal, M.S., Anderson, J.M., & Kalra, N. (2018). Measuring automated vehicle safety: Forging a framework. RAND Corporation.
Kalra, N., & Paddock, S.M. (2016). Driving to safety: How many miles would it take to demonstrate autonomous vehicle reliability? RAND Corporation.
Czarnecki, K. (2018). Automated driving system (ADS) task analysis: Part 2: Structured road maneuvers. Waterloo Intelligent Systems Engineering (WISE) Lab: University of Waterloo.
Menzel, T., Bagschik, G., & Maurer, M. (2018). Scenarios for development, test and validation of automated vehicles. Institute of Control Engineering, Technische Universität Braunschweig.
Najm, W.G., Smith, J.D., & Yanagisawa, M. (2007). Pre-crash scenario typology for crash avoidance research. National Highway Traffic Safety Administration.
Nowakowski, C., Shladover, S., Chan, C.-Y., & Tan, H.-S. (2014). Development of California regulations to govern testing and operation of automated driving systems. Journal of the Transportation Research Board, 137-144.
Pegasus (2018). The Pegasus method. Retrieved from https://www.pegasusprojekt.de/en/pegasus-method
Thorn, E., Kimmel, S., & Chaka, M. (2018). A framework for automated driving system testable cases and scenarios. National Highway Traffic Safety Administration.
 ODD is defined by SAE J3016 as the specific conditions under which a given driving automation system or feature thereof is designed to function, including, but not limited to, driving modes (SAE, 2018).
OD refers to the foreseeable conditions vehicles may reasonably be expected to encounter when in use. ADS vehicles must be prepared to respond to these conditions (which may include not being operative or transferring control to the driver).
 For example, the United States Department of Transportation’s National Highway Traffic Safety Administration (NHTSA) analyzed data to identify the most common pre-crash scenarios for two light-vehicles (Najm et al., 2007).
An edge case is a rare situation that still requires specific design attention for it to be dealt with by the AV in a reasonable and safe way. The quantification of “rare” is relative, and generally refers to situations or conditions that will occur often enough in a full-scale deployed fleet to be a problem but may not have been captured in the design process. Edge cases can be individual unexpected events, such as the appearance of a unique road sign, or an unexpected animal type on a highway. It should be noted that media reporting on automated vehicles often confuses edge cases with corner cases. A corner case is a scenario at the very edges of the ODD, involving situations that occur outside of normal operating parameters. Specifically, corner cases present themselves when multiple environmental variables or conditions occur simultaneously at extreme levels within a parameter range. For example, a situation that an AV may face outside its operating parameters is an iced-over road, with a low sun angle, high winds, and a pedestrian in the roadway.
In addition to determining which scenarios are appropriate to be tested via simulation, track and real-world testing, the determination of parameter ranges and specific values for the scenario elements to be applied during testing may be more appropriately carried out by VMAD SG 2a and SG 2b. This would include identifying appropriate corner cases that test the extreme parameter ranges of different scenario elements. In addition, assigning specific values to parameters may require engagement with FRAV to establish the functional requirements.