Drift Improvement with Reinforcement Training of Inertial Sensors

Analyzing the use of Reinforcement Learning (RL) to extend the holdover time of inertial sensors in the absence of a Global Navigation Satellite System (GNSS).

All of the hardware used for data collection is shown here, including the NUC data logging computer, Kearfott, SBG sensor, KVH sensor and compass. (Image: NIWC)

The main objective of the Drift Improvement through Reinforcement Training – Inertial sensors (DIRT-I) project is to extend the holdover time of inertial sensors in the absence of a Global Navigation Satellite System (GNSS) through the use of Reinforcement Learning (RL) or training. For the purposes of this document, the acronyms GNSS and GPS (Global Positioning System) are used interchangeably. This report continues the year-one effort that was reported on previously. The year-two effort, and this report, focuses on the use of different inertial sensors with a wide range of performance specifications.

The goal was to determine whether the RL system offered similar performance regardless of the inertial sensor being used, or whether the inertial sensor’s performance limited the amount of improvement the RL system could offer. To answer this question, the same setup described in the first-year report was used for this work. The main differences are that data was logged from multiple inertial sensors (instead of one) and that a single RL algorithm was used (rather than comparing multiple algorithms).

Inertial sensors measure the acceleration (accelerometer) and angular velocity (gyroscope) of the platform on which the sensor is mounted. For navigation purposes, it is common to have both types of inertial sensors on each of the X, Y and Z axes. The performance of inertial sensors varies widely. It is directly linked to the cost of the sensor and, to a lesser extent, the technology used to build the sensor. A supporting computer and associated software will also affect the system’s performance and cost. Generally speaking, inertial sensors based on microelectromechanical systems (MEMS) tend to have the lowest cost and worst performance. At the other end of the spectrum are ring laser gyro (RLG) based inertial sensors and fiber optic gyro (FOG) based sensors. Traditionally, RLG sensors offered better performance than FOG sensors, but this is no longer always the case. This discussion is limited to standard commercial off-the-shelf (COTS) products.
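To illustrate why sensor quality matters for holdover time, the following minimal sketch integrates a constant accelerometer bias twice to obtain a position error. It is not taken from the report: the function name, the bias value and the time step are assumptions chosen for illustration, and a real inertial error model includes many more terms (gyroscope bias, noise, scale factor and alignment errors).

```python
def position_drift(bias_mps2, duration_s, dt_s=0.01):
    """Integrate a constant accelerometer bias twice.

    Returns the accumulated position error in meters on a single axis.
    All parameters are illustrative and not values from the report.
    """
    velocity = 0.0  # velocity error, m/s
    position = 0.0  # position error, m
    steps = int(round(duration_s / dt_s))
    for _ in range(steps):
        velocity += bias_mps2 * dt_s   # first integration: bias -> velocity error
        position += velocity * dt_s    # second integration: velocity -> position error
    return position


if __name__ == "__main__":
    # Hypothetical bias of 0.01 m/s^2 without any external correction.
    for seconds in (60, 120, 240):
        print(seconds, "s ->", round(position_drift(0.01, seconds), 1), "m")
```

Because the error grows roughly with the square of elapsed time, doubling the GNSS outage in this simplified model approximately quadruples the position error.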

As in the previous effort, the data collections were divided into two types: GNSS-enabled (Training Mode) and GNSS-denied (Testing Mode). In Training Mode, the GPS receiver provided an input to the inertial sensors being tested, and their raw inertial measurements were recorded. For the purpose of this report, inertial measurements refer to acceleration and angular velocity (typically on all three axes). In Testing Mode, the inertial sensors did not receive any input from the GPS receiver.

During post-processing, the RL system was trained using the data recorded in Training Mode, then tested using the Testing Mode data to emulate a GNSS-denied situation. The position solutions from the Kalman filter for the GNSS-denied and RL-aided inertial measurements were then compared to the true GPS positions for each data point. This process was repeated for each inertial sensor, and the positional errors were compared to determine whether the improvement due to the RL system was proportional to the inertial sensors’ performance, or whether the RL system could offer greater improvement for lower-performing sensors.
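The comparison against true GPS positions can be sketched as follows. This is a minimal example under stated assumptions, not the report's processing code: the function names are hypothetical, positions are assumed to be latitude and longitude in degrees, and a spherical Earth with a local flat approximation is used, which is only reasonable over short distances.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)


def horizontal_error_m(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Approximate horizontal distance in meters between a solution and a GPS reference."""
    d_lat = math.radians(lat_deg - ref_lat_deg)
    d_lon = math.radians(lon_deg - ref_lon_deg)
    mean_lat = math.radians((lat_deg + ref_lat_deg) / 2.0)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(mean_lat)
    return math.hypot(north, east)


def rms_error_m(solution, truth):
    """Root-mean-square horizontal error over paired (lat, lon) samples."""
    if not solution or len(solution) != len(truth):
        raise ValueError("solution and truth must be non-empty and equally long")
    squared = [
        horizontal_error_m(s[0], s[1], t[0], t[1]) ** 2
        for s, t in zip(solution, truth)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Running `rms_error_m` once on the GNSS-denied solution and once on the RL-aided solution, both against the same GPS track, would give one error figure per sensor and per mode that can then be compared across sensors.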

Unlike in the first year, only the trust region policy optimization (TRPO) algorithm was used for this effort. TRPO was found to be the best overall performing algorithm during the first year of this effort. It focuses on the local optimization of the policy, with an approach that attempts to increase performance by constraining each policy update to a trusted region rather than taking the unconstrained gradient steps used in other algorithms. TRPO is often used with robot localization and video games.
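The trust-region idea can be shown with a small sketch. This is not the full TRPO algorithm (which computes its update direction with a conjugate-gradient solve) and it is not the DIRT-I implementation. It only illustrates the acceptance test: a candidate policy update is kept if it improves the expected advantage while the Kullback-Leibler (KL) divergence from the old policy stays under a limit, and the step is shrunk otherwise. The single-state discrete policy, the function names and the `max_kl` value are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert logits to a discrete probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def expected_advantage(probs, advantages):
    """Surrogate objective for one state: sum over actions of pi(a) * A(a)."""
    return sum(p * a for p, a in zip(probs, advantages))


def trust_region_step(logits, direction, advantages,
                      max_kl=0.01, backtrack=0.5, max_tries=10):
    """Backtracking line search along `direction` inside a KL trust region.

    Returns the new logits, or a copy of the old ones if no step is accepted.
    """
    old_probs = softmax(logits)
    old_value = expected_advantage(old_probs, advantages)
    step = 1.0
    for _ in range(max_tries):
        candidate = [l + step * d for l, d in zip(logits, direction)]
        new_probs = softmax(candidate)
        within_region = kl_divergence(old_probs, new_probs) <= max_kl
        improved = expected_advantage(new_probs, advantages) > old_value
        if within_region and improved:
            return candidate
        step *= backtrack  # shrink the step and try again
    return list(logits)
```

For example, `trust_region_step([0.0, 0.0, 0.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0])` rejects the full step because it moves the policy too far, and accepts a smaller step that still shifts probability toward the action with the positive advantage.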

This report shows that the DIRT-I system can be used with a wide range of inertial sensor systems with little effort beyond correctly formatting the inertial data from the sensor. This report also shows that the quality and performance of the inertial sensor do affect how well the DIRT-I system is able to improve the system’s position solution. Finally, this report illustrates that the RL system is being trained as expected, but the overall performance is limited, especially when more training data is available.

This may indicate that the RL algorithm being used was not dynamic enough for this task, or perhaps that the observation space was too narrow. There is potential to use the DIRT-I system to improve the positional error of inertial sensors without access to corrections from external sensors such as GNSS. However, several changes to the system, and much more research into the effectiveness of the RL algorithms used, would be required.

This work was performed by Eric Bozeman, Minhdao Nguyen, Jeffrey Onners, and Mohammad Alam for the Naval Information Warfare Center. For more information, download the Technical Support Package (free white paper) at mobilityengineeringtech.com/tsp under the Sensors category. NIWC-0001
