Adding Context to Full-Motion Video for Improved Surveillance and Situational Awareness

Today’s intelligence, surveillance, and reconnaissance (ISR) systems integrate state-of-the-art technologies and subsystems into airborne platforms intended to observe, detect, identify, and neutralize threats. Full-motion video (FMV) sensors are essential assets, routinely used on fixed-wing and rotary-wing aircraft, both manned and unmanned, to provide real-time situational awareness with minimal latency.

Ultimately, FMV is invaluable for helping operators answer the what, where, when, and why of a scenario. These sensors, which are easy and relatively inexpensive to integrate and deploy, support mission-critical tasks such as target detection and analysis, including tracking and pattern recognition. FMV is also well suited to long stand-off ranges, at which a target is unlikely to be aware that it is being observed.

Limitations of FMV

While FMV offers many advantages, there is a tradeoff between resolution, field-of-view, and frame rate that has historically limited FMV’s usefulness for some tasks. With a wide field-of-view, there is limited resolution. With a narrow field-of-view, the resolution is much higher but only a very limited “soda straw” view of the world is imaged.
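To put rough numbers on this tradeoff, the sketch below estimates the ground footprint and ground sample distance (GSD) of a nadir-looking camera over flat ground. The 1,920-pixel sensor width, the 3,000 m range, and the two fields of view are illustrative assumptions, not values from any particular system.

```python
import math


def footprint_and_gsd(fov_deg: float, pixels: int, range_m: float) -> tuple[float, float]:
    """Approximate ground footprint width and ground sample distance (GSD).

    Assumes a nadir-looking pinhole camera over flat ground: the footprint width
    is 2 * R * tan(FOV / 2), divided evenly across the pixels (a simplification
    that ignores the slight change in pixel size across the frame).
    """
    width_m = 2.0 * range_m * math.tan(math.radians(fov_deg) / 2.0)
    return width_m, width_m / pixels


# Same hypothetical 1920-pixel-wide sensor at 3,000 m, wide vs. narrow field of view.
for fov in (30.0, 1.0):
    width, gsd = footprint_and_gsd(fov, 1920, 3000.0)
    print(f"FOV {fov:4.1f} deg: footprint {width:8.1f} m, GSD {gsd:.3f} m/pixel")
```

With these assumed numbers, the 30° view covers roughly 1.6 km at about 0.84 m per pixel, while the 1° view covers only about 52 m at under 3 cm per pixel: the "soda straw" in concrete terms.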

[Figures: Example of a Moving Target Indicator; FMV with Contextual ISR Data on Road; Annotations on FMV]

Another option is wide-area motion imagery (WAMI) sensors, also known as wide-area persistent surveillance (WAPS). These sensors provide high resolution and a wide field-of-view, but at a low frame rate (2-3 frames per second) that limits their ability to observe fast motion and human behaviors.
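A simple calculation shows why frame rate matters for fast motion: the distance a target travels between consecutive frames is its speed divided by the frame rate. The vehicle speed and the 30 fps FMV rate below are assumed for illustration; the 2 fps figure is taken from the WAMI range quoted above.

```python
def displacement_per_frame(speed_m_s: float, frame_rate_hz: float) -> float:
    """Distance a target moves between consecutive frames, in metres."""
    return speed_m_s / frame_rate_hz


speed = 50.0 / 3.6  # assumed vehicle speed: 50 km/h expressed in m/s
for label, fps in (("WAMI at 2 fps", 2.0), ("FMV at 30 fps (assumed)", 30.0)):
    print(f"{label}: {displacement_per_frame(speed, fps):.2f} m between frames")
```

At about 50 km/h, the vehicle moves roughly 7 m between WAMI frames but less than 0.5 m between FMV frames, which makes frame-to-frame association of fast movers much harder at WAMI rates.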

Even when resolution is high, FMV feeds carry limited contextual information. Telemetry overlays help by adding symbology on top of the video feed, such as the aircraft’s location and projections of the target’s location, but correlating the video with other data and intelligence sources remains difficult.

Common FMV Enhancements

Manufacturers are incorporating enhancement techniques such as color balancing, histogram equalization, haze reduction, and stabilization, among others, to maximize the advantages of FMV. Software is also now available to mosaic (or stitch) consecutive video frames into a larger frame using direct or feature-based frame-to-frame alignment, which enables the construction of a wide field-of-view image from narrow field-of-view video.
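After frame-to-frame alignment has produced a transform between each pair of consecutive frames, a fast mosaic can be built by composing those transforms so that every frame maps into the coordinate system of a reference frame. The sketch below assumes the pairwise 3×3 homographies have already been estimated (the direct or feature-based aligner itself is not shown). All function names are hypothetical.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(h: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map pixel (x, y) through homography h using homogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def chain_to_reference(pairwise: list[Matrix]) -> list[Matrix]:
    """Compose frame-to-frame homographies into frame-to-mosaic homographies.

    pairwise[i] maps pixels of frame i+1 into frame i, as an aligner would estimate.
    Frame 0 is the mosaic reference, so its transform is the identity.
    """
    to_ref: list[Matrix] = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
    for h in pairwise:
        # frame i+1 -> frame i -> frame 0
        to_ref.append(matmul(to_ref[-1], h))
    return to_ref


def mosaic_bounds(to_ref: list[Matrix], width: int, height: int) -> tuple[float, float, float, float]:
    """Bounding box (min_x, min_y, max_x, max_y) of all frames in mosaic coordinates."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    xs, ys = zip(*(apply(h, x, y) for h in to_ref for x, y in corners))
    return min(xs), min(ys), max(xs), max(ys)


def translation(dx: float, dy: float) -> Matrix:
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


# Hypothetical pan: each new 640x480 frame lands 400 px further right in the mosaic,
# and the third frame also lands 10 px lower.
transforms = chain_to_reference([translation(400.0, 0.0), translation(400.0, 10.0)])
print(mosaic_bounds(transforms, 640, 480))
```

Because each frame's transform is the product of all earlier pairwise estimates, small alignment errors accumulate along the chain, so mosaicking software typically needs some way to manage this drift.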

Mosaicking can be done in two ways. Fast mosaicking runs in real time and supports electro-optical and infrared (EO/IR) video under a variety of imaging conditions. This approach is comparable to the panorama function on most smartphone cameras and works well when the terrain is relatively flat. Ortho-mosaicking, by contrast, is a distinct operation that corrects the perspective distortions caused by the underlying terrain. Because it corrects terrain-induced distortion throughout the image, ortho-mosaicking is robust to significant elevation changes, preserves distances, and allows measurements to be taken from the reconstructed frame. However, it is computationally expensive and requires much more processing time than fast mosaicking.
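The distortion that ortho-mosaicking removes can be estimated with the standard photogrammetric relief-displacement relation for a vertical image, d = r·h/H. Here r is the radial distance of the imaged point from the image center, h is the point's height above the reference plane, and H is the sensor's height above that plane. The sketch below applies this relation and shifts a point back toward the image center accordingly. It assumes a nadir view and a known terrain height, whereas real FMV is often oblique and ortho-rectification software works from a full sensor model and terrain model. The function names and example values are hypothetical.

```python
def relief_displacement(radial_dist: float, height_m: float, sensor_height_m: float) -> float:
    """Relief displacement d = r * h / H for a vertical (nadir) image.

    radial_dist: distance of the imaged point from the image center (e.g. pixels);
    height_m: height of the ground point above the reference plane;
    sensor_height_m: sensor height above the same reference plane.
    Returns d in the same unit as radial_dist, directed away from the center.
    """
    return radial_dist * height_m / sensor_height_m


def ortho_correct(x: float, y: float, cx: float, cy: float,
                  height_m: float, sensor_height_m: float) -> tuple[float, float]:
    """Shift an image point toward the center (cx, cy) to remove relief displacement.

    Since d = r * h / H, the corrected radial distance is r * (1 - h / H).
    """
    scale = 1.0 - height_m / sensor_height_m
    return cx + (x - cx) * scale, cy + (y - cy) * scale


# Hypothetical ridge 150 m above the reference plane, imaged 500 px right of center from 3,000 m.
print(relief_displacement(500.0, 150.0, 3000.0))
print(ortho_correct(1460.0, 540.0, 960.0, 540.0, 150.0, 3000.0))
```

For these assumed values, the displacement is 25 pixels. A flat-terrain fast mosaic would leave that offset in place.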

Another area where manufacturers are enhancing FMV is moving target indicators. Many sensor systems now include this capability internally, enabling operators to detect, highlight, and track moving objects in the raw video feed. This allows the operator to lock on to a moving target and have the sensor track it automatically.
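The internal algorithms of commercial moving target indicators are not described here. One of the simplest ways to flag motion, however, is frame differencing: compare two registered frames and mark pixels whose intensity changed by more than a threshold. The sketch below assumes the frames are already stabilized so the background lines up. On a moving platform, the frames would first need to be aligned. The threshold and function names are hypothetical.

```python
from typing import Optional

Frame = list[list[int]]


def moving_pixels(prev: Frame, curr: Frame, threshold: int) -> list[tuple[int, int]]:
    """Return (row, col) of pixels whose intensity changed by more than threshold."""
    return [
        (r, c)
        for r, (row_prev, row_curr) in enumerate(zip(prev, curr))
        for c, (a, b) in enumerate(zip(row_prev, row_curr))
        if abs(b - a) > threshold
    ]


def bounding_box(pixels: list[tuple[int, int]]) -> Optional[tuple[int, int, int, int]]:
    """Smallest box (min_row, min_col, max_row, max_col) around the changed pixels."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


# Two stabilized 5x6 frames: a bright 2-pixel object moves one column to the right.
prev = [[0] * 6 for _ in range(5)]
curr = [[0] * 6 for _ in range(5)]
prev[2][1] = prev[2][2] = 200
curr[2][2] = curr[2][3] = 200
changed = moving_pixels(prev, curr, threshold=50)
print(changed, bounding_box(changed))
```

Frame differencing marks the pixels an object has left and entered rather than its full extent, and it is sensitive to noise and residual misregistration. Practical systems therefore typically add filtering and track association on top.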

Overlaying video onto a map (i.e., video-on-terrain) can also enhance FMV. By taking the video and its metadata (e.g., the sensor’s geographic location) from the sensor and projecting the video onto a map, each frame can be displayed at its specific location on the terrain. The efficacy of this approach depends on the accuracy of the metadata, which varies from system to system. For example, accurate projection of the video requires knowledge of the distance to the terrain. Sensor manufacturers solve this problem in a variety of ways, listed here from least to most accurate:

  • Assuming a constant ground elevation throughout the frame;

  • Using a low-resolution digital terrain model with wide elevation postings;

  • Automatically firing a laser to frequently update the range to target.
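The gap between the first and last approaches in this list can be quantified. Suppose the ground is assumed to lie at a constant elevation, but the target actually sits Δh higher. The line of sight is then intersected too far away, producing a horizontal error of about Δh / tan(θ), where θ is the depression angle. A laser range measured along the same line of sight avoids this error. The sketch below works in a local east-north-up frame in metres and ignores Earth curvature, refraction, and pointing errors. The function names and numbers are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def los_unit_vector(azimuth_deg: float, depression_deg: float) -> Vec3:
    """Line-of-sight unit vector in east-north-up; azimuth clockwise from north."""
    az, dep = math.radians(azimuth_deg), math.radians(depression_deg)
    return (math.cos(dep) * math.sin(az), math.cos(dep) * math.cos(az), -math.sin(dep))


def geolocate_flat(sensor: Vec3, azimuth_deg: float, depression_deg: float,
                   ground_elev_m: float) -> Vec3:
    """Intersect the line of sight with a horizontal plane at ground_elev_m.

    Requires a positive depression angle (the sensor must look below the horizon).
    """
    u = los_unit_vector(azimuth_deg, depression_deg)
    t = (sensor[2] - ground_elev_m) / -u[2]  # slant distance to the plane
    return tuple(s + t * c for s, c in zip(sensor, u))


def geolocate_laser(sensor: Vec3, azimuth_deg: float, depression_deg: float,
                    range_m: float) -> Vec3:
    """Place the target at the measured laser slant range along the line of sight."""
    u = los_unit_vector(azimuth_deg, depression_deg)
    return tuple(s + range_m * c for s, c in zip(sensor, u))


# Hypothetical geometry: sensor 3,000 m up, looking 20 deg down at a hilltop 200 m high.
sensor = (0.0, 0.0, 3000.0)
laser_range = (3000.0 - 200.0) / math.sin(math.radians(20.0))  # what a laser would return
flat = geolocate_flat(sensor, 45.0, 20.0, ground_elev_m=0.0)
laser = geolocate_laser(sensor, 45.0, 20.0, laser_range)
print(f"horizontal error from the flat-ground assumption: {math.dist(flat[:2], laser[:2]):.1f} m")
```

In this assumed geometry, the flat-ground estimate lands roughly 550 m beyond the laser-ranged position.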

Alternatively, the video projection can be computed by software that takes metadata from the sensor and uses an accurate terrain model to project the video onto the map. At a minimum, this requires the sensor’s location, orientation, and fields-of-view, along with the terrain model itself.
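Projecting video with a terrain model amounts to intersecting each pixel's line of sight with the terrain surface. The sketch below assumes a unit line-of-sight vector in a local east-north-up frame and a `terrain_height(x, y)` function that stands in for a digital terrain model lookup (both hypothetical). It marches along the ray in fixed steps, then refines the crossing by bisection.

```python
import math
from typing import Callable, Optional

Vec3 = tuple[float, float, float]


def intersect_terrain(sensor: Vec3, direction: Vec3,
                      terrain_height: Callable[[float, float], float],
                      step_m: float = 10.0, max_range_m: float = 50_000.0,
                      tol_m: float = 0.01) -> Optional[Vec3]:
    """Return the first point where a line of sight meets the terrain, or None.

    `direction` must be a unit vector in local east-north-up metres;
    `terrain_height(x, y)` stands in for a digital terrain model lookup.
    """
    def point_at(t: float) -> Vec3:
        return tuple(s + t * d for s, d in zip(sensor, direction))

    def clearance(t: float) -> float:
        # Height of the ray above the terrain at slant distance t.
        x, y, z = point_at(t)
        return z - terrain_height(x, y)

    if clearance(0.0) <= 0.0:
        return None  # sensor at or below the terrain surface
    t_prev, t = 0.0, step_m
    while t <= max_range_m:
        if clearance(t) <= 0.0:
            # The crossing lies in (t_prev, t]; bisect until the bracket is within tol_m.
            lo, hi = t_prev, t
            while hi - lo > tol_m:
                mid = 0.5 * (lo + hi)
                if clearance(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return point_at(hi)
        t_prev, t = t, t + step_m
    return None  # no intersection within max_range_m


def hill(x: float, y: float) -> float:
    """Hypothetical terrain: a 100 m plain with a 400 m hill centred 1 km north."""
    return 100.0 + 400.0 * math.exp(-((y - 1000.0) ** 2) / (2 * 200.0 ** 2))


dep = math.radians(30.0)
los = (0.0, math.cos(dep), -math.sin(dep))  # looking north, 30 deg below horizontal
print(intersect_terrain((0.0, 0.0, 1000.0), los, hill))
```

The fixed step size is a tradeoff. A larger step is faster but can skip over terrain features narrower than the step, such as a thin ridge.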

A significant limitation of video-on-terrain is that the fidelity of the projected image depends on the viewpoint, and the terrain can introduce substantial distortion, resulting in the loss of critical visual information. For example, at shallow slant angles the projected pixels become very large, which can make it difficult to visually detect and track moving people and vehicles. To address the distortion and improve situational awareness, picture-in-picture (PiP) or multiple-monitor user interfaces are often required, with one view showing the raw frame and the other showing the map projection. However, this requires more screen space and tight synchronization between the views.
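The growth in projected pixel size at shallow angles follows from simple geometry. Over flat ground, a pixel with instantaneous field of view (IFOV) β at slant range R covers about R·β across the line of sight and about R·β / sin(θ) along it, where θ is the grazing angle. At a fixed altitude, R itself grows as 1/sin(θ). The 20-microradian IFOV and 3,000 m altitude below are assumed values.

```python
import math


def pixel_footprint(altitude_m: float, ifov_rad: float, grazing_deg: float) -> tuple[float, float]:
    """Approximate ground size of one pixel (cross-range, down-range) over flat ground.

    Small-angle approximation: slant range R = altitude / sin(grazing),
    cross-range size ~ R * IFOV, down-range size ~ R * IFOV / sin(grazing).
    """
    g = math.radians(grazing_deg)
    slant_m = altitude_m / math.sin(g)
    cross_m = slant_m * ifov_rad
    return cross_m, cross_m / math.sin(g)


# Assumed 20-microradian IFOV from 3,000 m, from straight down to very shallow angles.
for grazing in (90.0, 30.0, 10.0, 5.0):
    cross, down = pixel_footprint(3000.0, 20e-6, grazing)
    print(f"grazing {grazing:4.1f} deg: {cross:.2f} m x {down:.2f} m per pixel")
```

Under these assumptions, a pixel that covers about 6 cm on a side when looking straight down stretches to roughly 2 m in the down-range direction at a 10° grazing angle.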

Furthermore, the operator must now visually and manually correlate objects across two views instead of one. For example, determining the name of a building in the video view requires a corresponding map view with parcel data; the operator must then re-orient the map view and estimate which building the video frame overlaps. Some sensor operators also find the map view less intuitive and have difficulty controlling the sensor. This leads to operator fatigue, so these operators tend to prefer the full video view.

Adding Geospatial Context to FMV

Using a geographic information system (GIS), the technique can be flipped: instead of projecting the video onto the map, the map is projected onto the video. Feature data such as road networks, 3D models, known checkpoint locations, significant activity reports, and even building names can be shown directly on top of the video feed in real time without distortion. In this way, an operator can immediately determine what is being imaged.
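At its core, projecting the map into the video is the inverse of video-on-terrain: each map feature's 3D coordinates are transformed into the camera frame and projected to a pixel. The sketch below assumes an ideal pinhole camera with zero roll, square pixels, no lens distortion, and positions already expressed in a local east-north-up frame in metres. These are all simplifications, and the function name is hypothetical.

```python
import math
from typing import Optional

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_to_pixel(point: Vec3, sensor: Vec3, azimuth_deg: float, depression_deg: float,
                     hfov_deg: float, width: int, height: int) -> Optional[tuple[float, float]]:
    """Project a world point into (col, row) pixel coordinates, or None if not visible.

    Camera axes: forward along the line of sight (azimuth clockwise from north,
    depression below horizontal), right is horizontal, up completes the frame.
    """
    a, d = math.radians(azimuth_deg), math.radians(depression_deg)
    forward = (math.cos(d) * math.sin(a), math.cos(d) * math.cos(a), -math.sin(d))
    right = (math.cos(a), -math.sin(a), 0.0)
    up = (math.sin(a) * math.sin(d), math.cos(a) * math.sin(d), math.cos(d))
    v = tuple(p - s for p, s in zip(point, sensor))
    depth = dot(v, forward)
    if depth <= 0.0:
        return None  # behind the camera
    focal_px = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    col = width / 2.0 + focal_px * dot(v, right) / depth
    row = height / 2.0 - focal_px * dot(v, up) / depth  # image rows grow downward
    if 0.0 <= col < width and 0.0 <= row < height:
        return col, row
    return None  # in front of the camera but outside the frame


# Hypothetical road vertices seen from 3,000 m, looking north 45 deg down, 30 deg HFOV.
sensor = (0.0, 0.0, 3000.0)
for vertex in [(0.0, 3000.0, 0.0), (100.0, 3000.0, 0.0), (0.0, 3500.0, 0.0)]:
    print(vertex, project_to_pixel(vertex, sensor, 0.0, 45.0, 30.0, 1920, 1080))
```

Each vertex of a road polyline, checkpoint, or building footprint can be passed through the same function and drawn at the returned pixel. Because the projection is recomputed from each frame's metadata, the overlay is only as accurate as that metadata.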

Projecting GIS data into the video provides the same kind of situational context as video-on-terrain, but without the terrain distortion. For the best user experience, accurate metadata with minimal jitter (random perturbations of the measurements) is required. Advanced algorithms such as geo-registration, in which video frames are matched and aligned with known satellite imagery, may be required to correct inaccuracies in the metadata. In addition, extended Kalman filtering or particle filtering can be used to suppress jitter in the inertial navigation system (INS) measurements.
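As a simplified illustration of jitter suppression, the sketch below runs a linear constant-velocity Kalman filter over a single noisy angle, such as a sensor azimuth reported at video rate. This is a stand-in, not an extended Kalman filter: an EKF would linearize a nonlinear INS motion and measurement model at each step, and a particle filter would represent the state with weighted samples. The 2°/s sweep, 30 Hz rate, 0.5° jitter, and noise parameters are all assumptions.

```python
import math
import random


def kalman_smooth(measurements: list[float], dt: float,
                  accel_var: float, meas_var: float) -> list[float]:
    """Linear constant-velocity Kalman filter over a scalar measurement series.

    State is [value, rate]; process noise models random (white) acceleration with
    variance accel_var, and each measurement observes the value with variance meas_var.
    """
    x0, x1 = measurements[0], 0.0                  # value and rate estimates
    p00, p01, p10, p11 = meas_var, 0.0, 0.0, 10.0  # loose prior on the rate
    q00, q01, q11 = accel_var * dt**4 / 4, accel_var * dt**3 / 2, accel_var * dt**2
    filtered = [x0]
    for z in measurements[1:]:
        # Predict: x' = F x and P' = F P F^T + Q with F = [[1, dt], [0, 1]].
        x0 = x0 + dt * x1
        p00, p01, p10, p11 = (
            p00 + dt * (p01 + p10) + dt * dt * p11 + q00,
            p01 + dt * p11 + q01,
            p10 + dt * p11 + q01,
            p11 + q11,
        )
        # Update with H = [1, 0]: only the value is measured.
        s = p00 + meas_var
        k0, k1 = p00 / s, p10 / s
        innovation = z - x0
        x0, x1 = x0 + k0 * innovation, x1 + k1 * innovation
        p00, p01, p10, p11 = (1 - k0) * p00, (1 - k0) * p01, p10 - k1 * p00, p11 - k1 * p01
        filtered.append(x0)
    return filtered


def rms_error(estimate: list[float], truth: list[float]) -> float:
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


# Assumed scenario: azimuth sweeping at 2 deg/s, reported at 30 Hz with 0.5 deg jitter.
random.seed(1)
dt = 1.0 / 30.0
truth = [10.0 + 2.0 * i * dt for i in range(300)]
noisy = [t + random.gauss(0.0, 0.5) for t in truth]
smooth = kalman_smooth(noisy, dt, accel_var=0.01, meas_var=0.25)
print(f"raw RMS error {rms_error(noisy, truth):.3f} deg, filtered {rms_error(smooth, truth):.3f} deg")
```

Lowering `accel_var` trusts the constant-velocity model more and smooths harder, at the cost of lagging behind genuine changes in sensor motion.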

A significant differentiator of projecting the map into the video is that any geospatial information can be displayed on the video, including:

  • Static raster data;

  • Other aerial surveillance videos;

  • Ground moving target indicators;

  • Ship transponders;

  • LiDAR, 3D models, etc.

Previous sensor detections and intelligence data can also be applied to help analyze the video feed quickly. For example, a wide field-of-view can be used to visualize attacks that occurred over a 30-day period along a specific stretch of road near the current video view.
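Selecting which historical reports to display is a combined spatial and temporal filter: keep the events from the chosen period whose locations fall inside the area currently in view. The sketch below assumes the view footprint is already available as a polygon in local planar coordinates (for example, from projecting the frame corners onto the terrain). It uses an even-odd ray-casting point-in-polygon test. The `Event` type, function names, and data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

Point = tuple[float, float]


@dataclass
class Event:
    label: str
    time: datetime
    pos: Point  # local east, north in metres


def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd ray-casting test; polygon vertices in order, without repeating the first."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def events_in_view(events: list[Event], footprint: list[Point],
                   now: datetime, days: int = 30) -> list[Event]:
    """Events from the last `days` days whose positions fall inside the footprint."""
    start = now - timedelta(days=days)
    return [e for e in events if start <= e.time <= now and point_in_polygon(e.pos, footprint)]


now = datetime(2024, 1, 31, 12, 0)
footprint = [(-200.0, 500.0), (200.0, 500.0), (600.0, 2000.0), (-600.0, 2000.0)]  # oblique view
events = [
    Event("A", now - timedelta(days=3), (0.0, 1000.0)),     # recent, in view
    Event("B", now - timedelta(days=45), (50.0, 1200.0)),   # in view, but too old
    Event("C", now - timedelta(days=1), (1000.0, 1000.0)),  # recent, out of view
]
print([e.label for e in events_in_view(events, footprint, now)])  # ['A']
```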

Another advantage is that annotations, such as drawings, measurements, and symbols, can be made directly on the video. Analysts can essentially create products from live FMV and disseminate them to users on the ground in real time. In one real-world scenario, operators using this technique to monitor wildfires are able to annotate the boundary of the fire, measure its size, determine its growth pattern, and calculate the threat to homes and critical infrastructure in real time. Furthermore, by attaching transponders to fire engines and bulldozers, they can determine whether the fire has crossed fire breaks and whether containment efforts are succeeding.
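Suppose an annotated fire boundary has been converted from pixels to ground coordinates (for instance, by projecting each vertex onto the terrain). Its area and growth rate then follow from the shoelace formula. The sketch below assumes the vertices are in a local planar frame in metres. The example perimeters and time interval are made up for illustration.

```python
Point = tuple[float, float]


def polygon_area_m2(vertices: list[Point]) -> float:
    """Shoelace formula for a simple polygon given in order (metres, local planar frame)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1] - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def growth_ha_per_hour(earlier: list[Point], later: list[Point], hours: float) -> float:
    """Average growth rate between two perimeter annotations, in hectares per hour."""
    return (polygon_area_m2(later) - polygon_area_m2(earlier)) / 10_000.0 / hours


# Made-up perimeters annotated two hours apart.
first = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 500.0), (0.0, 500.0)]
second = [(0.0, 0.0), (1200.0, 0.0), (1200.0, 600.0), (0.0, 600.0)]
print(polygon_area_m2(first) / 10_000.0, polygon_area_m2(second) / 10_000.0,
      growth_ha_per_hour(first, second, hours=2.0))
```

Here the fire grows from 50 ha to 72 ha, an average of 11 ha per hour. Real perimeters are irregular, but the same formula applies to any simple (non-self-intersecting) polygon.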


FMV is an integral part of providing real-time situational awareness and collecting mission-critical intelligence. Ultimately, the goal for every dataset received is to analyze it, create reports, and send accurate data to decision makers. The limitations imposed by the tradeoff between FMV’s resolution, field-of-view, and frame rate are being overcome with newer approaches that add context for improved surveillance and situational awareness. Various techniques and feature enhancements, notably projecting the map into the video, allow analysts to visualize diverse datasets together. With this technique, users can easily create products and generate and distribute reports both in near real time and post-mission. These reports can include customizable context (video frames, maps, annotations), so they can be adapted to mission objectives.

This article was written by Manan Patel, Chief Technology Officer, and Darren Butler, Chief Scientist and Director of Software Engineering, AEVEX Aerospace (Solana Beach, CA).