Context-Aware Visual Search Using a Pan-Tilt-Zoom Camera

While in some scenarios an Internet-of-Things (IoT) approach allows multiple distributed cameras to cooperatively survey a large region of interest (ROI), other scenarios, such as surveillance by a mobile robot, require the ROI to be surveyed by a small number of collocated cameras.

Figure 4 from the technical report shows a simulated panoramic image of an urban environment. (Image: Army Research Laboratory)

Surveillance cameras are becoming more commonplace in public environments and are also used in private security and military operations. We are particularly interested in scenarios where a single pan-tilt-zoom (PTZ) camera is used to perform surveillance in large outdoor environments, which may include 360-degree horizontal coverage and depths out to 1 km or more. Such scenarios arise in many settings, including security for building exteriors, airports, highways, parking lots, and property perimeters; anomaly detection in dense urban environments; and surveillance in military overwatch missions. In environments with many vertical obscurations (e.g., trees and buildings), ground-based cameras need to be carefully located to provide long-range views. As the elevation of the camera is increased above ground level, for example by placement on tall poles or building rooftops, obtaining views of distant regions becomes easier.

Imagery from these cameras may be used to detect objects and events of interest, either via human operators or automated processing. Many of these cameras allow an operator to pan, tilt, and zoom the camera optics to view selected regions of an environment in high detail, allowing both wide-area and long-range surveillance. With the proliferation of surveillance cameras, however, it is becoming impracticable for human operators to monitor and control all installed cameras. Intelligent, automated algorithms are needed to replace the human operators. Simply scanning an environment repeatedly with a PTZ camera set to a high zoom is not sufficient: high zoom is unnecessary for parts of the environment in the near field and for areas that cannot possibly contain objects or events of interest. Furthermore, a full high-zoom scan takes an excessive amount of time and may cause long delays between revisits, increasing the chances of missing important events.
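To give a rough sense of why an exhaustive high-zoom scan is slow, the sketch below counts how many non-overlapping views are needed to tile a pan/tilt range at a given field of view and multiplies by a fixed time per view. Every number in it (tilt range, fields of view, seconds per view) is an illustrative assumption and does not come from the technical report; the function names are hypothetical. It also ignores view overlap and variable slew time, so it is only a back-of-the-envelope estimate.

```python
import math


def views_to_cover(pan_range_deg, tilt_range_deg, hfov_deg, vfov_deg):
    """Count non-overlapping views needed to tile a pan/tilt range.

    Assumes a simple rectangular tiling in pan/tilt angle; real cameras
    usually need some overlap between neighboring views.
    """
    cols = math.ceil(pan_range_deg / hfov_deg)
    rows = math.ceil(tilt_range_deg / vfov_deg)
    return cols * rows


def scan_time_s(n_views, seconds_per_view):
    """Total scan time, assuming a fixed cost per view (move plus detect)."""
    return n_views * seconds_per_view


# Assumed example values (not from the report): 360 degrees of pan, 30 degrees
# of tilt, and 1 second per view at either zoom setting.
wide_views = views_to_cover(360, 30, hfov_deg=60.0, vfov_deg=45.0)
zoom_views = views_to_cover(360, 30, hfov_deg=2.0, vfov_deg=1.5)

wide_scan = scan_time_s(wide_views, 1.0)   # 6 views
zoom_scan = scan_time_s(zoom_views, 1.0)   # 3600 views
```

Under these assumed values the wide-angle sweep needs 6 views while the high-zoom sweep needs 3600, which at one second per view is an hour between revisits of any given region.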

We introduce here the formal PTZ Search Problem (PTZSP). In the PTZSP, a fixed PTZ camera starts at a given PTZ position and must move through the continuous 3D space of PTZ coordinates to detect as many objects of interest as possible as quickly as possible. Reward is given for true positives, deducted for false positives and false negatives, and deducted for the time spent performing the search. When the PTZ space is discretized, the PTZSP is closely related to the Orienteering Problem with Functional Profits (OPFP). However, the huge size of the discretized search space makes the problem intractable for all existing OPFP algorithms.
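The reward structure described above can be sketched as a simple scoring function. This is a minimal illustration only: the linear form, the weight values, and the names `RewardWeights` and `ptzsp_score` are assumptions made for this example, not the formulation used in the technical report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder weights chosen for illustration; the report's actual
    # values and functional form are not given in this brief.
    true_positive: float = 1.0
    false_positive: float = 0.5
    false_negative: float = 0.5
    per_second: float = 0.01


def ptzsp_score(true_pos, false_pos, false_neg, elapsed_s,
                weights=RewardWeights()):
    """Score one search run: reward detections, penalize errors and time."""
    return (weights.true_positive * true_pos
            - weights.false_positive * false_pos
            - weights.false_negative * false_neg
            - weights.per_second * elapsed_s)


# Two hypothetical runs that find the same objects: the faster one scores higher.
fast_run = ptzsp_score(true_pos=5, false_pos=1, false_neg=2, elapsed_s=100)
slow_run = ptzsp_score(true_pos=5, false_pos=1, false_neg=2, elapsed_s=400)
```

A score of this shape makes the trade-off explicit: zooming in on more regions may add true positives, but each extra view also adds to the time penalty.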

There are no existing benchmarks on which to develop, evaluate, and compare algorithms for the PTZSP. Creating such benchmarks is difficult because each run of an algorithm may acquire a different set of images of the environment, and one cannot expect to record in advance all possible views of a real environment. To remedy this problem, we have created a high-fidelity simulation of a state-of-the-art object detector as it would perform on real color camera imagery, and have integrated this with a small and accessible simulation of PTZ cameras operating in large outdoor environments. The combination gives researchers an easy means to develop and compare algorithms for the PTZSP. We present details of the simulations, describe three algorithms for solving the discretized PTZSP, and then analyze the performance of each algorithm on a set of 100 reproducible test cases.

This work was performed by Philip David, André Harrison, Ramavarapu Sreenivas, and Joshua Whitman for the Army Research Laboratory. For more information, download the Technical Support Package (free white paper) under the Communications category. ARL-9657.
