Variable-Data-Rate Speech Encoder

This encoder could supplant older encoders that operate at diverse fixed rates.

A variable-data-rate (VDR) speech encoder has been designed to be interoperable with, and eventually to supplant, the many different voice encoders now used in military communication systems. Because these older systems were designed to utilize specific radio links with fixed and limited channel capacities, they use many different voice-compression algorithms operating at various fixed rates. The incompatibility of these systems is an obstacle to interoperability. Emerging net-centric communication systems promise to provide connectivity to all military users, but compatible encoding will be necessary for interoperability, and encryption will be necessary for secure communications.

Table. The seven operating modes of the VDR voice encoder are characterized by different average data rates. Mode 1, characterized by a fixed rate of 2.4 kb/s, is the same mode as that of the Federal-standard MELP encoder for narrow-band speech.

The VDR voice encoder is designed to provide both interoperability and security in net-centric voice communications. The VDR speech encoder can operate at any or all of the various data rates of older military speech encoders. Notably, it can operate over a range of data rates up to 26 kb/s and is backward-compatible with the Mixed Excitation Linear Prediction (MELP) voice encoder, which is a Federal-standard encoder that operates at a data rate of 2.4 kb/s. The VDR speech encoder is interoperable at any and all rates simultaneously. The rate setting can be changed dynamically (that is, during operation) without disrupting operation, even when used with encryption. Hence, without compromising security, the VDR speech encoder can be dynamically adjusted to make efficient use of network bandwidth under changing network traffic conditions.

The heart of the VDR voice encoder is a multirate voice processor in which a single voice algorithm generates multiple data streams at rates from 2.4 kb/s to an average rate of about 23 kb/s for input speech at frequencies from 0 to 4 kHz. The algorithm provides for seven different operating modes (see table). Inclusion of a few more kb/s of data from the 4-to-8-kHz audio frequency band makes it possible to encode wide-band speech comparable in quality to that of standard frequency-modulation (FM) broadcasting.
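As a rough illustration of how a sender might adjust the rate setting to the available network bandwidth, the following Python sketch picks the highest operating mode whose average rate fits a given capacity. The mode table here is an assumption: only the 2.4 kb/s rate of mode 1 and the roughly 23 kb/s top narrow-band average come from the text, the rates for modes 2 through 6 are placeholders, and `pick_mode` is a hypothetical helper rather than part of the encoder.

```python
# Hypothetical mode table (kb/s). Only mode 1 (2.4) and the roughly 23 kb/s
# top narrow-band average are taken from the article; modes 2-6 are placeholders.
MODE_RATES_KBPS = {1: 2.4, 2: 4.0, 3: 7.0, 4: 10.0, 5: 14.0, 6: 18.0, 7: 23.0}


def pick_mode(available_kbps):
    """Return the highest mode whose average rate fits the available bandwidth.

    Falls back to mode 1 when nothing fits, because 2.4 kb/s is the lowest
    rate in the table.
    """
    fitting = [mode for mode, rate in MODE_RATES_KBPS.items() if rate <= available_kbps]
    return max(fitting) if fitting else 1
```

With this placeholder table, a link offering 9.9 kb/s would be served by mode 3, and the choice can be re-evaluated whenever the traffic estimate changes.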

The VDR bit stream has an embedded structure in which higher-rate voice data frames contain successively lower-rate voice data frames as subsets. Deletion of a certain portion of the superset (higher-rate frames typically representing higher audio frequencies) makes it possible to reduce the data rate, even in the presence of encryption. Because of this embedded data structure, all of the VDR data rates are interoperable, and the rate can be switched as often as 44 times per second, even when speech is present. Because the speech waveforms of all the VDR rates are synchronous, switching of data rates does not introduce undesirable sounds such as clicks or warbles.
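The embedded structure can be pictured with a small Python sketch in which each frame is a concatenation of layers, lowest-rate core first, and a rate reduction is simply a truncation of the frame's tail. Several things here are assumptions rather than details from the article: the 22.5 ms frame duration (chosen because it matches MELP and gives about 44 frames per second), the layer sizes beyond the 54-bit core (54 bits per 22.5 ms is 2.4 kb/s), and the position-wise XOR keystream, which is only a toy stand-in showing one way truncation can remain valid on encrypted data.

```python
import random

# Sketch only: layer sizes and the XOR "cipher" are illustrative assumptions.
FRAME_SECONDS = 0.0225      # assumed 22.5 ms frame, about 44 frames per second
LAYER_BITS = [54, 90, 126]  # hypothetical; only the 54-bit core (2.4 kb/s) is grounded


def build_frame(layers):
    """Concatenate per-layer bit lists, lowest-rate core first."""
    frame = []
    for bits in layers:
        frame.extend(bits)
    return frame


def truncate(frame, mode):
    """Drop the tail of a frame so that only the first `mode` layers remain."""
    return frame[:sum(LAYER_BITS[:mode])]


def xor_keystream(bits, keystream):
    """Toy position-wise cipher: bit i is XORed with keystream bit i."""
    return [b ^ k for b, k in zip(bits, keystream)]


def rate_kbps(mode):
    """Data rate implied by keeping the first `mode` layers of every frame."""
    return sum(LAYER_BITS[:mode]) / FRAME_SECONDS / 1000.0


rng = random.Random(0)
layers = [[rng.randint(0, 1) for _ in range(n)] for n in LAYER_BITS]
keystream = [rng.randint(0, 1) for _ in range(sum(LAYER_BITS))]

encrypted = xor_keystream(build_frame(layers), keystream)
# A relay lowers the rate by cutting the encrypted frame, without decrypting it.
reduced = truncate(encrypted, 1)
# The receiver decrypts what is left; zip() stops at the shorter input.
recovered_core = xor_keystream(reduced, keystream)
```

In this toy setup the recovered core equals the original lowest-rate layer, which is the property the embedded structure relies on: a lower-rate frame is a prefix of every higher-rate frame, so no re-encoding is needed to change rate.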

It must be emphasized that the multirate voice processor in the VDR voice encoder is a single processor running a single algorithm, in contradistinction to both (1) a collection of separate processors operating at different rates and (2) a processor running a multitude of speech-compression algorithms. Prior voice encoders that use multiple compression algorithms do not perform well when algorithms are switched while speech is present. Speech waveforms sometimes become cropped upon switching because different voice algorithms can have different internal delays. Such cropping degrades speech quality and is annoying to listeners.

The VDR speech encoder exploits the variable nature of the speech waveform, utilizing higher or lower data rates as needed (e.g., higher rates for vowels, lower rates for consonants). Unlike some prior speech processors, the speech processor in the VDR speech encoder does not eliminate gaps in speech for the sake of efficiency. Elimination of speech gaps that contain ambient sounds could be harmful in military communications because speech gaps often contain sounds that help listeners gauge battlefield conditions at transmitter sites. In the VDR speech encoder, speech gaps are encoded at appropriately low data rates that still provide audible information.
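To make the notion of an average data rate concrete, the following Python sketch computes the average rate of a sequence of frames whose sizes depend on what each frame contains. The frame classes, the bits-per-frame figures, and the 22.5 ms frame duration are all hypothetical values chosen for illustration; the article does not give them. Gap frames are kept and encoded at the lowest rate instead of being dropped.

```python
FRAME_SECONDS = 0.0225  # assumed frame duration (22.5 ms)

# Hypothetical bits per frame for each frame class; not taken from the article.
BITS_PER_FRAME = {"vowel": 520, "consonant": 200, "gap": 54}


def average_rate_kbps(frame_classes):
    """Average data rate in kb/s over a sequence of classified frames."""
    if not frame_classes:
        raise ValueError("need at least one frame")
    total_bits = sum(BITS_PER_FRAME[c] for c in frame_classes)
    return total_bits / (len(frame_classes) * FRAME_SECONDS) / 1000.0
```

Under these placeholder numbers, a stretch consisting only of gap frames still produces a 2.4 kb/s stream, so ambient sound at the transmitter site continues to reach the listener.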

This work was done by Thomas M. Moran, David A. Heide, and Yvette T. Lee of the Naval Research Laboratory and George S. Kang of ITT Industries.



This Brief includes a Technical Support Package (TSP). "Variable-Data-Rate Speech Encoder" (reference NRL-0019) is currently available for download from the TSP library.


This article first appeared in the October, 2007 issue of Defense Tech Briefs Magazine (Vol. 1 No. 5).



Overview

The document titled "Variable Data Rate Voice Encoder for Narrowband and Wideband Speech" discusses the development and significance of a Variable Data Rate (VDR) voice encoder designed for secure voice communication within the Department of Defense (DoD). The primary objective of the VDR encoder is to enhance interoperability among various secure voice terminals used by the DoD, allowing for efficient communication across different platforms and environments.

The introduction highlights the necessity of a VDR voice processor, emphasizing that it can replace multiple incompatible voice encoders currently in use, thereby streamlining communication for military personnel. The report outlines the challenges faced in tactical communication environments, which often require varying data rates ranging from as low as 2.4 kbps to as high as 64 kbps, depending on the operational context. For instance, noisy environments, such as those encountered in aircraft or naval vessels, necessitate robust encoding methods to ensure clarity and reliability in voice transmission.

The document details the architecture of the VDR encoder, which integrates different encoding techniques to optimize voice quality and data rate based on real-time network conditions. A notable feature is the use of a superposition method in difficult modes, where low-frequency audio is encoded separately from higher frequencies, improving noise tolerance and overall sound quality.

Additionally, the report discusses the future vision for the VDR technology, including the development of the Universal Voice Terminal (UVT) and the Personal Secure Terminal (PST). The UVT aims to unify various secure voice terminals into a single interoperable system, while the PST is envisioned as a compact device for individual soldiers, enhancing their communication capabilities on the battlefield.

The document concludes by underscoring the importance of secure and efficient voice communication for military operations, particularly in critical situations where timely contact with command centers is essential. The VDR encoder represents a significant advancement in achieving these communication goals, ensuring that military personnel can maintain connectivity and operational effectiveness in diverse and challenging environments.

Overall, the report provides a comprehensive overview of the VDR voice encoder's design, functionality, and potential impact on military communication systems, highlighting its role in enhancing the safety and effectiveness of DoD operations.