TSP Lab - Audio Demonstration

Telecommunications & Signal Processing Laboratory

Audio Demonstration

J. H. Y. Loo, W.-Y. Chan, and P. Kabal
"Classified nonlinear predictive vector quantization of speech spectral parameters", Proc. IEEE Int. Conf. on Acoustics, Speech, Signal Processing (Atlanta, GA), pp. 761-764, May 1996.

Nonlinear predictive split vector quantization (NPSVQ) and classified NPSVQ (CNPSVQ) are introduced to exploit the correlation among the speech spectral parameters from two adjacent analysis frames. By interleaving intraframe SVQ with forward predictive SVQ, error propagation is limited to at most one adjacent frame. At an overall bit rate of about 21 bits/frame, NPSVQ can provide coding quality similar to that of intraframe SVQ at 24 bits/frame. Voicing classification is used in CNPSVQ to obtain an additional average gain of 1 bit/frame for unvoiced frames, giving an overall bit rate of 20 bits/frame for unvoiced frames. The particular form of nonlinear prediction we use incurs virtually no additional encoding computational complexity. We have verified our comparative performance results using subjective listening tests.
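The interleaving argument and the 21 bits/frame figure can be illustrated with a short sketch. It is not the paper's implementation: it assumes, for illustration only, that even-numbered frames are coded intraframe and odd-numbered frames are coded predictively from the immediately preceding frame. The function names are hypothetical.

```python
def coding_mode(frame_index):
    """Assumed schedule: even frames intraframe, odd frames predictive."""
    return "intra" if frame_index % 2 == 0 else "predictive"


def frames_hit_by_error(corrupted, n_frames):
    """Return the frames affected when frame `corrupted` is received in error.

    Assumption: a predictive frame depends only on the intraframe-coded
    frame just before it, so an error never travels past one frame.
    """
    hit = [corrupted]
    nxt = corrupted + 1
    if (nxt < n_frames
            and coding_mode(corrupted) == "intra"
            and coding_mode(nxt) == "predictive"):
        hit.append(nxt)
    return hit


def average_bits(intra_bits, predictive_bits):
    """Average bits/frame when the two modes strictly alternate."""
    return (intra_bits + predictive_bits) / 2


if __name__ == "__main__":
    print(frames_hit_by_error(4, 10))  # error in an intraframe-coded frame
    print(frames_hit_by_error(5, 10))  # error in a predictively coded frame
    print(average_bits(24, 18))        # alternating 24 and 18 bits/frame
```

Under this assumed schedule, an error in an intraframe-coded frame reaches only the one predictive frame that follows it, and an error in a predictive frame stays in that frame. Alternating 24 and 18 bits/frame averages to 21 bits/frame, matching the overall rate quoted above.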

Demonstration sound files:

Uncoded.au [35 kB]: Unencoded male speaker test sentence, "They sat in the cool park."
C24.au [35 kB]: Line spectral frequency (LSF) vectors encoded with 3-way split VQ (3-SVQ) at 24 bits/frame.
C21.au [35 kB]: LSF vectors encoded alternately with 3-SVQ at 24 bits/frame and nonlinear predictive 3-SVQ (3-NPSVQ) at 18 bits/frame.
C22-24.au [35 kB]: LSF vectors encoded with classified 3-SVQ (3-CSVQ) at 24 bits per voiced (V) frame and 22 bits per unvoiced (UV) frame.
C20-21.au [35 kB]: LSF vectors encoded alternately with 3-CSVQ, at 24 bits per V frame and 22 bits per UV frame, and classified 3-NPSVQ (3-CNPSVQ) at 18 bits/frame.
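The file names encode the average bit rates of each scheme, which the following sketch recomputes from the per-frame figures listed above. It assumes that alternating modes are used equally often within each voicing class, and the voiced fraction passed to `overall_rate` is a hypothetical parameter, not a value from the paper.

```python
# Bits per frame for each coding mode, by voicing class, as listed above.
# A tuple with two entries means the two modes alternate frame by frame.
DEMO_BITS = {
    "C24.au":    {"V": (24,),    "UV": (24,)},
    "C21.au":    {"V": (24, 18), "UV": (24, 18)},
    "C22-24.au": {"V": (24,),    "UV": (22,)},
    "C20-21.au": {"V": (24, 18), "UV": (22, 18)},
}


def class_rate(name, voicing):
    """Average bits/frame for one voicing class ("V" or "UV")."""
    bits = DEMO_BITS[name][voicing]
    return sum(bits) / len(bits)


def overall_rate(name, voiced_fraction):
    """Average bits/frame over a signal with the given voiced fraction.

    Assumption: the alternation between modes is independent of the
    voicing class, so each class sees both modes equally often.
    """
    return (voiced_fraction * class_rate(name, "V")
            + (1.0 - voiced_fraction) * class_rate(name, "UV"))


if __name__ == "__main__":
    for name in DEMO_BITS:
        print(name, class_rate(name, "V"), class_rate(name, "UV"))
```

For C20-21.au this gives 21 bits per voiced frame and 20 bits per unvoiced frame on average, consistent with the file name and with the rates quoted in the abstract.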
