Telecommunications & Signal Processing Laboratory
Audio Demonstration
J. H. Y. Loo, W.-Y. Chan, and P. Kabal
"Classified nonlinear predictive vector quantization of speech
spectral parameters", Proc. IEEE Int. Conf. on Acoustics,
Speech, Signal Processing (Atlanta, GA), pp. 761-764, May
1996.
Nonlinear predictive split vector quantization (NPSVQ) and
classified NPSVQ (CNPSVQ) are introduced to exploit the correlation
among the speech spectral parameters from two adjacent analysis
frames. By interleaving intraframe SVQ with forward predictive
SVQ, error propagation is limited to at most one adjacent frame.
At an overall bit rate of about 21 bits/frame, NPSVQ can provide
coding quality similar to that of intraframe SVQ at 24 bits/frame. Voicing
classification is used in CNPSVQ to obtain an additional average
gain of 1 bit/frame for unvoiced frames. Therefore, an overall
bit rate of 20 bits/frame is obtained for unvoiced frames. The
particular form of nonlinear prediction we use incurs virtually
no additional encoding computational complexity. We have verified
our comparative performance results using subjective listening
tests.
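The error-propagation claim above can be illustrated with a toy sketch. It is not the coder from the paper: a uniform scalar quantizer stands in for the split VQ codebooks, an identity predictor stands in for the nonlinear predictor, and the names STEP, predict, encode, decode and affected_frames are hypothetical. Even-numbered frames are coded intraframe and odd-numbered frames are coded predictively from the previous decoded frame, so a corrupted index should disturb at most the frame it belongs to and the one that follows.

```python
STEP = 0.05  # hypothetical scalar quantizer step, standing in for the SVQ codebooks


def predict(previous):
    # Placeholder identity predictor; NOT the nonlinear predictor of the paper.
    return previous


def encode(frames):
    """Alternate intraframe coding (even n) with predictive coding (odd n)."""
    indices, previous = [], 0.0
    for n, x in enumerate(frames):
        base = 0.0 if n % 2 == 0 else predict(previous)
        index = round((x - base) / STEP)   # quantize the value or the residual
        previous = base + index * STEP     # reconstruction the decoder will see
        indices.append(index)
    return indices


def decode(indices):
    """Mirror of encode: rebuild each frame from its index."""
    decoded, previous = [], 0.0
    for n, index in enumerate(indices):
        base = 0.0 if n % 2 == 0 else predict(previous)
        previous = base + index * STEP
        decoded.append(previous)
    return decoded


def affected_frames(frames, hit):
    """Corrupt the index of frame `hit` and list the frames whose output changes."""
    indices = encode(frames)
    clean = decode(indices)
    damaged = list(indices)
    damaged[hit] += 3                      # simulated channel error
    noisy = decode(damaged)
    return [n for n in range(len(frames)) if clean[n] != noisy[n]]


frames = [0.31, 0.33, 0.36, 0.35, 0.30, 0.28]  # made-up parameter track
print(affected_frames(frames, 2))  # error in an intraframe-coded frame: [2, 3]
print(affected_frames(frames, 3))  # error in a predictively coded frame: [3]
```

In this sketch an error in an intraframe-coded frame reaches the next (predicted) frame and stops there, because the frame after that is coded intraframe again.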
Demonstration sound files:
- Uncoded.au [35 kB]:
Unencoded male speaker test sentence, "They sat in the cool
park."
- C24.au [35 kB]: Line spectral
frequency (LSF) vectors encoded with 3-way split VQ (3-SVQ) at
24 bits/frame.
- C21.au [35 kB]: LSF vectors
alternately encoded with 3-SVQ, at 24 bits/frame, and nonlinear
predictive 3-SVQ (3-NPSVQ), at 18 bits/frame.
- C22-24.au [35 kB]: LSF
vectors encoded with classified 3-SVQ (3-CSVQ) at 24 bits
per voiced (V) frame and 22 bits per unvoiced (UV) frame.
- C20-21.au [35 kB]: LSF
vectors alternately encoded with 3-CSVQ at 24 bits per V
frame and 22 bits per UV frame, and classified 3-NPSVQ (3-CNPSVQ),
at 18 bits/frame.
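
The average bit rates implied by the file names can be checked with a short calculation. This is a sketch under two assumptions not stated on this page: the intraframe and predictive coders alternate strictly one-to-one, and no side information for the voicing class is counted. The function name average_bits is hypothetical.

```python
def average_bits(intra_bits, predictive_bits):
    """Mean bits/frame when intraframe and predictive frames alternate 1:1."""
    return (intra_bits + predictive_bits) / 2


# C21: 3-SVQ at 24 bits alternating with 3-NPSVQ at 18 bits
npsvq = average_bits(24, 18)

# C20-21: 3-CSVQ (24 bits voiced, 22 bits unvoiced) alternating with 3-CNPSVQ at 18 bits
cnpsvq_voiced = average_bits(24, 18)
cnpsvq_unvoiced = average_bits(22, 18)

print(npsvq)            # 21.0
print(cnpsvq_voiced)    # 21.0
print(cnpsvq_unvoiced)  # 20.0
```

Under these assumptions the results agree with the figures quoted above: about 21 bits/frame for NPSVQ and 20 bits/frame for unvoiced frames with CNPSVQ.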