Compare MFCC with PLP.
Answers
Introduction
In traditional phonology, Chinese initials are classified by the place and manner of articulation in
the vocal tract, and statistical and psychological methods are used to explore their perceptual
characteristics. Characteristics of phonation and articulation, such as voiced or voiceless,
aspirated or unaspirated, and fricative or frictionless, are the most important factors that
influence the perception of initials [1, 2]. An LPC-based perceptual measurement for Chinese
finals has been proposed in [3], which makes it easier to evaluate the equivalence of different
audiometric word lists. The acoustic features most commonly used are Mel Frequency Cepstral
Coefficients (MFCC) and Perceptual Linear Prediction (PLP) features [4]. In [5], both MFCC and PLP
were tested with and without ‘pitch’ information, using the same back-end on an English consonant
corpus, and the results were compared with human listener results at the level of articulatory
feature classification. No representation reached the level of human performance, but PLP achieved
higher accuracy than MFCC for most manner values on English consonants. However, the perception
of Chinese initials, which are not exactly the same as English consonants, is more difficult for
humans, especially for patients, than that of Chinese finals. Hence, it is important to study the
perceptual characteristics of Chinese initials.
In this paper, we discuss which acoustic features or their combinations are the most consistent with
the perception of Chinese initials. We systematically test PLP and MFCC representations of Chinese
initials by carrying out two experiments with respect to acoustic space and perceptual space,
respectively. We then combine the results of the two experiments using Spearman's rank correlation
coefficient, a statistical measure of how well the relationship between the two types of distance
can be described by a monotonic function. We also single out suitable acoustic feature
representations for Chinese initials, and distance metrics between different categories of initials,
for measuring an acoustic distance that is monotonically related to the perceptual distance.
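
The role of Spearman's rank correlation here can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `average_ranks` and `spearman_rho` are hypothetical, and the distance values are made-up numbers standing in for acoustic and perceptual distances between pairs of initials. The coefficient is computed as the Pearson correlation of the ranks, with tied values given the mean of their ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Illustrative values only (assumed, not taken from the paper):
# one entry per pair of initials.
acoustic = [0.2, 0.5, 0.9, 1.4, 2.0]    # e.g. distance in MFCC or PLP space
perceptual = [0.1, 0.4, 0.3, 0.8, 0.9]  # e.g. distance from listening tests

rho = spearman_rho(acoustic, perceptual)
print(f"Spearman's rho = {rho:.3f}")
```

A value of rho close to 1 means the acoustic distance orders the pairs of initials almost the same way as the perceptual distance does, which is the sense of "monotonically related" used above. Because only ranks enter the computation, the two distances do not need to share a scale or be linearly related.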