In a significant body of work spanning several decades, Stevens and colleagues suggested that the speech signal can be usefully analyzed in terms of landmarks, that is, acoustic events that correlate with changes in speech articulation [1]. Most research using the landmark approach has focused on the lexical content of speech [2][3]. In our work [4][5], we have found that landmark-based tools can also be useful for investigating non-lexical attributes of speech, such as syllabic complexity or vowel space area over time. In particular, these tools are well suited to analyzing subtle differences between productions of the same speech material by the same speaker.
How Are Acoustic Landmarks Detected?
The landmark detection process begins by analyzing the signal in several broad frequency bands. Because adults and infants have vocal tracts of different dimensions, the appropriate band frequencies differ between the two groups; the procedure itself, however, is the same. First, an energy waveform is constructed in each band. Next, the rate of rise (or fall) of that energy is computed, and peaks in the rate are detected. These peaks mark times of abrupt spectral change within a band, and simultaneous peaks in several bands identify consonantal landmarks.
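The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the detector used in the work cited here: the band edges in `ADULT_BANDS`, the 16 ms Hann frames with a 1 ms hop, a rate of rise taken as the dB difference over a 20-frame span, the 9 dB peak threshold, the 20 ms coincidence window and the three-band minimum are all hypothetical choices, and the function names are invented for this example. An infant analysis would pass different `bands` to the same functions.

```python
import numpy as np

# Hypothetical adult band edges in Hz (illustration only). Infant analyses
# would use different edges with the same procedure.
ADULT_BANDS = ((0, 400), (800, 1500), (1200, 2000),
               (2000, 3500), (3500, 5000), (5000, 8000))


def band_energies_db(x, fs, bands, frame_len=0.016, hop=0.001):
    """Short-time energy in dB for each band; returns shape (n_bands, n_frames)."""
    n, step = int(round(frame_len * fs)), int(round(hop * fs))
    frames = np.array([x[s:s + n] for s in range(0, len(x) - n + 1, step)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n), axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    energy = [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1) for lo, hi in bands]
    return 10 * np.log10(np.array(energy) + 1e-12)  # floor avoids log(0) in silence


def rate_of_rise(energy_db, span):
    """Assumed rate of rise: dB change over `span` frames (negative = fall)."""
    ror = np.zeros_like(energy_db)
    ror[:, span:] = energy_db[:, span:] - energy_db[:, :-span]
    return ror


def pick_peaks(ror_band, threshold, half_width):
    """Frames where |rate| exceeds `threshold` and is the maximum within +/- half_width."""
    a = np.abs(ror_band)
    return [i for i in range(len(a))
            if a[i] > threshold
            and a[i] == a[max(0, i - half_width):i + half_width + 1].max()]


def detect_landmarks(x, fs, bands=ADULT_BANDS, frame_len=0.016, hop=0.001,
                     span=20, threshold=9.0, tol=0.02, min_bands=3):
    """Times (s) at which at least `min_bands` bands show rate peaks together."""
    energy = band_energies_db(x, fs, bands, frame_len, hop)
    ror = rate_of_rise(energy, span)
    # Each rate value compares frames i and i - span; place it midway between them.
    times = np.arange(energy.shape[1]) * hop + frame_len / 2 - span * hop / 2
    events = sorted((float(times[i]), b) for b in range(len(bands))
                    for i in pick_peaks(ror[b], threshold, span))
    landmarks, cluster = [], []
    for t, b in events + [(np.inf, -1)]:  # sentinel flushes the final cluster
        if cluster and t - cluster[-1][0] > tol:
            # Peaks from several distinct bands close together in time -> landmark.
            if len({band for _, band in cluster}) >= min_bands:
                landmarks.append(float(np.median([ct for ct, _ in cluster])))
            cluster = []
        cluster.append((t, b))
    return landmarks


if __name__ == "__main__":
    fs = 16000
    seg = np.arange(4800) / fs  # 0.3 s
    tones = sum(0.1 * np.sin(2 * np.pi * f * seg)
                for f in (200, 1000, 1600, 2500, 4000, 6000))
    x = np.concatenate([np.zeros_like(seg), tones, np.zeros_like(seg)])
    print(detect_landmarks(x, fs))
```

On the synthetic signal in the usage block (0.3 s of silence, 0.3 s of six summed tones with one tone per band, then 0.3 s of silence), the sketch should report two landmarks, close to the onset at 0.3 s and the offset at 0.6 s. Requiring the maximum within a window of plus or minus one span keeps each abrupt change from producing a string of nearby peaks in the same band; real speech would call for tuning every parameter, and one natural extension is to keep the sign of each peak so that onsets and offsets can be told apart.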
[1] Stevens, K.N., et al. “Implementation of a Model for Lexical Access Based on Features”, in International Conference on Spoken Language Processing (ICSLP) Proc., 1992.
[2] Juneja, A. and C.Y. Espy-Wilson. “Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines”, in International Joint Conference on Neural Networks Proc., 2003.
[3] Slifka, J.S., et al. “A Landmark-Based Model of Speech Perception: History and Recent Developments”, in From Sound to Sense: Fifty Years of Speech Research, 2004.