In speech acoustics, landmarks are patterns that mark certain speech-production events. Speech-acoustic landmarks come in two classes: peak and abrupt.
Peak: At present, the peak landmarks detected in SpeechMark® are vowel landmarks (VLMs) and frication landmarks. These are identified as instants in an utterance at which a maximum (or peak) occurs in harmonic power (for VLMs) or in fractal dimension (for frication landmarks), and they may be considered the centers of the vowels or of the fricated intervals, respectively.
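As a rough illustration of the "peak" idea (a minimal sketch, not the SpeechMark implementation), the following MATLAB fragment builds a synthetic per-frame contour, standing in for a harmonic-power track, and reports the time of its strongest local maximum. The frame rate and the contour are assumptions made purely for the example.

% Minimal sketch: locate the strongest local maximum of a per-frame contour.
frameRate  = 100;                                    % assumed frames per second
t          = (0:299) / frameRate;                    % 3 s of frame times
pwrContour = exp(-((t - 1.2).^2) / 0.05) + 0.1*rand(size(t));   % synthetic contour

[pkVals, pkLocs] = findpeaks(pwrContour);            % all local maxima (Signal Processing Toolbox)
[~, iBest]       = max(pkVals);                      % strongest peak = candidate peak landmark
fprintf('Peak landmark at %.3f s\n', t(pkLocs(iBest)));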
Tutorials and User Guides
What Are Acoustic Landmarks, and What Do They Describe?
Peak Landmarks in SpeechMark
Landmarks (LMs) are acoustically identifiable points in an utterance. They come in the form of abrupt
transitions (abrupt LMs) and peaks (peak LMs) of some contour or contours. Here we describe the peak
set of landmarks used in SpeechMark®.
Frication Peak Landmarks
Until now, the only peak landmark type has been the vowel landmark, computed by vowel_lms.
For vowels, the peak is the instant of maximum harmonic power, which often corresponds to the maximum opening of the mouth.
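The fragment below is a crude, hypothetical stand-in for that computation, not the vowel_lms algorithm itself: it treats energy in a low-frequency band as a rough proxy for harmonic power and takes the frame where that energy peaks. The band edges, frame sizes, and input file name are all assumptions.

% Rough stand-in only; the actual vowel landmarks are computed by vowel_lms.
[x, fs] = audioread('utterance.wav');              % hypothetical input file
x = x(:, 1);                                       % first channel only

frameLen = round(0.025 * fs);                      % 25 ms analysis frames
hop      = round(0.010 * fs);                      % 10 ms hop
[b, a]   = butter(4, [100 1000] / (fs/2));         % band covering F0 and low harmonics
xf       = filtfilt(b, a, x);

nFrames = floor((numel(xf) - frameLen) / hop) + 1;
e = zeros(nFrames, 1);
for k = 1:nFrames
    seg  = xf((k-1)*hop + (1:frameLen));
    e(k) = sum(seg.^2);                            % band energy per frame
end
[~, kMax]     = max(e);                            % frame of maximum "harmonic" power
vowelPeakTime = ((kMax-1)*hop + frameLen/2) / fs;
fprintf('Vowel-like peak near %.3f s\n', vowelPeakTime);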
Frication-type peak landmarks are computed using…
Using the SpeechMark MATLAB Toolbox for Syllabic Cluster Analysis
The SpeechMark MATLAB Toolbox is a platform-independent add-in to the MATLAB language and computation environment developed by MathWorks. This Toolbox adds acoustic landmark detection and visualization tools, methods, and scripts to MATLAB.
Downloading
This product is a standard MATLAB toolbox. To use it, a valid installation of MATLAB (version R2010b or newer) is required, along with the MATLAB Signal Processing Toolbox.
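These requirements can be checked from the MATLAB prompt before installing. The check below is only a convenience sketch, not part of the Toolbox; R2010b corresponds to MATLAB version 7.11.

% Quick environment check (sketch, not part of SpeechMark).
if verLessThan('matlab', '7.11')
    warning('MATLAB R2010b or newer is required.');
end
if ~license('test', 'Signal_Toolbox')
    warning('The Signal Processing Toolbox does not appear to be licensed.');
end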
WaveSurfer and SpeechMark Configuration
WaveSurfer uses text configuration files that let the user specify, automatically set up, and reuse particular configurations of panes for speech analysis, recording, labeling, and so forth. WaveSurfer comes with a handful of predefined configurations, and users can easily define and use their own custom configurations.
SpeechMark family of products
The SpeechMark family of products is designed to detect acoustic landmarks in speech recordings. Landmarks are acoustic events that correlate with changes in speech articulation. The SpeechMark family comprises plug-ins that augment the capabilities of existing third-party software, as well as stand-alone libraries and command line utilities.
What are Acoustic Landmarks?
In a significant body of work spanning several decades, Stevens and colleagues suggested that the speech signal can be usefully analyzed in terms of landmarks—that is, acoustic events that correlate with changes in speech articulation [1]. Most research using the landmark approach has focused on the lexical content of speech [2][3]. In our work [4][5], we have found that tools based on landmarks can be useful for investigating non-lexical attributes of speech, such as syllabic complexity or vowel space area over time. In particular, we have found that landmark-based software tools are well suited for analysis of subtle differences in production of the same speech material by the same speaker.
How are Acoustic Landmarks Detected?
The landmark detection process begins by analyzing the signal in several broad frequency bands. Because of the different vocal-tract dimensions, the appropriate frequencies for the bands are different for adults and infants; however, the procedure itself does not vary. First, an energy waveform is constructed in each of the bands. Then the rate of rise (or fall) of the energy is computed, and peaks in the rate are detected. These peaks therefore represent times of abrupt spectral change in the bands. Simultaneous peaks in several bands identify consonantal landmarks.
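The MATLAB fragment below is an illustrative sketch of that procedure, not the SpeechMark detector: it filters the signal into a few bands, computes frame energies, differentiates them to obtain the rate of rise or fall, and flags frames where several bands change sharply at the same time. The band edges, frame sizes, threshold, and input file name are assumptions, and a sampling rate above 10 kHz is assumed for the chosen bands.

% Illustrative sketch of the procedure described above (not the SpeechMark code).
[x, fs] = audioread('utterance.wav');                    % hypothetical input file
x = x(:, 1);

bands = [800 1500; 1500 2500; 2500 3500; 3500 5000];     % assumed band edges (Hz)
win   = round(0.025 * fs);  hop = round(0.010 * fs);
nFrames = floor((numel(x) - win) / hop) + 1;
rate    = zeros(nFrames - 1, size(bands, 1));

for bnd = 1:size(bands, 1)
    [bb, aa] = butter(4, bands(bnd, :) / (fs/2));        % band-pass filter for this band
    xb = filtfilt(bb, aa, x);
    e  = zeros(nFrames, 1);
    for k = 1:nFrames
        seg  = xb((k-1)*hop + (1:win));
        e(k) = 10 * log10(sum(seg.^2) + eps);            % band energy in dB
    end
    rate(:, bnd) = diff(e);                              % rate of rise/fall (dB per frame)
end

% Frames where several bands change sharply at once are candidate
% consonantal (abrupt) landmarks.
thresh     = 6;                                          % assumed threshold, dB per frame
candidates = find(sum(abs(rate) > thresh, 2) >= 3);
fprintf('%d candidate abrupt-landmark frames\n', numel(candidates));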
[1] Stevens, K.N., et al. “Implementation of a Model for Lexical Access based on Features”, in International Conference on Spoken Language Processing (ICSLP) Proc., 1992.
[2] Juneja, A. and C.Y. Espy-Wilson. “Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines”, in International Joint Conference on Neural Networks Proc., 2003.
[3] Slifka, J.S., et al. “A Landmark-Based Model of Speech Perception: History and Recent Developments”, in From Sound to Sense: Fifty Years of Speech Research, 2004.
Marking Adult Vowel-Space Formant Boundaries
The usual SpeechMark® vowel-space plot for adults includes a polygon that marks the boundaries of typical formant-frequency (F1, F2) pairs for normal adult speakers. The boundary drawn depends on whether the sex of the actual speaker has been specified as male, female, or unknown. The polygon is intended solely as a “fiducial” reference (an aid to the eye) much like grid lines. Like grid lines, it does not depend on the plotted data: part of its value is that it remains constant across all plots for adults of a given sex.
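A sketch of the idea follows. The polygon vertices are illustrative placeholders, not the boundary values SpeechMark draws, and the reversed-axis layout is simply a common vowel-space convention.

% Sketch only: draw a fixed fiducial polygon in the (F2, F1) plane and
% overlay measured vowel formants.  All numbers are illustrative.
boundaryF1 = [250  300  850  700  250];      % hypothetical boundary vertices (Hz)
boundaryF2 = [2300  800 1100 2000 2300];

measuredF1 = [320 500 760 640];              % example measured vowel formants (Hz)
measuredF2 = [2200 1000 1300 1900];

figure; hold on;
plot(boundaryF2, boundaryF1, 'k--');         % fiducial reference, like grid lines
plot(measuredF2, measuredF1, 'o');
set(gca, 'XDir', 'reverse', 'YDir', 'reverse');
xlabel('F2 (Hz)'); ylabel('F1 (Hz)');
title('Vowel space with fiducial boundary (illustrative)');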
Syllabic Clusters
This document describes the process by which the SpeechMark syllabic cluster analysis operates to group previously computed landmarks. The grouping algorithms were developed to deal with English-focused infant speech including babble—that is, speech whose intended lexical content is unknown (if it exists).
Sequences that would be transcribed as an infant's attempt at a syllabic cluster were identified, and empirical rules for separating these clusters from the rest of the speech stream and from each other were developed, based on landmark sequences and timing.
It is important to remember that the syllabic cluster rules so developed are sensitive only to the speech AS UTTERED. They may or may not match syllabic clusters of speech as analyzed by transcription.
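As a toy illustration of timing-based grouping (not the SpeechMark rules, which also take landmark types and sequences into account), the fragment below splits a list of landmark times into clusters wherever the gap between consecutive landmarks exceeds an assumed cutoff.

% Toy illustration only; landmark times and the gap cutoff are hypothetical.
lmTimes = [0.12 0.18 0.25 0.92 0.99 1.80 1.86 1.95];   % landmark times (s)
maxGap  = 0.35;                                        % assumed within-cluster gap (s)

clusterId = cumsum([1, diff(lmTimes) > maxGap]);       % start a new cluster after each long gap
for c = 1:max(clusterId)
    fprintf('Cluster %d: %s s\n', c, mat2str(lmTimes(clusterId == c), 3));
end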