SpeechMark Newsletter February 2021

Announcing SpeechMark MATLAB Toolbox Version 1.3.
This updates certain SpeechMark functionality that is only available through the toolbox. It does not affect other SpeechMark products.

New and Enhanced Toolbox Functionality
Strings come to SpeechMark at last! We have updated our entire library to allow you to specify strings as input arguments, such as age/gender or filename. This especially affects the high-level functions such as landmarks, vowel_segs_std, and lm_vowelarea, which permit filenames (and arrays of filenames) as input arguments. However, SpeechMark functions do not produce strings in their returned results, and character arrays are still supported, so any existing code will continue to work as-is.

Bug Fixes
We have also made updates and fixes to some of our internal functions, especially suppressing inappropriate warnings when computing F0 and related voicing measures.

SpeechMark uses fuzzy logic in much of its processing. In the coding world, logical normally means exactly true or false, yes or no, 1 or 0. Fuzzy logic, however, is imprecise and subjective, expressing the “degree of membership” in a set or the degree to which a condition is true. This is always a value between 0 and 1, with 0 indicating no membership at all and 1 indicating full membership.

SpeechMark uses what’s called first-order fuzzy logic. For example, the “strength” component of a landmark is the (fuzzy) degree to which a landmark is present. (Well, almost: SpeechMark often multiplies the strength by -1, so be sure to use the absolute value if you want to work with the fuzzy degree.) In the MATLAB toolbox, functions that return fuzzy values are named deg_*, such as deg_sigsgram, deg_voicedhnr, and even deg_speech.
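As a small illustration of the sign caveat above, here is a sketch in plain Python (not toolbox code). The function name and the sample values are hypothetical; the only point taken from the text is that the fuzzy degree is the absolute value of the strength.

```python
def fuzzy_degree(strength: float) -> float:
    """Recover the fuzzy degree from a strength that may have been multiplied by -1."""
    degree = abs(strength)
    # A fuzzy degree must lie between 0 and 1; anything else signals bad input.
    if degree > 1.0:
        raise ValueError(f"strength {strength} is outside [-1, 1]")
    return degree


# Hypothetical landmark strengths, some of them sign-flipped.
signed_strengths = [0.9, -0.4, -1.0, 0.0]
degrees = [fuzzy_degree(s) for s in signed_strengths]
print(degrees)
```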

SpeechMark focuses on the physics and neurophysiology of speech production, which is an objective domain. One could, for example, visually determine whether the vocal folds are moving to produce glottal pulses.

However, SpeechMark analyzes acoustic signals. This is sometimes subjective, because it may be a matter of judgement whether, say, vocal-fold motion can be inferred from an acoustic signal, and how firmly. The degree of voicing produced by several of our deg_voiced* functions captures various parts of this imprecision: either local evidence from periodic motion specifically, or evidence that also includes various acoustic measures such as nearby landmarks or the local harmonics-to-noise ratio (HNR).

Using fuzzy logic often entails combining fuzzy measures (of membership in some set of interest) with the fuzzy analogs of the usual logical operators: And, Or, Not, and others. The first two have reassuringly simple rules. For two fuzzy variables A and B whose degrees are a and b, the degree to which both A and B are in the set (i.e., A And B) is just the smaller of a and b, Min(a,b). Likewise, the degree to which at least one is in the set (A Or B) is the larger of the two, Max(a,b).
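The Min and Max rules can be sketched in a few lines of Python. This is an illustration only, not SpeechMark code; the function names fuzzy_and and fuzzy_or and the sample degrees are made up for the example.

```python
def _check(degrees):
    """Reject anything that is not a valid fuzzy degree between 0 and 1."""
    for d in degrees:
        if not 0.0 <= d <= 1.0:
            raise ValueError(f"{d} is not a fuzzy degree in [0, 1]")


def fuzzy_and(*degrees: float) -> float:
    """Degree to which all conditions hold: the smallest degree."""
    _check(degrees)
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    """Degree to which at least one condition holds: the largest degree."""
    _check(degrees)
    return max(degrees)


a, b = 0.7, 0.2
both = fuzzy_and(a, b)    # the smaller of the two
either = fuzzy_or(a, b)   # the larger of the two
print(both, either)
```

With only the values 0 and 1 as inputs, these reduce to the ordinary Boolean And and Or.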

Where does this appear in SpeechMark? An example is landmark strength. Landmarks are defined by big-enough peaks of energy change that occur simultaneously in at least three spectral bands. Each peak is given a degree to which it is “big enough”, and the landmark’s strength is the third-highest of these. Why the third-highest? Because the landmark is defined by the presence of a peak in the strongest band and a peak in the second-strongest and a peak in the third-strongest. The fuzzy And of those three degrees is their minimum, which is the third-highest degree overall. (“A chain is only as strong as its weakest link.”)
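To see why "third-highest" follows from the And and Or rules, here is a sketch in Python. It is not the toolbox's implementation: the function names, the number of bands, and the per-band degrees are all hypothetical, and returning 0 when fewer than three bands are present is an assumption made for the example.

```python
from itertools import combinations


def landmark_strength(peak_degrees):
    """Third-highest per-band degree: the And (Min) of the three strongest peaks."""
    if len(peak_degrees) < 3:
        return 0.0  # assumption: fewer than three bands means no landmark at all
    top_three = sorted(peak_degrees, reverse=True)[:3]
    return min(top_three)


def strength_from_definition(peak_degrees):
    """'Peaks in at least three bands', spelled out as Or over every triple of bands."""
    triples = combinations(peak_degrees, 3)
    # And within each triple is Min; Or across triples is Max.
    return max((min(t) for t in triples), default=0.0)


# Hypothetical "big enough" degrees for five spectral bands.
band_degrees = [0.9, 0.3, 0.75, 0.6, 0.1]
strength = landmark_strength(band_degrees)
print(strength, strength_from_definition(band_degrees))
```

Both functions give the same answer; the sorted version simply skips the search over triples.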

SpeechMark is cautious about using Not. You might be tempted to think that if False corresponds to 0 and True to 1, then the degree of Not A must be 1-(degree of A). But 1-x isn’t the only function that converts 1 to 0 and vice versa. Different versions of fuzzy logic adopt different definitions of Not. (If you work in the Math Department, think of t-norms or negators. The rest of us will be thinking of speech production.) In speech production, the degree to which the acoustic evidence for, say, voicing is strong might not be the degree to which that same evidence for Not-voicing (non-voicing) is weak. The evidence may simply be incomplete (voicing Or Non-voicing < 1) or ambiguous (voicing Or Non-voicing > 1).
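For the curious, here is a Python sketch of three negation functions from the fuzzy-logic literature: the standard 1-x, the Sugeno family, and the Yager family. All three send 0 to 1 and 1 to 0 but disagree in between. The parameter values are arbitrary choices for the example, and nothing here is meant to suggest which definition, if any, SpeechMark uses.

```python
def not_standard(x: float) -> float:
    """The familiar negation, 1 - x."""
    return 1.0 - x


def not_sugeno(x: float, lam: float = 2.0) -> float:
    """Sugeno negation, (1 - x) / (1 + lam * x), valid for lam > -1."""
    return (1.0 - x) / (1.0 + lam * x)


def not_yager(x: float, w: float = 2.0) -> float:
    """Yager negation, (1 - x**w) ** (1 / w), valid for w > 0."""
    return (1.0 - x ** w) ** (1.0 / w)


negations = [not_standard, not_sugeno, not_yager]

# All three agree at the crisp endpoints...
endpoints_agree = all(n(0.0) == 1.0 and n(1.0) == 0.0 for n in negations)

# ...but give different answers for a half-true input.
halfway = [n(0.5) for n in negations]
print(endpoints_agree, halfway)
```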

So much for today’s math lesson! There may be a quiz in the morning.