Solution to Last Month's Mystery Spectrogram

Solution for April 2006

[Figure: labeled spectrogram]
"Mice in cartoons eat cheese."

Lower-Case M
[m], IPA 114
Starting at 75 msec and going on until about 150 msec, we've got a nice little sonorant happening. It's got a nice, clear voicing bar at the bottom, and resonances at the higher frequencies. The sharpness of the edge (of the following vowel), the overall lowered energy (relative to the following vowel), the presence of a nice clear zero (around 750 Hz) and mostly flat (unchanging) resonating structures are all good pointers to a nasal stop. The pole around 1000 Hz is usually a pretty good clue (in my voice) that it's bilabial. The F2 transition in the following vowel is consistent with that--that F2 onset frequency is too low to be alveolar, and the distance between F2 and F3 is atypical of velars. But it's that F3 transition that bothers me. The F3 seems to fall into the following vowel, which is consistent really only with alveolars. So we've got conflicting cues. Which are we going to believe? Well, we're going to wait for a deciding vote. Once we have a clearer idea of what the first few syllables of this utterance are, knowing it's English and a declarative sentence, we'll use lexical access to decide whether we're looking at an [m] or an [n]. Or something else...

Lower-Case A + Small Capital I
[aɪ], IPA 304 + 319
So the F1 onset frequency in the first full pulse is just below the 750 Hz zero in the nasal, but it rises very quickly and reaches a peak well before 200 msec. So ignoring the first few pulses as transition, we've got something that starts fairly low in the vowel space. The F2 at that moment is still fairly low as well, but some of that might be transitional. So we've got something that starts lowish and sort of backish (or roundish?), but the F1 lowers over almost 100 msec toward the following consonant, indicating a slow rising of the vowel, and the F2 never stops moving up (forward in the vowel space). So what we have here is a diphthong starting lowish and backish and moving up and forward. Again, there may be two choices, but one is probably better than the other. (Quick, what's the other choice, and how would you expect it to look, assuming this isn't it?)

Always be an active learner.
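If it helps to see the diphthong reasoning spelled out, here is a minimal sketch of the rule of thumb used above: a falling F1 means the vowel is raising, and a rising F2 means it is fronting. The function name and the formant values are hypothetical illustrations, not measurements taken from this spectrogram.

```python
def describe_glide(f1_start, f1_end, f2_start, f2_end):
    """Describe vowel movement from the change in F1 and F2 (all values in Hz).

    F1 is inversely related to vowel height, so a falling F1 means raising.
    F2 tracks frontness, so a rising F2 means fronting.
    """
    if f1_end < f1_start:
        height = "raising"
    elif f1_end > f1_start:
        height = "lowering"
    else:
        height = "steady"

    if f2_end > f2_start:
        backness = "fronting"
    elif f2_end < f2_start:
        backness = "backing"
    else:
        backness = "steady"

    return height, backness


# Hypothetical readings for a lowish, backish start moving up and forward.
print(describe_glide(700, 450, 1300, 2000))  # ('raising', 'fronting')
```

Real formant tracks are noisy, so in practice you would compare values averaged over a few pulses near the start and end of the vowel rather than two single readings.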

Lower-Case S
[s], IPA 132
Well, this is interesting. From 300 msec (a little earlier in the higher frequencies) to almost 400 msec, there's a nice voiceless fricative. There's no hint of voicing or anything at the low end. There's some noise into the very low frequencies, and for some reason the amplitude hikes up a bit at about 1500 Hz. Then it stays pretty much flat (i.e. at the same amplitude) all the way up. So this is fairly strong and broad band, typical of sibilants. And the sudden drop off below 1500 Hz is usually a clue that it's post-alveolar. But I'm going to suggest it's not. Partly, it's because I know what it's supposed to be, and I'm floundering for reasons to be right. Okay, usually a post-alveolar (rather than alveolar) sibilant has its strongest energy in the F2-F4 range, and I think that low energy 'border' isn't quite continuous with the F2 band in the following vowel, such as it is. So I don't know. This is supposed to be an [s]. And I think if we followed it up to the 6-12 kHz range, we'd see it gets really, really loud up there. So this is an alveolar. Accept it. Move on.

Barred I
[ɨ], IPA 317
Well, for a scant 25 msec or so, there's a vowel. There is. Look at it. But it's so short, it's hardly worth spending any time worrying about. So I won't.

Quick, why isn't it worth spending any time worrying about?

Lower-Case N
[n], IPA 116
Another nasal. Now look at this one carefully. There's a nice strong voicing bar, and there's a band of weaker energy just above that. Now compared with the initial one, this one is a little higher in frequency or broader in band. So they're not quite the same. There's a zero. It's narrow, but it's a little higher in frequency than the zero in the previous one. There's a little energy at 1000 Hz, but it's weak w/r/t the previous one. And there's that blip, or whatever you want to call it, several pulses of resonance, or something, up just below 1500 Hz. I point that out because it turns out that it's important. I think that's the real pole. But I could be wrong. But what are the odds?

Lower-Case K + Superscript Lower-case H
[kʰ], IPA 109 + 404
There's what appears to be a closure transient, or maybe it's just a clunk, where I've marked the boundary. There's some perseverative voicing, I guess, but look at that aspiration. Even excluding the material before the second burst, that's at least 75 msec of aspiration. Which is quite a lot for me. So this has to be aspirated, and therefore voiceless. Now look at that double burst. Double bursts like that, especially centered in F2/F3 like that, are typical of velar releases. So there you go. There's not a lot of unambiguous transition information, but the long VOT and the double burst are pretty good cues.

Script A + Rhoticity Sign
[ɑ˞], IPA 305 + 419
Well, the F3 is low, so you might be tempted to call this a syllabic /r/. But that wouldn't explain the F2 movement. Or for that matter, the F1 movement. Which together look like diphthongal movement, which I suppose is what this sequence is.

Lower-Case T + Superscript H
[tʰ], IPA 103 + 404
From 775 msec to about 850 msec, there's a serious gap. The few periods of voicing leading up to 800 msec I'd say are just perseverative. Since the release at 850 is followed by going on 75 msec of aspiration (voicelessness, VOT), there's little doubt that this plosive is aspirated. The transitions into it are decidedly alveolar looking, in the sense that both F2 and F3 are pointed up, but given their frequency in the preceding segment, they have precious little choice. The aspiration noise is the big clue. In all respects (except for the formant shaping in F2 and F3), this looks like a sibilant, particularly [s]. (I suppose you might say it looks like an [ʃ], but really it doesn't. There's not enough energy in the F2/F3 pole relative to the higher ones.) Ennyhoo, it's not an [s], it's just really heavy aspiration following an alveolar release. So it's not 'grooved' like an [s], but the airflow is basically high pressure being directed at the incisors, just like [s]. So this has to be alveolar. The transitions out look vaguely velar-pinch-y, but since there's no way a velar would have aspiration that looks like this, we can rule that out.
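The VOT argument is just arithmetic on two time points, so here is a minimal sketch of it. The release time and the rough length of the aspiration come from the description above; the 30 msec cutoff and the function names are my own assumptions for illustration, and a sensible cutoff will vary by speaker, place of articulation and speech rate.

```python
# Assumed cutoff between short-lag and long-lag (aspirated) stops.
# This value is an illustration only; it is not taken from the text.
ASPIRATION_THRESHOLD_MS = 30.0


def vot_ms(release_ms, voicing_onset_ms):
    """Voice onset time: delay from the stop release to the start of voicing."""
    return voicing_onset_ms - release_ms


def is_aspirated(release_ms, voicing_onset_ms, threshold_ms=ASPIRATION_THRESHOLD_MS):
    """Call a stop aspirated when its VOT meets or exceeds the assumed cutoff."""
    return vot_ms(release_ms, voicing_onset_ms) >= threshold_ms


# Release at about 850 msec, voicing resuming roughly 75 msec later
# (approximate readings, rounded for the example).
print(vot_ms(850, 925))        # 75
print(is_aspirated(850, 925))  # True
```

With a lag that long, the exact cutoff hardly matters, which is the point being made: 75 msec is well into aspirated territory by any reasonable threshold.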

Turned M
[ɯ], IPA 316
Well, this is not good. The highest-pitched voice in the whole spectrogram. Which probably makes this syllable the nuclear accent, or at least the focus accent of the utterance. But in practical terms it means a) the striations are so close together you can't tell one pulse from the next, and b) the harmonics are widely separated (Quick--why?) and so bandwidths just increase. So it's hard to tell exactly where F1 is. It could be that band around 500 Hz (or just below, but above the very strong voicing bar), or it could be that band up around 800 Hz. Which makes this either a relatively mid to higher-mid kind of vowel or a very, very low one. The F2 is a little easier. Before it fuzzes out, you can see the F2 transition in the aspiration noise, so you know where it's headed at least. So the F2 has to be around 1200 Hz or so, depending on exactly where you measure. So knowing the answer, I might suppose that the strength of the 'voicing bar' was actually a very low first formant, and the two things I'd considered before are just strong harmonics. But I don't know. It probably ain't the incredibly low vowel that it would be. So figure not high and relatively back, but not outrageously round (or very round but not outrageously back). And we'll try to make a word out of it later.

For the record, this is a fairly typical /u/ for me. Not at all round, fairly high, and with a front on-glide following the coronal.
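For anyone who wants to check the pitch effect on striations and harmonics with numbers, here is a minimal sketch. It relies only on two standard facts: glottal pulses are 1/f0 apart in time, and harmonics sit at whole-number multiples of f0. The function names and the two f0 values are hypothetical, not measured from this utterance.

```python
def pulse_period_ms(f0_hz):
    """Time between glottal pulses (striation spacing) in milliseconds."""
    return 1000.0 / f0_hz


def harmonics(f0_hz, max_hz):
    """Harmonic frequencies (whole-number multiples of f0) up to max_hz."""
    return [f0_hz * k for k in range(1, int(max_hz // f0_hz) + 1)]


# Hypothetical low and high pitches for comparison.
for f0 in (100, 250):
    print(f0, pulse_period_ms(f0), harmonics(f0, 1000))
# 100 10.0 [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
# 250 4.0 [250, 500, 750, 1000]
```

At the higher pitch the pulses crowd together in time while only a handful of harmonics fall below 1000 Hz, so there are fewer harmonics to outline the F1 peak and its location gets harder to pin down.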

Lower-Case N
[n], IPA 116
So I think the oral closure happens at about 1075 msec--when the zero kicks in. Which is another contributor to the fuzziness of the preceding vowel--nasalized vowels tend to have broader bandwidth (and more centralized formant frequencies) than their oral counterparts. So the zeroes are a good thing, really--they tell us this has to be a nasal. Frankly, the pole looks like it's about 1000 Hz, and so I'd say this was bilabial. And I'd be wrong. Good guess, but if it's not bilabial, then it has to be alveolar. No hint of velar pinch, and, well, there is that narrow thing at 1500, which is where I'd expect the pole for an [n] to be, in my voice. There's no hint of that in the initial nasal of this utterance, so there's some difference. But I wish I knew what was going on at 100 Hz.
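The pole heuristic amounts to a nearest-value lookup, sketched below. The two reference frequencies are the ones quoted in this solution for my voice (about 1000 Hz for [m], about 1500 Hz for [n]); the table and function names are made up for the example, and other speakers would need their own values.

```python
# Pole frequencies quoted in the text for one speaker's voice.
# Other speakers will need different reference values.
NASAL_POLES_HZ = {"m": 1000, "n": 1500}


def nearest_nasal(pole_hz, poles=NASAL_POLES_HZ):
    """Return the nasal whose reference pole is closest to the observed pole."""
    return min(poles, key=lambda label: abs(poles[label] - pole_hz))


print(nearest_nasal(1050))  # m
print(nearest_nasal(1450))  # n
```

As this very segment shows, the lookup is only as good as your choice of which band counts as the pole: read the 1000 Hz energy as the pole and you get the wrong answer, read the narrow band at 1500 Hz and you get the right one. So treat it as one vote among several, alongside the transitions and the lexical context.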

Lower-Case Z
[z], IPA 133
Well, there's a hint of voicing at the bottom, so this is probably voiced. The noise is [s]-shaped, if you follow, and weaker (and shorter) than we'd expect for [s], which is consistent with the idea that it's voiced.

Lower-Case I
[i], IPA 301
Well, if the previous thing is an alveolar, then we can say that the onset frequency of F2 is in line with the alveolar locus, which means all that movement is just transitional. Or we could suppose that it's meaningful. In the first case, coupled with the relatively low F1, I'd be looking at that spot, just after 1300 msec where the F2 levels off for just a bit, and say that was our target F2 frequency, which would make this an [i], just because nothing else ever has an F2 above 2200 Hz. But in the other case, we'd say this was a relatively high, front vowel moving higher (I guess) and much much fronter, something much more like classical [eɪ]. One or the other. One is right, the other's a good guess.

Lower-Case T
[t], IPA 103
So with the exception of that one pulsey thing before 1450 msec, the gap here seems to start at about 1350 msec and go on for almost 100 msec. The transitions into it look sort of pinchy (but very front velar, if you follow) and the burst is slightly doubled. All of which just screams [k]. But then we wouldn't get this spectrogram to say anything. So on the high tilt of the burst, and the phonotactics of the following thing, I'd say this was [t].

Esh
[ʃ], IPA 134
So here you see how much stronger the F2 pole is. And the energy below is weaker. So this looks like an [ʃ]. This is also more consistent with the F2/F3(/F4?) poles, which are more typical of postalveolars than alveolars. There's just more room to couple and a longer front cavity to play in. That is, for acoustic coupling to take place and to resonate in, respectively. Shame on you for thinking what you were thinking!

Lower-Case I
[i], IPA 301
Well, there's a couple of odd amplitude discontinuities, but they're not really radical, considering the length and overall energy in this vowel. So I'm thinking it all has to do with pitch change, and therefore striation spacing and harmonic structure. So from 1575 to 1925 msec, I'm thinking this is really all one vowel. And since the F2 reaches 2200 Hz (i.e. 'absurdly high for anything except [i], and then still very, very high'), I'd say this was [i]. If you were determined to put vowels on either side, what would you do with the middle?

Lower-Case Z + Under-Ring
[z̥], IPA 133 + 402
Well, this is a lesson, so here goes. This looks like an [s] again, but it's very weak. There's no hint of voicing, but it's weak, and it's shorter than even the fricative in the affricate, even though it's final in utterance. So there's something odd about it. It's not post-alveolar, because even though it loses energy below F2, you'd still expect the F2 pole in the fricative to be a little stronger than above it, and this is flat. The noise gets a little better organized off the top of the spectrogram. All this points to [s]. So how do we account for the weakness? Well, voiced fricatives are almost always weaker and shorter than their voiceless counterparts, just because the act of voicing impedes airflow and therefore pressure build up. But this isn't voiced. So I'll suggest it's passively devoiced. That is, rather than devoicing by abducting the vocal folds (as with underlyingly voiceless sounds), the vocal folds remain adducted here. But because we're at the end of an utterance, we (I) don't have a lot of subglottal pressure to work with, and the result is the vocal folds don't vibrate. And there you have it, devoiced [z]. As distinct from [s].