Solution to Last Month's Mystery Spectrogram

Solution for February 2005

"We ate toast with jam."

I was thinking about intonation when I did this one, and in my memory I kept hearing Mary Beckman's voice intoning ToBI sentences. Since I can't use proper names in these things (by rule set down long ago by Peter Ladefoged), I couldn't use my favo(u)rite "Marianna" sentences, and anyway, since I wasn't doing intonation with this one it wasn't that important. But there are a lot of 'jam' sentences as I recall. But we will be doing some intonation stuff later on this year.

Lower-Case W
[w], IPA 170
Well, starting at about 75 msec or so, there's some voicing going on. You'll notice that there's not much in the way of energy above 1000 Hz. And the F1, such as it is, is weaker than the F1 of the following vowel (or whatever that is). So we're looking at something sonorant (voiced and open enough to have some serious periodicity to it), but not open enough to really be even a high vowel. So this must be some kind of approximant, by traditional definition. The F1 is hard to see, but it's lowish, whatever it is, which is consistent with a tighter-than-open constriction. The F2 is tough to make out, if in fact you can see it at all, but it's clear the F2 transition into the following vowel starts around 900-1000 Hz or so. So it must be quite back and/or round. Probably and. So how many back, round approximants can you think of?

Lower-Case I
[i], IPA 301
So abstracting away from the transition, this vowel thing has a low F1, lower than mid-range anyway, and an absurdly high F2 up around 2300-2400 Hz. That's just freaking high. So we're dealing with something high and exceptionally front. Again, how many such vowels can you think of? Good.
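The height-and-frontness reasoning here can be sketched as a rough lookup. This is only an illustration, not a real classifier: the function name `describe_vowel` and every Hz cutoff are assumptions I picked for the sketch (roughly adult-male ranges), and the 300 Hz F1 in the example is a hypothetical reading, since all we know from the spectrogram is that the F1 is low.

```python
def describe_vowel(f1_hz, f2_hz):
    """Map rough F1/F2 readings to (height, backness) labels.

    All cutoffs below are illustrative assumptions, not measured values.
    """
    # F1 varies inversely with vowel height: low F1 means a high vowel.
    if f1_hz < 400:
        height = "high"
    elif f1_hz < 600:
        height = "mid"
    else:
        height = "low"

    # F2 tracks frontness (and drops with rounding): high F2 means front.
    if f2_hz > 1900:
        backness = "front"
    elif f2_hz > 1200:
        backness = "central"
    else:
        backness = "back and/or round"

    return height, backness


# Hypothetical F1 of 300 Hz, with an F2 in the 2300-2400 Hz range.
print(describe_vowel(300, 2350))  # ('high', 'front')
```

The labels are deliberately coarse. The point is just that F1 gets you height and F2 gets you frontness (or rounding), which is all the argument above relies on.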

So at this point, knowing we've got an English sentence, we could probably make a good guess at the first word. Or at least the first syllable. And further, if we feel like we have a word/syllable that could plausibly be the subject of a sentence (someday I'm going to put a weird adverbial or something at the front of a sentence and derail this whole line of reasoning), we might guess that the next bit has to be some kind of verb. Or we could be wrong about [wi] being a subject, or even about it being [wi]. But it's a working hypothesis.

Lower-Case E
[e], IPA 302
I'm trying to be consistent about marking movement in vowels, but I'm not sure what I was thinking here. But we have something here that is separate from the preceding vowel--there's a sharp change in frequency, as well as a sudden change in F1 frequency. The F1 frequency is higher than the previous vowel, approaching the mid-range, but not quite. So this vowel is still quite high, but not as high as [i]. The F2 is similarly not-quite-as-high as the preceding vowel, so this is not quite as front but still quite radically front. So not as high or as front as [i], but still high or high-of-mid, and very front. Possibly moving back towards [i], at least as we approach 400 msec or so. So possibly diphthongal. Sound familiar? If you're wondering about the height, check out the height of /e/ in my 1997 JASA paper.

Glottal Stop
[ʔ], IPA 113
Well, as we approach 400 msec and a bit beyond, the periodicity, or the regularity of the voicing striations, starts to fall apart. So either there's a very abrupt and very extreme drop in F0, or there's some creak going on here. Creak in the sense of glottalization. Glottalization as might result from a glottal stop. Hint hint.
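One way to put a number on "the regularity of the striations falls apart" is to look at how much the spacing between glottal pulses varies. This is a minimal sketch under stated assumptions: the function names, the 0.2 threshold, and both lists of pulse times are hypothetical, made up for illustration rather than measured from this spectrogram.

```python
from statistics import mean, pstdev


def period_irregularity(pulse_times_ms):
    """Coefficient of variation of the intervals between glottal pulses."""
    periods = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]
    if len(periods) < 2:
        raise ValueError("need at least three pulses to compare periods")
    return pstdev(periods) / mean(periods)


def looks_creaky(pulse_times_ms, threshold=0.2):
    """True if pulse spacing is irregular enough to suggest creak.

    The 0.2 threshold is an arbitrary assumption for this sketch.
    """
    return period_irregularity(pulse_times_ms) > threshold


# Hypothetical striation times (msec), not read off the spectrogram.
modal = [0, 8, 16, 24, 32, 40]    # evenly spaced pulses
creaky = [0, 8, 20, 29, 47, 60]   # spacing wanders from pulse to pulse

print(looks_creaky(modal))   # False
print(looks_creaky(creaky))  # True
```

An abrupt but regular drop in F0 would show up as longer, still evenly spaced periods, so the irregularity measure would stay low. That's the distinction between the two readings offered above.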

Lower-Case T + Superscript Lower-Case H
[tʰ], IPA 103 + 404
Well, glottal stop aside, there's a longish gap here. 75 msec or so. Well, not quite, but long enough to probably be a plosive. There's some indication, in all that glottality, of falling transitions in the lower three formants, so we might be thinking bilabial. But look at the release on the other side. Sharp release concentrated in the high frequencies. Strong noise, again concentrated in the higher frequencies. And a longish VOT, 50-75 msecs again. But most of that is clearly aspiration with formants running through it and everything. So let's look at that release. Almost nothing in the low frequencies. And no indication of bilabiality in the transitions. And that noise in the high frequencies like it was a really short [s] or something. Hmm. Something with an [s]-shaped release. Maybe an alveolar stop? Voiceless and aspirated, as it turns out.
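The VOT part of that argument is just arithmetic, so here's a small sketch of it. The function names and the 40 msec cutoff for "aspirated" are assumptions for illustration. The 440 msec release time is hypothetical, chosen to land in the 50-75 msec range mentioned above, and the 500 msec voicing onset is the figure given for the following vowel.

```python
def voice_onset_time(release_ms, voicing_onset_ms):
    """VOT is the lag from the stop release to the start of voicing."""
    return voicing_onset_ms - release_ms


def classify_vot(vot_ms, aspirated_above_ms=40):
    """Label a stop by its VOT; the 40 msec cutoff is an assumed value."""
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms > aspirated_above_ms:
        return "aspirated"
    return "short-lag"


# Hypothetical release at 440 msec, voicing starting at about 500 msec.
vot = voice_onset_time(440, 500)
print(vot, classify_vot(vot))  # 60 aspirated
```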

Lower-Case O + Upsilon
[oʊ], IPA 307 + 321
Now this is a diphthong. The F1 starts a little high of the mid range and moves downward. So this starts a little lower than mid and moves toward a high vowel. The F2 starts, well, the F2, when the voicing kicks in at about 500 msec, is low of the mid range, so this is sort of back and/or round, but the F2 again drops in frequency, reaching a min at about 700 msec. So it's getting backer and/or rounder. So middish to highish, and backish/roundish to backer/rounder.
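The same logic works for movement: a falling F1 means the vowel is getting higher, and a falling F2 means it's getting backer and/or rounder. Here's a minimal sketch of that mapping. The name `describe_movement`, the 50 Hz tolerance, and the four formant values in the example are all hypothetical. The text only gives directions, not numbers.

```python
def describe_movement(f1_start, f1_end, f2_start, f2_end, tolerance_hz=50):
    """Describe diphthong movement from formant values at onset and offset.

    Changes smaller than tolerance_hz (an assumed value) count as steady.
    """
    def direction(start, end):
        if end - start > tolerance_hz:
            return "rising"
        if start - end > tolerance_hz:
            return "falling"
        return "steady"

    # F1 is inversely related to height; F2 falls with backing or rounding.
    height = {
        "falling": "getting higher",
        "rising": "getting lower",
        "steady": "holding height",
    }[direction(f1_start, f1_end)]
    backness = {
        "falling": "getting backer and/or rounder",
        "rising": "getting fronter",
        "steady": "holding backness",
    }[direction(f2_start, f2_end)]
    return height, backness


# Hypothetical values: F1 drifting down from just above mid, F2 dropping too.
print(describe_movement(550, 400, 1200, 900))
# ('getting higher', 'getting backer and/or rounder')
```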

Lower-Case S
[s], IPA 132
So this next bit is definitely voiceless (no periodicity, no striations, no low-frequency "voicing bar" energy). There's very little in the way of formant structure (except possibly some in F2, almost definitely the front cavity). And fairly high amplitude noise in the very high frequencies (very high at least in the sense of being at the top of the frequency range in this spectrogram, which goes up to about 4400 Hz). So this is probably an [s].

Lower-Case T
[t], IPA 103
Well, here's another gap. This one is shorter than the previous one. It's voiceless, but it's hard to tell if it's aspirated. There's some periodic-looking things in the low frequencies that could be voicing. But during the closure it's voiceless. The release is sort of sharp, but doesn't have a strong transient to it, suggesting that the closure was sort of weak without a lot of pressure building up behind it. The release noise is concentrated in the F3 range and higher. The F2 is a little lower. So there's no obvious velar pinch in the release. The noise is consistent with an alveolar, but not great. But as it turns out there's a reason for the noise to be a little lower than in the previous [s] or [t]...

Lower-Case W + Under-Ring
[w̥], IPA 170 + 402
So, you might have noticed that the vowel starts out lower, in the aspiration, or whatever that is, than in the voiced portion. For something so weakly released, that apparent voicelessness/aspiration/absence of periodicity in the F2 and F3 range goes on for an awfully long time. Maybe there's something else here to time the voicing (or lack thereof) with. Something that would otherwise have a very low F2. Hmm.

Schwa
[ə], IPA 322
Well, there's a teeny tiny bit of real voicing in here, with formants and everything. Certainly a local sonority peak, worthy of being called a vowel, but otherwise not worth worrying about. Schwa. Done.

Theta + Raising Sign
[θ̝], IPA 130 + 429
So there may be a little short gap before the fricative thing, but as it turns out that will be a red herring. So paying attention to the noise, it looks noisy. If you were misled by the blip of energy at the very low frequencies which otherwise might be consistent with voicing, you were misled. With that much energy down there, we should see definite striations, and given that there is formant-like energy above, I'd expect it to look more periodic up there too. So this is just noisy and voiceless. There's some formanty stuff, and it's not loud enough or broad-band enough to be a sibilant. It could be an [h], but then there should be more in the F1. With gaps on both sides, it's not like there's a lot of transitional information, but if we look at the transitions, they don't look particularly velar or bilabial. So it's some kind of front, and maybe coronal fricative.

Lower-Case D + Under-Ring
[d̥], IPA 104 + 402
Well, this is a voiceless gap, and probably coronal for the same reasons as the preceding. And I mean that literally. The release transition isn't followed sharply by the high amplitude noise, like I'd expect with a simple /t/ release, but what do I know?

Yogh + Over-Ring
[ʒ̊], IPA 135 + 402
So if you notice the earlier [s] and [t] bursts, this doesn't really look quite the same. This is a period of high amplitude noise. It's broad-band, but centered a little lower than the [s]s earlier, and it's pretty dead below 1500 Hz. Typical of [ʃ]. But if this were a syllable-initial [tʃ], I'd expect it to look, well, more aspirated. So I transcribed it as a devoiced [dʒ], but whatever.

Ash + Length Mark
[æː], IPA 325 + 503
Well, I marked these last two segments as long, but I'll probably stop doing that since phrase-final lengthening is so totally predictable in these things. I was in a mood, I guess. So, we've got an F1 that starts middish (although that may be transitional) and moves upward, so this is a mid-to-low sort of vowel. The F2 starts very high (so this is quite front) and seems to transition down to at least the mid-neutral range (the last bit, after 1400 msec or so I'd ignore since there's an amplitude change there and things definitely start to transition at that point). So this is very front and moves centrally or backish. Which is not what I would call stereotypical English vowel behavio(u)r. So let's think a second. The preceding sound is a close fricative in the post-alveolar region, so very front high transitions in the vowel are okay. The next sound is obviously a sonorant consonant, probably a nasal. So there's probably some nasalization covering the transitions. So I'll concentrate on the middle portion of this, rather than treat it as a diphthong. And it's mostly a lowish vowel, and vaguely front. This narrows the choices down a bit.

Lower-Case M + Length Mark
[mː], IPA 114 + 503
So this is probably a nasal. It's got weak resonances, but the main one is either at 1000 Hz or so, or at 1500. Which is not helping, since depending on which it would be a different nasal. So there's two things about the transitions in the preceding to consider. The first is that the F2 transition neither pinches up with the F3, nor points toward 1700-1800 Hz. In fact it falls much further than that, which is consistent mostly with bilabial. The second thing is that the F2 transition is clearly contiguous with the 1000 Hz resonance. So that's probably the one to pay attention to. And in my voice, the 1000 Hz pole is usually for the bilabial [m], and the 1500 Hz pole is usually for [n]. Voilà.
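That last rule of thumb is easy to write down as a nearest-pole lookup. A minimal sketch, with caveats: the 1000 Hz and 1500 Hz poles are the speaker-specific values given above and shouldn't be expected to hold for other voices, the names `NASAL_POLES_HZ` and `guess_nasal` are made up for illustration, and the 1050 Hz reading in the example is hypothetical.

```python
# Speaker-specific nasal poles from the text: about 1000 Hz for [m],
# about 1500 Hz for [n]. Other voices will have different values.
NASAL_POLES_HZ = {"m": 1000, "n": 1500}


def guess_nasal(pole_hz, poles=NASAL_POLES_HZ):
    """Return the nasal whose typical pole is closest to the observed one."""
    return min(poles, key=lambda symbol: abs(poles[symbol] - pole_hz))


# Hypothetical reading: the resonance contiguous with the F2 transition.
print(guess_nasal(1050))  # m
```

The lookup only answers which pole is closer. Deciding which of the two visible resonances to feed it is the hard part, and that's what the transition evidence is for.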