I was thinking about intonation when I did this one, and in my memory I kept hearing Mary Beckman's voice intoning ToBI sentences. Since I can't use proper names in these things (by rule set down long ago by Peter Ladefoged), I couldn't use my favo(u)rite "Marianna" sentences, and anyway, since I wasn't doing intonation with this one it wasn't that important. But there are a lot of 'jam' sentences as I recall. But we will be doing some intonation stuff later on this year.
Lower-Case W
[w], IPA 170
Well, starting at about 75 msec or so, there's some voicing going on. You'll notice
that there's not much in the way of energy above 1000 Hz. And the F1, such as it is,
is weaker than the F1 of the following vowel (or whatever that is). So we're looking
at something sonorant (voiced and open enough to have some serious periodicity to it), but
not open enough to really be even a high vowel. So this must be some kind of
approximant, by traditional definition. The F1 is hard to see, but it's lowish,
whatever it is, which is consistent with a tighter-than-open constriction. The F2
is tough to make out, if in fact you can see it at all, but it's clear the F2 transition
into the following vowel starts around 900-1000 Hz or so. So it must be quite back and/or
round. Probably and. So how many back, round approximants can you think of?
Lower-Case I
[i], IPA 301
So abstracting away from the transition, this vowel thing has a low F1, lower than
mid-range anyway, and an absurdly high F2 up around 2300-2400 Hz. That's just freaking
high. So we're dealing with something high and exceptionally front. Again, how many
such vowels can you think of? Good.
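If it helps to see that reasoning laid out mechanically, here's a minimal sketch in Python. The cutoffs (500 Hz for a mid F1, 1500 Hz for a neutral F2, a 150 Hz margin) and the function name are illustrative assumptions, not values measured from this spectrogram or anybody's published norms. The 300 Hz F1 in the example is likewise made up, since all we know is that the F1 is low. The only thing taken from the discussion is the logic: low F1 means high, high F2 means front.

```python
def describe_vowel(f1_hz, f2_hz, f1_mid=500.0, f2_mid=1500.0, margin=150.0):
    """Rough height/backness labels from F1 and F2.

    All cutoffs are hypothetical defaults for illustration only.
    """
    # F1 is inversely related to vowel height: low F1 means a high vowel.
    if f1_hz < f1_mid - margin:
        height = "high"
    elif f1_hz > f1_mid + margin:
        height = "low"
    else:
        height = "mid"

    # High F2 means front; low F2 means back and/or round.
    if f2_hz > f2_mid + margin:
        backness = "front"
    elif f2_hz < f2_mid - margin:
        backness = "back and/or round"
    else:
        backness = "central"

    return height, backness


# Assumed F1 of 300 Hz, F2 in the middle of the 2300-2400 Hz range.
print(describe_vowel(300, 2350))
```

Real speakers differ a lot, so treat the defaults as knobs to adjust per voice rather than as facts.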
So at this point, knowing we've got an English sentence, we could probably make a good guess at the first word. Or at least the first syllable. And further, if we feel like we have a word/syllable that could plausibly be the subject of a sentence (someday I'm going to put a weird adverbial or something at the front of a sentence and derail this whole line of reasoning), we might guess that the next bit has to be some kind of verb. Or we could be wrong about [wi] being a subject, or even about it being [wi]. But it's a working hypothesis.
Lower-Case E
[e], IPA 302
I'm trying to be consistent about marking movement in vowels, but I'm not sure what
I was thinking here. But we have something here that is separate from the preceding
vowel--there's a sharp change in frequency, as well as a sudden change in F1 frequency.
The F1 frequency is higher than the previous vowel, approaching the mid-range, but
not quite. So this vowel is still quite high, but not as high as [i]. The F2 is
similarly not-quite-as-high as the preceding vowel, so this is not quite as front
but still quite radically front. So not as high or as front as [i], but still high
or high-of-mid, and very front. Possibly moving back towards [i], at least as we
approach 400 msec or so. So possibly diphthongal. Sound familiar? If you're wondering
about the height, check out the height of /e/ in my 1997 JASA paper.
Glottal Stop
[ʔ], IPA 113
Well, as we approach 400 msec and a bit beyond, the periodicity, or the regularity of the
voicing striations, starts to fall apart. So either there's a very abrupt and very
extreme drop in F0, or there's some creak going on here. Creak in the sense of
glottalization. Glottalization as might result from a glottal stop. Hint hint.
Lower-Case T + Superscript Lower-Case H
[tʰ], IPA 103 + 404
Well, glottal stop aside, there's a longish gap here. 75 msec or so. Well, not quite,
but long enough to probably be a plosive. There's some indication, in all that glottality,
of falling transitions in the lower three formants, so we might be thinking bilabial.
But look at the release on the other side. Sharp release concentrated in the high
frequencies. Strong noise, again concentrated in the higher frequencies. And a longish
VOT, 50-75 msecs again. But most of that is clearly aspiration with formants
running through it and everything. So let's look at that release. Almost
nothing in the low frequencies. And no indication of bilabiality in the transitions.
And that noise in the high frequencies, like it was a really short [s] or something.
Hmm. Something with an [s]-shaped release. Maybe an alveolar stop? Voiceless
and aspirated, as it turns out.
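The gap-plus-VOT part of that argument can be sketched as a couple of threshold checks. This is a minimal sketch in Python, and the thresholds (40 msec for a believable closure, 40 msec for long-lag VOT) and the function name are assumptions for illustration, not measurements or standards. The example durations are rough values inside the ranges mentioned above.

```python
def describe_stop(closure_ms, vot_ms, min_closure_ms=40.0, long_lag_ms=40.0):
    """Rough label for a gap-plus-release event.

    Both thresholds are hypothetical defaults, not established cutoffs.
    """
    # A very short gap may be a tap or just a dip in amplitude.
    if closure_ms < min_closure_ms:
        return "gap too short to call a plosive with any confidence"
    # Long positive VOT: voicing starts well after the release (aspiration).
    if vot_ms >= long_lag_ms:
        return "voiceless aspirated plosive"
    # Short positive VOT: voicing starts at or just after the release.
    if vot_ms >= 0:
        return "short-lag plosive (voiceless unaspirated or devoiced)"
    # Negative VOT: voicing starts before the release.
    return "prevoiced plosive"


# A gap of roughly 70 msec and a VOT of roughly 60 msec (assumed values).
print(describe_stop(70, 60))
```

Note that this says nothing about place. That still comes from the release burst and the transitions, as above.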
Lower-Case O + Upsilon
[oʊ], IPA 307 + 321
Now this is a diphthong. The F1 starts a little high of the mid range and moves downward.
So this starts a little lower than mid and moves toward a high vowel. The F2 starts...
well, the F2, when the voicing kicks in at about 500 msec, is low of the mid range,
so this is sort of back and/or round, but the F2 again drops in frequency reaching
a min at about 700 msec. So it's getting backer and/or rounder. So middish to highish,
and backish/roundish to backer/rounder.
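The direction-of-movement reasoning for a diphthong can be sketched the same way: compare where each formant starts with where it ends. This is a minimal sketch in Python. The formant values in the example are invented to match the description (F1 falling, F2 falling), since no frequencies are given above, and the 50 Hz tolerance and the function name are assumptions.

```python
def describe_glide(f1_start, f1_end, f2_start, f2_end, tol_hz=50.0):
    """Describe vowel movement from start and end formant values (Hz).

    tol_hz is a hypothetical tolerance below which a change is ignored.
    """
    moves = []
    # Falling F1 means the vowel is getting higher; rising F1, lower.
    if f1_end < f1_start - tol_hz:
        moves.append("raising")
    elif f1_end > f1_start + tol_hz:
        moves.append("lowering")
    # Falling F2 means backer and/or rounder; rising F2, fronter.
    if f2_end < f2_start - tol_hz:
        moves.append("backing and/or rounding")
    elif f2_end > f2_start + tol_hz:
        moves.append("fronting")
    return moves or ["steady"]


# Invented values: F1 drops from 550 to 400 Hz, F2 from 1200 to 900 Hz.
print(describe_glide(550, 400, 1200, 900))
```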
Lower-Case S
[s], IPA 132
So this next bit is definitely voiceless (no periodicity, no striations, no low-frequency
"voicing bar" energy). There's very little in the way of formant structure (except
possibly some in F2, almost definitely the front cavity). And fairly high amplitude
noise in the very high frequencies (very high at least in the sense of being at the
top of the frequency range in this spectrogram, which goes up to about 4400 Hz). So
this is probably an [s].
Lower-Case T
[t], IPA 103
Well, here's another gap. This one is shorter than the previous one. It's voiceless,
but it's hard to tell if it's aspirated. There are some periodic-looking things in the
low frequencies that could be voicing. But during the closure it's voiceless. The
release is sort of sharp, but doesn't have a strong transient to it, suggesting that
the closure was sort of weak without a lot of pressure building up behind it. The
release noise is concentrated in the F3 range and higher. The F2 is a little lower.
So there's no obvious velar pinch in the release. The noise is consistent with
an alveolar, but not great. But as it turns out there's a reason for the noise to
be a little lower than in the previous [s] or [t]...
Lower-Case W + Under-Ring
[w̥], IPA 170 + 402
So, you might have noticed that the vowel starts out lower, in the aspiration, or
whatever that is, than in the voiced portion. For something so weakly released,
that apparent voicelessness/aspiration/absence of periodicity in the F2 and F3 range
goes on for an awfully long time. Maybe there's something else here to time
the voicing (or lack thereof) with. Something that would otherwise have a very
low F2. Hmm.
Schwa
[ə], IPA 322
Well, there's a teeny tiny bit of real voicing in here, with formants and everything.
Certainly a local sonority peak, worthy of being called a vowel, but otherwise not
worth worrying about. Schwa. Done.
Theta + Raising Sign
[θ̝], IPA 130 + 429
So there may be a little short gap before the fricative thing, but
as it turns out that will be a red herring. So paying attention to the
noise, it looks noisy. If you were misled by the blip of energy at the very low
frequencies which otherwise might be consistent with voicing, you were misled.
With that much energy down there, we should see definite striations, and given
that there is formant-like energy above, I'd expect it to look more periodic up
there too. So this is just noisy and voiceless. There's some formanty stuff, and
it's not loud enough or broad-band enough to be a sibilant. It could be an [h], but
then there should be more in the F1. With gaps on both sides, it's not like there's
a lot of transitional information, but if we look at the transitions, they don't
look particularly velar or bilabial. So it's some kind of front, and maybe coronal
fricative.
Lower-Case D + Under-Ring
[d̥], IPA 104 + 402
Well, this is a voiceless gap, and probably coronal for the same reasons as the preceding.
And I mean that literally. The release transition isn't followed sharply by the high
amplitude noise, like I'd expect with a simple /t/ release, but what do I know?
Yogh + Over-Ring
[ʒ̊], IPA 135 + 402
So if you notice the earlier [s] and [t] bursts, this doesn't really look quite
the same. This is a period of high amplitude noise. It's broad-band, but centered
a little lower than the [s]s earlier, and it's pretty dead below 1500 Hz. Typical of
[ʃ]. But if this were a syllable-initial [tʃ], I'd expect it to look, well, more
aspirated. So I transcribed it as a devoiced [dʒ], but whatever.
Ash + Length Mark
[æː], IPA 325 + 503
Well, I marked these last two segments as long, but I'll probably stop doing
that since phrase-final lengthening is so totally predictable in these things.
I was in a mood, I guess. So, we've got an F1 that starts middish (although that
may be transitional) and moves upward, so this is a mid-to-low sort of vowel. The
F2 starts very high (so this is quite front) and seems to transition down to at least
the mid-neutral range (the last bit, after 1400 msec or so, I'd ignore since there's an
amplitude change there and things definitely start to transition at that point). So this
is very front and moves centrally or backish. Which is not what I would call stereotypical
English vowel behavio(u)r. So let's think a second. The preceding sound is a close fricative in
the post-alveolar region, so very front high transitions in the vowel are okay.
The next sound is obviously a sonorant consonant, probably a nasal. So there's
probably some nasalization covering the transitions. So I'll concentrate on the
middle portion of this, rather than treat it as a diphthong. And it's mostly a lowish
vowel, and vaguely front. This narrows the choices down a bit.
Lower-Case M + Length Mark
[mː], IPA 114 + 503
So this is probably a nasal. It's got weak resonances, but the
main one is either at 1000 Hz or so, or at 1500. Which is not helping, since
depending on which it would be a different nasal. So there are two things about
the transitions in the preceding vowel to consider. The first is that the F2 transition
neither pinches up with the F3, nor points toward 1700-1800 Hz. In fact it falls much
further than that, which is consistent mostly with bilabial. The second thing is that
the F2 transition is clearly contiguous with the 1000 Hz resonance. So that's
probably the one to pay attention to. And in my voice, the 1000 Hz pole is usually
for the bilabial [m], and the 1500 Hz pole is usually for [n]. Voilà.
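That last rule of thumb amounts to a nearest-pole lookup, sketched minimally here in Python. The two pole values come straight from the discussion above and are specific to this one voice. The function name, and the idea of simply picking the nearest pole, are assumptions for illustration, and the 1050 Hz input is a made-up measurement.

```python
# Typical nasal pole frequencies for this one speaker, as stated above.
# These are speaker-specific and will not transfer to other voices.
NASAL_POLES_HZ = {"m": 1000.0, "n": 1500.0}


def guess_nasal(pole_hz):
    """Return the nasal whose typical pole is closest to the measured one."""
    return min(NASAL_POLES_HZ, key=lambda sym: abs(NASAL_POLES_HZ[sym] - pole_hz))


# A resonance measured at about 1050 Hz (hypothetical value).
print(guess_nasal(1050))
```

The lookup only works once you've decided which resonance to trust, which is what the F2 transition was for.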