[h], IPA 146
So starting not quite 100 msec in, and going on until 225 msec or so, there's some voicelessness (no striations in the very low frequencies, in the range of the fundamental or first harmonic, which in my voice could be anywhere between 90 Hz and 130 or 140 Hz). So it's voiceless. There's lots of energy up above, but it's aperiodic, or noisy. If you notice the formants of the following vowel, there's a little more noise in those same frequencies, which is typical of [h]. The noise, being produced in the laryngopharynx, bounces around the vocal tract the same way periodic energy does, and thus gains energy in the frequencies of the vocal tract resonances and loses it in between. What's interesting is the high F3: there's no hint of rhoticity in the noise until about 200 msec, when it starts to come down in frequency. We can see that transition continue once the voicing kicks in, but then we're well into the next segment.
Script A + Rhoticity Sign
[ɑ˞], IPA 305 + 419
So ignoring the voicing bar, which you can see is a very narrow band down there around 150 Hz or so, the first resonance is quite high. Depending on how you measure these things, I'm thinking it's that upper band around 850 Hz or so. If you look though, there's another, slightly fainter band just below, closer to 600 Hz. I'm thinking that's just an idiosyncratically strong harmonic (there's something about there all through the spectrogram, regardless of where the F1 is). (Well, use your imagination.) In a perfect world, we might locate the 'center' of that formant in that slightly lower-energy space in between what I'm calling F1 and what I'm calling that weird harmonic, since the combined width of those two things is only a little wider than the formants above it. So I don't know. But F1 is definitely high here, so this is a low vowel. F2 starts about 1200 Hz or a bit below, but it rises a little into the following segment. Now look at that F3. This is the best argument for segments (or at least sub-syllabic constituents) I've seen in a while. The F3 in the [h] is up around 2500 Hz, dead neutral. It comes down in the last part of the fricative and through the 'clear' part of the vowel until it approaches its low steady state in the following segment. But if you believe a) [h] does not have oral features/targets of its own, and b) "rhoticity" (lowering of F3) is a feature realized on vowels before approximant /r/, why doesn't the F3 start low in the fricative? Or at least lower, if you believe that the F3 of the vowel is categorically affected by rhoticity. Which it obviously is not. But here you can see the rhoticity is a) not phonological, and b) constrained in the phonetic grammar to the coda /r/, and is allowed to creep into (but not take over) the F3 of the vowel. But not really at all into the fricative. But there's nothing in the fricative to prevent it from doing so. Except obviously there is. So there must be something 'there'.
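If you're wondering where 'dead neutral' at 2500 Hz comes from, here's a minimal sketch of the textbook quarter-wavelength calculation for a uniform tube closed at one end. The 17.5 cm tract length and 35,000 cm/s speed of sound are the usual classroom assumptions, not measurements of this speaker, and the function name is made up for illustration.

```python
def neutral_formants(tract_length_cm=17.5, speed_of_sound_cm_s=35000.0, count=3):
    """Resonances of a uniform tube closed at one end (quarter-wavelength rule).

    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
    Both default values are assumed classroom figures, not measurements.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4 * tract_length_cm)
        for n in range(1, count + 1)
    ]


formants = neutral_formants()
print(formants)  # F1, F2, F3 of the idealized neutral tract, in Hz
```

With those assumed values the third resonance lands at 2500 Hz, which is why an F3 sitting there in the [h] shows no sign of the lowering you'd expect from rhoticity.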
[ɹ], IPA 151
So I guess I've given it away, this is an approximant /r/ (properly, IPA [ɹ]) of the North American variety. The F1 is still where it was for the vowel, the F2 (oddly enough) is raised to approximate the low F3, and the F3 is very low, almost 800 or 900 Hz lower than it was in the beginning of the [h] (where you can see it returns eventually). Typical of [ɹ].
[m], IPA 114
Then the amplitude falls off around 350 msec. The F2 transition in the /r/ is diving at that moment, which suggests labial transitions. The overall energy from 350 to 425 msec (or so) is lower than either of the surrounding vowels, so this is relatively consonantal. And its edges are sharp, if you see what I mean, suggesting some acoustic change that sucks energy out of the source suddenly turns on, and then off. So this is a typical nasal, the aforesaid sucking occurring as the nasal cavity is opened and the oral cavity is closed, and then stopping when the velopharyngeal port is closed and the oral closure released. There's a nice pole around 400 Hz, which is just to be expected, but the first 'real' pole/formant in the nasal is around 1000 Hz. You can see the pole above that (continuous with the F3 of the /r/) is rising. The frequency of that middle pole, the one around 1000 Hz, is a good cue to this being bilabial: if the oral closure were further back, this would be higher in frequency. (Go back to acoustic phonetics and read about 'side cavities' if you're not sure why.) So that's two solid cues to this being [m], and none particularly pointing anywhere else.
[ɨ], IPA 317
So from 425 to about 475 msec, there's a vowel. The F1 is sort of low, unless you believe it's still high, but it's not particularly distinct either way. The F2 is in constant motion, almost as if it had nowhere in particular to go. The F3 is still transitioning, so it's not helping either. Also the F4 if it comes to that, but since we almost never look at F4, we won't belabo(u)r the point. So we've got a short vowel of indistinct structure that never really develops a strong identity of its own. So call it reduced, transcribe accordingly, and move on.
[n], IPA 116
So here we have another one of these. Note its similarity, in terms of its amplitude and edges, to the previous nasal. There's a pole I don't think I've ever seen before at about 850 Hz, so I'm going to ignore it.... The main pole is up around 1400 or not quite 1500 Hz. Note how much higher it is than the 1000 Hz or so pole in the [m]. So there we go. This one isn't bilabial, so we're stuck with alveolar or velar. There's no hint of velar pinch in the transitions into or out of this nasal, and the transition-end frequencies (around 1700 Hz) are consistent with the locus of alveolar transitions.
[i], IPA 301
So the F1 is still rather low. Note the voicing bar in the first syllable. There's a strongish harmonic just below 500 Hz but the main body of the resonance is clearly between the voicing bar and that harmonic. So this is an exceptionally low F1. So this is an exceptionally high vowel. The F2, once it straightens out, is exceptionally high, up around 2100 or 2200 Hz. So this vowel is exceptionally front. And the highest, frontest vowel you can think of? Right!
[ɨ], IPA 317
Well, another section of vowel that's mostly F2 transition. If you missed it as just transition, you have to explain why this vowel is so long when its pitch is clearly quite low (see how far apart the striations are compared to most of the preceding vowels; each of those striations is a glottal pulse). So I think this is actually two different vowels/syllables. In fact, two different words. I worked hard at not putting a glottal stop in this one, so I hope you appreciate the duplicity involved.
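The relation between striation spacing and pitch is just the reciprocal of the pulse period: wider spacing means lower pitch. A minimal sketch, assuming you've read the times of a few successive striations off the time axis by hand. The pulse times below are invented for illustration, not measured from this spectrogram, and the function name is hypothetical.

```python
def f0_from_pulses(pulse_times_ms):
    """Estimate fundamental frequency (Hz) from successive glottal pulse times (ms).

    Averages the intervals between neighbouring pulses and inverts the result.
    """
    if len(pulse_times_ms) < 2:
        raise ValueError("need at least two pulses to measure a period")
    intervals = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]
    mean_period_ms = sum(intervals) / len(intervals)
    return 1000.0 / mean_period_ms


# Hypothetical readings: pulses 10 msec apart versus 8 msec apart.
wide_spacing = f0_from_pulses([600.0, 610.0, 620.0, 630.0])
narrow_spacing = f0_from_pulses([300.0, 308.0, 316.0, 324.0])
print(wide_spacing, narrow_spacing)
```

Pulses 10 msec apart work out to 100 Hz and pulses 8 msec apart to 125 Hz, both inside the 90 to 140 Hz range mentioned for this voice.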
[z], IPA 133
So the striations continue, albeit in weaker form, all through the following amplitude dip (from about 700 msec to 750 msec or so?). So whatever it is, it's a consonant and it's voiced. But up above the voicing bar, there's no evidence of periodicity, so no resonance to speak of. So there must be a very tight constriction somewhere. And it's noisy, so it's a close constriction, but not a closure. So we're talking about a fricative. Voiced, but very noisy. The noise is not particularly organized into bands. In fact, it's one broad band. It's a trifle weaker in the lower frequencies than the higher frequencies (note the relative lightness of the noise just around and below 1000 Hz compared to anywhere above), so this looks like it's tilted to the high frequencies. Very high frequencies, without any tilt toward the F2 or F3 region. So there you go. [s]-shaped noise, but voiced.
[ɨ], IPA 317
And another short little vowel, overlapped in the high frequencies with a bit of the noise from the fricative. Or maybe the noise is coming from the upcoming closure. Or both. Hmm. So this is amazingly reduced.
[t], IPA 103
Nice sharp gap so obviously we're dealing with some kind of plosive. There's not a lot going on in terms of transitions suggesting anything in particular. On the other hand, if you look at the release noise burst, it's very sharp, broad band, and evidently [s]-shaped. Although this may be in part a product of the following frication. But whatever. Believe it's alveolar, or at least coronal, or remain agnostic. When it comes to parsing the upcoming fricative your choices will be limited.
[ʃ], IPA 134
So here we go. We've got some very loud friction here. No voicing bar, but with that much noise, you wouldn't really expect any voicing. The frication is very loud, but you'll notice it isn't one very broad band, but has some formant-like shaping to it. It's loudest not off the top of the spectrogram (i.e. between 4-6-8-12 kHz), but seems loudest in the F2-F3-F4 bands. And the F2 band is pretty noisy, while below it the energy drops off sharply. That's pretty typical of post-alveolar [ʃ].
[i], IPA 301
So it's tough to tell where F2 is. You have to surmise from that falling transition afterwards that it's really, really high, around 2200 Hz or so. It's almost merged with the F3, but that's not supposed to happen, so the combined band is still wider than you'd expect a single band to be, but at this bandwidth there's no telling where the separation is. So the edges of the filter overlap slightly. Get over it. So that's the F2, where's the F1? Low low low, I say. We could argue about that, but TMSAISTI.
[v], IPA 129
Another voiced fricative here, from 1075 to 1125 msec or thereabout. Nice striations at the bottom, but no periodicity to speak of above. This is a very loud fricative; it has about the same energy as the previous [z]. But spectrally, this looks different. It doesn't have any tilt to it at all. It just looks white, in the sense of having equal energy at all frequencies. Sort of unfiltered. Well, probably this is louder than it should be; I may have been spitting into the microphone or something. The unfiltered-ness is a huge clue though. In order to be unfiltered, your source has to be uncoupled from the resonators of the vocal tract. Which means it has to have a tight closure, and no vocal-tract-tubey-volumes in front for the energy to bounce around. So this has to be at the teeth or lips. Given that this is English, the lips alone (bilabial) are unlikely. It would be really helpful if the transitions on either side looked more labial, but they don't. Which might make us think coronal, just by default. But then we'd be wrong. So let's just keep both [v] and [ð] in mind until we can make a word out of it.
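One rough way to put a number on 'tilt' is the slope of a straight line fitted to noise level against frequency. This is only a sketch under assumptions: the band levels below are made up for illustration (not measured from this spectrogram), the function name is hypothetical, and a least-squares slope is just one of several possible tilt measures.

```python
def spectral_tilt_db_per_khz(freqs_khz, levels_db):
    """Least-squares slope of noise level (dB) against frequency (kHz).

    Near zero means flat ('white') noise; positive means the energy is
    tilted toward the high frequencies.
    """
    n = len(freqs_khz)
    mean_f = sum(freqs_khz) / n
    mean_l = sum(levels_db) / n
    numerator = sum((f - mean_f) * (l - mean_l)
                    for f, l in zip(freqs_khz, levels_db))
    denominator = sum((f - mean_f) ** 2 for f in freqs_khz)
    return numerator / denominator


bands_khz = [1.0, 2.0, 3.0, 4.0]                 # hypothetical band centres
flat_noise = [60.0, 60.0, 60.0, 60.0]            # invented levels, [v]-like
high_tilted_noise = [40.0, 50.0, 60.0, 70.0]     # invented levels, [z]-like

flat_tilt = spectral_tilt_db_per_khz(bands_khz, flat_noise)
high_tilt = spectral_tilt_db_per_khz(bands_khz, high_tilted_noise)
print(flat_tilt, high_tilt)
```

On invented numbers like these, the flat spectrum gives a slope of zero and the [s]-shaped one a clearly positive slope, which is the contrast being drawn between this fricative and the earlier [z].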
[ə], IPA 322
Very short, indeterminate vowel. Moving on.
[b], IPA 102
Another gap, this one rather long, although since we're approaching the end of the utterance that might be lengthening of the final syllable. There's a nice, clean gap in most frequencies, but if you look at the bottom, there's an awful lot of perseverative voicing. More than you'd get if there were a nice abduction gesture associated with an underlying voiceless stop. So this is probably voiced. It's a little annoying that the transitions are so ambiguous. The F2 in the preceding vowel seems to be coming down, well below the 1700-1800 Hz locus we usually look for with alveolars. So that looks labial. F3? Seems to be high, if anything. Ya gotta love coproduction messing up all your cues. So on balance, I'm going to say bilabial. The F2 isn't even close to alveolar or velar looking. The F3 is ambiguous, but I'll attribute it to coproduction with ...
Tilde L (Dark L) + Syllabicity Mark
[ɫ̩], IPA 209 + 431
... the raised F3 of this segment, which is lateral. You can tell because of the raised F3. /r/s have greatly lowered F3s, /l/s tend to have slightly raised F3s, and/or sometimes F4s. With an F2 below 1000 Hz, this can only be described as back (or round), so it's dark as well. If you believe those first few pulses with energy in F3 and F4 and above are evidence of a separate vowel before lateral-contact, you're welcome to insert a schwa or something. But I tried to be careful and release the /b/ into the lateral. There are advantages to doing these things with your own voice....