Due to various delays, I decided to take a shortcut on this month's spectrogram. This one is composed of four words, one word each from the following pairs.
So the real trick here is to work out whether the first consonant is [h] or [ʃ], whether the second onset is [tʃ] or [tʰɹ̥], etc.
For the sake of comparison, I've included the 'opposite' spectrogram at the bottom of the page.
I'll only be discussing differential cues this time, again, just because of time constraints. (I will leave it as an exercise for the reader to segment the 'known' segments and work out what cues there are as to their identities.)
Esh
[ʃ], IPA 134
From about 75 - 225 msec. This looks more like [ʃ] than [h] for a couple of reasons. The first is that it's too loud: it has absolute amplitude like a vowel rather than a consonant. The second is that this very loud frication is tilted to the higher frequencies, typical of sibilants in general. This looks like [ʃ] rather than [s] since it has very little energy below F2, where it drops off fairly sharply ([s] has broad band noise that may diminish at the lower frequencies, but it'll do so more gradually). The fact that it drops off right below F2 is suspicious, if you were wondering. Also, an [s] would not have that strength specifically in F2/F3/F4, but presumably would have a single broad band centered much higher. An [h] would have less energy overall, and wouldn't have any kind of discontinuity with the following vowel (except in terms of voicing). If you notice, the "F2" in the fricative doesn't match that in the vowel.
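If you wanted to put a number on "centered much higher", one common measure is the spectral centroid (the amplitude-weighted mean frequency of the noise). Here's a minimal sketch of that idea. The band frequencies and magnitudes are made-up toy values chosen to mimic the shapes described above, not measurements from this spectrogram, and the names `spectral_centroid`, `ESH_LIKE` and `S_LIKE` are mine.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Amplitude-weighted mean frequency of a (coarse) noise spectrum."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


# Toy band centers in Hz (an assumption, just for illustration).
FREQS_HZ = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]

# Toy [ʃ]-like shape: sharp drop at the low end, strength in the mid bands.
ESH_LIKE = [0, 1, 4, 4, 2, 1, 0, 0]

# Toy [s]-like shape: one broad band sitting much higher up.
S_LIKE = [0, 0, 1, 2, 3, 4, 4, 3]

if __name__ == "__main__":
    print("esh-like centroid:", round(spectral_centroid(FREQS_HZ, ESH_LIKE)))
    print("s-like centroid:  ", round(spectral_centroid(FREQS_HZ, S_LIKE)))
```

With these toy shapes the [ʃ]-like centroid comes out well below the [s]-like one, which is all the comparison is meant to show. Real fricative spectra would need real measurements.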
Lower-Case T + Right Superscript H, Turned R + Under-Ring
[tʰɹ̥], IPA 103 + 404, 151 + 402
350 msec to 450 msec or thereabouts. The choice here is really between an [ʃ] release to the affricate or a voiceless [ɹ̥]. I'll duck the whole question of segments and affricates and so on. Okay, so the gap for the plosive goes from about 325 msec to the release somewhere between 375 and 400 msec. The release frication probably runs for 25 to 50 msec after the release. The 'center' of the /r/ moment, if you follow me, is around 425 msec. Notice that by the time voicing kicks on at about 450, the formants are already moving fast. So our choices for this bit, from 400 to 450 msec or so, are the /r/ (devoiced due to the aspiration) or Esh. Notice the intensity of the noise: on release, it's nice and sibilant. It's centered pretty low, sibilant-wise, and looks a lot like the previous Esh. But you'll notice the intensity drops off fairly quickly, instead of being nice and sustained through the voicelessness, and also that the noise is in the shape of the following formants. The F2 starts up wherever it starts on release (around 1900 Hz or so), falling rapidly to just below 1500 Hz. The F3 falls out, but notice in the release how the corresponding band is definitely falling. Extrapolating or interpolating, or whatever, from the angles of the transitions on either side, it looks like the F3 drops to just below 2000 Hz, but there's not a lot of evidence that it really gets there. But those transitions in F3 can only be due to rhoticity. And the lowness of the F3 and the closeness of F2 and F3 together explain the esh-shaped-ness of the release noise--the center of the energy is being pulled down by the low formants. This also explains why there seem to be people who have a rule turning /tr/ into [tʃ] and/or /dr/ into "jr". For comparison, notice how, while diminishing, the esh-noise in the comparison spectrogram is more or less stable right through until the voicing kicks in.
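The timing landmarks above also give a rough voice onset time, since the release is somewhere between 375 and 400 msec and voicing kicks on at about 450 msec. A minimal sketch of that arithmetic follows. The function name `vot_range_ms` is mine, and the landmark values are the approximate eyeballed ones from the text, not precise measurements.

```python
def vot_range_ms(release_window_ms, voicing_onset_ms):
    """Return (shortest, longest) voice onset time in msec.

    release_window_ms is an (earliest, latest) pair, because the release
    can only be located to within a window on the spectrogram.
    """
    earliest, latest = release_window_ms
    if earliest > latest:
        raise ValueError("release window must be (earliest, latest)")
    return (voicing_onset_ms - latest, voicing_onset_ms - earliest)


if __name__ == "__main__":
    # Approximate landmarks from the discussion above.
    shortest, longest = vot_range_ms((375, 400), 450)
    print(f"VOT is roughly {shortest} to {longest} msec")
```

That works out to something like 50 to 75 msec of voicelessness after the release, which is the stretch the release noise and the devoiced /r/ have to share.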
Lower Case W
[w], IPA 170
Approximant or nasal? We're looking at the fully-voiced segment from about 725 msec to just past 800 msec. It's got less energy in the voicing bar than in the following vowel, but that's typical of both nasals and close approximants. The transitions are mostly bilabial, although F3 isn't helping much. So nasal or not? Well, not. Nasals don't have to have 'sharp' edges, but prevocalically they usually do. See that moment near 800 msec in the comparison spectrogram. The edge there is the velum closing--at that moment, the acoustics change suddenly. The energy that was being lost in the nasalization is suddenly regained, the main resonances change--notice how the formants 'pop on' without transitioning. Here, the formants are all transition, suggesting something oral throughout, with continuously changing articulators transitioning from the /w/ moment to the following vowel.
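The 'sharp edge' versus 'all transition' distinction can be sketched as a check on how much a formant track jumps from one analysis frame to the next. This is only an illustration of the reasoning, not a real nasal detector: the two tracks are invented toy values, and the 300 Hz threshold and the names `max_frame_jump` and `edge_type` are assumptions of mine.

```python
def max_frame_jump(track_hz):
    """Largest frame-to-frame change (in Hz) along a formant track."""
    if len(track_hz) < 2:
        raise ValueError("need at least two frames")
    return max(abs(b - a) for a, b in zip(track_hz, track_hz[1:]))


def edge_type(track_hz, jump_threshold_hz=300):
    """Label a track 'abrupt' if any single step reaches the threshold.

    The threshold is a hypothetical value picked for the toy data below.
    """
    if max_frame_jump(track_hz) >= jump_threshold_hz:
        return "abrupt"
    return "gradual"


# Toy F2 tracks in Hz, one value per frame (invented for illustration).
GLIDE_LIKE = [700, 800, 950, 1100, 1250, 1400]     # smooth rise out of a /w/
NASAL_LIKE = [1000, 1000, 1000, 1600, 1650, 1700]  # resonance 'pops on'

if __name__ == "__main__":
    print("glide-like track:", edge_type(GLIDE_LIKE))
    print("nasal-like track:", edge_type(NASAL_LIKE))
```

The glide-like track changes a lot overall but never in one big step, which is the sense in which the formants here are 'all transition'.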
Tilde L (Dark L)
[ɫ], IPA 209
Finally, another lesson in approximants, this time /r/ and /l/. We're looking at the moment that begins when the voicing kicks on around 1000 msec, and going until the upper frequency periodicity really becomes clear, at about 1075 msec. Again, this doesn't look nasal due to the continuity of the whole thing. That, and the F1/voicing bar complex is too continuous (in both amplitude and frequency) to indicate a sudden addition or loss of a cavity. So what's the difference between a North American /r/ and an /l/? The F3. Lowered F3 for /r/, raised F3 (or sometimes F4--ideally both) for /l/. So where's the F3? F1 is down just below 500 Hz. F2 is just above 1000 Hz, and F3 is way up there around 2750 Hz. It's falling a little, so by the time the upper frequency periodicity kicks on it's already almost back down to 2500, but you can still see how high it was in the noisy, semiperiodic energy during the approximant. So that's it. Raised F3.
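The F3 rule of thumb is simple enough to write down as a comparison against a neutral value. In this minimal sketch the neutral F3 of 2500 Hz is taken from where the F3 settles in this spectrogram, and the 200 Hz margin is a made-up allowance for measurement slop. Both are assumptions for this one speaker, not general constants, and `liquid_from_f3` is my own name.

```python
def liquid_from_f3(f3_hz, neutral_f3_hz=2500.0, margin_hz=200.0):
    """Guess 'l' (raised F3), 'r' (lowered F3), or 'unclear'.

    neutral_f3_hz and margin_hz are speaker-specific assumptions.
    """
    if f3_hz >= neutral_f3_hz + margin_hz:
        return "l"
    if f3_hz <= neutral_f3_hz - margin_hz:
        return "r"
    return "unclear"


if __name__ == "__main__":
    # F3 of about 2750 Hz during this approximant.
    print("F3 2750 Hz ->", liquid_from_f3(2750))
    # F3 extrapolated to just below 2000 Hz in the earlier /r/.
    print("F3 1950 Hz ->", liquid_from_f3(1950))
```

A real decision would also look at F4 and at the transitions, as discussed above. This only captures the raised-versus-lowered F3 contrast.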