This year, I'm trying to start from basics, so I'm going to try to take nothing for granted except the basic acoustics and phonetics. If you don't know what a formant is, or what a plosive is, or what it means to be back, go read _A Course in Phonetics_, or at the very least, see my "How To" page.
So here's the thing. As I start writing this, I'm at gate 211 at YYZ (Pearson, Toronto) waiting to board my flight home, hopefully in about 15 minutes. It's Sunday, 30 January, and I'm at the end of what I think was a successful conference. But it's also late-ish on my fifth straight day of being 'on' most of the day on about 3-5 hours of sleep a night. So since I have to put this up Tuesday morning, I'm grabbing my spare moments here to do this. I'll have to continue on the plane, and who knows when else tomorrow. So if the text of this is more disjointed than usual, you know why.
Small Capital I
[ɪ], IPA 319
Well, picking up after the unusual 150 msec of silence at the left edge (just ask anyone, 150 msec of silence from me is an unusual event), the first visible thing here is a vowel. Regular pulsing begins at about 150 msec and continues for about 75 msec. The energy is quite strong and goes all the way up the (visible portion of the) spectrum. So now we check the formant structure. The F1 is the lowest formant, and it seems to occupy the bottom half of the first 1000 Hz. So bandwidth being bandwidth, the centre of this formant is probably well below 500 Hz. Let's say 300-400 Hz or so. The F2 starts up not quite around 2000 Hz and falls to about 1750 Hz. So this ain't an upgliding diphthong. F3 doesn't tell us much, sitting up around 2500 Hz, and F4, if you care, is right where it's supposed to be around 3500 Hz. I've never looked so Average General American Male in my life. Anyway, we have a very low F1, so we have a quite high vowel. We have a quite high F2, so we have a distinctly front vowel. And it's short, and glides, if anything, backward rather than forward (i.e. the F2 indicates retreat from front to not-so-front), characteristics of front lax vowels. So this is a high, front, lax/short vowel.
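If you wanted to make that chain of reasoning mechanical, a rough sketch in Python might look like this. The function name and every cut-off value in it are assumptions made up for illustration (loosely suited to an adult male voice), not measurements from this spectrogram or figures from any standard.

```python
def describe_vowel(f1_hz, f2_hz, duration_ms):
    """Rough articulatory labels from formant values (illustrative thresholds only)."""
    # Low F1 means a high vowel; high F1 means a low vowel.
    if f1_hz < 400:
        height = "high"
    elif f1_hz < 600:
        height = "mid"
    else:
        height = "low"
    # High F2 means front; low F2 means back and/or round.
    if f2_hz > 1700:
        backness = "front"
    elif f2_hz > 1200:
        backness = "central"
    else:
        backness = "back"
    # Short duration is one hint of a lax vowel (assumed 100 msec cut-off).
    length = "lax/short" if duration_ms < 100 else "long"
    return height, backness, length


# F1 around 350 Hz, F2 midway between 2000 and 1750 Hz, about 75 msec long.
print(describe_vowel(350, 1875, 75))
```

Real vowel spaces overlap and vary by speaker, so hard thresholds like these are only a way of writing the logic down, not a classifier you'd want to trust.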
[t], IPA 103
So what we have here, starting about 225 msec and running to about 300 (and a bit further in the lower frequencies) is a gap. An empty space in a spectrogram. A moment of relative silence. Which is usually associated with plosives. Now a good, strong, domain-initial plosive would have a nice closure transient and a nice strong release burst. But this one has neither, as far as I can tell. But that might just be because it's in a weak position, prosodically. So it's probably a plosive, and probably in a coda. It's definitely voiceless, with no striations at the bottom in the 'voicing bar' we look for. So we have a limited number of choices. (Quiz for beginners: What are the voiceless plosives in English?) So we need to look for clues as to place. With plosives, those are usually in transitions, into or out of the closure, in the release information, and in the top-down phonotactic knowledge that we all have such good command of. So if you look at the transitions into the closure, F4 isn't doing much. F3 isn't doing much. F2 seems to be approaching about 1750 Hz. F1 doesn't seem to be doing much.
Brief excursus 1: There's "positive" evidence, i.e. this observation points to this conclusion, and then there's "negative" evidence, i.e. the absence of anything that points to something different. Positive evidence is better--there are some areas where negative evidence isn't even permissible. But this is spectrogram reading and we take what we can get.
So I'm going to guess /t/ here. It's voiceless, it's plosive, and it's consistent with an F2 transition target of around 1750 Hz, especially if you are fond of locus theory. If you're not, that isn't much evidence to go on, but there's no evidence of velar pinch and no real evidence of labiality (in the form of lowering of all formants, or at least one other than the F2, which is ambiguous as far as 'lowering' goes). It's also a good guess statistically and phonotactically, just because coronals are so much more common than other plosives. The release characteristics, such as they are, are consistent with this guess--more consistent with a guess of coronal than of anything else.
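That process of elimination can be written down as a tiny decision list. This is only a sketch of the reasoning in this paragraph: the function name and its two boolean cues are hypothetical, and falling back to coronal when nothing else shows up is exactly the "negative evidence" move described above.

```python
def guess_plosive_place(velar_pinch, formants_lowered):
    """Guess place of articulation for a plosive from two yes/no cues."""
    # F2 and F3 converging near the closure points to a velar.
    if velar_pinch:
        return "velar"
    # Lowering of the formants (not just F2) points to a labial.
    if formants_lowered:
        return "labial"
    # No positive evidence either way: fall back on the most common place.
    return "coronal"


# The plosive discussed here: no pinch, no clear lowering.
print(guess_plosive_place(velar_pinch=False, formants_lowered=False))
```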
Brief excursus 2: At this point, it's useful to start looking at top-down information. So how many words can you think of that start with /ɪt/? How many of those are likely to start an English sentence? Good.
[s], IPA 132
So there's a brief bit of friction here, about 50 msec long straddling the 300 msec mark. This could just be the release of the preceding plosive, but the fact that it gets stronger and broader-band (involves more frequencies) at the right end rather than at the release of the preceding plosive suggests that this is not 'just' the release of the preceding /t/. So if it is something else, what is it? It has a broad band, i.e. the energy is distributed over a large and mostly continuous range of frequencies, and it's strongest off the top of this spectrogram (so its peak must be above 4400 Hz, probably at least 6-8 kHz). This is typical of sibilant [s].
[ə], IPA 322
Well, we've got here a vowel. If you notice, the F1 is just a little higher than the previous vowel, so this must be vaguely mid. The F2 is sort of all transition, so it doesn't seem to show evidence of a 'target' of its own. That's a pretty good indicator of a reduced vowel, i.e. something which in English you'd just transcribe with a schwa and then move on. Which is what I'm going to do.
Lower Case W
[w], IPA 170
Well, the F2 is the real clue here. It's diving down to about 750 Hz or so, indicating something very round and/or very back. The reduction in energy in the frequencies above 1000 Hz indicates a degree of stricture greater than for a vowel, but since we've got something apparently sonorant and fully voiced, it must be an approximant. A nasal would have the reduction in energy, but it would affect the low frequencies as well, to a greater degree than here. So we've got a backish roundish approximant. The F3 is being drawn down by the coming transition, so it's not a good source of information, but the F4 is also lowered, again suggesting rounding.
Turned R + Syllabicity Mark
[ɹ̩], IPA 151 + 431
My favo(u)rite sound. Just look at that F3! What else needs to be said? You never see an F3 that low except for an English-type approximant /r/. The syllabicity you have to derive from the fact that you want to call the two flanking sounds 'consonants' so that this has to be the vowel.
Top-down alert: At this point, we can start hypothesizing. "itswer" is an unlikely word, but 'it' is a very likely beginning to a sentence. If 'it' is the subject of the sentence, we need to look for a third-person verb. If 's' is that, then we're looking for some kind of locative, predicate nominal, predicate adjective, or something like that...
[k], IPA 109
Another gap, so probably another plosive. And voiceless. It has a double burst approaching 700 msec, which is most characteristic of velars. Velars also exhibit 'pinching' of the F2 and F3 frequencies, which we also see, although between the lowered F3 of the preceding sound and the raised F2 of the following sound, the apparent approximation of the F2 and F3 frequencies may not be the most useful cue here. Now, note that I said voiceless, but not *aspirated*. (Quiz for beginners: What is the significance of a voiceless plosive being unaspirated in intervocalic position?)
[ɨ], IPA 317
Another teeny short vowel that's mostly transition. This one is fronter (higher F2) than the previous one, so following Keating et al. (1994), I transcribe it as barred-i.
[n], IPA 116
So this is what I meant above by the lowered amplitude applying to all the frequencies. Somewhere around 750 msec, two things happen--the amplitude drops off at all frequencies (with the sudden appearance of zeroes in several places) and the formants flatten out completely. So this is a nasal. For nasals, you want to look at the frequency of the first pole above the F1. Which would seem to be about 1500 Hz, which for my voice is consistent with the alveolar nasal. Bilabials have that pole closer to 1000 Hz, and velars usually evidence some degree of velar pinch, which if you look at the barred-i is not at all in evidence.
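The pole-frequency rule of thumb is simple enough to sketch in a few lines. The function and table names below are hypothetical; the 1000 Hz and 1500 Hz reference values are the ones just mentioned and describe one voice, so treat them as assumptions for any other speaker. Letting velar pinch override the pole cue is also a simplifying assumption.

```python
# Approximate frequency (Hz) of the first pole above F1, per place.
# These reference values come from the paragraph above and fit one speaker.
NASAL_POLE_HZ = {"bilabial [m]": 1000, "alveolar [n]": 1500}


def guess_nasal_place(pole_hz, velar_pinch=False):
    """Pick the nasal whose reference pole is closest to the observed one."""
    # Pinch in the neighbouring vowel is taken as the cue for a velar.
    if velar_pinch:
        return "velar [ŋ]"
    return min(NASAL_POLE_HZ, key=lambda place: abs(NASAL_POLE_HZ[place] - pole_hz))


# Pole at about 1500 Hz, no pinch in the preceding barred-i.
print(guess_nasal_place(1500))
```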
Top-down alert: If the previous syllable is 'work', then this syllable could well be 'ing' or rather "in'". But what are the odds? If this is so, then the phrase 'it's a working...' is plausible, but the next thing is likely to be a noun, modified by 'working'.
Lower-Case P + Right Superscript H
[pʰ], IPA 101 + 404
So we've got another short gap here. The only evidence of anything I can see is in the preceding nasal, which in terms of transitions makes no sense whatsoever.
[ɹ], IPA 151
There's another one of those low F3 things, this time on the periphery of a vowel rather than being one. Note the slight attenuation of the higher frequencies characteristic of approximants as opposed to vowels.
[ɑ], IPA 305
So we've got a vowel. Abstracting away from the transitions, we want to look at the stretch between about 1050 and 1100 msec, where the F1 and F2 are 'steady', and the F3 is as stable as it gets. And during that stretch, the F1 is very high, indicating a very low vowel. The F2 is very low, which as we said before indicates backness, rounding or both. So we're looking for a vowel which is about as far back as we can go and as low as we can go. So we're looking in the vicinity of the Cardinal 5.
[k], IPA 109
So the question is whether the falling F3 transition leading to this is another /r/, or if it's just pinch. I'd say it was just transition, but I'm not sure--it seems to me that the F3 wouldn't bother to rise and be steady at all if there were flanking /r/. So since the gap here is followed by a nice double burst, and probably another velar, I'd say that this is all consistent with velar pinch.
[ɹ], IPA 151
I actually didn't intend this to be an r-ful spectrogram, but then I am the /r/ guy. Mostly I wanted to throw the Canadians in the audience by saying pr[ɑ]gress instead of pr[o]gress.
[ə], IPA 322
This looks to me like a schwa, although that's not what I *think* the vowel is phonemically. But whatever. The formants seem to be "about" 500, 1500, 2500 Hz (this last at least at the end as the voicing dies out).
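Those round numbers are what the textbook model of a neutral vocal tract predicts: a uniform tube closed at one end resonates at odd multiples of c/4L. A minimal sketch, assuming a tube length of 17.5 cm and a speed of sound of 35,000 cm/s (conventional textbook values, not measurements of anyone's actual vocal tract); the function name is hypothetical.

```python
def neutral_tube_formants(length_cm=17.5, speed_of_sound_cm_s=35000.0, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    quarter_wave_hz = speed_of_sound_cm_s / (4.0 * length_cm)
    return [(2 * n - 1) * quarter_wave_hz for n in range(1, count + 1)]


print(neutral_tube_formants())  # [500.0, 1500.0, 2500.0]
```

A shorter tube pushes all the resonances up, which is one reason the same schwa looks different on different speakers.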
[s], IPA 132
So here's another fricative, quite weak, given how very long it is, but I take that to be a function of its utterance-final position. It's broadband, and it's strongest in the highest frequencies (and best organized up there too--the low frequencies kind of come and go, but the upper frequencies are always there). So this is another sibilant [s].
So it turns out that that middle word wasn't 'working', but two words 'work in'. Always go back and reconfirm and retest previous hypotheses.