Before diving in, take a moment and work out some basics. How many syllables are we dealing with? Are there any cues that suggest where the lexical stresses and/or intonational accents and boundaries are? Any obvious segmental cues? Inexplicably long vowels? Gaps? Apparent nasalization? Sibilance?
[ð], IPA 131
So there's this voicing that starts around 50 msec in from the left edge. The resonances of the vowel don't really kick in until at least 125 msec, so that leaves us with almost 75 msec of voicing to do something with. Just 'cuz there's a vowel following, I'm voting for consonant. And not a very open consonant, although since this is the beginning of an utterance, I'd assume a fair amount of initial strengthening going on. If you look up to the 2500 range, and again at the 3500 range, what's that? That's noise. So this is probably a fricative. If it were sibilant, it would be louder. If it were /h/ it would have more F1/F2 involvement. Which leaves the labiodental and the interdental. And those transitions don't look labial. The F2 starts too high, although frankly the F3 and F4 are not helping.
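The process of elimination above can be written down as a tiny filter. This is only a sketch of the reasoning, not a classifier: the candidate list (English voiced fricatives plus /h/) and the three yes/no cues are my own simplification, and the function name `narrow_fricatives` is hypothetical.

```python
# Sketch of the elimination logic for the first segment.
# Assumption: the candidates are the English voiced fricatives plus /h/.
SIBILANTS = {"z", "ʒ"}

def narrow_fricatives(loud_noise, f1_f2_involved, labial_transitions):
    """Drop candidates that conflict with each observed cue."""
    candidates = ["v", "ð", "z", "ʒ", "h"]
    if not loud_noise:
        # Sibilants would show louder frication.
        candidates = [c for c in candidates if c not in SIBILANTS]
    if not f1_f2_involved:
        # /h/ would excite the F1/F2 region more.
        candidates = [c for c in candidates if c != "h"]
    if not labial_transitions:
        # F2 starts too high for a labial.
        candidates = [c for c in candidates if c != "v"]
    return candidates

print(narrow_fricatives(False, False, False))
```

With all three cues set the way this spectrogram reads (weak noise, little F1/F2 involvement, non-labial transitions), the only survivor is the interdental.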
[i], IPA 301
Well, there's this big transitional thingie, but I'm looking specifically at that moment between 200 and 225 msec. The F1 is low, so the vowel is high. F2 is way high, so very front. So we're dealing with a high front vowel. Owing to another 150 msec more or less of vowel (even if the F2 is moving throughout), there's probably another vowel following, meaning we're at a hiatus moment here. Meaning this syllable must be open, which means this vowel, if high, is tense. Think about your phonotactics.
[ə], IPA 322
Well, the F2 is moving throughout. So deciding on a backness value is perhaps a waste of time. It's interesting that the vowel definitely is lower (though still not 'low') than the preceding vowel, so this is probably middish. But the F2 never really comes to a stop, or even slows down, so there's no evidence of a true 'target' for the F2. Which makes me think of vowel reduction. Which makes me happy because then I don't have to decide anything about this vowel.
[v], IPA 129
On the other hand, this F2 transition can't be anything but labial. It's way lower than the alveolar "locus" (we can argue about the earlier transition into the [i], but it looks vaguely alveolar, more alveolar than labial, if it comes to that). Perhaps most importantly, the F3 and F4 are clearly lowered--aside to Vineeta Chand: but not at all 'low' ;-)--suggesting some labialization in here somewhere, but nothing like real rounding. So this thing from 350 msec to 400 msec or so is vaguely labial, in the sense of not obviously being round, and being more labial than coronal or velar. So anyway, we've got a consonant, voiced, and if we look really closely, fricative. That could be the noise that you get with my mushy stops, but you don't get that kind of voicing even in my mushy stops. This being English, there aren't a lot of voiced labial fricatives to wonder about.
[ɛ], IPA 303
So for almost 100 msec, there's a vowel. The F1 is, well, higher than 500, but not by much. So this is mid or vaguely lower than mid. The F2 is, well, higher than 1500 but not by very much, so this is front, but not like front in high-and-front. So middish to lower-middish, and vaguely front.
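For what it's worth, the F1/F2 reasoning used for the vowels here can be sketched as a rough lookup. The 500 Hz and 1500 Hz reference points come from the description above; every other cutoff (400, 600, 1200, 2000 Hz) is an assumption I've made up for illustration, and real values depend heavily on the speaker. The function name `describe_vowel` is hypothetical.

```python
def describe_vowel(f1_hz, f2_hz):
    """Rough height/backness labels from F1 and F2 in Hz.

    Assumed cutoffs, not measurements: F1 around 500 Hz counts as mid,
    F2 around 1500 Hz is the line between central and front.
    """
    if f1_hz < 400:
        height = "high"      # low F1 means a high vowel
    elif f1_hz <= 600:
        height = "mid"
    else:
        height = "low"

    if f2_hz > 2000:
        backness = "very front"
    elif f2_hz > 1500:
        backness = "front"
    elif f2_hz > 1200:
        backness = "central"
    else:
        backness = "back"    # low F2 also goes with rounding
    return height, backness

print(describe_vowel(550, 1600))
```

An F1 a bit above 500 Hz and an F2 a bit above 1500 Hz come out as mid and front, which is about as precise as the eyeball reading gets.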
[n], IPA 116
So for almost another 100 msec, there's full voicing, but the amplitude drops suddenly at 500 msec. So we've got full vowel from 400 to about 500 msec, and then something less than fully open from 500 to almost 600 msec. Around 600 msec there's a burst, so we'll have to jam a plosive in here somewhere, but the voicing (and the upper frequency resonance) aren't consistent with a voiced stop. So this is something else. The sudden change in amplitude suggests nasal, although frankly I'm at a loss why it looks like it does. The transitions in the preceding vowel could go either alveolar or labial, depending on your mood, although the F4 is just sitting there, which might tip the scales toward coronal. But the pole, if that's what it is that stands in for F2, is moving (from about 1250 to around 1600 Hz), which is just odd. And there's no obvious zero. So exactly what to do with this one is a mystery to me.
[t], IPA 103
So this burst at 600 msec needs explaining. I explain it thus. This is how homorganic nasal-stop clusters look. Nasal nasal nasal, with an oral burst. It's definitely alveolar, as the burst noise is 'acute' (higher in the high frequencies--i.e. looks like about 5 msec of an [s]). There's definitely a disruption to the regular voicing pattern, although how exactly we're supposed to realize voicing or voicelessness aligned with so 'instantaneous' a cue like this burst is again a mystery to me.
[ɨ], IPA 317
So after the burst, there's a short little vowel. Again, the F2 is just zooming, so I'd say there's no particular vowel target, or at least the vowel target gets overwhelmed by the overlap with the flanking transitions. But that's just a theory. It's reduced. Skip it.
[z], IPA 133
Okay, so looking at the higher frequencies, there's some frication going on. It's not organized in bands; it's one big band, centered really, really high. Which makes it look sibilant, but it's not really that loud. But if you look down at the bottom, there's some voicing. It's dying away really fast, but it's there. Which explains the relative weakness of the noise--hard to maintain sibilant airflow and voicing at the same time! So what we have is weak [s] noise, accompanied by voicing. That is, [z].
[ə], IPA 322
I'm not intending to make it a rule that if the F2 is just moving, you should ignore the vowel, but really? Is there *any* evidence that the F2 is going somewhere other than between where the consonants are pulling it? Okay, well, if it is, then this one has some kind of slightly low F2, so if you really want this to be back or round, fine with me.
Tilde L (Dark L)
[ɫ], IPA 209
But it's back because it's being coarticulatorily (?) velarized by this bit. From 825 msec or so for at least 100 msec, there's something consonantal happening. Now, I'd say this was a nasal. Look at those sharp edges. Look at those flat resonances. What's more, I'd say [m], because that nice little pole at 1000 Hz just screams [m]. But I'd be wrong, because I didn't consider the upper formants. Look, there they are. In spite of the lower and higher apparent zeroes, these look pretty strong. In a nasal, all the formants get relatively weaker than they would be in an oral vowel. Compare the overall amplitude of the resonances to those of the preceding [n]. And then there's the F3 frequency to contend with. It's raised. Compare the F3 here with that in any of the preceding vowels. That's raising. So that gives us a clue--this could be a lateral, and that 'pole' is just the very low resonance of a very back (velarized) lateral. Ooh. I think we're on to something.
Lower-Case O + Upsilon
[oʊ], IPA 307 + 321
Well, on the far side of the lateral is this thing. F1 starts around 500, maybe a little higher, and in the last third or so it drops, so this vowel goes from sort of mid to sort of high. And it's way back and possibly round, judging from the low F2, and it gets backer/rounder. Easy.
[k], IPA 109
Well, this one is rough. The F2 and F3 are too far out of normal to provide much in the way of useful transitional information, at least traditional transitional information. We've got a mushy gap, so we're probably talking about a plosive. It's voiceless, so we're down to three possibilities. That's progress. The real giveaway is the burst. It's not [t] looking. It's got bands, and if anything it's got a weak bit up around 4200 Hz. There's some energy in F2, but not below. So this isn't really good for bilabial. So guess velar. Well, okay I can then convince myself that the F2 and F3 transitions into the following vowel are vaguely pinchy (it's a stretch, but there are always some leaps of faith in this enterprise). And then there's that clunk. See it? If the main burst is at 1100 msec, then at about 1125 msec, in the F1/F2 region? See it? That clunk? What if that's the second burst of a double burst? Ooh, double bursts are usually characteristic of velars. Ooh, corroboration. Gotta love it, especially if it's all you've got.
Tilde L (Dark L) + Syllabicity Mark
[ɫ̩], IPA 209 + 431
Does this look familiar? Except this time it's between consonants.
Lower-Case T + Right Superscript H
[tʰ], IPA 103 + 404
So the first thing to notice about this is the release. It looks alveolar/[s]-like. And that VOT is amazing. Too amazing.
Turned R + Under-ring
[ɹ̥], IPA 151 + 402
This explains the length, I guess. Aspiration tends to go along with the duration of the following approximant, if there is one. And there is one. There must be or there's no explanation for that F3. See how F3 and F2, when the voicing finally kicks in, are both below 2000 Hz. That low F3 is a dead giveaway.
[ɨ], IPA 317
And here's another one of these stupid vowels.
[d], IPA 104
Well, here's another long gap. Quite a long gap, actually. And fully voiced. That's funny. The transitions are consistent with alveolar, but it's so hard to rely on those. But the release looks like another alveolar release. Except this time it's voiced, and the VOT is short, if it's greater than zero at all.
Small Capital I
[ɪ], IPA 319
Well, there's some high frequency noise, but given that we've got an alveolar plosive on one side and a clearly sibilant fricative on the other, I think we can just call it coarticulation, or reverb or something. So what have we got? Well, it's got to be a vowel. Probably high, or at least highish, judging from the F1. Quite front, judging from the F2, but not as front as the first vowel in this spectrogram. So this is probably [ɪ].
[ʃ], IPA 134
Ah, sibilants. This is high amplitude, relatively broad band, high frequency energy. This one is more 'shaped', by resonances/filters, than, say, the aforementioned [z] or the noise for the [t] releases. It's also centered a bit lower--most of the others have their center, at least conceptually, above 4000 Hz. The center here is definitely lower, just above 3000 Hz. And the energy shuts off abruptly (relatively speaking) below the F2 of the surrounding vowels. That's pretty typical of [ʃ].
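The center-of-noise comparison can be sketched in a couple of lines. The two reference points (centers above 4000 Hz for the [s]-like noise, just above 3000 Hz here) are from the description above; the 3500 Hz dividing line is an assumption I picked as a midpoint, and the function name `guess_sibilant` is hypothetical.

```python
def guess_sibilant(center_hz, voiced):
    """Pick a sibilant from the rough center of the frication noise.

    Assumption: 3500 Hz splits the alveolar-looking noise (centered
    above about 4000 Hz) from the postalveolar-looking noise
    (centered just above 3000 Hz). This is speaker-dependent.
    """
    if center_hz >= 3500:
        return "z" if voiced else "s"
    return "ʒ" if voiced else "ʃ"

print(guess_sibilant(3100, voiced=False))
```

A voiceless noise centered around 3100 Hz lands on the postalveolar side, and the earlier weak, voiced, very high noise lands on [z].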
[ə], IPA 322
Last vowel of the spectrogram. Kind of short for a final vowel. Must not be that important. Seriously. Final syllables lengthen. If this is the lengthened version, how short would the unlengthened version be?
[n], IPA 116
And finally, something that has reduced overall amplitude, clear resonances, and definite zeroes. At least, something that's definitely nasal. The pole, what you can see of it, is up around 1500 Hz, which if you're looking at my voice is pretty clearly alveolar looking, especially without any hint of velar pinch in the incoming transitions.
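The nasal-place reasoning can be sketched the same way. The two reference poles (around 1000 Hz looking like [m], around 1500 Hz looking alveolar) are for my voice as described in this walkthrough; the 1250 Hz dividing line is an assumed midpoint, and `guess_nasal_place` is a hypothetical name, not a real tool.

```python
def guess_nasal_place(pole_hz, velar_pinch=False):
    """Guess nasal place from the pole frequency and incoming transitions.

    Assumptions: the 1250 Hz cutoff is just the midpoint between the
    ~1000 Hz ([m]-like) and ~1500 Hz ([n]-like) poles mentioned for
    this speaker; velar pinch in the transitions overrides the pole.
    """
    if velar_pinch:
        return "ŋ"
    return "m" if pole_hz < 1250 else "n"

print(guess_nasal_place(1500))
```

Remember that the pole at 1000 Hz earlier turned out to belong to a dark lateral, so a rule like this only applies once you're sure the segment is nasal in the first place.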