Eth + Raising Sign
[ð̝], IPA 131 + 429
Starting from about 100 msec for about 75 msec there's some fairly serious voicing. And it doesn't look like stop voicing, but there's no real evidence of anything happening higher up until at least halfway through, and then only up around where F4 should be (up above 3500 Hz or so). But there's a little bit of noise up there, so this may be a voiced fricative. Variations on Eth are often a good guess in this kind of case, just because it's English. (Work out the reasoning for yourself.)
Lower-case E + Small Capital I
[eɪ], IPA 302 + 319
Diphthong! Ends with a very front vowel, so we're looking at something fronting. The F2 is way too high to be [a] or anything like a standard diphthong nucleus, so we're looking at something diphthongized. The F1 is low of neutral, so this is a fairly high vowel to begin with (even if it does seem to get higher after 250 msec or so), and the F2 is quite high, so we're looking at something front. So sorta high, quite front, and diphthongized (to the front). Knowing this is English, this is probably /e/. Since 'they' is a pronoun, and subject to a lot of reduction (due to frequency or whatever), I'm agnostic about whether there's an underlying glide at the end of this word. I'm inclined to believe there is, because frankly my /e/ F2 isn't usually that low. But we'd need to check this.
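For readers wondering what 'neutral' means in numbers, here is a minimal sketch. It assumes the textbook uniform-tube model of the vocal tract (closed at the glottis, open at the lips) with a 17.5 cm tract and a speed of sound of 35000 cm/s; those values, the function names, and the sample formant readings are illustrative assumptions, not measurements from this spectrogram.

```python
def neutral_formants(tract_length_cm=17.5, speed_of_sound_cm_s=35000.0, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4 * tract_length_cm)
        for n in range(1, count + 1)
    ]


def describe(f1_hz, f2_hz, neutral):
    """Rough height/backness labels relative to the neutral F1 and F2."""
    height = "higher than mid" if f1_hz < neutral[0] else "mid or lower"
    backness = "front" if f2_hz > neutral[1] else "neutral, back or rounded"
    return height, backness


neutral = neutral_formants()
print(neutral)  # [500.0, 1500.0, 2500.0]

# Hypothetical readings: F1 a bit low of neutral, F2 well above neutral.
print(describe(400.0, 2100.0, neutral))  # ('higher than mid', 'front')
```

A shorter or longer vocal tract shifts all of these values, which is one reason 'neutral' has to be judged per voice.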
[n], IPA 116
Well, that sharply falling F2 transition might suggest labial, but a) it starts out so high anyway, b) F3 doesn't seem to be doing anything, and c) it doesn't really fall that far. Taken individually, a) where is it supposed to go from that high, b) if it's labiality that is affecting the F2, it ought to be affecting F3 as well, and c) the transitions stay well above 1500 Hz or so, and the 'locus' of alveolar transitions, if you believe in such a thing, is about 1700-1800 Hz more or less depending on the voice and who you read. So the transitions here are really suggesting an alveolar. The resonances (and zeroes) suggest a nasal, and the alveolarness (for my voice) is further confirmed by that F2 thing just a hair below 1500 Hz.
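The locus argument can be sketched as a crude check: does F2 end up closer to the 1700-1800 Hz band at the consonant edge than it was in the vowel? This is only an illustration of the reasoning, not a classifier. The function names and the sample F2 values are hypothetical, and only the locus range comes from the text.

```python
ALVEOLAR_LOCUS_HZ = (1700.0, 1800.0)  # range quoted in the text; varies by voice and source


def distance_to_band(freq_hz, band):
    """How far a frequency lies outside a (low, high) band; 0.0 if inside it."""
    low, high = band
    if freq_hz < low:
        return low - freq_hz
    if freq_hz > high:
        return freq_hz - high
    return 0.0


def points_toward_locus(f2_vowel_hz, f2_edge_hz, band=ALVEOLAR_LOCUS_HZ):
    """True if F2 is closer to the locus band at the consonant edge than in the vowel."""
    return distance_to_band(f2_edge_hz, band) < distance_to_band(f2_vowel_hz, band)


# Hypothetical values: a very high F2 that falls, but stays well above 1500 Hz.
print(points_toward_locus(2300.0, 1900.0))  # True
# Hypothetical values: the same vowel with F2 plunging far below the band.
print(points_toward_locus(2300.0, 1100.0))  # False
```

The check deliberately ignores F3, which the argument above also leans on, so treat it as one piece of evidence at most.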
[i], IPA 301
Transitions aside (notice that the F2 looks alveolar on either side, although the F3 transition out of this vowel is a little ambiguous), this has the same low F1 and extremely high F2 as (in fact higher than) the previous offglide. So there you go. Must be [i].
[d], IPA 104
Well, this gap, starting about 475 msec and probably continuing to about 575 msec, is kind of long, so it might be two things. So looking just at the left half (or so), it has alveolar transitions in (although the F3 looks like it's coming down, which might lead you to suppose this is bilabial--but the lowering in F3 doesn't look 'transitional', so much as it looks like it's 'modifying' from a higher general position). The first 50 msec or so of the gap is clearly voiced, so this is probably [d].
[t], IPA 103
On the other hand, the second half of this gap is probably voiceless. It's got a very strong, sibilant ([s]-shaped) release, which again suggests alveolar (which is consistent with the transitions out). In spite of the apparent quick onset of voicing, it is incredibly aspirated. There seems to be some noise going on for almost 200 msec, which is just impossible. So I think this is actually pretty heavily aspirated. But since we usually define aspiration as VOT, rather than noise, I didn't transcribe it that way. I might change my mind about this next time.
[ə], IPA 322
Well, there's almost 100 msec of voicing in here, although the F2 is still fairly front and moves to neutral (or just a little lower), so following Keating et al. (1994), as I usually do, I probably should have transcribed this as a barred-i. But it doesn't make a lot of difference. Note the undifferentiated noise above 2000 Hz that just goes on and on.
[f], IPA 128
More noise, but this time it's voiceless. My first guess at this would be /h/, since there seems to be some resonance--F3 goes straight through, and you can see F2 moving from its low at about 700 msec up to where it is when the voicing kicks on at about 775 msec. But there's no F1. Which is possible, but atypical of /h/. Hmm. The noise is undifferentiated, once you get past the frequencies below 1000 Hz, so there's very little in the way of high-frequency filtering going on, in spite of the apparent resonance. Hmm. Then there's that F2. Why is the F2 falling to that point around 700 msec, and then suddenly rising again after? There must be *something* there that is a target causing that. It can't be alveolar, because an alveolar fricative of any kind would be more [s] shaped. I suppose it could be postalveolar, given the absence of low-frequency energy, but it doesn't look [ʃ]-like, really. It just isn't loud enough, for one thing. So, getting back to the F2, we're looking for something with a low F2 target. And lower than an alveolar target. Frankly, that's as far as I can get. I'd ask you to *consider* [f], and move on.
Small Capital I
[ɪ], IPA 319
Well, at least a typical vowel. Okay. F1 is low, but not incredibly low, so we're looking at a higher than mid vowel, but probably not just plain high. The F2 is moving, but it starts high and travels slightly higher. It never gets anywhere near the range of the [i] vowel or the front offglide we've already seen. But it's definitely front. So this could be [e] or [ɪ], and it just ain't long enough to be [e]. Now look at those F2 and F3 transitions and move on to the next sound.
[k], IPA 109
Ya gotta love velar pinch. And even though it doesn't really look like a stop (I think the reason my velars never look stoppy may be my overlarge uvula--don't get me started), it's got to be velar, which doesn't leave many choices. Pretty clearly voiceless. Burst is a little low in frequency, if that's what that is just shy of 900 msec, but whatever.
[s], IPA 132
Okay, what was that I was saying about 'not being loud enough'? Obviously I was mistaken. Here we've got broad band, relatively (I guess) high amplitude noise, so this is probably a fricative. And voiceless, of course. I'd guess [ʃ] due to the apparent low-frequency zero, but the energy doesn't seem concentrated in the visible/present low frequencies the way I expect [ʃ] to look. So if it's sibilant, it must be [s]. Which is consistent with the distribution of energy, but I'd be happier if the higher frequencies (above 4000 Hz) were clearly higher in amplitude than the lower frequencies. I mean, I think they are, but it's arguable.
[ð], IPA 131
Well, there's something here. It looks slightly gappy, but there's some noise in the higher frequencies, and there's some...something at 1500 Hz and something weirdly burst-transient-like just after 1000 msec. I don't know what to make of this, except if it's fricative, it ain't sibilant, and it doesn't really look like anything else. I mean, a post-fricative plosive ought to have more 'plosion' to it, just because of the airflow. I guess. So bear in mind this is English, and play the odds.
[æ], IPA 325
Well, a nice high F1, indicating a rather low vowel (though I think the vowel coming up is lower), but with a mostly neutral-looking F2. So it's not amazingly front, but then lowish front vowels aren't.
Superscript Glottal Stop + Lower-case T
[ʔt], IPA 113 (superscripted) + 103
Well, technically, the last bit of the preceding vowel is creaky voiced, but I think that's really a manifestation of final-stop glottalization. I suppose it could be a low boundary tone of some kind, but then I'd expect it to be less creaky and jitter-y. Anyway, if this is glottalization of the stop rather than creakiness of the vowel, I've decided to transcribe this as pre-glottalization. For which there technically is no IPA character, so I improvised. The F1 transitions in the previous vowel indicate approaching closure, and the F2 looks like it's falling. Which would suggest bilabial, except that the F3 is just sitting there. I don't know why that is. It might be that labialization really does only affect F2, or that it's stronger on F2 than F3, if there's something else affecting F3. Or maybe it's just a fluke. I think I hear a release, but I can't see it on the spectrogram now. Maybe the alveolar closure is a figment of my imagination, or wishful thinking, cuz there ain't much evidence for it here.
Lower-case P + Right Superscript H
[pʰ], IPA 101 + 404
On the other hand, there's definitely a gap somewhere in here, and it's got a very broad-band release. Note that it's pretty even in amplitude across the frequency range, and has none of the [s]-shaped frication that the release around 600 msec had. So no double burst, apparently down-pointing out-transition in at least F3 and possibly F2, depending on how you read that aspiration, so I'd guess bilabial. There's close to 50 msecs before the voicing kicks on (from a hair before 1275 to a hair after 1300 msecs at the voicing bar, and a bit later for the energy in F2, which is usually what I use to mark the beginning of the vowel). So this is aspirated [p].
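Since aspiration is being read here as VOT, the arithmetic can be sketched as below. The time points are rough, illustrative stand-ins for 'a hair before 1275' and 'a hair after 1300' (plus a hypothetical, slightly later F2-energy onset), and the 30 msec cutoff for calling a stop aspirated is an assumption for the example, not a value from this page.

```python
def vot_ms(release_ms, voicing_onset_ms):
    """Voice onset time: interval from the stop release to the start of voicing."""
    return voicing_onset_ms - release_ms


def is_aspirated(vot, threshold_ms=30.0):
    """Call a stop aspirated when its VOT reaches an assumed long-lag threshold."""
    return vot >= threshold_ms


release = 1270.0       # illustrative: 'a hair before 1275 msec'
voicing_bar = 1305.0   # illustrative: 'a hair after 1300 msec'
f2_energy = 1318.0     # hypothetical: F2 energy starting 'a bit later'

for label, onset in (("voicing bar", voicing_bar), ("F2 energy", f2_energy)):
    vot = vot_ms(release, onset)
    print(label, vot, is_aspirated(vot))
```

Which landmark counts as the start of the vowel changes the number by a fair amount, so any cutoff should be applied with the same landmark throughout.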
[ɑ], IPA 305
Ah, high F1, and an F2 as low as it can get. My kind of [ɑ].
[ɾ], IPA 124
Well, there's something there. It's very short, and kind of noisy, but there's something that has a vaguely [s]-shaped release phase as it approaches 1500 msec. Anything that short can only be some kind of flap, so there you go.
[h], IPA 144
Well, it's noisy, but it's organized into bands like a vowel. Like a voiceless vowel. Like an [h].
[o], IPA 307
Well, the F1 is not high. It's a little high of neutral, but it's pretty mid-looking. The F2 is nice and low. So this is middish and roundish, and I have a western US voice so I really only have one vowel back there this could be. Not a heckuva lot of diphthongization either, although that might be the backing environment of the following dark [l].
Lower-case L + Mid Tilde
[l̴], IPA 155 + 428 (209 composed; Unicode character precomposed as [ɫ])
Well, there's a change in the amplitude somewhere around 1700 msec. The F1 is, well, wherever it is. The F2 is low, indicating either backness or rounding (or both, as in the preceding [o]), and that F3 is, well, sort of high. So since this isn't likely to be [w] (the F1 and F2 would both presumably lower as the offglide in a word like 'hoe'), I'll take the slightly high F3 as an indicator of the lateral. Okay, technically, the IPA regards this as an [l], with velarization (darkness), which is marked by the Mid Tilde diacritic. It even uses the dark-l symbol as the example for using the Mid Tilde diacritic. But for some bizarre reason, the IPA assigned the composed dark-l symbol a number (209), instead of leaving it as a combination of two independent symbols. And the Unicode standards people assigned a Unicode number to every numbered IPA symbol. So there you go. In my systems, both on-screen and printed, the precomposed symbol always looks better than the composed one, but it doesn't seem like it should.
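The two encodings can be compared with Python's standard unicodedata module. This sketch assumes the composed form is a plain 'l' followed by U+0334 (COMBINING TILDE OVERLAY) and the precomposed form is U+026B; the variable names are just for illustration.

```python
import unicodedata

composed = "l\u0334"    # lower-case L followed by the combining tilde overlay
precomposed = "\u026b"  # single code point for the dark-l symbol

print(unicodedata.name(composed[1]))   # COMBINING TILDE OVERLAY
print(unicodedata.name(precomposed))   # LATIN SMALL LETTER L WITH MIDDLE TILDE

# The precomposed letter has no canonical decomposition, so Unicode
# normalization does not turn one encoding into the other.
print(unicodedata.decomposition(precomposed) == "")             # True
print(unicodedata.normalize("NFC", composed) == precomposed)    # False
print(unicodedata.normalize("NFD", precomposed) == composed)    # False
```

Because the two forms never normalize to each other, a text search for one will not find the other, and a font is free to draw them differently, which fits with the precomposed symbol looking better.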