[s], IPA 132
Now this is what I'm talking about. Ignore the noise at the bottom; I think that's just me blowing into the microphone. But the noise above that is definitely weaker (it gets stronger over time, presumably as the airflow picks up approaching voicing), and it gets stronger as you go up in frequency. It seems to occupy one very wide band rather than being shaped by the vocal tract, so this is a classic sibilant. Probably alveolar. And voiceless.
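One way to put a number on "gets stronger as you go up in frequency" is the spectral centroid, the energy-weighted average frequency of a slice of the spectrum. Here's a minimal Python sketch, assuming you already have a frequency and magnitude for each bin (e.g. from an FFT of one frame of the frication). The function name and the example spectra are made up for illustration, not measured from this spectrogram.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency (Hz) of one spectrum slice."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum slice has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


# Hypothetical bins from 1 to 8 kHz: one spectrum that rises with frequency,
# one that is flat across the band.
freqs = [1000 * k for k in range(1, 9)]
rising = [float(k) for k in range(1, 9)]
flat = [1.0] * 8

print(round(spectral_centroid(freqs, rising)))  # 5667
print(round(spectral_centroid(freqs, flat)))    # 4500
```

A spectrum that gets stronger toward the top pulls the centroid upward compared to a flat one, which is the pattern described above for [s].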
[ʌ], IPA 314
A while ago I swore to stop using this vowel, but this one really looks mid and back, rather than central and low (compare, well, it's coming later). What I mean is, this one has an F1 that starts and ends around, well, 600-700 Hz, and tops out probably below 800 Hz. So not as high as it could be, but definitely higher than absolutely mid, so we're dealing with an open-mid or a higher-low sort of vowel. The F2 is actually quite low, around, oh, 1100-1200 Hz or thereabouts, indicating a vowel that is quite back, quite round, or both. Well, back there, there aren't a lot of vowels to choose from.
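To make the F1/F2 reasoning above a bit more explicit, here's a rough Python sketch that turns formant values into coarse height and backness labels. The cut-off frequencies are hypothetical, loosely picked to agree with the numbers in these notes; real boundaries shift a lot from speaker to speaker, so don't read them as standard values.

```python
# Hypothetical cut-offs (Hz), loosely based on the values quoted in these notes.
# Real boundaries vary by speaker, sex and dialect.
HEIGHT_BANDS = [(400, "close"), (550, "mid"), (800, "open-mid")]
BACKNESS_BANDS = [(1300, "back"), (1800, "central")]


def rough_height(f1_hz: float) -> str:
    """Higher F1 means a lower (more open) vowel."""
    for upper, label in HEIGHT_BANDS:
        if f1_hz < upper:
            return label
    return "open"


def rough_backness(f2_hz: float) -> str:
    """Lower F2 means a backer (or rounder) vowel."""
    for upper, label in BACKNESS_BANDS:
        if f2_hz < upper:
            return label
    return "front"


print(rough_height(650), rough_backness(1150))
```

With the values above (F1 around 650 Hz, F2 around 1150 Hz), this comes out as open-mid and back.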
[m], IPA 114
From just before 300 msec to around 400 msec or so (ignoring for now that other moment at around 375 msec where 'something' happens) there's a longish stretch of voicing. The resonances above that indicate sonorance, but their relative weakness suggests a side chamber. So we're dealing with a nasal or something. Concentrating on the first two thirds or so of this stretch of voicing, the zero appears around 700-800 Hz, and the main resonance just above that. It's moving up in frequency a little, but let's pretend it's not and just say it's around 1000 Hz, which is the perfect location for a bilabial [m] resonance. The thing that happens after 375 msec suggests something has changed in the side chamber. Looking just at that bit, the zero has risen in frequency a little and widened in bandwidth a touch, and the resonance above it is somewhat higher, let's say 1300 Hz. That is definitely not 'around' 1000 Hz, and it's closer to where I'd expect an alveolar [n] to have its resonance. This is a clue to the upcoming plosive, which I think is alveolar (oops, I gave it away): an alveolar closure happens here, in the middle of the nasal. I'm not sure this would have been audible, or, if audible, whether it would have been perceptible, but whatever. I think that's what happened.
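The place argument for the nasal boils down to comparing the main resonance above the zero with a couple of reference values. A minimal Python sketch, assuming the rough 1000 Hz ([m]) and 1300 Hz ([n]) figures from this paragraph as hypothetical references rather than general norms:

```python
# Hypothetical reference resonances (Hz), taken from the rough figures above;
# not general norms for every speaker.
NASAL_REFERENCES = {"bilabial [m]": 1000.0, "alveolar [n]": 1300.0}


def nearest_nasal_place(resonance_hz: float) -> str:
    """Return the place whose reference resonance is closest to the measured one."""
    return min(NASAL_REFERENCES, key=lambda place: abs(NASAL_REFERENCES[place] - resonance_hz))


# The two stretches of the murmur discussed above.
for stretch, resonance in [("before ~375 msec", 1000.0), ("after ~375 msec", 1300.0)]:
    print(stretch, "->", nearest_nasal_place(resonance))
```

A nearest-reference rule is about as crude as it gets, but it captures the shift from an [m]-like to an [n]-like resonance within one murmur.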
[tʰ], IPA 103 + 404
Lower-case T + Right Superscript H
Which brings us to this burst at 450 msec or so. The nasal resonance starts to fall apart just before the 400 msec mark, and the voicing falls apart a bit later, so there's a tiny little speck of background noise (I guess that's what that is) that clearly isn't voicing, and then there's a strong burst followed by a longish VOT. Proper voicing doesn't really kick in in F2 (which, for some reason, is where we usually look for it) until you get to almost 500 msec, so we have at least 50 msec of VOT. So this is aspirated (and, in English, presumably voiceless, as aforesuggested). If we weren't sure it was alveolar because of what happens in the nasal, the burst noise is decidedly [s]-shaped, i.e. broadband, strong, and centered in the highest frequencies, which tells us this is an alveolar release.
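The VOT arithmetic here is just a subtraction: voicing onset minus burst. A small Python sketch; the 30 msec aspiration cut-off is a hypothetical placeholder, not a value taken from these notes.

```python
# Hypothetical boundary (msec) between short-lag (unaspirated) and long-lag
# (aspirated) VOT; treat it as a placeholder, not a standard value.
ASPIRATION_THRESHOLD_MS = 30.0


def vot_ms(burst_ms: float, voicing_onset_ms: float) -> float:
    """Voice onset time: delay from the release burst to the onset of voicing."""
    return voicing_onset_ms - burst_ms


def is_aspirated(burst_ms: float, voicing_onset_ms: float,
                 threshold_ms: float = ASPIRATION_THRESHOLD_MS) -> bool:
    """True if the VOT is longer than the (hypothetical) threshold."""
    return vot_ms(burst_ms, voicing_onset_ms) > threshold_ms


print(vot_ms(450, 500), is_aspirated(450, 500))
```

With the burst at about 450 msec and voicing at about 500 msec, this gives a VOT of about 50 msec, which lands on the aspirated side of the placeholder cut-off.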
[ɑɪ], IPA 305 + 319
Script A + Small Capital I
So when we get past the VOT, we have a moment where the F1 is still kind of flat and the F2 reaches a minimum, somewhere around the 550 msec mark. The F1 is way high, indicating an incredibly low (open) vowel. The F2 is very low, indicating something very, very back (or round). So here we have a very low, very back vowel. Following that moment, the F1 drops a little, suggesting a bit of raising, and the F2 starts to rise, indicating fronting. So this is a 'falling diphthong', meaning it starts low (open, or 'high sonority') and moves higher (close, or 'lower sonority'; falling = high-to-low sonority), and which I would call a 'low-fronting diphthong', i.e. something of the [aɪ] family, starting low and moving frontwards.
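The diphthong reasoning (F1 falling means raising, F2 rising means fronting) can be written down as a small Python function. The 50 Hz tolerance and the example formant values are hypothetical, chosen only to illustrate the logic, not read off this spectrogram.

```python
def describe_glide(f1_start: float, f1_end: float,
                   f2_start: float, f2_end: float,
                   tolerance_hz: float = 50.0) -> list:
    """Describe a formant trajectory as height and backness movements.

    A falling F1 means raising, a rising F1 means lowering; a rising F2 means
    fronting, a falling F2 means backing. Changes no bigger than tolerance_hz
    (a hypothetical value) count as steady.
    """
    moves = []
    d1 = f1_end - f1_start
    d2 = f2_end - f2_start
    if d1 < -tolerance_hz:
        moves.append("raising")
    elif d1 > tolerance_hz:
        moves.append("lowering")
    if d2 > tolerance_hz:
        moves.append("fronting")
    elif d2 < -tolerance_hz:
        moves.append("backing")
    return moves or ["steady"]


print(describe_glide(850, 700, 1100, 1700))
```

With these made-up values the result is `['raising', 'fronting']`, i.e. the low-fronting pattern of the [aɪ] family.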
[m], IPA 114
And here we have something voiced. There's not a lot of evidence of resonance above the voicing bar, but that nice strong voicing bar doesn't look like voicing during a closure, so this isn't likely to be a voiced stop. Call it a weak nasal and move on, except that the lowering transitions in the last glottal pulse or two going into the nasal make it look more bilabial than anything else.
[z], IPA 133
Hmm. Well. Voiced? Possibly. Fricative? Almost definitely, but it's kind of weak. What there is, between 750 and 800 msec (or thereabouts), is very high frequency, but weak. So this is probably /z/, if not actually [z].
[ð], IPA 131
On the other appendage, something different is happening right around 800 msec. The very high frequency (for some value of 'very') noise disappears, or at least lessens a lot, and filtered (i.e. banded) energy takes its place, in places roughly analogous to F1, F2 and F3. It's still a little noisy, and still at least plausibly voiced. So this is fairly open in articulation. The transitions in the vowel look vaguely bilabial, but the noise doesn't match up with them. Taking the noise on its own (at around 500 Hz, 1600 Hz and, well, somewhere in the 2300-2400 Hz range), I think it looks sort of alveolar. Conceivably. So split the difference and go for dental, which leaves something interdental or labiodental. Moving on.
[eɪ], IPA 302 + 319
Lower-case E + Small Capital I
So that transition probably isn't a transition at all; it's the onset frequency of F2. So we've got an F2 in the mid-to-higher-mid frequency range, suggesting something sort of front, and it moves up, suggesting that the vowel moves fronter. The F1 starts sort of mid (around 500 Hz) and may move down, suggesting decreasing sonority. So what we have here is a diphthongy thing of some kind, starting mid and front and moving fronter.
[ɦ], IPA 147
On the other hand, something definitely happens as the F2 reaches its maximum. See how the F3 fuzzes out, and the energy below F2 kind of dies, even in F1? So something's afoot, as they say. It's still articulated as a vowel, for the most part, but noisy? How can something that noisy still be that open? If it's glottal noise, that's how. But it remains voiced, so that's how I transcribed it.
[æ], IPA 325
When the voicing comes back on in the lower frequencies, the F1 has clearly risen to (somewhere around) 700-800 Hz, which is clearly higher than many vowels, though not quite as high as it could go. So we're talking about a fairly low vowel. F2 is sort of front-to-central, which, given how low the vowel is, suggests something fairly front. Work with me on this one.
[ʋ], IPA 150
Well, we have a longish gap beginning just before 1100 msec and going on to the release at 1200 msec. But the first bit is clearly voiced. It could be perseverative voicing, but, well, it could be not. It's a longish gap to suppose it's just one thing, and the transitions in (all pointing down) don't jibe with the alveolar-looking release (tilted toward [s]), so I'm going to suppose that we've got something labial, and voiced at the beginning. [b], perhaps. Frictionless, certainly. And it will turn out not to be a stop, but there you go.
[t], IPA 103
Lower-case T
Like I said, this clearly has an alveolar release, so this must be a [t]. Whether it's a /t/ or a /d/ remains to be seen. But English isn't famous for its /bt/ clusters, so assuming there's a syllable break in here, this has to be an unaspirated, voiceless plosive, i.e. [t], which in syllable-initial position means /d/.
[o], IPA 307
Vowel. F1 is around 500 Hz, so probably mid. F2 starts sort of central and trends down, so central but moving backer and/or rounder. F1 isn't moving a whole lot and F3 isn't telling us anything useful. So this is some kind of /o/. It's a bit creepy that F1 loses its, well, F-iness and the whole thing sort of flattens out (except for F4), making it look like an approximant. But whatever. I take the F1 thing to be an indication of increasing nasality, given...
[n], IPA 116
... the following nasal. Which looks fully voiced, in spite of not having a lot in the way of resonances above the zero. And the F3 and F4 transitions are pointing down, as if it were bilabial (it can't be velar, because F2 doesn't seem to be rising at all to 'pinch' with F3). But the transitions on the other side are, well, ambiguous. F2 may be rising, but F3, if anything, is falling. F3 and F4 may be pinching, and that sometimes means something, but exactly what is in dispute. So it's probably a nasal, and it will turn out to be alveolar, but at this point I have no idea why. It is both a science and an art, folks.
[ɐ], IPA 324
So here's what I mean about this vowel. This doesn't seem to be a reduced vowel to me. I think this word is a compound, or at least acts like a compound, regardless of spelling. And this is a fairly low and vaguely central vowel. It is certainly lower and more central than the [ʌ] at the beginning of this spectrogram. So I transcribed it as lowish and centralish. Moving on.
[t], IPA 103
Gap. Not much in the way of voicing. Can't tell much from the transitions, since the energy starts to die halfway through the vowel. So try any plosive you like until you find one that makes a word.
[s], IPA 132
This would be easy to miss, but there's definitely some high(er) frequency noise in the spectrogram, suggesting a weak [s]. Presumably phonologically voiceless to go with the preceding plosive. Presumably a plural marker.