Lower-case W
[w], IPA 170
Well, the voicing starts at about 75 msec, but the upper formants don't really kick in for another 50 msec or so. So there's something here, something less 'open' than a vowel. But it doesn't look gappy or fricativey, so that leaves nasal or approximant. The fact that the F1 is 'full' and not damped in any obvious way suggests approximant. The F2 starts very (very, very) low, so this can't be a [j]. The F3 doesn't seem to be doing anything. It's certainly not low enough to be an [ɹ], and it doesn't look raised either. So it looks like back and round is the only good choice.
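For the programmatically inclined, the elimination above can be sketched as a tiny decision function. This is only an illustration: the function name and the F2 cutoffs are assumptions I've made up for the sketch, and the 1800 Hz F3 cutoff is borrowed from the /r/ discussion further down rather than measured here.

```python
# Rough sketch of the approximant elimination described above.
# All thresholds are illustrative assumptions, not measured values.
F2_VERY_LOW_HZ = 1000  # assumed: below this, F2 counts as "very low"
F2_HIGH_HZ = 2000      # assumed: above this, F2 counts as high (front)
F3_LOW_HZ = 1800       # assumed cutoff, taken from the later /r/ remark


def guess_approximant(f2_onset_hz: float, f3_hz: float) -> str:
    """Return a best-guess approximant label from F2 onset and F3."""
    if f3_hz < F3_LOW_HZ:
        return "ɹ"        # a very low F3 is the cue for [ɹ]
    if f2_onset_hz < F2_VERY_LOW_HZ:
        return "w"        # very low F2: back and round
    if f2_onset_hz > F2_HIGH_HZ:
        return "j"        # very high F2: front
    return "unclear"      # nothing decisive; look at other cues


print(guess_approximant(700, 2500))
```

A real reader weighs these cues together rather than in a fixed order, so treat the ordering of the checks as a convenience, not a claim about how it has to be done.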
[i], IPA 301
So even though the F2 is apparently being pulled down on both sides, on the left by the preceding [w] and on the right by whatever that transition is doing, it reaches an extremum (in this case a maximum) well above 2000 Hz, maybe even 2200 Hz. Whenever you see a male voice with an F2 above 2200 Hz, it can only really be an [i].
So the astute spectrogram reader will be saying to itself, "[wi]. Hmm. And this is a declarative English sentence, so I'm probably looking for some kind of NP at the beginning. Hmm."
Lower-case P + Right Superscript H
[pʰ], IPA 101 + 404
Well, we've got a gap--a sudden cessation of resonance at all (or almost all) frequencies. There's some residual voicing in the low frequencies, but we can ignore that, and there's a little noise from somewhere near F3, but not really enough to make us pay attention too much. This is obviously some kind of plosive. So check the transitions. F1 doesn't tell us much, F2 is ambiguous--it's dropping at the left and rising at the right, but it's not pointing at a frequency low enough to be clearly below the alveolar locus, which for me is about 1700 or 1800 Hz. The F3 on the left isn't doing much, and the F4 is rising. On the right, the F3 and F4 seem to be rising. So on the right we've got things entirely consistent with a bilabial closure, and on the left we have, well, ambiguity. So I'll take the bilabial and run with it at least until I can't make a word out of it, or can't make a word with it that makes sense with anything else. Note the VOT, so this is aspirated.
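The F2 part of that reasoning can be sketched as a comparison against the alveolar locus. The 1700 to 1800 Hz range is the one quoted above for my voice; the function name and the 200 Hz margin are assumptions made up for this sketch.

```python
# Sketch of the F2 locus comparison described above.
ALVEOLAR_LOCUS_HZ = (1700, 1800)  # from the text: the speaker's alveolar locus
MARGIN_HZ = 200                   # assumed tolerance for "clearly clear of"


def f2_place_cue(f2_target_hz: float,
                 locus: tuple = ALVEOLAR_LOCUS_HZ,
                 margin: float = MARGIN_HZ) -> str:
    """Describe where an F2 transition points relative to the alveolar locus."""
    lo, hi = locus
    if f2_target_hz < lo - margin:
        return "below locus: consistent with bilabial"
    if f2_target_hz > hi + margin:
        return "above locus"
    return "near locus: cannot rule out alveolar"


print(f2_place_cue(1600))
```

With an F2 pointing somewhere around 1600 Hz, say, the sketch returns the "near locus" answer, which is exactly the kind of ambiguity that sends you off to look at F3 and F4.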
Lower-case E + Small Capital I
[eɪ], IPA 302 + 319
Well, F1 is sort of mid, I guess, and moves slightly lower, suggesting a mid vowel that moves toward high. The F2 starts very high, indicating a front vowel, and moves up, indicating fronter. So [eɪ] is the best bet.
[d], IPA 104
Ah, another gap. But this one should strike you as very long. So maybe something is going on here. Look at that voicing. It's strong, as if it was really voicing and not just perseveration of the vowel's voicing. So this might be a voiced stop. Or part of this might be a voiced stop. I arbitrarily segmented the gap along with the voicing, just cuz, but that means we have to look at only the left-side transitions for a cue to place of this stop (since any right-side transitions will be covered up by the proposed following stop). So the F2 seems to be rising, but that last pulse looks like it's dropped a little. F3 doesn't seem to be doing much. F4 seems to be rising, sort of, but I'm not sure what that means. Well, at least we know it's voiced. Probably not velar. Not amazingly labial looking, and statistically [d] is more likely than [b] post-vocalically anyway.
So now the astute spectrogram reader (hereafter to be known as the ASR) will be thinking, "[wi] might be a pronoun, which might be a good subject, and now we have [pʰeɪd], which might just make a decent verb. Hmm."
Lower-case T + Right Superscript H
[tʰ], IPA 103 + 404
Small Capital I + Upsilon
[ɪʊ], IPA 319 + 321
The ASR at this point will be recalling that in my west-coast USA voice, /u/ is not particularly round or back, and following coronals I will have that merged /u-ju/ thing. And this is almost definitely post-coronal.
[m], IPA 114
Well, there's an abrupt discontinuity as I mentioned before, one that involves reduced amplitude and steadying of resonant frequencies for not quite 100 msec, until at about 875 msec or so there's a 'symmetrical' moment, where the amplitude and formant movement suddenly start up again. So we've got something resonant, but of reduced amplitude (compared to the surrounding vowels). And unlike your average approximant, the edges are quite sharp, and there's no movement or anything happening during the 'closure'. Which is a pretty good indication of a nasal. The transitions all suggest labial, as does the relatively low pole, or whatever that is, at about 800 or 900 Hz. (My coronal pole is closer to 1000 Hz.)
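The pole comparison amounts to a nearest-reference lookup, sketched below. The reference values come from the numbers just quoted for my voice (850 Hz is simply an assumed midpoint of "800 or 900"), and the function name is made up for the illustration.

```python
# Sketch of the nasal-pole comparison described above.
# Reference values are speaker-specific and taken from the text;
# 850 Hz is an assumed midpoint of the "800 or 900 Hz" labial pole.
NASAL_POLES_HZ = {"m": 850, "n": 1000}


def nasal_place_from_pole(pole_hz: float) -> str:
    """Return the nasal whose reference pole is closest to the observed one."""
    return min(NASAL_POLES_HZ, key=lambda seg: abs(NASAL_POLES_HZ[seg] - pole_hz))


print(nasal_place_from_pole(830))
```

The two reference poles sit close together, so this is only a tie-breaker for when the transitions already point the same way, as they do here.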
[ʌ], IPA 314
Well, without being distracted by the voicing bar, the F1 here is moving up from a middish kind of vowel to something that is pretty definitely low. The movement may again just be transition from the preceding labial, but whatever. The F2 is definitely low to start with and moves, well, to the mid-range. F3 and F4 just don't tell us much. So this is a mid-to-lowish vowel of indeterminate back-to-centralness. How's that for a description?
[t], IPA 103
Gap. Plosive. Rising F2, but not much indication of a lowering F3, I guess, so whatever this is it probably ain't labial and it probably ain't velar. That and the release looks sibilant again.
[ʃ], IPA 134
[f], IPA 128
The ASR will be going crazy trying to make a quantity (that can be 'paid') out of the sequence [t] vowel [m] vowel [t] fricative [f], until it starts to sound it out.
Turned R + Syllabicity Mark
[ɹ̩], IPA 151 + 431
Well, look at that F3. Down there below 1800. Way low. Must be an /r/. Since it doesn't seem to have a steady state, it doesn't really look syllabic, but if I transcribed a vowel in here it would have to go on the wrong side, so I'm doing some finessing here. There's an /r/. There may be another vowel heading into that flappy thing, but, well, try making a word out of it.
[ð], IPA 131
Turned V + Upsilon
[ʌʊ], IPA 314 + 321
Lower-case Z + Under-Ring
[z̥], IPA 133 + 402
Ah, an [s]. It's broad band, it's concentrated in the very high frequencies. But it's kinda short for something that's being phrase-finally lengthened. And it's kind of weak for a phrase-final boost. So maybe this is a [z], but devoiced. Which is what it is. This would explain a) the obvious lack of voicing, b) the weird length (short because it's voiced and lengthened from short because it's phrase final) and c) the incredible lengthening of the preceding vowel. Calling this a devoiced (or voiceless) [z] is what we call an 'elegant solution'. Yeehaw.
So, ASR, what did you come up with?