There are different styles of reading; this left-to-right business is just how I do it for convenience. As time goes on, I'll be introducing other styles. One of the things that I always forget about, at least when sitting down to do these, is the 'big' picture stuff. For instance, how many syllables (or at least vowels) are there in this? What evidence of segmentation do you see? Where? Can you see anything suggesting pitch peaks/lows, or correlates of stress like amplitude, length, or pitch excursions? Once you've done that sort of thing, usually you go through and mark all the things that are obvious: the sibilants, the nasals if you can see them, things that are obviously [i] or [a], that sort of thing. Then, once you've got the big picture, you start in on specific cues.
Lower-Case A + Small Capital I
[aɪ], IPA 304 + 319
Well, from 100 msec to between 225 and 250 msec, there's a longish vowel, but with moving formants. The F1, the lowest one, starts around 800 or 900 Hz and moves downward to near 500 Hz by the time you get to the end. So this vowel moves from very low to at least mid. The F2 starts very low as F2s go, especially relative to the high F1: about 1250 Hz or so at the beginning, rising all the way to almost 2000 Hz by the end. So from relatively back, or at least back of central, moving very far forward. At this point, you should have a pretty good idea of what this diphthong is. It's sort of interesting that the F1 has a fairly steady state at the beginning, where the F2 doesn't, and the F2 has a fairly steady state at the end.
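If you want to put a number on 'moving formants', one option is an average slope for each formant across the vowel. This is a minimal Python sketch using the rough readings above (F1 about 850 to 500 Hz, F2 about 1250 to 2000 Hz). The function name is mine, and the end time of 240 msec is just an assumed point inside the 225-250 msec range.

```python
def formant_slope(start_hz: float, end_hz: float, start_ms: float, end_ms: float) -> float:
    """Average rate of formant change in Hz per msec (negative means falling)."""
    if end_ms <= start_ms:
        raise ValueError("end_ms must be later than start_ms")
    return (end_hz - start_hz) / (end_ms - start_ms)


# Rough readings for the initial diphthong; the 240 msec end point is an assumption.
f1_slope = formant_slope(850, 500, 100, 240)   # falling F1: the vowel is getting higher
f2_slope = formant_slope(1250, 2000, 100, 240)  # rising F2: the vowel is moving forward
print(f"F1: {f1_slope:.2f} Hz/msec, F2: {f2_slope:.2f} Hz/msec")
```

An average hides the shape, of course: it can't show that F1 has its steady stretch at the beginning and F2 at the end.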
[m], IPA 114
From the end of the previous diphthong to somewhere past 300 msec, there's another segment. Fully voiced throughout, suggesting [+voice]. Fully sonorant, i.e. with formants all the way up, suggesting something [+sonorant]. Sharp discontinuities on both sides, which is classically nasal. (As the oral side cavity closes, leaving only the nasal cavity as the open channel, you suddenly get a sharp change in the acoustics. If you're lucky.) So this is probably nasal. It's got what is probably a zero around 1750 Hz or so. (I'm not sure about the apparent narrow zero around 700 Hz, just because it continues for quite a ways; it looks like an artifact.) But it seems to separate off a pole around 1000 Hz, which, if you know my voice, is about right for a bilabial pole. There is an inkling of something higher up, about 1300 Hz or something like that, which might indicate a partial occlusion at the alveolar ridge or so, but I'll defer that to someone who actually knows something about the acoustics of these sorts of things.
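The 'sharp discontinuities on both sides' cue can be roughed out as big frame-to-frame jumps in overall level. This is a minimal sketch, assuming you already have a per-frame level envelope in dB; the 6 dB jump threshold and the example levels are hypothetical, and a real nasal boundary is also a spectral change, which this check ignores.

```python
def abrupt_changes(envelope_db, min_jump_db=6.0):
    """Return frame indices where the level jumps by at least min_jump_db from the previous frame."""
    return [
        i for i in range(1, len(envelope_db))
        if abs(envelope_db[i] - envelope_db[i - 1]) >= min_jump_db
    ]


# Hypothetical frame levels (dB): vowel, then a weaker nasal stretch, then a vowel again.
levels = [70, 71, 70, 60, 59, 60, 59, 70, 71]
print(abrupt_changes(levels))  # [3, 7]
```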
Tilde L (Dark L)
[ɫ], IPA 209
I'll say it again: all North American /l/s are dark (velarized). True, some of them are darker than others, especially domain-finally (syllable, word, phrase, etc.), but there ain't nothing light about this lateral. Which we know is a lateral because of the F3, but I'm getting ahead of myself. From 300-something msec to about 400 msec, where the F2 sharply changes, the amplitudes all weird out, and the F3 really kicks in: that's the extent of the thing we're looking at. Okay, the F1, if that's what you want to call it, is fairly low, indicating a fairly close constriction, so while this is not nasal, we're talking about something that is arguably a high vowel or closer. The F2 is moderately low, lower than at the beginning of the diphthong, so this is very back. As in velar. So we've got two possibilities: something in the area of [w], and something in the area of [ɫ]. The difference, according to some sources, is in the F3 or sometimes the F4. Typically, rounding will lower formants, so if we believe this is round, we'd want to see some lowering, or at least not raising, of the upper formants. But the F3 of this is just high. At the edge of the nasal it's about 3800-3900 Hz, and although it falls, it's still well above 3600 Hz before the amplitude starts to kick in for the next vowel. So this ain't round.
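The [w]-versus-[ɫ] argument boils down to one check: does F3 get pulled down anywhere in the segment? Here's a minimal sketch. The F3 values loosely follow the readings above, and the 3600 Hz ceiling is a hypothetical, speaker-specific cutoff taken from this one token, not a general norm.

```python
def f3_suggests_rounding(f3_track_hz, rounding_ceiling_hz):
    """True if F3 dips to or below a speaker-specific ceiling anywhere in the segment.

    Rounding lowers formants, so a rounded segment like [w] should pull F3 down,
    while a velarized lateral like [ɫ] can keep F3 high.
    """
    return min(f3_track_hz) <= rounding_ceiling_hz


# Approximate F3 readings from the edge of the nasal up to the next vowel.
lateral_f3 = [3850, 3800, 3720, 3650]
print(f3_suggests_rounding(lateral_f3, 3600))  # False: F3 stays high, so not round
```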
[i], IPA 301
Vowel. It runs from the release of the lateral, if that's what you want to call that moment, up to 500 msec. But the last 50 msec or so are clearly transitional, so let's just worry about the area around the apparent F2 extremum. The F1 is pretty flat and fairly low throughout. The F2, as I've said, has a maximum around 425 msec, at around 2250 Hz. That's freaking high for an F2, so this is about as front as this can get. So relatively high and outrageously front.
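Finding 'the apparent F2 extremum area' while ignoring the transitional tail is easy to sketch. This assumes a formant track as (time in msec, F2 in Hz) pairs; only the peak of roughly 2250 Hz near 425 msec comes from the reading above, and the other track values are made up for illustration.

```python
def f2_peak(track, ignore_last_ms=0.0):
    """Return (time_ms, f2_hz) at the F2 maximum, skipping a trailing transition.

    track: list of (time_ms, f2_hz) pairs in time order.
    """
    if not track:
        raise ValueError("empty track")
    cutoff = track[-1][0] - ignore_last_ms
    steady = [point for point in track if point[0] <= cutoff]
    if not steady:
        raise ValueError("no frames left after trimming the transition")
    return max(steady, key=lambda point: point[1])


# Hypothetical F2 readings for the [i].
f2_track = [(400, 2100), (425, 2250), (450, 2200), (475, 2000), (500, 1700)]
print(f2_peak(f2_track, ignore_last_ms=50))  # (425, 2250)
```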
[v], IPA 129
Those diving transitions in F2, F3 and F4 (mirrored by rising transitions on the other side)! This is clearly labial. But not bilabial. English only has three bilabials: one is voiceless, and one is a nasal, so if this were bilabial it could only be [b]. First of all, even though the overall amplitude is reduced here, the voicing is very strong and even throughout. And although a lot of my plosive closures are noisy, they're not fricated the way this thing is. This is a fricative. But labial. Which for my variety of English only really leaves [v].
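The reasoning here is elimination over a small inventory, which you can sketch as filtering a feature table. The table below is a simplified, hypothetical description of English labials (for example, [w] is listed as labial-velar), not a complete phonological analysis.

```python
# Hypothetical, simplified feature table for English labial consonants.
LABIALS = {
    "p": {"place": "bilabial", "voice": False, "manner": "plosive"},
    "b": {"place": "bilabial", "voice": True, "manner": "plosive"},
    "m": {"place": "bilabial", "voice": True, "manner": "nasal"},
    "f": {"place": "labiodental", "voice": False, "manner": "fricative"},
    "v": {"place": "labiodental", "voice": True, "manner": "fricative"},
    "w": {"place": "labial-velar", "voice": True, "manner": "approximant"},
}


def candidates(**observed):
    """Return the labials whose features match every observed cue."""
    return sorted(
        symbol for symbol, features in LABIALS.items()
        if all(features[key] == value for key, value in observed.items())
    )


print(candidates(voice=True))                      # ['b', 'm', 'v', 'w']
print(candidates(voice=True, manner="fricative"))  # ['v']
```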
Small Capital I + Tilde
[ɪ̃], IPA 319 + 424
Well, the F1 seems almost mid-looking here, but at least this is not a low vowel. The F2 is really high again, but not as high as for the [i]. It's actually in much the same range as the offglide of that initial diphthong. So.
[ŋ], IPA 119
Well, some zero-ness starts to creep in really early, and then at about 700 msec the upper formants, or at least F4, just shut off completely. And the other formants flatten out. So the zero is evidence enough of nasality, I suppose. The zero is right where you'd hope to see a pole, so the pole is either that shadow thing at about 800 Hz, which I'm suspicious of, just because there seems to be a lot of stuff at just about that frequency, especially something that might be a harmonic in the [i]. The other candidate, which is much stronger, is up around 2400-2500 Hz. Which is pretty high. But notice that the F2 and F3 transitions in the preceding vowel point right at it. So that's probably our puppy. The joint F2/F3 thing looks like a velar pinch, albeit a very front velar pinch, so this is probably velar.
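A 'velar pinch' just means F2 and F3 converge at the transition into the closure, so a crude check is the gap between them. This is a minimal sketch; the 300 Hz maximum gap is an assumed threshold, and the example readings are hypothetical values near the 2400-2500 Hz pole described above.

```python
def is_velar_pinch(f2_hz, f3_hz, max_gap_hz=300.0):
    """True if F2 and F3 converge closely enough to count as a 'velar pinch'."""
    return abs(f3_hz - f2_hz) <= max_gap_hz


# Hypothetical transition endpoints converging near the 2400-2500 Hz pole.
print(is_velar_pinch(2400, 2550))  # True
print(is_velar_pinch(1200, 2600))  # False
```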
[tʰ], IPA 103 + 404
Well, there's 25-50 msec of gap, which is plenty to count, after a nasal. Now if you're expecting this to be velar based on a blind faith in nasal place assimilation, you're going to be disappointed. Because the release of this thing is not in the least velar-looking. It's broad band, quite sharp, and strongest in the highest frequencies (which in this context means between 3000 Hz and something higher), well above the 'formant zone'. So this is basically an [s]-shaped release. Which makes this an alveolar. And voiceless, and probably aspirated, given a VOT of about 50 msec. At least. Which is not outrageously long for a VOT, but long enough to count as aspirated. TMSAISTI.
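The two cues used here, VOT and where the release energy sits, can each be sketched as a small function. The 30 msec aspiration cutoff is an assumed value, roughly where English short-lag and long-lag stops tend to separate. The band-energy input, a dict from band centre frequency to linear energy, is a hypothetical format you'd have to compute yourself from the release spectrum.

```python
def classify_vot(vot_ms, aspiration_threshold_ms=30.0):
    """Label a VOT: negative means prevoiced, otherwise short-lag vs. long-lag."""
    if vot_ms < 0:
        return "prevoiced"
    return "aspirated" if vot_ms >= aspiration_threshold_ms else "unaspirated"


def high_band_share(band_energy, split_hz=3000.0):
    """Fraction of total release energy in bands at or above split_hz.

    band_energy: dict mapping band centre frequency (Hz) to linear energy.
    """
    total = sum(band_energy.values())
    high = sum(energy for freq, energy in band_energy.items() if freq >= split_hz)
    return high / total if total else 0.0


print(classify_vot(50))  # aspirated
# Hypothetical release spectrum that is mostly above 3000 Hz, [s]-shaped.
print(high_band_share({1000: 1.0, 2000: 1.0, 4000: 3.0, 6000: 5.0}))
```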
[ə], IPA 322
Then there's a pulse or two of vowel. <snooze>
[m], IPA 114
Hey, another one of these. This one starts at about 850 msec, and lasts to about 925 msec. But otherwise it's pretty much the same as the previous [m], though a little weaker.
You'll notice there's full voicing and sonorance for 300 msec starting around 925 msec. Which is just too long to be one segment. And the formants are too mobile to be reflecting a single target. So before going further, think about how many segments there are here, and if you can't find the edges of each, can you find moments you want to call the 'center/re' of each? Go ahead. I'll wait.
Okay, so there's the F1 peak near the beginning; the moment where the F3 is lowest and the F2 is highest, around 1050 msec; and there's the funny dip in F2 at about 1125 msec. Those are the 'moments' I'm going to consider as evidence of at least three things in this stretch. So, on to the first.
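Picking out those 'moments' is basically finding local peaks and dips in the formant tracks. Here's a minimal sketch on a hypothetical F2 track sampled every 25 msec starting at 975 msec, made up so that the peak lands near 1050 msec and the dip near 1125 msec. Note that it only finds strict extrema; a flat-topped peak won't be reported.

```python
def local_extrema(values):
    """Return (index, 'max' or 'min') for interior points higher or lower than both neighbours."""
    found = []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            found.append((i, "max"))
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            found.append((i, "min"))
    return found


# Hypothetical F2 samples (Hz), one every 25 msec from 975 msec.
f2 = [900, 1100, 1300, 1400, 1350, 1100, 1000, 1200]
print(local_extrema(f2))  # [(3, 'max'), (6, 'min')] -> about 1050 msec and 1125 msec
```

The same function run on F1 (looking for the peak) and F3 (looking for the dip) would give the other landmarks.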
Turned Script A
[ɒ], IPA 313
I never get to use this vowel. But there it is. It's just possible this is round, and not just transitioning from-and-to roundness. So I transcribed it as round. Sue me. The F1 is high, the F2 is very, very low.
[ɹ], IPA 151
Okay, so the F3 is dipping below 2000 Hz. Barely, but there you go, if that's good enough for you.
[o], IPA 307
Okay, so the F1 has been falling fairly steadily since that moment in the [ɒ]. So this ain't low. But it doesn't seem to be heading down to well below 500 Hz the way the high vowels earlier on did. So this ain't particularly high. So probably mid. At least 'round here. The F2 is low, as indicated by that dip. The F3 doesn't tell us much. So mid and back. Only a couple of possibilities, and even on a good day one is probably better.
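The height reasoning here compares F1 against landmarks from earlier in the same utterance, which a tiny lookup can mimic. The cutoffs are assumptions loosely drawn from this speaker's readings above (high vowels well below 500 Hz, low vowels starting around 800-900 Hz). They aren't general norms, and they'd need re-estimating for any other voice.

```python
def f1_height(f1_hz, high_below_hz=500.0, low_from_hz=800.0):
    """Rough vowel-height label from F1 (a lower F1 means a higher vowel).

    The default cutoffs are hypothetical, speaker-specific values.
    """
    if f1_hz < high_below_hz:
        return "high"
    if f1_hz >= low_from_hz:
        return "low"
    return "mid"


print(f1_height(550))  # mid
print(f1_height(850))  # low
```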
[n], IPA 116
So we've got something fully voiced and sonorant, but with zeroes, as before. So this is probably another nasal. But notice the poles. The bilabial pole was around 1000 Hz or just lower. This is definitely higher. And that harmonic/shadow thing just above it maybe means it's even a little higher; maybe there's just a harmonic space there that makes the edge of the pole look lower. But I don't know. Spectrally, this just ain't the same animal as any of the preceding nasals, which were bilabial, bilabial, and velar, respectively. So this must be something else. Few options, at least for English.
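Comparing a nasal's pole to the ones you've already identified in the same voice is a nearest-reference lookup. In this sketch, the bilabial (~1000 Hz) and velar (~2450 Hz) references come from the readings in this utterance. The alveolar value of 1500 Hz is a purely hypothetical placeholder, since all we know is that it sits somewhere above the bilabial pole.

```python
def nearest_place(pole_hz, references):
    """Return the place label whose reference pole frequency is closest to pole_hz."""
    return min(references, key=lambda place: abs(references[place] - pole_hz))


# Speaker-specific reference poles (Hz); the alveolar value is a hypothetical placeholder.
REFERENCE_POLES = {"bilabial": 1000, "alveolar": 1500, "velar": 2450}

print(nearest_place(1000, REFERENCE_POLES))  # bilabial
print(nearest_place(1400, REFERENCE_POLES))  # alveolar
```

This is only as good as the references, which is exactly the 'if you know my voice' caveat from the first [m].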
Lower-Case A + Small Capital I
[aɪ], IPA 304 + 319
There's another one of these. What's odd is that although spectrally this is very similar to the initial diphthong, dynamically it's completely different. There's a nice clear steady state in both F1 and F2, and the movement just doesn't make it as far, at least not until after the voicing starts to go. Maybe it does. Maybe it don't. But it's still something near [a], followed by something near [ɪ].
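The difference between the two diphthongs is dynamic, so one way to quantify it is the duration of the longest stretch where a formant barely moves. Here's a minimal sketch, assuming formant values at a fixed frame step. The 20 Hz per-frame tolerance and both F2 tracks are hypothetical, made up to mimic a late-steady first diphthong and an early-steady second one.

```python
def longest_steady_run(track_hz, frame_ms, max_step_hz):
    """Duration (msec) of the longest stretch where successive frames change by <= max_step_hz."""
    best = current = 0
    for prev, cur in zip(track_hz, track_hz[1:]):
        current = current + 1 if abs(cur - prev) <= max_step_hz else 0
        best = max(best, current)
    return best * frame_ms


# Hypothetical F2 tracks at 10 msec frames.
first_aɪ = [1250, 1350, 1500, 1700, 1900, 1950, 1960, 1965]
second_aɪ = [1300, 1305, 1310, 1300, 1308, 1500, 1700]
print(longest_steady_run(first_aɪ, 10, 20))   # 20
print(longest_steady_run(second_aɪ, 10, 20))  # 40
```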
[t], IPA 103
The cruddy voicing towards the end of the preceding vowel is probably a combination of low pitch and glottalization. The glottalization suggests that this sound is underlyingly a voiceless plosive, and the release (and maybe the transitions) suggests a nice alveolar again.