To repeat: There are different styles of reading--this left-to-right business is just how I do it for convenience. As time goes on, I'll be introducing other styles. One of the things that I always forget about, at least when sitting down to do these, is the 'big picture' stuff. For instance, how many syllables (or at least vowels) are there in this? What evidence of segmentation do you see? Where? Can you see anything suggesting pitch peaks/lows, correlates of stress like amplitude, length, or pitch excursions? Once you've done that sort of thing, usually you go through and mark all the things that are obvious--the sibilants, the nasals, if you can see them, things that are obviously [i] or [a], that sort of thing. Once you've got the big picture, you start in on specific cues.
Lower-Case A + Small Capital I
[aɪ], IPA 304 + 319
I'm not sure what's going on in the voicing from the onset (around 125 msec) for the first 25 msec or so. Maybe it's residual glottalization. There's not much other explanation for what's going on here. And trying to force a consonant here won't get you very far--it would need to be voiceless, noisy, but have formants. Not something that is easy to do. Except for [h], you don't have many options, and it's tough to make a word, let alone a sentence, with an initial [h] here. So ignoring it, the F1 starts around 800-900 Hz or so, and falls. Low low low but moving up. The F2 starts around 1250 Hz or so and rises. So back back back (and probably not so round round round, since the F3 is a little high at the beginning at least) and moving forward. Sounds like a fairly classic /aj/ diphthong. I go back and forth about transcribing these. In my voice, this can only have an [ɑ] as its nucleus.
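The "low low low" and "back back back" reasoning is just a mapping from formant frequencies to vowel-quality labels: high F1 means a low vowel, low F2 means a back vowel. Here is a minimal sketch of that mapping. The function name `describe_vowel` and every cutoff in it are my own illustrative assumptions, not measured boundaries, and they would need adjusting for any particular voice.

```python
# Coarse F1/F2 -> vowel quality labels.
# ASSUMPTION: all Hz cutoffs below are illustrative guesses, not values
# taken from the spectrogram or from any published standard.

def describe_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return a rough 'height backness' label for one moment in a vowel."""
    # F1 is inversely related to height: the higher the F1, the lower the vowel.
    if f1_hz >= 700:
        height = "low"
    elif f1_hz >= 450:
        height = "mid"
    else:
        height = "high"

    # F2 tracks frontness: the lower the F2, the backer the vowel.
    if f2_hz >= 1800:
        backness = "front"
    elif f2_hz >= 1300:
        backness = "central"
    else:
        backness = "back"

    return f"{height} {backness}"


# Onset of the diphthong, using the rough readings given in the text
# (F1 somewhere in 800-900 Hz, F2 around 1250 Hz).
print(describe_vowel(850, 1250))   # low back

# A hypothetical offglide target (made-up numbers, for illustration only).
print(describe_vowel(400, 2000))   # high front
```

A nucleus that comes out low and back and then heads toward something high and front is the diphthong pattern described above. The labels are only as good as the cutoffs, which is why the eyeball reading still does the real work.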
Lower-Case K + Superscript Lower-Case H
[kʰ], IPA 109 + 404
Well, it's mushy, as my (especially velar) stops are, but okay. The voicing perseverates just a little into the gap, but at least in the low frequencies there's a serious loss of energy starting at about 250 msec and going on to almost 325 msec. The transitions into and out of this gap clearly evidence velar pinch. The double burst is also a pretty good pointer to velar. I'm not sure now whether that VOT is really long enough to count as aspirated, but it's clearly a voiceless release.
[ə], IPA 322
My favo(u)rite. About 25 msec of vowel. Too short to do anything with, too long to ignore. Must be reduced due to extreme stresslessness. Funny how something so short can still get a pitch accent on it, tho. Hmm.
[n], IPA 116
Well, what's really interesting is this almost 100 msec of consonant. Sonorant, with formants and everything, and obviously very fully voiced, this clearly has less energy than your typical vowel. So it must have some kind of closure somewhere. But it's *long*. Okay, so this is probably a good candidate for a nasal. I just wish it had an obvious zero. Well, up near 3000, but that doesn't really count by itself. Oh well. The F1 is clearly being depleted by something. Let's call it a weak zero... and then the pole is nice and high, as poles go, up around 1500 Hz. So that's a good indication of an alveolar nasal.
[d], IPA 104
Well, really the only evidence of an obstruent moment here is the transient--be it release, burst, or clunk (that being the technical term for a moment like this that we would otherwise choose to ignore). But there it is, and if it isn't a clunk, we have to explain it. As I've observed before in these things, my (especially homorganic) nasal-plosive sequences tend to look like this: using Steriade's Aperture Theory, a nasal closure with an oral-looking release. So this is some kind of oral release. It looks slightly velar, concentrated in the middle frequencies rather than the upper frequencies (as would be more typical of an alveolar). But then it would be tough to make a word out of. The transitions are not amazingly helpful, in that the preceding sound is a nasal, and the following is an /r/, which perturbs the formants beyond all useful visual cues. So know there's a plosive here and move on.
[ɹ], IPA 151
Well, this is perhaps the lowest F3 I've ever produced. So there you go. The F1 is very low (very close articulation), the F2 is quite low. And the F3 is so low it's threatening to move into the low *F2* range. Can't get any lower than that. Must be an /r/.
Lower-Case A + Small Capital I
[aɪ], IPA 304 + 319
This is rough. There's the transitions to deal with, and then you have to figure out where the formants are 'supposed' to be. So abstracting away from the /r/, it looks like the F1 reaches a maximum (sometimes we call these 'turning points' or 'target points'; I usually just refer to a local 'extremum') of about 800 Hz around 575 msec or so. So there's a low target here somewhere. The F2 at that moment is relatively low. So at that moment, we have a backish lowish vowel. Then the formants move. F1 drops. But it may do so just because it's approaching a closure. F2 rises and drops, so we have to account for the rising at least (the dropping may either be a 'cue' or it may just be transition). So we've got a lowish, backish vowel that moves (at least slightly) forward. So if this is a diphthong, it must be approaching a front offglide. The F3 ends a little low, which combined with the low F4 and the diving F2 transition, suggests in general a labial transition. So that's something else to abstract from. Either this is some kind of low vowel, or this is a low vowel with a highish frontish offglide. Guess which.
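Finding a 'target' in a formant track amounts to finding a local extremum: a sample that is higher (or lower) than both of its neighbours. Here is a minimal sketch, assuming the track has already been reduced to a list of evenly spaced measurements. The helper name `local_maxima` is hypothetical, and the sample values are made up to resemble the F1 shape described above; they are not measurements from the spectrogram.

```python
# Locate local maxima in a sampled formant track.
# ASSUMPTION: the track is a plain list of Hz values with matching times;
# the numbers below are invented to mimic the F1 shape in the text.

def local_maxima(times_ms, values_hz):
    """Return (time, value) pairs where a sample exceeds both neighbours."""
    peaks = []
    # Endpoints are skipped because they have only one neighbour.
    for i in range(1, len(values_hz) - 1):
        if values_hz[i - 1] < values_hz[i] > values_hz[i + 1]:
            peaks.append((times_ms[i], values_hz[i]))
    return peaks


# Made-up F1 samples every 25 msec: rising out of the /r/, peaking,
# then falling toward the following closure.
times = [525, 550, 575, 600, 625, 650]
f1 = [550, 700, 800, 760, 650, 500]

print(local_maxima(times, f1))  # [(575, 800)]
```

The same loop with the comparisons flipped finds minima. What the code cannot tell you is whether a given extremum is a real target or just the place where two transitions happen to meet, which is the hard part of the paragraph above.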
[v], IPA 129
So I've already given it away. There's clearly labial transitions moving into this bit of noise. And there is a very short bit of oral fricative here. So it must be labial. This being English, it's labiodental.
[h], IPA 146
On the other hand it opens immediately into something a bit louder, voiceless, but with evidence of formant structure. Classic [h] stuff.
[o], IPA 307
So when the voicing finally kicks on, we've got sort of a problem. I'm not entirely sure where F1 is. The F2 is that bit just under 1000 Hz. There's no zero-ish looking thing below it, so the F1 can't be jammed up too close to the F2, but it's not so low as to disappear into the voicing bar. So this isn't low and isn't high. Well, we have some mid vowels to play with. Note the lack of evidence of an offglide. For once. Don't let anyone tell you that 'tense' mid vowels in English are *always* anything. Whether they are or not is an empirical question.
[m], IPA 114
So starting around 875 msec and going on for about 50 msec or so, there's this thing where the amplitude falls off. So there's something here. There's a pole more or less where the F2 in the preceding [o] was, but the F1 sort of disappears. There's a move to transition up for both formants, if that's what they are, once the amplitude starts to kick back on, so all we're really looking at is this short lower amplitude bit. Which is lower amplitude because it's a nasal. And the pole is that F2 thing, down just below 1000 Hz, which is pretty typical of my bilabial nasals.
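The two nasals so far have been told apart by where the pole sits: up around 1500 Hz for the alveolar, just below 1000 Hz for the bilabial. As a rough sketch, that rule of thumb can be written as a lookup. The function name `guess_nasal_place` and the band edges are assumptions I've chosen to bracket those two readings. They describe this one voice at best, and anything outside the bands is simply left undecided.

```python
# Guess nasal place from the frequency of the nasal pole.
# ASSUMPTION: the band edges are illustrative, chosen only to bracket the
# two readings in the text (~1500 Hz alveolar, just under 1000 Hz bilabial).

def guess_nasal_place(pole_hz: float) -> str:
    """Return a tentative place label for a nasal, or 'unclear'."""
    if pole_hz < 1100:
        return "bilabial"
    if 1300 <= pole_hz <= 1700:
        return "alveolar"
    # Between or beyond the bands: don't force a decision.
    return "unclear"


print(guess_nasal_place(1500))  # alveolar
print(guess_nasal_place(950))   # bilabial
print(guess_nasal_place(1200))  # unclear
```

Leaving a gap between the bands is deliberate. A pole at 1200 Hz shouldn't be forced into either category, and in practice the neighbouring transitions (and what makes a plausible word) settle those cases.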
[æ], IPA 325
So again we have to abstract away from some odd transitions. The F1 doesn't reach its extremum until about 1000 msec, but it's high, so this must be a fairly low vowel. But the slope of the movement toward the extremum doesn't look only transitional, so I wonder if there isn't another target floating around there. Then again, most English diphthongs don't have *low* offglides, so I'm probably just dreaming. The F2 is fairly flat. Its maximum occurs sort of early, and it falls very slowly over the course of the vowel. There's evidence in the last 25 msec or so of downward pointing transitions (labial again), so maybe that trend in F2 is just transitional. But maybe not. So we've got something mid-to-low and centralish, and it's sort of long, so it might be moving to definitely low and very slightly back. Backer than front, but not flat out back, especially compared with the nuclei of the earlier diphthongs. So this is probably a vaguely front, but very low, vowel.
[f], IPA 128
So again we have something labial looking, this time with a burst. Which would make it a plosive of some kind. But the voicing or noise or whatever it is at the bottom is a little loud and a little flat to go with a truly closed plosive. Then again, you know my plosives are often sort of mushy. So as plosive as this looks, I'm going to suggest that it's worth paying attention to the teeny weeny, almost imaginary, bit of noise up in the very high frequencies, and suggest this is a fricative. That and it makes a better word. This explains the noise, and the greater duration and frequency of the noise at the bottom, compared to what I want to call noise or perseverative voicing or whatever it is in that thing earlier I wanted to be a [k]. There's a lot of wishing going on in this spectrogram. But oh well. The bursty thing then isn't a burst, it's a noisy transition between the [f] and the [t] closure. Which if you do it a few times, can get really sharp (noisy and short?). And the fact that it's sharpest in the mid frequencies I attribute to the high-attenuating properties of the labiality with the short front cavity formed by the closure at the alveolar ridge. TMSAISTI.
[tʰ], IPA 103 + 404
Well, here's a gap. Even ignoring the 'explanation' above, there's a gap, a release, and some fairly significant aspiration noise. So there's a voiceless aspirated plosive here. The noise is broad band, and very loud in the very high frequencies. Pretty classically alveolar. There's another concentration of energy in the mid frequencies, but I attribute that to the following context....
Turned R + Syllabicity Mark
[ɹ̩], IPA 151 + 431
Well, this is a more classic /r/. Nice mid-to-low F1, F2 sort of sitting there up against the low low F3. Flat as the prairies. Syllabic /r/.
[d], IPA 104
At last, a gap that looks like a gap and is one. There's some serious voicing going on, but it's properly attenuated, as if it were being produced in a closed space. Woo hoo. Okay, so about place. Nice sharp release. But that's about it. It looks a little velar, depending on whether or not you think that's velar pinch in the offset. The onset transitions don't look velar at all. The F4 is diving, who knows why. The F3 and F2 are basically just sitting there. So it's ambiguously velar. Which turns out to mean ambiguously not velar. If it isn't velar, it must be alveolar, since there's no way those F3 transitions on either side look bilabial. Watch me be wrong about the next set of bilabial transitions that come up...
[ɑ], IPA 305
Well, the F1 extremum occurs sort of late in the thing I've marked off as the vowel (that mark is based on the weird amplitude/pitch thing that happens about where I've marked it, which is as arbitrary as anything else). But this is a fairly flat F1 anyway, so oh well. It's a quite low vowel. The F2 extremum (a minimum this time) occurs more or less at the same moment, so it's a good bet that it means something. So this is very low and very back. And r-colo(u)red, judging by the F3, if that sort of thing matters to you.
[ɹ], IPA 151
Another low F3 deal. Not much else to say. Except look at those F2 and F3 transitions into the following stop.
[k], IPA 109