After we do the segmental stuff, we'll talk about the prosody.
[pʰ], IPA 101 + 404
Lower-case P + Right Superscript H
Initial stops are hard to place, in part because you only get one set of cues (transitions out of the closure, not into it), in part because the transitions you do get are often lost in release noise and aspiration, and, well, because they're just hard. So the sharp onset of energy at about 75 msec is a clue that we've had a plosive release, rather than just something starting with aspiration. In the best case, there'd be a nice release transient, but we can't always be that lucky. The absence of such a release burst might indicate something non-alveolar, since alveolars often have such releases. The sharpness of the release suggests something non-velar, since velar releases are often slushier. Which leaves labial. So the transitions don't tell us much, but that really low-frequency blip might also suggest bilabial. So on balance, I'd say bilabial. But that's just a hypothesis. Aspirated, obviously. Aspiration of that length often indicates an approximant as the second member of a cluster, but here I think it's just initial fortition.
[i˞], IPA 301 + 419
Lower-case I + Rhoticity Sign
Well, the transitions through the aspiration suggest a very high F2 target (higher than the F2 locus of the transition in the aspiration), with F2 quickly falling. The only thing with an F2 that high is [i]. The fall is probably forced by the following segment's F3 target, hence the rhoticity mark. Oh, and notice the F1, clearly separated from the voicing bar. It starts lower than neutral and moves toward about 500 Hz as the F2 falls to its minimum.
[ɹ], IPA 151
There are two things to notice here (at about 250 msec). The first is the F2 (and F3) minimum, i.e. there's a place where the F3 changes direction and the F2 flattens out. The F1 fuzzes out a bit before this moment, but I'll take it all to be part of the same approximant moment. The second is the decreasing energy right up until this moment, where the energy in all formants clicks back on, F1 and F2 flatten out, and F3 definitely starts to rise again. This is one of those 'moments' that we hang a lot of stuff on. The low F3 is the giveaway here. Gotta be an [ɹ].
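Finding that 'moment' amounts to finding the sample where F3 stops falling and starts rising. Here is a minimal sketch, assuming the F3 track has already been exported as (time in msec, frequency in Hz) pairs. The function name and the sample values are hypothetical, and a real track would need smoothing first, since tracker jitter produces spurious local minima.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in msec, frequency in Hz)


def turning_point(track: List[Sample]) -> Optional[Sample]:
    """Return the first sample where the formant stops falling and starts rising.

    Plateaus are not handled: a flat stretch between the fall and the rise
    will not be reported. Returns None if no falling-to-rising change exists.
    """
    for i in range(1, len(track) - 1):
        prev_f = track[i - 1][1]
        cur_f = track[i][1]
        next_f = track[i + 1][1]
        # Strict local minimum: falling into this sample, rising out of it.
        if prev_f > cur_f and next_f > cur_f:
            return track[i]
    return None


# Invented F3 values around 250 msec, for illustration only.
f3 = [(230, 1900), (240, 1750), (250, 1650), (260, 1720), (270, 1850)]
print(turning_point(f3))  # (250, 1650)
```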
[ɨ], IPA 317
Well, one can see even in the spectrogram that this is very low pitch (though the following vowel is even lower), which means that this vowel is likely unstressed relative to the first vowel. It looks mid, or slightly higher than mid (F1 near or just below 500 Hz). F2 is pretty neutral (1500 Hz or just above), and the F3 is sort of low, but it's still 'recovering' from being pulled down for the /r/. So this is sort of an r-colo(u)red schwa. Applying Keating et al.'s (1994) F2-F3 distance metric blindly, I transcribed this as barred-i.
[m], IPA 114
Ah, a nice nasal. Clearly voiced and sonorant, but of lower overall intensity than the surrounding vowels, with nice sharp edges and flat formant structure. Knowing this is my voice, I'd have said this was an alveolar [n], since the pole is a little above the 1000-1100 Hz range, where I expect bilabials to be. But look at those F2 and F3 transitions, clearly pointing down into the nasal (even the overall 'rising' F3 seems to hump down a little right at the onset of the nasal). Downward-pointing transitions have to be bilabial.
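The place reasoning for this nasal, and for the two nasals further on, can be written down as a toy decision rule. This is only a sketch of the heuristics stated in the text, not a validated classifier: the 1100 Hz ceiling for a bilabial pole is the rough expectation given for this speaker's voice, and the function name and example values are hypothetical.

```python
def nasal_place(pole_hz: float, f2_points_down: bool, f3_points_down: bool,
                bilabial_max_hz: float = 1100.0) -> str:
    """Toy encoding of the nasal-place heuristics described in the text.

    bilabial_max_hz is an assumption based on the speaker-specific
    1000-1100 Hz expectation for a bilabial nasal pole.
    """
    # Transitions outrank the pole: F2 and F3 both pointing down into the
    # nasal is taken as a bilabial cue even if the pole sits a bit high.
    if f2_points_down and f3_points_down:
        return "bilabial"
    # Otherwise, a pole at or below the expected bilabial range -> bilabial.
    if pole_hz <= bilabial_max_hz:
        return "bilabial"
    # A high pole with nothing around 1000 Hz -> alveolar.
    return "alveolar"


# Invented example values, not measurements.
print(nasal_place(1150, True, True))    # bilabial (transitions win)
print(nasal_place(1000, False, False))  # bilabial (pole near 1000 Hz)
print(nasal_place(1600, False, False))  # alveolar (high pole)
```

Putting the transition check first mirrors the argument above, where the downward transitions override the first impression from the pole.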
[ɨ], IPA 317
Very low pitch again (look at those striations: countable, even at this timescale!). F1 is middish again, F2 is a little high, and F3 is neutral. Barred-i again.
[d], IPA 104
Plosive. Fully voiced, but the voicing bar is of lower amplitude than usual. No resonances. The F2 transition is pointing down, but not quite as low as for the bilabial. F3 is just hanging there. F4 is actually rising. Rising F3 or F4 is often a cue to alveolar, especially if there's no hint of an F2/F3 pinch. So we've got a voiced alveolar plosive, with a weakish release at about 600 msec.
[z̥], IPA 133 + 402
Lower-case Z + Under-Ring
I decided this was voiceless, taking the energy at the bottom to be just noise, but I suppose it could be evidence of very ragged, weak voicing. The noise segment is short, which is usually a cue for underlying voicing, and its energy is mostly in the high frequencies, a cue for sibilance.
[ɨ˞], IPA 317 + 419
Barred I + Rhoticity Sign
I'm throwing these rhoticity signs around like mad, and I usually don't. But the F3s are interfering with the interpretation of the F2-F3 distance metric that Keating et al. (1994) recommend for transcribing schwa vs. barred-i, and I'm hedging my bets. This vowel is too short to worry about much, so there you go.
[ɹ], IPA 151
The /r/ here is required to explain this falling F3. Look at that F3. Almost 'pinch'ing into the F2. Hmm.
[kʰ], IPA 109 + 404
Lower-case K + Right Superscript H
See that burst just ahead of 800 msec? It's a double burst, usually a very good indicator of a velar release. The fact that it's strongest around F2 or F3 is another, although the strength in the frequencies below F2 might indicate bilabial. The F2 center in the burst is at about 1300 Hz and definitely falling. The F3 in the burst is just above 2000 Hz and definitely rising. So we have evidence of velar pinch on both sides of this plosive. So probably velar. Also consistent with this is the really long aspiration.
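The 'pinch' argument is that F2 and F3 converge as they approach the velar constriction. Here is a minimal sketch, assuming each formant is given as a short list of frequencies ordered so that the last sample is the one nearest the closure (so a post-release track has to be reversed first). Only the roughly 1300 Hz F2 and just-above-2000 Hz F3 come from the text; the later values and the function name are invented for illustration.

```python
from typing import Sequence


def shows_pinch(f2: Sequence[float], f3: Sequence[float]) -> bool:
    """True if F2 rises and F3 falls as the track approaches the closure.

    Both sequences must be ordered so that the LAST sample is nearest the
    closure. Strict monotonicity is a simplifying assumption; real tracks
    are noisier than this.
    """
    f2_rising = all(a < b for a, b in zip(f2, f2[1:]))
    f3_falling = all(a > b for a, b in zip(f3, f3[1:]))
    return f2_rising and f3_falling


# Post-release samples, burst first. 1300 Hz and ~2050 Hz echo the text;
# the other values are invented.
f2_after = [1300, 1220, 1150]
f3_after = [2050, 2180, 2300]
# Reverse so the sample nearest the closure (the burst) comes last.
print(shows_pinch(f2_after[::-1], f3_after[::-1]))  # True
```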
[ɑ], IPA 305
F1 and F2 are both sort of straddling 1000 Hz, which is pretty typical of low-back [ɑ]. F1 as high as it can get (considering the F2); F2 as low as it can get (considering the F1). Lowest and backest.
[m], IPA 114
Another nice nasal. This one obligingly has a clear resonance around 1000 Hz, so it must be a bilabial.
[ə], IPA 322
A short little, low-pitched, probably stressless vowel. Call it schwa and move on.
[n], IPA 116
And here's another nasal. This one obligingly has a nice high resonance and basically nothing at 1000 Hz. Must be alveolar.
[ʃ], IPA 134
Sibilant fricative--look at all that high-amplitude noise. Notice it's darkest in the mid frequency range, down to F2, where it dies suddenly. The zero (or whatever it is) below F2, along with the relatively low center of gravity in the noise (relatively low compared to a very high-frequency centered [s]), is a pretty good cue for [ʃ].
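The 'center of gravity' invoked here is usually computed as a weighted mean frequency of the spectrum. Here is a minimal sketch, assuming you already have a spectrum as parallel lists of frequencies (Hz) and magnitudes. The weighting exponent p = 2 (i.e. weighting by power) is a common choice but an assumption here, and the two example spectra are invented to show the [ʃ] vs. [s] contrast, not measured from this recording.

```python
from typing import Sequence


def center_of_gravity(freqs: Sequence[float], mags: Sequence[float],
                      p: float = 2.0) -> float:
    """Weighted mean frequency: sum(f * |m|**p) / sum(|m|**p)."""
    weights = [abs(m) ** p for m in mags]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs, weights)) / total


freqs = [2000, 3000, 4000, 5000, 6000]
sh_like = [1.0, 3.0, 2.0, 1.0, 0.5]  # energy bunched lower: [ʃ]-ish (invented)
s_like = [0.2, 0.5, 1.0, 3.0, 4.0]   # energy toward the top: [s]-ish (invented)
print(center_of_gravity(freqs, sh_like) < center_of_gravity(freqs, s_like))  # True
```

Only the comparison matters for the argument: an [ʃ] should come out lower than an [s] from the same speaker.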
[eɪ], IPA 302 + 319
Lower-case E + Small Capital I
The amplitude discontinuity at about 1400 msec is probably just my voice slipping from modal voice into fry or something, due to the low pitch, rather than a change from oral to nasal, or from vowel to approximant. Although in a sense I do go from vowel to approximant. The point is, it's not a nasal, in spite of how it looks. So, looking at the F1, it starts a little high of neutral and moves to neutral/mid fairly quickly, and stays there. The F2 starts quite front and moves fronter (higher). F3 is pretty flat and neutral. So this is a middish vowel that starts front and moves fronter.
[p], IPA 101
The loss of voicing makes the transitions hard to see. All we really know is that there's a gap starting at about 1500 msec and lasting until just shy of 1600 msec. That's quite a gap, all things considered. Apparently voiceless: at most two pulses of perseverative voicing, depending on exactly when you think the closure occurred. The release burst is a little ambiguous. It looks like a [t] burst, in terms of having a sibilant component, but the following fricative is sibilant as well, so that might just be coproduction. The noisy blip at the bottom is a little worrying, since it's sort of like the noisy blip at the bottom of the initial release in this utterance. Which, if I recall, I took then to be evidence of a bilabial release, at least on the strength of knowing what the spectrogram was of. So we've got something that probably isn't velar, could conceivably be either alveolar or bilabial, and at least doesn't have the full-spectrum sharp release often associated with alveolars. So probably bilabial, but maybe we should just say 'voiceless stop' until we can get some lexical access in here.
[s], IPA 132
Well, this is a nice-looking [s]. It's clearly noisy and fairly high amplitude, strongest in the highest frequencies and apparently centered off the top of this spectrogram, so well above 4000 Hz. And it forms a single broad band, trailing off into the low frequencies (rather than sharply shutting off below F2, like the previous [ʃ]). So this is probably an [s].
Okay, so let's talk prosody.
The Tones and Break Indices (ToBI) system is a set of notation conventions that can be adapted for describing and analyzing the intonational patterns of a language. A ToBI transcription contains a pitch track, an 'orthographic tier' with a transcription time-aligned to the pitch track, a 'tone tier' with the tonal autosegments indicated, and a 'break index tier' indicating juncture. In my version, I time-align everything to a spectrogram. I replace the orthography with a phonetic transcription, time-aligned with spectrographic landmarks rather than just word edges. I put tones and break indices on the same tier, mostly to save space.
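Merging tones and break indices onto one tier is easy to undo mechanically if the two ever need to be separated again. Here is a minimal sketch, assuming the combined tier is a list of (time in msec, label) points in which every break index is a bare digit 0-4 (no diacritics such as uncertainty marks) and every other label is tonal. The function name and the points are invented, not the transcription of this utterance.

```python
from typing import List, Tuple

Point = Tuple[float, str]  # (time in msec, label)

BREAK_LABELS = {"0", "1", "2", "3", "4"}


def split_tier(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Split a combined tone/break-index tier into (tones, breaks).

    Assumes break indices are bare digits 0-4; anything else is treated
    as a tonal label such as 'H*' or 'L%'. Order within each tier is kept.
    """
    tones: List[Point] = []
    breaks: List[Point] = []
    for time, label in points:
        if label in BREAK_LABELS:
            breaks.append((time, label))
        else:
            tones.append((time, label))
    return tones, breaks


# Invented points for illustration.
combined = [(100, "H*"), (300, "1"), (500, "H*"), (700, "L%"), (700, "4")]
tones, breaks = split_tier(combined)
print(tones)   # [(100, 'H*'), (500, 'H*'), (700, 'L%')]
print(breaks)  # [(300, '1'), (700, '4')]
```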
English conventions, broadly, recognize four levels of break index: roughly 0 for clitic boundaries, 1 for word boundaries, 3 for phrase boundaries and 4 for utterance boundaries (each aligned with the right edge, or 'end', of its constituent). 2s are used for 'anomalous' junctures: disfluencies, things that feel like phrase ends but don't get phrase-appropriate tone marks, things that have phrase-appropriate tone marks but don't have phrase-appropriate timing, that sort of thing. The assumption is that these mark the right edges of strictly layered prosodic groupings (so a 4 corresponds to a 3 and a 1 simultaneously, since the end of an utterance must also be the end of a phrase and the end of a prosodic word).
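Strict layering means a break index can be read as a threshold: each index closes its own constituent and every smaller one. Here is a minimal sketch of that reading. The constituent names and the function are illustrative, and treating a 2 as a word edge only is an assumption, since 2s mark anomalous junctures rather than a layer of their own.

```python
from typing import List

# (minimum break index, constituent whose right edge it marks)
LEVELS = [(1, "prosodic word"), (3, "phrase"), (4, "utterance")]


def edges_closed(bi: int) -> List[str]:
    """Constituents whose right edge coincides with break index `bi`.

    Under strict layering, a higher index implies every lower edge.
    A 0 (clitic boundary) closes nothing; a 2 comes out as a word edge
    only, which is an assumption rather than a ToBI rule.
    """
    if bi not in range(5):
        raise ValueError(f"break index must be 0-4, got {bi}")
    return [name for threshold, name in LEVELS if bi >= threshold]


print(edges_closed(4))  # ['prosodic word', 'phrase', 'utterance']
print(edges_closed(0))  # []
```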
In English, there are a number of * tones, notably H*, H*-L and L*, where the starred autosegment is aligned (usually) with the stressed syllable of a prosodic word, so you'll usually get one for every non-0 BI (unless there's some deaccentuation, or something under focus, or something like that). % tones (boundary tones) usually align with the right edge of a 3 or 4 BI, i.e. they mark the boundary of the phrasal constituent. In English, the assumption is that boundary tones align to the BI but 'spread' leftward to the end of the * tone.
The difference between an H* followed by an L% and an H*+L complex tone is subtle, and I chose to mark H*-L as sort of a cheat. In some ToBI systems, - tones (phrase accents) are associated with phrasal boundaries (i.e. 3s), while % tones are limited to utterance (4) boundaries. All three lexical words in this utterance seem to have the same tone pattern. The last one has a low final tone (from the utterance-final 4), and the first one can have a low -L associated with the 3 (which also accounts for the relative length of its last syllable), but the low second syllable of "common" can't be accounted for that way. So I declared that one an H*+L. But that felt arbitrary. So I sort of compromised.
If you feel strongly about this sort of thing, feel free to discuss this further. ;-)