... but modeled on something like "The tenors, as a group, harmonize." Which is definitely not what I had in my head when I recorded it, but I don't remember what it really was now. Something about bananas and marmalade or something.
The point is that it's three phonological words/phrases, three syllables each, the first with stress on the middle syllable, the second with final stress, and the last with initial stress. So how are you supposed to know that? Well, recall the original instructions:
Parse the spectrogram into segments and syllables
This should have been easy. The red marks in the solution (above) show the edges of every major 'manner' change in the spectrogram. That's a sudden change in amplitude (indicated by how dark the energy is) accompanied (since this is reiterant 'mamama') by a sudden change in spectrum (i.e. formant frequencies). And since this is reiterant 'mamama', there's really no question whether a given C goes in the onset of the syllable (headed by the vowel to the right) or the coda of a syllable (headed by the vowel to the left).
Use the information in the spectrogram to determine relative amplitude and pitch of each syllable
So what information is there? Well, how dark are the vowels relative to each other? Just eyeballing quickly, the third syllable seems weaker than either of the preceding or following syllables. The third, fourth and fifth syllables, as a group, don't seem as strong as the surrounding syllables (it helps that 4 and 5 are short as well--we'll come back to duration in a bit). And the last two syllables seem pretty weak too. That's just looking at the amplitude (relative darkness) in the spectrogram.
With pitch, it's harder to tell what's going on. But that third syllable again has well-separated striations, suggesting very low frequency of vibration (remember that every striation is a glottal pulse). The last two also have wide striations. 1, 2 and 7 seem to have fairly high pitch. And look at 6. The striations seem to start close together, but then they get further apart. (If you're really into these things, you can also see some of the harmonic energy 'bleeding' into the wide band spectrogram.) I've included below the bottom 2000 Hz of a narrow band spectrogram, just so you can see the similarity with the pitch track (above), which is done using some other kind
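If you want the striation logic spelled out: each striation is one glottal pulse, so the spacing between neighbouring striations is one period, and pitch is just the reciprocal of that. Here's a minimal sketch in Python. The function name and the pulse times are made up for illustration (they are not measured from this spectrogram); times are in milliseconds to keep the arithmetic clean.

```python
def f0_from_pulses_ms(pulse_times_ms):
    """Estimate F0 (Hz) for each gap between successive glottal pulses.

    pulse_times_ms: times of successive striations, in milliseconds.
    Each gap is one period T (ms), and F0 = 1000 / T.
    """
    periods = [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]
    return [1000 / p for p in periods]


# Hypothetical pulse times, NOT read off the actual spectrogram.
close_striations = [0, 5, 10, 15]        # 5 ms apart: higher pitch
wide_striations = [0, 10, 20, 30]        # 10 ms apart: lower pitch
spreading_out = [0, 5, 10, 18, 28]       # start close, then drift apart

print(f0_from_pulses_ms(close_striations))
print(f0_from_pulses_ms(wide_striations))
print(f0_from_pulses_ms(spreading_out))
```

The third list is the kind of pattern described for syllable 6: striations that start close together and then spread out correspond to a pitch that starts high and falls.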
Considering the placement of pitch peaks, amplitude peaks, and durations, work out the intonation and phrasing of the utterance
Okay, so let's talk about durations. In a regular utterance, there's all kinds of things to consider--lower vowels are almost always longer than higher vowels, vowels in open syllables are longer than in closed syllables, vowels closed by voiced consonants are longer than when closed by voiceless consonants and so forth. This being reiterant, you don't have to worry about any of that. Duration means one of two things: the result of stress, or the result of prosody (i.e. phrase-final lengthening). So the longest syllables seem to be (more or less in order) 9, 6, (7 or 2), (2 or 7), and maybe 8.
Which of those might be stressed? Stress has three principal correlates in English: amplitude, duration, and pitch excursion. We know that 2, 6 and 7 have pitch excursions. They're also impossibly long. So those are probably our stresses.
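The reasoning so far is really just set intersection: a stressed syllable should be long, carry a pitch excursion, and not be weak. Here's a rough Python sketch of that. The set memberships are my own coding of the eyeballed observations above (with the "maybe 8" left out of the long set), not measurements, and the function name is just for illustration.

```python
# Qualitative observations, coded by hand from the discussion above.
# These are eyeballed judgements, not measured values.
long_syllables = {2, 6, 7, 9}      # "maybe 8" deliberately left out
pitch_excursions = {2, 6, 7}
weak_syllables = {3, 4, 5, 8, 9}   # 3-5 as a group, plus the last two


def stress_candidates(long_set, pitch_set, weak_set):
    """Syllables that are long AND have a pitch excursion AND are not weak."""
    return sorted((long_set & pitch_set) - weak_set)


print(stress_candidates(long_syllables, pitch_excursions, weak_syllables))
```

Adding 8 to the long set doesn't change the answer, since 8 is weak and has no pitch excursion. Note that 9 drops out even though it is the longest syllable: length alone isn't stress, which is where phrase-final lengthening comes in.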
So why is 3 so long? It arguably has the lowest pitch except for the absolute final syllable, and it's very weak, even compared to the very very short vowels which follow. Maybe it's being lengthened by phrase final lengthening?
We know the last syllable (9) is also phrase (and utterance) final, which accounts for its length. And then there's 6. It's the longest syllable in the bunch (discounting the final syllable). Maybe it's stressed and phrase final?
What else happens at the ends of phrases? There's often a low pitch excursion. If 6 were stressed and phrase final, it would be greatly lengthened, and it should get a high pitch excursion (an H* in ToBI terms), and a low pitch excursion (an L% in ToBI terms). And 6 is the one with the sharp change in apparent pitch (judging from the striations). Aha.
BTW, if you're not into ToBI, then ignore the *s and %s and Hs and Ls and whatnot, and just concentrate on whether the pitch is H(igh) or L(ow) and why.
So I think we have good evidence of three phonological words/phrases. The first three syllables have a lexical/phrasal accent (H*) on the stressed 2nd syllable (accounting for its strength, duration, and pitch) and a word/phrase-final (and therefore long) but prosodically weak (unstressed) third syllable, probably with a phrase-final L on it. The next three syllables have the last one stressed, getting the amplitude boost from lexical stress, the high pitch accent getting attracted to the stressed syllable, and an extra length boost (as well as a low final tone) from being final. And finally the last three syllables: the stress is on the first of these (getting the usual markers), and the length of the final syllable (and its low pitch) comes from its being final in two domains (word/phrase and utterance).
For those of you who care (not in strict ToBI organization, but you'll see what I mean):
| Prosodic word phrase level | [ L-] | [ L-] | [ L-] |
| Phrasal/lexical accent level | [ H* ] | [ H* ] | [ H* ] |
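If it helps to see the whole analysis syllable by syllable, here's a small Python sketch that expands the three stress patterns (middle, final, initial) into tones. The representation (strings of 0s and 1s, a dict of tone lists) is my own invention for illustration, not any standard ToBI format; "L-" follows the notation in the table above.

```python
# One string per phonological word/phrase; "1" marks the stressed syllable.
phrases = ["010", "001", "100"]


def annotate(phrase_patterns):
    """Map each syllable number (1-based) to the tones it carries.

    A stressed syllable gets the pitch accent H*;
    a phrase-final syllable gets the low boundary tone L-.
    """
    tones = {}
    syllable = 0
    for pattern in phrase_patterns:
        for position, mark in enumerate(pattern):
            syllable += 1
            tones[syllable] = []
            if mark == "1":
                tones[syllable].append("H*")
            if position == len(pattern) - 1:
                tones[syllable].append("L-")
    return tones


for number, syllable_tones in annotate(phrases).items():
    print(number, " ".join(syllable_tones) or "(none)")
```

Syllable 6 is the only one that ends up with both H* and L-, which is the syllable where the striations start close together and then spread apart.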
Now as a final 'think about it' exercise, think about the relative durations and amplitudes of the [m]s. In considering syllables, we thought mostly in terms of what the vowel was doing. But what do we expect to happen to consonants, depending on whether they're in stressed or unstressed syllables? Initial, medial or final in some domain or other? And can we see evidence of any of that in the spectrogram?