Identifying sounds in spectrograms

Let's look at how various kinds of sounds appear on a spectrogram.

Vowels

Vowels usually have very clearly defined formant bars, as in the following:

In diphthongs, you can see the formants change frequency as the tongue body moves through the mouth:

You can't always reliably tell which formant you're looking at (F1, F2, F3, etc.) unless you already have a good idea of where to expect them. But formants are usually obvious enough that you can at least be sure you're looking at a vowel.

(Two difficulties in identifying formants are especially common. In [ɑ], and sometimes in other back vowels, F1 and F2 are often so close together that they appear as a single wide band. In [i], F2 and F3 often merge into a single wide band in the same way.)
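The merging described above is partly a matter of resolution: each formant shows up on the spectrogram as a band with some width, and two bands that are closer together than roughly that width add up to a single band. A minimal sketch in Python, assuming each band can be drawn as a Lorentzian-shaped peak (a simplification chosen for illustration, not how any particular spectrogram program draws formants) and using hypothetical formant values:

```python
def resonance(f, centre, half_width):
    """Lorentzian-shaped peak: 1.0 at `centre`, 0.5 at `centre` +/- `half_width`."""
    return 1.0 / (1.0 + ((f - centre) / half_width) ** 2)


def envelope(freqs, formants, half_width):
    """Sum one resonance per formant at every frequency in `freqs` (Hz)."""
    return [sum(resonance(f, c, half_width) for c in formants) for f in freqs]


def count_peaks(values):
    """Count interior samples that rise above the previous sample and are not below the next."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )


freqs = range(0, 4001, 10)  # 0-4000 Hz in 10 Hz steps
close = envelope(freqs, [1000, 1300], half_width=300)  # hypothetical formants 300 Hz apart
apart = envelope(freqs, [1000, 2000], half_width=300)  # hypothetical formants 1000 Hz apart
print(count_peaks(close), count_peaks(apart))  # 1 2
```

For two equal Lorentzian peaks, a dip between them (and so a second visible band) only appears once they are more than 2/√3 ≈ 1.15 half-widths apart, which is why the pair 300 Hz apart reads as one band here.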

Fricatives

Fricatives are easy to recognize. The turbulent airstream of a fricative creates a chaotic mix of random frequencies, each lasting for a very brief time. The result sounds much like static, and on a spectrogram it looks like the static you might see on an untuned TV screen.

Although each momentary burst of energy occurs at a random frequency, the bursts tend to cluster around particular frequencies, and those frequencies differ between fricatives. [s] has a higher average frequency than [ʃ], and both are higher than [f] or [θ].
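The "average frequency" of a fricative is usually measured as its spectral centroid (also called centre of gravity): the mean frequency of the spectrum, weighted by magnitude. A rough Python sketch, assuming a fricative can be imitated by many sine waves at random frequencies inside a band; the band limits, sample rate, and frame length are hypothetical choices for illustration, not measured values for [s] or [ʃ]:

```python
import cmath
import math
import random

SAMPLE_RATE = 16000  # Hz (assumed)
FRAME_LEN = 256      # samples per analysis frame (assumed)


def band_noise(low_hz, high_hz, n_components=60, seed=0):
    """Noise-like stand-in for a fricative: many sine waves at random
    frequencies between low_hz and high_hz, each with a random phase."""
    rng = random.Random(seed)
    parts = [(rng.uniform(low_hz, high_hz), rng.uniform(0.0, 2 * math.pi))
             for _ in range(n_components)]
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE + phase) for f, phase in parts)
            for t in range(FRAME_LEN)]


def spectral_centroid(frame):
    """Magnitude-weighted mean frequency (Hz) of a Hann-windowed frame."""
    n = len(frame)
    windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * t / (n - 1)))
                for t, s in enumerate(frame)]
    # Plain DFT magnitudes for bins 0 .. n/2 (0 Hz up to the Nyquist frequency).
    mags = [abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(windowed)))
            for k in range(n // 2 + 1)]
    freqs = [k * SAMPLE_RATE / n for k in range(n // 2 + 1)]
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


s_like = band_noise(4000, 7500, seed=1)   # hypothetical [s]-like band
sh_like = band_noise(2000, 4000, seed=2)  # hypothetical [ʃ]-like band
print(round(spectral_centroid(s_like)), round(spectral_centroid(sh_like)))
```

With the bands placed this way, the [s]-like frame has the higher centroid. Real measurements would average many frames of recorded speech rather than one synthetic frame.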

On a spectrogram, voiced fricatives combine both patterns: the regular vibration of the vocal folds and the random noise of a turbulent airstream.

[h]

[h] is really a voiceless version of the preceding or following vowel. On a spectrogram, it looks a little like a cross between a fricative and a vowel. It will have a lot of random noise that looks like static, but through the static you can usually see the faint bands of the voiceless vowel's formants.
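Treating [h] as a voiceless vowel means the same vocal-tract filter (the formants) is driven by turbulent noise instead of regular voicing. A minimal sketch of that idea, assuming one formant can be modelled by a standard two-pole digital resonator at a hypothetical 500 Hz with a 100 Hz bandwidth: the noise stays random sample by sample, but its averaged spectrum still peaks near the formant frequency, which corresponds to the faint band you see through the static.

```python
import cmath
import math
import random

FS = 16000       # sample rate in Hz (assumed)
FRAME_LEN = 256  # samples per analysis frame (assumed)


def resonator(x, freq_hz, bw_hz, fs=FS):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2],
    with poles at freq_hz and a bandwidth of roughly bw_hz."""
    r = math.exp(-math.pi * bw_hz / fs)
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = sample + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def average_spectrum(x, frame_len=FRAME_LEN):
    """DFT magnitude averaged over consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    n_bins = frame_len // 2 + 1
    total = [0.0] * n_bins
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        for k in range(n_bins):
            total[k] += abs(sum(s * cmath.exp(-2j * math.pi * k * t / frame_len)
                                for t, s in enumerate(frame)))
    return [v / n_frames for v in total]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN * 16)]  # turbulence stand-in
h_like = resonator(noise, freq_hz=500, bw_hz=100)  # hypothetical single formant
spectrum = average_spectrum(h_like)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin * FS / FRAME_LEN)  # frequency (Hz) of the strongest averaged bin
```

Averaging across frames matters here: a single frame of noise has a ragged spectrum, which is one reason the formant bands of [h] look faint and broken up rather than solid.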

Plosives

The medial phase of a voiceless plosive is complete silence. On a spectrogram, this will appear as a white blank.

The quiet vocal fold vibrations in a voiced plosive will sometimes appear as a faint band along the bottom of the spectrogram at the frequency of f0. (But very often you won't see anything there, either because the voicing got lost in the background noise or because the recording or computer equipment cut off frequencies that low.)
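Because a voiceless closure is silence, a program can look for it as a stretch of very low energy. A minimal Python sketch, assuming 10 ms frames at 16 kHz and an arbitrary RMS threshold (both hypothetical; a usable threshold depends on the recording's background noise), with a sine wave standing in for the vowels on either side:

```python
import math

SAMPLE_RATE = 16000  # Hz (assumed)


def frame_rms(signal, frame_len):
    """Root-mean-square amplitude of each complete, non-overlapping frame."""
    return [math.sqrt(sum(s * s for s in signal[i:i + frame_len]) / frame_len)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def silent_runs(signal, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of runs of frames whose RMS is
    below `threshold`: candidates for the blank gap of a voiceless closure."""
    runs, start = [], None
    rms = frame_rms(signal, frame_len)
    for i, level in enumerate(rms):
        if level < threshold and start is None:
            start = i
        elif level >= threshold and start is not None:
            runs.append((start * frame_len, i * frame_len))
            start = None
    if start is not None:  # signal ends during a silent run
        runs.append((start * frame_len, len(rms) * frame_len))
    return runs


def tone(n_samples, freq_hz=150.0, amp=0.5):
    """Hypothetical vowel stand-in: a plain sine wave."""
    return [amp * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n_samples)]


# 100 ms "vowel", 80 ms of silence (the closure), 100 ms "vowel"; durations are made up.
signal = tone(1600) + [0.0] * 1280 + tone(1600)
print(silent_runs(signal))  # [(1600, 2880)]
```

A voiced closure with a visible voicing bar may stay above the threshold, so this simple check only targets the blank gap of a voiceless plosive.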

To tell plosives apart, listeners rely on the release burst and on formant transitions. On a spectrogram, the release burst looks like a very brief fricative: a thin vertical stripe of noise. The formant transitions (if you can see them) look like the vowel's formants bending away from the frequencies they have during most of the vowel.

Aspiration will look like a period of [h] between the blank gap and the vowel -- specifically, a voiceless version of the following vowel. (Recall that the tongue body is in position for the following vowel and that aspiration is just a delay in the onset of voicing.)

NB: Aspiration is not the same as the release burst. The period of aspiration (which only some voiceless plosives have) is much longer than the very short release burst (which all released plosives have).
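The delay described above is what phoneticians measure as voice onset time (VOT): the time from the release burst to the start of voicing. A minimal sketch, assuming the two times have already been read off a spectrogram; the example times and the 30 ms cut-off between "burst only" and "burst plus aspiration" are hypothetical values for illustration:

```python
def voice_onset_time_ms(release_s, voicing_onset_s):
    """Delay (ms) from the plosive's release burst to the start of voicing."""
    return (voicing_onset_s - release_s) * 1000.0


def describe_release(release_s, voicing_onset_s, aspiration_threshold_ms=30.0):
    """Rough label using a hypothetical cut-off: a short delay means only the
    release burst; a long delay means the burst is followed by aspiration."""
    vot = voice_onset_time_ms(release_s, voicing_onset_s)
    return "aspirated" if vot > aspiration_threshold_ms else "unaspirated"


# Times in seconds are hypothetical annotations, not measurements.
print(describe_release(0.250, 0.320))  # aspirated (about 70 ms)
print(describe_release(0.250, 0.262))  # unaspirated (about 12 ms)
```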

The above spectrogram is of the English word attack [əˈtʰæk].

The periods of time labelled are:

A: the initial schwa
B: the medial phase of the [t] (silence)
C: the release burst of the [t]
D: the aspiration (delay of the onset of voicing for [æ])
E: the [æ] -- voicing has finally started. Right at the end of the vowel, you can see F2 and F3 start to approach one another in a formant transition pattern (often called the "velar pinch") that usually marks the onset phase of a velar consonant.
F: the medial phase of the [k] (again, silence)
G: the release burst of the [k] (which I pronounced as released for the purposes of this spectrogram)

Nasals and [l]

Nasals and [l] usually look like quite faint vowels, without a lot of amplitude in the higher frequencies.

You can still see some bands that look like formants. But during a nasal or [l] the vocal tract is a tube with branches and side-chambers (for a nasal, the nasal cavity and the closed-off mouth; for [l], the air channels on either side of the tongue), and such tubes have much more complicated acoustics, with anti-formants as well as formants. So the formant bands appear in different positions and are usually fainter. You usually can't tell which nasal or lateral it is from a spectrogram alone.
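A side branch of the vocal tract absorbs energy at its own resonant frequencies, which appears in the spectrum as an anti-formant. A minimal sketch, assuming each formant is modelled as a pole pair and each anti-formant as a zero pair of a digital filter (a common modelling choice, not something this page specifies); the frequencies and the shared 100 Hz bandwidth are hypothetical:

```python
import cmath
import math

FS = 16000  # sample rate in Hz (assumed)


def second_order(freq_hz, bw_hz, fs=FS):
    """Coefficients (c1, c2) of 1 - c1*z^-1 - c2*z^-2, whose roots sit at
    +/- freq_hz with a radius set by bw_hz."""
    r = math.exp(-math.pi * bw_hz / fs)
    return 2 * r * math.cos(2 * math.pi * freq_hz / fs), -r * r


def evaluate(coeffs, f_hz, fs=FS):
    """Value of 1 - c1*z^-1 - c2*z^-2 on the unit circle at f_hz."""
    c1, c2 = coeffs
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs)
    return 1 - c1 * z_inv - c2 * z_inv ** 2


def response_db(f_hz, formants, antiformants, bw_hz=100.0, fs=FS):
    """Gain in dB: one pole pair (peak) per formant, one zero pair (dip) per anti-formant."""
    h = 1 + 0j
    for f in formants:
        h /= evaluate(second_order(f, bw_hz, fs), f_hz, fs)
    for f in antiformants:
        h *= evaluate(second_order(f, bw_hz, fs), f_hz, fs)
    return 20 * math.log10(abs(h))


# Hypothetical values: formants at 300 and 2200 Hz, with and without an anti-formant at 1200 Hz.
for zeros in ([], [1200]):
    print(zeros, [round(response_db(f, [300, 2200], zeros), 1) for f in (300, 1200, 2200)])
```

In this model, adding the anti-formant at 1200 Hz lowers the gain there by about 35 dB, so a frequency region that would otherwise carry some energy turns pale on a spectrogram.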

