Describing English vowels

Repeat each of the following pairs of vowels several times in succession, paying close attention to what your tongue body is doing as you move from one vowel to the other:

[i] [u]
[u] [ɑ]
[i] [æ]
[æ] [ɑ]

If you are like most English speakers, you should have noticed that your tongue body moves backward and forward in the [i]/[u] pair: it is further forward during [i] and further backward during [u]. Similarly, your tongue body is closer to the roof of your mouth during [u] than during [ɑ] -- your jaw is probably also opening wider during the [ɑ] to help increase the distance between the tongue body and the roof of the mouth. Your tongue body is higher (i.e., closer to the roof of the mouth) during [i] than during [æ]. (You may also be able to feel the [i] as somewhat further forward than [æ].) Finally, your tongue body is further forward during [æ] than during [ɑ].

The dimensions for vowels

Height and frontness/backness

The most important property in the traditional classification scheme for vowels is the highest point reached by the body of the tongue, on both the front/back and high/low dimensions. Vowels are conventionally arranged on a two-dimensional diagram, where the vertical dimension indicates the distance of the tongue body from the roof of the mouth, and where the horizontal dimension indicates the forward or backward displacement of the tongue body (with left representing further forward). The four vowels [i], [u], [æ], and [ɑ] mark the four corners of this diagram:

[i]      [u]

[æ]      [ɑ]

Other vowels can be specified by the position of the tongue body relative to these four corners. In [e], for example, the tongue body is pushed forward, as it is during [i] and [æ], but it is further away from the roof of the mouth in [e] than in [i], and closer to the roof of the mouth than in [æ]. So we can place [e] on a vowel chart between [i] and [æ].

Including all the vowels of English, our diagram looks like:

[i] [u]
[ɪ] [ʊ]
[e] [ə] [o]
[ɛ] [ʌ] [ɔ]
[æ]     [a] [ɑ]

We distinguish three major degrees of height: high, mid, and low. We also distinguish three major degrees on the front/back dimension: front, central, and back. (Don't confuse this use of "central" with the "central" that is the opposite of "lateral".) Imposing these categories on the above diagram gives us the traditional vowel chart used in the North American linguistic tradition:

        Front       Central     Back
High    [i] [ɪ]                 [u] [ʊ]
Mid     [e] [ɛ]     [ə] [ʌ]     [o] [ɔ]
Low     [æ]         [a]         [ɑ]

The schwa [ə] is in the exact centre of this chart. Schwa is often referred to as the neutral vowel, the vowel in which the vocal tract is in its neutral state and most closely resembles a perfect tube. All the other vowels require that the vocal tract be deformed by moving the tongue body away from its neutral position, either up or down, backward or forward.
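
The height and front/back categories can also be written down as a small lookup table. The following is only a minimal sketch in Python: the names VOWEL_POSITIONS, build_chart, and format_chart are made up for illustration, and the cell assigned to each vowel is read off the diagram in this section.

```python
# Hypothetical feature table: (height, backness) for each vowel symbol,
# read off the vowel diagram in this section.
VOWEL_POSITIONS = {
    "i": ("high", "front"), "ɪ": ("high", "front"),
    "u": ("high", "back"), "ʊ": ("high", "back"),
    "e": ("mid", "front"), "ɛ": ("mid", "front"),
    "ə": ("mid", "central"), "ʌ": ("mid", "central"),
    "o": ("mid", "back"), "ɔ": ("mid", "back"),
    "æ": ("low", "front"),
    "a": ("low", "central"),
    "ɑ": ("low", "back"),
}

HEIGHTS = ("high", "mid", "low")
BACKNESSES = ("front", "central", "back")


def build_chart(positions):
    """Group vowel symbols into the nine height/backness cells."""
    chart = {(h, b): [] for h in HEIGHTS for b in BACKNESSES}
    for vowel, cell in positions.items():
        chart[cell].append(vowel)
    return chart


def format_chart(chart):
    """Lay the cells out as rows (height) and columns (backness)."""
    lines = ["      " + "".join(b.ljust(10) for b in BACKNESSES)]
    for h in HEIGHTS:
        cells = (" ".join(chart[(h, b)]).ljust(10) for b in BACKNESSES)
        lines.append(h.ljust(6) + "".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_chart(build_chart(VOWEL_POSITIONS)))
```

Looking up a cell such as build_chart(VOWEL_POSITIONS)[("mid", "central")] returns the vowels that share it, which makes it easy to see which cells hold more than one vowel.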


We can distinguish most English vowels from each other in terms of the high/mid/low dimension and the front/central/back dimension. But the chart above still has four cells which contain two full (non-schwa) vowels apiece. So far we have no way to tell apart the following four pairs of vowels:

[i] [ɪ]
[u] [ʊ]
[e] [ɛ]
[o] [ɔ]

In each pair, one of the vowels is higher and less centralized (further front if a front vowel, further back if a back vowel), while the other is lower and closer to the position of [ə] on the horizontal dimension. Within each of these cells, the higher and less centralized vowel is referred to as tense; the lower and more centralized vowel is referred to as lax.

(Those speakers who don't have [ɔ] in their dialect can try to produce one by lowering and centralizing an [o].)
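
The tense/lax distinction can be sketched as one more lookup. This is an illustration only: TENSE_LAX_PAIRS, tenseness, and lax_counterpart are hypothetical names, and the sketch assumes that the four crowded cells are high front, high back, mid front, and mid back, with the higher vowel of each cell being the tense one.

```python
# Hypothetical encoding of the four cells that hold two full vowels.
# Each entry maps a cell to (tense vowel, lax vowel).
TENSE_LAX_PAIRS = {
    ("high", "front"): ("i", "ɪ"),
    ("high", "back"): ("u", "ʊ"),
    ("mid", "front"): ("e", "ɛ"),
    ("mid", "back"): ("o", "ɔ"),
}


def tenseness(vowel):
    """Return 'tense', 'lax', or None for vowels outside the four pairs."""
    for tense, lax in TENSE_LAX_PAIRS.values():
        if vowel == tense:
            return "tense"
        if vowel == lax:
            return "lax"
    return None


def lax_counterpart(vowel):
    """Return the lax vowel sharing a cell with a tense vowel, else None."""
    for tense, lax in TENSE_LAX_PAIRS.values():
        if vowel == tense:
            return lax
    return None
```

For example, lax_counterpart("o") gives "ɔ", matching the suggestion to produce [ɔ] by lowering and centralizing an [o]. Vowels such as [æ] or [ə] return None because they have no partner in their cell.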


There is another important difference among the vowels of English. When you say [u], your lips are rounded. When you say [i], your lips are spread. Vowels can be categorized according to whether they are rounded or unrounded. In English, the mid and high back vowels are rounded, the front and central vowels unrounded.

The [ɑ] vowel of the word [ˈfɑðɹ̩] is unrounded in most dialects of English, though in Canadian English it is often rounded at least a little.
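
The rounding generalization for English can be expressed as a rule over height and backness. The sketch below is a simplification under stated assumptions: FEATURES and is_rounded are hypothetical names, the feature values are read off the vowel chart, and the rule treats rounding as all-or-nothing, so it does not capture the slight rounding of [ɑ] in Canadian English.

```python
# Hypothetical feature table (height, backness), read off the vowel chart.
FEATURES = {
    "i": ("high", "front"), "ɪ": ("high", "front"),
    "u": ("high", "back"), "ʊ": ("high", "back"),
    "e": ("mid", "front"), "ɛ": ("mid", "front"),
    "ə": ("mid", "central"), "ʌ": ("mid", "central"),
    "o": ("mid", "back"), "ɔ": ("mid", "back"),
    "æ": ("low", "front"),
    "a": ("low", "central"),
    "ɑ": ("low", "back"),
}


def is_rounded(vowel):
    """English generalization: mid and high back vowels are rounded.

    Categorical simplification: dialects that round [ɑ] slightly
    are not modelled here.
    """
    height, backness = FEATURES[vowel]
    return backness == "back" and height in ("high", "mid")


if __name__ == "__main__":
    rounded = [v for v in FEATURES if is_rounded(v)]
    print("rounded:", " ".join(rounded))
```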

Glides and diphthongs


When the tongue body is pushed up and forward for the high front vowel [i], it ends up underneath the hard palate. If we were to try to classify [i] as if it were a consonant, we would have to call it a voiced palatal approximant: the vocal tract is made narrower by the tongue body approaching the hard palate, but not close enough to cause a turbulent airstream. But we already have a symbol, [j], for a voiced palatal approximant.

In fact, there is very little real difference between [i] and [j]. Both can be made with the tongue in the same position. [i] acts as the central part of a syllable, and typically lasts somewhat longer than a [j]. [j] does not act as the central part of a syllable and is typically fairly short. Essentially, [j] is simply an [i] that is acting as a consonant instead of a vowel.

There is a similar relationship between the vowel [u] and the consonant [w]. The high back position of [u] puts it directly under the soft palate, where you would expect to find the velar half of a [w]. A [w] is essentially a [u] that is acting as a consonant rather than a vowel.

Glide is the general term for a consonant which corresponds in this way to a vowel.


Three of the English vowels introduced earlier required a sequence of two IPA symbols: [aj], [aw], and [ɔj]. This might seem like a violation of the principle that there should be a one-to-one relationship between sounds and IPA symbols. But we can now see why [aj], [aw], and [ɔj] do not really act as single, simple vowels. For a vowel like [ɑ], the tongue body moves into a low and back position and remains there for the duration of the vowel. During [aj], on the other hand, the tongue body does not remain in one place -- it is (almost constantly) in motion from one position to another.

Complex vowels like [aj] which involve a movement of the tongue body from one position to another are called diphthongs. Simple vowels like [ɑ] which maintain a relatively constant position throughout are called monophthongs.

In the transcription of a diphthong, the first symbol represents the starting point of the tongue body and the second symbol represents the direction of movement. (It is also possible to use a vowel symbol for the second half of a diphthong, with a half-circle "non-syllabic" diacritic, to indicate the exact position of the tongue body at the end of the diphthong.)

In the diphthong [aj], the tongue body begins in a low, central position, represented by the symbol [a]. The tongue body almost immediately begins to move upward and forward, toward the position for an [i]. Usually, especially in faster speech, the tongue body does not have time to get all the way to the [i] position, so the diphthong often ends nearer to [ɪ] or even [e]. In a narrower transcription, we could record the precise ending position, as in [ai̯], [aɪ̯], or [ae̯]. None of these differences can change the meaning of an English word, so in a broad transcription we simply use [j], the symbol for the glide corresponding to [i], to represent the direction and approximate end-point of the diphthong.

In the diphthong [aw], the tongue body again begins in the low central position, [a], and then moves upward and backward toward the position of [u]. Often, the tongue body only manages to get part-way. We could transcribe the diphthong narrowly, as [au̯], [aʊ̯], or [ao̯], or broadly as [aw], using the symbol for the glide corresponding to [u].

In the diphthong [ɔj], the tongue body begins in the position of the lax mid back vowel [ɔ]. It moves upward and forward, toward the position of [i].

In most dialects of English, even the vowels of bait and boat, which we have been transcribing with the single symbols [e] and [o], are really diphthongs. They begin in the tense mid position but then proceed to move upward toward the position for [i] and [u] respectively. For this reason, you will often see [e] transcribed as [ej], [eɪ̯], or [ei̯], and [o] transcribed as [ow], [oʊ̯], or [ou̯].
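
The relationship between the narrow and broad transcriptions of diphthongs can be illustrated with a short Python sketch. It assumes that the non-syllabic diacritic is encoded as the Unicode combining character U+032F (combining inverted breve below). The names GLIDE_FOR_OFFGLIDE and to_broad are hypothetical, chosen only for this example.

```python
import re

# Assumed encoding of the half-circle "non-syllabic" diacritic.
NONSYLLABIC = "\u032f"

# Second-half vowels mapped to the glide that marks the direction of movement:
# front end-points collapse to [j], back end-points to [w].
GLIDE_FOR_OFFGLIDE = {
    "i": "j", "ɪ": "j", "e": "j",
    "u": "w", "ʊ": "w", "o": "w",
}

# Matches one of the vowels above when it carries the non-syllabic diacritic.
_OFFGLIDE = re.compile("([" + "".join(GLIDE_FOR_OFFGLIDE) + "])" + NONSYLLABIC)


def to_broad(transcription):
    """Replace each non-syllabic vowel with its corresponding glide."""
    return _OFFGLIDE.sub(lambda m: GLIDE_FOR_OFFGLIDE[m.group(1)], transcription)


if __name__ == "__main__":
    for narrow in ("ai\u032f", "aɪ\u032f", "ae\u032f", "aʊ\u032f", "oʊ\u032f"):
        print(narrow, "->", to_broad(narrow))
```

Because only vowels carrying the diacritic are rewritten, syllabic vowels elsewhere in a transcription are left untouched. The conversion goes in one direction only: the broad symbols [j] and [w] do not record which exact end-point was reached.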