The parts of the vocal tract are (mostly) independent of each other. Several parts may be moving during an utterance, but they don't necessarily all change at the same time. For example, for most English speakers saying [un], the soft palate lowers and lets air out through the nose long before the tongue tip rises and blocks off airflow through the mouth.
The whole set of movements by four parts of the vocal tract during the word spoon [spun] might look like picture (1):
It certainly doesn't look like picture (2):
On a practical level, it's almost impossible to start work using the first kind of picture, even though it is more realistic. To begin studying phonetics, we have to idealize the real world. We have to act as if speech events look like the second picture rather than the first. We can then divide the idealized picture into segments, each with well-defined properties. Figure (2) has four segments:
When phoneticians and phonologists talk about a sound, they are usually referring to one of these idealized segments.
Ultimately, a theory of phonetics and phonology has to explain realistic pictures like (1) rather than simplified pictures like (2). But for many practical purposes, it's good enough if we use the idealization that any stream of speech can be broken up into a series of segments and each segment given a phonetic symbol.
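The contrast between pictures like (1) and (2) can be sketched in a few lines of Python. This is only an illustration, not a model of real speech: the articulator names, their states, and every time value below are invented for the example. The only thing taken from the text is the general shape of the two pictures, with four segments in the idealized case and the soft palate lowering before the tongue tip closes in the realistic one.

```python
# Each timeline maps an articulator to a list of (start_ms, state) pairs.
# All names, states, and times are made up for illustration only.

# Idealized picture: every articulator changes state at shared boundaries.
IDEALIZED = {
    "lips":        [(0, "open"), (100, "closed"), (200, "rounded"), (300, "open")],
    "tongue_tip":  [(0, "near ridge"), (100, "lowered"), (300, "closed")],
    "soft_palate": [(0, "raised"), (300, "lowered")],
    "vocal_folds": [(0, "voiceless"), (200, "voiced")],
}

# More realistic picture: each articulator changes on its own schedule.
# The soft palate lowers (250) well before the tongue tip closes (310).
REALISTIC = {
    "lips":        [(0, "open"), (90, "closed"), (190, "rounded"), (330, "open")],
    "tongue_tip":  [(0, "near ridge"), (110, "lowered"), (310, "closed")],
    "soft_palate": [(0, "raised"), (250, "lowered")],
    "vocal_folds": [(0, "voiceless"), (205, "voiced")],
}


def boundaries(timelines):
    """Return the sorted times at which any articulator changes state.

    Time 0 is the start of the utterance, not a change, so it is excluded.
    """
    times = {t for changes in timelines.values() for t, _ in changes}
    return sorted(times - {0})


def count_intervals(timelines):
    """Count the stretches of time during which no articulator changes."""
    return len(boundaries(timelines)) + 1


if __name__ == "__main__":
    print(count_intervals(IDEALIZED))   # one interval per segment
    print(count_intervals(REALISTIC))   # more intervals than segments
```

With these invented numbers, the idealized timelines fall into exactly four intervals, one for each segment symbol in [spun], while the realistic timelines fall into more intervals than there are symbols. That mismatch is why the idealization is needed before each stretch of speech can be given a single phonetic symbol.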