Digital Audio Basics
In the digital world, everything is ultimately reduced to on or off states, each of which can be stored in computer memory as a single bit of information: 1 or 0. Complex real-world things, like images and audio, cannot be directly represented in such a simple manner. An image is rarely composed of only black and white dots, and audio is rarely just on or off.
Reducing images and audio to a digital state requires an analog-to-digital conversion. Instead of using just one bit of information, many bits are used to more accurately store the state. By using 2 bits, for example, four states are possible: 00, 01, 10, 11. For images, that could be black (00), dark gray (01), light gray (10), and white (11). For audio, that would give four different levels of loudness. Typically, many more bits are used. Most computer video cards use 16 to 32 bits to store a single dot. Sound cards typically use 16 bits for audio levels.
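The relationship between bit count and number of states can be sketched in a few lines of Python. This is only an illustration: the function names and the 0.0 to 1.0 brightness scale are assumptions made for the example, not part of any particular video or sound card.

```python
def states_for_bits(bits: int) -> int:
    """Number of distinct states that `bits` bits can represent."""
    return 2 ** bits


def quantize(level: float, bits: int) -> int:
    """Map a level in [0.0, 1.0] to the nearest of 2**bits integer codes.

    The 0.0-1.0 input scale is an assumption made for this sketch.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    max_code = states_for_bits(bits) - 1
    return round(level * max_code)


# The four 2-bit gray levels named in the text, indexed by code.
GRAY_NAMES = ["black", "dark gray", "light gray", "white"]

for level in (0.0, 0.3, 0.7, 1.0):
    code = quantize(level, 2)
    print(f"{level:.1f} -> {code:02b} ({GRAY_NAMES[code]})")

print("16 bits give", states_for_bits(16), "states")
```

With only 2 bits, brightness values as different as 0.2 and 0.4 land on the same code, which is why practical systems use many more bits.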
The number of bits to use depends on human perception and bit alignment within computers. Computers tend to bundle bits in groups of 8, called bytes, so using 8, 16, 24, or 32 bits would fit nicely in 1, 2, 3, or 4 bytes respectively. For images, 16 bits do not provide enough states to make the transition from one state to the next imperceptible, so 24 or more bits are used. For audio, 16 bits are adequate, which is what a CD contains, but audio systems using 24 bits will be common in the future.
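As a rough sketch of the byte alignment described above, the following Python computes how many whole bytes a sample of a given bit depth occupies and how many levels it can represent. The helper name bytes_per_sample is hypothetical and chosen only for this example.

```python
def bytes_per_sample(bits: int) -> int:
    """Whole bytes needed to hold one sample of `bits` bits (8 bits per byte)."""
    return (bits + 7) // 8  # round up to the next whole byte


for bits in (8, 16, 24, 32):
    size = bytes_per_sample(bits)
    levels = 2 ** bits
    print(f"{bits:2d} bits -> {size} byte(s), {levels:,} levels")
```

A bit depth that is not a multiple of 8, such as 12, would still occupy 2 bytes under this scheme, leaving 4 bits unused, which is one reason multiples of 8 are preferred.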
Samples
Digital audio is composed of thousands of numbers, called samples. Each sample holds the state, or amplitude (loudness), of a sound at a given instant in time. For images, each point of light, or pixel, has a certain brightness and location and all pixels combine to make a picture (see figure below). For digital audio, all the samples combine to make a waveform of the sound.
Figure: Samples
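To make the idea of samples concrete, here is a minimal Python sketch that produces the first few samples of a pure tone as signed 16-bit numbers. The tone frequency (440 Hz), the rate of 8000 samples per second, and the function name sine_samples are all assumptions chosen for illustration; the text above does not specify them.

```python
import math


def sine_samples(frequency_hz: float, sample_rate_hz: int, count: int,
                 bits: int = 16) -> list[int]:
    """Return `count` signed integer samples of a sine wave.

    Each sample is the amplitude of the wave at one instant,
    scaled to the largest positive value a signed `bits`-bit number can hold.
    """
    peak = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz))
        for n in range(count)
    ]


# Assumed values for illustration: a 440 Hz tone at 8000 samples per second.
samples = sine_samples(440.0, 8000, 8)
print(samples)
```

Plotting these numbers against time would trace out the waveform, in the same way that pixels plotted by location form a picture.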
When playing audio, each sample specifies the position of the speaker at a certain time. A small number moves the speaker in and a large number moves the speaker out. This movement occurs thousands of times per second, causing vibration, which we hear as sound.
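The mapping from sample value to speaker position can be sketched as a simple scaling. This is a deliberately simplified model: it assumes signed 16-bit samples, treats negative values as "in" and positive values as "out", and ignores the digital-to-analog converter and amplifier that sit between the numbers and a real speaker. The function name speaker_position is hypothetical.

```python
def speaker_position(sample: int, bits: int = 16) -> float:
    """Map a signed sample to a relative speaker position in [-1.0, 1.0).

    -1.0 is fully in, 0.0 is the rest position, values near 1.0 are fully out.
    (Simplified model; assumes signed samples.)
    """
    full_scale = 2 ** (bits - 1)  # 32768 for 16 bits
    if not -full_scale <= sample < full_scale:
        raise ValueError(f"sample {sample} does not fit in {bits} signed bits")
    return sample / full_scale


for sample in (-32768, -16384, 0, 16384, 32767):
    print(f"{sample:6d} -> {speaker_position(sample):+.3f}")
```

Feeding a stream of samples through such a mapping thousands of times per second is what produces the vibration described above.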