JPEG at heart of digital camera

Youngjun Yoo, Manager, Reference Camera Software, Imaging and Audio Group, Texas Instruments Inc., Dallas

1/6/2003 10:04 AM EST

Analysts estimate that digital still cameras will account for close to half of all still-camera sales by 2005, a testament to the flexibility and enhanced features the DSCs provide over traditional cameras.

With DSCs firmly entrenched in the consumer mind-set, the industry has now settled on a de facto standard that will further propel penetration. Exchangeable Image File Format (Exif) is an industry-standard file format that uses core Joint Photographic Experts Group (JPEG) compression technology. Because of its relative simplicity and flexibility, "baseline" JPEG, the technology that includes the most fundamental features of the standard, is the foundation upon which DSC designers are building increasingly advanced camera designs.

The JPEG standard defines four variations for image compression:

- Sequential discrete cosine transform (DCT)-based mode. This includes the JPEG baseline format.

- Sequential lossless mode. This format is used in applications where it is important that no image detail is lost in the compression process.

- Progressive DCT-based mode. Very similar to baseline mode, the progressive format processes in a series of scans that render the image increasingly more accurate with each scan.

- Hierarchical mode. This format makes it possible to hold a number of images with different resolutions within one file. Like progressive DCT mode, it uses more than one scan to define the image.

With its simplicity and wide support, JPEG baseline is the de facto standard for DSC systems and all JPEG decoders must support it, even if they otherwise leverage different features within the standard.

Baseline JPEG can be applied to both gray-scale and color images. This is done by deconstructing the picture into several parts or components, denoting light intensity and color. Each component is made up of x columns and y rows of samples, which correspond to the "pixel" data of the image. The number of samples in baseline JPEG mirrors the resolution of the image. A single pass through a component, as is done in baseline compression, is known as a scan.

In JPEG, encoded data is divided into blocks referred to as minimum coded units (MCUs). Each MCU is built from one or more 8 x 8-sample sections of the source image. The purpose of MCUs is to make the image workable by breaking it into manageable blocks of data, which is particularly important for memory-constrained embedded-system applications. The MCU is also the unit on which the encoding algorithm operates.

As it processes MCUs, the algorithm always moves from left to right across the image, then top to bottom through the component. There are two ways of organizing MCUs and components: noninterleaved, where each MCU holds data from only one component, and interleaved, where data from every component is combined within a single MCU. In the interleaved case, each MCU holds all the data for a specific physical section of the image, instead of that section's components being spread across several MCUs. Horizontal and vertical sampling factors determine how many 8 x 8 sections of each component are placed within an MCU when the component data is interleaved.
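As a rough illustration of how sampling factors shape an interleaved MCU, the sketch below counts the 8 x 8 blocks each component contributes and the number of MCUs needed to cover an image. The function name `mcu_layout`, the 640 x 480 image size and the 2x2/1x1/1x1 sampling factors are assumptions chosen for the example, not values from the article.

```python
import math

def mcu_layout(width, height, sampling):
    """Describe the interleaved MCU grid for an image.

    sampling maps a component name to its (horizontal, vertical)
    sampling factors. Returns (mcu width in pixels, mcu height in
    pixels, blocks per MCU for each component, total MCU count).
    """
    h_max = max(h for h, _ in sampling.values())
    v_max = max(v for _, v in sampling.values())
    # An interleaved MCU spans 8 pixels per unit of the largest factor.
    mcu_w, mcu_h = 8 * h_max, 8 * v_max
    # Each component contributes h * v blocks of 8 x 8 samples.
    blocks = {name: h * v for name, (h, v) in sampling.items()}
    # Partial MCUs at the right and bottom edges still count as whole MCUs.
    mcus_across = math.ceil(width / mcu_w)
    mcus_down = math.ceil(height / mcu_h)
    return mcu_w, mcu_h, blocks, mcus_across * mcus_down

# Hypothetical example: luminance sampled twice as densely as chrominance.
layout = mcu_layout(640, 480, {"Y": (2, 2), "Cb": (1, 1), "Cr": (1, 1)})
print(layout)
```

With these assumed factors, each MCU covers a 16 x 16-pixel area and carries four luminance blocks plus one block for each chrominance component.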

JPEG was designed to exploit characteristics of the human visual system to achieve greater compression of image data. The standard, in fact, is specifically designed to discard information not easily recognizable to the human eye.

The human eye has a tendency to notice variations of brightness to a much greater extent than differences in color. Unlike changes in light and dark, slight changes in color are not perceived well. As a result, lossy DCT-based encoding tends to discard data conveying slight variances in color, since it is less important to the viewer.

The JPEG standard itself neither assumes a standard color space nor requires that the color space of the encoded image be described as ancillary information in the compressed bit stream. The industry addressed this potential compatibility issue by fixing the color space in widely accepted file formats such as the JPEG File Interchange Format (JFIF) and Exif. Instead of RGB (red, green and blue) primaries, these file formats use the YCbCr (luminance and chrominance) color space defined by CCIR 601 for JPEG-compressed data.
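For readers who want to see the color conversion concretely, here is a minimal sketch of an RGB-to-YCbCr transform using the full-range CCIR 601 weights commonly associated with JFIF. The function name and the rounding and clamping choices are assumptions made for illustration; a production encoder would typically use fixed-point arithmetic.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (full range)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to an integer and keep the result inside 0..255.
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)

print(rgb_to_ycbcr(255, 255, 255))  # white: full luminance, neutral chroma
print(rgb_to_ycbcr(0, 0, 0))        # black: zero luminance, neutral chroma
```

Neutral colors map to chrominance values of 128, which is why gray-scale content carries almost no information in the Cb and Cr components.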

Three steps involved

The JPEG baseline algorithm can be broken down into three steps: DCT, quantization and entropy coding.

Spatial frequencies within an image are very important to human vision. With the discrete cosine transform, the image can be broken down into a set of waveforms, each with a particular spatial frequency. That allows the algorithm to get rid of information that is not perceptible to human vision. Conversely, it enables the system to keep information that is important.

The DCT operation is used to create uncorrelated digital-filter coefficients. Each coefficient can be treated independently without affecting compression efficiency. Additionally, JPEG enables the quantizing of the DCT coefficients using visually weighted quantization values to increase compression efficiency.

The DCT is applied to each 8 x 8 two-dimensional array of samples within a minimum coded unit, where every sample is an 8-bit number belonging to the component currently being processed. The output is also an 8 x 8 array, this time of spatial-frequency coefficients representing that section of the image. Because each coefficient is a weighted sum of all 64 input samples, the outputs need more precision than the inputs: every number that the DCT outputs is a 12-bit number.
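The transform itself can be sketched directly from the textbook definition of the two-dimensional DCT. The version below is a deliberately slow, floating-point illustration, not what a DSP implementation would use; the function name `dct_8x8` is an assumption, and the subtraction of 128 reflects the level shift that baseline encoders apply to 8-bit samples before the transform.

```python
import math

N = 8

def dct_8x8(block):
    """Forward 2-D DCT of an 8 x 8 block of samples in the range 0..255."""
    # Level shift so the samples are centered on zero.
    shifted = [[sample - 128 for sample in row] for row in block]

    def scale(k):
        # Normalization factor for the zero-frequency basis function.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical spatial frequency
        for u in range(N):      # horizontal spatial frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (shifted[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * scale(u) * scale(v) * total
    return out

# A perfectly flat block has energy only in the dc coefficient.
flat = [[200] * N for _ in range(N)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0]))
```

For a flat block every ac coefficient is zero to within rounding error, which is the extreme case of the energy compaction the DCT is used for.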

The second step of the JPEG algorithm is quantization. Here, each DCT output is divided point-wise by the corresponding entry of an integer matrix of the same dimension. Baseline JPEG's "lossy" nature comes from this step. By ridding the image of information the viewer is unlikely to miss, quantization provides much of JPEG's compression. Because most regions of a typical image change slowly from sample to sample (low spatial frequency), and because humans tend not to notice high-spatial-frequency changes, quantization allows many of the high-frequency coefficients to be discarded.

This is done by choosing the right quantization factor for each coefficient. The higher the quantization factor, the closer to zero the quantized coefficient will be. The loss arises because the quotients are always rounded to integers, which is also why large quantization factors are more likely to discard information.
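A small sketch may make the effect of the quantization factors easier to see. The coefficient values and the two tables below are made-up numbers covering only a 2 x 2 corner of a block; they are assumptions for illustration and are not the tables any particular camera uses.

```python
def quantize(coeffs, qtable):
    """Divide each coefficient by its quantization factor and round.

    Python's round() sends exact ties to the even neighbor; the
    example values below avoid ties so that detail does not matter.
    """
    return [[int(round(c / q)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]

def dequantize(levels, qtable):
    """Decoder side: multiply each level by its quantization factor."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]

# Hypothetical low-frequency corner of a DCT block.
coeffs = [[576.0, -30.0],
          [10.0, 3.0]]
fine_table = [[16, 11], [12, 12]]
coarse_table = [[32, 22], [24, 24]]

fine = quantize(coeffs, fine_table)
coarse = quantize(coeffs, coarse_table)
print(fine)                          # [[36, -3], [1, 0]]
print(coarse)                        # [[18, -1], [0, 0]]
print(dequantize(fine, fine_table))  # [[576, -33], [12, 0]]
```

Doubling the factors turns one more coefficient into zero, and the dequantized values show that the rounding error cannot be undone by the decoder.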

JPEG allows different quantization factors for different components. It follows that the ability to separate brightness and color in the YCbCr color scheme is an advantage for JPEG because the two qualities are perceived differently by the eye. Likewise, it is an advantage to maintain different quantization factors for brightness and chrominance values, because more chrominance can be lost without a perceptible difference to the viewer.

It is common to see a typical 8 x 8-pixel section of an image represented by two or three nonzero numbers in the dc and low-frequency ac coefficients, with the remaining quantized coefficients being zero. Such a sparse block lends itself to compression. However, the row-by-row matrix layout is not well suited to entropy encoding, because it scatters the zeros among the nonzero values. Since the quantized coefficients are more likely to be zero as spatial frequency increases, it is helpful to rearrange them in order of increasing spatial frequency, the zigzag sequence, so that the zeros form long runs at the end.
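The reordering by spatial frequency is usually called the zigzag scan, the name the Huffman discussion later in this article also uses. The sketch below generates the scan order by walking the anti-diagonals of the block; the function names and the sample block are assumptions for illustration.

```python
def zigzag_indices(n=8):
    """Return (row, column) pairs in zigzag order for an n x n block."""
    order = []
    for d in range(2 * n - 1):
        # All cells on anti-diagonal d satisfy row + column == d.
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        if d % 2 == 0:
            cells.reverse()
        order.extend(cells)
    return order

def zigzag(block):
    """Flatten a square block into a list in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]

# Hypothetical quantized block: three nonzero low-frequency values.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 36, -3, 1
sequence = zigzag(block)
print(sequence[:6])   # [36, -3, 1, 0, 0, 0]
```

After the scan, the three nonzero values sit at the front of the sequence and the remaining 61 zeros form one uninterrupted run.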

The last step of the baseline algorithm is entropy coding, which is designed to achieve additional compression losslessly by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. JPEG defines two entropy-coding methods: Huffman coding and arithmetic coding. For the baseline sequential codec, Huffman coding is used.

Huffman coding

Entropy encoding takes place in two steps. The first is to convert the zigzag sequence of quantized coefficients into an intermediate sequence of symbols, and the second is to convert those symbols into a data stream where they lose externally identifiable boundaries.
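The article does not spell out what the intermediate symbols look like. A common description of baseline JPEG pairs each nonzero ac coefficient with the count of zeros that precede it and the number of bits needed for its magnitude, with special symbols for a run of 16 zeros and for "end of block." The sketch below follows that description for the ac coefficients only; the function name is an assumption, and the dc coefficient, which is coded as a difference from the previous block, is left out.

```python
def ac_symbols(zz):
    """Turn 64 zigzag-ordered coefficients into run/size symbols.

    Returns a list of ((run, size), amplitude) entries for the 63 ac
    terms. (15, 0) stands for sixteen zeros and (0, 0) for end of
    block; both carry no amplitude.
    """
    symbols = []
    run = 0
    for coeff in zz[1:]:            # zz[0] is the dc term, skipped here
        if coeff == 0:
            run += 1
            continue
        while run > 15:             # runs longer than 15 need filler symbols
            symbols.append(((15, 0), None))
            run -= 16
        size = abs(coeff).bit_length()   # bits needed for the magnitude
        symbols.append(((run, size), coeff))
        run = 0
    if run > 0:                     # trailing zeros collapse into one symbol
        symbols.append(((0, 0), None))
    return symbols

# Hypothetical zigzag sequence with three nonzero ac values.
zz = [36, -3, 1, 0, 0, 2] + [0] * 58
print(ac_symbols(zz))
```

In this example the 58 trailing zeros cost a single end-of-block symbol, which is where much of the benefit of the zigzag ordering shows up.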

Arithmetic coding is superior to Huffman coding in terms of compression efficiency. However, its simplicity makes Huffman coding a better choice for baseline JPEG. Huffman coding calls for a fixed set of probabilities for each symbol prior to encoding. The codes are built using a tree structure, which is created by a sequence of pairing operations in which the two least-probable symbols are joined at a node. The process continues, with each new node carrying the combined probability of its children and being paired with the next least-probable symbol or node, until all symbols are in the tree. Each branch of the tree is assigned a 0 or a 1 bit. The code for each symbol is then read by starting at the root and collecting the bits along the branches that lead to that symbol.

This process yields a unique code for each symbol, with the more probable symbols receiving the shorter codes. Codes can be of different lengths, but no code is a prefix of another, so one can never be mistaken for another when read from the most significant bit to the least. This is important in JPEG coding because the codes are concatenated without any marker of code length. For decompression, the decoder uses an inverse of the encoding process.
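The tree construction can be sketched in a few lines with a priority queue. The symbol names and frequencies below are invented for the example, and the sketch ignores details a real JPEG encoder must respect, such as the limit on code length, so it should be read as an illustration of the pairing process rather than as a JPEG table generator.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a prefix code from a {symbol: frequency} mapping."""
    tiebreak = count()   # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Join the two least-probable entries under a new internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), [left, right]))
    codes = {}

    def walk(node, prefix):
        # Internal nodes are two-element lists; anything else is a symbol.
        if isinstance(node, list):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")   # read codes from the root down to each leaf
    return codes

def decode(bits, codes):
    """Split a concatenated bit string back into symbols."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical symbol statistics.
codes = huffman_codes({"EOB": 45, "0/1": 25, "0/2": 15, "1/1": 10, "ZRL": 5})
message = ["0/1", "EOB", "1/1", "EOB"]
bits = "".join(codes[s] for s in message)
print(codes)
print(bits, decode(bits, codes))
```

The decoder recovers the symbols from the unbroken bit string without any length markers, which works only because no code is a prefix of another.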

The increasingly fluid nature of the DSC market is making programmable DSP-based solutions an attractive design option. While older designs put core features in hardwired devices, DSP solutions enable designers to build advanced features and functionality into DSP software. Benefits include fast time-to-market, value-added functionality and simple product upgrades.

See related chart




