
Full text of "NASA Technical Reports Server (NTRS) 19940024872: Image-adapted visually weighted quantization matrices for digital image compression"



NASA CASE NO. ARC 12015-1 
PRINT FIG 1 

NOTICE 

The invention disclosed in this document resulted from research in aeronautical and 
space activities performed under programs of the National Aeronautics and Space 
Administration. The invention is owned by NASA and is, therefore, available for 
licensing in accordance with the NASA Patent Licensing Regulation (14 Code of 
Federal Regulations 1245.2). 

To encourage commercial utilization of NASA-owned inventions, it is NASA policy 
to grant licenses to commercial concerns. Although NASA encourages nonexclusive 
licensing to promote competition and achieve the widest possible utilization, NASA 
will consider the grant of a limited exclusive license, pursuant to the NASA Patent 
Licensing Regulations, when such a license will provide the necessary incentive to 
the licensee to achieve early practical application of the invention. 

Address inquiries and all applications for license for this invention to NASA/Ames
Research Center, Patent Counsel, Mail Stop 200-11, Moffett Field, CA 94035-1000.

Serial Number: 08/186,366

Filing Date: 1/25/94 

Patent No: 

Date Patent Issued: 

NASA/ARC 


(NASA-Case-ARC-12015-1) IMAGE-ADAPTED VISUALLY WEIGHTED
QUANTIZATION MATRICES FOR DIGITAL IMAGE COMPRESSION
Patent Application (NASA. Ames Research Center) 28 p

N94-29375
Unclas
G3/60 0004070





INVENTION ABSTRACT 

IMAGE-ADAPTED VISUALLY WEIGHTED QUANTIZATION MATRICES 
FOR DIGITAL IMAGE COMPRESSION 

The invention relates generally to the field of data 
compression of the digital information of digital images. 

The system 10 of the present invention is used for compression 
of digital images and has retrieval and storage modes of operation 
as most clearly shown in Fig. 1, and wherein the operation is 
determined by algorithms that control the digital compression and 
are most clearly shown in Figs. 2, 3 and 4. The operation of the 
system 10 generates a quantization matrix optimizer that is adapted
to the individual image being compressed so that the image that is
produced has high resolution and low perceptual error. The image
being compressed is customized based on luminance masking, contrast
masking, and error-pooling techniques.

The system 10 not only provides for a compressed digital
image, but also provides an image with high quality visual
representation. The system 10 generates image-dependent
quantization matrices that produce images that are vastly superior
to those of the prior art generated by image-independent matrices.
Furthermore, the system 10 provides a digital compressed image
having a minimum bit rate for a given perceptual error. The
techniques employed by the system 10 provide for a digital
compressed image that has a high definition and accomplish such
using the minimum amount of digital bits so as to ease the
infrastructure burden of a related computer network used in
advanced high resolution television, thereby assisting in the
commercial realization of high resolution television.


Inventor: Andrew B. Watson 

Serial No.: 08/186,366 

Filing Date: 1/25/94 


Employer: NASA Ames Research Center 





ORIGIN OF THE DISCLOSURE 

The invention described herein was made by an employee of the National
Aeronautics and Space Administration and it may be manufactured and used by and
for the United States Government for governmental purposes without the payment of
royalties thereon or therefor.


BACKGROUND OF THE INVENTION 


TECHNICAL FIELD OF THE INVENTION:

The present invention relates to an apparatus and method for coding images,
and more particularly, to an apparatus and method for compressing images to a
reduced number of bits by employing a Discrete Cosine Transform (DCT) in
combination with visual masking including luminance and contrast techniques as well
as error pooling techniques all to yield a quantization matrix optimizer that provides an
image having a minimum perceptual error for a given bit rate, or a minimum bit rate for
a given perceptual error.


DESCRIPTION OF THE PRIOR ART: 


Considerable research has been conducted in the field of data compression,
especially the compression of digital information of digital images. Digital images
comprise a rapidly growing segment of the digital information stored and
communicated by science, commerce, industry and government. Digital image
transmission has gained significant importance in highly advanced television systems,
such as high definition television using digital information. Because a relatively large
number of digital bits are required to represent digital images, a difficult burden is
placed on the infrastructure of the computer communication networks involved with the
creation, transmission and re-creation of digital images. For this reason, there is a
need to compress digital images to a smaller number of bits, by reducing redundancy
and invisible image components of the images themselves.

A system that performs image compression is disclosed in U.S. Patent
5,121,216 of C.E. Chen et al, issued June 9, 1992, and herein incorporated by
reference. The '216 patent discloses a transform coding algorithm for a still image,
wherein the image is divided into small blocks of pixels. For example, each block of
pixels may be either an 8 x 8 or 16 x 16 block. Each block of pixels then undergoes a
two dimensional transform to produce a two dimensional array of transform
coefficients. For still image coding applications, a Discrete Cosine Transform (DCT) is
utilized to provide the orthogonal transform.

In addition to the '216 patent, the Discrete Cosine Transform is also employed
in a number of current and future international standards, concerned with digital image
compression, commonly referred to as JPEG and MPEG, which are acronyms for Joint
Photographic Experts Group and Moving Picture Experts Group, respectively. After a
block of pixels of the '216 patent undergoes a Discrete Cosine Transform (DCT), the
resulting transform coefficients are subject to compression by thresholding and
quantization operations. Thresholding involves setting all coefficients whose
magnitude is smaller than a threshold value equal to zero, whereas quantization
involves scaling a coefficient by a step size and rounding off to the nearest integer.
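
The two operations just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the '216 patent or from any standard: the function names, the coefficient values, the threshold of 3.0 and the step size of 16 are all assumptions chosen for the example.

```python
def threshold(coeffs, limit):
    """Zero every coefficient whose magnitude is smaller than `limit`."""
    return [0.0 if abs(c) < limit else c for c in coeffs]


def quantize(coeffs, step):
    """Divide each coefficient by the step size and round to the nearest integer."""
    # Python's round() resolves exact .5 ties to the even integer.
    return [round(c / step) for c in coeffs]


# Made-up transform coefficients for one block (illustrative values only).
coeffs = [-415.4, 30.2, -61.2, 2.3, -0.7]
kept = threshold(coeffs, limit=3.0)    # the two small coefficients become 0.0
levels = quantize(kept, step=16.0)     # integers that would be passed on for coding
print(kept, levels)
```

A larger step size maps more coefficients to the same integer and so lowers the bit rate at the cost of a larger rounding error.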

Commonly, the quantization of each DCT coefficient is determined by an entry
in a quantization matrix. It is this matrix that is primarily responsible for the perceived
image quality and the bit rate of the transmission of the image. The perceived image
quality is important because the human visual system can tolerate a certain amount of
degradation of an image without being alerted to a noticeable error. Therefore, certain
images can be transmitted at a low bit rate, whereas other images cannot tolerate any
degradation and should be transmitted at a higher bit rate in order to preserve their
informational content.


The '216 patent discloses a method for the compression of image information
based on human visual sensitivity to quantization errors. In the method of the '216 patent,
there is a quantization characteristic associated with block to block components of an
image. This quantization characteristic is based on a busyness measurement of the
image. The method of the '216 patent does not compute a complete quantization matrix,
but rather only a single scalar quantizer.


Two other methods are available for computing DCT quantization matrices
based on human sensitivity. One is based on a mathematical formula for human
contrast sensitivity function, scaled for viewing distance and display resolution, and is
disclosed in U.S. Patent 4,780,761 of S.J. Daly et al. The second is based on a
formula for the visibility of individual DCT basis functions, as a function of viewing
distance, display resolution, and display luminance. The second formula is disclosed
in both a first article entitled "Luminance-Model-Based DCT Quantization For Color
Image Compression" of A.J. Ahumada et al., published in 1992 in Human Vision,
Visual Processing and Digital Display III, Proc. SPIE 1666, Paper 32, and a second
technical article entitled "An Improved Detection Model for DCT Coefficient
Quantization" of H.A. Peterson, et al., published in 1993 in Human Vision, Visual
Processing and Digital Display IV, Proc. SPIE, Vol. 1913, pages 191-201. The
methods described in the '761 patent and the two technical articles do not adapt the
quantization matrix to the image being compressed, and do not therefore take
advantage of masking techniques for quantization errors that utilize the image itself.
Each of these techniques has features and benefits described below.

First, visual thresholds increase with background luminance and this feature 
should be advantageously utilized. However, the formula given in both referenced
technical articles describes the threshold for DCT basis functions as a function of
mean luminance. This would normally be taken as the mean luminance of the 
display. However, variations in local mean luminance within the image will in fact 
produce substantial variations in the DCT threshold quantities. These variations are 
referred to herein as "luminance masking" and should be fully taken into account. 

Second, the visibility of a visual pattern is typically reduced in the presence of
other patterns, particularly those of similar spatial frequency and orientation. This 
reduction phenomenon is usually called "contrast masking." This means that a 
threshold error in a particular DCT coefficient in a particular block of the image will be 
a function of the value of that coefficient in the original image. The knowledge of this 
function should be taken advantage of in order to compress the image while not 
reducing the quality of the compressed image. 

Third, the method disclosed in the two referenced technical articles ensures that 
a single error is below a predetermined threshold. However, in a typical image there 
are many errors of varying magnitudes that are not properly handled by a single 
threshold quantity. The visibility of this error ensemble selected to handle all varying 
magnitudes is not generally equal to the visibility of the largest error, but rather reflects 
a pooling of errors over both frequencies and blocks of the image. This pooling is 
herein termed "error pooling" and is beneficial in compressing the digital information of
the image while not degrading the quality of the image. 



Fourth, when all errors are kept below a perceptual threshold, a certain bit rate
will result, but at times it may be desired to have an even lower bit rate. The two
referenced technical articles do not disclose any method that would yield a minimum
perceptual error for a given bit rate, or a minimum bit rate for a given perceptual error.
It is desired that such a method be provided to accommodate this need.

Finally, it is desired that all of the above prior art limitations and drawbacks be
eliminated so that a digital image may be represented by a reduced number of digital
bits while at the same time providing an image having a low perceptual error.


Accordingly, an object of the present invention is to provide a method to 
compress digital information yet provide a visually optimized image. 

Another object of the present invention is to provide a method of compressing a
visual image based on luminance masking, contrast masking, and error pooling
techniques.

A further object of the present invention is to provide a quantization matrix that is
adapted to the individual image being compressed so that the image that is
reproduced has a high resolution and a low perceptual error.

A still further object of the present invention is to provide a method that yields 
minimal perceptual error of an image for a given bit rate, or a minimum bit rate for a 
given perceptual error of the image. 

SUMMARY OF THE INVENTION


The invention is directed to digital compression of images, comprising a 
plurality of blocks of pixels, that uses the DCT transform coefficients yielded from a 
Discrete Cosine Transform (DCT) of all the blocks as well as other display and 
perceptual parameters all to generate a quantization matrix which, in turn, yields a 
reproduced image having a low perceptual error. The invention adapts or customizes 
the individual quantization matrix to the image being compressed. 


The present invention transforms a block of pixels from an electronic image into
a digital representation of that image and comprises the steps of applying a Discrete
Cosine Transform (DCT), selecting a DCT mask (m_ijk) for each block of pixels, and
selecting a quantization matrix (q_ij) for quantizing DCT transformation coefficients (c_ijk)
produced by the DCT transformation. The application of a Discrete Cosine Transform
(DCT) transforms the block of pixels into a digital signal represented by the DCT
coefficients (c_ijk). The DCT mask is based on parameters comprising DCT coefficients
(c_ijk) and display parameters. The selection of the quantization matrix (q_ij) comprises
the steps of: (i) selecting an initial value of q_ij; (ii) quantizing the DCT coefficient c_ijk in
each block k to form quantized coefficient u_ijk; (iii) inverse quantizing u_ijk by
multiplying by q_ij; (iv) subtracting the reconstructed coefficient q_ij u_ijk from c_ijk to
compute the quantization error e_ijk; (v) dividing e_ijk by the DCT mask m_ijk to obtain
perceptual errors; (vi) pooling the perceptual errors of one frequency ij over all blocks
k to obtain an entry in a perceptual error matrix p_ij; (vii) repeating this process
(i)-(vi) for each frequency ij; and (viii) adjusting the values of q_ij up or down until each
entry in the perceptual error matrix p_ij is within a target range.

The method preferably comprises a further step of entropy coding the digital 
representation of the image. In addition, the invention further comprises providing a 
computer network for implementing the practice of the present invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 


Fig. 1 is a block diagram of a computer network that may be used in the practice 
of the present invention. 

Fig. 2 schematically illustrates some of the steps involved with the method of the
present invention.

Fig. 3 schematically illustrates the steps involved, in one embodiment, with the
formation of the quantization matrix optimizer of the present invention.

Fig. 4 schematically illustrates the steps involved, in another embodiment, with
the formation of the quantization matrix optimizer of the present invention.

Fig. 5 illustrates a series of plots showing the variations of the luminance
patterns of a digital image.

Fig. 6 is a plot of the contrast masking function involved in the practice of the
present invention.

Fig. 7 illustrates two plots each of a different digital image and each showing the
relationship between the perceptual error and the bit rate involved in image
compression.


DETAILED DESCRIPTION OF THE INVENTION 

Referring now to the drawings wherein like reference numerals designate like
elements, there is shown in Fig. 1 a block diagram of a computer network 10 that may
be used in the practice of the present invention. The network 10 is particularly suited
for performing the method of the present invention related to a still image that may be
stored, retrieved or transmitted. For the embodiment shown in Fig. 1, a first group of
channelized equipment 12 and a second group of channelized equipment 14 are
provided. Further, as will be further described, for the embodiment shown in Fig. 1, the
channelized equipment 12 is used to perform the storage mode 16/retrieval mode 18
operations of the network 10 and, similarly, the channelized equipment 14 is used to
perform the storage mode 16/retrieval mode 18 operations of the network 10. As will
be further described, the storage mode 16 is shown as accessing each disk subsystem
20, whereas the retrieval mode 18 is shown as recovering information from each disk
subsystem 20. Each of the channelized equipments 12 and 14 may be a SUN SPARC
computer station whose operation is disclosed in instruction manual Sun
Microsystems Part # 800-5701-10. Each of the channelized equipments 12 and 14 is
comprised of elements having the reference numbers given in Table 1.



TABLE 1

Reference No.    Element

20               Disk Subsystem
22               Communication Channel
24               CPU Processor
26               Random Access Memory (RAM)
28               Display Subsystem


In general and as to be more fully described, the method of the present
invention, being run in the network 10, utilizes, in part, a Discrete Cosine Transform
(DCT), discussed in the "Background" section, to accomplish image compression. In
the storage mode 16, an original image 30, represented by a [...], is
received from a scanner or other source at the communication channel 22 of the
channelized equipment 12. The image 30 is [...] data. The channelized equipment 12,
in particular [...], performs a DCT transformation, computes a DCT mask and estimates
a quantization matrix optimizer. The channelized equipment 12 then quantizes the digital bits
comprising the image 30, and performs run-length encoding and Huffman or arithmetic
coding of the quantized DCT coefficients. Run-length encoding, arithmetic coding and
Huffman coding are well-known and [...] herein [...]. The quantization matrix
is then stored in coded form along with coded coefficient data, following a JPEG
or other standard. The compressed file is then stored on the disk subsystem 20 of the
channelized equipment 12.

In the retrieval mode 18, the channelized equipment 12 (or 14) retrieves the
compressed file from the disk subsystem 20, and decodes the quantization matrix and
the DCT coefficient data. The channelized equipment 12, or the channelized
equipment 14 to be described, then de-quantizes the coefficients by multiplication of
the quantization matrix and performs an inverse DCT. The resulting digital file
containing pixel data is available for display on the display subsystem of the
channelized equipment 12 or can be transmitted to the channelized equipment 14 or
elsewhere by the communication channel 22. The resulting digital file is shown in Fig.
1 as 30' (IMAGE). The operation of the present invention may be further described
with reference to Fig. 2.





Fig. 2 is primarily segmented to illustrate the storage mode 16 and the retrieval
mode 18. Fig. 2 illustrates that the storage mode 16 is accomplished in channelized
equipment, such as channelized equipment 12, and the retrieval mode 18 is
accomplished in the same or another channelized equipment, such as channelized
equipment 14. The channelized equipments 12 and 14 are interfaced to each other by
the communication channel 22. The image 30 being compressed by the operation of
the present invention comprises a two-dimensional array of pixels, e.g., 256 x 256
pixels. This array of pixels is composed of contiguous blocks, e.g., 8 x 8 blocks of pixels
representatively shown in segment 32. The storage mode 16 is segmented into the
following steps: block 32, DCT 34, initial matrix 35, quantization matrix optimizer 36,
quantize 38, and entropy code 40. The retrieval mode 18 is segmented into the
following steps: entropy decode 42, de-quantize 44, inverse DCT 46, and un-block 48.
The steps shown in Fig. 2 (to be further discussed with reference to Fig. 3) are
associated with the image compression of the present invention and, in order to more
clearly describe such compression, reference is first made to the quantities listed in
Table 2 having a general definition given therein.


TABLE 2

Quantities               General Definition

i, j                     indexes of the DCT frequency (or basis function)
k                        index of a block of the image
c_ijk                    DCT coefficients of an image
q_ij                     quantization matrix
u_ijk                    quantized DCT coefficients
e_ijk                    DCT error
t_ij                     DCT threshold matrix (based on global mean luminance)
apw[i,j,L,px,py,...]     threshold formula of Peterson et al. given in the article
                         Human Vision, Visual Processing and Digital Display IV
                         (previously cited)
t_ijk                    DCT threshold matrix (based on local mean luminance c_00k)
a_t                      luminance masking exponent
w_ij                     contrast masking exponent (Weber exponent)
m_ijk                    DCT mask
d_ijk                    perceptual error in a particular frequency i, j and block k
p_ij                     perceptual error matrix
p_s                      spatial error-pooling exponent
c_00k                    DC coefficient in block k
L_0                      mean luminance of the display
c̄_00                     average DC coefficient, corresponding to L_0 (typically 1024)
ψ                        target total perceptual error value


Each block (step 32) of the pixels is subjected to the application of a Discrete
Cosine Transform (DCT) (step 34) yielding related DCT coefficients. The two-
dimensional Discrete Cosine Transform (DCT) is well known and may be such as
described in the previously incorporated by reference U.S. Patent 5,121,216. The
coefficients of the DCT, herein termed c_ijk, obtained by the Discrete Cosine Transform
(DCT) of each block of pixels comprise DC and AC components. The DC coefficient is
herein termed c_00k (0,0), which represents the average intensity of the block. The
remainder of the coefficients c_ijk are termed AC coefficients (0,1), (1,0), ... (i,j).
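
As an illustration of the DC and AC components, the following sketch computes a two-dimensional DCT of one 8 x 8 block in plain Python. The orthonormal DCT-II scaling and the name `dct_2d` are assumptions made for this example; the text itself defers to the '216 patent for the transform definition.

```python
import math

N = 8  # block size assumed here; the text also allows 16 x 16 blocks


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block given as a list of rows."""
    def alpha(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for i in range(N):          # vertical frequency index
        for j in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * i * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * j * math.pi / (2 * N)))
            coeffs[i][j] = alpha(i) * alpha(j) * total
    return coeffs


# A uniform mid-grey block: all energy lands in the DC coefficient.
flat = [[128] * N for _ in range(N)]
c = dct_2d(flat)
print(round(c[0][0], 6))  # N times the block mean under this scaling
```

Under this assumed scaling the DC coefficient equals 8 times the block mean, so a mid-grey level of 128 gives 1024, which agrees with the typical average DC coefficient listed in Table 2.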

The DCT (step 34) of all blocks (step 32), along with the display and perceptual
parameters (to be described) and an initial matrix, are all inputted into a quantization
matrix optimizer 36, which is a process that creates an optimized quantization matrix
which is used to quantize (step 38) the DCT coefficients. The optimized quantization
matrix is also transferred, by the communication channel 22 of the channelized
equipment 12, for its use in the retrieval mode 18 that is accomplished in the
channelized equipment 14. The quantized DCT coefficients (c_ijk) are entropy coded
(step 40) and then sent to the communication channel 22. Entropy coding is well-
known in the communication art and is a technique wherein the amount of information
in a message is based on log n, where n is the number of possible equivalent
messages contained in such information.

At the receiving channelized equipment 14, an inverse process occurs to
reconstruct the original block of pixels. Thus, the received bit stream of digital
information containing quantized DCT coefficients c_ijk is entropy decoded (step 42)
and then de-quantized (step 44), such as by multiplying by the quantization step
size q_ij to be described. An inverse transform, such as an inverse Discrete Cosine
Transform (DCT), is then applied to the DCT coefficients (c_ijk) to reconstruct the block
of pixels. After the reconstruction, the blocks of pixels are unblocked so as to provide a
reconstituted and reconstructed image 30'. The quantization matrix optimizer 36 is of
particular importance to the present invention and may be described with reference to
Fig. 3.

The quantization matrix optimizer 36 is adapted to the particular image being
compressed and, as will be further described, advantageously includes the functions
of luminance masking, contrast masking, error pooling and selectable quality, all of
which cooperate to yield a compressed image having a minimal perceptual
error for a given bit rate, or minimum bit rate for a given perceptual error. The
quantization matrix optimizer 36, in one embodiment, comprises a plurality of
processing segments each having a reference number and nomenclature given in
Table 3.

TABLE 3

Processing Segment    Nomenclature

50                    Compute visual thresholds
52                    Adjust thresholds in each block for block mean luminance
54                    Adjust thresholds in each block for component contrast
56                    Quantize
58                    Compute quantization error
60                    Scale quantization error by DCT mask
62                    Pool error over blocks
64                    Pooled error matrix is approximate target error
66                    Adjust quantization matrix


The first step in the generation of the quantization matrix optimizer 36 is the
derivation of a function DCT mask 70 which is accomplished by the operation of
processing segments 50, 52 and 54 and is determined, in part, by the display and
perceptual parameters 72 having typical values given in Table 4 below.


TABLE 4

Display and Perceptual
Parameters               Typical Values

a_t                      0.649
p_s                      4
w_ij                     0.7
L_0                      65 cd/m^2
image grey levels        256
c̄_00                     1024
viewing distance         assumed to yield 32 pixels/degree, and
                         for a 256 by 256 pixel image, this
                         corresponds to a viewing distance of
                         7.115 picture heights



The display and perceptual parameters 72 are used to compute a matrix of DCT
component visual thresholds by using the formula more fully described in the
previously referenced first and second technical articles and which formula may be
represented by expression 1:

$$ t_{ij} = \mathrm{apw}[i, j, L, p_x, p_y, \ldots] \tag{1} $$

where apw represents the threshold formula of Table 2, i and j are indexes of the DCT
frequency, px represents pixels per degree of visual angle horizontal and py
represents pixels per degree of visual angle vertical.

The visual threshold values of expression (1) are then adjusted for mean block
luminance in processing segment 52. The processing segment 52 receives only the
DC coefficient of the DCT coefficients indicated by reference number 74, whereas
segment 54 receives and uses the entire DCT coefficients. The formula used to
accomplish processing segment 52 is given by expression 2:

$$ t_{ijk} = t_{ij} \left( c_{00k} / \bar{c}_{00} \right)^{a_t} \tag{2} $$

where a_t is a luminance-masking exponent having a typical value of 0.65, t_ijk is the
adjusted threshold, t_ij is the un-adjusted threshold, c̄_00 is the average of the DC terms
of the DCT for the present image, or may be simply a nominal value of 1024
for an eight (8) bit image, and c_00k is the DC term of the DCT for block k.
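
The luminance adjustment of expression 2 amounts to a single power-law scaling. The sketch below is a hedged illustration: the function name and the sample threshold of 2.0 are assumptions, while the default exponent 0.649 and the nominal average DC value 1024 are the typical values quoted in Table 4.

```python
def luminance_adjusted_threshold(t_ij, c_00k, c_00_mean=1024.0, a_t=0.649):
    """Scale the global threshold t_ij by the block's DC term relative to the mean DC term.

    c_00k must be positive for the fractional power to be defined.
    """
    return t_ij * (c_00k / c_00_mean) ** a_t


t = 2.0  # hypothetical un-adjusted threshold for one DCT frequency
for dc in (512.0, 1024.0, 2048.0):
    print(dc, luminance_adjusted_threshold(t, dc))
```

A block brighter than average (DC term above the mean) gets a higher threshold and so tolerates more quantization error; a darker block gets a lower one.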


As seen in Fig. 3, the luminance-adjusted thresholds of segment 52 are then
adjusted for component contrast by the operation of a routine having a relationship as
given by expression 3 below:

$$ m_{ijk} = \max\left[ t_{ijk},\; |c_{ijk}|^{w_{ij}} \, t_{ijk}^{\,1 - w_{ij}} \right] \tag{3} $$

where m_ijk is the contrast-adjusted threshold, c_ijk is the DCT coefficient, t_ijk is the
corresponding threshold of expression 2, and w_ij is the exponent that lies between 0
and 1 and typically has a value of 0.7. Because the exponent w_ij may differ for different
frequencies of the DCT coefficients c_ijk, a matrix of exponents equal in size to the
quantization matrix may be provided for the derivation of w_ij. The result of
the operations of processing segments 50, 52, and 54 is the derivation of the quantity
m_ijk, herein termed "DCT mask" 70, which is supplied to the processing segment 60 to
be described hereinafter.
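
A minimal sketch of the contrast-masking step follows, assuming the form of expression 3 in which the mask is the larger of the luminance-adjusted threshold and a power-law blend of coefficient magnitude and threshold. The function name and sample values are hypothetical; the default exponent 0.7 is the typical value given in the text.

```python
def dct_mask(c_ijk, t_ijk, w_ij=0.7):
    """Contrast-adjusted threshold: never below t_ijk, and growing with |c_ijk|."""
    return max(t_ijk, abs(c_ijk) ** w_ij * t_ijk ** (1.0 - w_ij))


t = 2.0  # hypothetical luminance-adjusted threshold
for c in (0.0, 1.0, 2.0, 32.0, -32.0):
    print(c, dct_mask(c, t))
```

Coefficients smaller in magnitude than the threshold leave it unchanged, while large coefficients raise it, so more quantization error is tolerated where the image itself has strong content at that frequency. With an exponent of 0 the mask reduces to the unmasked threshold.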





After the calculation of the DCT mask 70 has been determined, an iterative
process of estimating the quantization matrix optimizer 36 begins and is comprised of
processing segments 56, 58, 60, 62, 64, and 66. The initial matrix 35, which is
typically fixed and which may be any proper quantization matrix, is typically set to a
maximum permissible quantization matrix entry (e.g., in the JPEG standard this
maximum value is equal to 255) and is used in the quantization of the image as
indicated in processing segment 56.

Each transformed block of the image contained in the initial matrix 35 is then
quantized in segment 56 by dividing it, coefficient by coefficient, by the quantization
matrix (q_ij), and is rounded to the nearest integer as shown in expression 4.

$$ u_{ijk} = \mathrm{Round}[c_{ijk} / q_{ij}] \tag{4} $$

Segment 58 then computes the quantization error e_ijk in the DCT domain,
which is equal to the difference between the de-quantized and un-quantized DCT
coefficients c_ijk, and is shown by expression 5.

$$ e_{ijk} = c_{ijk} - u_{ijk} q_{ij} \tag{5} $$

From expressions 4 and 5, it may be shown that the maximum possible
quantization error e_ijk is q_ij/2.
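
Expressions 4 and 5 and the bound of half a step size can be checked with a short sketch. The function names and sample values are assumptions for illustration only.

```python
def quantize(c_ijk, q_ij):
    """Expression 4: divide by the step size and round to the nearest integer."""
    return round(c_ijk / q_ij)


def quantization_error(c_ijk, q_ij):
    """Expression 5: original coefficient minus the de-quantized value."""
    return c_ijk - quantize(c_ijk, q_ij) * q_ij


q = 16
for c in (37.0, 40.0, -61.2):
    print(c, quantize(c, q), quantization_error(c, q))
```

The coefficient 40.0 sits exactly halfway between two multiples of 16, so its error has the largest possible magnitude, 8, which is the step size divided by 2.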

The output of segment 58 is then applied to segment 60, wherein the
quantization error is scaled (divided) by the value of the DCT mask 70. This scaling is
described by expression 6:

$$ d_{ijk} = e_{ijk} / m_{ijk} \tag{6} $$

where d_ijk is defined as the perceptual error at frequency ij in block k. The scaled
quantization error is then applied to the processing segment 62. The processing
segment 62 causes all the scaled errors to be pooled over all of the blocks, separately
for each DCT frequency (ij). The term "error pooling" is meant to represent that the
errors are combined over all of the DCT coefficients rather than having one relatively
large error in one DCT coefficient dominating the other errors in the remaining DCT
coefficients. The pooling is accomplished by a routine having a relationship of
expression 7:

$$ p_{ij} = \left( \sum_{k} |d_{ijk}|^{p_s} \right)^{1/p_s} \tag{7} $$

where d_ijk is an error in a particular frequency i, j, and block k, and p_s is a pooling
exponent having a typical value of 4. It is allowed that the routine of expression (7)
provide a matrix of exponents p_s since the pooling of errors may vary for different DCT
coefficients.
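
The scaling and pooling steps can be sketched as follows, assuming the pooling rule is the power-sum (Minkowski) form with exponent p_s. The function names and the sample errors and masks are made up for illustration.

```python
def perceptual_errors(errors, masks):
    """Divide each quantization error by its DCT mask entry (the scaling step)."""
    return [e / m for e, m in zip(errors, masks)]


def pool(d_values, p_s=4.0):
    """Pool the perceptual errors of one frequency over all blocks."""
    return sum(abs(d) ** p_s for d in d_values) ** (1.0 / p_s)


# Quantization errors and masks for one frequency in four hypothetical blocks.
errors = [4.0, -2.0, 2.0, 1.0]
masks = [4.0, 4.0, 4.0, 4.0]
d = perceptual_errors(errors, masks)
print(d, pool(d))
```

With an exponent of 4 the pooled value lies only slightly above the largest individual error, yet it still grows when many smaller errors are present. An exponent of 1 would sum the magnitudes, and a very large exponent would approach the maximum.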

The matrix p_ij of expression (7) is the "perceptual error matrix" and is a simple
measure of the visibility of artifacts within each of the frequency bands defined by the
DCT basis functions. More particularly, the perceptual error matrix is a good indication
of whether or not the human eye can perceive a dilution of the image that is being
compressed. The perceptual error matrix p_ij developed by segments 56, 58, 60 and,
finally, segment 62 is applied to processing segment 64.

In processing segment 64, each element of the perceptual error matrix p_ij is
compared to a target error parameter ψ, which specifies a global perceptual quality of
the compressed image. This global quality is somewhat like the entries in the
perceptual error matrix and again is a good indication of the amount of degradation
that the compressed image may suffer without being perceived by the human eye. If all
quantities or errors generated by segment 62 and entered into segment 64 are within
a delta of ψ, or if the errors of segment 62 are less than the target error parameter ψ
and the corresponding quantization matrix entry is at a maximum (processing segment
56), the search is terminated and the current element of quantization matrix is
outputted to comprise an element of the final quantization matrix 78. Otherwise, if the
element of the perceptual error matrix is less than the target parameter ψ, the
corresponding entry (segment 56) of the quantization matrix is incremented.
Conversely, if the element of the perceptual error matrix is greater than the target
parameter ψ, the corresponding entry (segment 56) of the quantization matrix is
decremented. The incrementing and decrementing is accomplished by processing
segment 66.

A bisection method, performed in segment 66, is typically used to determine
whether to increment or decrement the initial matrix 35 entered into step 56. In the
bisection method a range is established for q_ij between lower and upper bounds,
typically 1 and 255 respectively. The perceptual error matrix p_ij is evaluated at the
mid-point of the range. If p_ij is greater than the target error parameter ψ, then the
upper bound is reset to the mid-point (the entry must be decremented, as described for
segment 64); otherwise the lower bound is reset to the mid-point.
This procedure is repeated until the mid-point no longer changes. As a practical
matter, since the quantization matrix entries q_ij in the baseline JPEG standard are
eight bit integers, the needed degree of accuracy is normally obtained in nine
iterations from a starting range of 1-255 (initial entry into segment 56). The output of
the program segment 66 is applied to the quantize segment 56 and then steps 56 - 66
are repeated, if necessary, for the remaining elements in the initial matrix. The
processing segments shown in Fig. 3 yield a compressed image with a resulting bit
rate; however, if a particular bit rate is desired for the image, then the processing
segments shown in Fig. 4 and given in the below Table 5 are applicable.
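The bisection search of segment 66 may be sketched as follows for a single
quantization matrix entry. This is a minimal illustration and not the program of the
invention: the callable perceptual_error is a hypothetical stand-in for segments 56
through 62, and it is assumed that the perceptual error does not decrease as the entry
q grows, so that an error above the target moves the upper bound down.

```python
def bisect_quantizer(perceptual_error, psi, lo=1, hi=255):
    """Integer bisection for one quantization entry q in [lo, hi].

    perceptual_error(q) is a hypothetical callable standing in for
    segments 56-62; it is assumed to be non-decreasing in q.
    Returns the largest q whose error does not exceed psi, or lo if
    even the smallest entry exceeds the target.
    """
    if perceptual_error(hi) <= psi:
        return hi              # entry at its maximum and still below target
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if perceptual_error(mid) > psi:
            hi = mid           # error too large: search smaller entries
        else:
            lo = mid           # error acceptable: try larger entries
    return lo


# Toy error model (an assumption for illustration only).
q = bisect_quantizer(lambda q: q / 40.0, psi=1.0)
print(q)
```

With the toy model above the search settles on the largest entry whose error still
meets the target, after roughly eight halvings of the 1-255 range.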

TABLE 5

Processing Segment    Nomenclature

80                    Select desired bit rate
82                    Set initial target perceptual error
84                    Optimize quantization matrix (56, 58, 60, 62, 64 and 66)
86                    Quantize
88                    Entropy code
90                    Decision box (Is the bit rate = desired bit rate?)
92                    Adjust target perceptual error

The processing segments 86-92, shown in Fig. 4 and given in Table 5, allow for
the attainment of a particular bit rate by means of a second, higher-order optimization.
If the first optimization results in a bit rate which is greater than desired, the
value of the target perceptual error parameter ψ of segment 92 is incremented.
Conversely, if a bit rate results which is lower than desired, the value of the target
perceptual error parameter ψ of segment 92 is decremented.

The sequence of Fig. 4 starts with the selection (segment 80) of the desired bit
rate followed by the setting (segment 82) of the initial target perceptual error. The
output of segment 82, as well as the output of segment 92, is applied to segment 84
which comprises segments 56, 58, 60, 62, 64 and 66, all previously described with
reference to Fig. 3 and all of which contribute to provide an optimized quantization
matrix in a manner also described with reference to Fig. 3. The output of segment 84
is applied to the quantize segment 86 which operates in a similar manner as
described for the quantize segment 38 of Fig. 2. The output of segment 86 is applied
to the entropy code segment 88 which operates in a similar manner as described for
the entropy code 40 of Fig. 2.





To accomplish the adjustment of the bit rate, the output of processing segment 88
is applied to a decision segment 90 in which the actual bit rate is compared against
the desired bit rate, and the result of such comparison determines the described
incrementing or decrementing of the target perceptual error parameter ψ. After such
incrementing or decrementing, the processing steps 84 - 90 are repeated until the actual
bit rate is equal to the desired bit rate, and the final quantization matrix 78 is created.
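The loop of segments 84 through 92 may be sketched as below. This is an
illustration under stated assumptions rather than the disclosed program: bit_rate_for
is a hypothetical callable standing in for segments 84, 86 and 88 (optimize, quantize,
entropy code), the bit rate is assumed to decrease as ψ increases, and the step-halving
rule and the tolerance tol are choices made here so that the loop terminates.

```python
def search_target_error(bit_rate_for, desired_rate, psi=1.0, step=0.5,
                        tol=0.01, max_iter=100):
    """Adjust psi until bit_rate_for(psi) is within tol of desired_rate.

    bit_rate_for(psi) is a hypothetical stand-in for segments 84-88 and
    is assumed to be a decreasing function of psi.
    """
    last_direction = 0
    for _ in range(max_iter):
        rate = bit_rate_for(psi)              # segments 84, 86, 88
        if abs(rate - desired_rate) <= tol:   # decision segment 90
            return psi
        # Segment 92: too many bits -> increment psi, too few -> decrement.
        direction = 1 if rate > desired_rate else -1
        if last_direction and direction != last_direction:
            step *= 0.5                       # overshoot: take smaller steps
        psi += direction * step
        last_direction = direction
    return psi


# Toy bit-rate model in bits/pixel (an assumption for illustration only).
toy_rate = lambda psi: 2.0 / (1.0 + psi)
psi = search_target_error(toy_rate, desired_rate=0.5)
print(psi, toy_rate(psi))
```

Exact equality of the two bit rates is rarely reachable with integer quantization
entries, which is why the sketch stops within a tolerance.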

It should now be appreciated that the practice of the present invention provides
for a quantization matrix 78 that yields minimum perceptual error for a given bit rate
or a minimum bit rate for a given perceptual error. The present invention, as already
discussed, provides for visual masking by luminance and contrast techniques as well
as by error pooling. The luminance masking feature may be further described with
reference to Fig. 5.

Fig. 5 has a Y axis given in the log function of t_ij of expression (1) and an X axis
given in display luminance L measured in cd/m². The quantity t_ij of the block of data
shown in Fig. 5 is based on a maximum display luminance L of 100 cd/m² and a grey
scale (reference scale for use in black-and-white television, consisting of several
defined levels of brightness with neutral color) resolution of eight (8) bits.

Two families of curves are shown in Fig. 5 with one being 94A, 94B, 94C, 94D
and 94E (shown in solid representations), and the other being 96A, 96B, 96C, 96D
and 96E (shown in phantom representations). The families 94A..94E and 96A..96E
are plots for the DCT coefficient frequencies given in Table 6.

TABLE 6

Plots          Frequency

94A and 96A    7,7
94B and 96B    0,7
94C and 96C    0,0
94D and 96D    0,3
94E and 96E    0,1


Detection threshold (t_ij) for a luminance pattern typically depends upon mean
luminance from the local image region. More particularly, the higher the background
luminance of the image being displayed, the higher the luminance threshold. This is
usually called "light adaptation" but it is called herein "luminance masking."





Fig. 5 illustrates this effect, whereby higher background luminance yields higher
luminance thresholds, wherein the solid plots 94A..94E indicate this
interdependency. In particular, it is seen that the value of t_ij for each of the plots
94A..94E increases with increasing values of luminance. The plots 94A..94E
illustrate that variations of as much as 0.5 log units in t_ij might be expected to occur
within an image, due to variations in the mean luminance of a block. The present
invention takes this variation into account, whereas known prior art techniques fail to
consider this wide variation.

The effect of mean luminance upon DCT coefficients of the quantization matrix
q_ij is complex, involving both vertical and horizontal shifts of the contrast sensitivity
function. The luminance-masked threshold may be determined by equation 8:

    t_{ijk} = \mathrm{apw}[i, j, L_0 c_{00k} / c_{00}]        (8)

where c_00k is the DC coefficient of the DCT for block k, L_0 is the mean luminance of
the display, and c_00 is the DC coefficient corresponding to L_0 (1024 for an eight (8) bit
image). The solution is as complete and accurate as the underlying formula, but may
be rather expensive to compute. For example, in the "Mathematica" language, using a
compiled function, and running on channelized equipment 14 (SUN SPARC 2) of Fig.
1, it took about 1 second per block to compute this function. A second, simpler solution
is to approximate the dependence of t_ij upon c_00k with the power function of the
equation (2) previously given.

In practice, the initial calculation of t_ij should be made assuming a selected
displayed luminance L_0. The parameter a_t has a typical value of 0.649. It should be
noted that luminance masking may be suppressed by setting a_t equal to 0. More
generally, a_t controls the degree to which the masking of Fig. 4 occurs. It should be
further noted that the power function given in expression 2 makes it easy to
incorporate a non-unity display Gamma, by multiplying a_t by the Gamma exponent
having a typical value of 2.3.
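The power-function approximation may be sketched as follows. The sketch assumes
that expression (2) takes the form t_ijk = t_ij (c_00k / c_00)^a_t, which is consistent
with the statements above that a_t = 0 suppresses the masking and that a display
Gamma is incorporated by multiplying a_t by the Gamma exponent. The function name
and argument names are hypothetical.

```python
def luminance_masked_threshold(t_ij, c_00k, c_00=1024.0, a_t=0.649, gamma=1.0):
    """Approximate luminance-masked threshold for block k.

    Assumed form: t_ijk = t_ij * (c_00k / c_00) ** (a_t * gamma), where
    c_00k is the (positive) DC coefficient of block k and c_00 is the DC
    coefficient of the mean display luminance (1024 for an 8-bit image).
    """
    return t_ij * (c_00k / c_00) ** (a_t * gamma)


# A block twice as bright as the display mean gets a higher threshold.
print(luminance_masked_threshold(2.0, 2048.0))
# Setting a_t to 0 suppresses luminance masking: the threshold is unchanged.
print(luminance_masked_threshold(2.0, 2048.0, a_t=0.0))
# A non-unity display Gamma scales the exponent.
print(luminance_masked_threshold(2.0, 2048.0, gamma=2.3))
```

Because the correction is a single power per block, it is far cheaper than evaluating
the full threshold formula of equation 8 for every block.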

The family of plots 96A..96E of Fig. 5 vary in accordance with the relationship
of expression 2 and are relatively accurate for luminance values above about 10 cd/m².
Except for very dark sections of an image, this range should be more than adequate
for most image compressions. The discrepancy or inaccuracy of plots 96A..96E is
also greatest at lowest frequency, especially at the DC term (0,0) c_00k. This
discrepancy could be corrected by adopting a matrix of exponents, one for each
frequency for the relationship given in expression (2).

It should now be appreciated that the practice of the present invention provides
luminance masking (shown in Fig. 5 and performed in processing segment 52 of Fig.
3) which allows for an improved quality of the compressed image so that it may be
more clearly reproduced and seen by the human eye.

As previously discussed with reference to processing segment 54 of Fig. 3, the
present invention also provides for contrast masking. Contrast masking refers to the
reduction in the visibility of one image component by the presence of another. This
masking is strongest when both components are of the same spatial frequency,
orientation, and location within the digital image being compressed. Contrast masking
is achieved in the present invention by expression (3) as previously described. The
benefits of the contrast masking function are illustrated in Fig. 6.

Fig. 6 has a Y axis given in the quantity m_ijk (DCT mask) and an X axis given in
the quantity c_ijk (DCT coefficient) and illustrates a response plot 98 of the DCT mask
m_ijk as a function of the DCT coefficient c_ijk for the parameters w_ij = 0.7 and t_ijk = 2.
Because the effect of the DC coefficient c_00k upon the luminance masking (see Fig. 5)
has already been expressed, the plot 98 does not include an effect of the DC
coefficient c_00k and accomplishes such by setting the value of w_00 equal to 0. From
Fig. 6 it may be seen that the DCT mask (m_ijk) increases linearly for c_ijk quantities
between about 2 and 10. This DCT mask (m_ijk) generated by processing segment 54
adjusts each block for component contrast and is used in processing segment 60 to
scale (divide) the quantization error, with both functions of the DCT mask ensuring
good digital compression while still providing an image having good visual aspects.
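The behavior of plot 98 may be sketched as below. Expression (3) is not repeated
in this passage, so the sketch assumes the form
m_ijk = max(t_ijk, |c_ijk|^w_ij * t_ijk^(1 - w_ij)), which reproduces the behavior
described for Fig. 6: the mask stays at the threshold for small coefficients, rises for
larger ones, and is unaffected by the coefficient when w_ij is 0. The function names
are hypothetical.

```python
def contrast_mask(c_ijk, t_ijk, w_ij=0.7):
    """DCT mask for one coefficient (assumed form of expression (3)).

    m_ijk = max(t_ijk, |c_ijk| ** w_ij * t_ijk ** (1 - w_ij))
    """
    return max(t_ijk, abs(c_ijk) ** w_ij * t_ijk ** (1.0 - w_ij))


def masked_error(quantization_error, c_ijk, t_ijk, w_ij=0.7):
    """Scale (divide) a quantization error by the mask, as in segment 60."""
    return quantization_error / contrast_mask(c_ijk, t_ijk, w_ij)


# Parameters of plot 98: w_ij = 0.7 and t_ijk = 2.
for c in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(c, contrast_mask(c, 2.0))

# Setting w to 0, as is done for the DC term, removes the dependence on c.
print(contrast_mask(100.0, 2.0, w_ij=0.0))
```

Under this assumed form a large coefficient raises its own mask, so the same
quantization error counts for less perceptually in busy blocks.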

It should be appreciated that the practice of the invention provides contrast
masking so as to provide for a high quality visual representation of compressed digital
images as compared to other prior art techniques.

The overall operation of the present invention is essentially illustrated in Fig. 7.
Fig. 7 has a Y axis given in bits/pixel of the digital image and an X axis given in
perceptual error. Fig. 7 illustrates two plots 100 and 102 for two different images that
were compressed and reconstituted in accordance with the hereinbefore described
principles of the present invention. From Fig. 7 it is seen that the increasing bits/pixel
rate causes a decrease in the perceptual error.





The description given above yields a desired quantization matrix q_ij with a
specified perceptual error ψ. However, if desired, one may instead obtain a
quantization matrix q_ij that uses a given bit rate h_0 with a minimum perceptual error ψ.
This can be done iteratively by noting that the bit rate is a decreasing function of the
perceptual error ψ, as shown in Fig. 7. In the practice of our present invention, a
second order interpolating polynomial is fit to all previous estimated values of {h, ψ} to
estimate a next candidate ψ, terminating when |h - h_0| < Δh, where Δh is the desired
accuracy in bit rate. On each iteration a complete estimation of q_ij is performed.
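The iteration on ψ may be sketched as follows. Several choices here are assumptions
made only for illustration: bit_rate_for is a hypothetical callable that performs the
complete estimation of q_ij and returns the resulting bit rate h; the polynomial is
written as ψ in terms of h so that it can be evaluated directly at h_0; and only the
three samples nearest h_0 are used, rather than all previous values.

```python
def interpolate_psi(samples, h0):
    """Evaluate at h0 the Lagrange polynomial of psi as a function of h.

    samples is a list of (h, psi) pairs with distinct h values. Up to
    three samples nearest h0 are used, giving at most a second order
    polynomial (an assumption of this sketch).
    """
    pts = sorted(samples, key=lambda s: abs(s[0] - h0))[:3]
    psi_next = 0.0
    for i, (h_i, psi_i) in enumerate(pts):
        weight = 1.0
        for j, (h_j, _) in enumerate(pts):
            if i != j:
                weight *= (h0 - h_j) / (h_i - h_j)
        psi_next += weight * psi_i
    return psi_next


def fit_bit_rate(bit_rate_for, h0, psi_a, psi_b, dh=1e-3, max_iter=20):
    """Iterate on psi until |h - h0| < dh; returns the final (psi, h)."""
    samples = [(bit_rate_for(psi_a), psi_a), (bit_rate_for(psi_b), psi_b)]
    for _ in range(max_iter):
        psi = interpolate_psi(samples, h0)
        h = bit_rate_for(psi)       # complete estimation of q_ij happens here
        if abs(h - h0) < dh:
            break
        samples.append((h, psi))
    return psi, h


# Toy decreasing bit-rate model (an assumption for illustration only).
toy_rate = lambda psi: 2.0 / (1.0 + psi)
print(fit_bit_rate(toy_rate, h0=0.5, psi_a=1.0, psi_b=2.0))
```

Since every evaluation of the bit rate requires a full optimization of the matrix, an
interpolating step of this kind is attractive because it typically needs only a few
iterations.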

It should now be appreciated that the practice of the present invention provides
a perceptual error that incorporates visual masking by luminance and contrast
techniques, and error pooling to estimate the matrix that has a minimum perceptual
error for a given bit rate, or minimal bit rate for a given perceptual error. All told, the
present invention provides a digital compression technique that is useful in the
transmission and reproduction of images, particularly those found in high definition
television applications.

Further, although the invention has been described relative to a specific
embodiment thereof, it is not so limited and many modifications and variations thereof
now will be readily apparent to those skilled in the art in light of the above teachings.



ABSTRACT 


A method for performing image compression that eliminates redundant and
invisible image components. The image compression uses a Discrete Cosine
Transform (DCT) and each DCT coefficient yielded by the transform is quantized by an
entry in a quantization matrix which determines the perceived image quality and the bit
rate of the image being compressed. The present invention adapts or customizes the
quantization matrix to the image being compressed. The quantization matrix
comprises visual masking by luminance and contrast techniques and by an error
pooling technique all resulting in a minimum perceptual error for any given bit rate, or
minimum bit rate for a given perceptual error.






FIG-1 (block diagram; labels: DISPLAY SUBSYSTEM, RAM, CPU PROCESSOR, DISK SUBSYSTEM,
COMMUNICATIONS CHANNEL, 16 (STORAGE MODE), 18 (RETRIEVAL MODE), 30 (IMAGE),
30' (IMAGE); reference numerals 12, 14, 22 and 24)














































FIG-4 














FIG-7 (X axis: PERCEPTUAL ERROR)