2. Joint Work With 1
• Vasudev Bhaskaran
• Konstantinos Konstantinides
• Daniel T. Lee
• Ho John Lee
• Andrew H. Mutz
• Balas K. Natarajan
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
3. Outline 2
• Background for the standard
• Discrete nature of visual perception
• JPEG data compression
• Optimizing the JPEG compression
• Perceptually lossy compression
• Text sharpening
• Conclusions
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
4. Three Breakthroughs 3
• Digital imaging — compression algorithms
• Hardware cost / performance — SOHO market
• International standard — ITU-T T.42 Addendum
SOHO: Small Office — Home Office
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
5. Project Goal 4
Achieve same transmission time for full color
as for binary black & white
in the case of the 4CP01 test chart
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
6. The Color Facsimile Pipeline 5
‚ RGB to
CIELA
B
ƒ JPEG
coding
„ G3 enc
apsula
tion
¥
A size, 200dpi 4:1:1 ~226K bytes
…
11.22M bytes 5.6M bytes (default DQT)
24 bits/pixel 12 bits/pixel 0.48 bits/pixel
ding
3 deco
G
†
ssion
mpre
G deco
JPE
‡
IELAB
from C
CMYK
ˆ
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
7. Time Phases for T.30 Fax Transmission 6
Phase Calling fax Called fax
CNG beep
A CED
(call setup) DIS
DCS
B
Training, TCF
(pre-message)
CFR
C Training
(message)
Message
RTC
EOP
D
(post-message) MCF
DCN
E
(call release)
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
8. New G3 Entries to DCS Frames 7
Bit No. DCS
68 JPEG coding
69 Full color mode
70 Preferred Huffman tables
71 12 bits/pel/component
72 Extend field
73 No subsampling (1:1:1)
74 Custom illuminant (not used)
75 Custom gamut range
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
9. Four Categories of Business Images 8
1. Full color (pictorial, color photographs)
2. Multi-color (color charts & graphs)
3. Bi-color (documents marked up with red ink)
4. Mixed color (combination of 1–3, such as color pages of
magazines)
Source: NTT
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
10. Color Space Selection 9
17 spaces considered (Munsell, CMYK, YIQ,
CIE colorimetric spaces)
Evaluation criteria (source: Fuji-Xerox):
• ability to represent all colors
• numerical complexity
• device independence
• quantization error under compression
• compatibility with compression algorithms
• color stability with white point change
• …
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
11. Perception is Discrete 10
• When a stimulus is changed just by a small amount, an
observer will not notice a difference
• Psychophysical experiments: threshold value of where an
observer can detect a difference in stimuli
• jnd — just noticeable difference
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
12. Model for Perception Discretization 11
quantity
nature,
continuum
intensity
available
information
jnd
human visual system
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
13. Digital System: Sampling 12
nature,
continuum
sampled
information
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
14. Poor Sampling 13
digital,
sampled
available
information
jnd
human visual system
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
15. Good Sampling 14
digital,
sampled
available
information
jnd
human visual system
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
16. Result of the Color Space Evaluation 15
• 1931 Standard Colorimetric Observer
• CIE Standard Illuminant D50
• CIELAB color space
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
17. The CIELAB Color Space 16
• Based on the CIE 1931 Standard Colorimetric Observer:
device independent
• Based on Munsell color tree, von Kries adaptation, CIE XYZ
color space, and power law compression (n = 1/3): good
perceptual uniformity
• Easy to compute compared to other uniform spaces
• Widely used in printing industry
• Can be read directly with measurement instruments
• Design issue: choice of the best range for the chromatic
channels a* and b*
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
18. Data Compression 17
ISO/IEC IS 10918–1 (a.k.a. JPEG)
compressed
raster to
image stream
entropy
ƒ „
block DCT quantization
coding
translation
quantization Huffman
tables tables
“Critical knob”
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
19. The DCT and its Kernels 18
7 7
( 2x + 1 )kπ ( 2 y + 1 )lπ
1
∑∑
Y ( k, l ) = -- C ( k )C ( l ) S ( x, y ) cos --------------------------- cos --------------------------
- - -
4 16 16
x = 0y = 0
m n + -- π
1
-
2
[ C 8 ] mn = k m cos ---------------------------
-
8
The 64 kernels of the
discrete cosine transform:
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
20. The JPEG Compression 19
Quantization tables (DQT) are the key parameter
• Do not quantize where it can be seen
• Default parameters are for images on CRT displays
• spatial information in text is different
• printers have a much higher resolutions than CRTs
• The three worlds of spatial information:
1. Physical: energy in the signal
2. Perceptual: sensitivity of the visual system
3. Semantic: cognitive mechanisms
Last step: design the Huffman tables
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
21. Physical World 20
DCT transforms 2-dimensional data to a 64-dimensional space
• Each dimension represents a spatial pattern
• For a number of typical images:
• measure the energy in each of the 64 dimensions
• popular estimator for energy: statistical variance
• average over the images in the test set
• allocate bandwidth in proportion to the average energy
• For text documents with images, a much better
compression rate is achieved for a given image quality, than
with the default tables
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
22. L* Energy in Text vs. Pictorial Images 21
106
104
102
Average text
Average picture
100 Text on fancy back
Fax test image
Topographic map
10-2
0 8 16 24 32 40 48 56 64
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
23. L* Energy in Text vs. Pictorial Images 22
1000000
Average text
Average picture
Text on fancy back
Fax test image
Topographic map
10000
100
0 1 2 3 4 5 6 7 8 9 10
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
24. Traditional Bit Allocation 23
V y [ k, l ]
1
= -- ⋅ log -------------------- based on variance
• Bit allocation N k, l - -
D
2
B
1
∑ ( Y i [ k, l ] – M y [k,l] ) 2
V y [ k, l ] = var ( Y [ k, l ] ) = ---
-
B
i=1
To improve bits-per-pixel rate:
1. Brute force: uniform q-factor
2. Perceptual: increase DQT elements based on HVS
300
100
Contrast sensitivity
red–green (chromatic)
30
green (monochromatic)
10
3
1
0.03 0.1 0.3 1 3 10 30 100
HP Laboratories
Spatial frequency (cycles/degree)
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
25. Perceptual World — Simple Method 24
Image quality depends on the contrast visible at a given spatial
resolution: contrast sensitivity function (CSF)
• Discard spatial information above the CSF: it cannot be seen
anyway
• Standard method: weigh the DQT elements by the CSF
• Printer resolution is 600 dpi, same as visual system
• improved compression of images
• for text pattern recognition may be more important than CSF
• Not necessarily a good model for what happens in the
visual system
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
26. Compression Ratio 25
Image (10.7M B) Default DQT Custom DQT
Real estate flier with photo 52:1 (211K B) 82:1 (134K B)
Book page with photos and text 53:1 (207K B) 63:1 (174K B)
4CP01 test chart 47:1 (233K B) 63:1 (174K B)
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
27. Perceptual World — Complex Method 26
When two structures (textures) are present in an image, one
structure may hide the other
• Principle of visual masking
• When a new structure is added to an existing structure, the
new structure may mask the old structure or vice-versa
• Quantization noise is a structure that is added to the image
• Noise that is masked by the image is perfectly acceptable;
this allows for higher compression ratios
• This method is used mostly for image-dependent
compression (adaptive)
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
28. Results 27
1. If the method is applied to several images of the same
type, there is little variation in the obtained DQTs
• Hence, in the case of color facsimile we can use one DQT for
all images
2. Adjust each DQT element to reach the threshold
3. The iterative method converges faster when the
previous steps are performed
4. Lossy compression: any jnd value can be targetted
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
29. Semantic World 28
Reading performance of text is the speed at which text can be
read without errors
• When compressing, discard information that does not
impact reading performance
• Identify the parts of characters in fonts that affect reading
performance
• Discard prevalently spatial information not related to these
character parts
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
30. Visual Impact of Quantization 29
Depends on Image and Sub-Space
< fine
coarse >
< mixed
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
31. Some Typeface Parts in Times Roman 30
itag
bar ear
stress
stem
serif terminal
7 point 9 point 11 point 12 point 16 point
1.12 mm 1.44 mm 1.76 mm 1.95 mm 2.58 mm
X-Height in the Times New Roman typeface
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
32. Critical Feature Sizes in Color 31
Facsimile
Number of pixels 200 dpi 300 dpi 400 dpi
0.127 mm 0.085 mm 0.063 mm
1
0.254 mm 0.169 mm 0.127 mm
2
0.381 mm 0.254 mm 0.191 mm
3
0.508 mm 0.339 mm 0.254 mm
4
0.635 mm 0.423 mm 0.318 mm
5
0.762 mm 0.508 mm 0.381 mm
6
0.889 mm 0.593 mm 0.445 mm
7
1.016 mm 0.677 mm 0.508 mm
8
Red: Limit for visual acuity in the luminance channel
Green: Limit for visual acuity in the chrominance channels
Blue: Peak contrast sensitivity in the luminance channel
Pink: typical character part sizes (11 point Times New Roman)
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
33. Modified Bit Allocation Equation 32
w [ k, l ] V y [ k, l ]
1
= -- ⋅ log
N k, l - --------------------
-
• Weight w based on visual quality:
D
2
• Separate provisions for text, graphics, and pictorial images (smoothing)
• Detail of spatial frequency in image data, ind. of phase angle, resides almost
totally within 3 coefficients in the transform domain when a DCT is applied.
• Rudimentary example of a weight table:
1 1 1 3/4 1 3/4 1 1
1 3/4 1/2 1/4 1/2 1/4 1/2 1/2
1 1/2 1/4 1/8 1/4 1/8 1/4 1/4
3/4 1/4 1/8 1/8 1/8 1/8 1/8 1/8
1 1/2 1/4 1/8 1/8 1/8 1/8 1/8
3/4 1/4 1/8 1/8 1/8 1/8 1/8 1/8
1 1/2 1/4 1/8 1/8 1/8 1/8 1/8
1 1/2 1/4 1/8 1/8 1/8 1/8 1/8
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
34. Preliminary Experimental Results 33
• Compress two typical 300 dpi images
• Comparable visual quality
• q factor 200; use K.2 for the chrominance channels
Text 4CP01
K.1 NEW K.1 NEW
2550 × 3300
Bitmap size
338,776 277,617 707,819 258,765
Compressed size
(bytes)
0.32 0.26 0.67 0.25
Bits per pixel
1 : 75 1 : 91 1 : 36 1 : 98
Compression ratio
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
35. The Fuzzy Text Problem 34
Cost reduction introduces problems such as
• sensor misalignment
• optical blur and electric cross-talk
• halftoning at low resolution
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
36. Comparison of L* Energy in all Images 35
106
104
102
4CP01 test image
Average pictorial
Average text
Text on fancy back
150 lpi brochure
Newspaper
100 Group portrait
Realtor flier
10 pt sans text
12 pt sans text
10 pt serif text
12 pt serif text
Synthetic text
Topographic map
10-2
0 8 16 24 32 40 48 56 64
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
37. Comparison of Low Frequency Energy 36
106
4CP01 test image
Average pictorial
Average text
Text on fancy back
150 lpi brochure
Newspaper
Group portrait
Realtor flier
10 pt sans text
12 pt sans text
10 pt serif text
12 pt serif text
Synthetic text
104 Topographic map
102
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
38. L* Energy in Pictorial Images 37
1000000
10000
100
Average
150 lpi brochure
Text is
Newspaper
1 Group portrait
different
Realtor flier
Average text
0
0 8 16 24 32 40 48 56 64
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
39. L* Energy in Text Images 38
106
104
102
Average
10 pt sans
12 pt sans
100 10 pt serif
12 pt serif
(synthetic)
Robustness vs. font
10-2
0 8 16 24 32 40 48 56 64
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
40. Enhancement of 10 Point Serif Text 39
106
104
Variance
102
100 Unprocessed
Convolution Filter
New Algorithm
Synthetic
10-2
0 8 16 24 32 40 48 56 64
Basis
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
41. Sharpening an Image in the JPEG 40
Domain During the Encoding
generate compute
reference average
image energy
quantization
scale table
scaling matrix
compute
typical
average
scanned
energy
image
• Edge sharpening is achieved by using the original DQT for
the encoding, while at the receiving fax machine the
decoder uses the scaled DQT matrix
• The scaled matrix is the one included in the JPEG file
• Since only the scaled DQT is transmitted the original matrix
and the actual factors can remain a trade secret
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
42. Huffman Tables 41
• Example Huffman Table is for perceptually lossless
compression
• Color facsimile based on CIELAB color space
• Start with test chart
Ad hoc technique: all symbol probabilities less than 2–16 are
•
set to 2–16
• Improvement: 8% to 14%
• Only 0.5% compression rate loss for other images
• Average improvement: 11%
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96
43. Conclusions 42
• Digital images with text are not very robust with respect to
quantization errors
• Sufficient information should be preserved in documents to
preserve image quality when they are printed
• If the bandwidth is used judiciously, documents can be
compressed to a higher degree, allowing the use of better
resolutions or shorter transmission times
1. Encode color in a perceptually uniform color space such
as CIELAB
2. Compress the spatial information using JPEG
3. Design custom DQT tables for your document type
4. Design custom Huffman tables
HP Laboratories
10/17/96 Hiro:Documents:Giordano Beretta:Research:Chiba96:OHP:Chiba96