13. What is PDF.JS
• building faithful & efficient PDF renderer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source
19. How PDF is structured
PDF file:
• Header: PDF version
• Body: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
• xRef table: maps objID ➟ byte offset
• Trailer: root objID, xRef byte offset
‣ root obj = document catalog, which refers to the pages
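A reader starts at the end of the file: the trailer names the root object, and the `startxref` keyword gives the byte offset of the xRef table. The following is a minimal sketch that assumes a classic, uncompressed xRef table with its `trailer` dictionary near the end of the file. Cross-reference streams (PDF 1.5+) and incremental updates are not handled. `parseTrailer` is a hypothetical helper and not part of the PDF.JS API.

```javascript
// Hypothetical helper: read the root objID and xRef byte offset from the end of a PDF.
// Assumes a classic "trailer << ... >> startxref N %%EOF" ending (no xref streams).
function parseTrailer(bytes) {
  // The trailer sits near the end of the file, so only the last 1 KB is scanned.
  const tailStart = Math.max(0, bytes.length - 1024);
  let tail = '';
  for (let i = tailStart; i < bytes.length; i++) tail += String.fromCharCode(bytes[i]);

  // Use the last occurrence of each keyword (later sections win).
  const sx = tail.lastIndexOf('startxref');
  const offset = sx >= 0 ? /startxref\s+(\d+)/.exec(tail.slice(sx)) : null;
  const tr = tail.lastIndexOf('trailer');
  const root = tr >= 0 ? /\/Root\s+(\d+)\s+(\d+)\s+R/.exec(tail.slice(tr)) : null;

  if (!offset || !root) throw new Error('classic trailer not found');
  return { xrefOffset: Number(offset[1]), rootObjId: Number(root[1]) };
}

const pdfTail = 'trailer\n<< /Size 9 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n';
const bytes = Uint8Array.from(pdfTail, (c) => c.charCodeAt(0));
console.log(parseTrailer(bytes)); // { xrefOffset: 1234, rootObjId: 1 }
```

With the offset, the reader can jump directly to the xRef table and then resolve any objID to its byte position without scanning the whole body.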
30. Processing in PDF.JS
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
• read & convert all PDF cmds ➟ IR (Intermediate Representation) [PartialEvaluator]
• load required objects (fonts, images)
• graphics.executeIR(IR) [CanvasGraphics]
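The last step is a simple dispatch loop. The following sketch assumes an IR made of two parallel arrays, operation names and argument lists. That shape, the operation names, and `RecordingGraphics` are all hypothetical and are not PDF.JS's actual classes. The recorder stands in for CanvasGraphics, so the example runs without a canvas.

```javascript
// Hypothetical IR shape: parallel arrays of operation names and argument lists.
const ir = {
  fnArray: ['setFont', 'showText', 'moveTo', 'lineTo', 'stroke'],
  argsArray: [['F0', 12], ['Hello World!'], [50, 600], [400, 600], []],
};

// Runs every IR command by calling the graphics method of the same name.
function executeIR(graphics, { fnArray, argsArray }) {
  for (let i = 0; i < fnArray.length; i++) {
    const method = graphics[fnArray[i]];
    if (typeof method !== 'function') {
      throw new Error(`unknown IR operation: ${fnArray[i]}`);
    }
    method.apply(graphics, argsArray[i]);
  }
}

// Stand-in for CanvasGraphics: records calls instead of drawing on a canvas.
class RecordingGraphics {
  constructor() { this.log = []; }
  setFont(name, size) { this.log.push(`font ${name} ${size}`); }
  showText(text) { this.log.push(`text ${text}`); }
  moveTo(x, y) { this.log.push(`moveTo ${x},${y}`); }
  lineTo(x, y) { this.log.push(`lineTo ${x},${y}`); }
  stroke() { this.log.push('stroke'); }
}

const graphics = new RecordingGraphics();
executeIR(graphics, ir);
console.log(graphics.log.join(' | '));
// font F0 12 | text Hello World! | moveTo 50,600 | lineTo 400,600 | stroke
```

Because the IR is plain data, the evaluation step and the drawing step can run separately. In a browser, the graphics methods would forward to a CanvasRenderingContext2D.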
51. Problem: Processing
• Extracting data is slow (compressed)
• Transforming data (images) is slow
• Sometimes a page has lots of objects
➡ Freezes UI
➡ Use WebWorker
➡ :( no direct memory access, only postMessage
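A worker cannot read main-thread memory, so everything crosses via postMessage. That call either copies a value (structured clone) or transfers ownership of an ArrayBuffer. The following is a small sketch of the difference. It assumes a runtime with the standard MessageChannel API, which browsers provide and Node.js exposes as a global from version 15. A Worker's postMessage takes the same `(message, transferList)` arguments.

```javascript
// Sketch: the two ways postMessage can hand an ArrayBuffer to the other side.
const { port1 } = new MessageChannel();

// 1) Structured clone: the receiver gets a copy and the sender keeps its buffer.
const copied = new Uint8Array([1, 2, 3]).buffer;
port1.postMessage(copied);
console.log(copied.byteLength); // 3

// 2) Transfer: ownership moves to the receiver and the sender's buffer is detached.
const moved = new Uint8Array([1, 2, 3]).buffer;
port1.postMessage(moved, [moved]);
console.log(moved.byteLength); // 0

port1.close(); // closes the channel so the process can exit
```

Transferring avoids copying large buffers such as decoded image data, but the sender loses access to them.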
64. Main Thread ↔ Web Worker
• Main Thread sends data and “get page 2” to the Worker
• Worker: PartialEvaluator builds draw(obj#3, dict.x, dict.y)
➡ IR: draw(“foo”, 20, 30)
• Worker sends IR cmds to the Main Thread
• Main Thread: Graphics draws on canvas
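In the diagram, draw(obj#3, dict.x, dict.y) becomes draw(“foo”, 20, 30). Worker-side references are resolved into plain values before the command crosses postMessage, because the main thread cannot follow them. The following is a minimal sketch of that resolution. `Ref`, the `xref` map, `toIR`, and the `{ fn, args }` command shape are hypothetical illustrations, not PDF.JS's classes.

```javascript
// Hypothetical stand-in for an indirect object reference such as obj#3.
class Ref {
  constructor(num) { this.num = num; }
}

// Replace every Ref argument with the value it points to, so the command is plain data.
function toIR(cmd, xref) {
  return {
    fn: cmd.fn,
    args: cmd.args.map((a) => (a instanceof Ref ? xref.get(a.num) : a)),
  };
}

const xref = new Map([[3, 'foo']]); // obj#3 -> "foo" (made-up content)
const dict = { x: 20, y: 30 };      // values read from a worker-side dictionary
const ir = toIR({ fn: 'draw', args: [new Ref(3), dict.x, dict.y] }, xref);
console.log(JSON.stringify(ir)); // {"fn":"draw","args":["foo",20,30]}
```

The resolved command contains only strings and numbers, so it survives structured cloning intact. A `Ref` instance would arrive on the other side as a plain object without its class.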
69. Example page content stream
(xRef, catalog + resources ➟ PartialEvaluator ➟ Graphics)
5 0 obj
<<
/Length 8 0 R
>>
stream
/GS1 gs
/F0 12 Tf
BT
100 700 Td
(Hello World!) Tj
ET
50 600 m
400 600 l
S
endstream
endobj
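In this content stream, operands come first and the operator follows, as in `100 700 Td` or `(Hello World!) Tj`. The following sketch groups operands with their operator, which is roughly the first step of turning such a stream into IR. It is a hypothetical simplification, not PDF.JS's lexer: it handles only numbers, /Names, literal strings without escapes or nesting, and operators. Arrays, dictionaries, hex strings, and inline images are ignored.

```javascript
// Simplified content-stream tokenizer: each operator collects the operands before it.
function parseContentStream(src) {
  const tokenRe = /\/[^\s\/()<>\[\]{}%]+|\([^)]*\)|[+-]?(?:\d+\.?\d*|\.\d+)|[A-Za-z'"*]+/g;
  const ops = [];
  let operands = [];
  for (const [tok] of src.matchAll(tokenRe)) {
    if (tok[0] === '/') operands.push({ name: tok.slice(1) });  // name, e.g. /F0
    else if (tok[0] === '(') operands.push(tok.slice(1, -1));   // literal string
    else if (/^[+\-.\d]/.test(tok)) operands.push(Number(tok)); // number
    else { ops.push({ op: tok, args: operands }); operands = []; } // operator
  }
  return ops;
}

const stream = [
  '/GS1 gs', '/F0 12 Tf', 'BT', '100 700 Td', '(Hello World!) Tj', 'ET',
  '50 600 m', '400 600 l', 'S',
].join('\n');

const ops = parseContentStream(stream);
console.log(ops.map((o) => o.op).join(' ')); // gs Tf BT Td Tj ET m l S
```

Each `{ op, args }` entry can then be mapped to an IR command. For example, `Tf` looks up /F0 in the page resources, which is why the evaluator needs the xRef, catalog, and resources.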
86. JPEG, but...
• no native support for CMYK JPEG
➡ use JS implementation
• no native support for JPEG 2000
➡ use Emscripten: C lib ➟ JS
‣ works, but not that performant
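After a JS decoder produces CMYK samples, they still have to become RGB before they can be put on a canvas. The following sketch uses the naive, non-color-managed formula R = 255·(1−C)·(1−K), applied the same way for G and B. It is an assumption for illustration. PDF.JS's actual conversion is not reproduced here, and real output can differ noticeably from ICC-based results.

```javascript
// Naive CMYK -> RGB for 8-bit samples (4 bytes in, 3 bytes out per pixel, no ICC profile).
function cmykToRgb(cmyk) {
  const pixelCount = cmyk.length / 4;
  const rgb = new Uint8ClampedArray(pixelCount * 3); // clamps and rounds on assignment
  for (let p = 0; p < pixelCount; p++) {
    const c = cmyk[4 * p] / 255;
    const m = cmyk[4 * p + 1] / 255;
    const y = cmyk[4 * p + 2] / 255;
    const k = cmyk[4 * p + 3] / 255;
    rgb[3 * p] = 255 * (1 - c) * (1 - k);
    rgb[3 * p + 1] = 255 * (1 - m) * (1 - k);
    rgb[3 * p + 2] = 255 * (1 - y) * (1 - k);
  }
  return rgb;
}

// White (no ink) and pure cyan.
console.log(cmykToRgb([0, 0, 0, 0, 255, 0, 0, 0]).join(',')); // 255,255,255,0,255,255
```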
92. Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS:
@font-face { font-family:'font0';
src:url(data:font/opentype;base64, ...) }
• some fonts can’t be converted :(
• paint them on canvas instead
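The CSS rule above embeds the converted font as a base64 `data:` URL. The following is a minimal sketch of building that rule from font bytes. The helper name `fontFaceRule` is hypothetical. `btoa` is a browser global and has also been a global in Node.js since version 16.

```javascript
// Builds an @font-face rule embedding the font bytes as a base64 data: URL.
function fontFaceRule(family, fontBytes) {
  // btoa expects a "binary string": one char per byte (loop avoids spread limits on big fonts).
  let binary = '';
  for (let i = 0; i < fontBytes.length; i++) binary += String.fromCharCode(fontBytes[i]);
  const base64 = btoa(binary);
  return `@font-face { font-family: '${family}'; src: url(data:font/opentype;base64,${base64}); }`;
}

// Tiny made-up byte sequence, just to show the output shape.
console.log(fontFaceRule('font0', new Uint8Array([0x00, 0x01, 0x00, 0x00])));
// @font-face { font-family: 'font0'; src: url(data:font/opentype;base64,AAEAAA==); }
```

In a page, the rule would be added to a stylesheet before text uses `font-family: 'font0'`. Newer browsers can also load the bytes directly with `new FontFace(family, fontBytes)` and `document.fonts.add(...)`.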
94. Fonts
• Type I ➟ convert to Type II
• Type II ➟ “use directly”
• Type III ➟ paint ourselves
• CID ➟ convert to Type II
➡ still need to repair fonts!
111. Infrastructure
• AreWePdfYet?
• Take the top 100 PDFs from Google
• render the first 5 pages of each
• compare to Preview
• http://people.mozilla.com/~bdahl/corpusreport/test/ref/
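Comparing renderings comes down to diffing two images pixel by pixel. The following sketch counts the RGBA pixels that differ beyond a tolerance. The function, the tolerance rule, and the sample data are hypothetical and are not the actual AreWePdfYet tooling.

```javascript
// Counts pixels whose R, G, B or A channel differs by more than `tolerance`.
// Both inputs are RGBA byte arrays (e.g. ImageData.data) of the same size.
function countDifferentPixels(a, b, tolerance = 0) {
  if (a.length !== b.length) throw new Error('snapshots differ in size');
  let diff = 0;
  for (let i = 0; i < a.length; i += 4) {
    for (let ch = 0; ch < 4; ch++) {
      if (Math.abs(a[i + ch] - b[i + ch]) > tolerance) { diff++; break; }
    }
  }
  return diff;
}

// Two 2-pixel "renderings": the second pixel's red channel differs by 12.
const reference = new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]);
const rendered = new Uint8ClampedArray([255, 255, 255, 255, 12, 0, 0, 255]);
console.log(countDifferentPixels(reference, rendered));     // 1
console.log(countDifferentPixels(reference, rendered, 16)); // 0
```

A small tolerance absorbs anti-aliasing noise between renderers, while still flagging real breakage such as a missing glyph or image.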
131. More Testing
• use PDF.JS extension!
• http://mozilla.github.com/pdf.js/extensions/firefox/pdf.js.xpi
• report broken PDFs!
• help us categorize issues