Zenika - iText in Action
1. iText in Action — 2nd Edition
Bruno Lowagie @ Zenika
March 10, 2011
1T3XT BVBA, the iText Company http://itextpdf.com/
2. About this talk
• 2010:
– History of iText: development & IP
– How to write a book
– Book preview
• 2011:
– Book overview
– Samples: code snippets, PDFs, techniques
– The future of iText
6. Chapter info on itextpdf.com
7. Part 1
Creating PDF from scratch
• Ch 1: Introducing PDF and iText
• Ch 2: Using iText’s basic building blocks
• Ch 3: Adding content at absolute
positions
• Ch 4: Organizing content in tables
• Ch 5: Table, cell, and page events
8. Creating PDF from scratch
Hello World
Creating PDF with iText
1. Create a Document
2. Create a Writer
3. Open the Document
4. Add content
5. Close the Document
// step 1
Document document = new Document();
// step 2
PdfWriter.getInstance(document, new FileOutputStream(filename));
// step 3
document.open();
// step 4
document.add(new Paragraph("Hello World!"));
// step 5
document.close();
14. Part 2
Manipulating existing PDF documents
• Ch 6: Working with existing PDFs
• Ch 7: Making documents interactive
• Ch 8: Filling out interactive forms
25. Fill out the form
• XFA
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
XfaForm xfa = form.getXfa();
xfa.fillXfaForm(new FileInputStream(xml));
stamper.close();
27. A look inside the form
28. Part 3
Essential iText skills
• Ch 9: Integrating iText in your web
application
• Ch 10: Brightening your document with
color and images
• Ch 11: Choosing the right font
• Ch 12: Protecting your PDF
29. Structure of a PDF file
A PDF file consists of a collection of objects.
A PDF file starts with %PDF-1.x and ends with %%EOF.
%PDF-1.x
%âãÏÓ
1 0 obj
...
2 0 obj
... (Hello World) Tj ...
xref
0 81
0000000000 65535 f
0000000015 00000 n
...
trailer
<< ... >>
startxref
15787
%%EOF
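The header, %%EOF marker, and startxref offset shown on this slide can be checked with a few lines of plain Java. This is only a sketch: the class PdfStructureSketch and its method names are made up for illustration, it doesn't use iText, and the sample bytes are a toy fragment rather than a valid PDF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of iText.
final class PdfStructureSketch {

    /** True if the bytes begin with the "%PDF-1." header. */
    static boolean startsWithPdfHeader(byte[] pdf) {
        byte[] header = "%PDF-1.".getBytes(StandardCharsets.ISO_8859_1);
        if (pdf.length < header.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (pdf[i] != header[i]) {
                return false;
            }
        }
        return true;
    }

    /** True if the last non-whitespace bytes are "%%EOF". */
    static boolean endsWithEof(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1).trim();
        return text.endsWith("%%EOF");
    }

    /** Returns the number that follows the last "startxref" keyword, or -1 if none is found. */
    static long lastStartXref(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int eof = text.indexOf("%%EOF", keyword);
        if (eof < 0) {
            return -1;
        }
        String number = text.substring(keyword + "startxref".length(), eof).trim();
        try {
            return Long.parseLong(number);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Toy fragment that mimics the layout on the slide.
        byte[] sample = ("%PDF-1.4\n1 0 obj\n<< >>\nendobj\nxref\n0 2\n"
                + "trailer\n<< >>\nstartxref\n15787\n%%EOF\n")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("header ok: " + startsWithPdfHeader(sample));
        System.out.println("eof ok: " + endsWithEof(sample));
        System.out.println("startxref: " + lastStartXref(sample));
    }
}
```

A real parser would read the cross-reference table at the startxref offset instead of scanning text; the scan is used here only to keep the sketch short.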
30. Changing the content of a PDF file
You can use software to change the content of a PDF document: change a stream, add objects (e.g. annotations), and so on.
%PDF-1.x
%âãÏÓ
1 0 obj
...
2 0 obj
... (Hello People) Tj ...
121 0 obj
...
xref
0 85
0000000000 65535 f
0000000015 00000 n
...
trailer
<< ... >>
startxref
16157
%%EOF
31. What are our concerns?
• Integrity—we want assurance that the
document hasn’t been changed
somewhere in the workflow
• Authenticity—we want assurance that
the author of the document is who we
think it is (and not somebody else)
• Non-repudiation—we want assurance
that the author can’t deny his authorship.
32. Integrity
• A digest is computed over a range of
bytes from the file.
• This ByteRange is signed using the private
key of the sender.
• This digest and the sender’s Certificate
are embedded in the PDF.
• The receiver compares the embedded
digest with the digest of the content.
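The sign-and-compare idea can be sketched with the JDK's java.security classes alone. Assumptions: the class IntegritySketch is hypothetical, the key pair is generated on the fly rather than taken from a certificate, and SHA256withRSA is simply one common choice of algorithm (it computes the digest and signs it in one step). This is not how iText embeds a signature in a PDF.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper class, not part of iText.
final class IntegritySketch {

    /** Generates a throwaway RSA key pair for the demo. */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender side: digest the content and sign the digest with the private key. */
    static byte[] sign(byte[] content, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(content);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: recompute the digest and compare it with the signed one. */
    static boolean verify(byte[] content, byte[] signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(content);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] original = "(Hello World) Tj".getBytes(StandardCharsets.ISO_8859_1);
        byte[] tampered = "(Hello People) Tj".getBytes(StandardCharsets.ISO_8859_1);
        byte[] signature = sign(original, pair.getPrivate());
        System.out.println("original verifies: " + verify(original, signature, pair.getPublic()));
        System.out.println("tampered verifies: " + verify(tampered, signature, pair.getPublic()));
    }
}
```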
33. Digital Signature field
A signed PDF file contains a signature dictionary.
The binary value of the PDF signature is placed into the Contents entry of a signature dictionary.
%PDF-1.x
%âãÏÓ
1 0 obj
...
2 0 obj
<<
/Type/Sig /Contents/...
>>
...
xref
0 81
0000000000 65535 f
...
trailer
<< ... >>
startxref
15787
%%EOF
34. Embedded Digital Signature
The digital signature isn’t part of the ByteRange.
There are no bytes in the PDF that aren’t covered, other than the PDF signature itself.
%PDF-1.x
%âãÏÓ
...
2 0 obj
<<... /Type/Sig /Contents<
DIGITAL SIGNATURE
> ... >>
xref
0 81
0000000000 65535 f
...
trailer
<< ... >>
startxref
15787
%%EOF
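To make the "hole" in the ByteRange concrete, here is a small sketch that digests everything before and after a placeholder and skips the placeholder itself. Assumptions: ByteRangeSketch and its methods are hypothetical names, the "file" is a toy byte string, the gap is taken to be the Contents value including its angle brackets, and SHA-256 is an arbitrary choice of digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class, not part of iText.
final class ByteRangeSketch {

    /** Builds {start1, length1, start2, length2}, skipping the bytes in [gapStart, gapEnd). */
    static int[] byteRange(int fileLength, int gapStart, int gapEnd) {
        return new int[] {0, gapStart, gapEnd, fileLength - gapEnd};
    }

    /** Digests only the bytes covered by the (start, length) pairs in range. */
    static byte[] digest(byte[] file, int[] range) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < range.length; i += 2) {
                md.update(file, range[i], range[i + 1]);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Toy stand-in for a PDF with a zero-filled signature placeholder.
        byte[] file = "before /Contents<0000000000> after"
                .getBytes(StandardCharsets.ISO_8859_1);
        String text = new String(file, StandardCharsets.ISO_8859_1);
        int gapStart = text.indexOf('<');
        int gapEnd = text.indexOf('>') + 1;
        int[] range = byteRange(file.length, gapStart, gapEnd);
        byte[] before = digest(file, range);

        file[gapStart + 1] = 'F'; // write into the placeholder
        System.out.println("digest unchanged after filling the gap: "
                + Arrays.equals(before, digest(file, range)));

        file[0] = 'B'; // change a covered byte
        System.out.println("digest unchanged after editing covered bytes: "
                + Arrays.equals(before, digest(file, range)));
    }
}
```

Writing the signature into the gap leaves the digest untouched, which is why the signature can be embedded after the digest is computed; any change outside the gap produces a different digest.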
35. Cryptography
• Symmetric key algorithms: the same key
is used to encrypt and decrypt content.
• Asymmetric key algorithms: a public key
is used to encrypt, a private key is used to
decrypt (for encryption purposes).
• Or, a private key is used to encrypt, a
public key is used to decrypt (for digital
signatures).
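The first two bullets can be illustrated with the JDK's javax.crypto classes. This is a sketch under assumptions: CryptoSketch is a hypothetical class, AES-GCM and RSA-OAEP are simply common algorithm choices, and keys are generated on the fly. The signature direction (private key signs, public key verifies) is handled in Java by java.security.Signature rather than by Cipher and isn't shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper class, not part of iText.
final class CryptoSketch {

    /** Symmetric: the same AES key encrypts and decrypts. */
    static byte[] symmetricRoundTrip(byte[] plain) {
        try {
            KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
            keyGenerator.init(128);
            SecretKey key = keyGenerator.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher encryptor = Cipher.getInstance("AES/GCM/NoPadding");
            encryptor.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] cipherText = encryptor.doFinal(plain);

            Cipher decryptor = Cipher.getInstance("AES/GCM/NoPadding");
            decryptor.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return decryptor.doFinal(cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Asymmetric: the public key encrypts, the private key decrypts. */
    static byte[] asymmetricRoundTrip(byte[] plain) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair();

            Cipher encryptor = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            encryptor.init(Cipher.ENCRYPT_MODE, pair.getPublic());
            byte[] cipherText = encryptor.doFinal(plain);

            Cipher decryptor = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
            decryptor.init(Cipher.DECRYPT_MODE, pair.getPrivate());
            return decryptor.doFinal(cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] message = "Hello World".getBytes(StandardCharsets.UTF_8);
        System.out.println("symmetric ok: " + Arrays.equals(message, symmetricRoundTrip(message)));
        System.out.println("asymmetric ok: " + Arrays.equals(message, asymmetricRoundTrip(message)));
    }
}
```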
36. Obtain a public/private key
• Create your own keystore (with the
private key) and self-signed certificate
(with the public key); e.g. using keytool
• Ask a Certificate Authority (CA) to sign
your certificate to prove your identity
• A Certificate signed by a CA’s private key
can be decrypted with the CA’s root
certificate (stored in Adobe Reader)
37. Digital Signatures
Stored on the producer’s side
• Certificate
– Public key
– Identity info
• Private key
• Original document
ByteRange
Received by the consumer
%PDF-1.x
...
/ByteRange ...
/Contents<
DIGITAL SIGNATURE
• Certificate
• Signed Message Digest
• Timestamp
>...
%%EOF
38. Possible architecture
[Diagram: Application and Device]
%PDF-1.x
...
DIGITAL SIGNATURE
• Certificate
• Signed Message Digest
• Timestamp
...
%%EOF
Diagram captions:
– Existing PDF document (created by PDF producer)
– Fill out signature field (using iText)
– Externally sign digest (created with iText)
39. Displaying digital signatures
• Digital signatures are part of the file
structure: it isn’t mandatory for a digital
signature to be displayed on a page.
• Digital signatures are listed in the
signature panel.
• A digital signature can be visualized as a
field widget (this widget can consist of
graphics, text,...).
44. Important note
• A signature signs the complete
document.
• The concept of signing separate pages in
a document (“to initial a document”)
doesn’t exist in PDF.
• Legal issue: how to prove that a person
who signed for approval has read the
complete document?
45. Serial signatures
A PDF document can be signed more than once, but parallel signatures aren’t supported, only serial signatures: additional signatures sign all previous signatures.
%PDF-1.x
% Original document
DIGITAL SIGNATURE 1
...
%%EOF Rev1
% Additional content 1
...
DIGITAL SIGNATURE 2
...
%%EOF Rev2
% Additional content 2
...
DIGITAL SIGNATURE 3
...
%%EOF Rev3
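Because each revision ends with its own %%EOF, the revisions of a serially signed file can be located by scanning for that marker. The sketch below assumes a hypothetical class RevisionSketch and does a naive text scan; in a real file the marker could also occur inside a stream, so treat this as an illustration of the layout only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of iText.
final class RevisionSketch {

    private static final String EOF_MARKER = "%%EOF";

    /** Returns, for each "%%EOF" found, the offset just past it (the end of that revision). */
    static List<Integer> revisionEnds(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        List<Integer> ends = new ArrayList<>();
        int from = 0;
        while (true) {
            int at = text.indexOf(EOF_MARKER, from);
            if (at < 0) {
                break;
            }
            from = at + EOF_MARKER.length();
            ends.add(from);
        }
        return ends;
    }

    public static void main(String[] args) {
        // Toy stand-in for the three-revision layout on the slide.
        byte[] pdf = ("%PDF-1.x\n% Original document\n%%EOF\n"
                + "% Additional content 1\n%%EOF\n"
                + "% Additional content 2\n%%EOF\n")
                .getBytes(StandardCharsets.ISO_8859_1);
        List<Integer> ends = revisionEnds(pdf);
        for (int i = 0; i < ends.size(); i++) {
            System.out.println("Rev" + (i + 1) + " covers bytes 0.." + ends.get(i));
        }
    }
}
```

Every revision starts at byte 0, which mirrors the statement that a later signature also signs all earlier ones.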
47. Types of signatures
• Certification (aka author) signature—
only possible for the first revision;
involves modification detection
permissions.
• Approval (aka recipient) signature—
workflow with subsequent signers.
• Usage Rights signature: involves Adobe’s
private key to Reader-enable a PDF
(off-topic here).
48. Problems solved?
• Integrity—signature is invalidated if bytes
are changed
• Authenticity—Certificate Authority
verifies the identity of the owner of the
private key
• Non-repudiation—the author is the only
one who has access to the private key
49. What if?
• What if the author’s private key is
compromised?
• What if the author falsifies the creation
date of the document?
• What if the certificate expires too soon?
50. Revocation checking
• Certificate Revocation List (CRL)
The certificate is checked against a list of
revoked certificates.
• Online Certificate Status Protocol (OCSP)
The revocation status is obtained from a
server.
If the certificate was revoked, the
signature is invalid.
52. Timestamping
• The timestamp of a signature can be
based on the signer’s local machine time,
• Or the signer can involve a Time Stamp
Authority (TSA). The message digest is
sent to a trusted timestamp server. This
server adds a timestamp and signs the
resulting hash using the TSA’s private key.
• The signer can’t forge the time anymore.
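The TSA round trip can be modelled in a few lines. This is a conceptual sketch only: TimestampSketch is a hypothetical class, the "token" is just a signature over the digest followed by an 8-byte time value, and real timestamp servers use a standardized request/response format that isn't reproduced here.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical model of a timestamp authority, not part of iText.
final class TimestampSketch {

    /** Generates a throwaway RSA key pair standing in for the TSA's keys. */
    static KeyPair newTsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Concatenates the message digest and the time, as the data the TSA signs. */
    static byte[] tsaInput(byte[] digest, long epochMillis) {
        return ByteBuffer.allocate(digest.length + Long.BYTES)
                .put(digest)
                .putLong(epochMillis)
                .array();
    }

    /** TSA side: sign digest + time with the TSA's private key. */
    static byte[] issue(byte[] digest, long epochMillis, PrivateKey tsaKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(tsaKey);
            signer.update(tsaInput(digest, epochMillis));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Consumer side: check that the token matches this digest and this claimed time. */
    static boolean verify(byte[] digest, long epochMillis, byte[] token, PublicKey tsaKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(tsaKey);
            verifier.update(tsaInput(digest, epochMillis));
            return verifier.verify(token);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair tsa = newTsaKeyPair();
        byte[] digest = MessageDigest.getInstance("SHA-256").digest("Hello World".getBytes());
        long now = System.currentTimeMillis();
        byte[] token = issue(digest, now, tsa.getPrivate());
        System.out.println("claimed time verifies: " + verify(digest, now, token, tsa.getPublic()));
        System.out.println("forged time verifies: "
                + verify(digest, now - 86_400_000L, token, tsa.getPublic()));
    }
}
```

Since the signer doesn't hold the TSA's private key, a different time can't be substituted without the check failing.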
54. PAdES - LTV
• PAdES: PDF Advanced Electronic Signatures
• LTV: Long Term Validation
• Requires extensions to ISO-32000-1
• Described by ETSI in TS 102 778 part 4
• Requires Document Security Store (DSS)
and Document Timestamp
• A new DSS+TS are added before expiration
of the last document timestamp
55. Part 4
Under the hood
• Ch 13: PDFs inside-out
• Ch 14: The imaging model
• Ch 15: Page content and structure
• Ch 16: PDF streams
66. Flash component in PDF
67. The future of iText
Five ideas for 2011
• The frustration of working with HTMLWorker
• Finally start working on XFA to PDF conversion
• Digital Signatures: PAdES, timestamps,...
• Eclipse plug-in for iText
• iText for Android
Additional ideas:
• Accessibility (Tagged PDF, PDF/UA?)
• GIS Options
68. HTMLWorker
• Support for straightforward HTML
– No URL to PDF conversion yet
– Support for more HTML tags and CSS styles
– Target for iText 5.1 (April 2011)
• HTML generated with FCKeditor and TinyMCE
• “Rich Text” as defined in XFA and PDF specs
• Support for all HTML would be nice too
– Full-blown HTML to PDF conversion
– Do what a browser does
69. XFA to PDF
• The new HTMLWorker will be based on a
new class XMLWorker
• XFA is the XML Forms Architecture
• With Adobe’s “Rich Text”, we’re already
implementing a small part of the XFA.
• Once iText 5.1 is released we’re ready to
start an XFA to PDF project, but...
• Is there a sponsor for such a project?
70. Digital Signatures
• PAdES: needs to be in a future iText version
• Signing server: product?
• Timestamp server: service?
71. iText for Android
• iText light for phones
– Demo: Hello world
• iText full for tablet PCs