Generating PDF from Python web applications
Gaël LE MIGNOT
Pilot Systems
June 6, 2017
Gaël LE MIGNOT Pilot Systems Generating PDF from Python web applications
Summary
1 Introduction
2 Tools
3 Tips, tricks and pitfalls
4 Conclusion
Introduction
Pilot Systems
Free Software service provider
Python Web application development and hosting
Using Zope/Plone (since 2000) and Django (since 0.96)
All kinds of customers (public/private, small/big, ...)
Generating PDFs
Very frequently asked
Different purposes require different tools
Several pitfalls to avoid
Weasyprint - presentation
What is weasyprint?
Free Software Python library
Converts an HTML5 page (using a print CSS) into a PDF
Also available as a command-line tool
When to use it?
To convert an existing HTML document
Consistency: same templating engine, same language
For simple page layouts
Weasyprint - code details
Simple usage
from weasyprint import HTML, CSS
html = template()
data = HTML(string=html).write_pdf()
Some mangling with BeautifulSoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
bl = ('typography.com', 'logged-in.css')
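# Drop every <link> whose href matches a blacklisted stylesheet
for css in soup.find_all("link"):
    for cssname in bl:
        # .get() avoids a KeyError on <link> tags without an href
        if cssname in css.get('href', ''):
            css.extract()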
Weasyprint - code details
Add some page header/footer
@page {
    margin: 3cm 2cm;
    @bottom-right {
        content: "Page " counter(page);
    }
    @top-center {
        content: "Pilot Systems";
    }
}
Reportlab - presentation
What is reportlab?
Python library for generating PDF and graphs
Powerful RML templating language
Template and story concepts
Versions and tools
Complicated licensing
Reportlab PDF toolkit: limited Free Software version
Reportlab PLUS: non-free complete version
trml2pdf: free software, third-party implementation of RML
RMLPageTemplate: Zope integration of trml2pdf
Reportlab - code example
Bright warning
<template>
  <fill color="yellow"/>
  <rect x="115mm" y="217mm"
        width="90mm" height="18mm"
        fill="yes" stroke="yes"/>
  <frame id="warning" x1="115mm" y1="213mm"
         width="90mm" height="24mm" />
</template>
<story>
  <para>
    TEMPORARY DOCUMENT - DO NOT PRINT
  </para>
</story>
pdftk
What is pdftk?
Toolbox to manipulate PDFs
Performs operations such as extracting pages and concatenating files
Can also stamp a PDF on top of another
Command-line tool, so use subprocess
Use-case
Afdas: collects taxes and finances training
Companies make a yearly declaration
Take a background and fill cells
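
A minimal sketch of driving pdftk through subprocess, assuming pdftk is installed and on the PATH. The helper names and file names are hypothetical. It covers two of the operations listed above: concatenating files with `cat` and stamping one PDF on top of another with `stamp`.

```python
import subprocess

PDFTK = "pdftk"  # assumption: pdftk is installed and on PATH


def concat_command(inputs, output):
    """Build the argv that concatenates several PDFs into one file."""
    return [PDFTK] + list(inputs) + ["cat", "output", output]


def stamp_command(document, stamp, output):
    """Build the argv that stamps `stamp` on top of the pages of `document`."""
    return [PDFTK, document, "stamp", stamp, "output", output]


def run(command):
    """Run pdftk; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(command, check=True)


# Hypothetical file names, for illustration only.
print(concat_command(["cover.pdf", "body.pdf"], "full.pdf"))
print(stamp_command("form.pdf", "filled-cells.pdf", "declaration.pdf"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-supplied file names.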
LaTeX
What is LaTeX?
Very powerful document composition system
Used for scientific publishing, among others
Used for these slides, too
How to use it?
Generate a .tex file
Can use a template, or intermediate language (like rst)
Then execute pdflatex
When to use it?
Rich formatting
Table of contents, index, glossary, ...
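
The "generate a .tex file, then execute pdflatex" steps could be sketched as follows. This is only an illustration: the template, the function names and the paths are hypothetical, and `string.Template` stands in for whatever templating engine the application already uses. User-supplied text is escaped so that characters such as `%` or `&` do not break the document.

```python
from string import Template

# Characters that LaTeX treats specially, with their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Hypothetical minimal template; $title and $body are the placeholders.
TEX_TEMPLATE = Template(r"""\documentclass{article}
\begin{document}
\section*{$title}
$body
\end{document}
""")


def latex_escape(text):
    """Escape LaTeX special characters one at a time (no double escaping)."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def render_tex(title, body):
    """Fill the template with escaped user content."""
    return TEX_TEMPLATE.substitute(title=latex_escape(title),
                                   body=latex_escape(body))


def pdflatex_command(tex_path, out_dir):
    """Build the argv for a non-interactive pdflatex run."""
    return ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
            "-output-directory=" + out_dir, tex_path]


print(render_tex("Q&A", "100% done"))
```

Documents with a table of contents or an index usually need more than one pdflatex pass, so the command may have to be run again on the same file.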
Other tools
For the brave
Client-side rendering with JS libraries
Using LibreOffice with pyuno
Generate QR-code/datamatrix with elaphe
HTTP Headers
Don’t forget HTTP headers
Specify the content-type
Hint between displaying and downloading
Provide default filename
Code example
response.setHeader('Content-Type', 'application/pdf')
cd = 'attachment; filename="%s"' % filename
response.setHeader('Content-Disposition', cd)
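
A framework-neutral sketch of the same headers, assuming a hypothetical helper `pdf_headers`. It shows the "displaying versus downloading" hint (`inline` versus `attachment`) and strips characters that could break out of the quoted filename.

```python
def pdf_headers(filename, download=True):
    """Return the headers for a PDF response as (name, value) pairs."""
    # Drop quotes, backslashes and line breaks so the value stays one
    # well-formed header line.
    safe = "".join(c for c in filename if c not in '"\\\r\n')
    # "attachment" asks the browser to download, "inline" to display.
    disposition = "attachment" if download else "inline"
    return [
        ("Content-Type", "application/pdf"),
        ("Content-Disposition", '%s; filename="%s"' % (disposition, safe)),
    ]


print(pdf_headers("report.pdf"))
print(pdf_headers("report.pdf", download=False))
```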
Handling long generation times
The problem
Generate a PDF report of 500 pages
It takes 10 minutes
Timeout or users get angry
Solutions
Increase timeouts, inform users
Use fork or threads to generate asynchronously
Use a scheduler like Celery
Send the result by email, with a link
Cleaning
find /path/to/pdfs -mtime +14 -delete
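
A minimal sketch of the thread-based solution, using only the standard library. The names `submit_pdf_job` and `job_status` are hypothetical, and the in-memory `jobs` table is an assumption that only holds for a single-process server; a scheduler like Celery replaces it in a multi-process deployment.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# One worker, so that heavy PDF jobs run one at a time (sketch assumption).
executor = ThreadPoolExecutor(max_workers=1)
jobs = {}  # job id -> Future


def submit_pdf_job(generate, *args):
    """Start generate(*args) in the background and return a job id at once."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(generate, *args)
    return job_id


def job_status(job_id):
    """Return 'pending', 'failed' or 'done' for a submitted job."""
    future = jobs[job_id]
    if not future.done():
        return "pending"
    return "failed" if future.exception() else "done"
```

The request handler returns the job id immediately; a status page or the email sent at the end of the job then points the user to the finished file.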
Careful with search engines
Typical situation
Public website (using a CMS)
Button on each page to get a PDF version
A crawler comes... and boom.
Don’t panic
Use a robots.txt file, though its effect is limited
Have the button do a POST
Use a load balancer like haproxy and pin PDF requests
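
The "have the button do a POST" idea can be sketched as a plain WSGI application that refuses any other method, so a crawler following links never triggers a generation. `render_pdf` is a hypothetical placeholder for the real generator.

```python
def render_pdf():
    """Hypothetical placeholder: the real application renders the page here."""
    return b"%PDF-1.4 placeholder"


def pdf_app(environ, start_response):
    """WSGI sketch: only generate the PDF when the request is a POST."""
    if environ.get("REQUEST_METHOD") != "POST":
        # GET, HEAD and so on are rejected before any expensive work starts.
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"Use the button on the page to request the PDF.\n"]
    pdf = render_pdf()
    start_response("200 OK", [("Content-Type", "application/pdf"),
                              ("Content-Length", str(len(pdf)))])
    return [pdf]
```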
CPU and RAM usage
PDF generation is expensive
PDF generation can be heavy both in CPU and RAM
Always estimate your volume before deploying
Task schedulers (like Celery) are a great help
Be nice!
#!/bin/sh
PDFTK=/usr/bin/pdftk
exec nice -n 10 taskset -c 0 $PDFTK "$@"
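
When the external tool is started from Python, the niceness can also be raised without a wrapper script. A Unix-only sketch, assuming a hypothetical helper `run_niced`; it covers only the `nice` part of the script above, not the CPU pinning done by `taskset`.

```python
import os
import subprocess
import sys


def run_niced(command, increment=10):
    """Run `command` with its niceness raised by `increment` (Unix only)."""
    return subprocess.run(
        command,
        # Runs in the child just before exec, so only the child is affected.
        preexec_fn=lambda: os.nice(increment),
        stdout=subprocess.PIPE,
        check=True,
    )


# Demonstration: the child process reports its own niceness.
result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
print(result.stdout.decode().strip())
```

The Python documentation warns that `preexec_fn` is not safe in multi-threaded programs, so the wrapper script remains the simpler choice for a threaded server.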
Accessing external resources
The problem
Restricted access CSS and images
Common with weasyprint, but can also happen with other tools
Solutions
Reuse the user’s cookies in the sub-requests
Extract the resources to a temporary directory
Allow unprotected access from localhost (dangerous)
Accessing external resources
Cookie code example
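import urllib.request  # urllib2 on Python 2
# Rebuild a Cookie header from the cookies of the incoming request
cookies = request.cookies.items()
cookies = ['%s=%s' % (k, v) for k, v in cookies]
cookiestr = "; ".join(cookies)
# Strip newlines so the value stays a single header line
cookiestr = cookiestr.replace('\n', '')
opener = urllib.request.build_opener()
opener.addheaders.append(('Cookie', cookiestr))
html = opener.open(ressource_url).read()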
Encrypted PDFs
Typical use-case
User submitted a form with text fields and PDF attachments
At the end, the answers are concatenated into a PDF
Or even all the answers of all users!
Use weasyprint + pdftk or LaTeX
What happens
It works most of the time
But on some PDFs it breaks in odd ways
The culprit: DRM (Digital Restrictions Management)
Encrypted PDFs
What to do?
Ensure the PDF is not DRM-protected
Use pdfinfo from poppler
Code example
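import re
import subprocess
# universal_newlines=True returns text instead of bytes
out = subprocess.check_output(['pdfinfo', pdffile],
                              universal_newlines=True)
if re.search('Encrypted:.*yes', out):
    raise ValueError("DRM protected")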
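
The check on the pdfinfo output can be isolated in a small function and tested without any PDF at hand. A sketch with a hypothetical helper `is_encrypted`; the two sample outputs are illustrative, not captured from real files.

```python
import re


def is_encrypted(pdfinfo_output):
    """Return True if pdfinfo's text output reports an encrypted file."""
    # Anchor on the line start and read only the first word of the value,
    # so a later "print:yes" on the same line cannot cause a false match.
    match = re.search(r"^Encrypted:\s*(\w+)", pdfinfo_output, re.MULTILINE)
    return bool(match) and match.group(1) == "yes"


# Illustrative pdfinfo outputs (assumed format, shortened).
SAMPLE_PROTECTED = (
    "Pages:          3\n"
    "Encrypted:      yes (print:yes copy:no change:no addNotes:no)\n"
)
SAMPLE_PLAIN = "Pages:          3\nEncrypted:      no\n"

print(is_encrypted(SAMPLE_PROTECTED), is_encrypted(SAMPLE_PLAIN))
```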
Conclusion
Thanks for listening!
Any questions?