Ruby on Redis

Pascal Weemaels
Koen Handekyn

Oct 2013
Target

Create a zip file of PDFs based on a CSV data file

‣  Linear version

‣  Making it scale with Redis

(pipeline: parse csv → create pdf, create pdf, ... create pdf → zip)
Step 1: linear

‣  Parse CSV

•  std lib: require 'csv'

•  docs = CSV.read("#{DATA}.csv")
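A runnable sketch of this step, assuming a hypothetical invoices.csv laid out as invoice_nr, name, street, zip, city (the column order create_pdf expects later):

require 'csv'

# invoices.csv (made-up sample):
#   2013-001,Pascal Weemaels,Main Street 1,9820,Merelbeke
#   2013-002,Koen Handekyn,Some Lane 2,9000,Gent

docs = CSV.read('invoices.csv')   # => array of arrays, one per line
docs.each do |invoice_nr, name, street, zip, city|
  puts "#{invoice_nr}: #{name}, #{street}, #{zip} #{city}"
end
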
Simple Templating with String Interpolation

‣  Merge data into HTML

•  template = File.new('invoice.html').read

•  html = eval("<<QQQ\n#{template}\nQQQ")

invoice.html:

<<Q
<div class="title">
  INVOICE #{invoice_nr}
</div>
<div class="address">
  #{name}</br>
  #{street}</br>
  #{zip} #{city}</br>
</div>
Q
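A self-contained sketch of the trick above, with the template inlined instead of read from invoice.html; eval interpolates whatever local variables are in scope, so the data has to be bound before the call (the sample values are made up):

invoice_nr = '2013-001'
name       = 'Pascal Weemaels'
street     = 'Main Street 1'
zip        = '9820'
city       = 'Merelbeke'

# The template as it would sit on disk: plain text with #{...} markers,
# not yet interpolated (the single-quoted heredoc keeps them literal).
template = <<'TPL'
<div class="title">INVOICE #{invoice_nr}</div>
<div class="address">#{name}<br/>#{street}<br/>#{zip} #{city}</div>
TPL

# Wrap the template in a heredoc and eval it; the #{...} markers are
# interpolated against the local variables above.
html = eval("<<QQQ\n#{template}\nQQQ")
puts html
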
Step 1: linear

‣  Create PDF

•  Prince XML via the princely gem

•  http://www.princexml.com

•  p = Princely.new
   p.add_style_sheets('invoice.css')
   p.pdf_from_string(html)
Step 1: linear

‣  Create ZIP

•  Zip::ZipOutputStream.open(zipfile_name) do |zos|
     files.each do |file, content|
       zos.put_next_entry(file)
       zos.puts content
     end
   end
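For reference, a sketch of the same step against more recent rubyzip releases (1.x and later), where the require path and class names changed; this assumes a newer gem than the zip/zip API the deck itself uses:

require 'zip'   # rubyzip >= 1.0 (older versions: require 'zip/zip')

# files is assumed to be a hash of { "name" => pdf_bytes }
def create_zip(files, zipfile_name)
  Zip::OutputStream.open(zipfile_name) do |zos|   # was Zip::ZipOutputStream
    files.each do |name, content|
      zos.put_next_entry("#{name}.pdf")
      zos.write content                           # raw bytes, no trailing newline
    end
  end
  zipfile_name
end
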
Full Code

require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

DEMO
Step 2: from linear ...

(pipeline: parse csv → create pdf, create pdf, ... create pdf → zip)
Step 2: ...to parallel

(parse csv fans out to several create pdf workers running at once, then zip; coordinated how? Threads?)
Multi Threaded

‣  Advantage

•  Lightweight (minimal overhead)

‣  Challenges (or why it is hard)

•  Hard to code: most data structures are not thread safe by default and need synchronized access

•  Hard to test: different execution paths and timings

•  Hard to maintain

‣  Limitation

•  Single machine: not a solution for horizontal scalability beyond the multi-core CPU

(a minimal thread-based sketch follows)
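A minimal sketch of what the threaded variant could look like, assuming create_pdf and create_zip from the linear version; the stdlib Queue is one of the few thread-safe structures, and the shared result Hash still needs a Mutex. Prince runs as an external process, so MRI's GIL does not serialize the PDF work itself.

require 'csv'
require 'thread'   # Queue and Mutex (needed on 1.9/2.0-era Rubies)

# Assumes create_pdf and create_zip from the linear version are defined.
docs  = CSV.read(ARGV[0])
queue = Queue.new                    # thread-safe work queue
docs.each { |doc| queue << doc }

files_h = {}
lock    = Mutex.new                  # a plain Hash is not thread safe

workers = 4.times.map do
  Thread.new do
    loop do
      doc = queue.pop(true) rescue break          # non-blocking pop; stop when drained
      pdf = create_pdf(*doc)
      lock.synchronize { files_h[doc[0]] = pdf }  # synchronized access
    end
  end
end
workers.each(&:join)

create_zip files_h
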
Step 2: ...to parallel

(parse csv fans out to several create pdf processes, then zip; but how do separate processes share the work and the results?)
Multi Process

•  Scales across machines

•  Advanced support for debugging and monitoring at the OS level

•  Simpler (code, testing, debugging, ...)

•  Slightly more overhead

BUT
But

all this assumes "shared state across processes"

(diagram: parse csv and zip exchange data with the create pdf processes through shared state)

Candidates for that shared state: SQL? MemCached? File System? Terra Cotta? ... OR ...
Hello Redis

‣  Shared Memory Key Value Store with High Level Data Structure support

•  String (String, Int, Float)

•  Hash (Map, Dictionary)

•  List (Queue)

•  Set

•  ZSet (ordered by member or score)
About Redis

•  Single threaded: 1 thread to serve them all

•  Everything (must fit) in memory

•  "Transactions" (MULTI/EXEC)

•  Expiring keys

•  Lua scripting

•  Publisher-Subscriber

•  Auto create and destroy of keys

•  Pipelining

•  But ... full clustering (master-master) is not available (yet)

(transactions, expiring keys, and pipelining are sketched below in redis-rb)
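A small redis-rb sketch of three of those features; the key names are made up, and on older redis-rb versions commands can also be called directly on r inside the multi/pipelined block:

require 'redis'

r = Redis.new

# "Transaction": commands queue up under MULTI and run atomically on EXEC
r.multi do |tx|
  tx.set  'invoice:2013-001:status', 'queued'
  tx.incr 'invoices:pending'
end

# Expiring key: Redis removes it automatically once the TTL elapses
r.set    'session:pascal', 'logged-in'
r.expire 'session:pascal', 60          # seconds

# Pipelining: several commands in one network round trip
r.pipelined do |pipe|
  pipe.lpush 'todo', 'read'
  pipe.lpush 'todo', 'eat'
end
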
Hello Redis

‣  redis-cli

•  set name "pascal" = "pascal"
•  incr counter = 1
•  incr counter = 2
•  hset pascal name "pascal"
•  hset pascal address "merelbeke"
•  sadd persons pascal
•  smembers persons = [pascal]
•  keys *
•  type pascal = hash
•  lpush todo "read" = 1
•  lpush todo "eat" = 2
•  lpop todo = "eat"
•  rpoplpush todo done = "read"
•  lrange done 0 -1 = "read"

(the same session from Ruby with redis-rb is sketched below)
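Since the rest of the deck drives Redis from Ruby, here are the equivalent calls through the redis-rb gem; command names map one to one onto the cli:

require 'redis'

r = Redis.new                        # localhost:6379 by default

r.set  'name', 'pascal'              # String
r.incr 'counter'                     # => 1
r.incr 'counter'                     # => 2

r.hset 'pascal', 'name', 'pascal'    # Hash
r.hset 'pascal', 'address', 'merelbeke'

r.sadd     'persons', 'pascal'       # Set
r.smembers 'persons'                 # => ["pascal"]

r.lpush 'todo', 'read'               # List used as a queue
r.lpush 'todo', 'eat'
r.lpop  'todo'                       # => "eat"
r.rpoplpush 'todo', 'done'           # => "read"
r.lrange 'done', 0, -1               # => ["read"]
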
Let Redis Distribute

(each box becomes its own OS process: one parse csv process, several create pdf processes, one zip process, all coordinated through Redis)
Spread the Work

(step 1: the parse csv process pushes each input line onto a Redis queue with the data and keeps a counter; the create pdf processes consume the queue, the zip process waits)
Ruby on Redis

‣  Put the PDF input data on a queue and do the counter bookkeeping

docs.each do |doc|
  data = YAML::dump(doc)
  r.lpush 'pdf:queue', data
  r.incr 'ctr'              # bookkeeping
end
Create PDFs

(step 2: each create pdf process pops work from the queue, stores the generated PDF in a Redis hash, and decrements the counter)
Ruby on Redis

‣  Read PDF input data from the queue, do the counter bookkeeping, put each created PDF in a Redis hash, and signal when ready

while (true)
  _, msg = r.brpop 'pdf:queue'
  doc = YAML::load(msg)
  # name of hash, key = doc name, value = pdf
  r.hset('pdf:pdfs', doc[0], create_pdf(*doc))
  ctr = r.decr 'ctr'
  r.rpush 'ready', 'done' if ctr == 0
end
Zip When Done

(step 3: the zip process blocks on the ready signal, then fetches all PDFs from the hash and zips them)
Ruby on Redis

‣  Wait for the ready signal, fetch all PDFs, and zip them

r.brpop 'ready'              # wait for signal
pdfs = r.hgetall 'pdf:pdfs'  # fetch hash
create_zip pdfs              # zip it
More Parallelism

(one shared queue with data feeds the create pdf processes, while each input file gets its own counter, its own hash with PDFs, and its own ready signal, so several parse csv / zip jobs can run at the same time)
Ruby on Redis

‣  Put the PDF input data on a queue and do the counter bookkeeping, now keyed per input file

# unique id for this input file
UUID = SecureRandom.uuid
docs.each do |doc|
  data = YAML::dump([UUID, doc])
  r.lpush 'pdf:queue', data
  r.incr "ctr:#{UUID}"        # bookkeeping
end
Ruby on Redis

‣  Read PDF input data from the queue, do the counter bookkeeping, and put each created PDF in a per-job Redis hash

while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end
Ruby on Redis

‣  Wait for this job's ready signal, fetch all its PDFs, and zip them

r.brpop "ready:#{UUID}"      # wait for signal
pdfs = r.hgetall(UUID)       # fetch this job's hash
create_zip(pdfs)             # zip it
Full Code

LINEAR

require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

MAIN

require 'csv'
require 'zip/zip'
require 'redis'
require 'yaml'
require 'securerandom'

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")
UUID = SecureRandom.uuid

r = Redis.new
my_counter = "ctr:#{UUID}"

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

docs.each do |doc| # distribute!
  r.lpush 'pdf:queue', YAML::dump([UUID, doc])
  r.incr my_counter
end

r.brpop "ready:#{UUID}" # collect!
create_zip(r.hgetall(UUID))

# clean up
r.del my_counter
r.del UUID
puts "All done!"

WORKER

require 'redis'
require 'princely'
require 'yaml'

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

r = Redis.new
while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end

Key functions (create_pdf and create_zip) remain unchanged.
Distribution code highlighted.

DEMO 2
Multi Language Participants

(the queue, counters, hashes, and ready signals all live in Redis, so the parse csv, create pdf, and zip processes do not have to be written in the same language)
Conclusions

Going from linear to multi-process distributed is easy with Redis shared-memory high-level data structures:

Atomic counter for bookkeeping

Queue for work distribution

Queue as signal

Hash for result sets
