5. Evolution of real time decisions
[Chart: business impact of a decision (also proportional to risk) plotted against decision rate]
1. 1990’s – “Should we advertise on the Super Bowl? Should we run direct mail this qtr.?” Batch mode
2. 2000’s – “How often can we run a permission-based email mktg. campaign?” Rules-based alerts
3. 2010’s – Millions of decisions and actions taken, all in less than a blink of an eye
6. Plethora of Tools
• Glacier
• S3
• DynamoDB
• RDS
• EMR
• Redshift
• Data Pipeline
• Kinesis
• Cassandra
• CloudSearch
• Kinesis-enabled app
8. Use the right tool for the right job
Client Tier
App/Web Tier
Database & Storage Tier
• Amazon RDS
• Amazon DynamoDB
• Amazon ElastiCache
• Amazon S3
• Amazon Glacier
• Amazon EMR
• Amazon Redshift
9. Story of a migration to Amazon Redshift
Nicolas Baron – CTO, FollowAnalytics
@nico_b
www.linkedin.com/in/nicolasbaron
nicolas@followanalytics.com
10. FollowAnalytics
In a few words
• Mobile Marketing Automation
• SaaS Platform
• Startup founded in Paris
• HQ in San Francisco
• Positioning: Fortune 1000 / SBF 120
SAP & Hummer Winblad
18. Adoption of Amazon Redshift
16 nodes – dw2.large – 2.5 TB (SSD)
• Tested on several billion log lines
• 90% of queries in under 10 seconds
Pro tip: schema design!
Before (MongoDB) | After (Amazon Redshift)
Millions | Billions
Systematic pre-computation | < 10 seconds
19. What are the benefits?
• “Right tool for the right job”
• No new skills required
• Managed AWS service
21. Stages of Big Data Processing
Ingest → Store → Process → Visualize
• Batch analysis – one set of tools (minutes/hours)
• Real time analysis – another set of tools (seconds)
23. Types of Data Ingest
• Transactional
– Database reads/writes (structured data)
• File
– Logs (unstructured data)
[Diagram: Database, Cloud Storage]
Data has to be extracted from multiple sources to be processed periodically
24. Types of Data Ingest
• Stream
– Click-stream logs
– Mobile analytics
– IoT
– Telemetry
– Any real-time data from any producer
[Diagram: Stream Storage]
Data is streamed and can be processed continuously
25. What is a good ingest tool?
• Sequential streams are easier to process
• Need to scale
• Need to persist
• Architectural flexibility
• Real time!
[Diagram: ingest tool (Kafka or Kinesis) feeding separate processing applications]
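The properties listed above (sequential, persisted, several processors reading independently) can be sketched with a toy in-memory log. This is only an illustration of the idea, not how Kafka or Kinesis is implemented; `StreamLog`, `put`, `poll`, and `rewind` are hypothetical names, and a real stream store would persist to disk and partition the data.

```python
from dataclasses import dataclass, field


@dataclass
class StreamLog:
    """In-memory, append-only log: a toy stand-in for a stream store."""
    records: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next position

    def put(self, data):
        """Append a record and return its sequence number."""
        self.records.append(data)
        return len(self.records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next records for this consumer, in arrival order."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, position=0):
        """Move a consumer back so it can replay stored records."""
        self.offsets[consumer] = position


log = StreamLog()
for event in ["click:home", "click:cart", "click:pay"]:
    log.put(event)

realtime = log.poll("dashboard")                     # one consumer reads everything
archive_first = log.poll("archiver", max_records=2)  # another reads at its own pace
log.rewind("dashboard")
replayed = log.poll("dashboard")                     # same records, same order
```

Because each consumer only tracks a position in the sequence, adding a new processing application does not disturb the existing ones.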
28. Amazon Kinesis
• Streams contain shards. Each shard ingests data at up to 1 MB/sec and up to 1000 TPS
• Each shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or removing shards
• Replay data within the 24-hour window
• Fully managed & low cost
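The per-shard limits on this slide translate directly into a sizing rule: take whichever limit demands the most shards. A minimal sketch, assuming the limits exactly as stated on the slide; `shards_needed` and the example workload are hypothetical.

```python
import math

# Per-shard limits as stated on the slide.
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
EMIT_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies all three per-shard limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / INGEST_MB_PER_SEC)
    by_write_count = math.ceil(write_records_per_sec / INGEST_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / EMIT_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


# Hypothetical workload: 5 MB/s in, 3,500 records/s,
# and two consumers that each read the full stream (2 x 5 MB/s out).
example = shards_needed(5.0, 3500, 2 * 5.0)
```

Scaling then amounts to recomputing this number as the workload changes and adding or removing shards to match.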
29. Hypothesis: 500 million tweets a day at 2.4 KB per tweet
• 13.4 MB/s
• $577 / month
Source: dioncosales. Pricing example is for Amazon Kinesis only
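The throughput figure can be checked from the hypothesis with a few lines of arithmetic. This is a rough sketch: the slide does not say which unit conversion it used, so both common ones are computed here. Both land slightly above the slide's 13.4 figure, which presumably reflects different rounding or conventions. Pricing is left out because the rates behind the monthly figure are not given.

```python
TWEETS_PER_DAY = 500_000_000   # from the slide
KB_PER_TWEET = 2.4             # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

kb_per_sec = TWEETS_PER_DAY * KB_PER_TWEET / SECONDS_PER_DAY
mb_per_sec_decimal = kb_per_sec / 1000   # assuming 1 MB = 1000 KB
mb_per_sec_binary = kb_per_sec / 1024    # assuming 1 MB = 1024 KB
tweets_per_sec = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"{mb_per_sec_decimal:.1f} MB/s (decimal), {mb_per_sec_binary:.1f} MB/s (binary)")
print(f"{tweets_per_sec:.0f} tweets/s")
```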
30. “Amazon Kinesis also offloads a lot of developer burden in building a real-time, streaming data ingestion platform, and enables Supercell to focus on delivering games that delight players worldwide.”
Sami Yliharju, Services Lead
32. Which Stream Store Should I Use?
• Amazon Kinesis and Kafka have many similarities
– Multiple consumers
– Ordering of records
– Streaming MapReduce
– Low latency. Highly durable, available, and scalable
• Differences
– Record lifetime: 24 hours in Amazon Kinesis, configurable in Kafka
– Record size: 50 KB in Amazon Kinesis, configurable in Kafka
– Amazon Kinesis is a fully managed service – easier to provision, manage, and scale
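A fixed record size limit is something the producer has to handle. One possible approach, sketched below, is to pack log lines into records that stay under the limit. The 50 KB value is the one quoted on this slide; `pack_lines` is a hypothetical helper, and treating 1 KB as 1024 bytes is an assumption.

```python
MAX_RECORD_BYTES = 50 * 1024  # the slide's 50 KB limit; assumes 1 KB = 1024 bytes


def pack_lines(lines, max_bytes=MAX_RECORD_BYTES):
    """Group log lines into newline-joined records of at most max_bytes each."""
    records, current, size = [], [], 0
    for line in lines:
        encoded = line.encode("utf-8")
        if len(encoded) > max_bytes:
            raise ValueError("a single line exceeds the record size limit")
        extra = len(encoded) + (1 if current else 0)  # +1 for the joining newline
        if size + extra > max_bytes:
            # Current record is full: emit it and start a new one.
            records.append(b"\n".join(current))
            current, size = [], 0
            extra = len(encoded)
        current.append(encoded)
        size += extra
    if current:
        records.append(b"\n".join(current))
    return records


lines = ["x" * 20_000 for _ in range(6)]
records = pack_lines(lines)
```

With a store whose limit is configurable, the same helper works by passing a different `max_bytes`.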
34. What Database and Storage Should I Use?
• Data structure
• Query complexity
• Use case
• Workload
• Data characteristics: hot, warm, cold
36. Process
• Answering questions about data
• Questions
– Analytics: Think SQL/data warehouse
– Classification: Think sentiment analysis
• Who is asking them
– Data scientist
– Business owners
• When do you need them
– In seconds
– Weekly/Monthly
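An analytics-style question of the "think SQL" kind might look like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for a data warehouse; the `events` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "open", "2014-06-01"),
        ("u2", "open", "2014-06-01"),
        ("u1", "purchase", "2014-06-01"),
        ("u1", "open", "2014-06-02"),
    ],
)

# Question: how many distinct users were active each day?
daily_active = conn.execute(
    "SELECT day, COUNT(DISTINCT user_id) FROM events GROUP BY day ORDER BY day"
).fetchall()
```

The same question asked weekly by a business owner or every few seconds by a dashboard is the same query. What changes is how fresh the data must be and how fast the answer has to come back.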