SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
Big & Fast:
A quest for relevant and real-time analytics
Natalino Busa
@natalinobusa
Parallelism Mathematics Programming
Languages Machine Learning Statistics
Big Data Algorithms Cloud Computing
Natalino Busa
@natalinobusa
www.natalinobusa.com
Big and Fast.
Methodology Architecture Roles and organization
Conversion is the ultimate form of
permission marketing
Permission marketing is about the honour of being
heard.
How to earn it ?
Provide the right suggestions, at the right time.
This is what makes data analysis valuable
When do you really know your customer ?
know about last unique:
5 songs?
100 songs?
10’000 songs?
Old & New stuff.
We evolve slowly, our personality, our habits.
But events and trends can affect us on a short notice
How do you combine old with new?
The customer’s context
Complex on many dimensions:
Personal history:
amount of transactions ever done
Long term Interaction:
how the users’ action correlate with others
Real time events:
Trends and recent events
The customer’s context
context is related to time:
slow changing: the defining characteristic of a person
fast changing: events which influence our lives, trends
Require very different
technology solutions !!!
Challenges
millions of billions of
Not much time to react
window of opportunity sometimes is just a few seconds
Load of information to process
you want to understand well the user history
Slow and fast
ranking and preference
analysis
segmentation and clustering
short term trending topics
rule-based recommendations
10’s Terabytes of Data.
This can take hours ….
100’s of events per second.
This must be fast ….
Hadoop: Distributed Data OS
Reliable
Distributed, Replicated File System
Low cost
↓ Cost vs ↑ Performance/Storage
Computing Powerhouse
All clusters CPU’s working in parallel
for running queries
Scala / Akka / Spray:
a WEB API reactive framework
Actor
A Actor
B
Actor
C
msg 1
msg 2
msg 3
msg 4
● it scales horizontally (can run in cluster mode)
● maximum use of the available cores/memory
1. processing is non-blocking, threads are re-used
2. can parallelize computing power across many actors
Very fast: 1000’s messages/sec
Very reliable: auto recovery
Distributed computing:
lambda architecture
Batch
Computing
HTTP RESTful API
In-Memory
Distributed Database
In-memory
Distributed DB’s
Lambda Architecture
Batch + Streaming
low-latency
Web API services
Streaming
Computing
Data Warehouses Messaging Busses
Distributed computing: some techs
Hadoop
Cassandra
millions of billions of
λ=
conversions
( lamda )
All Things Distributed
Distributing computing and storage
more machines = more storage/computing
Open Source software solutions
mature enough for pragmatic adopters
Near realtime + big data technologies
Hadoop, Scala, Akka, Spray, Cassandra
Science & Engineering
Statistics,
Data Science
Python
R
Visualization
IT Infra
Big Data
Java
Scala
SQL
Hadoop: Big Data Infrastructure, Data Science on large datasets
Big Data and Fast Data
requires different profiles to be able to
achieve the best results
Parallelism Mathematics Programming
Languages Machine Learning Statistics
Big Data Algorithms Cloud Computing
Natalino Busa
@natalinobusa
www.natalinobusa.com
Thanks !
Any questions?
Natalino Busa
@natalinobusa

Contenu connexe

Tendances

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...
Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...
Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...Matt Stubbs
 
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku
 
The of Operational Analytics Data Store
The of Operational Analytics Data StoreThe of Operational Analytics Data Store
The of Operational Analytics Data StoreRommel Garcia
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Charles Allen
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at TwitchImply
 
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"Rommel Garcia
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Introduction_OF_Hadoop_and_BigData
Introduction_OF_Hadoop_and_BigDataIntroduction_OF_Hadoop_and_BigData
Introduction_OF_Hadoop_and_BigDataNilay Mishra
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Onyxfish
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friendsNatalino Busa
 
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsBlue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsDatabricks
 
Strata Online_road_to_enterprise_data_2011
Strata Online_road_to_enterprise_data_2011Strata Online_road_to_enterprise_data_2011
Strata Online_road_to_enterprise_data_2011Lynn Langit
 

Tendances (20)

Final deck
Final deckFinal deck
Final deck
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...
Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...
Big Data LDN 2016: Out of the Data Warehouses, and into the Data Lakes and St...
 
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin Buzzwords
 
The of Operational Analytics Data Store
The of Operational Analytics Data StoreThe of Operational Analytics Data Store
The of Operational Analytics Data Store
 
Big Data - Part I
Big Data - Part IBig Data - Part I
Big Data - Part I
 
What's next for Big Data? -- Apache Spark
What's next for Big Data? -- Apache SparkWhat's next for Big Data? -- Apache Spark
What's next for Big Data? -- Apache Spark
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at Twitch
 
Clickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache SparkClickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache Spark
 
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
Apache Druid: The Foundation of Fortune 500 “Analytical Decision-Making"
 
Microservices Live
Microservices LiveMicroservices Live
Microservices Live
 
OSCON 2015
OSCON 2015OSCON 2015
OSCON 2015
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Introduction_OF_Hadoop_and_BigData
Introduction_OF_Hadoop_and_BigDataIntroduction_OF_Hadoop_and_BigData
Introduction_OF_Hadoop_and_BigData
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsBlue Pill/Red Pill: The Matrix of Thousands of Data Streams
Blue Pill/Red Pill: The Matrix of Thousands of Data Streams
 
Strata Online_road_to_enterprise_data_2011
Strata Online_road_to_enterprise_data_2011Strata Online_road_to_enterprise_data_2011
Strata Online_road_to_enterprise_data_2011
 

En vedette

Lunar lander - The Game
Lunar lander - The GameLunar lander - The Game
Lunar lander - The GameThiago Santos
 
Historyofcomputers
HistoryofcomputersHistoryofcomputers
Historyofcomputersca999
 
software History
software Historysoftware History
software HistoryAvinash Avi
 
Software Testing: History, Trends, Perspectives - a Brief Overview
Software Testing: History, Trends, Perspectives - a Brief OverviewSoftware Testing: History, Trends, Perspectives - a Brief Overview
Software Testing: History, Trends, Perspectives - a Brief OverviewSoftheme
 
Advanced API Design: how an awesome API can help you make friends, get rich, ...
Advanced API Design: how an awesome API can help you make friends, get rich, ...Advanced API Design: how an awesome API can help you make friends, get rich, ...
Advanced API Design: how an awesome API can help you make friends, get rich, ...Jonathan Dahl
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayNatalino Busa
 
Computer Software introduction
Computer  Software introductionComputer  Software introduction
Computer Software introductionfaisalahmed2017
 
Animated Visualization of Software History Using Software Evolution Storyboards
Animated Visualization of Software History Using Software Evolution StoryboardsAnimated Visualization of Software History Using Software Evolution Storyboards
Animated Visualization of Software History Using Software Evolution StoryboardsSAIL_QU
 
Operating Systems: Network Management
Operating Systems: Network ManagementOperating Systems: Network Management
Operating Systems: Network ManagementDamian T. Gordon
 
Operating Systems: A History of Linux
Operating Systems: A History of LinuxOperating Systems: A History of Linux
Operating Systems: A History of LinuxDamian T. Gordon
 

En vedette (12)

Lunar lander - The Game
Lunar lander - The GameLunar lander - The Game
Lunar lander - The Game
 
Harvard mark1
Harvard mark1Harvard mark1
Harvard mark1
 
Historyofcomputers
HistoryofcomputersHistoryofcomputers
Historyofcomputers
 
Unix.part1.history
Unix.part1.historyUnix.part1.history
Unix.part1.history
 
software History
software Historysoftware History
software History
 
Software Testing: History, Trends, Perspectives - a Brief Overview
Software Testing: History, Trends, Perspectives - a Brief OverviewSoftware Testing: History, Trends, Perspectives - a Brief Overview
Software Testing: History, Trends, Perspectives - a Brief Overview
 
Advanced API Design: how an awesome API can help you make friends, get rich, ...
Advanced API Design: how an awesome API can help you make friends, get rich, ...Advanced API Design: how an awesome API can help you make friends, get rich, ...
Advanced API Design: how an awesome API can help you make friends, get rich, ...
 
Streaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and SprayStreaming Api Design with Akka, Scala and Spray
Streaming Api Design with Akka, Scala and Spray
 
Computer Software introduction
Computer  Software introductionComputer  Software introduction
Computer Software introduction
 
Animated Visualization of Software History Using Software Evolution Storyboards
Animated Visualization of Software History Using Software Evolution StoryboardsAnimated Visualization of Software History Using Software Evolution Storyboards
Animated Visualization of Software History Using Software Evolution Storyboards
 
Operating Systems: Network Management
Operating Systems: Network ManagementOperating Systems: Network Management
Operating Systems: Network Management
 
Operating Systems: A History of Linux
Operating Systems: A History of LinuxOperating Systems: A History of Linux
Operating Systems: A History of Linux
 

Similaire à Big and fast a quest for relevant and real-time analytics

Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudAmazon Web Services
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Altan Khendup
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...NoSQLmatters
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinTuri, Inc.
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem
(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem
(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud EcosystemAmazon Web Services
 
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014Christopher Curtin
 
Big Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightBig Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightAmazon Web Services LATAM
 
Scaling APIs: Predict, Prepare for, Overcome the Challenges
Scaling APIs: Predict, Prepare for, Overcome the ChallengesScaling APIs: Predict, Prepare for, Overcome the Challenges
Scaling APIs: Predict, Prepare for, Overcome the ChallengesApigee | Google Cloud
 
ConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web appsConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web appsPierre-Luc Maheu
 
Kellogg XML Holland Speech
Kellogg XML Holland SpeechKellogg XML Holland Speech
Kellogg XML Holland SpeechDave Kellogg
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Big Data LDN 2017: Delivering Instant Experience with Redid Enterprise
Big Data LDN 2017: Delivering Instant Experience with Redid EnterpriseBig Data LDN 2017: Delivering Instant Experience with Redid Enterprise
Big Data LDN 2017: Delivering Instant Experience with Redid EnterpriseMatt Stubbs
 

Similaire à Big and fast a quest for relevant and real-time analytics (20)

Traitement d'événements
Traitement d'événementsTraitement d'événements
Traitement d'événements
 
Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS Cloud
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Big data business case
Big data   business caseBig data   business case
Big data business case
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem
(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem
(ARC311) Decoding The Genetic Blueprint Of Life On A Cloud Ecosystem
 
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014Redis and Bloom Filters - Atlanta Java Users Group 9/2014
Redis and Bloom Filters - Atlanta Java Users Group 9/2014
 
Big Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightBig Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of Light
 
Scaling APIs: Predict, Prepare for, Overcome the Challenges
Scaling APIs: Predict, Prepare for, Overcome the ChallengesScaling APIs: Predict, Prepare for, Overcome the Challenges
Scaling APIs: Predict, Prepare for, Overcome the Challenges
 
ConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web appsConFoo 2017: Introduction to performance optimization of .NET web apps
ConFoo 2017: Introduction to performance optimization of .NET web apps
 
Kellogg XML Holland Speech
Kellogg XML Holland SpeechKellogg XML Holland Speech
Kellogg XML Holland Speech
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Big Data LDN 2017: Delivering Instant Experience with Redid Enterprise
Big Data LDN 2017: Delivering Instant Experience with Redid EnterpriseBig Data LDN 2017: Delivering Instant Experience with Redid Enterprise
Big Data LDN 2017: Delivering Instant Experience with Redid Enterprise
 

Plus de Natalino Busa

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationNatalino Busa
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksNatalino Busa
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networksNatalino Busa
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooksNatalino Busa
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditingNatalino Busa
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analyticsNatalino Busa
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API'sNatalino Busa
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Natalino Busa
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsNatalino Busa
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsNatalino Busa
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesNatalino Busa
 

Plus de Natalino Busa (15)

Data Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovationData Production Pipelines: Legacy, practices, and innovation
Data Production Pipelines: Legacy, practices, and innovation
 
Data science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter NotebooksData science apps powered by Jupyter Notebooks
Data science apps powered by Jupyter Notebooks
 
7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks7 steps for highly effective deep neural networks
7 steps for highly effective deep neural networks
 
Data science apps: beyond notebooks
Data science apps: beyond notebooksData science apps: beyond notebooks
Data science apps: beyond notebooks
 
[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing[Ai in finance] AI in regulatory compliance, risk management, and auditing
[Ai in finance] AI in regulatory compliance, risk management, and auditing
 
Data in Action
Data in ActionData in Action
Data in Action
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
The evolution of data analytics
The evolution of data analyticsThe evolution of data analytics
The evolution of data analytics
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Awesome Banking API's
Awesome Banking API'sAwesome Banking API's
Awesome Banking API's
 
Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.Yo. big data. understanding data science in the era of big data.
Yo. big data. understanding data science in the era of big data.
 
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analyticsBig Data and APIs - a recon tour on how to successfully do Big Data analytics
Big Data and APIs - a recon tour on how to successfully do Big Data analytics
 
Strata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topicsStrata 2014: Data science and big data trending topics
Strata 2014: Data science and big data trending topics
 
Streaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologiesStreaming computing: architectures, and tchnologies
Streaming computing: architectures, and tchnologies
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 

Dernier

Most Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdf
Most Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdfMost Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdf
Most Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdfCIO Business World
 
2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local Leads2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local LeadsSearch Engine Journal
 
Snapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdf
Snapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdfSnapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdf
Snapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdfEastern Online-iSURVEY
 
Infographics about SEO strategies and uses
Infographics about SEO strategies and usesInfographics about SEO strategies and uses
Infographics about SEO strategies and usesbhavanirupeshmoksha
 
Storyboards for my Final Major Project Video
Storyboards for my Final Major Project VideoStoryboards for my Final Major Project Video
Storyboards for my Final Major Project VideoSineadBidwell
 
Best digital marketing e-book form bignners
Best digital marketing e-book form bignnersBest digital marketing e-book form bignners
Best digital marketing e-book form bignnersmuntasibkhan58
 
Codes and Conventions of Film Magazine Websites.pptx
Codes and Conventions of Film Magazine Websites.pptxCodes and Conventions of Film Magazine Websites.pptx
Codes and Conventions of Film Magazine Websites.pptxGeorgeCulica
 
Michael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysisMichael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysisjunaid794917
 
Fueling A_B experiments with behavioral insights (1).pdf
Fueling A_B experiments with behavioral insights (1).pdfFueling A_B experiments with behavioral insights (1).pdf
Fueling A_B experiments with behavioral insights (1).pdfVWO
 
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024CIO Business World
 
Exploring Web 3.0 Growth marketing: Navigating the Future of the Internet
Exploring Web 3.0 Growth marketing: Navigating the Future of the InternetExploring Web 3.0 Growth marketing: Navigating the Future of the Internet
Exploring Web 3.0 Growth marketing: Navigating the Future of the Internetnehapardhi711
 
The power of SEO-driven market intelligence
The power of SEO-driven market intelligenceThe power of SEO-driven market intelligence
The power of SEO-driven market intelligenceHinde Lamrani
 
What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?Juan Pineda
 
top marketing posters - Fresh Spar Technologies - Manojkumar C
top marketing posters - Fresh Spar Technologies - Manojkumar Ctop marketing posters - Fresh Spar Technologies - Manojkumar C
top marketing posters - Fresh Spar Technologies - Manojkumar CManojkumar C
 
The Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO CopywritingThe Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO CopywritingJuan Pineda
 
(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...
(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...
(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...Hugues Rey
 
McDonald's: A Journey Through Time (PPT)
McDonald's: A Journey Through Time (PPT)McDonald's: A Journey Through Time (PPT)
McDonald's: A Journey Through Time (PPT)DEVARAJV16
 
The Evolution of Internet : How consumers use technology and its impact on th...
The Evolution of Internet : How consumers use technology and its impact on th...The Evolution of Internet : How consumers use technology and its impact on th...
The Evolution of Internet : How consumers use technology and its impact on th...sowmyrao14
 
Exploring The World Of Adult Ad Networks.pdf
Exploring The World Of Adult Ad Networks.pdfExploring The World Of Adult Ad Networks.pdf
Exploring The World Of Adult Ad Networks.pdfadult marketing
 
What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...
What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...
What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...Ahrefs
 

Dernier (20)

Most Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdf
Most Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdfMost Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdf
Most Influential HR Leaders Leading the Corporate World, 2024 (Final file).pdf
 
2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local Leads2024's Top PPC Tactics: Triple Your Google Ads Local Leads
2024's Top PPC Tactics: Triple Your Google Ads Local Leads
 
Snapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdf
Snapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdfSnapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdf
Snapshot of Consumer Behaviors of March 2024-EOLiSurvey (EN).pdf
 
Infographics about SEO strategies and uses
Infographics about SEO strategies and usesInfographics about SEO strategies and uses
Infographics about SEO strategies and uses
 
Storyboards for my Final Major Project Video
Storyboards for my Final Major Project VideoStoryboards for my Final Major Project Video
Storyboards for my Final Major Project Video
 
Best digital marketing e-book form bignners
Best digital marketing e-book form bignnersBest digital marketing e-book form bignners
Best digital marketing e-book form bignners
 
Codes and Conventions of Film Magazine Websites.pptx
Codes and Conventions of Film Magazine Websites.pptxCodes and Conventions of Film Magazine Websites.pptx
Codes and Conventions of Film Magazine Websites.pptx
 
Michael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysisMichael Kors marketing assignment swot analysis
Michael Kors marketing assignment swot analysis
 
Fueling A_B experiments with behavioral insights (1).pdf
Fueling A_B experiments with behavioral insights (1).pdfFueling A_B experiments with behavioral insights (1).pdf
Fueling A_B experiments with behavioral insights (1).pdf
 
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
The 10 Most Inspirational Leaders LEADING THE WAY TO SUCCESS, 2024
 
Exploring Web 3.0 Growth marketing: Navigating the Future of the Internet
Exploring Web 3.0 Growth marketing: Navigating the Future of the InternetExploring Web 3.0 Growth marketing: Navigating the Future of the Internet
Exploring Web 3.0 Growth marketing: Navigating the Future of the Internet
 
The power of SEO-driven market intelligence
The power of SEO-driven market intelligenceThe power of SEO-driven market intelligence
The power of SEO-driven market intelligence
 
What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?What are the 4 characteristics of CTAs that convert?
What are the 4 characteristics of CTAs that convert?
 
top marketing posters - Fresh Spar Technologies - Manojkumar C
top marketing posters - Fresh Spar Technologies - Manojkumar Ctop marketing posters - Fresh Spar Technologies - Manojkumar C
top marketing posters - Fresh Spar Technologies - Manojkumar C
 
The Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO CopywritingThe Pitfalls of Keyword Stuffing in SEO Copywriting
The Pitfalls of Keyword Stuffing in SEO Copywriting
 
(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...
(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...
(Generative) AI & Marketing: - Out of the Hype - Empowering the Marketing M...
 
McDonald's: A Journey Through Time (PPT)
McDonald's: A Journey Through Time (PPT)McDonald's: A Journey Through Time (PPT)
McDonald's: A Journey Through Time (PPT)
 
The Evolution of Internet : How consumers use technology and its impact on th...
The Evolution of Internet : How consumers use technology and its impact on th...The Evolution of Internet : How consumers use technology and its impact on th...
The Evolution of Internet : How consumers use technology and its impact on th...
 
Exploring The World Of Adult Ad Networks.pdf
Exploring The World Of Adult Ad Networks.pdfExploring The World Of Adult Ad Networks.pdf
Exploring The World Of Adult Ad Networks.pdf
 
What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...
What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...
What I learned from auditing over 1,000,000 websites - SERP Conf 2024 Patrick...
 

Big and fast a quest for relevant and real-time analytics

  • 1. Big & Fast: A quest for relevant and real-time analytics Natalino Busa @natalinobusa
  • 2. Parallelism Mathematics Programming Languages Machine Learning Statistics Big Data Algorithms Cloud Computing Natalino Busa @natalinobusa www.natalinobusa.com
  • 3. Big and Fast. Methodology Architecture Roles and organization
  • 4. Conversion is the ultimate form of permission marketing Permission marketing is about the honour of being heard. How to earn it ? Provide the right suggestions, at the right time. This is what makes data analysis valuable
  • 5. When do you really know your customer ? know about last unique: 5 songs? 100 songs? 10’000 songs?
  • 6. Old & New stuff. We evolve slowly, our personality, our habits. But events and trends can affect us on a short notice How do you combine old with new?
  • 7. The customer’s context Complex on many dimensions: Personal history: amount of transactions ever done Long term Interaction: how the users’ action correlate with others Real time events: Trends and recent events
  • 8. The customer’s context context is related to time: slow changing: the defining characteristic of a person fast changing: events which influence our lives, trends Require very different technology solutions !!!
  • 9. Challenges millions of billions of Not much time to react window of opportunity sometimes is just a few seconds Load of information to process you want to understand well the user history
  • 10. Slow and fast ranking and preference analysis segmentation and clustering short term trending topics rule-based recommendations 10’s Terabytes of Data. This can take hours …. 100’s of events per second. This must be fast ….
  • 11. Hadoop: Distributed Data OS Reliable Distributed, Replicated File System Low cost ↓ Cost vs ↑ Performance/Storage Computing Powerhouse All clusters CPU’s working in parallel for running queries
  • 12. Scala / Akka / Spray: a WEB API reactive framework Actor A Actor B Actor C msg 1 msg 2 msg 3 msg 4 ● it scales horizontally (can run in cluster mode) ● maximum use of the available cores/memory 1. processing is non-blocking, threads are re-used 2. can parallelize computing power across many actors Very fast: 1000’s messages/sec Very reliable: auto recovery
  • 13. Distributed computing: lambda architecture Batch Computing HTTP RESTful API In-Memory Distributed Database In-memory Distributed DB’s Lambda Architecture Batch + Streaming low-latency Web API services Streaming Computing Data Warehouses Messaging Busses
  • 14. Distributed computing: some techs Hadoop Cassandra millions of billions of λ= conversions ( lamda )
  • 15. All Things Distributed Distributing computing and storage more machines = more storage/computing Open Source software solutions mature enough for pragmatic adopters Near realtime + big data technologies Hadoop, Scala, Akka, Spray, Cassandra
  • 16. Science & Engineering Statistics, Data Science Python R Visualization IT Infra Big Data Java Scala SQL Hadoop: Big Data Infrastructure, Data Science on large datasets Big Data and Fast Data requires different profiles to be able to achieve the best results
  • 17. Parallelism Mathematics Programming Languages Machine Learning Statistics Big Data Algorithms Cloud Computing Natalino Busa @natalinobusa www.natalinobusa.com Thanks ! Any questions?