SlideShare a Scribd company logo
1 of 42
Download to read offline
Customer Behaviour Analytics:
Billions of Events to
one Customer-Product Graph
Strata London, 12th November 2013
Presented by Paul Lam
About Paul Lam
Joined uSwitch.com as first Data Scientist in 2010
• developed internal data products
• built distributed data architecture
• team of 3 with a developer and a statistician
Code contributor to various open source tools
• Cascalog, a big data processing library built on top of
Cascading (comparable to Apache Pig)
• Incanter, a statistical computing platform in Clojure
Author of Web Usage Mining: Data Mining Visitor Patterns
From Web Server Logs* to be published in late 2014
* tentative title
What is it
Customer-Product Graph
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:name “Lily”}

[:BUY {:timestamp 2013-11-01 13:00:00}]
bought
viewed
Question: Who bought an iPhone?
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:name “Lily”}
bought
viewed
Query: Who bought an iPhone?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}
START

x=node:node_auto_index(product='iPhone')Express”}
{:product “American
{:name “Lily”}
MATCH (person)-[:BUY]->(x)

RETURN person
Question: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “American Express”}

{:name “Lily”}
bought
viewed
Query: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

START x=node:node_auto_index(product='iPhone')
{:name
MATCH “Emily”}

{:product “American Express”}

(x)<-[:BUY]-(person)-[:BUY]->(y)
{:name “Lily”}

RETURN y
Hypothesis: People that buy X has interest in Y
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “American Express”}

{:name “Lily”}
bought
viewed
Query: Who to recommend Y
{:name “Bob”}

X

{:product “iPhone”}
START
{:name “Tom”}
x=node:node_auto_index(product='iPhone'),
y=node:node_auto_index(product='American
Express')
{:name “Emily”}
Y Looked
{:product “American Express”}
MATCH (p)-[:BUY]->(x),
at AE

(p)-[:VIEW]->(y)
WHERE NOT (p)-[:BUY]->(y)
{:name “Lily”}

RETURN p

Haven’t
bought AE
Product Recommendation by Reasoning Example

Interactive demo at http://bit.ly/customer_graph

1. Start with an idea
2. Trace to connected nodes
3. Identify patterns from viewpoint of those nodes
4. Repeat from #1 until discovering actionable item
5. Apply pattern
Challenge: Event Data to Graph Data

User ID

Product ID Action

Bob

iPhone

Bought

Tom

iPhone

Bought

Emily

iPhone

Bought

Bob

AE

Bought

Emily

AE

Viewed

Lily

AE

Bought
Why should you care
Customer Journey

14
Purchase Funnel

Interest

Desire

Action
15
Understanding Customer Intents

Interest

Desire

Action

16
Customer Experience as a Graph

17
http://insideintercom.io/criticism-and-two-way-streets/
18
A Feedback System to Drive Profit

19
Minimise effort between Q & A

Question
Answer

Data Interrogation
One Approach: Make data querying easier

Query = Function(Data) [1]
~ Function(Data Structure)

[1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
Data Structure: Relations versus Relations
a Edges
ak
Sale ID

User ID

Product ID Profit

1

1

1

£100

2

1

2

£50

{:profit £100}
{:profit £50}

User ID

Name

Product ID Name

1

Bob

1

iPhone

2

Emily

2

A.E.
Using the right database for the right task

RDBMS

Graph DB

Data

Attributes

Entities and relations

Model

Record-based

Associative

Relation

By-product of
normalisation

First class citizen

Example use

Reporting

Reasoning
How does it work
User actions as time-stamped records

time

HDFS

Paul Ingles, “User as Data”, Euroclojure 2012
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
From Input to Output with Hadoop, aka ETL step
1. interface
3. output data

2. input data

4... The 80%
HDFS

Reshape

Graph Database
Hadoop interface to Neo4J
•
•
•
•

Cascading-Neo4j tap [1]
Faunus Hadoop binaries [2]
CSV files*
etc.
[1] http://github.com/pingles/cascading.neo4j
[2] http://thinkaurelius.github.io/faunus/

HDFS

Reshape

Graph Database
Input data stored on HDFS

time

1

2

3

4

User

1
2

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

4

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

3

Timestamp
Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Nodes and Edges CSVs to go into a property graph
Node ID

Properties

1

{:name “Paul”, email: “paul.lam@uswitch.com”}

2

{:domain “google.com”}

3

{:page “/homepage/”}

...

...

5

{:product “iPhone”}

From To

Type

Properties

1

2

:SOURCE

{:timestamp “2013-11-01 13:00”}

1

3

:VIEWED

{:timestamp “2013-11-01 13:00”}

...

...

...

...

1

5

:BOUGHT

{:timestamp “2013-11-01 13:05”}
Records to Graph in 3 Steps

^ impo
rtable C

SV

1. Design graph
2. Extract Nodes
3. Build Relations

31
Step 1: Designing your graph
{:name “Paul”
:email “paul.lam@uswitch.com}
:viewed {:timestamp ... }
{:page “homepage”}

:bought {:timestamp ... }
:viewed { ... }
{:product “iPhone”}

:viewed { ... }
{:page “blog”}

:source {:timestamp ... }

{:domain “google.com”}
32
Step 2: Extract list of entity nodes
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

Timestamp

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Step 3: Building node-to-node relations
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

Timestamp

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Do this across all customers and products
Use your data processing tool of choice:
Apache Hive
• Apache Pig
• Cascading
and more ...
• Scalding
• Cascalog
• Spark
• your favourite programming language
•

Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functions
• build entity nodes and meta nodes
• sink data into database with Cascading-Neo4j Tap
•

36
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functions
• build entity nodes and meta nodes
• sink data into database with Cascading-Neo4j Tap
•

37
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direct and categoric relations
• sink data with Cascading-Neo4j Tap
•

38
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direct and categoric relations
• sink data with Cascading-Neo4j Tap
•

39
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
Summary

HDFS

Reshape

Graph
Contact
Paul Lam, data scientist at uSwitch.com
Email: paul@quantisan.com
Twitter: @Quantisan

More Related Content

Similar to Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Krist Wongsuphasawat
 
AD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsAD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsVincent Burckhardt
 
Driving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfDriving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfHans Scharler
 
User Story Mapping in Practice
User Story Mapping in PracticeUser Story Mapping in Practice
User Story Mapping in PracticeSteve Rogalsky
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...predictionio
 
Eposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraEposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraTAEHOON KIM
 
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Giles Greenway
 
Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Aaron Lamphere
 
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?MongoDB
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and ConsumptionBrian Greig
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchYury Lifshits
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingSingleStore
 
Opportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateOpportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateAnand Deshpande
 
Preparing for Release to the App Store
Preparing for Release to the App StorePreparing for Release to the App Store
Preparing for Release to the App StoreGeoffrey Goetz
 
Building Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceBuilding Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceRaymond Gao
 
IT-Security@Contemporary Life
IT-Security@Contemporary LifeIT-Security@Contemporary Life
IT-Security@Contemporary LifeOliver Pfaff
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesChris Schalk
 

Similar to Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph (20)

Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
 
AD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsAD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With Analytics
 
Driving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfDriving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConf
 
User Story Mapping in Practice
User Story Mapping in PracticeUser Story Mapping in Practice
User Story Mapping in Practice
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
Eposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraEposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things era
 
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
 
Logs & Visualizations at Twitter
Logs & Visualizations at TwitterLogs & Visualizations at Twitter
Logs & Visualizations at Twitter
 
Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and Consumption
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! Research
 
yogesh CV
yogesh CVyogesh CV
yogesh CV
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
 
Opportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateOpportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to Collaborate
 
Preparing for Release to the App Store
Preparing for Release to the App StorePreparing for Release to the App Store
Preparing for Release to the App Store
 
Building Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceBuilding Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and Salesforce
 
IT-Security@Contemporary Life
IT-Security@Contemporary LifeIT-Security@Contemporary Life
IT-Security@Contemporary Life
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologies
 

More from Paul Lam

Mozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyMozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyPaul Lam
 
When a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barWhen a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barPaul Lam
 
Evolution of Our Software Architecture
Evolution of Our Software ArchitectureEvolution of Our Software Architecture
Evolution of Our Software ArchitecturePaul Lam
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojurePaul Lam
 
Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Paul Lam
 
Clojure in US vs Europe
Clojure in US vs EuropeClojure in US vs Europe
Clojure in US vs EuropePaul Lam
 
2014 docker boston fig for developing microservices
2014 docker boston   fig for developing microservices2014 docker boston   fig for developing microservices
2014 docker boston fig for developing microservicesPaul Lam
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopPaul Lam
 
An agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataAn agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataPaul Lam
 

More from Paul Lam (9)

Mozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyMozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and Technology
 
When a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barWhen a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a bar
 
Evolution of Our Software Architecture
Evolution of Our Software ArchitectureEvolution of Our Software Architecture
Evolution of Our Software Architecture
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
 
Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)
 
Clojure in US vs Europe
Clojure in US vs EuropeClojure in US vs Europe
Clojure in US vs Europe
 
2014 docker boston fig for developing microservices
2014 docker boston   fig for developing microservices2014 docker boston   fig for developing microservices
2014 docker boston fig for developing microservices
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on Hadoop
 
An agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataAn agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log data
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

  • 1. Customer Behaviour Analytics: Billions of Events to one Customer-Product Graph Strata London, 12th November 2013 Presented by Paul Lam
  • 2. About Paul Lam Joined uSwitch.com as first Data Scientist in 2010 • developed internal data products • built distributed data architecture • team of 3 with a developer and a statistician Code contributor to various open source tools • Cascalog, a big data processing library built on top of Cascading (comparable to Apache Pig) • Incanter, a statistical computing platform in Clojure Author of Web Usage Mining: Data Mining Visitor Patterns From Web Server Logs* to be published in late 2014 * tentative title
  • 4. Customer-Product Graph {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} [:BUY {:timestamp 2013-11-01 13:00:00}] bought viewed
  • 5. Question: Who bought an iPhone? {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} bought viewed
  • 6. Query: Who bought an iPhone? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} START x=node:node_auto_index(product='iPhone')Express”} {:product “American {:name “Lily”} MATCH (person)-[:BUY]->(x) RETURN person
  • 7. Question: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  • 8. Query: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} START x=node:node_auto_index(product='iPhone') {:name MATCH “Emily”} {:product “American Express”} (x)<-[:BUY]-(person)-[:BUY]->(y) {:name “Lily”} RETURN y
  • 9. Hypothesis: People that buy X has interest in Y {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  • 10. Query: Who to recommend Y {:name “Bob”} X {:product “iPhone”} START {:name “Tom”} x=node:node_auto_index(product='iPhone'), y=node:node_auto_index(product='American Express') {:name “Emily”} Y Looked {:product “American Express”} MATCH (p)-[:BUY]->(x), at AE (p)-[:VIEW]->(y) WHERE NOT (p)-[:BUY]->(y) {:name “Lily”} RETURN p Haven’t bought AE
  • 11. Product Recommendation by Reasoning Example Interactive demo at http://bit.ly/customer_graph 1. Start with an idea 2. Trace to connected nodes 3. Identify patterns from viewpoint of those nodes 4. Repeat from #1 until discovering actionable item 5. Apply pattern
  • 12. Challenge: Event Data to Graph Data User ID Product ID Action Bob iPhone Bought Tom iPhone Bought Emily iPhone Bought Bob AE Bought Emily AE Viewed Lily AE Bought
  • 19. A Feedback System to Drive Profit 19
  • 20. Minimise effort between Q & A Question Answer Data Interrogation
  • 21. One Approach: Make data querying easier Query = Function(Data) [1] ~ Function(Data Structure) [1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
  • 22. Data Structure: Relations versus Relations a Edges ak Sale ID User ID Product ID Profit 1 1 1 £100 2 1 2 £50 {:profit £100} {:profit £50} User ID Name Product ID Name 1 Bob 1 iPhone 2 Emily 2 A.E.
  • 23. Using the right database for the right task RDBMS Graph DB Data Attributes Entities and relations Model Record-based Associative Relation By-product of normalisation First class citizen Example use Reporting Reasoning
  • 24. How does it work
  • 25. User actions as time-stamped records time HDFS Paul Ingles, “User as Data”, Euroclojure 2012
  • 26. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  • 27. From Input to Output with Hadoop, aka ETL step 1. interface 3. output data 2. input data 4... The 80% HDFS Reshape Graph Database
  • 28. Hadoop interface to Neo4J • • • • Cascading-Neo4j tap [1] Faunus Hadoop binaries [2] CSV files* etc. [1] http://github.com/pingles/cascading.neo4j [2] http://thinkaurelius.github.io/faunus/ HDFS Reshape Graph Database
  • 29. Input data stored on HDFS time 1 2 3 4 User 1 2 Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User 4 Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User 3 Timestamp Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 30. Nodes and Edges CSVs to go into a property graph Node ID Properties 1 {:name “Paul”, email: “paul.lam@uswitch.com”} 2 {:domain “google.com”} 3 {:page “/homepage/”} ... ... 5 {:product “iPhone”} From To Type Properties 1 2 :SOURCE {:timestamp “2013-11-01 13:00”} 1 3 :VIEWED {:timestamp “2013-11-01 13:00”} ... ... ... ... 1 5 :BOUGHT {:timestamp “2013-11-01 13:05”}
  • 31. Records to Graph in 3 Steps ^ impo rtable C SV 1. Design graph 2. Extract Nodes 3. Build Relations 31
  • 32. Step 1: Designing your graph {:name “Paul” :email “paul.lam@uswitch.com} :viewed {:timestamp ... } {:page “homepage”} :bought {:timestamp ... } :viewed { ... } {:product “iPhone”} :viewed { ... } {:page “blog”} :source {:timestamp ... } {:domain “google.com”} 32
  • 33. Step 2: Extract list of entity nodes User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 34. Step 3: Building node-to-node relations User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 35. Do this across all customers and products Use your data processing tool of choice: Apache Hive • Apache Pig • Cascading and more ... • Scalding • Cascalog • Spark • your favourite programming language • Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
  • 36. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 36
  • 37. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 37
  • 38. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 38
  • 39. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 39
  • 40. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  • 42. Contact Paul Lam, data scientist at uSwitch.com Email: paul@quantisan.com Twitter: @Quantisan