Case Study: In-Store Analysis
CC 2.0 by Mr. T in DC | http://flic.kr/p/7khrin
CC 2.0 by Franck BLAIS | http://flic.kr/p/cwVnSy
CC 2.0 by John Steven Fernandez | http://flic.kr/p/a8uTzz
CC 2.0 by Ian Carroll | http://flic.kr/p/6NWoGm
CC 2.0 by Perry French | http://flic.kr/p/8wDMJS
CC 2.0 by John Mitchell | http://flic.kr/p/5UaPg8

How do we answer these questions?
Before designing a blueprint solution, we first asked ourselves:
1. Who would be asked to answer questions like this?
2. Who is this person?
3. What tools does this person expect to use?
4. What is this person's typical skill set?
5. How do they work?
Preparation
May 17, 2013

So, how do we answer these questions as a Data Scientist?
At a high level of abstraction, the answer is simple. We need a data management system with three pieces: ingest, store and process.
Traditional Data Management System Approach
(Diagram: Data Source, Data Ingestion, Data Storage, Data Processing)
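
The three pieces can be illustrated with a minimal Python sketch. The functions `ingest`, `store` and `process` and the in-memory list used as storage are hypothetical stand-ins, not part of the deck's setup; in the Hadoop blueprint these roles are played by Flume, HDFS and Hive/Impala. The point shown is that raw records are stored unchanged and are only interpreted when they are processed.

```python
from typing import Iterable, Iterator, List


def ingest(source: Iterable[str]) -> Iterator[str]:
    """Hypothetical ingest step: pass raw records through unchanged."""
    for record in source:
        yield record


def store(records: Iterable[str], raw_store: List[str]) -> None:
    """Hypothetical store step: append raw records to an append-only list."""
    raw_store.extend(records)


def process(raw_store: List[str], store_id: str) -> int:
    """Hypothetical process step: interpret records at read time and
    count those whose first field equals the given store id."""
    return sum(1 for record in raw_store if record.split()[0] == store_id)


raw: List[str] = []
store(ingest(["store1 associated", "store1 disassociated", "store2 associated"]), raw)
print(process(raw, "store1"))  # prints 2
```

Keeping ingestion free of transformation means a new question only requires a new `process` step, not a re-collection of the data.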

So, how do we answer these questions as a Data Scientist?
We take this basic architecture and map each generic component onto the Hadoop ecosystem.
With this Hadoop architecture, a Data Scientist should be able to answer the questions without any programming environment, using familiar BI, analysis and reporting tools instead.
Blueprint for a Data Management System with Hadoop
(Diagram: Data Source, Flume, HDFS, Hive and Impala, BI/Analysis/Reporting)

Ingredients
1. Two WiFi access points, each running OpenWRT (a Linux-based firmware for routers), to simulate two different stores
2. Flume to move all log messages to HDFS without any manual intervention (no transformation, no filtering)
3. A 4-node CDH4 cluster (2 GB RAM, 100 GB HDD)
4. Pentaho Data Integration's graphical designer for data transformation, parsing, filtering and loading into the warehouse
5. Hive as the data warehouse system on top of Hadoop, to project structure onto the data
6. Impala for querying data in HDFS in real time
7. MS Excel to visualize the results
Setup
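
How it Works
Analytics System
(Diagram: an OpenWRT access point, with an example device MAC address 00:A0:C9:14:C8:28, sends syslog messages via UDP to a Flume syslog server source; Flume sinks to Hadoop/HDFS; Pentaho (M/R) loads the raw CSV; Hive and Impala query the data)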
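
In the deck, parsing and filtering of the raw syslog data is done with Pentaho Data Integration. As a rough illustration of that step, the Python sketch below turns log lines into CSV rows of step-in and step-out events. The log line format is an assumption (loosely modelled on hostapd association messages, with an ISO timestamp and the store name as host name); the deck does not show the actual lines, so the regular expression would need adjusting to the real format.

```python
import csv
import io
import re
from typing import List, Optional, Tuple

# Assumed, hypothetical line format:
# "<ISO timestamp> <store> hostapd: <iface>: STA <mac> IEEE 802.11: associated|disassociated"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<store>\S+) hostapd: \S+: STA (?P<mac>[0-9a-fA-F:]{17}) "
    r"IEEE 802\.11: (?P<event>associated|disassociated)"
)


def parse_line(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (timestamp, store, mac, event) or None for unrelated lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    event = "step_in" if m.group("event") == "associated" else "step_out"
    return m.group("ts"), m.group("store"), m.group("mac").lower(), event


def to_csv(lines: List[str]) -> str:
    """Filter out non-matching lines and emit the rest as CSV with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ts", "store", "mac", "event"])
    for line in lines:
        row = parse_line(line)
        if row is not None:
            writer.writerow(row)
    return buf.getvalue()


sample = [
    "2013-05-13T10:15:02 store1 hostapd: wlan0: STA 00:A0:C9:14:C8:28 IEEE 802.11: associated (aid 1)",
    "2013-05-13T10:15:03 store1 dnsmasq: unrelated message",
    "2013-05-13T10:41:40 store1 hostapd: wlan0: STA 00:A0:C9:14:C8:28 IEEE 802.11: disassociated",
]
print(to_csv(sample), end="")
```

Because Flume stores the raw messages untouched, a parser like this can be changed and re-run over the historical data at any time.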
CC 2.0 by Qi Wei Fong | http://flic.kr/p/7w8vfq

Visits for stores number one & two
The plot indicates that about 85% of the visits were detected in store number one and about 15% in store number two. One might conclude that store number one is in a much better location, with more occasional customers.
But let's gain more insight by analysing the number of unique visitors.
Analysis Result
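
The percentages on this slide are each store's share of all detected visits. A minimal Python sketch of that calculation, assuming visits are available as hypothetical `(store, mac)` pairs; the deck computes this with Hive/Impala, and the sample data below is made up, not the deck's data:

```python
from collections import Counter
from typing import Dict, List, Tuple

Visit = Tuple[str, str]  # (store, device MAC), a hypothetical record layout


def visit_share(visits: List[Visit]) -> Dict[str, float]:
    """Return each store's percentage of all visits (empty dict for no visits)."""
    counts = Counter(store for store, _mac in visits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {store: 100.0 * n / total for store, n in counts.items()}


toy_visits = [("store1", "mac-a"), ("store1", "mac-a"), ("store1", "mac-b"), ("store2", "mac-c")]
print(visit_share(toy_visits))  # {'store1': 75.0, 'store2': 25.0}
```

A share of visits alone cannot tell many visitors apart from a few frequent ones, which is why unique visitors are counted next.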

Unique visitors
This plot gives us more details about the customers. It turns out that the 135 visits in store number one were made by just 9 unique visitors, while store number two saw 5 unique visitors.
Analysis Result
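
Counting unique visitors amounts to counting distinct device MAC addresses per store, under the assumption that one device corresponds to one person. The sketch below uses the same hypothetical `(store, mac)` records with made-up sample data, and also works through the slide's own numbers: 135 visits from 9 unique visitors means each visitor in store number one came 15 times on average.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Visit = Tuple[str, str]  # (store, device MAC), a hypothetical record layout


def unique_visitors(visits: List[Visit]) -> Dict[str, int]:
    """Count distinct MAC addresses seen in each store."""
    macs: DefaultDict[str, Set[str]] = defaultdict(set)
    for store, mac in visits:
        macs[store].add(mac)
    return {store: len(seen) for store, seen in macs.items()}


def visits_per_visitor(visits: int, visitors: int) -> float:
    """Average number of visits per unique visitor."""
    return visits / visitors


toy_visits = [("store1", "mac-a"), ("store1", "mac-a"), ("store1", "mac-b"), ("store2", "mac-c")]
print(unique_visitors(toy_visits))  # {'store1': 2, 'store2': 1}
print(visits_per_visitor(135, 9))   # 15.0, using the figures for store number one
```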

New vs. returning users
This plot indicates that we have more returning than new users in both stores. In store number two, we did not see a single new user over the past 4 days.
It's probably a good idea to start a marketing campaign aimed at new customers, e.g. giving out vouchers for the first purchase.
Analysis Result
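
The deck does not define when a user counts as new. One plausible definition, assumed here, is that a visitor is new if their MAC address was never seen in that store before the analysis window starts (here, the past 4 days), and returning otherwise. A Python sketch under that assumption, with hypothetical `(step-in time, store, mac)` records and made-up sample data:

```python
from datetime import datetime
from typing import Dict, List, Tuple

Visit = Tuple[datetime, str, str]  # (step-in time, store, mac), hypothetical layout


def new_vs_returning(visits: List[Visit], window_start: datetime) -> Dict[str, Dict[str, int]]:
    """Classify each visitor active in the window as new or returning, per store."""
    # Earliest visit of every (store, mac) pair over the whole history.
    first_seen: Dict[Tuple[str, str], datetime] = {}
    for ts, store, mac in visits:
        key = (store, mac)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    # Visitors with at least one visit inside the window.
    active = {(store, mac) for ts, store, mac in visits if ts >= window_start}
    result: Dict[str, Dict[str, int]] = {}
    for store, mac in active:
        label = "new" if first_seen[(store, mac)] >= window_start else "returning"
        counts = result.setdefault(store, {"new": 0, "returning": 0})
        counts[label] += 1
    return {store: result[store] for store in sorted(result)}


history = [
    (datetime(2013, 5, 10, 9), "store1", "mac-a"),  # seen before the window
    (datetime(2013, 5, 14, 9), "store1", "mac-a"),  # returning
    (datetime(2013, 5, 15, 9), "store1", "mac-b"),  # new
    (datetime(2013, 5, 8, 9), "store2", "mac-c"),   # seen before the window
    (datetime(2013, 5, 16, 9), "store2", "mac-c"),  # returning
]
print(new_vs_returning(history, datetime(2013, 5, 13)))
# {'store1': {'new': 1, 'returning': 1}, 'store2': {'new': 0, 'returning': 1}}
```

With a different definition (for example, new across all stores rather than per store), only the key used in `first_seen` would change.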

Visit duration over the past 4 days
The plot for the past 4 days shows that the visit duration in store number one was evenly distributed, while the distribution in store number two shows some peaks.
We can also see that visitors tend to stay much longer in store number one.
Analysis Result
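
Visit duration can be derived by pairing each step-in event with the same device's next step-out in the same store. This Python sketch assumes hypothetical `(time, store, mac, kind)` events with made-up sample data; a step-out without a preceding step-in is ignored, and a repeated step-in replaces the earlier open visit, which is one of several reasonable choices.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[datetime, str, str, str]  # (time, store, mac, "step_in" | "step_out")


def visit_durations(events: List[Event]) -> Dict[str, List[timedelta]]:
    """Pair step-ins with the next step-out of the same device in the same store."""
    open_visits: Dict[Tuple[str, str], datetime] = {}
    durations: Dict[str, List[timedelta]] = {}
    for ts, store, mac, kind in sorted(events):  # process in time order
        key = (store, mac)
        if kind == "step_in":
            open_visits[key] = ts  # a repeated step-in replaces the open visit
        elif kind == "step_out" and key in open_visits:
            start = open_visits.pop(key)
            durations.setdefault(store, []).append(ts - start)
    return durations


events = [
    (datetime(2013, 5, 13, 10, 0), "store1", "mac-a", "step_in"),
    (datetime(2013, 5, 13, 10, 30), "store1", "mac-a", "step_out"),
    (datetime(2013, 5, 13, 11, 0), "store2", "mac-b", "step_in"),
    (datetime(2013, 5, 13, 11, 5), "store2", "mac-b", "step_out"),
    (datetime(2013, 5, 13, 12, 0), "store2", "mac-b", "step_out"),  # unmatched, ignored
]
durations = visit_durations(events)
for store in sorted(durations):
    print(store, [str(d) for d in durations[store]])
# store1 ['0:30:00']
# store2 ['0:05:00']
```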

Avg. duration between visits of one particular user
A lot of useful information can be derived from this plot:
1. There is a repeating pattern of step-ins and step-outs within short periods of time.
2. The user stepped out of store number one and into store number two within just 28 seconds.
Analysis Result
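
The time between visits of one user can be computed from the same kind of hypothetical event records: for each step-out, measure the gap to that device's next step-in, in whichever store it happens. A store switch within seconds, like the 28-second case on this slide, then shows up as a short gap whose from-store and to-store differ. A sketch under these assumptions, with made-up sample events:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Event = Tuple[datetime, str, str, str]  # (time, store, mac, "step_in" | "step_out")
Gap = Tuple[str, str, timedelta]        # (from_store, to_store, gap)


def gaps_between_visits(events: List[Event], mac: str) -> List[Gap]:
    """Gap between each step-out of `mac` and that device's next step-in."""
    gaps: List[Gap] = []
    last_out: Optional[Tuple[datetime, str]] = None
    for ts, store, event_mac, kind in sorted(events):
        if event_mac != mac:
            continue
        if kind == "step_out":
            last_out = (ts, store)
        elif kind == "step_in" and last_out is not None:
            gaps.append((last_out[1], store, ts - last_out[0]))
            last_out = None
    return gaps


def average_gap(gaps: List[Gap]) -> timedelta:
    """Mean gap length; assumes at least one gap."""
    return sum((gap for _, _, gap in gaps), timedelta()) / len(gaps)


events = [
    (datetime(2013, 5, 13, 10, 0, 0), "store1", "mac-a", "step_in"),
    (datetime(2013, 5, 13, 10, 20, 0), "store1", "mac-a", "step_out"),
    (datetime(2013, 5, 13, 10, 20, 28), "store2", "mac-a", "step_in"),
    (datetime(2013, 5, 13, 10, 40, 0), "store2", "mac-a", "step_out"),
    (datetime(2013, 5, 13, 11, 0, 0), "store1", "mac-a", "step_in"),
]
gaps = gaps_between_visits(events, "mac-a")
print(gaps[0][0], gaps[0][1], gaps[0][2])  # store1 store2 0:00:28
print(average_gap(gaps))                   # 0:10:14
```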
CC 2.0 by AurelienGuichard | http://flic.kr/p/cjg9yw

CCAH Course in Zurich
• Cloudera Administrator Training for Apache Hadoop (CCAH)
• June 26–28, 2013
• Limmatstrasse 50, Zurich
• More info: http://www.ymc.ch/training
Announcement

Links
1. Presentation, video and post series: http://bitly.com/bundles/cguegi/1
2. http://www.bigdata-usergroup.ch
3. http://about.me/cguegi
4. http://www.ymc.ch/training

