Big Data Analytics on Google Cloud Platform
January 9th, 2014
GROW WITH BIG DATA.
Third Eye Consulting Services & Solutions LLC.
For questions, tweet directly to @ThirdEyeCss.
We are actively monitoring this Twitter channel!
Agenda
1. Introductions (5 minutes)
2. Introduction to the Google Cloud Platform and its Big Data services (15 minutes)
3. Showcasing online retail analytics: user, site and product analytics (10 minutes)
4. Live demonstration: from ingestion of session log data to visualization in Tableau (15 minutes)
5. Q&A session (15 minutes; may run longer depending on audience enthusiasm and participation!)
Google Cloud Platform – Key Components
- App Engine
- BigQuery
- Cloud SQL
- Cloud Storage
- Compute Engine

https://cloud.google.com
App Engine - Architecture
A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications.
[Architecture diagram: App Master; Front End instances 1 to n; App Server instances 1 to n; Datastore, Memcache and Static Files]
https://cloud.google.com/products/app-engine
App Engine - Advantages
- Scales on demand
- Very low barrier to entry
- No initial hardware costs
- Scalability and reliability are handled by the platform
- Can handle very large amounts of data
- Can handle very large user volumes, including sudden spikes, by scaling elastically

https://cloud.google.com/products/app-engine
BigQuery
- A column-oriented data store that can store and process billions of rows of data
- SQL-like query syntax for querying data
- Runs ad-hoc queries against multi-terabyte data sets in seconds
- Highly scalable, reliable and secure, as it runs on Google's core platform infrastructure
https://cloud.google.com/products/big-query
BigQuery
- Supports major ETL and BI tools such as Informatica, Talend, QlikView and Tableau
- Primarily used for real-time data analysis and visualization
- Integrates with App Engine through APIs
https://cloud.google.com/products/big-query
BigQuery
SQL Access
- Only SELECT operations
- No CREATE, UPDATE or DROP
- Analysis of unstructured data using the REGEXP_* functions (REGEXP_MATCH, REGEXP_EXTRACT, REGEXP_REPLACE)
- JOINs between a small table (< 8 MB of compressed data) and a large table are possible; joins between large tables carry a performance penalty
https://cloud.google.com/products/big-query
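Why is a small-to-large JOIN cheap while a large-to-large JOIN is penalized? The sketch below shows a broadcast hash join, the general technique in which the small table is held in memory and the large table is streamed past it once. It illustrates the technique only and is not BigQuery's internal implementation. The table and column names (`sessions`, `catalog`, `product_id`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def broadcast_hash_join(large: Iterable[Row], small: Iterable[Row], key: str) -> Iterator[Row]:
    """Inner-join two tables on `key`, indexing only the small table in memory."""
    # Build phase: the small table must fit in memory (the "broadcast" side).
    index: Dict[object, List[Row]] = defaultdict(list)
    for row in small:
        index[row[key]].append(row)
    # Probe phase: stream the large table once; no sort or shuffle is needed.
    for row in large:
        for match in index.get(row[key], []):
            yield {**row, **match}


# Hypothetical data: a large session table and a small product catalog.
sessions = [
    {"session_id": "s1", "product_id": "p1"},
    {"session_id": "s2", "product_id": "p2"},
    {"session_id": "s3", "product_id": "p9"},  # no catalog entry, so it is dropped
]
catalog = [
    {"product_id": "p1", "name": "Vitamin C"},
    {"product_id": "p2", "name": "Fish Oil"},
]
joined = list(broadcast_hash_join(sessions, catalog, "product_id"))
```

When both sides are large, neither fits in memory on every worker, so engines typically fall back to shuffling both tables by the join key. That extra data movement is the usual source of the performance penalty.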
BigQuery
Programmatic Access
- bq command-line tool, Google API client libraries, REST API
- Google API client libraries support various languages such as Java, Python, JavaScript, Ruby, PHP and Google Apps Script
- Authentication is handled via OAuth 2.0
- With the REST API, the user must handle credentials and HTTP requests manually
https://cloud.google.com/products/big-query
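To show what "handled manually" means for the REST API, here is a minimal sketch that builds (but does not send) a POST request to the BigQuery v2 `jobs.query` method using only the Python standard library. It assumes you already have an OAuth 2.0 access token; the project id, dataset, table and token below are placeholders, and the endpoint and body fields should be checked against the current BigQuery REST reference.

```python
import json
import urllib.request

# Endpoint of the BigQuery v2 jobs.query method.
QUERY_URL = "https://bigquery.googleapis.com/bigquery/v2/projects/{project}/queries"


def build_query_request(project_id: str, sql: str, access_token: str,
                        timeout_ms: int = 10000) -> urllib.request.Request:
    """Build, but do not send, a jobs.query POST request.

    `access_token` is an OAuth 2.0 bearer token obtained separately; with the
    raw REST API, acquiring and refreshing it is the caller's responsibility.
    """
    body = {
        "query": sql,
        "useLegacySql": False,  # the REST API defaults to legacy SQL
        "timeoutMs": timeout_ms,
    }
    return urllib.request.Request(
        QUERY_URL.format(project=project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request(
    "my-project",  # hypothetical project id
    "SELECT referrer, COUNT(*) AS visits FROM my_dataset.sessions GROUP BY referrer",
    "ya29.EXAMPLE",  # placeholder token, not a real credential
)
```

With a real token, `urllib.request.urlopen(req)` would send the request and return a JSON response; the client libraries wrap exactly this plumbing.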
BigQuery
Use Cases
- Batch analysis of large data sets
- Real-time analytics for dashboard-type applications
- Pre-processing very large data sets and serving the data in real time
- Visualization using third-party tools that call BigQuery APIs
https://cloud.google.com/products/big-query
Cloud SQL
- MySQL database running on the Google Cloud Platform
- Easy migration from local MySQL instances to Cloud SQL
- Highly scalable and reliable, with replication
- Supports all major MySQL features, including stored procedures, triggers and views
- GUI front end for easy administration and operations
- Built on top of Google's core infrastructure
- Easy integration with App Engine
https://cloud.google.com/products/cloud-sql
Cloud Storage
[Diagram: Custom App, Cloud SQL and BigQuery alongside Cloud Storage]
- A highly reliable cloud storage platform for storing and accessing vast amounts of data
- Can be used for data archival and content delivery
- Data can be ingested and processed by other Google Cloud services
- Accessible through a GUI, the command line and APIs
https://cloud.google.com/products/cloud-storage
Cloud Storage
- Object store that delivers content efficiently over the internet
- Not a mountable file system
- Buckets are the basic containers. They cannot be nested and can reside in the US or EU geographies.
- Objects are stored in buckets. They are immutable and can be up to 5 TB in size.
- ACLs can be set up for Google users, groups, app domains and authenticated users, with READ, WRITE or FULL_CONTROL permissions. Signed URLs provide access for anonymous users.
- Can be accessed using the XML and JSON REST APIs
- Command-line access using the gsutil tool
- App Engine Storage API for access from App Engine
https://cloud.google.com/products/cloud-storage
Compute Engine
- Infrastructure as a service
- Linux virtual machines, with associated storage and network infrastructure, hosted by Google
- Can run any type of application or workload in the Google cloud, on the same core infrastructure Google uses
- Highly elastic and scalable
- A typical use case is provisioning a Hadoop cluster on demand, using tens to hundreds of virtual machines as the name node and data nodes
https://cloud.google.com/products/compute-engine
Compute Engine
- Various machine type configurations are available, such as High Memory, High CPU and Standard
- Very easy provisioning and management using cloud management software such as RightScale
- CentOS and Debian are currently the default supported operating systems
- Typical use cases are batch processing, log analysis, I/O-intensive workloads and Hadoop (MapReduce) in the cloud
https://cloud.google.com/products/compute-engine
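For readers new to MapReduce, the single-process sketch below shows the three stages Hadoop distributes across a cluster: map, shuffle (group by key) and reduce. It counts page hits from hypothetical three-field log lines (date, time, page). It illustrates the programming model only; on a real Hadoop cluster the framework runs each stage in parallel across the data nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (page, 1) for a log line; assumes the page is the third field."""
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single total."""
    return key, sum(values)


log_lines = [
    "2014-01-09 10:15:02 /products/vitamin-c",
    "2014-01-09 10:16:40 /products/fish-oil",
    "2014-01-09 10:17:45 /products/vitamin-c",
]
pairs = (pair for line in log_lines for pair in map_phase(line))
page_hits = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```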
Online Retail Analytics & Visualization
Online Retail Industry

Forrester: U.S. Online Retail Sales to Hit $370 Billion by
Healthcare Store
A large online retailer’s Health Store website. Thousands of health care products are sold per month.

VP OF MARKETING: These large online retailers are killing us! I need to increase sales. I need to understand my site visitors better. Can Big Data Analytics help?

DATA SCIENTIST: Yes, Big Data Analytics can help! Google’s Cloud Platform handles all the complexities of Big Data processing. We start with regular session log files.
Session Log File (W3C compliant)
Callouts on a sample log file:
- Time & date when the visitor came to the site
- Unique user & session ID
- Product page visited by the user
- Referral site
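As a starting point for the analytics that follow, here is a minimal parser sketch for W3C extended log files. It reads field names from the `#Fields:` directive and yields one dict per record. The custom fields `x-user-id` and `x-session-id` are hypothetical names for the user and session IDs; your log configuration will determine the real ones.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record of a W3C extended log file.

    Field names come from the most recent '#Fields:' directive; other
    directives (#Version, #Date, ...) are skipped. A '-' value means the
    field was not recorded and is mapped to an empty string.
    """
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed records rather than guess
        yield {f: ("" if v == "-" else v) for f, v in zip(fields, values)}


# Hypothetical sample: date, time, page, referrer, user id, session id.
SAMPLE_LOG = """#Version: 1.0
#Fields: date time cs-uri-stem cs(Referer) x-user-id x-session-id
2014-01-09 10:15:02 /products/vitamin-c https://www.google.com/ u42 s1001
2014-01-09 10:17:45 /cart/add/vitamin-c - u42 s1001
"""

records = list(parse_w3c_log(SAMPLE_LOG.splitlines()))
```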
DATA SCIENTIST: From the simple log files, we can do sophisticated analytics like these:

User Analytics
• # of unique site visitors, per hour, per day
• # of return site visitors, per hour, per day
• Total # of site visitors, per hour, per day
• Top 10 active users, per hour, per day
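The user metrics above can be sketched in a few lines once log records are reduced to (hour, user, session) tuples. This is an illustrative sketch, not the deck's BigQuery queries. It assumes that a "return visitor" in an hour is a user already seen in an earlier hour and that "total visitors" means distinct sessions; both definitions are assumptions. Passing day buckets instead of hour buckets gives the per-day figures.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical pre-parsed events: (hour bucket "YYYY-MM-DD HH", user_id, session_id).
Event = Tuple[str, str, str]


def user_analytics(events: List[Event], top_n: int = 10) -> dict:
    visitors: Dict[str, Set[str]] = defaultdict(set)  # unique users per hour
    visits: Dict[str, Set[str]] = defaultdict(set)    # distinct sessions per hour
    activity: Counter = Counter()                     # number of events per user
    for hour, user, session in events:
        visitors[hour].add(user)
        visits[hour].add(session)
        activity[user] += 1
    # Assumed definition: a return visitor was already seen in an earlier hour.
    seen: Set[str] = set()
    returning: Dict[str, int] = {}
    for hour in sorted(visitors):  # "YYYY-MM-DD HH" strings sort chronologically
        returning[hour] = len(visitors[hour] & seen)
        seen |= visitors[hour]
    return {
        "unique_visitors": {h: len(u) for h, u in visitors.items()},
        "return_visitors": returning,
        "total_visits": {h: len(s) for h, s in visits.items()},
        "top_active_users": activity.most_common(top_n),
    }


events = [
    ("2014-01-09 10", "u1", "s1"),
    ("2014-01-09 10", "u1", "s1"),
    ("2014-01-09 10", "u2", "s2"),
    ("2014-01-09 11", "u1", "s3"),
    ("2014-01-09 11", "u3", "s4"),
]
stats = user_analytics(events)
```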
DATA SCIENTIST: Product Analytics like these:
• Top 10 popular products, per hour, per day
• Top 10 popular products in the shopping basket, per hour, per day
• Top 10 bought products, per hour, per day
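All three product rankings share one shape: filter events by action, count per product within each period, and keep the top N. The sketch below assumes hypothetical pre-parsed events of the form (period, action, product_id), with actions named "view", "add_to_cart" and "purchase"; in the demo these counts would come from BigQuery instead.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical events: (period, action, product_id), where period is an
# hour ("2014-01-09 10") or a day ("2014-01-09") bucket.
Event = Tuple[str, str, str]


def top_products(events: Iterable[Event], action: str,
                 top_n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """Return the top_n products per period for a single action."""
    per_period: Dict[str, Counter] = defaultdict(Counter)
    for period, act, product in events:
        if act == action:
            per_period[period][product] += 1
    return {period: counts.most_common(top_n) for period, counts in per_period.items()}


events = [
    ("2014-01-09", "view", "vitamin-c"),
    ("2014-01-09", "view", "vitamin-c"),
    ("2014-01-09", "view", "fish-oil"),
    ("2014-01-09", "add_to_cart", "fish-oil"),
    ("2014-01-09", "purchase", "fish-oil"),
]
popular = top_products(events, "view")
in_basket = top_products(events, "add_to_cart")
bought = top_products(events, "purchase")
```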
DATA SCIENTIST: Conversion Analytics like these:
• # of users who added products to the shopping basket, per hour, per day
• # of users who actually bought products, per hour, per day
• % of users who browsed, added products to the shopping cart & actually bought, per hour, per day
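The conversion metrics reduce to set arithmetic over users. The sketch below takes hypothetical (user_id, action) events for a single hour or day and counts a user as converted only if they browsed, added to the cart and bought in that period; that funnel definition is an assumption. Running it once per hour or day bucket gives the trend lines.

```python
from typing import Dict, Iterable, Set, Tuple


def conversion_funnel(events: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Funnel metrics for one period from hypothetical (user_id, action) events."""
    users: Dict[str, Set[str]] = {"view": set(), "add_to_cart": set(), "purchase": set()}
    for user, action in events:
        if action in users:
            users[action].add(user)
    browsed = users["view"]
    # Assumed definition: converted users completed every step of the funnel.
    converted = browsed & users["add_to_cart"] & users["purchase"]
    return {
        "added_to_cart": len(users["add_to_cart"]),
        "bought": len(users["purchase"]),
        "conversion_pct": 100.0 * len(converted) / len(browsed) if browsed else 0.0,
    }


events = [
    ("u1", "view"), ("u1", "add_to_cart"), ("u1", "purchase"),
    ("u2", "view"), ("u2", "add_to_cart"),
    ("u3", "view"),
    ("u4", "view"),
]
funnel = conversion_funnel(events)
```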
Behold, the Google Cloud Platform’s Dashboard!
DATA SCIENTIST: List of available services.
Google Cloud Platform’s Cloud Storage
DATA SCIENTIST: Session log files uploaded to Cloud Storage.
Google Cloud Platform’s BigQuery
DATA SCIENTIST: Tables on BigQuery with data from the session log files.
Running a Query on BigQuery
DATA SCIENTIST: Queries on BigQuery are very SQL-like, easy to develop, and return results fast.
Visualize BigQuery’s Results in Tableau
DATA SCIENTIST: Tableau provides an easy & effective way to develop dashboards & reports.
Site Analytics – Referral Site Comparisons
DATA SCIENTIST: Traffic referred to the site from other sources like Google.com.
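A referral comparison boils down to counting visits per referring host. The sketch below assumes the referrer values are full URLs, as in the `cs(Referer)` field of a W3C log; it treats empty or `-` referrers as direct traffic and strips a leading `www.` so that variants of the same site group together. Both of those normalization choices are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def referral_counts(referrers: Iterable[str]) -> List[Tuple[str, int]]:
    """Count visits per referring host, most frequent first."""
    counts: Counter = Counter()
    for ref in referrers:
        host = "" if ref in ("", "-") else urlparse(ref).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        counts[host or "(direct)"] += 1  # no usable referrer: direct traffic
    return counts.most_common()


refs = [
    "https://www.google.com/search?q=vitamin+c",
    "https://www.google.com/",
    "http://www.bing.com/search?q=fish+oil",
    "-",
]
ranking = referral_counts(refs)
```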
Product Analytics - Product Purchase Trends
DATA SCIENTIST: Analysis of purchases of specific products on the site over hours / days in a month.
Conversion Analytics - Product Added to Cart vs. Bought
DATA SCIENTIST: Analysis of which products were placed in the cart vs. actually bought over hours / days in a month.
Conversion Analytics - Conversion Rate Trends
DATA SCIENTIST: Analysis of which products were placed in the cart vs. actually bought over hours / days in a month.
DATA SCIENTIST: You now know:
- how your products are selling,
- when they are selling,
- which referring site helps the most, and other such info.
You now have the power of Big Data Analytics at your fingertips!

VP OF MARKETING: Wow! Now I can compete against all the giants! Let me start on my marketing plans!
Q&A
@ThirdEyeCss
Third Eye is Google’s Partner for the Google Cloud Platform.
We are listed on Google’s Cloud Platform partners site: https://cloud.google.com/partners/
Contact:
Dj Das, Founder & CEO, djdas@thirdeyecss.com
Alan Merrihew, VP of Business Development, alan@thirdeyecss.com
Phone - (408) 462-5257
Corporate Site - ThirdEyeCSS.com
Big Data Training - ThirdEyeClasses.com
Big Data Educational Seminars - BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud
Big Data Jobs - jobs.BigDataCloud.com
Big Data Analytics As a Service - ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com
THANK YOU!

Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Big Data Analytics on the Google Cloud Platform

  • 2. GROW WITH BIG DATA. Third Eye Consulting Services & Solutions LLC.
  • 3. For questions, tweet directly to @ThirdEyeCss. We are actively monitoring this Twitter channel!
  • 4. Agenda: 1. Introductions (5 minutes). 2. Introduction to the Google Cloud Platform and its various Big Data services (15 minutes). 3. Showcasing various online retail analytics: user, site and product analytics (10 minutes). 4. Live demonstration: from ingestion of session log data to visualization in Tableau (15 minutes). 5. Q&A session (15 minutes; can extend beyond that based on audience enthusiasm and participation!)
  • 6. Google Cloud Platform – Key Components: App Engine, BigQuery, Cloud SQL, Cloud Storage, Compute Engine. https://cloud.google.com
  • 7. App Engine - Architecture: A highly elastic, scale-on-demand infrastructure for deploying and running front-end web applications. Components: App Master; Front End Instances 1 to n; App Server Instances 1 to n; Datastore, Memcache and Static Files. https://cloud.google.com/products/app-engine
  • 8. App Engine - Advantages: Scales on demand; very low barrier to entry; no initial hardware costs; scalability and reliability are handled by the platform rather than by the application developer; can handle very large amounts of data; can handle very large user volumes, including sudden spikes, by scaling elastically. https://cloud.google.com/products/app-engine
  • 9. BigQuery  A column oriented data store that can store and process billions of rows of data  SQL like query syntax for querying data  Run ad-hoc queries against multi terabyte data sets in seconds  Highly scalable, reliable and secure as it uses underlying core Google Platform Infrastructure https://cloud.google.com/products/big-query
  • 10. BigQuery  Supports all the main ETL and BI tools like Informatica, Talend, QlikView and Tableau  Primarily used for real-time data analysis and visualization  Integration with App Engine through APIs https://cloud.google.com/products/big-query
  • 11. BigQuery SQL Access: Only SELECT operations are supported; there is no CREATE, UPDATE or DROP. Unstructured data can be analyzed with the REGEXP_* functions (REGEXP_MATCH, REGEXP_EXTRACT, REGEXP_REPLACE). JOINs between a small table (< 8 MB of compressed data) and a large table are possible; joining two large tables carries a performance penalty. https://cloud.google.com/products/big-query
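Legacy BigQuery SQL exposed regular-expression functions such as REGEXP_MATCH and REGEXP_EXTRACT for pulling structure out of free-text columns. The Python sketch below mimics their behaviour locally with the standard `re` module. It is an illustration only, not BigQuery's implementation. The request-line format and the `product_id` query parameter are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical pattern: capture the value of a "product_id" query parameter.
PRODUCT_ID = re.compile(r"[?&]product_id=([A-Za-z0-9_-]+)")


def regexp_extract(text: str, pattern: "re.Pattern[str]") -> Optional[str]:
    """Return the first capture group of `pattern` in `text`, or None (like REGEXP_EXTRACT)."""
    match = pattern.search(text)
    return match.group(1) if match else None


def regexp_match(text: str, pattern: "re.Pattern[str]") -> bool:
    """Return True if `pattern` matches anywhere in `text` (like REGEXP_MATCH)."""
    return pattern.search(text) is not None


# Hypothetical raw request lines from an unstructured log column.
lines: List[str] = [
    "GET /health/item?product_id=VIT-C-500&ref=home HTTP/1.1",
    "GET /health/cart?action=add&product_id=FISH-OIL HTTP/1.1",
    "GET /about HTTP/1.1",
]

extracted = [regexp_extract(line, PRODUCT_ID) for line in lines]
print(extracted)  # ['VIT-C-500', 'FISH-OIL', None]
```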
  • 12. BigQuery Programmatic Access: bq command-line tool, Google API client libraries, REST API. The client libraries support languages such as Java, Python, JavaScript, Ruby, PHP and Google Apps Script. Authentication is handled via OAuth 2.0. With the REST API, the user must handle credentials and HTTP requests manually. https://cloud.google.com/products/big-query
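When you call the REST API directly, you assemble the HTTP request yourself and supply the OAuth 2.0 bearer token. The sketch below assumes the `jobs.query` method of the BigQuery v2 API at the host shown; check the current API reference before relying on it. The project ID and token are placeholders, and the request is built but never sent.

```python
import json
import urllib.request

# Assumed base URL of the BigQuery v2 REST API; confirm against the official reference.
BIGQUERY_API = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the jobs.query method.

    The OAuth 2.0 access token must be obtained separately: with the raw REST
    API, the caller handles both the credentials and the HTTP request.
    """
    body = {
        "query": sql,
        "useLegacySql": False,  # explicitly ask for standard SQL
    }
    return urllib.request.Request(
        url=f"{BIGQUERY_API}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical project ID and placeholder token; a real token comes from an OAuth 2.0 flow.
req = build_query_request("my-retail-project", "SELECT 1", "PLACEHOLDER_TOKEN")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req) with a valid token.
```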
  • 13. BigQuery Use Cases: Batch analysis of large data sets; real-time analytics for dashboard-type applications; pre-processing very large data sets and serving the data in real time; visualization using third-party tools that call the BigQuery APIs. https://cloud.google.com/products/big-query
  • 14. Cloud SQL: MySQL database running on the Google Cloud Platform; easy migration from local MySQL instances to Cloud SQL; highly scalable and reliable, with replication; supports all major MySQL features, including stored procedures, triggers and views; GUI front end for easy administration and operations; built on top of Google's core infrastructure; easy integration with App Engine. https://cloud.google.com/products/cloud-sql
  • 15. Cloud Storage: (Diagram: Custom App, Cloud SQL and BigQuery connected to Cloud Storage.) A highly reliable cloud storage platform for storing and accessing vast amounts of data; can be used for data archival and content delivery; data can be ingested and processed by other Google Cloud services; accessible through a GUI, the command line and APIs. https://cloud.google.com/products/cloud-storage
  • 16. Cloud Storage: An object store that can deliver data very efficiently over the internet; not a mountable file system. Buckets are the basic container; they cannot be nested and can reside in the US or EU geographies. Objects are stored in buckets; they are immutable and can be up to 5 TB in size. ACLs can be set up for Google users, groups, app domains and authenticated users with READ, WRITE or FULL_CONTROL permission; signed URLs provide access for anonymous users. Can be accessed using the XML and JSON REST APIs; command-line access using the gsutil tool; App Engine Storage API for access from App Engine. https://cloud.google.com/products/cloud-storage
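Because buckets cannot be nested, a name such as `2014/01/09/access.log` is simply one flat object name; the slashes only look like folders. The sketch below builds three common ways of addressing the same object: the `gs://` URI used by gsutil, the XML API URL, and the JSON API metadata URL. In the JSON API form, the whole object name must be percent-encoded as a single path segment. The bucket and object names are hypothetical.

```python
from urllib.parse import quote


def gs_uri(bucket: str, obj: str) -> str:
    """URI form accepted by the gsutil command-line tool."""
    return f"gs://{bucket}/{obj}"


def xml_api_url(bucket: str, obj: str) -> str:
    """XML API: the object name is appended to the path; '/' may stay unescaped."""
    return f"https://storage.googleapis.com/{bucket}/{quote(obj, safe='/')}"


def json_api_metadata_url(bucket: str, obj: str) -> str:
    """JSON API: the object name is one path segment, so '/' becomes %2F."""
    return f"https://storage.googleapis.com/storage/v1/b/{bucket}/o/{quote(obj, safe='')}"


# Hypothetical bucket and object names for uploaded session logs.
bucket, obj = "retail-session-logs", "2014/01/09/access.log"
print(gs_uri(bucket, obj))
print(xml_api_url(bucket, obj))
print(json_api_metadata_url(bucket, obj))
```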
  • 17. Compute Engine: Infrastructure as a Service; Linux virtual machines with associated storage and network infrastructure, hosted by Google; can run any type of application or workload in the Google cloud on the same core Google infrastructure; highly elastic and scalable. A typical use case is provisioning a Hadoop cluster on demand, using tens to hundreds of virtual machines as the name node and data nodes. https://cloud.google.com/products/compute-engine
  • 18. Compute Engine: Various machine type configurations are possible, such as High Memory, High CPU and Standard; very easy provisioning and management using cloud management software like RightScale; CentOS and Debian are the default OSes currently supported. Typical use cases are batch processing, log analysis, I/O-intensive workloads and Hadoop on the cloud (MapReduce). https://cloud.google.com/products/compute-engine
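The MapReduce model mentioned for Hadoop on Compute Engine can be illustrated in a single process. This is a toy sketch, not Hadoop code. The `<user_id> <product>` input format is hypothetical. The two input "splits" stand in for log files that a real cluster would spread across data nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (product, 1) for each page view.

    Assumes hypothetical input lines of the form "<user_id> <product>".
    """
    for line in lines:
        _user, product = line.split()
        yield product, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts emitted for each product."""
    return {key: sum(values) for key, values in groups.items()}


# Two input splits, standing in for log files processed on different data nodes.
splits = [
    ["u1 vitamin-c", "u2 fish-oil"],
    ["u1 vitamin-c", "u3 vitamin-c"],
]
mapped = (pair for split in splits for pair in map_phase(split))
page_views = reduce_phase(shuffle(mapped))
print(page_views)  # {'vitamin-c': 3, 'fish-oil': 1}
```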
  • 20. Online Retail Industry Forrester: U.S. Online Retail Sales to Hit $370 Billion by
  • 21. Healthcare Store: A large online retailer's Health Store website. Thousands of health care products are sold per month.
  • 22. VP OF MARKETING: These large online retailers are killing us! I need to increase sales. I need to understand my site visitors better. Can Big Data Analytics help?
  • 23. DATA SCIENTIST: Yes, Big Data Analytics can help! The Google Cloud Platform handles all the complexities of Big Data processing. We start with regular session log files.
  • 24. Session Log File (W3C-compliant): time and date when the visitor came to the site; unique user and session ID; product page visited by the user; referral site.
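The slide lists the fields the demo relies on. Files in the W3C Extended Log File Format declare their columns in a `#Fields:` directive, so a parser can key each record by those names. The sketch below assumes space-separated values with no quoted fields. The `user-id` and `session-id` column names are hypothetical placeholders for whatever the site actually logs. As in the W3C format, `-` marks an empty value.

```python
from typing import Dict, Iterable, List


def parse_w3c_log(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse W3C extended-format lines into dicts keyed by the #Fields directive."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives (#Version, #Date, ...): only #Fields affects parsing here.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines in this sketch
        records.append(dict(zip(fields, values)))
    return records


# Hypothetical sample; user-id and session-id are placeholder column names.
sample = """#Version: 1.0
#Fields: date time user-id session-id cs-uri-stem cs(Referer)
2014-01-09 10:15:02 u1001 s9001 /health/vitamin-c http://www.google.com/
2014-01-09 10:17:45 u1002 s9002 /health/fish-oil -
""".splitlines()

records = parse_w3c_log(sample)
print(records[0]["cs-uri-stem"], records[1]["cs(Referer)"])  # /health/vitamin-c -
```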
  • 25. DATA SCIENTIST: From simple log files, we can do sophisticated analytics like these. User Analytics: # of unique site visitors per hour and per day; # of return site visitors per hour and per day; total # of site visitors per hour and per day; top 10 active users per hour and per day.
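The user metrics above reduce to set and counter operations over (timestamp, user, session) records. This sketch makes two illustrative assumptions that are not the deck's definitions. "Total visitors" is taken to mean distinct sessions. A "return visitor" is a user already seen in an earlier hour. Swapping the hour format for `%Y-%m-%d` gives the per-day version.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple

Visit = Tuple[str, str, str]  # (timestamp, user_id, session_id)


def hour_bucket(ts: str) -> str:
    """Truncate a 'YYYY-MM-DD HH:MM:SS' timestamp to its hour."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")


def user_analytics(visits: Iterable[Visit]) -> Dict[str, Dict[str, int]]:
    sessions: Dict[str, Set[str]] = defaultdict(set)   # hour -> distinct sessions
    users: Dict[str, Set[str]] = defaultdict(set)      # hour -> distinct users
    returning: Dict[str, Set[str]] = defaultdict(set)  # hour -> users seen in an earlier hour
    first_hour: Dict[str, str] = {}
    for ts, user, session in sorted(visits):  # chronological order
        hour = hour_bucket(ts)
        sessions[hour].add(session)
        users[hour].add(user)
        first_hour.setdefault(user, hour)
        if first_hour[user] < hour:
            returning[hour].add(user)
    return {
        hour: {
            "unique": len(users[hour]),
            "returning": len(returning[hour]),
            "total": len(sessions[hour]),
        }
        for hour in sorted(users)
    }


def top_active_users(visits: Iterable[Visit], n: int = 10) -> List[Tuple[str, int]]:
    """Users with the most logged hits."""
    return Counter(user for _, user, _ in visits).most_common(n)


visits = [
    ("2014-01-09 10:05:00", "u1", "s1"),
    ("2014-01-09 10:20:00", "u2", "s2"),
    ("2014-01-09 10:40:00", "u1", "s1"),
    ("2014-01-09 11:02:00", "u1", "s3"),
    ("2014-01-09 11:30:00", "u3", "s4"),
]
stats = user_analytics(visits)
print(stats)
print(top_active_users(visits, 1))  # [('u1', 3)]
```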
  • 26. DATA SCIENTIST: Product analytics like these: top 10 popular products per hour and per day; top 10 popular products in the shopping basket per hour and per day; top 10 bought products per hour and per day.
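Each "top 10" list is a frequency count filtered by event type and time bucket. The sketch below assumes a hypothetical event model of (hour, product, action) tuples, where the action is `view`, `cart` or `buy`.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, str, str]  # (hour, product, action) -- hypothetical model

events: List[Event] = [
    ("10:00", "vitamin-c", "view"),
    ("10:00", "fish-oil", "view"),
    ("10:00", "vitamin-c", "cart"),
    ("10:00", "vitamin-c", "buy"),
    ("10:00", "vitamin-c", "view"),
    ("11:00", "fish-oil", "cart"),
]


def top_products(
    events: Iterable[Event], action: str, hour: Optional[str] = None, n: int = 10
) -> List[Tuple[str, int]]:
    """Most frequent products for one action, optionally restricted to one hour."""
    counts = Counter(
        product
        for h, product, a in events
        if a == action and (hour is None or h == hour)
    )
    return counts.most_common(n)


print(top_products(events, "view"))                # popular products
print(top_products(events, "cart", hour="10:00"))  # in shopping basket, 10:00 only
print(top_products(events, "buy"))                 # bought products
```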
  • 27. DATA SCIENTIST: Conversion analytics like these: # of users who added products to the shopping basket per hour and per day; # of users who actually bought products per hour and per day; % of users who browsed, added products to the shopping cart and actually bought, per hour and per day.
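The conversion metrics compare distinct users at each stage of the funnel. The sketch below assumes hypothetical (user_id, action) events. To compute the metrics per hour or per day, filter the events to that bucket first.

```python
from typing import Dict, Iterable, Set, Tuple


def pct(part: Set[str], whole: Set[str]) -> float:
    """Percentage of `whole` represented by `part` (0.0 when `whole` is empty)."""
    return 100.0 * len(part) / len(whole) if whole else 0.0


def conversion_funnel(events: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """events: (user_id, action) pairs with action in {"view", "cart", "buy"}."""
    events = list(events)
    browsed = {user for user, _ in events}
    carted = {user for user, action in events if action == "cart"}
    bought = {user for user, action in events if action == "buy"}
    return {
        "browsed": len(browsed),
        "added_to_cart": len(carted),
        "bought": len(bought),
        "pct_added_to_cart": pct(carted, browsed),
        "pct_bought": pct(bought, browsed),
        "pct_cart_to_purchase": pct(bought, carted),
    }


events = [
    ("u1", "view"), ("u1", "cart"), ("u1", "buy"),
    ("u2", "view"), ("u2", "cart"),
    ("u3", "view"),
    ("u4", "view"),
]
funnel = conversion_funnel(events)
print(funnel)
```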
  • 28. DATA SCIENTIST: Behold, the Google Cloud Platform's dashboard! (Screenshot: list of available services.)
  • 29. Google Cloud Platform's Cloud Storage. DATA SCIENTIST: Session log files uploaded to Cloud Storage.
  • 30. Google Cloud Platform's BigQuery. DATA SCIENTIST: Tables in BigQuery with data from the session log files.
  • 31. Running a Query on BigQuery. DATA SCIENTIST: Queries on BigQuery are very SQL-like, easy to develop, and return results fast.
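BigQuery itself cannot be run offline, so the sketch below uses SQLite, via Python's built-in `sqlite3` module, as a local stand-in for a similar aggregation. The table name, columns and rows are hypothetical. The `CREATE TABLE` and `INSERT` statements only set up the local stand-in; the deck notes that BigQuery SQL access is SELECT-only. SQLite's dialect also differs from BigQuery's in details, so only the shape of the query carries over.

```python
import sqlite3

# Hypothetical rows: (timestamp, user_id, product, action).
rows = [
    ("2014-01-09 10:05:00", "u1", "vitamin-c", "view"),
    ("2014-01-09 10:20:00", "u2", "fish-oil", "view"),
    ("2014-01-09 10:40:00", "u1", "vitamin-c", "buy"),
    ("2014-01-09 11:02:00", "u3", "vitamin-c", "view"),
]

# Hourly unique visitors and purchases; substr(ts, 1, 13) keeps "YYYY-MM-DD HH".
QUERY = """
SELECT substr(ts, 1, 13) AS hour,
       COUNT(DISTINCT user_id) AS unique_visitors,
       SUM(CASE WHEN action = 'buy' THEN 1 ELSE 0 END) AS purchases
FROM sessions
GROUP BY hour
ORDER BY hour
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (ts TEXT, user_id TEXT, product TEXT, action TEXT)")
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)", rows)
result = conn.execute(QUERY).fetchall()
conn.close()
print(result)  # [('2014-01-09 10', 2, 1), ('2014-01-09 11', 1, 0)]
```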
  • 32. Visualize BigQuery's Results in Tableau. DATA SCIENTIST: Tableau provides an easy and effective way to develop dashboards and reports.
  • 33–35. Site Analytics – Referral Site Comparisons. DATA SCIENTIST: Traffic referred to the site from other sources such as Google.com.
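Referral comparisons amount to normalising each referrer URL to a domain and counting. The sketch below assumes referrers arrive as raw URLs, with `-` for direct traffic. It strips a leading `www.` so that `www.google.com` and `google.com` are grouped together; that grouping is an illustrative choice.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_domain(referer: str) -> str:
    """Map a raw Referer value to a comparable domain label."""
    if referer in ("", "-"):
        return "(direct)"  # no referrer: typed URL, bookmark, etc.
    host = urlsplit(referer).hostname or "(unknown)"
    return host[4:] if host.startswith("www.") else host


# Hypothetical Referer values taken from session logs.
referers = [
    "http://www.google.com/search?q=vitamins",
    "https://www.bing.com/",
    "-",
    "http://google.com/",
    "https://news.example.org/article",
]
share = Counter(referrer_domain(r) for r in referers)
print(share.most_common())
```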
  • 36. Product Analytics - Product Purchase Trends. DATA SCIENTIST: Analysis of purchases of specific products on the site over hours/days in a month.
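A purchase-trend chart needs one value per time bucket, including hours with no sales; otherwise the plotted line skips over them. The sketch below assumes hypothetical (timestamp, product) purchase records and an hour-aligned `start`.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def hourly_purchase_trend(
    purchases: Iterable[Tuple[datetime, str]],
    product: str,
    start: datetime,
    end: datetime,
) -> List[Tuple[datetime, int]]:
    """Count purchases of `product` per hour in [start, end), filling empty hours with 0."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0)
        for ts, item in purchases
        if item == product
    )
    series = []
    hour = start
    while hour < end:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


purchases = [
    (datetime(2014, 1, 9, 10, 5), "vitamin-c"),
    (datetime(2014, 1, 9, 10, 50), "vitamin-c"),
    (datetime(2014, 1, 9, 11, 0), "fish-oil"),
    (datetime(2014, 1, 9, 12, 10), "vitamin-c"),
]
trend = hourly_purchase_trend(
    purchases, "vitamin-c", datetime(2014, 1, 9, 10), datetime(2014, 1, 9, 13)
)
print([count for _, count in trend])  # [2, 0, 1]
```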
  • 37. Conversion Analytics - Product Added to Cart vs. Bought. DATA SCIENTIST: Analysis of which products were placed in the cart vs. actually bought over hours/days in a month.
  • 38. Conversion Analytics - Conversion Rate Trends. DATA SCIENTIST: Analysis of which products were placed in the cart vs. actually bought over hours/days in a month.
  • 39. DATA SCIENTIST: You now know how your products are selling, when they are selling, which referring site helps the most, and other such information. You now have the power of Big Data Analytics at your fingertips!
  • 40. VP OF MARKETING: Wow! Now I can compete against all the giants! Let me start on my marketing plans!
  • 42. Third Eye is Google's Partner for the Google Cloud Platform. We are listed on Google's Cloud Platform partners page: https://cloud.google.com/partners/
  • 43. Contact: Dj Das, Founder & CEO, djdas@thirdeyecss.com; Alan Merrihew, VP of Business Development, alan@thirdeyecss.com; Phone: (408) 462-5257. Corporate Site: ThirdEyeCSS.com. Big Data Training: ThirdEyeClasses.com. Big Data Educational Seminars: BigDataCloud.com, BigDataCloudToday.com, meetup.com/BigDataCloud. Big Data Jobs: jobs.BigDataCloud.com. Big Data Analytics As a Service: ClustersTogo.com, Power140.com, Raaser.com, PowerI90.com

Editor's Notes

  1. The online retail market has seen phenomenal growth in recent years, a trend that is not going to abate in the next couple of decades. More Americans are planning to shop online than to go down to their neighborhood mall!