1. Big Data
Turning your data problem into a competitive advantage
Barak Regev
Head of Cloud Platform - EMEA
2. 20 min in 1 minute
Managing big data is hard
There is a better way
Put your data to work for you.
3. How we do it - Google Infrastructure
4 billion hours of video per month
425 million Gmail users
100,000,000 GB web Index
0.25 secs to search results
5. “How are hotel reservations for Spain from New York compared with this time last year?”
“Do we need to adjust our marketing campaign? Where?”
CenterParcs - European hospitality
6. “Which users who signed up last quarter have also advanced at least 3 levels and purchased an item worth more than $5?”
Claritics - mobile & social user analytics
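The question on this slide is a three-condition filter over user records. A minimal sketch of that logic in plain Python follows; the `User` fields, the quarter boundaries, and the sample rows are all hypothetical and are not taken from the Claritics data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class User:
    # Hypothetical record layout, for illustration only.
    user_id: str
    signup_date: date
    levels_advanced: int
    max_purchase_usd: float


def matching_users(users, quarter_start, quarter_end):
    """Return IDs of users who signed up within the quarter, advanced
    at least 3 levels, and bought an item worth more than $5."""
    return [
        u.user_id
        for u in users
        if quarter_start <= u.signup_date <= quarter_end
        and u.levels_advanced >= 3
        and u.max_purchase_usd > 5.0
    ]


# Made-up sample data; "last quarter" is assumed to be Q1 2013.
sample_users = [
    User("u1", date(2013, 2, 10), 4, 7.5),
    User("u2", date(2013, 2, 20), 2, 9.0),   # too few levels
    User("u3", date(2012, 11, 5), 5, 12.0),  # signed up earlier
    User("u4", date(2013, 3, 1), 3, 5.0),    # purchase not above $5
]

result = matching_users(sample_users, date(2013, 1, 1), date(2013, 3, 31))
print(result)
```

At the data volumes discussed in this deck, the same three conditions would more likely be expressed as a query against an analytics service than as an in-memory loop.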
7. Business & IT trends driving Big Data
Opportunities:
Data is a core business asset
Increasingly data is out in the Cloud (e.g. social, CRM)
New things are possible in the Cloud (unique algorithms, scale)
Greatly increased speed of sharing and iteration
Challenges:
Information is growing faster than ability to leverage it
Tough for Enterprise to capture all the data they generate
Scaling traditional BI for Big Data can be hard
Skills: requires IT, analytics, software development
8. What does Big Data look like?
Some common characteristics:
Structured, semi-structured, unstructured
Millions if not billions of rows
Too large to process on a single machine
Too large to store on a single machine
High rate of growth
More daily
Diverse industries:
Retail point of sales transactions
User activity logs (mobile & social)
Mobile telemetry & smart devices
Industrial & manufacturing
Financial trading
Medical research (e.g. genomics)
Movie rendering & production
9. Put the Data to work
Google cloud services for Big Data
10. Use the cloud
Composable cloud services
Focus on the solution rather than on the
infrastructure
Do new things that weren't possible before
Pay for what you use.
11. BIG DATA LOG ANALYSIS
[Diagram: log analysis pipeline]
Data sources: POS, Clickstream, RFID, Customer Loyalty, Corporate data, 3rd party data
Store all your data in the cloud: Scalable Storage
Analyze interactively (Product Affinity, Market Basket etc): BigQuery (SQL API), data sets for further analysis
Securely share/distribute the results: App Engine app, Google Spreadsheets, other BI tools
Consumers: Marketing, Merchandising, Local Stores, Partners
12. Scaling large ads reporting
[Chart: Customer load test, on-prem MySQL vs BigQuery. Y axis: latency (seconds). X axis: # days of data.]
Business: ads authoring tools and reporting
Data: ad serving logs for 500 websites, ~300M rows/day
Problem solved: interactively finding new trends and patterns
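To give a sense of scale for the "# days of data" axis, the sketch below multiplies the slide's approximate figure of 300M rows/day by a few reporting windows. The window lengths are assumptions chosen for illustration; the slide does not state which ranges were tested.

```python
# "~300M rows/day" is the approximate figure quoted on the slide.
ROWS_PER_DAY = 300_000_000


def rows_in_window(days: int) -> int:
    """Approximate number of log rows a report over `days` days must scan."""
    return ROWS_PER_DAY * days


# Hypothetical reporting windows, not taken from the load test.
for days in (1, 7, 30, 90):
    print(f"{days:>3} days -> {rows_in_window(days):,} rows")
```

A 30-day report at this rate covers roughly 9 billion rows.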
14. What did we learn?
Store data with reliability, redundancy and consistency
Go from Data to Meaning
At Scale
...fast
Google white papers
Google File System (2003)
MapReduce: Simplified Data Processing on Large Clusters (2004)
BigTable: A Distributed Storage System for Structured Data (2006)
Dremel: Interactive Analysis of Web-Scale Datasets (2010)
Machine Translation (2004-2011)
15. The virtuous cycle of data
[Diagram: cycle]
Collect Data (Cloud Storage, Datastore, Logstore)
Process Data (App Engine, GCE)
Analyze Data (BigQuery)
Build application (GAE / GCE), improve
17. BigQuery use cases in industry
Ad Spend Attribution (online travel reservations): Mash up Adwords + Google Analytics data + customer reservations for high volume attribution analysis
Media consulting (global top-5 media agency): Analyze 20GB/day of DoubleClick display ads performance metrics for F500 clients
Ad authoring tools (online ads authoring): Deliver x-platform performance analytics dashboards to 100s of ads authoring customers
Social gaming (data analytics vendor): Cohort analysis on million+ gamers to monetize massive online social gaming
Revenue optimization (holiday/travel properties): Measure x-media campaign effectiveness to maximize occupancy rates
Business Requirements
A single place to capture growing data
Combine data from different sources
Ad hoc detection of patterns and correlations
Easily share data insights with org
Distribute data-based decision making
19. Mobile & social gaming user analysis
Notice trend change
Slice user data, identify segments
Compare segments vs general
population
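The "compare segments vs general population" step can be sketched as follows: compute a metric's average over all users, then report how far each segment's average sits from it. The field names (`platform`, `revenue`) and the sample rows are hypothetical.

```python
from statistics import mean


def segment_vs_population(records, key, metric):
    """For each segment (grouped by `key`), return the difference between
    the segment's mean of `metric` and the mean over all records."""
    overall = mean(r[metric] for r in records)
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r[metric])
    return {seg: mean(vals) - overall for seg, vals in groups.items()}


# Made-up per-user revenue figures, for illustration only.
players = [
    {"platform": "ios", "revenue": 4.0},
    {"platform": "ios", "revenue": 6.0},
    {"platform": "android", "revenue": 1.0},
    {"platform": "android", "revenue": 1.0},
]

deltas = segment_vs_population(players, key="platform", metric="revenue")
print(deltas)
```

A positive value marks a segment that outperforms the general population on the chosen metric; a negative value marks one that lags.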
20. Revenue optimization - hospitality industry
New solution for real-time decision making
Saves more than $150,00 a year
[Diagram: solution architecture]
Components: AppEngine, BigQuery, Cloud Storage, Oracle DB, Netezza appliance
Users: Regional Sales, Analysts, Execs, BI team