October 11-14, Seattle, WA
Tier-1 BI in the World of Big Data
SQLCAT
Thomas Kejser, Denny Lee – Microsoft
w/ special guest Kenneth Lieu – Yahoo!
Questions?
• Are you interested in how to build, deploy, and
maintain multi-terabyte cubes?
• This is 400+ level information
• “Does Hadoop give you a case of hives”
• If you understand the pun, definitely stay
• What does Tier-1 or enterprise mean to you?
• You might not like this presentation if you answered
in gigabytes
BIA-408-A | SQLCAT: Tier-1 BI in the world of Big Data 2
Agenda
Microsoft BI Today
Two different workloads, same challenge
• Ad Analytics
• Investment Banks
HADOOP: The mother of all stovepipes
The Big Shuffle
Getting data OUT of BigData
MICROSOFT BI TODAY
Microsoft BI Today
Two Data Models
• Dimensional (UDM)
• Tabular (The model formerly known as BISM)
UDM is the current large scale engine
• Yahoo!’s 24TB cube
• Multi-terabyte cubes are quickly becoming the
norm
UDM Scale Themes
• Get that hardware balance right (yes, we have to
talk about IOPS)
• Repeat after me: partitioning, partitioning,
partitioning!
• Multi-user query concurrency: how to handle it
• Keeping it simple
• Locking – how it works – and how to work around it
• What? Did you say ROLAP?
UDM Guidance
sqlcat.com
• Analysis Services 2008R2 Performance Guide
• Analysis Services 2008R2 Operations Guide
• SSAS Maestro Course (Tech Level 500, 5-day course)
BIGDATA WORKLOADS
Two different workloads, same challenges
Yahoo! TAO Technical Requirements
• Visitors to Yahoo! branded sites: 680,000,000
• Ad Impressions: 3,500,000,000 (per day)
• Refresh Frequency: Hourly
• Rows Loaded: 464,000,000,000 (per qtr)
• Average Query Time: <10 seconds
Investment Banks - Technical Requirements
• Risk Vectors reloaded / 30 min: 5,000,000,000
• Total Vectors Loaded: 600,000,000,000
• Refresh Frequency: Seconds
• Total Concurrent, active Queries: Thousands
• Average Query Time: Seconds
Workload Scale Themes
Old Themes:
• Getting I/O right (solved!)
• Getting configuration right (solved: Maestro and Fast Track)
• Getting Data Models right (done, but spread the word)
• SMP User concurrency (done!)
• SMP Scaled ETL (done, World Record)
New Themes:
• Cheap storage at scale
• Massive query scale (both size and concurrency)
• Scaling ETL another order of magnitude
• Scaled and Integrated Reporting/BI
....What did we learn?
DW/BI Scale is getting expensive…
Component | Current Max | Example Hardware
Cores | 128 (256) | SGI Altix UV 100
Memory | 2TB | IBM x5 Series; HP DL980
Attached Storage Capacity (at reasonable speed) | 200-400 TB? | Custom build DAS; HP P9500; EMC Symmetrix; Hitachi HDS
Max Table Scan Speed | 36GB/sec | HP DL980
Max IOPS | 1M IOPS | FusionIO Octals; 2 x Dedicated, enterprise Grade SAN
Max Bulk Speed | 16M rows/sec | Unisys 7600R
Max Extract Speed | 41M rows/sec | 4 x 10Gbit Ethernet; 64 cores dedicated Server
Biggest Cube | 24TB |
Largest Single DB | 75TB | HP Superdome, 128 Cores; Dedicated SAN
Compression to the Rescue?
Example Compression Rates with Column
Store/VertiPaq:
• Web Logs 30:1
• Trade Risk Vectors 9:1
Good news: Columnar compression can shave off an
order of magnitude
Bad news: But you still have a lot more data than you
can comfortably handle in a single box
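A rough worked example of the "bad news" point, as a sketch only. The raw volume of 2,000 TB is an assumption borrowed from the 2PB cluster mentioned in the Yahoo! case study later in this deck; the ratios and the 2TB memory ceiling are the figures quoted on these slides. Real compression ratios depend heavily on the data.

```python
def compressed_tb(raw_tb: float, ratio: float) -> float:
    """Size after compression, for a ratio written as N:1."""
    return raw_tb / ratio

RAW_TB = 2_000             # assumption: a 2PB raw data set, taken as 2,000 TB
SINGLE_BOX_MEMORY_TB = 2   # "Memory 2TB" from the hardware table above

for label, ratio in (("Web Logs", 30), ("Trade Risk Vectors", 9)):
    size = compressed_tb(RAW_TB, ratio)
    print(f"{label} at {ratio}:1 -> {size:.0f} TB, "
          f"about {size / SINGLE_BOX_MEMORY_TB:.0f}x the memory of the largest box")
```

Even at the better of the two ratios, the compressed data is still tens of times larger than the memory of the biggest single server in the table.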
What we saw and see…
Murphy’s Law | Risk | Countermeasure
Programmers use more CPU than you have | Can’t add beyond max cores | You try to scale out
You scaled out | You get WORSE scale | Poor you!
You bought too little hardware | System is unresponsive | Buy too much hardware
You bought too much hardware | You wasted money | Poor you!
Programmer “forgot” to write multi-threaded code | You buy more hardware, the system scales WORSE! | Rework code
You reworked code | You “forgot” how hard it is to write multi-threaded code | Poor you!
You capacity planned disks wrong | You run out of disk space; system is down | You bought a big SAN to compensate
You bought a big SAN to compensate | You wasted money | Poor you!
....IT JUST WASN’T ENOUGH!
HADOOP is the Mother of all Stovepipes
• Statistics, they catch you
• SATA drives? Really?
• Serialization
• It is always day zero
• Code is free
• The IP is in the data
• Map/Reduce
• Name/Value pairs?
• MTTI
• Cheap, Fast, Quality: choose two, not three
Scale - What are we trying to achieve?
[Chart: Throughput (y-axis, 0 to 3000) against Some Hardware Resource (x-axis, 0 to 24), with three curves labeled Good, So so, and Bad; annotation: “We want to live here”]
The SMP Scale Up Gaps
[Chart annotation: “The Scale-up Gaps!”]
Statistics catch up with you
In a large system, something is ALWAYS broken
Mirrors are no longer enough
• Clone breaks before its master can be re-established
• Example: Azure uses three copies of data
User queries run wild, get killed, racks overheat,
network switches die etc…
= Design for failure!
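A back-of-the-envelope illustration of why something is always broken at scale. This is a sketch under stated assumptions, not a figure from the talk: component failures are treated as independent, and the 3% annual failure rate per drive is a hypothetical number chosen only to show the shape of the curve.

```python
def p_any_failure(n_components: int, p_single: float) -> float:
    """Probability that at least one of n independent components fails."""
    return 1.0 - (1.0 - p_single) ** n_components

# Hypothetical: 3% annual failure rate per drive, converted to a per-day
# probability under the same independence assumption.
ANNUAL_FAILURE_RATE = 0.03
p_day = 1.0 - (1.0 - ANNUAL_FAILURE_RATE) ** (1.0 / 365.0)

for drives in (10, 1_000, 10_000):
    print(f"{drives:>6} drives: P(at least one failure today) = "
          f"{p_any_failure(drives, p_day):.3f}")
```

The per-drive risk stays tiny, but the chance that some drive fails on a given day grows quickly with the number of drives, which is the argument for designing for failure rather than trying to prevent it.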
NoSQL ecosystem | open source, commodity
Cassandra
Hive
Scribe
Hadoop
Hadoop
Oozie
Pig (-latin)
BackType
Hadoop
Pig / Hbase
Cassandra
MR/GFS
Bigtable
Dremel
…
SimpleDB
Dynamo
EC2 / S3
…
Internal [ Dryad | Cosmos] and External [ Isotope | Azure | Excel | BI | SQL DW | LTH ]
Mahout | Scalable machine learning and data mining
MongoDB | Document-oriented database (C++)
Couchbase | CouchDB (doc dB) + Membase (memcache protocol)
Hbase | Hadoop column-store database
R | Statistical computing and graphics
Pegasus | Peta-scale graph mining system
Lucene | full-featured text search engine library
Comparing RDBMS and MapReduce
Attribute | Traditional RDBMS | MapReduce
Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes)
Access | Interactive and Batch | Batch
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low (BASE)
Scaling | Nonlinear | Linear
DBA Ratio | 1:40 | 1:3000
Reference: Tom White’s Hadoop: The Definitive Guide
Traditional RDBMS: Move Data to Compute
As you process more and more data, and you want interactive response
• Typically need more expensive hardware
• Failures at the points of disk and network can be quite problematic
It’s all about ACID: atomicity, consistency, isolation, durability
Can work around this problem with more expensive HW and systems
• Though distribution problem becomes harder to do
Hadoop / NoSQL: Move Compute to the Data
Hadoop (and NoSQL in general) follows the MapReduce framework
• Developed initially by Google -> MapReduce and Google File System
• Embraced by the community to develop MapReduce algorithms that are very robust
• Built Hadoop Distributed File System (HDFS) to auto-replicate data to multiple nodes
• And execute a single MR task on all/many nodes available on HDFS
Use commodity HW: no need for specialized and expensive network and disk
Not so much ACID, but BASE (Basically Available, Soft state, Eventually consistent)
// Sample Generated Log
588.891.552.388,-,08/05/2011,11:00:02,W3SVC1,CTSSVR14,-,-,0,-
,200,-,GET,/c.gif,Mozilla/5.0 (Windows NT 6.1; rv:5.0)
Gecko/20100101 Firefox/5.0,http://foo.bar.com/cid-
4985109174710/blah?fdkjafdf,[GUID],-,-
,&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046&bxr=…
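-- Count distinct visitors (GUID) per country (IsoCy) and language (Lang).
-- A dummy URL prefix is concatenated to the parameters column so that
-- parse_url can extract individual query-string keys.
select
  parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'IsoCy'),
  parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'Lang'),
  count(distinct GUID)
from ctslog_sample
group by
  parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'IsoCy'),
  parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'Lang');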
HiveQL: SQL-like language
• Write SQL-like query which becomes
MapReduce functions
• Includes functions like parse_url and
concat so one can perform parsing
functions in HiveQL
Query a web log using HiveQL
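To make the query's logic concrete, here is a minimal Python sketch of the same computation on a handful of in-memory rows. It is an illustration, not how Hive executes the query: the function names and sample rows are made up, and the `parameters` values are assumed to look like the query-string tail of the sample log line.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

def query_param(parameters: str, key: str):
    """Mimic parse_url(concat("http://www.blah.com?", parameters), 'QUERY', key)."""
    url = "http://www.blah.com?" + parameters
    values = parse_qs(urlsplit(url).query).get(key)
    return values[0] if values else None

def distinct_guids_by_country_and_language(rows):
    """rows: iterable of (guid, parameters); returns {(IsoCy, Lang): distinct count}."""
    seen = defaultdict(set)
    for guid, parameters in rows:
        group = (query_param(parameters, "IsoCy"), query_param(parameters, "Lang"))
        seen[group].add(guid)
    return {group: len(guids) for group, guids in seen.items()}

# Made-up sample rows shaped like the log line above.
sample = [
    ("guid-1", "&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046"),
    ("guid-1", "&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046"),
    ("guid-2", "&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046"),
    ("guid-3", "&Page=blah&Hierarchy=1&region=Z2&IsoCy=US&Lang=1033"),
]
print(distinct_guids_by_country_and_language(sample))
```

The repeated `guid-1` row is counted once, which is the `count(distinct GUID)` behavior; in Hive the grouping and distinct counting are spread across map and reduce tasks rather than held in one dictionary.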
But how FAST are we, when we achieve it?
The precarious balance between scale and
performance is going to get even more important.
What do you want?
1. Guaranteed response, but slow
2. Fast response, but not always
ETL: THE BIG SHUFFLE
Our Ideal, scalable world
[Diagram: a “nice and friendly” source feeds a “logical” table split into key-range partitions 1-1000, 1001-2000, 2001-3000, and 3001-4000]
Reality…Sorting and Indexes…
[Diagram: the same “nice and friendly” source and key-range partitions (1-1000, 1001-2000, 2001-3000, 3001-4000) of the “logical” table, each partition now also ordered A to Z]
More Reality… Sources Are Not Nice…
[Diagram: key-range partitions 1-1000, 1001-2000, 2001-3000, and 3001-4000 of the “logical” table; the source delivers interleaved keys (1,1001,2001; 3,1003,2003..; 4,1004,2004..; 2,1002,2002..; etc.)]
Investment Bank Architecture – First stab
[Diagram: BigData Cluster -> Batches (x3) -> 1:1 -> Sort/Merge Buffer -> “Golden” Source / AS Cube; throughput 1-3M rows/sec]
Zooming in on the Merge Problem
[Diagram: the AS Cube asks “Give me Book X!”; rows for X sit in every batch (Batch 1, Batch 2, Batch 3 … Batch n), each batch connected 1:1]
The big shuffle!
[Diagram: three ETL units each calculate a hash and distribute rows to hash buckets 0, 1, 2, and 3]
• Each unit operates on a subset of the data
• Computation is distributed
• Database does the minimum work, focus on an optimized user model!
• Equal sized partitions after the merge (the merge is still there)
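The shuffle can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the implementation described in the talk: CRC32 is an arbitrary stand-in for the hash function, the function names are hypothetical, and the "book" keys are made-up data echoing the "Give me Book X!" example.

```python
import zlib

NUM_BUCKETS = 4  # buckets 0..3, as on the slide

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Stable hash -> bucket. CRC32 is an arbitrary choice for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def etl_unit(batch, num_buckets: int = NUM_BUCKETS):
    """One ETL unit: calculate the hash of each row's key, then distribute."""
    buckets = [[] for _ in range(num_buckets)]
    for key, payload in batch:
        buckets[bucket_for(key, num_buckets)].append((key, payload))
    return buckets

def shuffle(batches, num_buckets: int = NUM_BUCKETS):
    """Merge per-unit outputs so each bucket holds every row for its keys."""
    merged = [[] for _ in range(num_buckets)]
    for batch in batches:  # in a real system each batch runs on its own unit
        for i, rows in enumerate(etl_unit(batch, num_buckets)):
            merged[i].extend(rows)
    return merged

# Made-up data: three batches, each containing a row for every book.
books = [f"book-{n}" for n in range(1000)]
batches = [[(book, f"batch{i}") for book in books] for i in range(3)]
partitions = shuffle(batches)
print([len(p) for p in partitions])
```

After the shuffle, all rows for a given book sit in one bucket no matter which batch they arrived in, so a request for one book touches one partition instead of every batch.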
Investment Bank architecture – Better!
[Diagram: BigData Cluster -> Batches (x3) -> Hash 1, Hash 2, Hash 3 -> “Golden” Source / AS Cube; 3M rows/sec (current), x20 throughput]
Shuffle Speed Tests
BULK Inbound Speed to SQL Server SMP
• >3GB/sec
Outbound from SQL Server: 40M rows/sec
• ... Or saturating 4 x 10Gbit NIC one way
When you have shuffled:
Use standard relational / MDX functionality to run ad hoc queries on a subset of BigData
High concurrency access at low CPU cost
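The two outbound figures above can be related with simple arithmetic. The sketch assumes both describe the same test and ignores protocol overhead, so the implied row size is only a rough indication.

```python
GBIT = 10**9  # decimal gigabit, as used for network line rates

def nic_bytes_per_sec(links: int, gbit_per_link: float) -> float:
    """Raw one-way line rate in bytes/sec, ignoring protocol overhead."""
    return links * gbit_per_link * GBIT / 8

line_rate = nic_bytes_per_sec(links=4, gbit_per_link=10)
rows_per_sec = 40_000_000  # outbound rate quoted on the slide
bytes_per_row = line_rate / rows_per_sec

print(f"line rate: {line_rate / 1e9:.1f} GB/sec")
print(f"implied average row size at saturation: {bytes_per_row:.0f} bytes")
```

Under these assumptions, 4 x 10Gbit is 5 GB/sec one way, which at 40M rows/sec corresponds to rows of roughly 125 bytes.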
Network as the new Barrier?…
GETTING THE DATA OUT
Hive Connector: First Step in Integration
with our BI Platform
New Hive ODBC driver
Leverage Hadoop for Map Reduce, text mining, statistical analysis, etc.
Get Hadoop data into AS, RS, PowerPivot using HiveQL
[Diagram boxes: HDFS, Map Reduce, Hive; SQL Engine, PDW; AS Tabular, AS Multidimensional; RS, Crescent, Excel, PowerPivot, Analytical Apps]
Summary: The Challenge Ahead
[Diagram with boxes labeled Cube, ”Mart” / EDW, and F, annotated: “This ... and this..., this... is what we need to get good at now!”]
YAHOO! CASE STUDY
A review of the themes
Yahoo! TAO Business Challenge
• Yahoo! manages a powerful, scalable advertising exchange that includes publishers and advertisers
• Advertisers want to get the best bang for their buck by reaching their targeted audiences effectively and efficiently
• Yahoo! needs visibility into how consumers are responding to ads along many dimensions (web sites, creatives, time of day, gender, age, location) to make the exchange work as efficiently and effectively as possible
Yahoo! TAO Technical Requirements
• Visitors to Yahoo! branded sites: 680,000,000
• Ad Impressions: 3,500,000,000 (per day)
• Refresh Frequency: Hourly
• Rows Loaded: 464,000,000,000 (per qtr)
• Average Query Time: <10 seconds
Yahoo! TAO Platform Architecture
How did we load so much so quickly?
[Architecture diagram: stages labeled Data Archive & Staging, Data Aggregation & ETL, and BI Server; components Hadoop (2PB cluster), Oracle 11G RAC, and SQL Server Analysis Services 2008 R2 (24TB cube / qtr); File 1 … File N feed two sets of Partition 1 … Partition N; volumes 1.2TB/day and 135GB/day compressed]
Yahoo Example – “Fast” Oracle Load
• Data is streamed in to Oracle to files
• To get max processing, 30 threads are fired because all T (temp) partitions are
processed concurrently
• Super fast data loads
• Problem is that it requires constant merging of partitions
[Diagram: files are streamed in as they become available; temp partitions 10/10/10 T360772, 10/10/10 T360773, … 10/10/10 T361645 in Oracle 10g correspond to the same partitions in SSAS, which are then merged into a single 10/10/10 partition]
Partitions – Directly Merging
[Diagram: hourly partitions 10/10/10 00:00, 10/10/10 01:00, … 10/10/10 23:00 in Oracle 10g]
• New model allows for set hourly partitions
• No more streaming data, but with hourly partitions we cannot have as many threads for fast data loads, unless…
• Process multiple cubes or measure groups in parallel
[Diagram: in SSAS, the same hourly partitions (10/10/10 00:00, 10/10/10 01:00, … 10/10/10 23:00) repeated under the labels Partitions, Segments, Activities, and Uniques]
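The idea of regaining parallelism by processing several measure groups at once can be sketched as follows. This is an illustration only: `process_partition` is a hypothetical stand-in for whatever processing command is sent to the BI server, and treating Segments, Activities, and Uniques as the measure groups is an assumption based on the diagram labels.

```python
from concurrent.futures import ThreadPoolExecutor

MEASURE_GROUPS = ["Segments", "Activities", "Uniques"]  # labels from the diagram
HOURS = [f"10/10/10 {h:02d}:00" for h in range(24)]

def process_partition(measure_group: str, hour: str) -> str:
    """Hypothetical stand-in for issuing a processing command to the BI server."""
    return f"{measure_group} / {hour}: processed"

def process_hour(hour: str, max_workers: int = 3):
    """Process the same hourly partition of every measure group concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda mg: process_partition(mg, hour), MEASURE_GROUPS))

for line in process_hour(HOURS[0]):
    print(line)
```

With one partition per hour there is only one unit of work per measure group, so the concurrency comes from the number of measure groups (or cubes) processed side by side rather than from many temp partitions.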
Yahoo! TAO Platform Architecture
Queries at the “speed of thought”
• BI Query Servers: SQL Server Analysis Services 2008 R2 (24TB cube / qtr)
• Adhoc Query/Visualization: Tableau Desktop 6
• Optimization Application: Custom J2EE App
• 464B rows of event level data / qtr
• Dimensions: 24
• Attributes: 247
• Measures: 207
• Avg Query Time: 6 secs
• Avg Query Time: 2 secs
Yahoo! TAO Return on Investment
• For campaigns optimized using TAO, advertisers spent 15% more with Yahoo! than before
• For campaigns optimized using TAO, eCPMs (revenue) have more than doubled!
• Yahoo! TAO exposed customer segment performance to campaign managers and advertisers for the first time! No longer “flying audience blind”
Yahoo! TAO Future Direction
• Increase Daily Ad Impressions: 2x
• Increase consumer segments: 5x
• New Complexity: Distinct Count, Hadoop to SSAS
• New technologies: Denali (Apollo, VertiPaq, and Crescent), HiveODBC Driver
Big Data and Analytics
• Later this year
• HiveODBC driver
• Hadoop-to-SQL/PDW connectors
• Hadoop on Windows Azure
• Mid-next year
• Hadoop on Windows Server
Complete the Evaluation Form
to Win!
Win a Dell Mini Netbook – every day – just for submitting
your completed form. Each session evaluation form
represents a chance to win.
Pick up your evaluation form:
• In each presentation room
• Online on the PASS Summit website
Drop off your completed form:
• Near the exit of each presentation room
• At the Registration desk
• Online on the PASS Summit website
Sponsored by Dell
• Microsoft SQL Server Clinic (Room 611): Work through your technical issues with SQL Server CSS & get architectural guidance from SQLCAT
• Microsoft Product Pavilion (Expo Hall): Talk with Microsoft SQL Server & BI experts to learn about the next version of SQL Server and check out the new Database Consolidation Appliance
• Expert Pods (6th Floor Lobby): Meet Microsoft SQL Server Engineering team members & SQL MVPs
• Hands-on Labs (Room 618-620): Get experienced through self-paced & instructor-led labs on our cloud based lab platform - bring your laptop or use HP provided hardware
October 11-14, Seattle, WA
Thank you
for attending this session and the
2011 PASS Summit in Seattle

SQL Server Integration Services Best PracticesSQL Server Integration Services Best Practices
SQL Server Integration Services Best Practices
 
SQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesSQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best Practices
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop Primer
 
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
 
Yahoo!, Big Data, and Microsoft BI: Bigger and Better Together
Yahoo!, Big Data, and Microsoft BI: Bigger and Better TogetherYahoo!, Big Data, and Microsoft BI: Bigger and Better Together
Yahoo!, Big Data, and Microsoft BI: Bigger and Better Together
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinar
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons Learned
 
SQLCAT - Data and Admin Security
SQLCAT - Data and Admin SecuritySQLCAT - Data and Admin Security
SQLCAT - Data and Admin Security
 
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
 
SQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best PracticesSQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best Practices
 
Deploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePointDeploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePoint
 
Big Data, Bigger Brains
Big Data, Bigger BrainsBig Data, Bigger Brains
Big Data, Bigger Brains
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On Time
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery Webinar
 
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
 

Dernier

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Dernier (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

SQLCAT: Tier-1 BI in the World of Big Data

  • 1. October 11-14, Seattle, WA. Tier-1 BI in the World of Big Data. SQLCAT: Thomas Kejser, Denny Lee – Microsoft, w/ special guest Kenneth Lieu – Yahoo!
  • 2. Questions? • Are you interested in how to build, deploy, and maintain multi-terabyte cubes? • This is 400+ level information • “Does Hadoop give you a case of hives?” • If you understand the pun, definitely stay • What does Tier-1 or enterprise mean to you? • You might not like this presentation if you answered in gigabytes BIA-408-A | SQLCAT: Tier-1 BI in the world of Big Data 2
  • 3. Agenda • Microsoft BI Today • Two different workloads, same challenge (Ad Analytics, Investment Banks) • HADOOP: The mother of all stovepipes • The Big Shuffle • Getting data OUT of BigData
  • 4. MICROSOFT BI TODAY
  • 5. Microsoft BI Today • Two Data Models: Dimensional (UDM); Tabular (The model formerly known as BISM) • UDM is the current large scale engine: Yahoo!’s 24TB cube; multi-terabyte cubes are quickly becoming the norm
  • 6. UDM Scale Themes • Get that hardware balance right (yes, we have to talk about IOPS) • Repeat after me: partitioning, partitioning, partitioning! • Multi-user query concurrency – how to handle it • Keeping it simple • Locking – how it works – and how to work around it • What? Did you say ROLAP?
  • 7. UDM Guidance sqlcat.com • Analysis Services 2008R2 Performance Guide • Analysis Services 2008R2 Operations Guide • SSAS Maestro Course (Tech Level 500, 5 day course)
  • 8. BIGDATA WORKLOADS: Two different workloads, same challenges
  • 9. Yahoo! TAO Technical Requirements • Visitors to Yahoo! branded sites: 680,000,000 • Ad Impressions: 3,500,000,000 (per day) • Rows Loaded: 464,000,000,000 (per qtr) • Refresh Frequency: Hourly • Average Query Time: <10 seconds
  • 10. Investment Banks - Technical Requirements • Risk Vectors reloaded / 30 min: 5,000,000,000 • Total Vectors Loaded: 600,000,000,000 • Refresh Frequency: Seconds • Total Concurrent, active Queries: Thousands • Average Query Time: Seconds
  • 11. Workload Scale Themes Old Themes: • Getting I/O right (solved!) • Getting configuration right (solved: Maestro and Fast Track) • Getting Data Models right (done, but spread the word) • SMP User concurrency (done!) • SMP Scaled ETL (done, World Record) New Themes: • Cheap storage at scale • Massive query scale (both size and concurrency) • Scaling ETL another order of magnitude • Scaled and Integrated Reporting/BI ....What did we learn?
  • 12. DW/BI Scale is getting expensive… (Component | Current Max | Example Hardware)
      • Cores | 128 (256) | SGI Altix UV 100
      • Memory | 2TB | IBM x5 Series, HP DL980
      • Attached Storage Capacity (at reasonable speed) | 200-400 TB? | Custom build DAS, HP P9500, EMC Symmetrix, Hitachi HDS
      • Max Table Scan Speed | 36GB/sec | HP DL980
      • Max IOPS | 1M IOPS | FusionIO Octals, 2 x Dedicated, enterprise Grade SAN
      • Max Bulk Speed | 16M rows/sec | Unisys 7600R
      • Max Extract Speed | 41M rows/sec | 4 x 10Gbit Ethernet, 64 cores dedicated Server
      • Biggest Cube | 24TB |
      • Largest Single DB | 75TB | HP Superdome, 128 Cores, Dedicated SAN
  • 13. Compression to the Rescue? Example Compression Rates with Column Store/VertiPaq: • Web Logs 30:1 • Trade Risk Vectors 9:1 Good news: Columnar compression can shave off an order of magnitude Bad news: But you still have a lot more data than you can comfortably handle in a single box
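To make the "good news / bad news" point concrete, here is a minimal arithmetic sketch. The two ratios come from the slide; the 1,000 TB raw volume is a purely hypothetical input, and the 2 TB figure is the "Memory" maximum from the hardware table on slide 12, used here only as a rough single-box yardstick.

```python
# Compression ratios quoted on the slide (raw : compressed).
COMPRESSION_RATIOS = {"web_logs": 30, "trade_risk_vectors": 9}


def compressed_tb(raw_tb: float, ratio: float) -> float:
    """Compressed size in TB for a raw size and a raw:compressed ratio."""
    return raw_tb / ratio


def fits_in_memory(raw_tb: float, ratio: float, memory_tb: float) -> bool:
    """True if the compressed data would fit in the given amount of memory."""
    return compressed_tb(raw_tb, ratio) <= memory_tb


RAW_TB = 1000.0   # hypothetical raw data volume, not a figure from the deck
MEMORY_TB = 2.0   # "Memory: 2TB" from the slide 12 hardware table

for workload, ratio in COMPRESSION_RATIOS.items():
    size = compressed_tb(RAW_TB, ratio)
    verdict = "fits" if fits_in_memory(RAW_TB, ratio, MEMORY_TB) else "does not fit"
    print(f"{workload}: {size:.1f} TB compressed, {verdict} in {MEMORY_TB} TB")
```

Even at 30:1, the hypothetical 1,000 TB shrinks only to roughly 33 TB, which is still well beyond a single server's memory.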
  • 14. What we saw and see… (Murphy’s Law | Risk | Countermeasure)
      • Programmers use more CPU than you have | Can’t add beyond max cores | You try to scale out
      • You scaled out | You get WORSE scale | Poor you!
      • You bought too little hardware | System is unresponsive | Buy too much hardware
      • You bought too much hardware | You wasted money | Poor you!
      • Programmer “forgot” to write multi threaded code | You buy more hardware, the system scales WORSE! | Rework code
      • You reworked code | You “forgot” how hard it is to write multi threaded code | Poor you!
      • You capacity planned disks wrong | You run out of disk space; system is down | You bought a big SAN to compensate
      • You bought a big SAN to compensate | You wasted money | Poor you!
  • 15. ....IT JUST WASN’T ENOUGH! HADOOP is the Mother of all Stovepipes • Statistics, they catch you • SATA drives? Really? • Serialization • It is always day zero • Code is free • The IP is in the data • Map/Reduce • Name/Value pairs? • MTTI • Cheap, Fast, Quality, choose two not three
  • 16. Scale - What are we trying to achieve? [Chart: Throughput versus Some Hardware Resource, with curves labeled Good, So so, and Bad; the Good curve is annotated “We want to live here”]
  • 17. The SMP Scale Up Gaps The Scale-up Gaps!
  • 18. Statistics catch up with you • In a large system, something is ALWAYS broken • Mirrors are no longer enough: the clone breaks before its master can be reestablished (example: Azure uses three copies of data) • User queries run wild, get killed, racks overheat, network switches die etc… = Design for failure!
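A back-of-the-envelope sketch of why a third copy helps. It assumes each copy fails independently with the same probability during the window needed to re-establish a lost copy; both the independence assumption and the 1% figure are illustrative, not numbers from the talk.

```python
def prob_all_copies_lost(p_copy_fails: float, copies: int) -> float:
    """Probability that every copy fails within the same rebuild window,
    assuming independent failures with identical probability."""
    return p_copy_fails ** copies


def expected_units_lost(data_units: int, p_copy_fails: float, copies: int) -> float:
    """Expected number of data units (disks, blocks, ...) lost entirely."""
    return data_units * prob_all_copies_lost(p_copy_fails, copies)


P_FAIL = 0.01          # hypothetical per-copy failure probability per window
UNITS = 1_000_000      # hypothetical number of replicated data units

for copies in (2, 3):  # a mirror versus three copies
    lost = expected_units_lost(UNITS, P_FAIL, copies)
    print(f"{copies} copies: about {lost:g} of {UNITS} units expected lost per window")
```

With a million units, a small per-unit probability still produces losses at two copies, which is the "statistics catch up with you" effect; adding a third copy reduces the expected loss by another factor of 1/p under these assumptions.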
  • 19. NoSQL ecosystem | open source, commodity: Cassandra Hive Scribe Hadoop Hadoop Oozie Pig (-latin) BackType Hadoop Pig / Hbase Cassandra MR/GFS Bigtable Dremel … SimpleDB Dynamo EC2 / S3 … Internal [ Dryad | Cosmos ] and External [ Isotope | Azure | Excel | BI | SQL DW | LTH ]
      • Mahout | Scalable machine learning and data mining
      • MongoDB | Document-oriented database (C++)
      • Couchbase | CouchDB (doc dB) + Membase (memcache protocol)
      • Hbase | Hadoop column-store database
      • R | Statistical computing and graphics
      • Pegasus | Peta-scale graph mining system
      • Lucene | full-featured text search engine library
  • 20. Comparing RDBMS and MapReduce (Attribute | Traditional RDBMS | MapReduce)
      • Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes)
      • Access | Interactive and Batch | Batch
      • Updates | Read / Write many times | Write once, Read many times
      • Structure | Static Schema | Dynamic Schema
      • Integrity | High (ACID) | Low (BASE)
      • Scaling | Nonlinear | Linear
      • DBA Ratio | 1:40 | 1:3000
      Reference: Tom White’s Hadoop: The Definitive Guide
  • 21. Traditional RDBMS: Move Data to Compute As you process more and more data, and you want interactive response • Typically need more expensive hardware • Failures at the points of disk and network can be quite problematic It’s all about ACID: atomicity, consistency, isolation, durability Can work around this problem with more expensive HW and systems • Though distribution problem becomes harder to do
  • 22. Hadoop / NoSQL: Move Compute to the Data. Hadoop (and NoSQL in general) follows the Map Reduce framework • Developed initially by Google -> Map Reduce and Google File system • Embraced by community to develop MapReduce algorithms that are very robust • Built Hadoop Distributed File System (HDFS) to auto-replicate data to multiple nodes • And execute a single MR task on all/many nodes available on HDFS. Use commodity HW: no need for specialized and expensive network and disk. Not so much ACID, but BASE (Basically Available, Soft state, Eventually consistent)
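For readers new to the model, the following is a minimal single-process sketch of the map, shuffle, and reduce phases. It is not Hadoop code: in a real cluster the map and reduce calls run on many nodes next to the data. The function names and the two-field sample log lines are made up for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record; each call may emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Illustrative job: count requests per HTTP status code.
def status_mapper(line):
    url, status = line.split(",")
    yield status, 1


def sum_reducer(key, values):
    return sum(values)


LOG_LINES = ["/c.gif,200", "/a.html,404", "/c.gif,200"]  # made-up sample data
result = run_job(LOG_LINES, status_mapper, sum_reducer)
print(result)  # {'200': 2, '404': 1}
```

The mapper and reducer never see the whole data set at once, which is what lets the framework move the computation to wherever each block of data lives.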
  • 23. Query a web log using HiveQL
      HiveQL: SQL-like language
      • Write SQL-like query which becomes MapReduce functions
      • Includes functions like parse_url and concat so one can perform parsing functions in HiveQL
      // Sample Generated Log
      588.891.552.388,-,08/05/2011,11:00:02,W3SVC1,CTSSVR14,-,-,0,-,200,-,GET,/c.gif,Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0,http://foo.bar.com/cid-4985109174710/blah?fdkjafdf,[GUID],-,-,&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046&bxr=…
      // Query
      select
        parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'IsoCy'),
        parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'Lang'),
        count(distinct GUID)
      from ctslog_sample
      group by
        parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'IsoCy'),
        parse_url(concat("http://www.blah.com?", parameters), 'QUERY', 'Lang')
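To show what the query computes, here is a small in-memory Python equivalent: extract IsoCy and Lang from the parameters field and count distinct GUIDs per pair. It is a sketch for intuition only, not how Hive executes the query; the sample rows and GUID values are invented, and `urllib.parse.parse_qs` stands in for Hive's `parse_url(..., 'QUERY', key)`.

```python
from collections import defaultdict
from urllib.parse import parse_qs


def distinct_guids_by_country_and_language(rows):
    """rows: iterable of (guid, parameters) tuples, where parameters looks like
    '&Page=blah&IsoCy=BR&Lang=1046'. Returns {(IsoCy, Lang): distinct GUID count}."""
    guids = defaultdict(set)
    for guid, parameters in rows:
        query = parse_qs(parameters.lstrip("&"))
        iso_cy = query.get("IsoCy", [None])[0]   # None if the key is absent
        lang = query.get("Lang", [None])[0]
        guids[(iso_cy, lang)].add(guid)          # a set keeps GUIDs distinct
    return {key: len(values) for key, values in guids.items()}


SAMPLE_ROWS = [  # invented rows shaped like the sample log's last fields
    ("guid-1", "&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046"),
    ("guid-1", "&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046"),
    ("guid-2", "&Page=blah&Hierarchy=2&region=Z1&IsoCy=BR&Lang=1046"),
    ("guid-3", "&Page=other&IsoCy=US&Lang=1033"),
]

counts = distinct_guids_by_country_and_language(SAMPLE_ROWS)
print(counts)
```

The repeated `guid-1` row is counted once, mirroring `count(distinct GUID)`.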
  • 24. But how FAST are we, when we achieve it? The precarious balance between scale and performance is going to get even more important. What do you want? 1. Guaranteed response, but get it slow 2. Fast response, but not always
  • 25. ETL: THE BIG SHUFFLE
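  • 26. Our Ideal, scalable world [Diagram: a “nice and friendly” source feeding a “logical” table split into key ranges 1-1000, 1001-2000, 2001-3000, 3001-4000]
  • 27. Reality…Sorting and Indexes… [Diagram: the same “logical” table and key ranges 1-1000, 1001-2000, 2001-3000, 3001-4000 with a “nice and friendly” source; each range is additionally marked A–Z]
  • 28. More Reality… Sources Are Not Nice… [Diagram: the same “logical” table and key ranges 1-1000, 1001-2000, 2001-3000, 3001-4000; the sources deliver interleaved keys: 1,1001,2001; 2,1002,2002..; 3,1003,2003..; 4,1004,2004..; etc…]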
  • 29. Investment Bank Architecture – First stab: BigData Cluster Batches Batches Batches “Golden” Source AS Cube 1:1 1:1 1:1 1-3M rows/sec
  • 30. Zooming in on the Merge Problem [Diagram: the AS Cube asks “Give me Book X!”; rows marked X sit in Batch 1, Batch 2, Batch 3 … Batch n, each mapped 1:1, with a Sort/Merge Buffer in between]
  • 31. The big shuffle! [Diagram: several ETL units, each with the steps Calc. Hash and Distribute, sending rows to hash buckets 0, 1, 2, 3] • Each unit operates on a subset of the data • Computation is distributed • Database does the minimum work, focus on an optimized user model! • Equal sized partitions after the merge (the merge is still there)
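The shuffle described on this slide can be sketched as follows: every ETL unit hashes the key of each row it holds and sends the row to the partition that owns that hash value, so all rows for one key end up together no matter which unit saw them. This is a single-process illustration under stated assumptions: the key format, the use of CRC32 as the hash, and the batch layout are all invented, and only the four buckets (0-3) come from the diagram.

```python
import zlib

NUM_PARTITIONS = 4  # the slide's diagram shows hash buckets 0-3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash, so every ETL unit routes a given key to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(etl_unit_batches, num_partitions: int = NUM_PARTITIONS):
    """Each batch is the subset of (key, row) pairs held by one ETL unit."""
    partitions = [[] for _ in range(num_partitions)]
    for batch in etl_unit_batches:
        for key, row in batch:
            partitions[partition_for(key, num_partitions)].append((key, row))
    return partitions


# Three hypothetical ETL units, each holding an interleaved slice of the keys.
batches = [
    [(f"book-{i}", {"unit": unit, "value": i}) for i in range(unit, 300, 3)]
    for unit in range(3)
]

partitions = shuffle(batches)
for index, rows in enumerate(partitions):
    print(f"partition {index}: {len(rows)} rows")
```

Python's built-in `hash()` is deliberately avoided here because it is randomized per process for strings, which would break the requirement that independent units agree on the target partition.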
  • 32. Investment Bank architecture – Better! BigData Cluster Batches Batches Batches “Golden” Source AS Cube Hash 3 Hash 2 Hash 1 3M rows/sec (Current) X20 throughput
  • 33. Shuffle Speed Tests • BULK inbound speed to SQL Server SMP: >3GB/sec • Outbound from SQL Server: 40M rows/sec ... or saturating 4 x 10Gbit NIC one way • When you have shuffled: use standard relational / MDX functionality to ad-hoc query a subset of BigData, with high concurrency access at low CPU cost
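A quick unit conversion helps relate the two outbound figures. The sketch below computes the raw capacity of 4 x 10 Gbit links and the average row size at which 40M rows/sec would fill them. It assumes decimal units and ignores protocol overhead, and it treats the two figures as describing the same test, which the slide's "or" does not guarantee.

```python
def link_bytes_per_sec(nics: int, gbit_per_nic: float) -> float:
    """Aggregate one-way capacity in bytes/sec (decimal gigabits, no overhead)."""
    return nics * gbit_per_nic * 1e9 / 8


def row_size_at_saturation(nics: int, gbit_per_nic: float, rows_per_sec: float) -> float:
    """Average bytes per row at which the given row rate fills the links."""
    return link_bytes_per_sec(nics, gbit_per_nic) / rows_per_sec


capacity = link_bytes_per_sec(4, 10)              # 4 x 10Gbit NICs from the slide
row_bytes = row_size_at_saturation(4, 10, 40e6)   # 40M rows/sec from the slide

print(f"capacity: {capacity / 1e9:.1f} GB/sec")
print(f"row size at saturation: {row_bytes:.0f} bytes")
```

Under these assumptions the links carry 5 GB/sec, so rows averaging about 125 bytes at 40M rows/sec would be enough to saturate them.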
  • 34. Network as the new Barrier?…
  • 35. GETTING THE DATA OUT
  • 36. Hive Connector: First Step in Integration with our BI Platform New Hive ODBC driver Leverage Hadoop for Map Reduce, text mining, statistical analysis, etc. Get Hadoop data into AS, RS, PowerPivot using HiveQL HDFS Map Reduce Hive AS Tabular AS Multidimensional Crescent Excel PowerPivot Analytical Apps SQL Engine PDW RS
  • 37. Summary: The Challenge Ahead. Cube This ... ”Mart” / EDW F and this..., this... ...is what we need to get good at now!
  • 38. YAHOO! CASE STUDY: A review of the themes
  • 39. Yahoo! TAO Business Challenge: Yahoo! manages a powerful scalable advertising exchange that includes publishers and advertisers
  • 40. Yahoo! TAO Business Challenge: Advertisers want to get the best bang for their buck by reaching their targeted audiences effectively and efficiently
  • 41. Yahoo! TAO Business Challenge: Yahoo! needs visibility into how consumers are responding to ads along many dimensions (web sites, creatives, time of day, gender, age, location) to make the exchange work as efficiently and effectively as possible
  • 42. Yahoo! TAO Technical Requirements • Visitors to Yahoo! branded sites: 680,000,000 • Ad Impressions: 3,500,000,000 (per day) • Rows Loaded: 464,000,000,000 (per qtr) • Refresh Frequency: Hourly • Average Query Time: <10 seconds
  • 43. Yahoo! TAO Platform Architecture: How did we load so much so quickly? [Diagram components: Data Aggregation & ETL: Hadoop, 2PB cluster; File 1, File 2 … File N; Data Archive & Staging: Oracle 11G RAC, Partition 1, Partition 2 … Partition N; BI Server: SQL Server Analysis Services 2008 R2, Partition 1, Partition 2 … Partition N, 24TB Cube /qtr. Volume labels: 1.2TB /day; 135GB/day compressed]
  • 44. Yahoo Example – “Fast” Oracle Load • Data is streamed in to Oracle to files • To get max processing, 30 threads are fired because all T (temp) partitions are processed concurrently • Super fast data loads • Problem is that it requires constant merging of partitions [Diagram: files are streamed in as they become available; Oracle 10g partitions 10/10/10 T360772, 10/10/10 T360773, … 10/10/10 T361645 map to SSAS partitions of the same names, which are merged into partition 10/10/10]
  • 45. Partitions – Directly Merging Partitions • New model allows for set hourly partitions • No more streaming data but with hourly partitions, cannot have as many threads for fast data loads, unless… • Process multiple cubes or measure groups in parallel [Diagram: Oracle 10g hourly partitions 10/10/10 00:00, 10/10/10 01:00, … 10/10/10 23:00 map to SSAS hourly partitions of the same names under the labels Segments, Activities, and Uniques]
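The idea of regaining parallelism by processing several measure groups at once can be sketched with a thread pool. Everything operational here is an assumption: `process_partition` is a hypothetical stand-in for whatever command actually processes an SSAS partition, the three labels from the diagram are treated as measure groups, and the worker count of 30 simply echoes the thread count mentioned on the previous slide.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

MEASURE_GROUPS = ["Segments", "Activities", "Uniques"]  # labels from the diagram
HOURS = [f"{hour:02d}:00" for hour in range(24)]        # hourly partitions


def process_partition(measure_group: str, day: str, hour: str) -> str:
    """Hypothetical placeholder: a real implementation would submit a
    processing command for this partition and wait for it to finish."""
    return f"{measure_group} {day} {hour}"


def process_day(day: str, max_workers: int = 30):
    """Process every (measure group, hour) partition for one day in parallel."""
    work = list(product(MEASURE_GROUPS, HOURS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in submission order.
        return list(pool.map(lambda item: process_partition(item[0], day, item[1]), work))


processed = process_day("10/10/10")
print(len(processed), "partitions processed")
```

With one hourly partition per measure group there are only 24 units of work per group per day, so fanning out across the three groups (72 units) is what keeps a larger pool of workers busy.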
  • 46. BI Query Servers SQL Server Analysis Services 2008 R2 24TB Cube /qtr Adhoc Query/Visualization Tableau Desktop 6 Optimization Application Custom J2EE App Yahoo! TAO Platform Architecture Queries at the “speed of thought” 464B rows of event level data /qtr • Dimensions: 24 • Attributes: 247 • Measures: 207 Avg Query Time: 6 secs Avg Query Time: 2 secs
  • 47. Yahoo! TAO Return on Investment • For campaigns optimized using TAO, advertisers spent 15% more with Yahoo! than before • For campaigns optimized using TAO, eCPMs (revenue) have more than doubled!
  • 48. Yahoo! TAO Return on Investment Yahoo! TAO exposed customer segment performance to campaign managers and advertisers for the first time! No longer “flying audience blind”
  • 49. Yahoo! TAO Future Direction • Increase Daily Ad Impressions: 2x • Increase consumer segments: 5x • New Complexity: Distinct Count, Hadoop to SSAS • New technologies: Denali: Apollo, VertiPaq, and Crescent
  • 51. Big Data and Analytics • Later this year: HiveODBC driver; Hadoop-to-SQL/PDW connectors; Hadoop on Windows Azure • Mid-next year: Hadoop on Windows Server
  • 52. Complete the Evaluation Form to Win! Win a Dell Mini Netbook – every day – just for submitting your completed form. Each session evaluation form represents a chance to win. Pick up your evaluation form: • In each presentation room • Online on the PASS Summit website Drop off your completed form: • Near the exit of each presentation room • At the Registration desk • Online on the PASS Summit website Sponsored by Dell
  • 53. • Microsoft SQL Server Clinic (Room 611): Work through your technical issues with SQL Server CSS & get architectural guidance from SQLCAT • Microsoft Product Pavilion (Expo Hall): Talk with Microsoft SQL Server & BI experts to learn about the next version of SQL Server and check out the new Database Consolidation Appliance • Expert Pods (6th Floor Lobby): Meet Microsoft SQL Server Engineering team members & SQL MVPs • Hands-on Labs (Room 618-620): Get experienced through self-paced & instructor-led labs on our cloud based lab platform - bring your laptop or use HP provided hardware
  • 54. October 11-14, Seattle, WA Thank you for attending this session and the 2011 PASS Summit in Seattle

Editor's notes

  1. Invitation to leave
  2. The number of ad performance factors (i.e. dimensions) and the number of ad impressions per day is huge.
     • Yahoo! branded sites attract 680 million unique visitors worldwide
     • 3.5B performance display ad impressions served on the Yahoo! exchange per day
     • Large many-to-many relationships (consumers can be a member of more than one segment)
     • Each consumer is a member of an average of 10 segments, which explodes the data by 10x
     • 161B rows per quarter for impression data
     • 203B rows per quarter for segment data (compressed, but # of rows processed is really 10x = 2 trillion)
     • Given the number of permutations, query performance needs to be speed of thought or the system is useless
     • Traditional ROLAP is too slow
     • Hundreds of dimensions, attributes and metrics create complexity
     • Need integration with good visualization tools to find relevant trends and performance improvement opportunities
     • Data needs to be fresh (from ad impression to query in less than 24 hours) or opportunities are lost
     • Display ad campaigns have very short timeframes (< 2 weeks)
  3. Same as note 2.
  4. Who pays for the sorting?
  5. Like the NYSE, the Yahoo! ad network behaves like an exchange for display advertising. Advertisers are the buyers; publishers (web sites) are the sellers, and Yahoo! is one of the publishers. Yahoo! needs to create the most efficient exchange possible.
  6. Performance display advertisers require that we can identify the target audience for a campaign and monitor how that audience behaves across a number of different dimensions.
  7. Huge opportunity for optimization but difficult given the large number of discrete dimensions
  8. Same as note 2.
  9. Key design concepts:
     • Use standard, off-the-shelf parts.
     • Loosely coupled components (using a pull architecture).
     • Centralize data aggregation on the grid using Hadoop.
     • Leverage Oracle’s external table feature to make data available to SSAS with minimal latency.
     • One-to-one match of SSAS partitions to Oracle partitions, so no aggregation is needed and partition pruning is enabled (30+ trillion rows in Oracle tables).
     • Maximize parallel loading (90+ threads loading in parallel).
     • Separate cube building from cube querying.
     Improvements in hardware and design:
     • 9h -> 2.5h: change in hardware (IBM x3560 M3, 256GB RAM, 48 cores; EMC Clariion SAN).
     • 2.5h -> 1.25h: use of Data Direct / Attunity drivers.
  10. The cube is complex due to the nature of the ad business:
     • Need to provide an “anything by anything” query environment to find the optimization opportunities. If queries aren’t fast, we lose the value.
     • Need to update the cube continuously, given that there is limited time to optimize a display ad campaign (data needs to be updated at least 4x per day).
     • Used SSAS aggregations extensively, which cut down on Hadoop aggregations dramatically.
     • Only 8 fact tables loaded (4 areas, 1 detail, 1 aggregate), as opposed to an existing ROLAP application at Yahoo! that requires 3,600 fact (aggregate) tables.
  11. Doubled the eCPM (revenue) by allowing our campaign managers to “tune” campaign targeting and creatives. Drove an increase in spend from advertisers, since they got better performance by advertising through Yahoo!
  12. Include all Yahoo! network display ads (branded display, performance display); the additional 3.5B ad impressions double the number of impressions. Increase the number of consumer segments tracked by 5x (from 50 to 256). Add unique user (distinct count) metrics for anything-by-anything queries. Load data into the cube directly from Hadoop (skip the Oracle load). Leverage SQL Server Denali Vertipaq & Crescent.
  13. Same as note 5.
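
Note 9 describes a one-to-one match between source partitions and cube partitions, loaded by 90+ parallel threads. The sketch below illustrates that idea only; it is not the loader used in the deck. The partition names, the `cube_` prefix, and the functions `cube_partition_for`, `load_partition` and `load_all` are hypothetical, and the actual read from Oracle and the SSAS processing call are replaced by a stand-in that just returns the pairing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical partition names: one source (Oracle) partition per day.
SOURCE_PARTITIONS = [f"impressions_2011_10_{day:02d}" for day in range(1, 8)]


def cube_partition_for(source_partition: str) -> str:
    """One-to-one mapping: each source partition feeds exactly one cube partition."""
    return f"cube_{source_partition}"


def load_partition(source_partition: str) -> Tuple[str, str]:
    """Stand-in for loading one cube partition from its matching source partition.

    A real loader would read only this source partition (partition pruning)
    and then process the target cube partition. Here we only return the pair.
    """
    return source_partition, cube_partition_for(source_partition)


def load_all(partitions: List[str], max_workers: int = 90) -> Dict[str, str]:
    """Load all partitions in parallel (the notes mention 90+ loading threads)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(load_partition, partitions))


if __name__ == "__main__":
    for src, dst in load_all(SOURCE_PARTITIONS).items():
        print(f"{src} -> {dst}")
```

Because each source partition maps to exactly one target, the loads are independent of one another, which is what allows them to run in parallel without a separate aggregation step.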