Real-time “OLAP” for Big Data (+ use cases)
     Cosmin Lehene | Adobe
     #bigdataro - 30 January 2013




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
What we needed … and built


      OLAP Semantics
      Low Latency Ingestion
      High Throughput
      Real-time Query API




“Physical” Building Blocks




Logical Building Blocks


      Dimensions, Metrics
      Aggregations
      Roll-up, drill-down, slicing and dicing, sorting




OLAP 101 – Queries example




      Date         Country   City      OS        Browser   Sale
      2012-05-21   USA       NY        Windows   FF         0.0
      2012-05-21   USA       NY        Windows   FF        10.0
      2012-05-22   USA       SF        OSX       Chrome    25.0
      2012-05-22   Canada    Ontario   Linux     Chrome     0.0
      2012-05-23   USA       Chicago   OSX       Safari    15.0

      Summary
            Date: 5 visits, 3 days
            Country: 2 countries (USA: 4, Canada: 1)
            City: 4 cities (NY: 2, SF: 1)
            OS: 3 OS-es (Win: 2, OSX: 2)
            Browser: 3 browsers (FF: 2, Chrome: 2)
            Sale: 50.0 total, 3 sales



OLAP 101 – Queries example

      Rolling up to country level:
  SELECT country, COUNT(visits), SUM(sales)
  GROUP BY country

            Country   visits   sales
            USA       4        $50
            Canada    1        $0

      “Slice” by browser:
  SELECT country, COUNT(visits), SUM(sales)
  WHERE browser = 'FF'
  GROUP BY country

            Country   visits   sales
            USA       2        $10
            Canada    0        $0

      Top browsers by sales:
  SELECT browser, SUM(sales), COUNT(visits)
  GROUP BY browser
  ORDER BY SUM(sales) DESC

            Browser   sales   visits
            Chrome    $25     2
            Safari    $15     1
            FF        $10     2
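
The three queries on this slide can be run against the sample rows from the previous slide. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `visits` and the column names are assumptions made for illustration and do not come from the slides. The browser filter is written as a WHERE clause because `browser` is a plain column rather than an aggregate.

```python
import sqlite3

# Sample rows from the slide: (date, country, city, os, browser, sale).
ROWS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per visit, sale is 0.0 when nothing was bought.
conn.execute(
    "CREATE TABLE visits (date TEXT, country TEXT, city TEXT,"
    " os TEXT, browser TEXT, sale REAL)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Roll up to country level.
rollup = conn.execute(
    "SELECT country, COUNT(*), SUM(sale) FROM visits"
    " GROUP BY country ORDER BY country"
).fetchall()

# Slice by browser: filter rows first, then group.
ff_slice = conn.execute(
    "SELECT country, COUNT(*), SUM(sale) FROM visits"
    " WHERE browser = 'FF' GROUP BY country"
).fetchall()

# Top browsers by sales, highest first.
top_browsers = conn.execute(
    "SELECT browser, SUM(sale) AS sales, COUNT(*) AS visits FROM visits"
    " GROUP BY browser ORDER BY sales DESC"
).fetchall()

print(rollup)
print(ff_slice)
print(top_browsers)
```

In plain SQL the slice returns no row for Canada, since no Canadian visit used FF; the slide shows that case as a row of zeros.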

OLAP – Runtime Aggregation vs. Pre-aggregation


      Aggregate at runtime
            Most flexible
            Fast (scatter-gather)
            Space efficient
      But
            I/O, CPU intensive
            Slow for larger data
            Low throughput

      Pre-aggregate
            Fast
            Efficient: O(1)
            High throughput
      But
            More effort to process (latency)
            Combinatorial explosion (space)
            No flexibility
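
The trade-off can be illustrated with a small sketch. It is not SaasBase code: the function names and the in-memory dictionary "cube" are assumptions chosen only to contrast a scan at query time with a lookup into results computed ahead of time.

```python
from collections import defaultdict
from itertools import combinations

# Three of the dimensions from the sample table (illustrative subset).
DIMENSIONS = ("country", "os", "browser")

VISITS = [
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 0.0},
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 10.0},
    {"country": "USA", "os": "OSX", "browser": "Chrome", "sale": 25.0},
    {"country": "Canada", "os": "Linux", "browser": "Chrome", "sale": 0.0},
    {"country": "USA", "os": "OSX", "browser": "Safari", "sale": 15.0},
]


def aggregate_at_runtime(rows, group_by):
    """Scan every row at query time: any grouping works, cost grows with data."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[d] for d in group_by)
        totals[key][0] += 1            # visits
        totals[key][1] += row["sale"]  # sales
    return {key: tuple(value) for key, value in totals.items()}


def pre_aggregate(rows, dimensions):
    """Materialize every grouping up front: 2**len(dimensions) groupings."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, size):
            cube[group_by] = aggregate_at_runtime(rows, group_by)
    return cube


cube = pre_aggregate(VISITS, DIMENSIONS)
by_country = cube[("country",)]  # dictionary lookup, no scan of VISITS
print(len(cube), by_country)
```

With three dimensions the cube already holds eight groupings; each extra dimension doubles that number, which is the space cost listed under "Combinatorial explosion".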




SaasBase Map




SaasBase Domain Model Mapping




SaasBase - Domain Model Mapping




SaasBase - Ingestion, Processing, Indexing, Querying








Ingestion




Ingestion (ETL): throughput vs. latency


      Historical data (large batches)
            Optimize for throughput
      Increments (latest data, smaller)
            Optimize for latency




Processing




Processing



      Processing involves reading the input (files, tables, events),
       pre-aggregating it (reducing cardinality) and generating cubes that
       can be queried in real time


      “Super Processor” code running in Storm, MapReduce, HBase




Processing for OLAP semantics

            GROUP BY (process, query)
            COUNT, SUM, AVG, etc. (process, query)
            SORT (process, query)
            HAVING (mostly query, can define pre-process constraints)
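
One way to read the "(process, query)" split is sketched below. Everything here is hypothetical (the `Cell` class and the `process`, `merge`, and `query` helpers are not SaasBase APIs): grouping, COUNT, and SUM are computed when data is processed, while AVG, HAVING, and the final sort are applied when a query runs. AVG is derived from the stored sum and count, because averages from separate batches cannot be merged directly.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Mergeable partial aggregate for one group (hypothetical structure)."""
    count: int = 0
    total: float = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    def merge(self, other):
        self.count += other.count
        self.total += other.total

    @property
    def avg(self):
        # Derived at query time from the two stored values.
        return self.total / self.count if self.count else 0.0


def process(rows):
    """Processing time: GROUP BY key, accumulating COUNT and SUM per cell."""
    cells = {}
    for key, value in rows:
        cells.setdefault(key, Cell()).add(value)
    return cells


def merge(cube, increment):
    """Fold a newly processed increment into an existing cube."""
    for key, cell in increment.items():
        cube.setdefault(key, Cell()).merge(cell)
    return cube


def query(cube, having=lambda cell: True):
    """Query time: apply HAVING, derive AVG, sort by total sales descending."""
    rows = [(k, c.count, c.total, c.avg) for k, c in cube.items() if having(c)]
    return sorted(rows, key=lambda row: row[2], reverse=True)


historical = [("USA", 0.0), ("USA", 10.0), ("USA", 25.0), ("Canada", 0.0)]
increment = [("USA", 15.0)]

cube = merge(process(historical), process(increment))
print(query(cube))
print(query(cube, having=lambda cell: cell.total > 0))
```

The same `merge` step serves both a large historical batch and a small, recent increment, so only the batch size changes between the throughput-oriented and latency-oriented paths.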




SaasBase vs. SQL Views Comparison




Query Engine

      Always reads indexed, compact data
      Query parsing
      Scan strategy
            Single vs. multiple scans
            Start/stop rows (prefixes, index positions, etc.)
            Index selection (volatile indexes with incremental processing)
      Deserialization
      Post-aggregation, sorting, fuzzy-sorting etc.
      Paging
      Custom dimension/metric class loading
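
The scan-strategy, post-aggregation, and paging steps can be sketched over a sorted in-memory list that stands in for an HBase table, whose rows are kept sorted by key. The row-key layout `<index>|<country>|<date>` and the helper names are assumptions for illustration; the slides do not describe the real key design.

```python
from bisect import bisect_left

# Hypothetical pre-aggregated index: row key -> (visits, sales), sorted by key.
INDEX = sorted([
    ("by_country|Canada|2012-05-22", (1, 0.0)),
    ("by_country|USA|2012-05-21", (2, 10.0)),
    ("by_country|USA|2012-05-22", (1, 25.0)),
    ("by_country|USA|2012-05-23", (1, 15.0)),
])
KEYS = [key for key, _ in INDEX]


def prefix_scan(prefix, offset=0, limit=None):
    """Single scan between a start row and an exclusive stop row.

    The stop row is the non-empty prefix with its last character
    incremented, so the range covers exactly the keys that start with it.
    """
    stop_row = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    start = bisect_left(KEYS, prefix)
    stop = bisect_left(KEYS, stop_row)
    rows = INDEX[start:stop]
    end = None if limit is None else offset + limit
    return rows[offset:end]  # paging over the scanned range


def post_aggregate(rows):
    """Combine the scanned cells into one (visits, sales) result."""
    visits = sum(v for _, (v, _) in rows)
    sales = sum(s for _, (_, s) in rows)
    return visits, sales


usa_rows = prefix_scan("by_country|USA|")
print(post_aggregate(usa_rows))
print(prefix_scan("by_country|USA|", offset=1, limit=1))
```

Because the scan touches only the key range for one prefix, its cost depends on the size of the indexed result, not on the size of the raw data.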




Adobe Business Catalyst

      Online business presence: e-commerce, marketing, web analytics etc.
      Use case: Web Analytics (visitors, channels, content,
       e-commerce, campaigns, etc.)




BC - Workflow




Adobe Business Catalyst - Stats

      3 active datacenters
      Raw data ~6TB (from ~1TB 18 months ago)
      Visits table: ~1TB each (compressed)
      OLAP cubes (stats): 49GB – 64GB (compressed)


      ~30 minutes latency (from actual pageview/sale to chart in UI)
      10s – 100s of milliseconds latency for queries
      ~3000/s max concurrent OLAP queries (actual traffic is much lower)




Adobe Pass for TV Everywhere

      Authentication & Authorization
      Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc.)
       with Cable operator credentials (e.g. Comcast, Dish, etc.)




Adobe Pass – Use Case

      Analytics use case: Operational metrics (users, devices, latencies, etc.)
      Real-time ingestion in HBase
      High-frequency MapReduce jobs (every 2 minutes)




Adobe Pass - Stats (London Olympics 2012)

      67M streams ~ 5.3M hours
      1.5M concurrent streams
      > 7M unique users


      1 Technical & Engineering Emmy Award ;)




Adobe Primetime – Real-time Video Analytics

      Unified video platform (acquisition, transcoding, broadcast, ads,
       analytics)




Adobe Primetime – Use Case


      Use Cases:
            Audience metrics: latency of minutes is OK
            Ads metrics: latency of seconds to minutes is OK
            Streaming QoS metrics: latency of seconds is a must


      Requirements:
            Massive throughput (millions of streams, multiple
             heartbeats every 10 seconds)
            Low latency (end-to-end)
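
The throughput requirement can be turned into a rough events-per-second figure. The sketch below is only arithmetic: the 10-second interval comes from the slide, while the stream and heartbeat counts passed in are illustrative inputs, not numbers reported for Adobe Primetime.

```python
def heartbeat_rate(concurrent_streams, heartbeats_per_interval, interval_seconds=10):
    """Events per second the ingestion path would have to absorb."""
    return concurrent_streams * heartbeats_per_interval / interval_seconds


# Illustrative inputs only.
print(heartbeat_rate(1_000_000, 1))  # one heartbeat per stream per 10 s
print(heartbeat_rate(2_000_000, 3))  # three heartbeats per stream per 10 s
```

Even the smallest case, one million streams with a single heartbeat every 10 seconds, means 100,000 events per second before any aggregation.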


Conclusions

      OLAP semantics on a simple data model
            Data as a first-class citizen
            Domain Specific “Language” for Dimensions, Metrics, Aggregations
      Framework for vertical analytics systems
      Tunable performance, resource allocation




Thank you!
                                                            Cosmin Lehene @clehene

                                                            http://hstack.org



Related

  http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
  http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012




CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Real-time OLAP Big Data Use Cases

  • 1. Real-time “OLAP” for Big Data (+ use cases) Cosmin Lehene | Adobe #bigdataro - 30 January 2013 © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
  • 2. What we needed … and built
      - OLAP Semantics
      - Low Latency Ingestion
      - High Throughput
      - Real-time Query API
  • 3. “Physical” Building Blocks
  • 4. Logical Building Blocks
      - Dimensions, Metrics
      - Aggregations
      - Roll-up, drill-down, slicing and dicing, sorting
  • 5. OLAP 101 – Queries example
      Date        Country  City     OS       Browser  Sale
      2012-05-21  USA      NY       Windows  FF        0.0
      2012-05-21  USA      NY       Windows  FF       10.0
      2012-05-22  USA      SF       OSX      Chrome   25.0
      2012-05-22  Canada   Ontario  Linux    Chrome    0.0
      2012-05-23  USA      Chicago  OSX      Safari   15.0
      Summary: 5 visits, 3 days | 2 countries (USA: 4, Canada: 1) | 4 cities (NY: 2, SF: 1) | 3 OS-es (Win: 2, OSX: 2) | 3 browsers (FF: 2, Chrome: 2) | 50.0, 3 sales
  • 6. OLAP 101 – Queries example
      - Rolling up to country level:
          SELECT COUNT(visits), SUM(sales) GROUP BY country
          Country  visits  sales
          USA      4       $50
          Canada   1       0
      - “Slice” by browser:
          SELECT COUNT(visits), SUM(sales) GROUP BY country HAVING browser = “FF”
          Country  visits  sales
          USA      2       $10
          Canada   0       0
      - Top browsers by sales:
          SELECT SUM(sales), COUNT(visits) GROUP BY browser ORDER BY sales
          Browser  sales  visits
          Chrome   $25    2
          Safari   $15    1
          FF       $10    2
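The three queries on slide 6 can be reproduced over the five sample visits from slide 5. The following is only a minimal sketch in plain Python, not SaasBase code: the `aggregate` helper and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# The five sample visits from slide 5: (date, country, city, os, browser, sale).
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]
COLUMNS = ("date", "country", "city", "os", "browser", "sale")


def aggregate(rows, group_by, keep=lambda row: True):
    """Hypothetical helper: group by one column, return {key: [visits, sales]}."""
    idx = COLUMNS.index(group_by)
    out = defaultdict(lambda: [0, 0.0])
    for row in rows:
        if keep(row):
            out[row[idx]][0] += 1        # COUNT(visits)
            out[row[idx]][1] += row[-1]  # SUM(sales)
    return dict(out)


# Roll up to country level.
rollup = aggregate(VISITS, "country")

# "Slice" by browser: only Firefox visits, still grouped by country.
ff_slice = aggregate(VISITS, "country", keep=lambda r: r[4] == "FF")

# Top browsers by sales, highest first.
top_browsers = sorted(
    aggregate(VISITS, "browser").items(),
    key=lambda kv: kv[1][1],
    reverse=True,
)
```

One difference from the slide: a country with no matching visits (Canada in the Firefox slice) is simply absent from the result here, whereas the slide shows it as a row of zeros.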
  • 7. OLAP – Runtime Aggregation vs. Pre-aggregation
      - Aggregate at runtime
          - Most flexible
          - Fast – scatter gather
          - Space efficient
          - But: I/O, CPU intensive; slow for larger data; low throughput
      - Pre-aggregate
          - Fast
          - Efficient – O(1)
          - High throughput
          - But: more effort to process (latency); combinatorial explosion (space); no flexibility
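The trade-off on slide 7 can be made concrete with a small sketch. It assumes a toy in-memory model (the names `runtime_sum`, `build_cube` and `cube_sum` are hypothetical, and nothing here reflects how SaasBase stores data): the runtime path scans every row per query, while the pre-aggregated path materializes one total per combination of dimension values and answers with a single lookup.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("country", "os", "browser")
# Sample rows: (country, os, browser, sale), taken from the slide 5 table.
ROWS = [
    ("USA", "Windows", "FF", 0.0),
    ("USA", "Windows", "FF", 10.0),
    ("USA", "OSX", "Chrome", 25.0),
    ("Canada", "Linux", "Chrome", 0.0),
    ("USA", "OSX", "Safari", 15.0),
]


def runtime_sum(rows, **filters):
    """Aggregate at query time: flexible, but scans every row on each query."""
    return sum(
        r[-1] for r in rows
        if all(r[DIMS.index(d)] == v for d, v in filters.items())
    )


def build_cube(rows):
    """Pre-aggregate: keep a total for every subset of dimensions.

    With n dimensions there are 2**n subsets, which is where the
    combinatorial explosion in space comes from.
    """
    cube = defaultdict(float)
    for r in rows:
        for k in range(len(DIMS) + 1):
            for subset in combinations(range(len(DIMS)), k):
                key = tuple((DIMS[i], r[i]) for i in subset)
                cube[key] += r[-1]
    return dict(cube)


CUBE = build_cube(ROWS)


def cube_sum(**filters):
    """Answer the same query with one dictionary lookup."""
    key = tuple((d, filters[d]) for d in DIMS if d in filters)
    return CUBE.get(key, 0.0)
```

Both paths return the same totals, for example `runtime_sum(ROWS, country="USA")` and `cube_sum(country="USA")`. The cube only answers the equality queries it was built for, which is the "no flexibility" cost listed on the slide.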
  • 8. SaasBase Map
  • 9. SaasBase Domain Model Mapping
  • 10. SaasBase - Domain Model Mapping
  • 11. SaasBase - Ingestion, Processing, Indexing, Querying
  • 12. SaasBase - Ingestion, Processing, Indexing, Querying
  • 13. Ingestion
  • 14. Ingestion (ETL): throughput vs. latency
      - Historical data (large batches): optimize for throughput
      - Increments (latest data, smaller): optimize for latency
  • 15. Processing
  • 16. Processing
      - Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating cubes that can be queried in real time
      - “Super Processor” code running in Storm, Map-Reduce, HBase
  • 17. Processing for OLAP semantics
      - GROUP BY (process, query)
      - COUNT, SUM, AVG, etc. (process, query)
      - SORT (process, query)
      - HAVING (mostly query, can define pre-process constraints)
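The split between "process" and "query" on slides 16 and 17 can be sketched as two stages. This is an illustration under assumptions, not the SaasBase "Super Processor": the `process` and `query` functions and the cell layout are hypothetical. The process stage folds increments of events into a cube keyed by (country, browser); the query stage does the extra roll-up, the HAVING-style filter and the sort on that much smaller cube. Each cell stores COUNT and SUM so that AVG can be derived at query time after cells are merged.

```python
from collections import defaultdict


def process(cube, events):
    """Process stage: fold an increment of (country, browser, sale) events into the cube."""
    for country, browser, sale in events:
        cell = cube[(country, browser)]
        cell[0] += 1      # COUNT
        cell[1] += sale   # SUM
    return cube


def query(cube, having=lambda count, total: True):
    """Query stage: roll cells up to country, apply the filter, sort by total sales."""
    by_country = defaultdict(lambda: [0, 0.0])
    for (country, _browser), (count, total) in cube.items():
        by_country[country][0] += count
        by_country[country][1] += total
    rows = [
        (country, count, total, total / count)  # AVG derived from SUM and COUNT
        for country, (count, total) in by_country.items()
        if having(count, total)
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)


cube = defaultdict(lambda: [0, 0.0])
# Historical batch, then a later and smaller increment.
process(cube, [("USA", "FF", 0.0), ("USA", "FF", 10.0), ("USA", "Chrome", 25.0)])
process(cube, [("Canada", "Chrome", 0.0), ("USA", "Safari", 15.0)])

report = query(cube)
with_sales = query(cube, having=lambda count, total: total > 0)
```

Storing SUM and COUNT rather than a precomputed AVG is a deliberate choice in this sketch: averages of averages are not correct in general, while sums and counts can be merged across increments and across cells.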
  • 18. SaasBase vs. SQL Views Comparison
  • 19. Query Engine
      - Always reads indexed, compact data
      - Query parsing
      - Scan strategy
          - Single vs. multiple scans
          - Start/stop rows (prefixes, index positions, etc.)
          - Index selection (volatile indexes with incremental processing)
      - Deserialization
      - Post-aggregation, sorting, fuzzy-sorting etc.
      - Paging
      - Custom dimension/metric class loading
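The "start/stop rows" idea on slide 19 can be illustrated without any HBase client. The sketch below simulates a sorted key space with a Python list and `bisect`; the `country|date|browser` key layout and the `scan`, `prefix_scan` and `page` names are assumptions made for this example, not the SaasBase index format.

```python
from bisect import bisect_left

# Hypothetical index: sorted (row_key, value) pairs, keys are "country|date|browser".
INDEX = sorted([
    ("Canada|2012-05-22|Chrome", 0.0),
    ("USA|2012-05-21|FF", 10.0),
    ("USA|2012-05-22|Chrome", 25.0),
    ("USA|2012-05-23|Safari", 15.0),
])
KEYS = [key for key, _ in INDEX]


def scan(start_row, stop_row):
    """Return entries with start_row <= key < stop_row, skipping all other rows."""
    lo = bisect_left(KEYS, start_row)
    hi = bisect_left(KEYS, stop_row)
    return INDEX[lo:hi]


def prefix_scan(prefix):
    """Scan all keys sharing a non-empty prefix.

    The stop row is the prefix with its last character incremented,
    i.e. the smallest string that sorts after every key with that prefix.
    """
    stop_row = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return scan(prefix, stop_row)


def page(rows, offset, limit):
    """Paging over an already sorted result."""
    return rows[offset:offset + limit]


usa_rows = prefix_scan("USA|")
usa_total = sum(value for _, value in usa_rows)   # post-aggregation
one_day = prefix_scan("USA|2012-05-22")
first_page = page(usa_rows, 0, 2)
```

Because the keys are sorted, a query for one country or one country and date touches only a contiguous range; choosing the key order per index is what decides which queries get this benefit.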
  • 20. Adobe Business Catalyst
      - Online business presence: e-commerce, marketing, web analytics etc.
      - Use case: Web Analytics (visitors, channels, content, e-commerce, campaigns, etc.)
  • 21. BC - Workflow
  • 22. Adobe Business Catalyst - Stats
      - 3 active datacenters
      - Raw data ~6TB (from ~1TB 18 months ago)
      - Visits table: ~1TB each (compressed)
      - OLAP cubes (stats): 49GB – 64GB (compressed)
      - ~30 minutes latency (from actual pageview/sale to chart in UI)
      - 10s – 100s of milliseconds latency for queries
      - ~3000/s max concurrent OLAP queries (actual traffic is much lower)
  • 23. Adobe Pass for TV Everywhere
      - Authentication & Authorization
      - Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc.) with Cable operator credentials (e.g. Comcast, Dish, etc.)
  • 24. Adobe Pass – Use Case
      - Analytics use case: Operational metrics (users, devices, latencies, etc.)
      - Real-time ingestion in HBase
      - High Frequency Map Reduce jobs (every 2 minutes)
  • 25. Adobe Pass - Stats (London Olympics 2012)
      - 67M streams ~ 5.3M hours
      - 1.5M concurrent streams
      - > 7M unique users
      - 1 Technical & Engineering Emmy Award ;)
  • 26. Adobe Primetime – Real-time Video Analytics
      - Unified video platform (acquisition, transcoding, broadcast, ads, analytics)
  • 27. Adobe Primetime – Use Case
      - Use Cases:
          - Audience metrics – minutes latency ok
          - Ads metrics – seconds to minutes ok
          - Streaming QoS metrics – seconds must
      - Requirements:
          - Massive throughput (millions of streams, multiple heartbeats every 10 seconds)
          - Low latency (end-to-end)
  • 29. Conclusions
      - OLAP semantics on a simple data model
      - Data as first class citizen
      - Domain Specific “Language” for Dimensions, Metrics, Aggregations
      - Framework for vertical analytics systems
      - Tunable performance, resource allocation
  • 30. Thank you! Cosmin Lehene @clehene http://hstack.org
  • 31. Related
      - http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
      - http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012

Editor's Notes

  1. How many HBase users?
  2. Data as first class citizen
  3. Add the real building blocks: HDFS, MapReduce, HBase, Storm
  4. Add the real building blocks: HDFS, MapReduce, HBase, Storm
  5. Check contrast on projector
  6. Two approaches: RDBMS / OLAP
  7. Dimensions – read/transform/serialize/deserialize data attributes. Metrics – read/transform/aggregate/serialize. Constraints: ingestion filtering. Report: instrument dimension groups + metrics with aggregations, sorting.
  8. Query engine -> index (always real-time). What’s the difference between this and Hive/Pig/Impala?
  9. Process = aggregate, generate indexes (natural). Query = uses indexes, can do extra aggregation.
  10. Left: report definition, NOT a query. Like a view: created, then queried.
  11. >100K/sec/thread. Real-time.
  12. ~12 hours to reprocess everything from scratch
  13. 2 datacenters (active-failover) on US West and East coasts (2NN + 19DN, 0.5PB total, 456 cores, 1.1TB RAM)
  14. Meeting Notes (1/29/13 18:09): Olympics. Same SaasBase codebase running in Storm instead of Hadoop. Simpler aggregations, but strict latency requirements.
  15. Meeting Notes (1/29/13 18:12): draw line between player and chart
  16. Data analysts work with familiar concepts. Meeting Notes (1/29/13 18:12): Future:
  17. …….