SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Real Time Analytics
at Uber: Bring SQL
into Everything
Zhenxiao Luo
NYC
Uber’s mission is to
ignite opportunity by
setting the world in
motion.
15M
Trips/Day
600+
Cities
75M
Monthly Riders
Data informs every decision at the company
Overview of Uber’s Data Platform
DATA SOURCES
RAW DATA
MODELED TABLES
MINING BUSINESS
INSIGHTS
CONSUMING BUSINESS INSIGHTS
EXPERIMENTATION
DATA SCIENCE
MACHINE
LEARNING
CUSTOM DATA SETS
Dashboarding
Alerting
Monitoring
Data Exploration
Knowledge Bases
Storage
Infrastructure
ETL Frameworks
Data Integrity
Query Engines
Kafka
Uber Data Infrastructure
Schemaless
MySQL,
Postgres
Vertica
Streamio
Raw
Data
Raw
Tables
Sqoop
Reports
Hadoop
Hive Presto Spark
Notebook Ad Hoc Queries
Real Time
Applications
Machine
Learning Jobs
Business
Intelligence Jobs
Cluster
Management
All-Active
Observability
Security
Vertica
Samza
Pinot
Flink
AresDB
Modeled
Tables
Streaming
Warehouse
Real-time
Presto @ Uber-scale
5KWeekly Active Users
160KQueries/day
3Data Centers
2KNodes
700MHDFS files read/day
10PBHDFS files
processed/day
Presto use cases at Uber
Growth Marketing
Data Science
Marketplace
Pricing
Community
Operations
Data Quality
Ad-hoc Querying
The people who rely on us
Technical
Skills
Data Scientists
Software Engineers
ML/AI Researchers
Advanced SQL
Advanced Statistics
Scala/Spark, Python/R
Data Modeling
Inventor Ivan
Marketing Managers
Entry-level Analysts
General Managers
Product Managers
Limited SQL
Spreadsheets
Reliant Rebecca
City Operations
Regional Managers
Intermediate SQL
Spreadsheets
Dashboarding
Monitoring Matt
Operations Managers
Data Analysts
Product Analysts
Advanced SQL
Spreadsheets
Limited Statistics
Limited Python/R
Analyst Anna
Exploratory ML &
model-training
Data Scientists ML ResearchersEngineers
Using ML to ensure data
security and compliance
Advanced data
science &
complex analytics
Data Scientists Ops Analysts Support Agents
Surfacing hidden insights
to empower restaurants
Business process
automation
S&P AnalystsOps Managers Contractors
Using technology to make
transportation safer
What is Presto: Interactive SQL
Engine for Big Data
Interactive query speeds
Horizontally scalable
ANSI SQL
Battle-tested by Facebook, Uber, Linkedin, Twitter, Netflix, Airbnb, etc
Completely open source
Access to petabytes of data in the Hadoop, Elasticsearch, Pinot, etc.
How Presto Works
Why Presto is Fast
● Data in memory during execution
● Pipelining and streaming
● Columnar storage & execution
● Bytecode generation
Resource Management
● Presto has its own resource manager
○ Not on YARN
○ Not on Mesos
● CPU Management
○ Priority queues
○ Short running queries higher priority
● Memory Management
○ Max memory per query per node
○ If query exceeds max memory limit, query fails
○ No OutOfMemory in Presto process
Presto Connectors:
No Need to Copy Data
Uber Contributions
Contributions
New Features
● Geospatial indexing and operations - 10x or more speedup
● Pinot connector enhancements (in-house)
Optimizations
● Elasticsearch connector
● New Parquet reader - 4x speedup
● Nested column pushdowns (project, predicate) - 10x speedup
Security
● Metastore authentication support for Kerberos deployments
● Dispatch Proxy using HTTP redirect for multi-cluster operation
Presto Connector Interface
● ConnectorMetadata
○ Schema, Table, Column
● ConnectorSplitManager
○ Divide data into splits
● ConnectorSplit
○ Split data range
○ Predicate/JsonFunction/Limit pushdown
● ConnectorRecordCursor
○ Transform underlying storage data into Presto internal
page/block
Presto Elasticsearch Connector
Data Model
● each Elasticsearch index is a table partition
● each field of an index is a column
● all Elasticsearch indexes sharing the same prefix
consist a logical table
○ Es-vehicles-sjc1, es-vehicles-dca1, es-vehicles
Describe Table
Query
Optimizations
● Parallel Reads
○ Get all indices and search nodes
○ For each search node, send request for one specific index
● Cap Max Hits
● Predicate Pushdown
● Json Function Pushdown
● Limit Pushdown
● Nested Fields
How many Uber trip
requests did we serve
in Chicago yesterday?
Fetch daily trip count in seconds
SELECT T.base.city_id AS cid,
Count(CASE WHEN T.base.status = 'completed' THEN 1 END) AS
completed_trips,
Count(CASE WHEN T.base.status = 'canceled' THEN 1 END) AS
rider_canceled_trips
FROM trips AS T
WHERE T.datestr = '2019-03-11'
GROUP BY 1
Column Chunk
base.client_uuid
Column Chunk
base.driver_uuid
Column Chunk
base.status
Column Chunk
base.vehicle_id
Column Chunk
base.city_idRow Group
Column Chunk
base.client_uuid
Column Chunk
base.driver_uuid
Column Chunk
base.city_id
Column Chunk
base.vehicle_id
Column Chunk
base.statusRow Group
Parquet
Parquet Footer: File Metadata, Row Group Metadata
Step 1: Read all Parquet nested fields from disk
base.driver_uuid base.client_uuid base.city_id …... base.vehicle_id base.status
base.driver_uuid
base.driver_uuid
base.driver_uuid
base.driver_uuid base.client_uuid base.city_id …... base.vehicle_id base.status
Presto Columnar Engine
Step 2: Transform Parquet rows into Presto columnar blocks
Step 3: Evaluate predicates on columnar blocks
base.client_uuid
base.client_uuid
base.client_uuid
base.city_id
base.city_id
base.city_id
base.vehicle_id
base.vehicle_id
base.vehicle_id
base.status
base.status
base.status
….
Default Apache Parquet Reader
Column Chunk
base.client_uuid
Column Chunk
base.driver_uuid
Column Chunk
base.status
Column Chunk
base.vehicle_id
Column Chunk
base.city_id
Row Group
Column Chunk
base.client_uuid
Column Chunk
base.driver_uuid
Column Chunk
base.city_id
Column Chunk
base.vehicle_id
Column Chunk
base.status
Row Group
Parquet Footer: File Metadata, Row Group Metadata
Step 1: Read ONLY Required nested fields from disk
Presto Columnar Engine
Apache Parquet Reader Optimization
base.driver_uuid
base.driver_uuid
base.driver_uuid
base.city_id
base.city_id
base.city_id
Step 1: Read ONLY Required nested fields from disk
Evaluate predicates on the fly:
Skip reading row group;
predicate: base.city_id = 12
dictionary: base.city_id: {3,
5, 9, 14, 21}
Build columnar blocks only
for predicate matches
Step 2. Build columnar blocks on the fly
base.driver_uuid
base.driver_uuid
base.driver_uuid
Step 3: Evaluate predicates on columnar blocks
Parquet
Results
Looking forward
Federated SQL Layer
Vision
HDFS
VerticaElasticsearch
Apache
Pinot
MySQL
Machines
Reports
Users
Presto RealTime Presto
Proxy layer
Management
Universal
Metadata
Service
Focus areas
Connectors
● Apache Hive, Apache Pinot, Elasticsearch, Apache Cassandra, Vertica, MySQL, etc
● Aggregation / Join pushdown
● Cross-connector optimizations (hybrid connectors)
Real-time
● Real-time mode with low latency pass through
● Query plan / result / data cache
● Time-series joins and stitching
Universal Metadata Service (UMS)
● Logical definitions / physical schemas
● Column stitching and joins
● Table and partition caching
Thank you
Proprietary © 2018 Uber Technologies, Inc. All rights reserved. No part of this
document may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any
information storage or retrieval systems, without permission in writing from
Uber. This document is intended only for the use of the individual or entity to
whom it is addressed. All recipients of this document are notified that the
information contained herein includes proprietary information of Uber, and
recipient may not make use of, disseminate, or in any way disclose this
document or any of the enclosed information to any person other than
employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.

Contenu connexe

Tendances

Cloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflakeCloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflake
SANG WON PARK
 
AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나
AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나
AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나
Amazon Web Services Korea
 

Tendances (20)

Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영[215]네이버콘텐츠통계서비스소개 김기영
[215]네이버콘텐츠통계서비스소개 김기영
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advance
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
 
엘라스틱 서치 세미나
엘라스틱 서치 세미나엘라스틱 서치 세미나
엘라스틱 서치 세미나
 
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유 (2부)
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유 (2부)[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유 (2부)
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유 (2부)
 
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
 
BigQuery의 모든 것(기획자, 마케터, 신입 데이터 분석가를 위한) 입문편
BigQuery의 모든 것(기획자, 마케터, 신입 데이터 분석가를 위한) 입문편BigQuery의 모든 것(기획자, 마케터, 신입 데이터 분석가를 위한) 입문편
BigQuery의 모든 것(기획자, 마케터, 신입 데이터 분석가를 위한) 입문편
 
Cloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflakeCloud DW technology trends and considerations for enterprises to apply snowflake
Cloud DW technology trends and considerations for enterprises to apply snowflake
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
로그 기깔나게 잘 디자인하는 법
로그 기깔나게 잘 디자인하는 법로그 기깔나게 잘 디자인하는 법
로그 기깔나게 잘 디자인하는 법
 
Introduction To Kibana
Introduction To KibanaIntroduction To Kibana
Introduction To Kibana
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 
AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나
AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나
AWS 기반 데이터 레이크(Datalake) 구축 및 분석 - 김민성 (AWS 솔루션즈아키텍트) : 8월 온라인 세미나
 

Similaire à Real time analytics at uber @ strata data 2019

ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch Council
Sunita Shrivastava
 

Similaire à Real time analytics at uber @ strata data 2019 (20)

Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019Real time analytics on deep learning @ strata data 2019
Real time analytics on deep learning @ strata data 2019
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Presto
PrestoPresto
Presto
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
ALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch CouncilALM Search Presentation for the VSS Arch Council
ALM Search Presentation for the VSS Arch Council
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 
The Roadmap for SQL Server 2019
The Roadmap for SQL Server 2019The Roadmap for SQL Server 2019
The Roadmap for SQL Server 2019
 
Big Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI MobileBig Data Analytics from Azure Cloud to Power BI Mobile
Big Data Analytics from Azure Cloud to Power BI Mobile
 
EDB Postgres in DBaaS & Container Platforms
EDB Postgres in DBaaS & Container PlatformsEDB Postgres in DBaaS & Container Platforms
EDB Postgres in DBaaS & Container Platforms
 
Neo4j Database and Graph Platform Overview
Neo4j Database and Graph Platform OverviewNeo4j Database and Graph Platform Overview
Neo4j Database and Graph Platform Overview
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 

Plus de Zhenxiao Luo (9)

Presto Elasticsearch Connector at Presto Summit
Presto Elasticsearch Connector at Presto SummitPresto Elasticsearch Connector at Presto Summit
Presto Elasticsearch Connector at Presto Summit
 
Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks Summit
 
Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017Presto GeoSpatial @ Strata New York 2017
Presto GeoSpatial @ Strata New York 2017
 
Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017Presto @ Uber Hadoop summit2017
Presto @ Uber Hadoop summit2017
 
Presto Apache BigData 2017
Presto Apache BigData 2017Presto Apache BigData 2017
Presto Apache BigData 2017
 
Presto@Uber
Presto@UberPresto@Uber
Presto@Uber
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Netflix running Presto in the AWS Cloud
Netflix running Presto in the AWS CloudNetflix running Presto in the AWS Cloud
Netflix running Presto in the AWS Cloud
 

Dernier

Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
imonikaupta
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
soniya singh
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
ellan12
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Sheetaleventcompany
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
soniya singh
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
SofiyaSharma5
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
soniya singh
 

Dernier (20)

Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
 

Real time analytics at uber @ strata data 2019

  • 1. Real Time Analytics at Uber: Bring SQL into Everything Zhenxiao Luo
  • 2. NYC Uber’s mission is to ignite opportunity by setting the world in motion. 15M Trips/Day 600+ Cities 75M Monthly Riders
  • 3. Data informs every decision at the company
  • 4. Overview of Uber’s Data Platform DATA SOURCES RAW DATA MODELED TABLES MINING BUSINESS INSIGHTS CONSUMING BUSINESS INSIGHTS EXPERIMENTATION DATA SCIENCE MACHINE LEARNING CUSTOM DATA SETS Dashboarding Alerting Monitoring Data Exploration Knowledge Bases Storage Infrastructure ETL Frameworks Data Integrity Query Engines
  • 5. Kafka Uber Data Infrastructure Schemaless MySQL, Postgres Vertica Streamio Raw Data Raw Tables Sqoop Reports Hadoop Hive Presto Spark Notebook Ad Hoc Queries Real Time Applications Machine Learning Jobs Business Intelligence Jobs Cluster Management All-Active Observability Security Vertica Samza Pinot Flink AresDB Modeled Tables Streaming Warehouse Real-time
  • 6. Presto @ Uber-scale 5KWeekly Active Users 160KQueries/day 3Data Centers 2KNodes 700MHDFS files read/day 10PBHDFS files processed/day
  • 7. Presto use cases at Uber Growth Marketing Data Science Marketplace Pricing Community Operations Data Quality Ad-hoc Querying
  • 8. The people who rely on us Technical Skills Data Scientists Software Engineers ML/AI Researchers Advanced SQL Advanced Statistics Scala/Spark, Python/R Data Modeling Inventor Ivan Marketing Managers Entry-level Analysts General Managers Product Managers Limited SQL Spreadsheets Reliant Rebecca City Operations Regional Managers Intermediate SQL Spreadsheets Dashboarding Monitoring Matt Operations Managers Data Analysts Product Analysts Advanced SQL Spreadsheets Limited Statistics Limited Python/R Analyst Anna
  • 9. Exploratory ML & model-training Data Scientists ML ResearchersEngineers Using ML to ensure data security and compliance
  • 10. Advanced data science & complex analytics Data Scientists Ops Analysts Support Agents Surfacing hidden insights to empower restaurants
  • 11. Business process automation S&P AnalystsOps Managers Contractors Using technology to make transportation safer
  • 12. What is Presto: Interactive SQL Engine for Big Data Interactive query speeds Horizontally scalable ANSI SQL Battle-tested by Facebook, Uber, Linkedin, Twitter, Netflix, Airbnb, etc Completely open source Access to petabytes of data in the Hadoop, Elasticsearch, Pinot, etc.
  • 14. Why Presto is Fast ● Data in memory during execution ● Pipelining and streaming ● Columnar storage & execution ● Bytecode generation
  • 15. Resource Management ● Presto has its own resource manager ○ Not on YARN ○ Not on Mesos ● CPU Management ○ Priority queues ○ Short running queries higher priority ● Memory Management ○ Max memory per query per node ○ If query exceeds max memory limit, query fails ○ No OutOfMemory in Presto process
  • 18. Contributions New Features ● Geospatial indexing and operations - 10x or more speedup ● Pinot connector enhancements (in-house) Optimizations ● Elasticsearch connector ● New Parquet reader - 4x speedup ● Nested column pushdowns (project, predicate) - 10x speedup Security ● Metastore authentication support for Kerberos deployments ● Dispatch Proxy using HTTP redirect for multi-cluster operation
  • 19. Presto Connector Interface ● ConnectorMetadata ○ Schema, Table, Column ● ConnectorSplitManager ○ Divide data into splits ● ConnectorSplit ○ Split data range ○ Predicate/JsonFunction/Limit pushdown ● ConnectorRecordCursor ○ Transform underlying storage data into Presto internal page/block
  • 21. Data Model ● each Elasticsearch index is a table partition ● each field of an index is a column ● all Elasticsearch indexes sharing the same prefix consist a logical table ○ Es-vehicles-sjc1, es-vehicles-dca1, es-vehicles
  • 23. Query
  • 24. Optimizations ● Parallel Reads ○ Get all indices and search nodes ○ For each search node, send request for one specific index ● Cap Max Hits ● Predicate Pushdown ● Json Function Pushdown ● Limit Pushdown ● Nested Fields
  • 25. How many Uber trip requests did we serve in Chicago yesterday?
  • 26. Fetch daily trip count in seconds SELECT T.base.city_id AS cid, Count(CASE WHEN T.base.status = 'completed' THEN 1 END) AS completed_trips, Count(CASE WHEN T.base.status = 'canceled' THEN 1 END) AS rider_canceled_trips FROM trips AS T WHERE T.datestr = '2019-03-11' GROUP BY 1
  • 27. Column Chunk base.client_uuid Column Chunk base.driver_uuid Column Chunk base.status Column Chunk base.vehicle_id Column Chunk base.city_idRow Group Column Chunk base.client_uuid Column Chunk base.driver_uuid Column Chunk base.city_id Column Chunk base.vehicle_id Column Chunk base.statusRow Group Parquet Parquet Footer: File Metadata, Row Group Metadata Step 1: Read all Parquet nested fields from disk base.driver_uuid base.client_uuid base.city_id …... base.vehicle_id base.status base.driver_uuid base.driver_uuid base.driver_uuid base.driver_uuid base.client_uuid base.city_id …... base.vehicle_id base.status Presto Columnar Engine Step 2: Transform Parquet rows into Presto columnar blocks Step 3: Evaluate predicates on columnar blocks base.client_uuid base.client_uuid base.client_uuid base.city_id base.city_id base.city_id base.vehicle_id base.vehicle_id base.vehicle_id base.status base.status base.status …. Default Apache Parquet Reader
  • 28. Column Chunk base.client_uuid Column Chunk base.driver_uuid Column Chunk base.status Column Chunk base.vehicle_id Column Chunk base.city_id Row Group Column Chunk base.client_uuid Column Chunk base.driver_uuid Column Chunk base.city_id Column Chunk base.vehicle_id Column Chunk base.status Row Group Parquet Footer: File Metadata, Row Group Metadata Step 1: Read ONLY Required nested fields from disk Presto Columnar Engine Apache Parquet Reader Optimization base.driver_uuid base.driver_uuid base.driver_uuid base.city_id base.city_id base.city_id Step 1: Read ONLY Required nested fields from disk Evaluate predicates on the fly: Skip reading row group; predicate: base.city_id = 12 dictionary: base.city_id: {3, 5, 9, 14, 21} Build columnar blocks only for predicate matches Step 2. Build columnar blocks on the fly base.driver_uuid base.driver_uuid base.driver_uuid Step 3: Evaluate predicates on columnar blocks Parquet
  • 31. Federated SQL Layer Vision HDFS VerticaElasticsearch Apache Pinot MySQL Machines Reports Users Presto RealTime Presto Proxy layer Management Universal Metadata Service
  • 32. Focus areas Connectors ● Apache Hive, Apache Pinot, Elasticsearch, Apache Cassandra, Vertica, MySQL, etc ● Aggregation / Join pushdown ● Cross-connector optimizations (hybrid connectors) Real-time ● Real-time mode with low latency pass through ● Query plan / result / data cache ● Time-series joins and stitching Universal Metadata Service (UMS) ● Logical definitions / physical schemas ● Column stitching and joins ● Table and partition caching
  • 33. Thank you Proprietary © 2018 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed. All recipients of this document are notified that the information contained herein includes proprietary information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.