SlideShare une entreprise Scribd logo
1  sur  20
Enterprise Data Platform and role
of Apache Spark
Seshu Adunuthula (@SeshuAd)
78%
of GMB is
fixed price
42%
GMV VIA MOBILE
291M
MOBILE APP DOWNLOADS
GLOBALLY
1.3B
LISTINGS CREATED
VIA MOBILE EVERY WEEK
162M
ACTIVE BUYERS
25M
ACTIVE SELLERS
800M
ACTIVE LISTINGS
8.8M
NEW LISTINGS
EVERY WEEK
Q4 2015 Q4 2015
eBay Marketplace
*Q3 2015 data
Items sold are
new
79%
Items are fixed
price
84%
Evolving from our Auction Roots..
Transactions offer
free shipping
63%
Data is eBay’s
Most Important
Asset
Data Sciences
Personalization
User propensity modeling based on 5 quarters
behavioral data. Cluster based unstructured data.
User (Badges, Activity Synopsis, Word Cloud),
Deals Personalization
Merchandising
Similar items - recommending similar items on
key placements on site and mobile. Powered
among other things by co-clicked items.
Structured Data
Deal discovery leverages machine learning
model.
Trust
Predictive machine learning models for fraud
prevention, account take-over, prevent loss, bad
buying experience prevention, buyer/seller risk
prediction
Shipping
Delivery experience: Building a model to predict
more accurate and shorter delivery estimates.
BI & Analytical Apps
Search Backend
950M items/ ~15TB indexes in 2.5 hours on a
daily basis.
2M item/11TB index subsets generated near
real-time in 3 minutes
Search Sciences
Search ranking/spam/recall factors or data (like
phrase table, query-rewrite table, etc.)
preparation, including many pipelines built on
top of search Scala platform
s
Structured Data
Convert 800M Item listings into Product pages
that are automatically curated and persisted
Traffic – Paid Search
10% efficiency lift for paid search (Amber model)
Identify low performing items (Google search)
Data Preparation.
Bot detection, data transformation, sessionization
for user behavior and core data sets
Experimentation
A/B Testing for new features and user
Experiences released on the site.
Behavioral Data
Search
SPD: Provides global B2C and C2C seller
performance overview
Nous, DNA: Provide self service product
experiences analysis on product health,
behavior analytics and product domain specific
reporting
System Services
Sherlock Monitoring: Applications logs (CAL
logs) are stored and processed on Hadoop.
Data is eBay’s DNA
Enterprise Data Platform
ENTERPRISE DATA PLATFORM
9
Agile Data Warehouse Data Streams
Batch
Humans
Sets of data
Streams
Systems
Sets of data
Data Services
Services
Applications
Specific calls
Populated
Used by
How
Enterprise
Populated
Used by
How
Enterprise Data Platform
Structured Data
SQL
Interactive Relational Analysis
Semi Structured Data
Java/Scala…
Batch and Ad-Hoc Algorithmic Analysis
Relational Analysis
Programmatic Exploration
EDW Hadoop
10
Analysts/BU PM/Executives/Tools Analysts/Scientists/Tools Scientists/Engineers
Agile Data Warehouse
11
Simplify Access to DW
Cross Platform VDMs
Geo Distributed Caches
Data Virtualization
Apache Kylin: Extreme OLAP Engine
12
Cube Build
Engine
SQL
Low Latency -
Seconds
Mid Latency -
Minutes
Routing
3rd Party App
(Web App, Mobile…)
Metadata
SQL-Based Tool
(BI Tools: Tableau…)
Query Engine
Hadoop
Hive
REST API JDBC/ODBC
Star Schema Data Key Value Data
Data
Cube
OLAP
Cube
(HBase)
SQL
REST Server
MOLAP Cubes
ANSI SQL on Hadoop
Interactive Query on Billions
of Rows
Apache Kylin: Extreme OLAP Engine
Data Streams Platform
Apache Kafka
Behavior TXN User
Streaming
Apps
Sand
box
Stream Processing ClustersOpen
Sample
Streams Needs
Approval
Staging Pool1 Pool2
Rheos App
Manager
Configuration
Deployment
GitHub
Tor
a
ETL
Connectors
Hadoop Teradat
a
Pool3
Risk Real-time Representation
Of EDW Datasets
Centralized Shared Data
Streams
DW populated using the
Streams Platform
Data Streams Platform
Q3 2015
DataSearch&Discovery
CollaborativeAnalytics
DataGovernance
Data ServicesData Services
•Execution environment tailored for
Spark
•Governance of Big Data Apps with
a PaaS layer
•Application life cycle spanning
development to deployment
Spades – Spark Provisioning & Deployment Environment
Data Pipelines
Ingest
Servers
Listeners
AnalystsAnalytics Platform & DeliveryCAL
Application
Server
eBay Visitors
Application
servers
SingularityHadoopCentral App
Logging
BEHAVIORAL DATA PIPELINEBehavioral Data Pipeline
Behavior Data: A/B Testing
18
Behavioral Data: A/B Testing
Transactional Data Pipelines
19
• Initial pattern
• More data available faster on Hadoop
• Leverage Hadoop SQL / Spark
• Subsequent pattern
• >10% datasets available on Hadoop
• New innovation avenues via OpenSource
• Give humans more Teradata capacity
• Teradata data available no later than before
• <95% extract / daily batch
• >5% stream / frequent batch
Transactional Data Pipelines
Spark Summit Keynote by Seshu Adunuthula

Contenu connexe

Tendances

Spark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSpark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSAP Technology
 
Life is but a Stream
Life is but a StreamLife is but a Stream
Life is but a StreamDatabricks
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoSpark Summit
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Spark Summit
 
Data in Motion vs Data at Rest
Data in Motion vs Data at RestData in Motion vs Data at Rest
Data in Motion vs Data at RestInternap
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platformdataeaze systems
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksDatabricks
 
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...Spark Summit
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekDatabricks
 
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Databricks
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemDataWorks Summit
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingDatabricks
 
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksUnlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksDatabricks
 
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...Databricks
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDataWorks Summit
 
Dealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data LakeDealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data LakePat Patterson
 
Cloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an EcosystemCloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an EcosystemDatabricks
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganSpark Summit
 

Tendances (20)

Spark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSpark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business Operations
 
Life is but a Stream
Life is but a StreamLife is but a Stream
Life is but a Stream
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott Cordo
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Data in Motion vs Data at Rest
Data in Motion vs Data at RestData in Motion vs Data at Rest
Data in Motion vs Data at Rest
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
Operationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at StarbucksOperationalizing Machine Learning at Scale at Starbucks
Operationalizing Machine Learning at Scale at Starbucks
 
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
Regulatory Reporting of Asset Trading Using Apache Spark-(Sudipto Shankar Das...
 
Loan Decisioning Transformation
Loan Decisioning TransformationLoan Decisioning Transformation
Loan Decisioning Transformation
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a Week
 
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala ...
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark Streaming
 
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksUnlocking Geospatial Analytics Use Cases with CARTO and Databricks
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks
 
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg...
 
Democratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druidDemocratizing data science Using spark, hive and druid
Democratizing data science Using spark, hive and druid
 
Dealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data LakeDealing With Drift - Building an Enterprise Data Lake
Dealing With Drift - Building an Enterprise Data Lake
 
Cloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an EcosystemCloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an Ecosystem
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 

En vedette

Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015Debashis Saha
 
eBay's private Cloud Journey
eBay's private Cloud JourneyeBay's private Cloud Journey
eBay's private Cloud JourneySuneet Nandwani
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSpark Summit
 
Kodu Game Lab e Project Spark
Kodu Game Lab e Project SparkKodu Game Lab e Project Spark
Kodu Game Lab e Project SparkFabrício Catae
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesSingleStore
 
Lambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandLambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandSpark Summit
 
Spark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosSpark Summit
 
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...Spark Summit
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudJen Aman
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonBenjamin Bengfort
 
Всем плевать на ваш дизайн
Всем плевать на ваш дизайнВсем плевать на ваш дизайн
Всем плевать на ваш дизайнAlexander Kirov
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
 
Guia de estudio segundo parcial (tercera parte)
Guia de estudio segundo parcial (tercera parte)Guia de estudio segundo parcial (tercera parte)
Guia de estudio segundo parcial (tercera parte)Ariel Aranda
 
variational bayes in biophysics
variational bayes in biophysicsvariational bayes in biophysics
variational bayes in biophysicschris wiggins
 
Dunbar's Number and Your B2B Relationships
Dunbar's Number and Your B2B RelationshipsDunbar's Number and Your B2B Relationships
Dunbar's Number and Your B2B RelationshipsViabl
 
Презентация НИС коммерция
Презентация НИС коммерцияПрезентация НИС коммерция
Презентация НИС коммерцияОльга Осипова
 
Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)njbartlett
 
A Digital Bill of Rights for the Internet, by the Internet
A Digital Bill of Rights for the Internet, by the InternetA Digital Bill of Rights for the Internet, by the Internet
A Digital Bill of Rights for the Internet, by the InternetMashable
 

En vedette (19)

Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015
 
eBay's private Cloud Journey
eBay's private Cloud JourneyeBay's private Cloud Journey
eBay's private Cloud Journey
 
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit AgarwalSuccinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
Succinct Spark: Fast Interactive Queries on Compressed RDDs by Rachit Agarwal
 
Kodu Game Lab e Project Spark
Kodu Game Lab e Project SparkKodu Game Lab e Project Spark
Kodu Game Lab e Project Spark
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
Lambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandLambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie Strickland
 
Spark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken TsaiSpark Summit Keynote with Ken Tsai
Spark Summit Keynote with Ken Tsai
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
 
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
Implementing Near-Realtime Datacenter Health Analytics using Model-driven Ver...
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Всем плевать на ваш дизайн
Всем плевать на ваш дизайнВсем плевать на ваш дизайн
Всем плевать на ваш дизайн
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
Guia de estudio segundo parcial (tercera parte)
Guia de estudio segundo parcial (tercera parte)Guia de estudio segundo parcial (tercera parte)
Guia de estudio segundo parcial (tercera parte)
 
variational bayes in biophysics
variational bayes in biophysicsvariational bayes in biophysics
variational bayes in biophysics
 
Dunbar's Number and Your B2B Relationships
Dunbar's Number and Your B2B RelationshipsDunbar's Number and Your B2B Relationships
Dunbar's Number and Your B2B Relationships
 
Презентация НИС коммерция
Презентация НИС коммерцияПрезентация НИС коммерция
Презентация НИС коммерция
 
Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)Introduction to OSGi (Tokyo JUG)
Introduction to OSGi (Tokyo JUG)
 
A Digital Bill of Rights for the Internet, by the Internet
A Digital Bill of Rights for the Internet, by the InternetA Digital Bill of Rights for the Internet, by the Internet
A Digital Bill of Rights for the Internet, by the Internet
 

Similaire à Spark Summit Keynote by Seshu Adunuthula

Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Amazon Web Services
 
Contextually Relevant Retail APIs for Dynamic Insights & Experiences
Contextually Relevant Retail APIs for Dynamic Insights & ExperiencesContextually Relevant Retail APIs for Dynamic Insights & Experiences
Contextually Relevant Retail APIs for Dynamic Insights & ExperiencesJason Lobel
 
Internet of Things Chicago - Meetup
Internet of Things Chicago - MeetupInternet of Things Chicago - Meetup
Internet of Things Chicago - MeetupJason Lobel
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Amazon Web Services
 
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Amazon Web Services Korea
 
Business intelligence in the era of big data
Business intelligence in the era of big dataBusiness intelligence in the era of big data
Business intelligence in the era of big dataJC Raveneau
 
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! Sumeet Singh
 
Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsAmazon Web Services
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSAWS User Group Kochi
 
Tapdata Product Intro
Tapdata Product IntroTapdata Product Intro
Tapdata Product IntroTapdata
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Amazon Web Services
 
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)Lucas Jellema
 
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Amazon Web Services Korea
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksMicrosoft Tech Community
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
Ipedo Company Overview
Ipedo Company OverviewIpedo Company Overview
Ipedo Company OverviewTim_Matthews
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Amazon Web Services
 
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...Impetus Technologies
 

Similaire à Spark Summit Keynote by Seshu Adunuthula (20)

Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
 
Contextually Relevant Retail APIs for Dynamic Insights & Experiences
Contextually Relevant Retail APIs for Dynamic Insights & ExperiencesContextually Relevant Retail APIs for Dynamic Insights & Experiences
Contextually Relevant Retail APIs for Dynamic Insights & Experiences
 
Internet of Things Chicago - Meetup
Internet of Things Chicago - MeetupInternet of Things Chicago - Meetup
Internet of Things Chicago - Meetup
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
 
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)
 
Business intelligence in the era of big data
Business intelligence in the era of big dataBusiness intelligence in the era of big data
Business intelligence in the era of big data
 
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
 
Leveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven DecisionsLeveraging Cloud Analytics to Support Data-Driven Decisions
Leveraging Cloud Analytics to Support Data-Driven Decisions
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
 
Tapdata Product Intro
Tapdata Product IntroTapdata Product Intro
Tapdata Product Intro
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
 
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
Data Analytics를 통한 비지니스 혁신::Craig Stries::AWS Summit Seoul 2018
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure Databricks
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
Ipedo Company Overview
Ipedo Company OverviewIpedo Company Overview
Ipedo Company Overview
 
Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML Preparing Your Data for Cloud Analytics & AI/ML
Preparing Your Data for Cloud Analytics & AI/ML
 
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
 

Plus de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Plus de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Dernier

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 

Dernier (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

Spark Summit Keynote by Seshu Adunuthula

  • 1. Enterprise Data Platform and role of Apache Spark Seshu Adunuthula (@SeshuAd)
  • 3.
  • 4.
  • 5. 42% GMV VIA MOBILE 291M MOBILE APP DOWNLOADS GLOBALLY 1.3B LISTINGS CREATED VIA MOBILE EVERY WEEK 162M ACTIVE BUYERS 25M ACTIVE SELLERS 800M ACTIVE LISTINGS 8.8M NEW LISTINGS EVERY WEEK Q4 2015 Q4 2015 eBay Marketplace
  • 6. *Q3 2015 data Items sold are new 79% Items are fixed price 84% Evolving from our Auction Roots.. Transactions offer free shipping 63% Data is eBay’s Most Important Asset
  • 7. Data Sciences Personalization User propensity modeling based on 5 quarters behavioral data. Cluster based unstructured data. User (Badges, Activity Synopsis, Word Cloud), Deals Personalization Merchandising Similar items - recommending similar items on key placements on site and mobile. Powered among other things by co-clicked items. Structured Data Deal discovery leverages machine learning model. Trust Predictive machine learning models for fraud prevention, account take-over, prevent loss, bad buying experience prevention, buyer/seller risk prediction Shipping Delivery experience: Building a model to predict more accurate and shorter delivery estimates. BI & Analytical Apps Search Backend 950M items/ ~15TB indexes in 2.5 hours on a daily basis. 2M item/11TB index subsets generated near real-time in 3 minutes Search Sciences Search ranking/spam/recall factors or data (like phrase table, query-rewrite table, etc.) preparation, including many pipelines built on top of search Scala platform s Structured Data Convert 800M Item listings into Product pages that are automatically curated and persisted Traffic – Paid Search 10% efficiency lift for paid search (Amber model) Identify low performing items (Google search) Data Preparation. Bot detection, data transformation, sessionization for user behavior and core data sets Experimentation A/B Testing for new features and user Experiences released on the site. Behavioral Data Search SPD: Provides global B2C and C2C seller performance overview Nous, DNA: Provide self service product experiences analysis on product health, behavior analytics and product domain specific reporting System Services Sherlock Monitoring: Applications logs (CAL logs) are stored and processed on Hadoop. Data is eBay’s DNA
  • 9. ENTERPRISE DATA PLATFORM 9 Agile Data Warehouse Data Streams Batch Humans Sets of data Streams Systems Sets of data Data Services Services Applications Specific calls Populated Used by How Enterprise Populated Used by How Enterprise Data Platform
  • 10. Structured Data SQL Interactive Relational Analysis Semi Structured Data Java/Scala… Batch and Ad-Hoc Algorithmic Analysis Relational Analysis Programmatic Exploration EDW Hadoop 10 Analysts/BU PM/Executives/Tools Analysts/Scientists/Tools Scientists/Engineers Agile Data Warehouse
  • 11. 11 Simplify Access to DW Cross Platform VDMs Geo Distributed Caches Data Virtualization
  • 12. Apache Kylin: Extreme OLAP Engine 12 Cube Build Engine SQL Low Latency - Seconds Mid Latency - Minutes Routing 3rd Party App (Web App, Mobile…) Metadata SQL-Based Tool (BI Tools: Tableau…) Query Engine Hadoop Hive REST API JDBC/ODBC Star Schema Data Key Value Data Data Cube OLAP Cube (HBase) SQL REST Server MOLAP Cubes ANSI SQL on Hadoop Interactive Query on Billions of Rows Apache Kylin: Extreme OLAP Engine
  • 13. Data Streams Platform Apache Kafka Behavior TXN User Streaming Apps Sand box Stream Processing ClustersOpen Sample Streams Needs Approval Staging Pool1 Pool2 Rheos App Manager Configuration Deployment GitHub Tor a ETL Connectors Hadoop Teradat a Pool3 Risk Real-time Representation Of EDW Datasets Centralized Shared Data Streams DW populated using the Streams Platform Data Streams Platform
  • 15. •Execution environment tailored for Spark •Governance of Big Data Apps with a PaaS layer •Application life cycle spanning development to deployment Spades – Spark Provisioning & Deployment Environment
  • 17. Ingest Servers Listeners AnalystsAnalytics Platform & DeliveryCAL Application Server eBay Visitors Application servers SingularityHadoopCentral App Logging BEHAVIORAL DATA PIPELINEBehavioral Data Pipeline
  • 18. Behavior Data: A/B Testing 18 Behavioral Data: A/B Testing
  • 19. Transactional Data Pipelines 19 • Initial pattern • More data available faster on Hadoop • Leverage Hadoop SQL / Spark • Subsequent pattern • >10% datasets available on Hadoop • New innovation avenues via OpenSource • Give humans more Teradata capacity • Teradata data available no later than before • <95% extract / daily batch • >5% stream / frequent batch Transactional Data Pipelines

Notes de l'éditeur

  1. Good Morning Folks, How are you doing? The last talk I had given most of folks were in button down shirts and Suits, nice to be back in the comfort zone. Great to be at Spark Summit East, Love New York and its Energy very different compared to back in Bay Area.
  2. So we all know what eBay is. We are the other eCommerce company. As opposed to our brethren up north of us we are not bent on World Domination…
  3. You create the coolest Spark Application, you make it big, All you need is 6M Dollars and you can be a proud owner of this Cessna Citation Jet right off eBay.
  4. To the mundane, You had your fitbit and the little dongle fell off on the flight, eBay to the rescue.
  5. eBay is a truly Global Market Place. 162M Active Buyers 25M Active Sellers: This is the key differentiation from our competitor up North. A lot of what we do with Analytics is centered around making the Seller successful. A key to that is the ability to analyze and put structure around the 800M listings we have on eBay with more than 8.8M listings being added every day. As with every one else more than 40% of the eBay revenue being generated on mobile devices and growing. References Active Buyers – Q4 Earnings https://www.ebayinc.com/stories/news/ebay-inc-reports-fourth-quarter-and-full-year-2015-results/ Active Sellers ? Active listings - Q3 2015 Factsheet New listings per week - ? Mobile app (downloads?) globally - Q3 2015 Factsheet Listings created via mobile - Q3 2015 Factsheet % GMV Via Mobile - Q3 2015 Factsheet Q3 2015 Factsheet https://static.ebayinc.com/static/assets/Uploads/PressRoom/eBay-Factsheet-Q3-2015.pdf?2
  6. eBay has gradually evolved from its Auction roots. 79% of Items Sold on eBay are new items 84% of the items are fixed price items 63% of the transaction on eBay offer a free shipping We evolved into this Global Market Place by treating Data as its one of the most important assets. As a market place we do not hold an inventory, we are in the business of getting the buyers and sellers together and we do this by utilizing the data.
  7. Data is in eBay’s DNA. eBay is going through one of its most profound transformations. From Searching Listings to Product Pages that persist beyond the life time of an item that is listed on eBay. We are looking at the 800M listings and automatically converting them into millions of product pages that are generated automatically and persisted beyond the life of a listing. This will tremendously improve the user experience on eBay. We do this not just by looking at the description of an item but looking at how user interact with listing to determine if a new product page is needed or augment the existing product page. Data Science is extensively deployed in Personalization., Merchandizing, Shipping and Trust. We will go thru a few of these use cases in later slides. At a scale of 800 million listings and 162 million users, computer algorithms instead of human are needed to address most of these needs Data is at the core of what we do today and it needs an enterprise class platform. We will drill into what it takes to build an Enterprise Class Data Platform. A lot of these idea will resonate with you. Teradata and eBay have had a long partnership and you will see the Oliver’s ideas of Sentient Enterprise in reality today at eBay.
  8. Additional components are required to extract the value out of data which we are calling the Enterprise Data Platform. In the next few slides I will be talking about the Data Platform we built at eBay. Spark has made gradual inroads into most of the components of our data platform.
  9. Introduce EDP We’ve been on the path of the agile data warehouse for a long time We have the requisite user created data labs, and support experimentation and testing as well as highly integrated core (production) data Our recent change has been the addition of enterprise data streams Enterprise data streams are near real time streaming data designed to mirror the critical core data from the EDW but in real time – enhanced with history from the EDW so that actions and recommendations can be made based upon new (live) actions while taking into account rich context Without this key enabler we could not progress into stage 5: autonomous decision making
  10. The first component of our Enterprise Data Platform is an Agile Data Warehouse. We decided not to go the route of the Data Lakes instead built Data Sets both in our EDW and Hadoop Systems. Data Sets in EDW support high concurrency interactive analsys while Hadoop designed for semi structured data amenable to programmatic access. Agile
  11. With Data now being stored both in Teradata in Hadoop we came to the realization that a simple unified access is required for our datawarehouse. We should hide the complexity of the different data sources from the Analytics Applications. We are excited about this new initiative within eBay called Davis to integrate the different data sources within our Data Platform Some of its key capabilities include Ability to issues SQL queries against both Hadoop and Teradata Ability to create a cache copy of the data sets from Hadoop/Teradata Essentially VDMs Geodistribute the cached copies to the edges for global access to the datasets.
  12. Another key capability within our Agile Datawarehouse is to get sub second responses on billions of rows of data. With commodity hardware and open source tools we came to the realization that building MOLAP cubes is now a viable and cost effective alternative. Apache Kylin is an open source MOLAP implementation with the ability to supports cubes that of 10 to 100 TB in size It has ability to fetch data from variety of sources include Hive and Kafka and build cubes that are stored in Hbase. These components form the basis for the Agile Datawarehouse at eBay.
  13. Data Streams is a key component of an Enterprise Data Platform. Traditionally Datawarehouses were populated with end of the day batch processing. We are replacing this with Streaming based systems We are introducing the classic Application Development Life Cycles for these streaming applications Streaming has in general been fragmented with a variety of solutions but recently there has been some standardization being bought into the system Kafka is quickly being standardized for Messaging. Pub/Sub. A recent introduction called Apache Beam a high level DSL has the ability to standardize the various stream processing engines in the market.
  14. To round up the Enterprise Data Platform: We are providing a variety of services such as Data Search & Discovery: Ability to search the metadata in the data warehouse. We are looking at integrating the search across both Datawarehouses and Data Streams. Collaborative Analytics: Sharing queries with others, tagging queries and workspaces etc. Data Governance: Business Glossaries, Integrated Data Dictionary across Hadoop, Teradata etc.
  15. As Spark become an important component of our Data Platform we built out a brand new environment highly optimized for Spark with a lot of firsts We buit this cluster in a virtualized environment as opposed to Bare Metal We did away with Data Locality and local storage: As a result we were able to use the same compute nodes as other infrastructure within eBay. We debated a lot on YARN vs Mesos and in the end though Mesos was more performant we went with YARN for two reasons Kerberos integration and Multi-tenant capabilities Our Hadoop Clustes were like the wild wild west, once you had access you could deploy any application on the clusters. For Spark we introduced a complete App Development Lifecycle
  16. Time permitting want to talk about our Data Pipelines and impact of Spark. Data Pipelines move Data from our transaction environment into the Analytics world
  17. Two types of Pipelines The behavioral pipeline, all the activity on the web site is captured and stored in the Data warehouses both Hadoop and Teradata. eBay built one of the most sophisticated data pipelines captures around 30TB of activity every day.
  18. One of the key applications built on the behavior data pipelines is the Experimentation system. Every new feature released within eBay properties every UI change goes through a series of Experiments before being widely deployed. There are a series of stages within an experimentation system Determine the features to be added Identify the variables and create variations in the feature Run the experiment Measure the results Determine which feature needs to be deployed based on data.
  19. Transactional pipelines. These were the oldest pipelines built capture all the transactional activity on the eBay site. Listings Created, Items Sold Auctions state changes etc. A big transformation is changing these pipelines into real time using Spark and Spark streaming.
  20. That concludes my talk, In the closing we had built a sophisticated data platform to process large quantities of data and now are gradually witnessing Spark impacting many of the components that form our Data Platform.