Spark and Hadoop at Yahoo:
Brought to you by YARN

Andy Feng
Yahoo! Hadoop
(afeng@yahoo-inc.com)
Personalized Web
Big-Data in Yahoo!

9/10/13
Hadoop + Spark:
Empowered by YARN

30k+ Yahoo! production nodes on YARN since Q1 2013
Shark Pilot: Advertising Data Analytics
§  Business questions
›  Are two sets of audience cohorts similar to each other?
›  What audience segment is most likely to be interested in this ad campaign?
›  How did audience engagement on the new front page rollout differ from the previous front page?
›  What are the right metrics to define user engagement?
§  Shark pilot
›  50 nodes, each w/ 96GB RAM
•  Currently loaded w/ 3.2 TB sample data in memory
›  Homegrown BI tools for ad-hoc queries
•  Using Shark Server (contributed to community by Yahoo!)
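
The pilot's memory figures can be sanity-checked with simple arithmetic. The sketch below uses only the numbers on the slide; decimal units (1 TB = 1000 GB) and an even spread of data across nodes are assumptions.

```python
# Figures from the slide; decimal units (1 TB = 1000 GB) are an assumption.
NODES = 50
RAM_PER_NODE_GB = 96
LOADED_TB = 3.2

# Aggregate RAM across the pilot cluster.
total_ram_tb = NODES * RAM_PER_NODE_GB / 1000

# Share of aggregate RAM taken by the in-memory sample data.
fraction_used = LOADED_TB / total_ram_tb

# Average data held per node, assuming an even spread (an assumption).
per_node_gb = LOADED_TB * 1000 / NODES

print(f"Aggregate RAM: {total_ram_tb:.1f} TB")
print(f"Sample data occupies {fraction_used:.0%} of aggregate RAM")
print(f"Average per node: {per_node_gb:.0f} GB of {RAM_PER_NODE_GB} GB")
```

Under these assumptions, roughly two thirds of the cluster's RAM holds data, leaving the rest for query execution and overhead.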
Shark Perf: TPC-H Benchmark
[Chart: average seconds, y-axis from 0 to 600]
Spark Pilot: Model Training Pipeline
§  A DAG of M/R jobs in Hadoop Streaming
›  Feature extraction
›  Train models
›  Score and analyze models
§  Initial Spark prototype
›  3x speedup on feature extraction
§  Production launch
›  Apply Spark against complete pipeline
›  Spark on 80-node cluster
•  Thanks to the enhanced UI and metrics in Spark 0.8
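
As a rough illustration of running a pipeline as a DAG of stages, the sketch below orders the three stages named on the slide with Python's standard `graphlib`. The linear dependency order and the `run_stage` placeholder are assumptions for illustration, not the pipeline used in the pilot.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages it depends on.
# Stage names follow the slide; the linear dependency order is an assumption.
pipeline = {
    "feature_extraction": set(),
    "train_models": {"feature_extraction"},
    "score_and_analyze_models": {"train_models"},
}


def run_stage(name, log):
    # Hypothetical placeholder: a real stage would launch an M/R or Spark job.
    log.append(name)


def run_pipeline(dag):
    """Run every stage once, after all of its dependencies have run."""
    log = []
    for stage in TopologicalSorter(dag).static_order():
        run_stage(stage, log)
    return log


execution_order = run_pipeline(pipeline)
print(execution_order)
```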

Use Case: Ad Targeting

Spark

M/R and Storm

Use Case: Content Recommendation
w/ Collaborative Filtering
[Diagram: Input, CF Learning, Ranking, Output; two stages labeled Spark]
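
For readers unfamiliar with collaborative filtering, here is a minimal item-based sketch with a learning step and a ranking step. The data, the function names, and the choice of cosine similarity are all assumptions for illustration; the slides do not say which CF algorithm Yahoo! used.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical input: user -> set of items the user engaged with.
interactions = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"c", "d"},
}


def learn_item_similarity(data):
    """CF learning step: cosine similarity between items over binary user vectors."""
    item_users = defaultdict(set)
    for user, items in data.items():
        for item in items:
            item_users[item].add(user)
    sim = defaultdict(dict)
    for i, users_i in item_users.items():
        for j, users_j in item_users.items():
            if i != j:
                overlap = len(users_i & users_j)
                if overlap:
                    sim[i][j] = overlap / sqrt(len(users_i) * len(users_j))
    return sim


def rank_for_user(user, data, sim, top_n=3):
    """Ranking step: score unseen items by summed similarity to the user's items."""
    seen = data[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


similarity = learn_item_similarity(interactions)
recommendations = rank_for_user("u1", interactions, similarity)
print(recommendations)
```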
Spark-YARN: Deployment Simplified
run spark.deploy.yarn.Client --jar … --class … --args …
--queue … --num-workers … --worker-memory …

Spark-YARN (contributed by Yahoo!) is being adopted by the
community (e.g., Taobao) for production use. You should try it
on your Hadoop cluster.
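
A small helper can make the launch command easier to script. The sketch below assembles the argument list using only the flags shown on the slide. The function name and every value passed in are hypothetical placeholders, and the exact launcher and flag semantics depend on the Spark version in use.

```python
def build_yarn_client_args(jar, main_class, app_args, queue, num_workers, worker_memory):
    """Assemble arguments for spark.deploy.yarn.Client using only the flags on the slide."""
    return [
        "spark.deploy.yarn.Client",
        "--jar", jar,
        "--class", main_class,
        "--args", app_args,
        "--queue", queue,
        "--num-workers", str(num_workers),
        "--worker-memory", worker_memory,
    ]


# All values below are hypothetical placeholders, not settings from the talk.
args = build_yarn_client_args(
    jar="my-app.jar",
    main_class="com.example.MyJob",
    app_args="input-path",
    queue="default",
    num_workers=4,
    worker_memory="2g",
)
print("run " + " ".join(args))
```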
Acknowledgements
§  AMPLab team
›  Outstanding collaboration: Ion, Matei, Reynold, Patrick, Matt, …
§  Yahoo! Hadoop team
›  Thomas, Bobby, Paul, Rajiv, Mithun, …
§  Yahoo! Lab.
›  Mridul, Nathan, …
§  Yahoo! data analytics
›  Supreeth, Ram, Tim, …
§  Yahoo! Spark users
›  Gavin, Jay, Hirakendu, …

We Are Hiring!
http://careers.yahoo.com/
