How LinkedIn leveraged its data to become the world's
largest professional network
About me
©2013 LinkedIn Corporation. All Rights Reserved. 2
Vitaly Gordon
Agenda
1 What is Big Data?
2 Big Data Applications
3 LinkedIn’s Big Data Solutions
4 Finding Experts
5 Big Data Recipe
6 Summary
1 What is Big Data?
Data sets that are too large and complex
to manipulate or interrogate with standard
methods or tools.
Oxford Dictionary
Big Data Growth
[Chart: Storage Growth vs. Data Growth, logarithmic axis from 1E+00 to 1E+09]
2 Big Data Applications
increase in sales
of watched content
40M users in 18 months
Big Data is more about Business
than Data
3 LinkedIn’s Big Data Solutions
LinkedIn Revenue
Quarterly Revenue ($ millions), stacked by Hiring Solutions, Marketing Solutions and Premium Subscriptions

Year   Q1    Q2    Q3    Q4
2009   23    28    30    39
2010   45    55    62    82
2011   94    121   139   168
2012   188   228   252   304
2013   325
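As a quick check on the revenue chart, the quarterly totals can be summed per year. The sketch below assumes the 17 values run from Q1 2009 through Q1 2013; that starting year is inferred from the quarter labels and the axis years, not stated outright on the slide. The function names are illustrative.

```python
# Quarterly revenue totals read off the chart, in $ millions.
# Assumption: the series starts at Q1 2009 and ends at Q1 2013.
QUARTERLY_REVENUE = [
    23, 28, 30, 39,      # 2009
    45, 55, 62, 82,      # 2010
    94, 121, 139, 168,   # 2011
    188, 228, 252, 304,  # 2012
    325,                 # 2013 (Q1 only)
]
START_YEAR = 2009


def annual_totals(quarters, start_year):
    """Group quarterly values into {year: total}; a partial final year is kept."""
    totals = {}
    for index, value in enumerate(quarters):
        year = start_year + index // 4
        totals[year] = totals.get(year, 0) + value
    return totals


def year_over_year_growth(quarters, index):
    """Fractional growth of quarter `index` over the same quarter one year earlier."""
    return quarters[index] / quarters[index - 4] - 1


totals = annual_totals(QUARTERLY_REVENUE, START_YEAR)
print(totals[2012])  # 972
print(round(year_over_year_growth(QUARTERLY_REVENUE, 16) * 100, 1))  # 72.9
```

Under that assumption, 2012 sums to 972 million dollars, which is consistent with the "almost a billion dollars last year" figure in the speaker notes.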
Premium Subscriptions
Marketing Solutions
Talent Solutions
Connecting Talent With Opportunity
Jobs You May Be Interested In (JYMBII) – Case Study
Software Engineer at
Data Scientist at
Product Manager at
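The speaker notes describe JYMBII as computing a fit score for every member and job pair and then showing each member the top matches. The talk does not say how the score is computed, so the sketch below swaps in a simple stand-in (Jaccard overlap between skill sets) purely to illustrate the score-then-rank shape. The skill lists, `match_score` and `top_jobs` are hypothetical.

```python
import heapq


def match_score(member_skills, job_skills):
    """Jaccard overlap between two skill sets, in [0, 1]. Stand-in scoring only."""
    member, job = set(member_skills), set(job_skills)
    if not member and not job:
        return 0.0
    return len(member & job) / len(member | job)


def top_jobs(member_skills, jobs, k=3):
    """Return the k best (score, title) pairs, highest score first."""
    scored = [(match_score(member_skills, skills), title)
              for title, skills in jobs.items()]
    return heapq.nlargest(k, scored)


# Hypothetical data for illustration; not taken from the talk.
member = {"python", "statistics", "machine learning", "sql"}
jobs = {
    "Software Engineer": {"java", "python", "distributed systems"},
    "Data Scientist": {"python", "statistics", "machine learning", "r"},
    "Product Manager": {"roadmaps", "communication", "sql"},
}

ranking = top_jobs(member, jobs, k=2)
print(ranking[0][1])  # Data Scientist
```

Keeping scoring and ranking as separate functions means the scoring rule can be replaced later without touching the code that builds the top-k list.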
JYMBII – Building The Product
Design, Algorithms, Framework
Design
1,000X more users
Start simple
Grow with success
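According to the speaker notes for these slides, JYMBII was first tested on a small subset of members by email, and later as a widget shown to a certain percentage of users. One common way to pick such a subset is deterministic hash bucketing, sketched below. This is an assumption about how it could be done, not a description of LinkedIn's system; `in_experiment` and the experiment name are hypothetical.

```python
import hashlib


def in_experiment(member_id, experiment, percent):
    """Deterministically place roughly `percent`% of members in the experiment.

    Hashing the experiment name together with the member id gives each
    experiment its own independent split, and the same member always gets
    the same answer.
    """
    key = f"{experiment}:{member_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < percent


# Send the hypothetical "jymbii-email" test to about 5% of 10,000 members.
treated = [m for m in range(10_000) if in_experiment(m, "jymbii-email", 5)]
print(len(treated))
```

Because membership depends only on the id and the experiment name, the percentage can be raised over time and every member who was already included stays included.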
Algorithms
50% better results
Start simple
Grow with success
Technology
Some people, when confronted with a big
data problem, think, I'll use Hadoop.
Now they have a big data problem and a
big Hadoop cluster.
Dmitry Ryaboy, Twitter Engineering Manager
Technology Advancement
50X faster
Kafka
Start simple, grow with success
4 Finding Experts
Finding Data Experts
Increase in demand for big data experts
X
Are new analytics experts
33
Be challenged at LinkedIn
We're looking for superb analytical minds
of all levels to expand our small team that
will build some of the most innovative
products at LinkedIn.
No specific technical skills are required
(we'll help you learn SQL, Python, and R).
You should be extremely intelligent, have a
quantitative background, and be able to
learn quickly and work independently.
This is the perfect job for someone who's
really smart, driven, and extremely skilled
at creatively solving problems. You'll learn
statistics, data mining, programming, and
product design, but you've gotta start with
what we can't teach—intellectual
sharpness and creativity.
LinkedIn Experts
Don't wait for a big data expert to
knock on your door - create your own
5 Big Data Recipe
Big Data Recipe
INGREDIENTS
1. Important business metric
2. Correlating factors
3. Causing factors
4. Product to affect the behavior
METHOD OF PREPARATION
1. Build a simple prototype
2. Measure the effect
3. Improve logic and scale
4. Measure the effect
5. Improve logic and scale
6. Measure the effect
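The method of preparation above is a loop: ship a version, measure the business metric, and keep the change only if the metric improved. A minimal sketch of that loop follows. The variant names and click-through numbers are made up for illustration, and `run_recipe` is a hypothetical helper, not something from the talk.

```python
def run_recipe(variants, measure):
    """Apply the measure/improve loop: keep a variant only if the metric improves.

    `variants` is an ordered list of candidate versions (prototype first);
    `measure` maps a variant to the observed value of the business metric.
    """
    best_variant = None
    best_metric = float("-inf")
    log = []
    for variant in variants:
        metric = measure(variant)      # "Measure the effect"
        kept = metric > best_metric
        if kept:                       # the next improvement builds on the winner
            best_variant, best_metric = variant, metric
        log.append((variant, metric, kept))
    return best_variant, best_metric, log


# Hypothetical click-through rates per variant; illustrative numbers only.
observed = {"email prototype": 0.020, "homepage widget": 0.031, "new ranking": 0.029}
best, metric, log = run_recipe(list(observed), observed.get)
print(best)  # homepage widget
```

The log records the variants that did not help as well, so a failed iteration still leaves a measurement behind.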
6 Summary
Thank you

Big Data World 2013 - How LinkedIn leveraged its data to become the world's largest professional network

Editor's Notes

  1. LinkedIn: I am currently a data scientist at LinkedIn, one of the world's most advanced big data companies. LivePerson: I previously worked at LivePerson, where I was the first person hired to build their big data solution, so I have experienced both the very beginning of big data solutions and the cutting edge. I will share with you the lessons I've learned while working on big data at both ends of the spectrum. I also have a business degree from the Israeli Institute of Technology and a computer science degree from Ben-Gurion University.
  2. This is what I am going to talk about. I chose these subjects because they answer the most burning questions I had, both when I was starting with big data and when I was perfecting my craft.
  3. The term "Big Data" is used in many ways, so before I start talking about big data, I want to explain what it is.
  4. Yes, there is an entry in the Oxford English Dictionary for Big Data
  5. The key word here is "standard". Before Big Data, standard methods and tools were enough to process the data we had; now they are not. So what happened?
  6. Data created opportunities, which in turn created demand for even more data and the amount of data in the world grew larger and larger
  7. So what are those big data opportunities I've mentioned? The best way to see is through examples.
  8. Amazon, the e-commerce giant, analyzes data about its shoppers: what products they are looking at, what products they are searching for and, most importantly, what products they are buying. This analysis enables Amazon to produce a product I am sure you have all seen ...
  9. Here we can see that if I look at the book "Big Data Analytics", Amazon recommends other, similar books. (Show increase in sales.) So why did it increase sales so much? The logic is simple: the more products customers see, the higher the chance they will buy something. Amazon wants to show us as many products as it can in order to get us to buy something.
  10. My second example is Netflix. Netflix is an American company that started as a DVD rental service and quickly became a streaming platform for movies and TV shows. It has about 30 million subscribers. At the end of each movie, Netflix asks viewers to rate the movie they just watched. Netflix has billions of movie ratings from millions of users, and it uses this data to create the following product.
  11. Using our rating history, Netflix calculates a unique "taste" for every one of its subscribers and uses this taste to recommend movies to them. This product is so important to Netflix that in 2006 it offered a prize of a million dollars to whoever could improve its algorithm by more than 10%. (Show statistics.) So why is this recommendation engine so important? The more users find movies they like on Netflix, the longer they keep their subscriptions, which earns Netflix money.
  12. My third example is a small Israeli startup. Waze is a GPS mobile app that tracks where people are and at what speed they are travelling.
  13. Waze uses this data to compute traffic maps that show which streets have traffic jams, and it routes you according to this data, providing much better route suggestions than apps that don't use traffic information. After gaining more than 50 million users for its app, Waze was acquired by Google for about 1.1 billion dollars. Side note: I understand there will be a talk later today by a Korean company that does something very similar.
  14. The above examples, and many more, lead me to the first lesson I've learned about big data
  15. These are great examples. But to dive even deeper into big data applications, let's look at the company I currently work for, LinkedIn. Since we said that Big Data is more about business than data, let me first show you what LinkedIn's business is.
  16. LinkedIn is the largest professional social network in the world. It has more than 225M members. Our largest markets today are North America and Europe, but Asia is growing very well too, with several countries having more than a million members on LinkedIn.
  17. Not only LinkedIn has a lot of members, it also makes significant revenue. Across it 3 bussiness lines, LinkedIn has made almost a billion dollars last year and about 325 million in the first quarter of 2013.
  18. These 3 product lines are Premium Subscriptions, Marketing Solutions and Talent Solutions.Let's dive more deeply into each one of them to understand them better
  19. The premium subscriptions business is for LinkedIn users that want to get extra features on LinkedIn. Those features might be better analytics about who viewed their profile and the ability to contact anyone on LinkedIn through In Mails, LinkedIn's personal messaging system.This product really separates LinkedIn from other social networks in the fact that some of the users of the network pay extra to use it.
  20. Marketing solutions is more similar to what you can find on other social networks. We offer companies the ability to market their products to our members. Since LinkedIn is a professional network with most members having a job or even a lucrative one. The target population is very appealing for marketers who want to market their products.
  21. Our third and largest in terms of revenue product line is the talent solution. Here companies like Sony, Walmart and Loreal pay for their recruiters to have additional functionality for their recruiting needs. This is almost like another product inside LinkedIn for our recruiter members. This product line bring about 57% of LinkedIn's revenue.
  22. LinkedIn's number 1 mission is connecting talent with opportunity. Both helping companies find new talent and helping our 225+ million members find new opportunities when they need themOne of the first big data applications at LinkedIn was to help members find a new job, and I will now dive deep into how it was done
  23. JYMBII (Jobs You May Be Interested In) is a big data product that matches members with job postings on LinkedIn. For example, here is me, and some of the jobs companies have posted on LinkedIn. For every job, we compute a score for how good a fit the job is for the member. Here you can see that I am a good match for a data scientist position at Facebook, and not such a good match for a product manager position at Yahoo.
  24. After creating scores for all the jobs in our database, we show a small widget on our homepage where every member can see their top matching jobs.
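As a rough illustration of the last step (picking a member's top matching jobs once every job has a score), here is a minimal Python sketch. It is not LinkedIn's implementation: the function name `top_jobs`, the third job title and all score values are hypothetical and exist only to make the example runnable.

```python
import heapq


def top_jobs(scores, n=3):
    """Return the n highest-scoring (job, score) pairs, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical scores for one member; the numbers are made up for illustration.
example_scores = {
    "Data Scientist at Facebook": 0.92,
    "Product Manager at Yahoo": 0.31,
    "Software Engineer at Example Corp": 0.67,
}

for job, score in top_jobs(example_scores, n=2):
    print(f"{score:.2f}  {job}")
```

Only the best few jobs per member need to be kept for the widget, so a partial selection like this avoids sorting the full list of scores.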
  25. I will walk you through the 3 pillars of every big data product – Design, Algorithms and Infrastructure/Framework.
  26. Let's start with design. In a consumer-oriented company, design is very important, because this is how users interact with your product. Also, in many cases, design is the hardest thing for a single small team to change, because so many teams are involved. In most companies the big data team is separate from the team that works on the main product, so those of you who have already started implementing big data solutions probably know how difficult it is to run tests on the main product. Do anything you can to bypass other teams in your organization when testing your big data solutions. When LinkedIn's Data Science team decided to build JYMBII, they wanted a very simple way to test whether their product was working without making too many changes to the main site. This is how they did it: they started with email. Here you can see how the actual email looks today, where I got some recommendations for jobs I might be interested in. They chose email because it lets you test your product on a small subset of users, without affecting everyone who comes to your website and without any changes to the main website.
  27. After the initial emails showed great success and that people were actually interested, our team built this very small widget that shows the top jobs you might be interested in. Again, it was done with minimal integration with the main website, by having this widget replace one of the ads we had on the site for a certain percentage of the users.
  28. After the great success of the widget, jobs now have their own section on the LinkedIn website, where users can search for jobs and more. Having the jobs section resulted in 1000 times more users looking at LinkedIn jobs than before. Remember, JYMBII did not start with its own section on the website, but grew into having one.
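Showing the widget to "a certain percentage of the users" is commonly done with deterministic bucketing, so that the same member always sees the same variant. The sketch below shows one generic way to do this in Python; it is an assumption for illustration, not a description of how LinkedIn did it. The function name `in_experiment`, the salt `"jobs-widget"` and the member IDs are all hypothetical.

```python
import hashlib


def in_experiment(member_id: str, percent: float, salt: str = "jobs-widget") -> bool:
    """Deterministically place roughly `percent` percent of members in the test group.

    Hashing the (salt, member_id) pair gives each member a stable bucket in
    0..9999, so the same member gets the same answer on every visit.
    """
    digest = hashlib.sha256(f"{salt}:{member_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10000
    return bucket < percent * 100


# Show the widget instead of an ad to roughly 5% of (hypothetical) members.
members = [f"member-{i}" for i in range(20)]
widget_group = [m for m in members if in_experiment(m, percent=5)]
```

Changing the salt reshuffles the buckets, which keeps one experiment's test group independent of another's.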
  29. My main message about how to design data products is to start simple and grow with success.
  30. Let's now talk about algorithms, or how LinkedIn matches members with job postings. The first iteration of the algorithm was very simple. We look at the member's profile, we look at the job posting and we do keyword matching, very similar to how recruiters screen candidate resumes for a potential match. In this example we can see that my profile is a pretty decent match for this job opportunity. There is no need for a natural language processing expert or a computer science doctorate to implement this algorithm. It is pretty simple and worked pretty well for our first prototype.
  31. When the first prototype of the email succeeded, the team moved on to improve the algorithm a bit further, adding features like education and experience, which are also very important for determining a candidate's fit for a position. These changes improved the recommendations even further, resulting in more people engaging with jobs on the LinkedIn website.
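To make the idea of keyword matching concrete, here is a minimal Python sketch. The talk does not show the actual scoring rule, so this is only one plausible reading: the score is the fraction of the job posting's keywords that also appear in the profile. The stopword list, the function names and the sample texts are assumptions made for the example.

```python
import re

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "at"}


def keywords(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def match_score(profile, job):
    """Fraction of the job posting's keywords that also appear in the profile."""
    job_words = keywords(job)
    if not job_words:
        return 0.0
    return len(job_words & keywords(profile)) / len(job_words)


profile = "Data scientist with experience in machine learning and Hadoop"
job = "Data scientist: machine learning, Hadoop, statistics"
score = match_score(profile, job)  # 5 of the 6 job keywords appear in the profile
```

Dividing by the number of job keywords keeps the score between 0 and 1, so scores for different postings can be compared for the same member.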
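One simple way to fold extra features such as education and experience into a match score is a weighted sum. The talk does not say how the features were actually combined, so everything in this Python sketch is an assumption: the function names, the 0-to-1 feature scales and the weights are hypothetical.

```python
def years_fit(member_years, required_years):
    """Score experience on a 0..1 scale: 1.0 once the requirement is met."""
    if required_years <= 0:
        return 1.0
    return min(member_years / required_years, 1.0)


def combined_score(keyword_score, education_match, experience_fit,
                   weights=(0.6, 0.2, 0.2)):
    """Weighted sum of three features, each expected to lie in 0..1.

    The default weights are illustrative only; in practice they would be
    tuned against how members respond to the recommendations.
    """
    w_kw, w_edu, w_exp = weights
    return w_kw * keyword_score + w_edu * education_match + w_exp * experience_fit


# Hypothetical member: half the keywords match, degree matches, 3 of 6 years.
score = combined_score(0.5, 1.0, years_fit(3, 6))
```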
  32. Finally, now that we have our jobs page on the website, where users can search for jobs, save jobs and apply for jobs, we can use all of these signals to recommend jobs similar to the ones users found themselves. All of these improvements resulted in 50% more accurate job recommendations for our members.
  33. The message for algorithms is the same as it is for design: don't try to implement something very difficult before you know your customers even want it. Start simple and grow with success.
  34. Here is a quote from a Twitter engineering manager that I like very much. What it says is that most of the time, Hadoop doesn't solve a big data problem; it actually brings a set of new problems to deal with, even before we know that what we are trying to build is worth building.
  35. The first JYMBII prototype was developed using very simple technology: Oracle, with some Perl scripts and some shell scripts in between. The process involved someone manually copying files from one computer to another, running some scripts on that computer and then copying the results back. The process was so inefficient that it took 6 weeks to run. But 6 weeks is better than never.
  36. After the success of the initial product, LinkedIn decided to make some infrastructure investment by buying a parallel database from companies like Greenplum and Aster Data. This sped up the process so that it ran in a single week instead of 6.
  37. Eventually LinkedIn not only moved to Hadoop but also built its own infrastructure, with projects like Kafka, Voldemort and Zoie. You can find more information about them on the LinkedIn open source page. Now we are generating new recommendations every day, which is 50 times better than having them every 6 weeks. You have probably figured out the second lesson by now ...
  38. One of the most important questions, and one that kept me busy for a long time as well, is where to find big data experts. Before I give you the answer, I would like to show you 2 graphs.
  39. Here you can see that in the beginning of 2011 the demand for big data experts was 30 times higher than the year before. Now it is even higher. Everyone is looking for big data experts.
  40. Here is a graph from LinkedIn's own analytics team. Here you can see that 33% of the people who started a job as data scientists or analysts are new to this job. You can probably see where I am going with this. Most people who work in big data are new to big data. LinkedIn realized this quickly, and here is the proof ...
  41. Here is an actual LinkedIn job posting from 2008, when LinkedIn had just started with big data. The key message is this ... No specific technical skills are required. Here is an example of how LinkedIn applied this strategy with 2 of my colleagues.
  42. Joseph Adler came to LinkedIn from Netflix, where he did Operations Engineering. Now he is one of our top experts on big data and has even written a very successful book about it.
  43. Jason is a new data scientist at LinkedIn. Prior to that he was a radar signal processing expert. He is still just at the beginning of his career at LinkedIn, but so far he is doing very well and educating himself quickly.
  44. My third lesson is a bit hard to chew, but if you follow my previous 2, it becomes easier. Look for big data experts everywhere and at all times, but don't let it stop you from starting your projects.
  45. So how do you start a big data project? I would like to show you a very simple recipe you could follow.
  46. As always, in order to make it clearer, I will use an example to guide us through the recipe. People You May Know is a LinkedIn big data product that traverses your profile and the entire LinkedIn graph to suggest people you should connect with. Let's see how we can use our recipe to create big data applications such as People You May Know.
  47. Important business metric: how often members visit the website. Correlating factors: how many new items they have on their news feed. But that is not the root cause; something else is affecting it. Causing factors: how many connections they have. Product: recommend new connections to users, which is People You May Know. Beware of the second-system effect: how many of you have been involved with projects where the first prototype was pretty successful and the second one was much bigger and failed?