InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site
worldwide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/quora-ml
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
Quora is a platform to ask
questions, get useful
answers, and share what
you know with the world.
● Data at Quora
● Lifecycle of a question
● Deep dive: Automatic question correction
● Other question and answer understanding examples
Lots of data relations
[Diagram: Users, Questions, Answers, Topics, Votes, and Comments, connected by relations: Follow, Ask, Write, Cast, Have, Contain, Get]
User asks a question
Question quality
● Adult detection
● Quality classification (high vs low)
● Automatic question correction
● Duplicate question detection and merging
● Spam/abuse detection
● Policy violations
● etc.
Question understanding
● Question-Topic labeling
● Question type classification
● Question locale detection
● Related Questions
● etc.
Matching questions to writers
● “Request Answers”
● Feed ranking for questions
Writer writes an answer to a question
Answer quality
● Answer ranking for questions
● Answer collapsing
● Adult detection
● Spam/abuse detection
● Policy violations
● etc.
Matching answers to readers
● Feed ranking for answers
● Digest emails
● Search ranking
● Visitors coming from Google
Other ML applications
● Ads
○ Ads CTR prediction
○ Ads-topic matching
● ML on other content types
○ Comment quality + ranking
○ Answer wiki quality + ranking
● Other recommender systems
○ Users to follow
○ Topics to follow
● Under the hood
○ User understanding signals
○ User-topic affinity
○ User-user affinity
○ User expertise
● … and more
● Users often ask questions with grammatical and spelling errors
● Example:
○ Which coin/token is next big thing in crypto currencies? And why?
○ Which coin/token is the next big thing in cryptocurrencies? Why?
● These are well-intentioned questions, but the lack of correct phrasing hurts them
○ Less likely to be answered by experts
○ Harder to catch duplicate questions
○ Can hurt the perception of “quality” of Quora
● Types of errors in questions
○ Grammatical errors, e.g., “How I can ...”
○ Spelling mistakes
○ Missing preposition or article
○ Wrong/missing punctuation
○ Wrong capitalization
○ etc.
● Can we use Machine Learning to automatically correct these questions?
● Started off as an “offroad” hack-week project
● Since shipped
● We frame this problem similarly to machine translation: translate a badly
phrased question into its corrected form
● Final Model:
○ Multi-level, sequence-to-sequence,
character-level GRU with attention
• At the core: A neuron
• Convert one or more inputs into a single output
via this function
• Objective: Learn the values of weights w_i
given the training data
• Can solve simple ML problems well
• At the core of all the deep learning revolution
(and hype)
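The function referred to above was on a slide image that did not survive extraction. A minimal sketch, assuming the common form of a weighted sum of the inputs plus a bias, passed through a sigmoid activation. The names `neuron`, `weights`, and `bias` are illustrative and not taken from the talk.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by a sigmoid (assumed activation)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two inputs collapse into a single output between 0 and 1.
print(neuron([1.0, 0.0], [0.5, -0.5], 0.0))
```

Learning then means choosing `weights` and `bias` so that the outputs match the training labels.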
• Layers of neurons connecting the inputs to the
outputs
• Training: Adjust the weights of the network
via gradient descent using the backpropagation
algorithm
• Serving: Given a trained network, predict the
output for a new input
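To make the training bullet concrete, here is a minimal sketch of full-batch gradient descent with backpropagation on a tiny network (2 inputs, 3 tanh hidden units, 1 sigmoid output) learning XOR. The task, layer sizes, learning rate, and the name `train_xor` are all assumptions chosen for illustration. The talk does not describe this setup.

```python
import math
import random

def train_xor(steps=2000, lr=0.5, hidden=3, seed=0):
    """Train a tiny 2-layer network on XOR; return the mean loss at every step."""
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    # w1[j] holds the two input weights and the bias of hidden unit j.
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # w2 holds one weight per hidden unit plus the output bias.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    n = len(data)
    losses = []
    for _ in range(steps):
        g1 = [[0.0] * 3 for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        loss = 0.0
        for x, y in data:
            # Forward pass.
            h = [math.tanh(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w1]
            z = sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden]
            p = 1.0 / (1.0 + math.exp(-z))
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz = p - y.
            dz = p - y
            for j in range(hidden):
                g2[j] += dz * h[j]
                dh = dz * w2[j] * (1 - h[j] ** 2)  # chain rule through tanh
                g1[j][0] += dh * x[0]
                g1[j][1] += dh * x[1]
                g1[j][2] += dh
            g2[hidden] += dz
        losses.append(loss / n)
        # Gradient descent update on the averaged gradients.
        for j in range(hidden):
            for k in range(3):
                w1[j][k] -= lr * g1[j][k] / n
        for j in range(hidden + 1):
            w2[j] -= lr * g2[j] / n
    return losses

losses = train_xor()
print(losses[0], losses[-1])
```

Serving corresponds to running only the forward pass with the learned weights.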
Image courtesy: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Standard NNs
o Take in all the inputs at once
o Can’t capture sequential dependencies
between input data
• Recurrent Neural Networks
• Great for data that is in a sequence form: Text,
Videos etc.
• Example tasks: Language modeling (predict the
next word in a sentence), language generation,
sentiment analysis, video scene labeling etc.
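A minimal sketch of the recurrence that lets an RNN capture order, assuming a single hidden unit with a tanh activation. The weights `w_x`, `w_h`, and `b` are made-up constants, not trained values.

```python
import math

def rnn_forward(xs, w_x=0.8, w_h=0.5, b=0.0):
    """Run a one-unit recurrent cell over a sequence of numbers; return every hidden state."""
    h = 0.0
    states = []
    for x in xs:
        # The new state depends on the current input and on the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same inputs in a different order end in a different final state.
print(rnn_forward([1.0, 0.0])[-1])
print(rnn_forward([0.0, 1.0])[-1])
```

A standard feed-forward network that sums or concatenates its inputs has no equivalent of the carried state `h`.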
Image courtesy: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Standard RNNs
o Hard to capture long-term
dependencies
o Perform worse on longer sequences
• Modifications to handle long-term
dependencies better:
o Long Short-Term Memory networks (LSTMs)
o Gated Recurrent Units (GRUs)
• Better than vanilla RNNs for most tasks
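A minimal sketch of one GRU step with a scalar state, to show how the gates let the cell keep old information. The parameter names in `p` are hypothetical, and the update convention used here (new state = (1 - z) * old + z * candidate) is one of two equivalent conventions found in the literature.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar previous state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])  # candidate state
    # The update gate interpolates between keeping h and replacing it.
    return (1.0 - z) * h + z * cand

params = {"wz": 1.0, "uz": 1.0, "bz": -50.0,
          "wr": 1.0, "ur": 1.0, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}
# With the update gate nearly closed, the previous state passes through almost unchanged.
print(gru_step(1.0, 0.3, params))
```

Because the state can be copied forward unchanged, gradients survive over more steps than in a vanilla RNN.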
Image courtesy: https://smerity.com/articles/2016/google_nmt_arch.html
• Takes a sequence as input, predicts a sequence as
output. E.g. machine translation
• Also known as the encoder-decoder model
• Ideal when input and output sequences can be of
different lengths
• Base case: Input sequence -> s -> output sequence
• Example tasks: Machine translation, speech
recognition, sentence correction etc.
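The base case "Input sequence -> s -> output sequence" can be sketched as two loops: an encoder that folds the input into a state, and a decoder that emits tokens from that state until it produces an end marker. The step functions below are toy stand-ins (the encoder pushes tokens and the decoder pops them), used only to show the control flow. In a real model the state would be a fixed-size vector and both steps would be learned recurrent cells.

```python
EOS = "<eos>"

def encode(tokens, enc_step, state):
    """Fold the whole input sequence into a single state s."""
    for tok in tokens:
        state = enc_step(state, tok)
    return state

def decode(state, dec_step, max_len=50):
    """Generate output tokens from the state until EOS or max_len is reached."""
    out = []
    for _ in range(max_len):
        tok, state = dec_step(state)
        if tok == EOS:
            break
        out.append(tok)
    return out

# Toy steps (assumptions for illustration): the state is a tuple used as a stack.
def toy_enc_step(state, tok):
    return state + (tok,)

def toy_dec_step(state):
    if not state:
        return EOS, state
    return state[-1], state[:-1]

s = encode(list("abc"), toy_enc_step, ())
print(decode(s, toy_dec_step))
```

Since the decoder stops on its own end marker, the output length is not tied to the input length.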
• Base sequence-to-sequence model: Hard to capture
longer context
• Attention mechanism: When predicting a
particular output, tells you which part of the input to
focus on
• Works really well when the output sequence has a
strong 1:1 mapping with the input sequence
• Better than sequence models without attention for
most tasks
Image courtesy: https://smerity.com/articles/2016/google_nmt_arch.html
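A minimal sketch of an attention step, assuming dot-product scoring followed by a softmax. The talk does not say which scoring function the model uses. `query` stands for the decoder state at the current output position, and `keys` and `values` stand for the encoder states.

```python
import math

def attention(query, keys, values):
    """Return softmax attention weights over the inputs and the weighted context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
weights, context = attention([0.0, 5.0], keys, values)
# Most of the weight lands on the second input position, whose key matches the query.
print(weights, context)
```

For question correction, where most output characters copy an input character, the weights would be expected to concentrate near the matching input position.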
• Character-level RNNs
• Bidirectional RNNs
o Capture dependencies in both
directions
• Beam search decoding (vs. greedy decoding)
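A small sketch of why beam search can beat greedy decoding. `TOY_MODEL` is a made-up table of next-token probabilities, not output from any real model. Greedy decoding is the special case `beam_width=1`.

```python
import math

# Hypothetical next-token distributions keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}

def beam_search(model, length, beam_width):
    """Keep the beam_width most probable prefixes at every step; return the best one."""
    beams = [((), 0.0)]  # (prefix, log-probability)
    for _ in range(length):
        candidates = []
        for prefix, logp in beams:
            for tok, p in model[prefix].items():
                candidates.append((prefix + (tok,), logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

# Greedy commits to "A" (0.6) and ends with probability 0.6 * 0.55 = 0.33.
print(beam_search(TOY_MODEL, length=2, beam_width=1))
# A beam of 2 keeps "B" alive and finds the better sequence, 0.4 * 0.9 = 0.36.
print(beam_search(TOY_MODEL, length=2, beam_width=2))
```

The cost is proportional to the beam width, so the width is a latency versus quality trade-off at serving time.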
● Final question correction model:
○ Multi-level, sequence-to-sequence,
character-level GRU with attention
● Tried solving the subproblems individually, but that didn’t
work as well
● Training
○ Training data: Pairs of [bad question,
corrected question]
○ Training data size: O(100,000) examples
○ TensorFlow, on a single box with GPUs
○ Training time: 2-3 hours
● Serving:
○ TensorFlow, GPU-based serving
○ Latency: <500 ms p99
● Run on new questions added to Quora
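A minimal sketch of how a [bad question, corrected question] pair could be turned into character-level ID sequences for a sequence-to-sequence model. The reserved IDs for padding, start, and end markers, and the helper names, are assumptions. The talk does not describe the actual preprocessing. The example pair is the one from the earlier slide.

```python
PAD, SOS, EOS = 0, 1, 2  # assumed reserved IDs

def build_vocab(pairs):
    """Map every character seen in the training pairs to an integer ID."""
    chars = sorted({ch for bad, good in pairs for ch in bad + good})
    return {ch: i + 3 for i, ch in enumerate(chars)}

def encode_chars(text, vocab):
    """Turn a string into a list of IDs wrapped in start and end markers."""
    return [SOS] + [vocab[ch] for ch in text] + [EOS]

def decode_chars(ids, vocab):
    """Invert encode_chars, dropping the reserved marker IDs."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids if i in inverse)

pairs = [(
    "Which coin/token is next big thing in crypto currencies? And why?",
    "Which coin/token is the next big thing in cryptocurrencies? Why?",
)]
vocab = build_vocab(pairs)
source_ids = encode_chars(pairs[0][0], vocab)
target_ids = encode_chars(pairs[0][1], vocab)
print(len(source_ids), len(target_ids))
```

Working at the character level keeps the vocabulary small and lets the model repair spelling and punctuation, at the price of longer sequences.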
• Goal: Given a question, come up with topics that
describe it
• Traditional topic labeling: Lots of text, few topics
• Question-topic labeling: Less text, huge topic space
• Features:
o Question text
o Relation to other questions
o Who asked the question
o etc.
• Goal: Single canonical question per intent
• Duplicate questions:
o Make it harder for readers to seek knowledge
o Make it harder for writers to find questions to
answer
• Semantic question matching. Not simply a syntactic
search problem.
● BNBR = Be Nice, Be Respectful policy
● Binary classifier: Checks for BNBR violations on
questions, answers, comments.
● Training data:
○ Positive: Confirmed BNBR violations
○ Negative: False BNBR reports, other good
content
● Model: NN with 1 hidden layer (fastText)
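A minimal sketch of the forward pass of a fastText-style classifier: average the token embeddings into one hidden vector, then apply a linear layer and a sigmoid. The embeddings, weights, and tokens below are hand-picked toy values for illustration. A real model would learn them from the labeled BNBR data, and this is not the fastText library API.

```python
import math

def predict_violation(tokens, embeddings, out_weights, out_bias):
    """Return an estimated probability that the text violates the policy."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return 0.5  # no known tokens; hypothetical fallback
    dim = len(out_weights)
    # Hidden layer: the average of the token embeddings.
    hidden = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy parameters (assumptions, not learned values).
toy_embeddings = {"idiot": [2.0, 0.0], "thanks": [-2.0, 0.0], "you": [0.0, 0.0]}
toy_weights, toy_bias = [1.5, 0.0], 0.0

print(predict_violation(["you", "idiot"], toy_embeddings, toy_weights, toy_bias))
print(predict_violation(["thanks", "you"], toy_embeddings, toy_weights, toy_bias))
```

Averaging makes the model cheap to train and serve, which suits a check that runs on questions, answers, and comments alike.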
• Goal: Given a question and n answers, come up with
the ideal ranking
• What makes a good answer?
o Truthful
o Reusable
o Well formatted
o Clear and easy to read
o ...
• Features
o Answer features: Quality, Formatting etc.
o Interaction features (upvotes/downvotes, clicks,
comments…)
o Network features: Who interacted with the
answer?
o User features: Credibility, Expertise
o etc.
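A minimal sketch of ranking as "score each answer from its features, then sort". The linear scoring function, the feature names, and the weights are all assumptions made for illustration. The slides list feature families but do not describe the ranking model itself.

```python
def rank_answers(answers, weights):
    """Sort answers by a weighted sum of their features, best first."""
    def score(answer):
        feats = answer["features"]
        return sum(w * feats.get(name, 0.0) for name, w in weights.items())
    return sorted(answers, key=score, reverse=True)

# Hypothetical feature values and weights.
answers = [
    {"id": "a", "features": {"upvotes": 1.0, "author_expertise": 0.2}},
    {"id": "b", "features": {"upvotes": 0.5, "author_expertise": 0.9}},
]
weights = {"upvotes": 1.0, "author_expertise": 2.0}

print([a["id"] for a in rank_answers(answers, weights)])
```

In a learned ranker the hand-set weights would be replaced by a model trained against a notion of answer quality.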
● Machine Learning systems form an important part of what drives Quora
● Lots of interesting Machine Learning problems and solutions all along the question
lifecycle
● Machine Learning helps us make Quora more personalized and relevant to you at scale
ML for Question and Answer Understanding @Quora
