The document presents a proposal to develop a model to detect duplicate questions on Quora using semantic analysis and deep learning techniques. It discusses using the Universal Sentence Encoder (USE), which relies on transformer and deep averaging network (DAN) models, to encode sentences and identify semantically similar question pairs. The proposed system involves preprocessing text, encoding sentences into vectors with USE, and applying DAN and transformer neural networks to classify pairs as duplicates or not duplicates. Experimental results show that deep learning models using sentence embeddings outperform baseline models; the transformer model reaches slightly higher training accuracy than the DAN model (89.16% vs 88.63%), while the DAN model has slightly higher validation accuracy (86% vs 85%).
Quora Duplicate Question Detection Using Semantic Analysis
1. Guided by- Ms. Safa Hamdare
Group Members
Quora Duplicate Question Pair
Detection Using Semantic Analysis
Name Roll No.
Jai Mulye 64
Anshul Pawaskar 87
Tannmay Redij 88
Akshata Talankar 89
St. Francis Institute of Technology
Department of Computer Engineering
Quora Duplicate Question Pair Detection using Semantic Analysis
1 28/05/2021
2. Content
● Introduction
● Literature
● Problem Statement
● Proposed Solution
● Work Flow of the system
● Algorithm with Implementation details
● Experimental Set Up
● Data Set
● Performance Evaluation Parameters
● Validation with Test Cases
● Results & Discussion
● Conclusion
● References
3. Introduction
• What is Quora?
4. Current Scenario:
Quora uses a Random Forest technique to identify duplicate
questions.
Let’s look at two hypothetical questions:
1. Is it true that time flies like an arrow?
2. Do fruit flies like a banana?
These questions share two words, flies and like, yet their
meanings are unrelated, so word overlap alone is not a
reliable signal of duplication.
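The point of the two hypothetical questions can be checked with a few lines of Python. This is only an illustrative sketch: the tokenizer and the Jaccard overlap measure are assumptions chosen for the example, not the features Quora or this project uses.

```python
import re

def tokens(question):
    """Lowercase a question and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", question.lower()))

def jaccard(a, b):
    """Jaccard overlap between the word sets of two questions."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

q1 = "Is it true that time flies like an arrow?"
q2 = "Do fruit flies like a banana?"

# Words that appear in both questions.
shared = tokens(q1) & tokens(q2)
print(sorted(shared))
print(round(jaccard(q1, q2), 3))
```

The shared words are exactly the two named above, even though "flies" is a verb in the first question and a noun in the second. A purely lexical feature cannot see that difference, which is the motivation for sentence-level embeddings.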
6. Literature
• Paper [1] explores the Transformer-based
Universal Sentence Encoder, which relies on the
attention mechanism.
• Paper [2] introduces the Deep Averaging Network,
which performs competitively with neural networks that
model semantic and syntactic compositionality.
7. Literature
• Paper [3] explores the two variants of the
Universal Sentence Encoder: the transformer and
the deep averaging network (DAN).
• Paper [4] analyses several neural network
designs and their variations for sentence pair
modelling and compares their performance
extensively across eight datasets, covering
paraphrase identification, semantic textual similarity,
natural language inference, and question answering
tasks.
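The deep averaging idea mentioned in the literature can be sketched in plain Python: word vectors are averaged, and the mean is passed through feed-forward layers. The embeddings, layer sizes, and weights below are random, hypothetical values for illustration only; a real DAN learns them, and this is not the Universal Sentence Encoder implementation.

```python
import math
import random

def average(vectors):
    """Element-wise mean of equal-length word vectors (the averaging step)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def dan_encode(word_vectors, layers):
    """Average the word vectors, then apply the feed-forward layers."""
    h = average(word_vectors)
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
    return h

rng = random.Random(0)
dim, hidden = 4, 3
words = ["do", "fruit", "flies", "like", "a", "banana"]
# Hypothetical toy embeddings and a single hidden layer with random weights.
vocab = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in words}
layers = [([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)],
           [0.0] * hidden)]

enc_a = dan_encode([vocab[w] for w in words], layers)
enc_b = dan_encode([vocab[w] for w in reversed(words)], layers)

# Averaging ignores word order, so both orderings give the same encoding.
same = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(enc_a, enc_b))
print(same)
```

Because the average does not depend on word order, a model of this shape cannot distinguish reorderings of the same words. Attention-based encoders do not have this particular limitation.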
8. Problem Statement
• On Quora, people may ask a question that has already been
asked, only worded differently. Identifying which new
questions are duplicates of existing ones reduces redundancy
on the platform, removes the manual task of matching
questions to the correct existing answers, and makes it
possible to instantly provide the answers of existing
questions.
• A model is created that predicts whether two entered
questions are similar in meaning, based on a deep learning
approach using the DAN and Transformer models.
9. Proposed Solution
1. Preprocessing
2. Sentence to Vector Conversion (USE)
3. Deep Learning Approach (DAN & Transformer)
Fig 1: Workflow of the System
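The three stages of the workflow can be outlined as a small Python skeleton. Everything here is a stand-in chosen so the sketch runs without downloads: `toy_encode` is a hashed bag-of-words vector, not the Universal Sentence Encoder, and the final stage is a cosine-similarity threshold rather than the trained DAN or Transformer classifier. The function names, the vector size, and the 0.8 threshold are all hypothetical.

```python
import math
import re
import zlib

def preprocess(text):
    """Stage 1: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())

def toy_encode(text, dim=64):
    """Stage 2 stand-in: hashed bag-of-words vector (NOT the real USE model)."""
    vec = [0.0] * dim
    for word in text.split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def is_duplicate(q1, q2, encode=toy_encode, threshold=0.8):
    """Stage 3 stand-in: similarity threshold instead of a trained network."""
    u, v = encode(preprocess(q1)), encode(preprocess(q2))
    return cosine(u, v) >= threshold

print(is_duplicate("What is Quora?", "what is quora"))
```

Passing the encoder as a parameter keeps the skeleton independent of the embedding model, so a real sentence encoder could be substituted for `toy_encode` without touching the other stages.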
10. Work Flow of the system
Fig 2: Architecture Diagram
14. Experimental Setup
Fig 6: Model accuracy of Transformer
Fig 7: Model loss of Transformer
15. Experimental Setup
Fig 8: Model accuracy of DAN
Fig 9: Model loss of DAN
16. Validation with Test cases
20. Results and Discussions
Fig 13: Results by Transformer Model
21. Conclusion
Model                Embedding technique          F1-score (weighted average)  F1-score (macro average)
Logistic Regression  Word2Vec, Similarity scores  0.66                         0.62
Random Forest        Word2Vec, Similarity scores  0.70                         0.69
Table 1: Accuracy of machine learning models
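The two F1 columns in Table 1 differ only in how per-class scores are combined: the macro average weights every class equally, while the weighted average weights each class by its number of true examples. The sketch below computes both from scratch. The label lists are hypothetical and are not data from the experiments.

```python
def f1(tp, fp, fn):
    """F1 for one class from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over all labels present."""
    pairs = list(zip(y_true, y_pred))
    scores, supports = [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(f1(tp, fp, fn))
        supports.append(tp + fn)  # number of true examples of this class
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted

# Hypothetical labels: 1 = duplicate, 0 = not duplicate.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 4), round(weighted, 4))
```

In this toy example the larger class is predicted better than the smaller one, so the weighted average comes out above the macro average. Both rows of Table 1 show a gap in the same direction.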
22. Conclusion
Table 2: Accuracy of deep learning models (DAN & Transformer)
Model           Embedding technique                       Epochs  Training accuracy (%)  Validation accuracy (%)
Neural Network  Universal Sentence Encoder (DAN)          20      88.63                  86
Neural Network  Universal Sentence Encoder (Transformer)  20      89.16                  85
23. Conclusion
• Deep learning models using sentence-level
embeddings outperform the baseline classification
models.
• The DAN model sometimes underperforms on
questions containing double negation.
• The Transformer-based Universal Sentence Encoder can
be used.
24. References
[1] Mueller J, Thyagarajan A. Siamese recurrent architectures for learning sentence similarity. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2016)
[2] Agirre E, Gonzalez-Agirre A, Lopez-Gazpio I, Maritxalar M, Rigau G, Uria L. SemEval-2016 task 2: Interpretable semantic textual similarity. In: Proceedings of the 10th International Workshop on Semantic Evaluation (2016)
[3] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998-6008 (2017)
[4] Cer D, Yang Y, Kong S-Y, et al. Universal Sentence Encoder for English. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. doi: 10.18653/v1/d18-2029 (2018)
[5] https://www.kaggle.com/c/quora-question-pairs/data