This document discusses natural language processing (NLP) and its use for tasks such as search, information extraction, and speech recognition. It contrasts traditional search, which matches keywords, with modern NLP techniques such as vector embeddings, which represent words as vectors in a multi-dimensional space. Vector embeddings make it possible to measure semantic similarity between words and can support applications such as speech recognition. The document also discusses how GPUs accelerate NLP tasks by parallelizing computations, and presents Wavecrafters' solution for providing GPU-accelerated NLP capabilities.
4. Powered by WAVECRAFTERS
Natural Language Processing
Computational techniques used for analysing and representing
text for the purpose of achieving human-like language
processing.
Uses
• Searching
• Information Extraction
• Summarization
• Question Answering
• Customer Interaction
• Sentiment Analysis
• Speech to Text
How do Vector Embeddings work? (III)
Training
• Different training algorithms: GloVe (Pennington, Socher and Manning, Stanford University), Word2Vec (Mikolov et al., Google), Doc2Vec (Le and Mikolov, Google).
• We will shortly release our own GPU-based version of GloVe as open source.
How do Vector Embeddings work? (IV)
• The cosine of the angle between two vectors gives us their semantic closeness.
[Figure: two vector pairs, Ball and Mars, and Ball and Football]
But we can also do much
more! Adding, subtracting…
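As a rough illustration of the cosine measure and of vector arithmetic, here is a minimal Python sketch. The three-dimensional vectors are invented for this example (real embeddings have hundreds of dimensions and come from training), and the helper names `cosine`, `add` and `sub` are our own.

```python
import math

def cosine(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def add(u, v):
    """Component-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    """Component-wise difference of two vectors."""
    return [a - b for a, b in zip(u, v)]

# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "ball":     [0.9, 0.1, 0.0],
    "football": [0.8, 0.3, 0.1],
    "mars":     [0.0, 0.2, 0.9],
}

ball_football = cosine(embeddings["ball"], embeddings["football"])
ball_mars = cosine(embeddings["ball"], embeddings["mars"])
print(f"ball vs football: {ball_football:.3f}")
print(f"ball vs mars:     {ball_mars:.3f}")

# Adding and subtracting: the offset between two words is itself a vector
# that can be applied to another word.
offset = sub(embeddings["football"], embeddings["ball"])
rebuilt = add(embeddings["ball"], offset)
```

With these made-up numbers, "ball" scores much closer to "football" than to "mars", which is the behaviour the cosine measure is meant to capture.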
Why aren’t Vector Embeddings widespread?
• Steep learning curve. Math can be complicated.
• Lots of computational power needed. Slow and expensive.
Advantages of GPUs (II)
[Chart: Semantic closeness to 10,000,000 documents, execution time in ms (lower is better). GPU Execution Time: 115 ms. CPU Execution Time: 11,632 ms.]
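To see why this workload suits a GPU, consider a minimal Python sketch of scoring one query against a set of documents. This is not our GPU implementation: it runs on the CPU, the document vectors are invented, and `normalize` and `closeness_to_all` are hypothetical helper names. The point is that every document's score is computed independently of the others, so the work can be spread across many cores.

```python
import math

def normalize(v):
    """Scale v to unit length so a dot product equals the cosine."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def closeness_to_all(query, documents):
    """Cosine closeness of one query vector to every document vector."""
    q = normalize(query)
    # Each list element depends on a single document only, so all of
    # them could be computed in parallel.
    return [sum(a * b for a, b in zip(q, normalize(d))) for d in documents]

# Hypothetical 2-dimensional document vectors, for illustration only.
documents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = closeness_to_all([1.0, 0.0], documents)

# Speedup implied by the timings reported for 10,000,000 documents.
cpu_ms, gpu_ms = 11632, 115
speedup = cpu_ms / gpu_ms
print(f"speedup: about {speedup:.0f}x")
```

The reported timings (11,632 ms on CPU against 115 ms on GPU) work out to a speedup of roughly two orders of magnitude.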
Speech to Text (II). From Phonemes to Words
[Figure: candidate words A, Ball, Ballet, Bull, Market, Mars, Marsh, with associated probabilities 0.15, 0.12, 0.10, 0.58, 0.24, 0.13, 0.05, 0.03, 0.07, 0.56]
Speech to Text (III). From Phonemes to Words
[Figure: phoneme fragments the, co, te, a, me, ma, lo; Dictionary of probabilities; resulting text: "A ball market is a good chance for investors"]
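A minimal Python sketch of how a dictionary of probabilities can turn candidate words into text. The pairing of words and probabilities below is invented for illustration (it is not the mapping from the slides), and `decode` is a hypothetical helper. It simply picks the most probable candidate for each word slot.

```python
# Hypothetical candidates per word slot, with made-up probabilities.
slots = [
    {"a": 0.90, "the": 0.10},
    {"ball": 0.58, "bull": 0.24, "ballet": 0.10},
    {"market": 0.56, "mars": 0.13, "marsh": 0.05},
]

def decode(slots):
    """Pick the most probable candidate word in every slot."""
    return [max(candidates, key=candidates.get) for candidates in slots]

decoded = decode(slots)
print(" ".join(decoded))
```

Under these assumed numbers the decoder outputs "a ball market": each slot is decided on its own, so a likely-sounding word can win even when the sentence as a whole calls for a different one.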
Speech to Text (VI). Improving the Error Rate
• Do several rounds of processing.
• In each round, use NLP to find the theme of the conversation, then produce a new Language Model (Dictionary) that fits the theme.
• Reprocess the input.
• Costly and slow!
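The rounds described above can be sketched in Python as follows. Everything here is a toy assumption: the acoustic scores, the theme keyword sets, the per-theme weights, and the helper names `decode`, `detect_theme` and `reprocess` are all invented for illustration, and theme detection is reduced to keyword counting rather than real NLP.

```python
# Hypothetical acoustic scores per word slot (invented for illustration).
slots = [
    {"a": 0.90, "the": 0.10},
    {"ball": 0.58, "bull": 0.24, "ballet": 0.10},
    {"market": 0.56, "mars": 0.13, "marsh": 0.05},
    {"investors": 1.00},
]

# Hypothetical themes: keywords to detect them, weights to favour their words.
THEME_KEYWORDS = {
    "finance": {"market", "investors", "bull"},
    "sports": {"ball", "football", "goal"},
}
THEME_WEIGHTS = {
    "finance": {"bull": 3.0, "market": 2.0, "investors": 2.0},
    "sports": {"ball": 3.0, "football": 2.0},
}

def decode(slots, weights=None):
    """Pick the best word per slot, optionally boosted by theme weights."""
    weights = weights or {}
    return [max(c, key=lambda w: c[w] * weights.get(w, 1.0)) for c in slots]

def detect_theme(words):
    """Toy theme detector: the theme sharing the most keywords wins."""
    return max(THEME_KEYWORDS, key=lambda t: len(THEME_KEYWORDS[t] & set(words)))

def reprocess(slots, rounds=3):
    """Decode, detect the theme, re-decode with a themed model, repeat."""
    words = decode(slots)
    for _ in range(rounds):
        theme = detect_theme(words)
        new_words = decode(slots, THEME_WEIGHTS[theme])
        if new_words == words:  # stable result, stop early
            break
        words = new_words
    return words

first_pass = decode(slots)
final = reprocess(slots)
print(" ".join(first_pass))
print(" ".join(final))
```

With these assumed numbers the first pass yields "a ball market investors", the detected theme is finance, and the themed pass corrects it to "a bull market investors". Every round decodes the whole input again, which is why the approach is costly and slow.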
Q&A
Vicente Cuéllar, CEO
vicente.cuellar@wavecrafters.com
647523358
Guillermo Moliní, CTO
guillermo.molini@wavecrafters.com
610477054