SlideShare une entreprise Scribd logo
1  sur  58
NLP - State of the Art
(and some missing parts…)
Liad Magen
Wien, am 27.03.2019
Hello, World!
Israel
computer science
biomedical engineering
NLP & AI Researcher @ enliteAI
Keep-Current Project:
- Machine Learning Seminar
- fast.ai + PyTorch course
data4good
EnliteAI Company Presentation | office@enlite.ai | 20190223 | © EnliteAI 2019
Services by enliteAI
AI Strategy &
Transformation
AI Lab
Prototyping &
Project Delivery
▪ Conduct AI readiness
check across organization
▪ Discover Use-cases to
build-up momentum
▪ Define AI Strategy
▪ Leverage AI Research for
practical applications
▪ Extend research to
production-grade
▪ Build models for commercial
use
▪ Development of POCs, MVPs
and prototypes
▪ Implementation &
Integration
▪ Roll-out & Support
3
data4good Hackathon
Applying data science to human needs and generating
social impact:
● GrünStattGrau
● Hilfswerk International
● Hilfswerk Österreich
● CivesSolutions
April 27-28th, 2019
A1 Telekom, Lassallestraße 9
What is NLP?
Enabling computers to interact with users in their language
NLP is everywhere
Applications:
- Language understanding
- Search engines
- Machine translation
- Text analysis (i.e. sentiment)
- Chatbots / Personal Assistance
NLP Building blocks
- Chunking
- Sequence Tagging
- Part of speech
- Named Entity Recognition (NER)
- Coreference Resolution
- ...
- Syntactic Parsing
- And more...
NLP in a nutshell
Human Language
Non-trivial useful output
takes as input text in human language and process it
in a way that suggests an intelligent process was involved
NLP is hard!
I’m eating pizza with olives
I’m eating pizza with friends
NLP is hard!
Google translate:
State of the art Der letzte Stand der
Technik
State of the art “Country of art”
German
Hebrew
Variability
AmbiguityHe acquired it
He purchased it
He bought it
It was bought by him
It was sold to him
She sold it to him
She sold him that
Twitter:
I loooooove it!
everytime → every time
oscar nom’d doc → Oscar-nominated documentary
is bananas → is great
- by Wei Xu
Bank
Apple
Star
Spring
Play
I’m reading a Book
I will Book the flight
Ambiguity - Real news headlines
● Scientists examine whales from space
● The pope's baby step on gays
● Girl hit by car in hospital
● Iraqi head seeks arms
● Lung cancer in women mushrooms
● Squad helps dog bite victim
● Miners refuse to work after death
● Stolen painting found by tree
● Actor sent to jail for not finishing sentence
● Stadium air conditioning fails — Fans protest
What do you mean no?
Martin is a data scientist
Martin is a great data scientist
What do you mean no?
Martin is a data scientist
Martin is not a data scientist
Martin is a great data scientist
Martin is not a great data scientist
What do you mean no?
Martin is a data scientist
Martin is not a data scientist
Martin is a great data scientist
Martin is not a great data scientist
Martin is not a data scientist
Martin is not a mad scientist
Negation + Monotonicity
John did not work for Microsoft
in 1989
as a VP
of sales
Restrictivity / Intersectivity
He is a great pianist → He is a pianist
He is a fake doctor → He is a doctor
I found my old jacket → I found my jacket
How do we do NLP?
Rule based
Systems
1950s-1990s Corpus based
statistics
1990s-2000s Machine
Learning
2000s-2014 Deep Learning
2014-2020(?)
LinguisticsExperience
Machine Learning Expertise
Transparent Debugability
Black box
How should we do NLP?
Rule based
Systems
1950s-1990s Corpus based
statistics
1990s-2000s Machine
Learning
2000s-2014 Deep Learning
2014-2020(?)
LinguisticsExperience
Machine Learning Expertise
Transparent Debugability
Black box
NLP Tomorrow
Rule based
Systems
1950s-1990s Corpus based
statistics
1990s-2000s Machine
Learning
2000s-2014 Deep Learning
2014-2020(?)
LinguisticsExperience
Machine Learning Expertise
Transparent Debugability
Black box
Humans writing rules
aided by ML/DL -
Resulting in
transparent and
debuggable models
NLP Today
- Chris Manning (Stanford) @ Simon’s institute - 27.3.2017
BiLSTM hagmoney ( > 2015)
Works on low level tasks:
● Coordination boundary prediction:
He will attend the meeting and present the results on Tuesday
He will attend the meeting on Tuesday
He will present the results on Tuesday
● Syntactic Parsing
● And more...
BiLSTM hagmoney ( > 2015)
But less on higher levels:
BiLSTM text generation with style:
“The film’s ultimate pleasure if you want to fall in love with the ending, you won’t be disappointed”
“The film’s simple, and a refreshing take on the complex family drama of the regions of human
intelligence.”
https://arxiv.org/pdf/1707.02633.pdf - Controlling Linguistic Style Aspects in Neural Language Generation
BiLSTM hagmoney ( > 2015)
But less on higher levels:
BiLSTM text generation with style:
“There are some funny parts but overall I didn’t like the first few funny parts, but overall pretty decent .”
“Ultimately, I can honestly say that this movie is full of stupid stupid and stupid stupid stupid stupid stupid.”
https://arxiv.org/pdf/1707.02633.pdf - Controlling Linguistic Style Aspects in Neural Language Generation
NLP in the academy
Multiple stacked networks
Each architecture solves a specific problem
(or a dataset...)
NGrams, TF/IDF, LDA / Topic modeling
Word2vec / GloVe / FastText
ELMo (AI2) / BERT / ERNIE (Baidu)
GPT-2 (OpenAI)
LSTM
spaCy
Flair
& Regular Expressions...
NLP in the industry
20 years of NLP research
Great progress on “low-level” text understanding
tasks:
● Morphology
● Syntax / Parsing
● Semantic Roles.
● Lexical Semantics
● Discourse. Coreference.
● Negation scope
Representation learning, text-to-text semantics,
RNNs
● Word vectors
● Concept vectors
● Sentence vectors
● Paraphrasing
20 years of NLP research
Great progress on “low-level” text understanding
tasks:
● Morphology
● Syntax / Parsing
● Semantic Roles.
● Lexical Semantics
● Discourse. Coreference.
● Negation scope
Representation learning, text-to-text semantics,
RNNs
● Word vectors
● Concept vectors
● Sentence vectors
● Paraphrasing
Remarkable Achievements
Not useful for non-experts
Automatically learn intricate
linguistic patterns
Bad with nuance.
Contains biases.
Hard to control.
Short recup of the last few years
2013
Language model
Learn joint likelihood of training sentences
Word2Vec
… GloVe … FastText …
First use of transfer learning in NLP
2018
ELMo - Contextual word vectors
ULMFiT - transfer learning + fine tuning
BERT - the above & more, on large scale
2019
Transformer-XL
GPT-2
BERT - architecture in brief
Embedding:
BERT - architecture in brief
● Embedding + Masking (Cloze)
● Transformer
● Sentence pairing
● Fine tuning
○ Sentence class Label
○ NER Sequence Labels
● Question Answering:
○ Start+End Vector &
Softmax over all positions
(as additional parameters)
○ Or Running Q + A (for each option)
and training on the class label
Further reading
● Toward data science:
○ Dissecting BERT (parts 1 & 2) - by Miguel Romero Calvo
○ Deconstructing BERT (parts 1 & 2) - by Jesse Vig
● HarvardNLP - The Annotated Transformer
● Jay Alammar - The illustrated BERT, ELMo and co.
● NLP.Stanford.edu/Seminar - Jacob Delvin presentation about BERT
Size matters
● ULMFiT
● ELMo
● GLoMo
● GPT
● BERT
● Transformer-XL
● MT-DNN
● GPT-2
● Contextual representations
● Generalized Language models
● Transfer learning in NLP
Text Generation
Dissecting GPT-2 results
HavardNLP
Giant Language model Test Room
http://gltr.io
NLP Progress - sample domains
Cross lingual Text Classification: pyTorch - LASER
Text Classification (English): ULMFiT
Automatic Speech Recognition (ASR): Google Tensorflow Lingvo
Translation: DeepL / Transformer Big (Facebook + Google)
NER: flair / BERT
Entity linking: DeepType - OpenAI
nlpprogress.com | github.com/syhw/wer_are_we
Q&A Datasets & Benchmarks
General Language Understanding Evaluation (GLUE)
Stanford Natural Language Inference (SNLI)
SQuAD / SQuAD 2.0
RACE (Carnegie Mellon)
HotpotQA, MS Marco, NarrativeQA, CoQA ...
Google AI natural questions dataset
Question Answering - State of the art
Question Answering - State of the art?
Depends on the dataset…
Each information extraction (IE) task is a different NLP application
(Also relevant for chatbots / personal assistants)
Information Extraction
Turning text into structured data
● Most data in the world is in text
● We can only easily analyze structured data
● Today - no easy way to do it on scale
● Possible for specific tasks - but requires
○ Enough data
○ Team of NLP experts
○ Time
Solving IE can be transformative to
industry & science
Question Answering - next steps
In which state was Yahoo founded?
http://ai.stanford.edu/blog/beyond-local-pattern-matching/
Question Answering - next steps
● (Cleverly) add more inductive biases
● Common sense
Possible methods:
● Interactive learning (human in the loop)
● Deep reinforcement learning
● GANs
Still missing...
1. Understanding (Not only NLP-related…)
○ What is captured by a network?
○ What can/’t the networks learn
2. Explainability
3. Fairness & accountability
○ Removing bias (current techniques only “put a lipstick on a pig”)
4. Working with little annotated data
Transferring knowledge across domains
5. Handling missing data
Thank you!
We’re “standing on the shoulders of giants”
- Isaac Newton
Appendix 1 - handling missing data
Handling missing data
- Mary is a great programmer but John isn’t
- The hotel received great reviews at Yelp but mediocre reviews at
TripAdvisor
- Fred took a picture of you and Susan of me
Handling missing data
- Mary is a great programmer but John isn’t _____ (verb phrase ellipsis)
- The hotel received great reviews at Yelp but _____ mediocre reviews at
TripAdvisor (argument clusters)
- Fred took a picture of you and Susan ______ of me (gapping)
(Briding)
Prices of oil
I turn 40 this year
He drove 40 on the highway
I’ll give you 40 for that
I turn 40 (age) this year
He drove 40 (kmh) on the highway
I’ll give you 40 (currency) for that
Handling missing data
● Current end-to-end systems fails on them
● Stanford parser can partially handle several cases
● Work in progress in BIU
Appendix 2
Transferring knowledge across domains
Example - review sentiment analysis
“The soup was delicious” → +
“The steak was tough” → -
“The dessert was decadent” → +
[food] was [good_food_adj]
[food] was [bad_food_adj]
Example - review sentiment analysis
Real world:
the soup that the server recommended was amazing.
the soup, which I expected to be good, was actually pretty bad
the soup, which was served cold and was looking terrible, was surprisingly great.
Example - review sentiment analysis
Syntax to the rescue!
Capture this:
With a single declarative pattern:
Can help to generalize and transfer knowledge across domains
Example - review sentiment analysis
Until it doesn’t:
I didn’t think the food was great:
Would combining rules with deep learning be the solution?
Thank you again!
Human: What do we want?
Computer: Natural-language processing!
Human: When do we want it?
Computer: When do we want what?
Dynamic Word Embeddings
Dynamic Word Embeddings for Evolving Semantic Discovery
https://arxiv.org/pdf/1703.00607.pdf

Contenu connexe

Tendances

Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)WarNik Chow
 
Gpt1 and 2 model review
Gpt1 and 2 model reviewGpt1 and 2 model review
Gpt1 and 2 model reviewSeoung-Ho Choi
 
Comparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP ModelsComparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP Modelssaurav singla
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)H K Yoon
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Edureka!
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERTshaurya uppal
 
Large Language Models - From RNN to BERT
Large Language Models - From RNN to BERTLarge Language Models - From RNN to BERT
Large Language Models - From RNN to BERTATPowr
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)Shuntaro Yada
 
Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopTom Bocklisch
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]Dongmin Choi
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersLiangqun Lu
 
transformer解説~Chat-GPTの源流~
transformer解説~Chat-GPTの源流~transformer解説~Chat-GPTの源流~
transformer解説~Chat-GPTの源流~MasayoshiTsutsui
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 

Tendances (20)

Bert
BertBert
Bert
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 
Gpt1 and 2 model review
Gpt1 and 2 model reviewGpt1 and 2 model review
Gpt1 and 2 model review
 
Comparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP ModelsComparative Analysis of Transformer Based Pre-Trained NLP Models
Comparative Analysis of Transformer Based Pre-Trained NLP Models
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
Large Language Models - From RNN to BERT
Large Language Models - From RNN to BERTLarge Language Models - From RNN to BERT
Large Language Models - From RNN to BERT
 
BERT (v3).pptx
BERT (v3).pptxBERT (v3).pptx
BERT (v3).pptx
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)
 
BERT
BERTBERT
BERT
 
Conversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData WorkshopConversational AI with Rasa - PyData Workshop
Conversational AI with Rasa - PyData Workshop
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
NLP
NLPNLP
NLP
 
transformer解説~Chat-GPTの源流~
transformer解説~Chat-GPTの源流~transformer解説~Chat-GPTの源流~
transformer解説~Chat-GPTの源流~
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 

Similaire à State of the art in Natural Language Processing (March 2019)

Continual Learning: why, how, and when
Continual Learning: why, how, and whenContinual Learning: why, how, and when
Continual Learning: why, how, and whenGabriele Graffieti
 
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfOrtus Solutions, Corp
 
1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...
1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...
1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...IMPACT Centre of Competence
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Fwdays
 
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...AI Frontiers
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learningTheodoros Vasiloudis
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalJeffrey Shomaker
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationJulián Urbano
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabsChetan Khatri
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesZoltan Varju
 
AI and ChatGPT in Online Education
AI and ChatGPT in Online Education AI and ChatGPT in Online Education
AI and ChatGPT in Online Education D2L Barry
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolutioninovex GmbH
 
Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?
Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?
Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?Mustafa Ekim
 
DataScientist Job : Between Myths and Reality.pdf
DataScientist Job : Between Myths and Reality.pdfDataScientist Job : Between Myths and Reality.pdf
DataScientist Job : Between Myths and Reality.pdfJedha Bootcamp
 
The Rise Of Conversational AI with David Low
The Rise Of Conversational AI with David LowThe Rise Of Conversational AI with David Low
The Rise Of Conversational AI with David LowDatabricks
 
GDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning WorkshopGDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning Workshopssuser540861
 

Similaire à State of the art in Natural Language Processing (March 2019) (20)

Continual Learning: why, how, and when
Continual Learning: why, how, and whenContinual Learning: why, how, and when
Continual Learning: why, how, and when
 
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdfITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
ITB_2023_Chatgpt_Box_Scott_Steinbeck.pdf
 
1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...
1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...
1. 'Interoperability. A quick chat, a few war stories'. Carl Wilson, Open Pla...
 
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
Machine Translation: The Neural Frontier
Machine Translation: The Neural FrontierMachine Translation: The Neural Frontier
Machine Translation: The Neural Frontier
 
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...
Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Un...
 
A few questions about large scale machine learning
A few questions about large scale machine learningA few questions about large scale machine learning
A few questions about large scale machine learning
 
Nlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_finalNlp 2020 global ai conf -jeff_shomaker_final
Nlp 2020 global ai conf -jeff_shomaker_final
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR Evaluation
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
 
AI and ChatGPT in Online Education
AI and ChatGPT in Online Education AI and ChatGPT in Online Education
AI and ChatGPT in Online Education
 
Machine learning 101 Talk at Freshworks
Machine learning 101 Talk at FreshworksMachine learning 101 Talk at Freshworks
Machine learning 101 Talk at Freshworks
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolution
 
Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?
Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?
Yenikod Yazılım Kursu - Kodlama Öğrenebilir Miyim? Kodlama Bana Göre Mi?
 
DataScientist Job : Between Myths and Reality.pdf
DataScientist Job : Between Myths and Reality.pdfDataScientist Job : Between Myths and Reality.pdf
DataScientist Job : Between Myths and Reality.pdf
 
The Rise Of Conversational AI with David Low
The Rise Of Conversational AI with David LowThe Rise Of Conversational AI with David Low
The Rise Of Conversational AI with David Low
 
GDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning WorkshopGDSC Introduction to Deep Learning Workshop
GDSC Introduction to Deep Learning Workshop
 

Dernier

原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 

Dernier (20)

原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 

State of the art in Natural Language Processing (March 2019)

  • 1. NLP - State of the Art (and some missing parts…) Liad Magen Wien, am 27.03.2019
  • 2. Hello, World! Israel computer science biomedical engineering NLP & AI Researcher @ enliteAI Keep-Current Project: - Machine Learning Seminar - fast.ai + PyTorch course data4good
  • 3. EnliteAI Company Presentation | office@enlite.ai | 20190223 | © EnliteAI 2019 Services by enliteAI AI Strategy & Transformation AI Lab Prototyping & Project Delivery ▪ Conduct AI readiness check across organization ▪ Discover Use-cases to build-up momentum ▪ Define AI Strategy ▪ Leverage AI Research for practical applications ▪ Extend research to production-grade ▪ Build models for commercial use ▪ Development of POCs, MVPs and prototypes ▪ Implementation & Integration ▪ Roll-out & Support 3
  • 4. data4good Hackathon Applying data science to human needs and generating social impact: ● GrünStattGrau ● Hilfswerk International ● Hilfswerk Österreich ● CivesSolutions April 27-28th, 2019 A1 Telekom, Lassallestraße 9
  • 5. What is NLP? Enabling computers to interact with users in their language
  • 6. NLP is everywhere Applications: - Language understanding - Search engines - Machine translation - Text analysis (i.e. sentiment) - Chatbots / Personal Assistance
  • 7. NLP Building blocks - Chunking - Sequence Tagging - Part of speech - Named Entity Recognition (NER) - Coreference Resolution - ... - Syntactic Parsing - And more...
  • 8. NLP in a nutshell Human Language Non-trivial useful output takes as input text in human language and process it in a way that suggests an intelligent process was involved
  • 9. NLP is hard! I’m eating pizza with olives I’m eating pizza with friends
  • 10. NLP is hard! Google translate: State of the art Der letzte Stand der Technik State of the art “Country of art” German Hebrew
  • 11. Variability AmbiguityHe acquired it He purchased it He bought it It was bought by him It was sold to him She sold it to him She sold him that Twitter: I loooooove it! everytime → every time oscar nom’d doc → Oscar-nominated documentary is bananas → is great - by Wei Xu Bank Apple Star Spring Play I’m reading a Book I will Book the flight
  • 12. Ambiguity - Real news headlines ● Scientists examine whales from space ● The pope's baby step on gays ● Girl hit by car in hospital ● Iraqi head seeks arms ● Lung cancer in women mushrooms ● Squad helps dog bite victim ● Miners refuse to work after death ● Stolen painting found by tree ● Actor sent to jail for not finishing sentence ● Stadium air conditioning fails — Fans protest
  • 13. What do you mean no? Martin is a data scientist Martin is a great data scientist
  • 14. What do you mean no? Martin is a data scientist Martin is not a data scientist Martin is a great data scientist Martin is not a great data scientist
  • 15. What do you mean no? Martin is a data scientist Martin is not a data scientist Martin is a great data scientist Martin is not a great data scientist Martin is not a data scientist Martin is not a mad scientist
  • 16. Negation + Monotonicity John did not work for Microsoft in 1989 as a VP of sales
  • 17. Restrictivity / Intersectivity He is a great pianist → He is a pianist He is a fake doctor → He is a doctor I found my old jacket → I found my jacket
  • 18. How do we do NLP? Rule based Systems 1950s-1990s Corpus based statistics 1990s-2000s Machine Learning 2000s-2014 Deep Learning 2014-2020(?) LinguisticsExperience Machine Learning Expertise Transparent Debugability Black box
  • 19. How should we do NLP? Rule based Systems 1950s-1990s Corpus based statistics 1990s-2000s Machine Learning 2000s-2014 Deep Learning 2014-2020(?) LinguisticsExperience Machine Learning Expertise Transparent Debugability Black box
  • 20. NLP Tomorrow Rule based Systems 1950s-1990s Corpus based statistics 1990s-2000s Machine Learning 2000s-2014 Deep Learning 2014-2020(?) LinguisticsExperience Machine Learning Expertise Transparent Debugability Black box Humans writing rules aided by ML/DL - Resulting in transparent and debuggable models
  • 21. NLP Today - Chris Manning (Stanford) @ Simon’s institute - 27.3.2017
  • 22. BiLSTM hagmoney ( > 2015) Works on low level tasks: ● Coordination boundary prediction: He will attend the meeting and present the results on Tuesday He will attend the meeting on Tuesday He will present the results on Tuesday ● Syntactic Parsing ● And more...
  • 23. BiLSTM hagmoney ( > 2015) But less on higher levels: BiLSTM text generation with style: “The film’s ultimate pleasure if you want to fall in love with the ending, you won’t be disappointed” “The film’s simple, and a refreshing take on the complex family drama of the regions of human intelligence.” https://arxiv.org/pdf/1707.02633.pdf - Controlling Linguistic Style Aspects in Neural Language Generation
  • 24. BiLSTM hagmoney ( > 2015) But less on higher levels: BiLSTM text generation with style: “There are some funny parts but overall I didn’t like the first few funny parts, but overall pretty decent .” “Ultimately, I can honestly say that this movie is full of stupid stupid and stupid stupid stupid stupid stupid.” https://arxiv.org/pdf/1707.02633.pdf - Controlling Linguistic Style Aspects in Neural Language Generation
  • 25. NLP in the academy Multiple stacked networks Each architecture solves a specific problem (or a dataset...) NGrams, TF/IDF, LDA / Topic modeling Word2vec / GloVe / FastText ELMo (AI2) / BERT / ERNIE (Baidu) GPT-2 (OpenAI) LSTM spaCy Flair & Regular Expressions... NLP in the industry
  • 26. 20 years of NLP research Great progress on “low-level” text understanding tasks: ● Morphology ● Syntax / Parsing ● Semantic Roles. ● Lexical Semantics ● Discourse. Coreference. ● Negation scope Representation learning, text-to-text semantics, RNNs ● Word vectors ● Concept vectors ● Sentence vectors ● Paraphrasing
  • 27. 20 years of NLP research Great progress on “low-level” text understanding tasks: ● Morphology ● Syntax / Parsing ● Semantic Roles. ● Lexical Semantics ● Discourse. Coreference. ● Negation scope Representation learning, text-to-text semantics, RNNs ● Word vectors ● Concept vectors ● Sentence vectors ● Paraphrasing Remarkable Achievements Not useful for non-experts Automatically learn intricate linguistic patterns Bad with nuance. Contains biases. Hard to control.
  • 28. Short recup of the last few years
  • 29. 2013 Language model Learn joint likelihood of training sentences Word2Vec … GloVe … FastText … First use of transfer learning in NLP
  • 30. 2018 ELMo - Contextual word vectors ULMFiT - transfer learning + fine tuning BERT - the above & more, on large scale 2019 Transformer-XL GPT-2
  • 31. BERT - architecture in brief Embedding:
  • 32. BERT - architecture in brief ● Embedding + Masking (Cloze) ● Transformer ● Sentence pairing ● Fine tuning ○ Sentence class Label ○ NER Sequence Labels ● Question Answering: ○ Start+End Vector & Softmax over all positions (as additional parameters) ○ Or Running Q + A (for each option) and training on the class label
  • 33. Further reading ● Toward data science: ○ Dissecting BERT (parts 1 & 2) - by Miguel Romero Calvo ○ Deconstructing BERT (parts 1 & 2) - by Jesse Vig ● HarvardNLP - The Annotated Transformer ● Jay Alammar - The illustrated BERT, ELMo and co. ● NLP.Stanford.edu/Seminar - Jacob Delvin presentation about BERT
  • 34. Size matters ● ULMFiT ● ELMo ● GLoMo ● GPT ● BERT ● Transformer-XL ● MT-DNN ● GPT-2 ● Contextual representations ● Generalized Language models ● Transfer learning in NLP
  • 35. Text Generation Dissecting GPT-2 results HavardNLP Giant Language model Test Room http://gltr.io
  • 36. NLP Progress - sample domains Cross lingual Text Classification: pyTorch - LASER Text Classification (English): ULMFiT Automatic Speech Recognition (ASR): Google Tensorflow Lingvo Translation: DeepL / Transformer Big (Facebook + Google) NER: flair / BERT Entity linking: DeepType - OpenAI nlpprogress.com | github.com/syhw/wer_are_we
  • 37. Q&A Datasets & Benchmarks General Language Understanding Evaluation (GLUE) Stanford Natural Language Inference (SNLI) SQuAD / SQuAD 2.0 RACE (Carnegie Mellon) HotpotQA, MS Marco, NarrativeQA, CoQA ... Google AI natural questions dataset
  • 38. Question Answering - State of the art
  • 39. Question Answering - State of the art? Depends on the dataset… Each information extraction (IE) task is a different NLP application (Also relevant for chatbots / personal assistants)
  • 40. Information Extraction Turning text into structured data ● Most data in the world is in text ● We can only easily analyze structured data ● Today - no easy way to do it on scale ● Possible for specific tasks - but requires ○ Enough data ○ Team of NLP experts ○ Time
  • 41. Solving IE can be transformative to industry & science
  • 42. Question Answering - next steps In which state was Yahoo founded? http://ai.stanford.edu/blog/beyond-local-pattern-matching/
  • 43. Question Answering - next steps ● (Cleverly) add more inductive biases ● Common sense Possible methods: ● Interactive learning (human in the loop) ● Deep reinforcement learning ● GANs
  • 44. Still missing... 1. Understanding (Not only NLP-related…) ○ What is captured by a network? ○ What can/’t the networks learn 2. Explainability 3. Fairness & accountability ○ Removing bias (current techniques only “put a lipstick on a pig”) 4. Working with little annotated data Transferring knowledge across domains 5. Handling missing data
  • 45. Thank you! We’re “standing on the shoulders of giants” - Isaac Newton
  • 46. Appendix 1 - handling missing data
  • 47. Handling missing data - Mary is a great programmer but John isn’t - The hotel received great reviews at Yelp but mediocre reviews at TripAdvisor - Fred took a picture of you and Susan of me
  • 48. Handling missing data - Mary is a great programmer but John isn’t _____ (verb phrase ellipsis) - The hotel received great reviews at Yelp but _____ mediocre reviews at TripAdvisor (argument clusters) - Fred took a picture of you and Susan ______ of me (gapping) (Briding) Prices of oil
  • 49. I turn 40 this year He drove 40 on the highway I’ll give you 40 for that
  • 50. I turn 40 (age) this year He drove 40 (kmh) on the highway I’ll give you 40 (currency) for that
  • 51. Handling missing data ● Current end-to-end systems fails on them ● Stanford parser can partially handle several cases ● Work in progress in BIU
  • 53. Example - review sentiment analysis “The soup was delicious” → + “The steak was tough” → - “The dessert was decadent” → + [food] was [good_food_adj] [food] was [bad_food_adj]
  • 54. Example - review sentiment analysis Real world: the soup that the server recommended was amazing. the soup, which I expected to be good, was actually pretty bad the soup, which was served cold and was looking terrible, was surprisingly great.
  • 55. Example - review sentiment analysis Syntax to the rescue! Capture this: With a single declarative pattern: Can help to generalize and transfer knowledge across domains
  • 56. Example - review sentiment analysis Until it doesn’t: I didn’t think the food was great: Would combining rules with deep learning be the solution?
  • 57. Thank you again! Human: What do we want? Computer: Natural-language processing! Human: When do we want it? Computer: When do we want what?
  • 58. Dynamic Word Embeddings Dynamic Word Embeddings for Evolving Semantic Discovery https://arxiv.org/pdf/1703.00607.pdf