SlideShare une entreprise Scribd logo
1  sur  105
Télécharger pour lire hors ligne
Weakly Supervised
Natural Language Understanding
Ni Lao
mosaix.ai
11.11.2018
Collaborators: Liang Chen, Fan Yang, Jonathan Berant, Mohammad Norouzi, Quoc Le, William Cohen, Kenneth ForbusFor completeness a large part of the tutorial is from previous works.
Thanks to Chen Liang for helping with creating these slides
Speaker Background
● BS from Tsinghua U. EE (1999 - 2003) Worked on the TsinghuAeolus system, and
won world champion in RoboCup simulation league in 2001 and 2002
● MS from Tsinghua U. CS (2003 - 2006) Worked at Microsoft Research Asia on
automatic OS diagnosis, Web search and product search
● PhD from Carnegie Mellon U. (2006 - 2012) Researched on IR, ML, NLP. Worked
on the CMU JAVELIN QA system, and Never-Ending Learning (NELL) system
● Research Scientist Google (2012 - 2017) Researched on KG construction,
semantic parsing. Worked on KG and Web-based QA products
● Chief Scientist & Co-founder Mosaix.ai (2018 - ) Research on semantic parsing
and text understanding for search and NLU services. And we are hiring!
2
Plan
● Weak Supervision NLP
○ NLP, AI, software 2.0
○ Semantics as a foreign language
○ Unsupervised learning
○ Knowledge representation (symbolism)
● Semantic Parsing Tasks
○ WebQuestionsSP, WikiTableQuestions
● Neural Symbolic Machines (ACL 2017)
○ Compositionality (short term memory)
○ Scalable KB inference (symbolism)
○ RL vs MLE
● Memory Augmented Policy Optimization (NIPS 2018)
○ Experience replay (long term memory & optimal updating strategy)
○ Systematic exploration
○ Memory Weight Clipping (unbiased cold start strategy)
Access slides and join discussions at
weakly-supervised-nlu google group
Mobile
Desktop
Natural Language Processing (NLP)
● Enables machines to understand and assists human
● A major problem of AI (AI-complete)
● Leads to new theories in cognitive science
There can be two underlying motivations for building a
computational theory. The technological goal is simply to
build better computers, and any solution that works would be
acceptable. The cognitive goal is to build a computational
analog of the human-language-processing mechanism; such
a theory would be acceptable only after it had been verified
by experiment. -- James Allen,1987
4
Adoption Of Voice Technology
● Google’s Speech Internationalization
Project: From 1 to 300 Languages and
Beyond [Pedro J. Moreno, 2012]
● My daughter adopted YouTube voice
command since 2 years old
● 20% of the U.S. population has access to
smart speakers [Techcrunch, 2018]
● Rising adoption in the Asia Pacific
[GoVocal.AI 2018]
5
Language understanding for AI and humanity
If you got a billion dollars to
spend on a huge research
project that you get to lead,
what would you like to do?
-- r/CyberByte, 2015
NLP is fascinating, allowing us to focus on
highly-structured inference problems, on
issues that go to the core of "what is
thought" but remain eminently practical,
and on a technology that surely would
make the world a better place.
-- Michael I Jordan
What kind of impact you
hope deep learning has on
our future?
-- Steve Paikin, 2016
I hope it allows Google to ... search by the
content of the document rather than by
the words in the document ... I hope it will
make for intelligent personal systems,
who can answer questions in a sensible
way ... It will make computers much easier
to use. Because you’ll be able to just say
to your computer “print this damn thing”
-- Geoffrey Hinton
● Experts with different views of AI agree on the potential of NLP
6
What is understanding?
"If they find a parrot who could answer
to everything, I would claim it to be an
intelligent being without hesitation.",
-- Alan Turing, 1950
Does the machine literally "understand"
Chinese ? Or is it merely simulating the
ability to understand Chinese?
-- John Searle, 1980
The Chinese Room ArgumentThe Imitation Game
7
Full Supervision NLP
● Traditionally NLP is a
labor-intensive business
Applications
Discourse Processing
Semantic Parsing
Syntactic analysis
Morphological analysis
ASR OCR
speech text
Rules written by
1000 PhDs
NLP algorithms
simply lookup
8
Software 2.0
1. specify some goal on the behavior
○ e.g., “satisfy input output pairs of examples”,
○ e.g.,“win a game of Go”
2. write a rough skeleton of the code that
identifies a subset of program space to search
○ e.g. a neural net architecture
3. use the computational resources at disposal to
search this space for a program that works.
Death of feature engineering. (The) users of the
software will (play) a direct role in building it. Data
labeling is a central component to system design.
[Karpathy 2017;
Watson 2017;
Ratner+ 2018]
9
Where does knowledge come from?
domain experts
(bottlenecks) expert systems with
knowledge bases
the world
● Can only cover the most popular semantics used by human
10
Weak Supervision NLP
● Avoid the knowledge acquisition bottleneck with machine learning
● Then we can cover all possible semantics used by human
machine learning
intelligent systems
with knowledge
end to end examples
(e.g., QA pairs)
11
Language & Reasoning
● The formalist view
○ Language was primarily invented for reasoning
[Everaert+ 2015]
● The functionalist view
○ Language is for communication [Kirby 2017]
● Cognitive coupling hypothesis
○ sequential processing is “necessary for behaviours
such as primate tool use, navigation, foraging and
social action.” [Kolodny & Edelman 2018]
12
Parent
Gender Gender
NameBart
Simpson
Reasoning is needed to understand text
Bart's father
is Homer
[Liang+ 2017]
HomerMarge
13
Semantics is a language for computation
“impressionist painters during the 1920s”
painters [/painting] !/art_forms
impressionist <visual_artist> x.[/associated_periods_or_movements = /impressionism]
<artist> during the 1920s x.[/date_of_work < 1930; /date_of_work > 1920] 14
Freebase, DBpedia, YAGO, NELL
Largest city in US?
GO
(Hop V1 CityIn)
(Argmax V2 Population)
RETURN
NYC
Semantics is a language for computation
Semantic
parsing
Computation
15
Semantics as a foreign language
1) Natural languages are programming
languages to control human behavior
(either others or self)
2) For machines and human to understand
each other, they just need translation
models trained with control theory
16
NLU with Unsupervised Learning
● Supervised learning needs either
feature engineering for compact
representation or large amount of
labeled data
Text
Intermediate
representation
labels
● Unsupervised learning produces
better representations and reduces
the labeling cost, and it is easily
transferable!
learning
Feature
engineering
Text
Intermediate
representation
labels
learning
learning
17
LeCun’s Cake
NIPS [LeCun 2016]
18
Pre-Trained Word Embeddings
● learning (non-contextual) word embeddings from text
○ matrix factorization on global word-word co-occurrence counts [1990]
○ local context window methods [2014] will come back to this in 2018
○ weighted least squares on global word-word co-occurrence counts [2014]
● Co-occurrence statistics as an efficient approximation to the original text
LSA [Deer-wester+ 1990]
skip-gram [Mikolov + 2013]
Glove [Pennington+ 2014]
19
Pre-Trained Word Embeddings
Compact representations for syntax, common sense and world knowledge
LSA [Deer-wester+ 1990]
skip-gram [Mikolov + 2013]
Glove [Pennington+ 2014]
Glove [Pennington+ 2014] 20
Pre-Trained Sequence Models
With a lot more computational power we have language modeling sequence models which
converts sequences of tokens to sequences of contextual embeddings
[Devlin+ 2018]
[Radford+ 2018]
[Peters+ 2018]
21
Pre-Trained Sequence Models
Differences in pre-training model architectures.
BERT uses a bidirectional Transformer. OpenAI GPTuses a left-to-right Transformer.
ELMo uses the concatenation of independently trained left-to-right and right-to-left LSTM.
[Devlin+ 2018]
[Radford+ 2018]
[Peters+ 2018]
22
Pre-Trained Sequence Models
2018 is the year of Pre-Trained Sequence Models
[Devlin+ 2018]
[Radford+ 2018]
[Peters+ 2018]
23
Knowledge Representation
& Scalability
a small machine which
copies the complexity of
the world to the brain
24
a suitable
representationthe world
Mandelbrot Set
the nature
of complex
numbers
25
z0
= 0
Internet as an external memory
● How information
should be organized
for scalability?
[Vannevar Bush 1945]
26
The scalability of modern search engines
● Can respond to user's requests
within a fraction of a second
● But are weak at text understanding
and complex reasoning
[Brin & Page 1998]
27
Can we search entities on the web?
● Multimedia, business, products have a lot of reviews and descriptions
28
Traditional IR approach lacks understanding
● Need to interpret the meaning from the surface text
29
Question answering as a simple test bed
● A good semantic representation should support reasoning at scale
Text
Question
Answer
Knowledge
Store
Program
Generation
Execution
Latent
Observed
30
Three approaches to generative models
● Autoregression (e.g., LM), VAE, GAN
[Hochreiter & Schmidhuber 1997][Graves 2013][Sutskever, Vinyals, Le 2014][Cho+ 2014]
[Kingma, Welling 2014]
[Goodfellow+ 2014]
Blog [Karpathy+ 2016]
Autoregressive models (e.g. LM)
[Hochreiter & Schmidhuber 1997]
Graves [1308.0850]
31
Immediately criticised when applied to text
● "I have a lot of respect for language. Deep-learning people seem not to"
● "They include such impressive natural language sentences as:"
○ * what everything they take everything away from
○ * how is the antoher headache
○ * will you have two moment ?
○ * This is undergoing operation a year .
● "These are not even grammatical!"
● The DNN bubble consists of models, which show
great promises but not yet practical at this point
Blog [Goldberg 2017]
[Rajeswar+ 2017]
32
statistician's view v.s. linguist's view
Given the power of deep learning anything
can be mapped to a unit Gaussian ball
Z
The world has real structures, which need
to be represented by real structures
Seq2Seq [Sutskever, Vinyals, Le 2014]
VAE [Kingma & Welling 2014]
GAN [Goodfellow+ 2014]
ACL [Goldberg 2015]
ACL [Mooney 2015]
furious
33
Scalability of mammal memory
● Very rapid adaptation (in just one or a few trials) is necessary for survival
○ E.g., associating taste of food and sickness
● Need fast responses based on large amount of knowledge
○ Needs good representation of knowledge
● However, good representation can only be learnt gradually
○ During sleeps to prevent interference with established associations
[Garcia+ 1966]
[Wickman 2012]
[Bartol+ 2015]34
Scalability of Cognitive Architectures
https://soar.eecs.umich.edu
● The design of mammalian brains
is inspiring to NLP systems
○ they are solving similar problems
● The design has not changed much
since 30 years ago
○ “We’ve totally solved it already … it is
just a matter of job security”
-- Nate Derbinsky, Northeastern U.
● Today we have
○ internet economy and data
○ computation and ML development
35
Plan
● Weak Supervision NLP
○ NLP, AI, software 2.0
○ Semantics as a foreign language
○ Unsupervised learning
○ Knowledge representation (symbolism)
● Semantic Parsing Tasks
○ WebQuestionsSP, WikiTableQuestions
● Neural Symbolic Machines (ACL 2017)
○ Compositionality (short term memory)
○ Scalable KB inference (symbolism)
○ RL vs MLE
● Memory Augmented Policy Optimization (NIPS 2018)
○ Experience replay (long term memory & optimal updating strategy)
○ Systematic exploration
○ Memory Weight Clipping (unbiased cold start strategy)
Access slides and join discussions at
weakly-supervised-nlu google group
Mobile
Desktop
Semantic parsing
Semantic Parsing
Knowledge Base
Natural language
Logical form /
Program
Desired behavior /
Answer
[Berant+ 2013]
[Liang 2013]
● Natural language queries or
commands are converted to
computation steps on data
and produce the expected
answers or behavior
Ful p is
(ha t le )
We k er on
(e s co c )
Execution
37
● Training from full supervision is labor-intensive
○ DeepCoder [Balog, 2016]
○ NPI [Reed & Freitas, 2015]
○ Seq2Tree [Dong & Lapata, 2016]
○ …
● Traditional semantic parsing models require feature engineering
○ SEMPRE [ Berant et al, 2013]
○ STAGG [Yih et al, 2015]
○ …
● End-to-end differentiable models cannot scale to large databases
○ Neural Programmer [Neekalantan et al, 2015]
○ Neural Turing Machines [Graves et al, 2014]
○ Neural GPU [Kaiser & Sutskever, 2015]
○ …
● Combine deep learning, symbolic reasoning and reinforcement learning
○ Neural Symbolic Machines (Liang, Berant, Le, Forbus, Lao, 2017)
○ Memory Augmented Policy Optimization (MAPO) (Liang, Norouzi, Berant, Le, Lao, 2018)
Related Works
38
Question Answering with Knowledge Base
Freebase, DBpedia, YAGO, NELL
Largest city in US?
GO
(Hop V1 CityIn)
(Argmax V2 Population)
RETURN
NYC
Compositionality Large Search Space
(Optimization)
E.g., Freebase:
23K predicates,
82M entities,
417M triplets
Paraphrase
Many ways to ask the same
question, e.g.,
“What was the date that
Minnesota became a state?”
“When was the state
Minnesota created?”
39
WebQuestions: motivation
Motivation: Natural language interface to
large structured knowledge-bases
Background: availability of many structured
datasets (Google KG, Bing Satori, Freebase,
DBPedia, Yelp, ...)
[Singhal, Google 2012]
[Qian, Bing 2013]
[Berant+ 2013]
Introducing the Knowledge
Graph: things, not strings
40
WebQuestions: getting questions
[Berant+ 2013]
Where was Barack Obama born?
Where was __ born?
Google Suggest
Barack Obama
Lady Gaga
Steve Jobs
Where was Steve Jobs born?
Where was Steve Jobs __?
Google Suggest born raised
on the Forbes list
Where was Steve Jobs raised?
Goal: collect large number of natural language queries
Strategy: breadth-first search over Google Suggest graph results in 1M queries
41
Motivation: overcome the limitations of
computers -- machines struggle to absorb
knowledge in the way humans do.
Solution: a large collaborative knowledge
base consisting of data (int triple format)
composed mainly by its community
members.
Later acquired by Google (discontinued)
[Sproat 2016]
A network of facts. © Metaweb/Google
41M entities (nodes)
19K properties (edge labels)
596M assertions (edges)
42
WebQuestions: getting Freebase answers
Goal: obtain label from non-experts
Strategy: Amazon Mechanical Turk (AMT)
● Given a query e.g. “Eric Clapton
hometown” detect if there is a single
named entity “Eric Clapton”
● Ask the worker to pick one (or more)
of the possible entities/values
(“Ripley”) on the entity Freebase page
● Cost $0.03 per question
[Berant+ 2013]
Freebase.com page circa 2011 © Metaweb/Google
43
WebQuestionsSP Dataset
● 5,810 questions from Google Suggest API & Amazon MTurk1
● Remove invalid QA pairs2
● 3,098 training examples, 1,639 testing examples remaining
● Open-domain and contains grammatical error
● Multiple entities as answer => macro-averaged F1
[Berant et al, 2013; Yih et al, 2016]
• What do Michelle Obama do for a living? writer, lawyer
• What character did Natalie Portman play in Star Wars? Padme Amidala
• What currency do you use in Costa Rica? Costa Rican colon
• What did Obama study in school? political science
• What killed Sammy Davis Jr? throat cancer
Grammatical error Multiple entities
44
SEMPRE: paraphrase
Collect entity pair observations for phrases and augmented predicates
(which partially solves the search problem)
[Berant+ 2013]
[Lin+ 2012]
[Fader+ 2011]
15M triples (of 15k phrases)
(Barack Obama, was born in, Honolulu)
(Albert Einstein, was born in, Ulm)
(Barack Obama, lived in, Chicago)
... ...
ClueWeb09
1 billion docs ReVerb
600M triples (of 60k augmented predicates)
(BarackObama, PlaceOfBirth, Honolulu)
(Albert Einstein, PlaceOfBirth, Ulm)
(BarackObama, PlacesLived.Location, Chicago)
... ...
45
SEMPRE: paraphrase
● Construct lexicon and
alignment features based on
entity pair cooccurrences
[Berant+ 2013]
46
SEMPRE: compositionality
The semantic structure (logic form) is coupled with syntactic structure
(surface patterns) through composition rules
[Berant+ 2013]
One derivation
47
SEMPRE: search: bridging
Motivation: individual words can
be highly ambiguous
• What government does Chile have?
• What is Italy’s language?
• Where is Beijing?
• What is the cover price of X-men?
Solution: the bridging operation
generates predicates compatible
with neighboring predicates
[Berant+ 2013]
48
SEMPRE: modeling
Log linear model over derivations
d given utterance x with expert
designed features
[Berant+ 2013]
49
STAGG: paraphrase
3 matching models based on char-ngram conv nets (Sent2vec [Shen+ 14])
[Yih+ 2015]
[Shen+ 2014]
[Gabrilovich+ 2013]
   
Pattern-Chain: who voiced meg on <e> cast-actor
Question-EntPred: who voiced meg on family guy Meg Griffin cast-actor
ClueWeb12: voiced homer on <e> cast-actor
50
STAGG: search
Staged Query Graph Generation
[Yih+ 2015]
[Yang & Chang 2015]
“Who first voiced Meg on Family Guy?”
1) decide the topic entity
2) decide the core inference chain
3) add type/aggregation constraints
51
STAGG: search
keeps up to N candidate states in
the priority queue (N = 1000)
+10 special rules to restrict the
type/aggregation constraints
[Yih+ 2015]
Domain knowledge is often
very important for
structure search problems.
Will revisit in WikiTableQ
e.g. Consider “to” predicates (indicating the
ending time of an event) only when the
question contains “last”, “latest” or “newest”
52
STAGG: modeling
Learning to rank model (for
query graph candidates) based
on expert designed features
+10 special features on the
type/aggregation constraints
[Yih+ 2015]
[Burges 2010]
53
Brief Summary & Preview
DL leads to the death of feature engineering (but not domain knowledge)
DL makes it easier to leverage pre-trained unsupervised models
SEMPRE/STAGG This talk
Paraphrase
(semi-supervised)
Co-occurrences collected from 1B docs
(ClueWeb) and Freebase
Embeddings trained from 840B
text tokens (GloVe)
Compositionality Relies on syntactic structure (text spans)
and domain specific rules to constrain the
generation of logic forms
A (LISP like) language specify
computations on KG
Search priority queue & heuristic rules RL & a program interpreter with
syntactical & semantic checks
Modeling Expert designed features Deep learning
54
WikiTableQuestions: Dataset
Breadth
● No fixed schema: Tables at test time
are not seen during training, needs to
generalize based on column name.
Depth
● More compositional questions, thus
require longer programs
● More operations like arithmetic
operations and aggregation operations
[Pasupat & Liang, 2015]
55
WikiTableQuestions: semantics
[Pasupat & Liang, 2015]
[Liang+ 2018]
56
WikiTableQuestions: semantics
*
[Pasupat & Liang, 2015]
[Liang+ 2018]
57
WikiTableQuestions: search
Feature engineering is dead.
It is survived by program space design.
[Pasupat & Liang, 2015]
[Zhang, Pasupat & Liang, 2017]
[Liang+ 2018]
(when t-alternative
(rule $AnchoredOr ($LEMMA_TOKEN) (FilterTokenFn lemma and or) (anchored 1))
...
(when t-movement
(rule $AnchoredMovement ($LEMMA_TOKEN) (FilterTokenFn lemma next previous after before above below) (anchored 1))
...
(when t-comparison
(rule $AnchoredMore ($LEMMA_TOKEN) (FilterTokenFn lemma more than least above after) (anchored 1))
...
(when t-superlative
(rule $SuperlativeTrigger ($LEMMA_TOKEN) (FilterPosTagFn token JJR JJS RBR RBS) (anchored 1))
(rule $SuperlativeTrigger ($LEMMA_TOKEN) (FilterTokenFn lemma top first bottom last) (anchored 1))
...
(when t-arithmetic
(rule $AnchoredSub ($LEMMA_TOKEN) (FilterTokenFn lemma difference between and much) (anchored 1))
…
... 58
WikiTableQuestions: example solutions
[Liang+, 2018]
59
More examples
[Liang+, 2018]
60
Neural Program Induction & Scalabilities
[Reed & Freitas 2015]
● The learned operations are not as
scalable and precise.
● Why not use existing modules that
are scalable, precise and
interpretable?
● Impressive works to show NN can
learn addition and sorting, but...
[Zaremba & Sutskever 2016]
61
Plan
● Weak Supervision NLP
○ NLP, AI, software 2.0
○ Semantics as a foreign language
○ Unsupervised learning
○ Knowledge representation (symbolism)
● Semantic Parsing Tasks
○ WebQuestionsSP, WikiTableQuestions
● Neural Symbolic Machines (ACL 2017)
○ Compositionality (short term memory)
○ Scalable KB inference (symbolism)
○ RL vs MLE
● Memory Augmented Policy Optimization (NIPS 2018)
○ Experience replay (long term memory & optimal updating strategy)
○ Systematic exploration
○ Memory Weight Clipping (unbiased cold start strategy)
Access slides and join discussions at
weakly-supervised-nlu google group
Mobile
Desktop
A bit background on Seq2Seq
● Separate a sequence model in to encoder and decoder
● Improves a phrase-based SMT system by re-ranking top candidates
● Cannot perform well by itself due to the information bottleneck
Encoder DecoderBottleneck
[Hochreiter & Schmidhuber 1997]
[Sutskever, Vinyals, Le 2014]
63
[Bahdandau, Cho
& Bengio 2015]
[Cho 2017]
64
[Bahdandau, Cho
& Bengio 2015]
[Cho 2017]
1. Don't generate from the ball of Z
2. Generate from a sequence of
source token ids, which encodes
the semantics of the target
sentence
65
Language to Logical Form with Neural Attention
● “compared to previous methods our model achieves similar or better
performance .. with no hand-engineered .. features.”
● Relies on full supervision (labeled logical forms)
[Dong & Lapata, 2016]
66
LSTM & Lapata's scream
[Lapata ACL 2017]
● LSTM has been applied to all kinds of NLP tasks and has greatly
simplified system designs
67
What now? Is NLP DEAD?
SEMPRE/STAGG This talk
Paraphrase
(semi-supervised)
Co-occurrences collected from 1B docs
(ClueWeb) and Freebase
Embeddings trained from 840B
text tokens (GloVe)
Compositionality Relies on syntactic structure (text spans)
and domain specific rules to constrain the
generation of logic forms
A (LISP like) language specify
computations on KG
Search priority queue & heuristic rules RL & a program interpreter with
syntactical & semantic checks
Modeling Expert designed features Deep learning
● The need for symbolic operations and effective model optimization
68
Connectionism vs Symbolism
69
a symbolic
machine
a sequence
of symbols
a neural
controller
The symbolic models represents
elegant solutions to problems,
and have been dominating AI for
a very long time
Once we have figured out how to
train them, the connectionism
approaches starts to win
Symbolic Machines in Brains
● 2014 Nobel Prize in Physiology
or Medicine awarded for ‘inner
GPS’ research
● Positions are represented as
discrete representations in
animals' brains, which enable
accurate and autonomous
calculations
[Stensola+ 2012]
Brain
Symbolic
Modules
Environment
Grid spacing for all modules
(M1–M4) in different animals
Location cells
& grid cells
70
Programmer ComputerManager
Program
Results
Question
Answer Predefined
Functions
Neural Symbolic Machines
Weak
supervision
Neural Symbolic
Knowledge Base
Abstract
Scalable
Precise
Non-differentiable
ACL [Liang+ 2017]
71
Computer with Domain Specific Languages
● Lisp Interpreter
○ Program => exp1
exp2
… expn
<END>
○ Exp => (f arg1
arg2
… argn
)
● What functions will be useful for the given task?
○ 10 operations for WebQuesitons
○ 22 different operations for WikiTableQuestions
■ hop
■ argmax, argmin
■ filter=
, filter!=
, filter>
, filter<
, filter>=
, filter<=
, filterin
, filter!in
,
■ first, last, previous, next
■ max, min, average, sum, mode, diff, same
(hop m.russell_wilon /education)
(hop v0 /institution)
(filter=
v1 m.univeristy
/notable_types)
<END>
ACL [Liang+ 2017]
IDEPen and paper
Code Assistance to Prune Search Space
ACL [Liang+ 2017]
73
Code Assistance: Syntactic Constraint
V0
R0
V1
R1
... ...
E0
Hop
E1
Argmax
... ...
P0
CityIn
P1
BornIn
... ...
(
(GO
Decoder Vocab
Functions: <10
Predicates: 23K
Softmax
Variables: <10
ACL [Liang+ 2017]
74
Code Assistance: Syntactic Constraint
V0
R0
V1
R1
... ...
E0
Hop
E1
Argmax
... ...
P0
CityIn
P1
BornIn
... ...
(
(GO
Decoder Vocab
Functions: <10
Predicates: 23K
Softmax
Variables: <10
Last token is ‘(’, so
has to output a
function name next.
ACL [Liang+ 2017]
75
Code Assistance: Semantic Constraint
V0
R0
V1
R1
... ...
E0
Hop
E1
Argmax
... ...
P0
CityIn
P1
BornIn
... ...
Decoder Vocab
Functions: <10
Predicates: 23K
Softmax
Variables: <10
Hop R0(
( Hop R0GO
ACL [Liang+ 2017]
76
Code Assistance: Semantic Constraint
V0
R0
V1
R1
... ...
E0
Hop
E1
Argmax
... ...
P0
CityIn
P1
BornIn
... ...
Decoder Vocab
Functions: <10
Predicates: 23K
Softmax
Variables: <10
Hop R0(
( Hop R0GO
Valid Predicates:
<100
Given definition of
Hop, need to output
a predicate that is
connected to R2
(m.USA).
ACL [Liang+ 2017]
77
Key-Variable Memory for Semantic Compositionality
● Equivalent to a linearised
bottom-up derivation of the
recursive program
Key Variable
v1
R1(m.USA)
Execute
( Argmax R2 Population )
Execute
Return
m.NYCKey Variable
... ...
v3
R3(m.NYC)
Key Variable
v1
R1(m.USA)
v2
R2(list of US cities)
Execute
( Hop R1 !CityIn )
Hop R1 !CityIn( )
Largest city ( Hop R1in US GO !CityIn Argmax R2( )Population)
R2 Population ReturnArgmax )(
Entity Resolver
78
Save Intermediate Values
Key
(Embedding)
Variable
(Symbol)
Value
(Data in Computer)
V0
R0 m.USA
V1
R1 [m.SF, m.NYC, ...]
Hop R0 !CityIn( )
( Hop R0GO !CityIn
Computer
Execution
ResultExpression is finished.
ACL [Liang+ 2017]
79
Reuse Intermediate Values
Key
(Embedding)
Variable
(Symbol)
Value
(Data in Computer)
V0
R0 m.USA
V1
R1 [m.SF, m.NYC, ...]
)
!CityIn Argmax()
Argmax(
Softmax
ACL [Liang+ 2017]
80
Augmented REINFORCE
● REINFORCE get stuck at local maxima
● Iterative ML training is not directly optimizing the F1 score
● Augmented REINFORCE obtains better performances
81
State-of-the-Art on WebQuestionsSP
● First end-to-end neural network to achieve SOTA on semantic parsing with
weak supervision over large knowledge base
● The performance is approaching SOTA with full supervision
ACL [Liang+ 2017]
82
Plan
● Weak Supervision NLP
○ NLP, AI, software 2.0
○ Semantics as a foreign language
○ Unsupervised learning
○ Knowledge representation (symbolism)
● Semantic Parsing Tasks
○ WebQuestionsSP, WikiTableQuestions
● Neural Symbolic Machines (ACL 2017)
○ Compositionality (short term memory)
○ Scalable KB inference (symbolism)
○ RL vs MLE
● Memory Augmented Policy Optimization (NIPS 2018)
○ Experience replay (long term memory & optimal updating strategy)
○ Systematic exploration
○ Memory Weight Clipping (unbiased cold start strategy)
Access slides and join discussions at
weakly-supervised-nlu google group
Mobile
Desktop
● MLE optimizes the log likelihood of target sequences
● RL optimizes the expected reward under a stochastic policy
What is RL?
[Williams 1992]
[Sutton & Barto 1998]
Directly Optimizing The Expected Reward
84
Challenges of applying RL
● Large search space (sparse rewards)
○ Supervised pretraining (MLE)
○ Systematic exploration [Houthooft+ 2017]
○ Curiosity [Schmidhuber 1991][Pathak2017]
● Credit assignment (delayed reward)
○ Bootstrapping
■ E.g., AlphaGo uses a value function to estimate the future reward
○ Rollout n-steps
● Train speed & stability (optimization)
○ Trust region approaches (e.g., PPO)
○ Experience replay
Book [Sutton & Barto 1998]
NIPS [Abbeel & Schulman 2016]
Our focus today
85
Efficiency challenge
● RL is still far from data efficient
○ E.g. the best learning algorithm
(DeepMind RainbowDQN) “passes
median human performance on 57
Atari games at about 18 million frames
(around 90 hours) of gameplay, while
most humans can pick up a game
within a few minutes.”
● How to improve its efficiency?
Credit to Alex Irpan 2018 from Google Brain Robotics
86
Applying RL to NLP
● Benefits of RL
○ Weak supervision (e.g., expected answer, user clicks)
○ Directly optimizing the metric (e.g., F1, accuracy, BLEU etc.)
○ Work with structured hidden variables (e.g., logical forms/programs)
● Challenges with existing solutions
○ Large search space sparse reward often leads to slow and unstable training
○ Spurious reward often lead to biased solutions
87
RL models generate their own training data
● Training sample management issue
○ Too many low quality examples ⇒ slow training
○ Boosting high reward experience ⇒ biased training
Training samples LearnerActor
Updated Model
88
Complementary Learning Theory
[McClelland+ 1995]
[Kumaran+ 2016]
Encoder
Episodic memory
Record & Replay
Primary sensory and motor cortices
Observations
89
Most of the past experience are not
helpful for improving the current model
http://insideout.wikia.com/wiki/Long_Term_Memory
90
Mammals learn from “interesting” dreams
● In early 2000s, scientists discovered that
animals have complex dreams and are able
to retain and recall long sequences of events
while they are asleep.
● Recent studies indicate that by consolidating
memory traces with high emotional /
motivational value, "sleep and dreaming
may offer a neurobehavioral substrate for
the offline … learning” Image courtesy of Kote on Drawception.com, 2012
91
Linear combination of maximum likelihood objective and expected return:
λ log-likelihood on a top-k buffer + (1 - λ) expected return
● Not robust against spurious programs
● The composite objective is ad-hoc, and the gradient is biased
Augment REINFORCE with Memory
[Liang+ 2016]
[Abolafia+ 2017]
Spurious programs: right answer, wrong reason
Which nation won the most silver medal?
● Correct program:
(argmax rows “Silver”)
(hop v1 “Nation”)
● Many spurious programs:
(argmax rows “Gold”)
(hop v1 “Nation”)
(argmax rows “Bronze”)
(hop v1 “Nation”)
(argmin rows “Rank”)
(hop v1 “Nation”)
...
Combating spurious rewards
● Reinforce a rewarded experience only if the model (current policy)
also thinks that it is the right thing to do
Correctness of the behavior
Experiences
Model’s preference
Reinforce of the behavior
94
Memory Augmented Policy Optimization (MAPO)
Memory
Learner
Actor
Good return?
Systematic
Exploration Model checkpoint
NIPS [Liang+ 2018]
95
Lower the gradient variance without introducing bias
Given a memory buffer of (sequence, return) pairs: ,
re-express expected return as,
● Importance sampling
○ Sample more frequently inside the buffer
○ Rejection sampling for samples outside the buffer.
NIPS [Liang+ 2018]
Optimal Sample Allocation
Given that we want to apply stratified sampling
to estimate the gradient of REINFORCE with
baseline under 1/0 rewards. It can be shown
that the optimal strategy is to allocate the same
number of samples to reward vs no reward
experiences
NIPS [Liang+ 2018]Details
Image source: Guy Harris, 2018
How to Give Feedback in a Non-Threatening Way
Comparison of model update strategies
On-policy optimization (REINFORCE)
Iterative Maximum Likelihood (IML)
Maximum Marginal Likelihood (MML)
MAPO
Correctness of the behavior
Experiences & Reward
Model’s preference
NIPS [Liang+ 2018]
98
Memory Weight Clipping
● Policy gradient methods usually suffer from a cold start problem, because
the model probabilities P to good experiences are very small initially
● We adopt a clipping mechanism to ensure that the buffer probability is
larger than a threshold ɑ
NIPS [Liang+ 2018]
99
Memory Weight Clipping
Memory
Actor
max(P(B), α)
1- max(P(B), α)
Trade off bias in
the initial stage
for faster training
NIPS [Liang+ 2018]
Training becomes less biased over time
NIPS [Liang+ 2018]
101
Comparison
● The shaded area represents the standard deviation of the dev accuracy
● REINFORCE does not work at all
● MAPO is slower but less biased
NIPS [Liang+ 2018]
102
SOTA results with weak supervision
NIPS [Liang+ 2018]
103
Actor 1
Actor 2
Actor n
…... Learner
Train set shard 1
Train set shard 2
…...
Checkpoint queue
Sample queue
EvaluatorDev set
Train set shard n
Envs 1
Envs 2
Envs n
Dev envs
Scale up: Distributed Actor-Learner architecture
[Liang et al, 2017; Espeholt et al, 2018; Liang et al, 2018]
● Weak Supervision NLP
○ NLP, AI, software 2.0
○ Semantics as a foreign language
○ Unsupervised learning
○ Knowledge representation (symbolism)
● Semantic Parsing Tasks
○ WebQuestionsSP, WikiTableQuestions
● Neural Symbolic Machines (ACL 2017)
○ Compositionality (short term memory)
○ Scalable KB inference (symbolism)
○ RL vs MLE
● Memory Augmented Policy Optimization (NIPS 2018)
○ Experience replay (long term memory & optimal updating strategy)
○ Systematic exploration
○ Memory Weight Clipping (unbiased cold start strategy)
Access slides and join discussions at
weakly-supervised-nlu google group
Mobile
Desktop
Thanks!

Contenu connexe

Tendances

Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)AI Frontiers
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature EngineeringAlice Zheng
 
The Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep LearningThe Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep Learningindico data
 
Adam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
Adam Coates at AI Frontiers: AI for 100 Million People with Deep LearningAdam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
Adam Coates at AI Frontiers: AI for 100 Million People with Deep LearningAI Frontiers
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringTuri, Inc.
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligenceananth
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDevashish Shanker
 
李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning台灣資料科學年會
 
Nikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in AlexaNikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in AlexaAI Frontiers
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...AI Frontiers
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddingsRoelof Pieters
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsRoelof Pieters
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadRoelof Pieters
 
許永真/Crowd Computing for Big and Deep AI
許永真/Crowd Computing for Big and Deep AI許永真/Crowd Computing for Big and Deep AI
許永真/Crowd Computing for Big and Deep AI台灣資料科學年會
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Big data and AI presentation slides
Big data and AI presentation slidesBig data and AI presentation slides
Big data and AI presentation slidesCloudxLab
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用台灣資料科學年會
 
State of the art in Natural Language Processing (March 2019)
State of the art in Natural Language Processing (March 2019)State of the art in Natural Language Processing (March 2019)
State of the art in Natural Language Processing (March 2019)Liad Magen
 

Tendances (20)

Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
Li Deng at AI Frontiers: Three Generations of Spoken Dialogue Systems (Bots)
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
 
The Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep LearningThe Unreasonable Benefits of Deep Learning
The Unreasonable Benefits of Deep Learning
 
Adam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
Adam Coates at AI Frontiers: AI for 100 Million People with Deep LearningAdam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
Adam Coates at AI Frontiers: AI for 100 Million People with Deep Learning
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning李俊良/Feature Engineering in Machine Learning
李俊良/Feature Engineering in Machine Learning
 
Nikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in AlexaNikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in Alexa
 
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
Andres Rodriguez at AI Frontiers: Catalyzing Deep Learning's Impact in the En...
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking ahead
 
許永真/Crowd Computing for Big and Deep AI
許永真/Crowd Computing for Big and Deep AI許永真/Crowd Computing for Big and Deep AI
許永真/Crowd Computing for Big and Deep AI
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Big data and AI presentation slides
Big data and AI presentation slidesBig data and AI presentation slides
Big data and AI presentation slides
 
Agile Deep Learning
Agile Deep LearningAgile Deep Learning
Agile Deep Learning
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
 
State of the art in Natural Language Processing (March 2019)
State of the art in Natural Language Processing (March 2019)State of the art in Natural Language Processing (March 2019)
State of the art in Natural Language Processing (March 2019)
 

Similaire à Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Understanding

Art of artificial intelligence and automation
Art of artificial intelligence and automationArt of artificial intelligence and automation
Art of artificial intelligence and automationLiew Wei Da Andrew
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Language Translator.pptx
Language Translator.pptxLanguage Translator.pptx
Language Translator.pptxMRABC9
 
A prior case study of natural language processing on different domain
A prior case study of natural language processing  on different domain A prior case study of natural language processing  on different domain
A prior case study of natural language processing on different domain IJECEIAES
 
Research Proposal For Counterpoint Music Practice
Research Proposal For Counterpoint Music PracticeResearch Proposal For Counterpoint Music Practice
Research Proposal For Counterpoint Music PracticeRoxy Roberts
 
Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...
Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...
Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...PhD Assistance
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...ijnlc
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...kevig
 
Artificial Intelligence
Artificial Intelligence Artificial Intelligence
Artificial Intelligence NIKHILMALPURE3
 
Tensorflow a brief introduction (1).pptx
Tensorflow a brief introduction (1).pptxTensorflow a brief introduction (1).pptx
Tensorflow a brief introduction (1).pptxAnandMenon54
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...AishwaryaChemate
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
Aritficial intelligence
Aritficial intelligenceAritficial intelligence
Aritficial intelligenceMaqsood Awan
 
Artificial Intelligence in E-learning (AI-Ed): Current and future applications
Artificial Intelligence in E-learning (AI-Ed): Current and future applicationsArtificial Intelligence in E-learning (AI-Ed): Current and future applications
Artificial Intelligence in E-learning (AI-Ed): Current and future applicationsRoy Clariana
 

Similaire à Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Understanding (20)

Art of artificial intelligence and automation
Art of artificial intelligence and automationArt of artificial intelligence and automation
Art of artificial intelligence and automation
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Language Translator.pptx
Language Translator.pptxLanguage Translator.pptx
Language Translator.pptx
 
A prior case study of natural language processing on different domain
A prior case study of natural language processing  on different domain A prior case study of natural language processing  on different domain
A prior case study of natural language processing on different domain
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
Research Proposal For Counterpoint Music Practice
Research Proposal For Counterpoint Music PracticeResearch Proposal For Counterpoint Music Practice
Research Proposal For Counterpoint Music Practice
 
Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...
Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...
Conversational AI:An Overview of Techniques, Applications & Future Scope - Ph...
 
Learning to code in 2020
Learning to code in 2020Learning to code in 2020
Learning to code in 2020
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
 
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
XAI LANGUAGE TUTOR - A XAI-BASED LANGUAGE LEARNING CHATBOT USING ONTOLOGY AND...
 
Artificial Intelligence
Artificial Intelligence Artificial Intelligence
Artificial Intelligence
 
Tensorflow a brief introduction (1).pptx
Tensorflow a brief introduction (1).pptxTensorflow a brief introduction (1).pptx
Tensorflow a brief introduction (1).pptx
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 
Ai notes
Ai notesAi notes
Ai notes
 
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
APznzaalselifJKjGQdTCA51cF7bldYdFMvDcshM8opKFZ_ZaIV-dqkiLoIKIfhz2tS6Fw5UBk25u...
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
Aritficial intelligence
Aritficial intelligenceAritficial intelligence
Aritficial intelligence
 
Artificial Intelligence in E-learning (AI-Ed): Current and future applications
Artificial Intelligence in E-learning (AI-Ed): Current and future applicationsArtificial Intelligence in E-learning (AI-Ed): Current and future applications
Artificial Intelligence in E-learning (AI-Ed): Current and future applications
 

Plus de AI Frontiers

Divya Jain at AI Frontiers : Video Summarization
Divya Jain at AI Frontiers : Video SummarizationDivya Jain at AI Frontiers : Video Summarization
Divya Jain at AI Frontiers : Video SummarizationAI Frontiers
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI AI Frontiers
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...AI Frontiers
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...AI Frontiers
 
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural NetworksTraining at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural NetworksAI Frontiers
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...AI Frontiers
 
Percy Liang at AI Frontiers : Pushing the Limits of Machine Learning
Percy Liang at AI Frontiers : Pushing the Limits of Machine LearningPercy Liang at AI Frontiers : Pushing the Limits of Machine Learning
Percy Liang at AI Frontiers : Pushing the Limits of Machine LearningAI Frontiers
 
Mark Moore at AI Frontiers : Uber Elevate
Mark Moore at AI Frontiers : Uber ElevateMark Moore at AI Frontiers : Uber Elevate
Mark Moore at AI Frontiers : Uber ElevateAI Frontiers
 
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...AI Frontiers
 
Arnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the SkyArnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the SkyAI Frontiers
 
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...AI Frontiers
 
Sumit Gupta at AI Frontiers : AI for Enterprise
Sumit Gupta at AI Frontiers : AI for EnterpriseSumit Gupta at AI Frontiers : AI for Enterprise
Sumit Gupta at AI Frontiers : AI for EnterpriseAI Frontiers
 
Yuandong Tian at AI Frontiers : Planning in Reinforcement Learning
Yuandong Tian at AI Frontiers : Planning in Reinforcement LearningYuandong Tian at AI Frontiers : Planning in Reinforcement Learning
Yuandong Tian at AI Frontiers : Planning in Reinforcement LearningAI Frontiers
 
Alex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
Alex Ermolaev at AI Frontiers : Major Applications of AI in HealthcareAlex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
Alex Ermolaev at AI Frontiers : Major Applications of AI in HealthcareAI Frontiers
 
Long Lin at AI Frontiers : AI in Gaming
Long Lin at AI Frontiers : AI in GamingLong Lin at AI Frontiers : AI in Gaming
Long Lin at AI Frontiers : AI in GamingAI Frontiers
 
Melissa Goldman at AI Frontiers : AI & Finance
Melissa Goldman at AI Frontiers : AI & FinanceMelissa Goldman at AI Frontiers : AI & Finance
Melissa Goldman at AI Frontiers : AI & FinanceAI Frontiers
 
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...AI Frontiers
 
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic ProblemsAshok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic ProblemsAI Frontiers
 
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...AI Frontiers
 
Kai-Fu Lee at AI Frontiers : The Era of Artificial Intelligence
Kai-Fu Lee at AI Frontiers : The Era of Artificial IntelligenceKai-Fu Lee at AI Frontiers : The Era of Artificial Intelligence
Kai-Fu Lee at AI Frontiers : The Era of Artificial IntelligenceAI Frontiers
 

Plus de AI Frontiers (20)

Divya Jain at AI Frontiers : Video Summarization
Divya Jain at AI Frontiers : Video SummarizationDivya Jain at AI Frontiers : Video Summarization
Divya Jain at AI Frontiers : Video Summarization
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
 
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural NetworksTraining at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
 
Percy Liang at AI Frontiers : Pushing the Limits of Machine Learning
Percy Liang at AI Frontiers : Pushing the Limits of Machine LearningPercy Liang at AI Frontiers : Pushing the Limits of Machine Learning
Percy Liang at AI Frontiers : Pushing the Limits of Machine Learning
 
Mark Moore at AI Frontiers : Uber Elevate
Mark Moore at AI Frontiers : Uber ElevateMark Moore at AI Frontiers : Uber Elevate
Mark Moore at AI Frontiers : Uber Elevate
 
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
 
Arnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the SkyArnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the Sky
 
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
 
Sumit Gupta at AI Frontiers : AI for Enterprise
Sumit Gupta at AI Frontiers : AI for EnterpriseSumit Gupta at AI Frontiers : AI for Enterprise
Sumit Gupta at AI Frontiers : AI for Enterprise
 
Yuandong Tian at AI Frontiers : Planning in Reinforcement Learning
Yuandong Tian at AI Frontiers : Planning in Reinforcement LearningYuandong Tian at AI Frontiers : Planning in Reinforcement Learning
Yuandong Tian at AI Frontiers : Planning in Reinforcement Learning
 
Alex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
Alex Ermolaev at AI Frontiers : Major Applications of AI in HealthcareAlex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
Alex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
 
Long Lin at AI Frontiers : AI in Gaming
Long Lin at AI Frontiers : AI in GamingLong Lin at AI Frontiers : AI in Gaming
Long Lin at AI Frontiers : AI in Gaming
 
Melissa Goldman at AI Frontiers : AI & Finance
Melissa Goldman at AI Frontiers : AI & FinanceMelissa Goldman at AI Frontiers : AI & Finance
Melissa Goldman at AI Frontiers : AI & Finance
 
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
 
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic ProblemsAshok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
 
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
 
Kai-Fu Lee at AI Frontiers : The Era of Artificial Intelligence
Kai-Fu Lee at AI Frontiers : The Era of Artificial IntelligenceKai-Fu Lee at AI Frontiers : The Era of Artificial Intelligence
Kai-Fu Lee at AI Frontiers : The Era of Artificial Intelligence
 

Dernier

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 

Dernier (20)

Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 

Training at AI Frontiers 2018 - Ni Lao: Weakly Supervised Natural Language Understanding

  • 1. Weakly Supervised Natural Language Understanding Ni Lao mosaix.ai 11.11.2018 Collaborators: Liang Chen, Fan Yang, Jonathan Berant, Mohammad Norouzi, Quoc Le, William Cohen, Kenneth ForbusFor completeness a large part of the tutorial is from previous works. Thanks to Chen Liang for helping with creating these slides
  • 2. Speaker Background ● BS from Tsinghua U. EE (1999 - 2003) Worked on the TsinghuAeolus system, and won world champion in RoboCup simulation league in 2001 and 2002 ● MS from Tsinghua U. CS (2003 - 2006) Worked at Microsoft Research Asia on automatic OS diagnosis, Web search and product search ● PhD from Carnegie Mellon U. (2006 - 2012) Researched on IR, ML, NLP. Worked on the CMU JAVELIN QA system, and Never-Ending Learning (NELL) system ● Research Scientist Google (2012 - 2017) Researched on KG construction, semantic parsing. Worked on KG and Web-based QA products ● Chief Scientist & Co-founder Mosaix.ai (2018 - ) Research on semantic parsing and text understanding for search and NLU services. And we are hiring! 2
  • 3. Plan ● Weak Supervision NLP ○ NLP, AI, software 2.0 ○ Semantics as a foreign language ○ Unsupervised learning ○ Knowledge representation (symbolism) ● Semantic Parsing Tasks ○ WebQuestionsSP, WikiTableQuestions ● Neural Symbolic Machines (ACL 2017) ○ Compositionality (short term memory) ○ Scalable KB inference (symbolism) ○ RL vs MLE ● Memory Augmented Policy Optimization (NIPS 2018) ○ Experience replay (long term memory & optimal updating strategy) ○ Systematic exploration ○ Memory Weight Clipping (unbiased cold start strategy) Access slides and join discussions at weakly-supervised-nlu google group Mobile Desktop
  • 4. Natural Language Processing (NLP) ● Enables machines to understand and assists human ● A major problem of AI (AI-complete) ● Leads to new theories in cognitive science There can be two underlying motivations for building a computational theory. The technological goal is simply to build better computers, and any solution that works would be acceptable. The cognitive goal is to build a computational analog of the human-language-processing mechanism; such a theory would be acceptable only after it had been verified by experiment. -- James Allen,1987 4
  • 5. Adoption Of Voice Technology ● Google’s Speech Internationalization Project: From 1 to 300 Languages and Beyond [Pedro J. Moreno, 2012] ● My daughter adopted YouTube voice command since 2 years old ● 20% of the U.S. population has access to smart speakers [Techcrunch, 2018] ● Rising adoption in the Asia Pacific [GoVocal.AI 2018] 5
  • 6. Language understanding for AI and humanity If you got a billion dollars to spend on a huge research project that you get to lead, what would you like to do? -- r/CyberByte, 2015 NLP is fascinating, allowing us to focus on highly-structured inference problems, on issues that go to the core of "what is thought" but remain eminently practical, and on a technology that surely would make the world a better place. -- Michael I Jordan What kind of impact you hope deep learning has on our future? -- Steve Paikin, 2016 I hope it allows Google to ... search by the content of the document rather than by the words in the document ... I hope it will make for intelligent personal systems, who can answer questions in a sensible way ... It will make computers much easier to use. Because you’ll be able to just say to your computer “print this damn thing” -- Geoffrey Hinton ● Experts with different views of AI agree on the potential of NLP 6
  • 7. What is understanding? "If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation.", -- Alan Turing, 1950 Does the machine literally "understand" Chinese ? Or is it merely simulating the ability to understand Chinese? -- John Searle, 1980 The Chinese Room ArgumentThe Imitation Game 7
  • 8. Full Supervision NLP ● Traditionally NLP is a labor-intensive business Applications Discourse Processing Semantic Parsing Syntactic analysis Morphological analysis ASR OCR speech text Rules written by 1000 PhDs NLP algorithms simply lookup 8
  • 9. Software 2.0 1. specify some goal on the behavior ○ e.g., “satisfy input output pairs of examples”, ○ e.g.,“win a game of Go” 2. write a rough skeleton of the code that identifies a subset of program space to search ○ e.g. a neural net architecture 3. use the computational resources at disposal to search this space for a program that works. Death of feature engineering. (The) users of the software will (play) a direct role in building it. Data labeling is a central component to system design. [Karpathy 2017; Watson 2017; Ratner+ 2018] 9
  • 10. Where does knowledge come from? domain experts (bottlenecks) expert systems with knowledge bases the world ● Can only cover the most popular semantics used by human 10
  • 11. Weak Supervision NLP ● Avoid the knowledge acquisition bottleneck with machine learning ● Then we can cover all possible semantics used by human machine learning intelligent systems with knowledge end to end examples (e.g., QA pairs) 11
  • 12. Language & Reasoning ● The formalist view ○ Language was primarily invented for reasoning [Everaert+ 2015] ● The functionalist view ○ Language is for communication [Kirby 2017] ● Cognitive coupling hypothesis ○ sequential processing is “necessary for behaviours such as primate tool use, navigation, foraging and social action.” [Kolodny & Edelman 2018] 12
  • 13. Parent Gender Gender NameBart Simpson Reasoning is needed to understand text Bart's father is Homer [Liang+ 2017] HomerMarge 13
  • 14. Semantics is a language for computation “impressionist painters during the 1920s” painters [/painting] !/art_forms impressionist <visual_artist> x.[/associated_periods_or_movements = /impressionism] <artist> during the 1920s x.[/date_of_work < 1930; /date_of_work > 1920] 14
  • 15. Freebase, DBpedia, YAGO, NELL Largest city in US? GO (Hop V1 CityIn) (Argmax V2 Population) RETURN NYC Semantics is a language for computation Semantic parsing Computation 15
  • 16. Semantics as a foreign language 1) Natural languages are programming languages to control human behavior (either others or self) 2) For machines and human to understand each other, they just need translation models trained with control theory 16
  • 17. NLU with Unsupervised Learning ● Supervised learning needs either feature engineering for compact representation or large amount of labeled data Text Intermediate representation labels ● Unsupervised learning produces better representations and reduces the labeling cost, and it is easily transferable! learning Feature engineering Text Intermediate representation labels learning learning 17
  • 19. Pre-Trained Word Embeddings ● learning (non-contextual) word embeddings from text ○ matrix factorization on global word-word co-occurrence counts [1990] ○ local context window methods [2014] will come back to this in 2018 ○ weighted least squares on global word-word co-occurrence counts [2014] ● Co-occurrence statistics as an efficient approximation to the original text LSA [Deer-wester+ 1990] skip-gram [Mikolov + 2013] Glove [Pennington+ 2014] 19
  • 20. Pre-Trained Word Embeddings Compact representations for syntax, common sense and world knowledge LSA [Deer-wester+ 1990] skip-gram [Mikolov + 2013] Glove [Pennington+ 2014] Glove [Pennington+ 2014] 20
  • 21. Pre-Trained Sequence Models With a lot more computational power we have language modeling sequence models which converts sequences of tokens to sequences of contextual embeddings [Devlin+ 2018] [Radford+ 2018] [Peters+ 2018] 21
  • 22. Pre-Trained Sequence Models Differences in pre-training model architectures. BERT uses a bidirectional Transformer. OpenAI GPTuses a left-to-right Transformer. ELMo uses the concatenation of independently trained left-to-right and right-to-left LSTM. [Devlin+ 2018] [Radford+ 2018] [Peters+ 2018] 22
  • 23. Pre-Trained Sequence Models 2018 is the year of Pre-Trained Sequence Models [Devlin+ 2018] [Radford+ 2018] [Peters+ 2018] 23
  • 24. Knowledge Representation & Scalability a small machine which copies the complexity of the world to the brain 24 a suitable representationthe world
  • 25. Mandelbrot Set the nature of complex numbers 25 z0 = 0
  • 26. Internet as an external memory ● How information should be organized for scalability? [Vannevar Bush 1945] 26
  • 27. The scalability of modern search engines ● Can respond to user's requests within a fraction of a second ● But are weak at text understanding and complex reasoning [Brin & Page 1998] 27
  • 28. Can we search entities on the web? ● Multimedia, business, products have a lot of reviews and descriptions 28
  • 29. Traditional IR approach lacks understanding ● Need to interpret the meaning from the surface text 29
  • 30. Question answering as a simple test bed ● A good semantic representation should support reasoning at scale Text Question Answer Knowledge Store Program Generation Execution Latent Observed 30
  • 31. Three approaches to generative models ● Autoregression (e.g., LM), VAE, GAN [Hochreiter & Schmidhuber 1997][Graves 2013][Sutskever, Vinyals, Le 2014][Cho+ 2014] [Kingma, Welling 2014] [Goodfellow+ 2014] Blog [Karpathy+ 2016] Autoregressive models (e.g. LM) [Hochreiter & Schmidhuber 1997] Graves [1308.0850] 31
  • 32. Immediately criticised when applied to text ● "I have a lot of respect for language. Deep-learning people seem not to" ● "They include such impressive natural language sentences as:" ○ * what everything they take everything away from ○ * how is the antoher headache ○ * will you have two moment ? ○ * This is undergoing operation a year . ● "These are not even grammatical!" ● The DNN bubble consists of models, which show great promises but not yet practical at this point Blog [Goldberg 2017] [Rajeswar+ 2017] 32
  • 33. statistician's view v.s. linguist's view Given the power of deep learning anything can be mapped to a unit Gaussian ball Z The world has real structures, which need to be represented by real structures Seq2Seq [Sutskever, Vinyals, Le 2014] VAE [Kingma & Welling 2014] GAN [Goodfellow+ 2014] ACL [Goldberg 2015] ACL [Mooney 2015] furious 33
  • 34. Scalability of mammal memory ● Very rapid adaptation (in just one or a few trials) is necessary for survival ○ E.g., associating taste of food and sickness ● Need fast responses based on large amount of knowledge ○ Needs good representation of knowledge ● However, good representation can only be learnt gradually ○ During sleeps to prevent interference with established associations [Garcia+ 1966] [Wickman 2012] [Bartol+ 2015]34
  • 35. Scalability of Cognitive Architectures https://soar.eecs.umich.edu ● The design of mammalian brains is inspiring to NLP systems ○ they are solving similar problems ● The design has not changed much since 30 years ago ○ “We’ve totally solved it already … it is just a matter of job security” -- Nate Derbinsky, Northeastern U. ● Today we have ○ internet economy and data ○ computation and ML development 35
  • 36. Plan ● Weak Supervision NLP ○ NLP, AI, software 2.0 ○ Semantics as a foreign language ○ Unsupervised learning ○ Knowledge representation (symbolism) ● Semantic Parsing Tasks ○ WebQuestionsSP, WikiTableQuestions ● Neural Symbolic Machines (ACL 2017) ○ Compositionality (short term memory) ○ Scalable KB inference (symbolism) ○ RL vs MLE ● Memory Augmented Policy Optimization (NIPS 2018) ○ Experience replay (long term memory & optimal updating strategy) ○ Systematic exploration ○ Memory Weight Clipping (unbiased cold start strategy) Access slides and join discussions at weakly-supervised-nlu google group Mobile Desktop
  • 37. Semantic parsing Semantic Parsing Knowledge Base Natural language Logical form / Program Desired behavior / Answer [Berant+ 2013] [Liang 2013] ● Natural language queries or commands are converted to computation steps on data and produce the expected answers or behavior Ful p is (ha t le ) We k er on (e s co c ) Execution 37
  • 38. ● Training from full supervision is labor-intensive ○ DeepCoder [Balog, 2016] ○ NPI [Reed & Freitas, 2015] ○ Seq2Tree [Dong & Lapata, 2016] ○ … ● Traditional semantic parsing models require feature engineering ○ SEMPRE [ Berant et al, 2013] ○ STAGG [Yih et al, 2015] ○ … ● End-to-end differentiable models cannot scale to large databases ○ Neural Programmer [Neekalantan et al, 2015] ○ Neural Turing Machines [Graves et al, 2014] ○ Neural GPU [Kaiser & Sutskever, 2015] ○ … ● Combine deep learning, symbolic reasoning and reinforcement learning ○ Neural Symbolic Machines (Liang, Berant, Le, Forbus, Lao, 2017) ○ Memory Augmented Policy Optimization (MAPO) (Liang, Norouzi, Berant, Le, Lao, 2018) Related Works 38
  • 39. Question Answering with Knowledge Base Freebase, DBpedia, YAGO, NELL Largest city in US? GO (Hop V1 CityIn) (Argmax V2 Population) RETURN NYC Compositionality Large Search Space (Optimization) E.g., Freebase: 23K predicates, 82M entities, 417M triplets Paraphrase Many ways to ask the same question, e.g., “What was the date that Minnesota became a state?” “When was the state Minnesota created?” 39
  • 40. WebQuestions: motivation Motivation: Natural language interface to large structured knowledge-bases Background: availability of many structured datasets (Google KG, Bing Satori, Freebase, DBPedia, Yelp, ...) [Singhal, Google 2012] [Qian, Bing 2013] [Berant+ 2013] Introducing the Knowledge Graph: things, not strings 40
  • 41. WebQuestions: getting questions [Berant+ 2013] Where was Barack Obama born? Where was __ born? Google Suggest Barack Obama Lady Gaga Steve Jobs Where was Steve Jobs born? Where was Steve Jobs __? Google Suggest born raised on the Forbes list Where was Steve Jobs raised? Goal: collect large number of natural language queries Strategy: breadth-first search over Google Suggest graph results in 1M queries 41
  • 42. Motivation: overcome the limitations of computers -- machines struggle to absorb knowledge in the way humans do. Solution: a large collaborative knowledge base consisting of data (int triple format) composed mainly by its community members. Later acquired by Google (discontinued) [Sproat 2016] A network of facts. © Metaweb/Google 41M entities (nodes) 19K properties (edge labels) 596M assertions (edges) 42
  • 43. WebQuestions: getting Freebase answers Goal: obtain label from non-experts Strategy: Amazon Mechanical Turk (AMT) ● Given a query e.g. “Eric Clapton hometown” detect if there is a single named entity “Eric Clapton” ● Ask the worker to pick one (or more) of the possible entities/values (“Ripley”) on the entity Freebase page ● Cost $0.03 per question [Berant+ 2013] Freebase.com page circa 2011 © Metaweb/Google 43
  • 44. WebQuestionsSP Dataset ● 5,810 questions from Google Suggest API & Amazon MTurk1 ● Remove invalid QA pairs2 ● 3,098 training examples, 1,639 testing examples remaining ● Open-domain and contains grammatical error ● Multiple entities as answer => macro-averaged F1 [Berant et al, 2013; Yih et al, 2016] • What do Michelle Obama do for a living? writer, lawyer • What character did Natalie Portman play in Star Wars? Padme Amidala • What currency do you use in Costa Rica? Costa Rican colon • What did Obama study in school? political science • What killed Sammy Davis Jr? throat cancer Grammatical error Multiple entities 44
  • 45. SEMPRE: paraphrase Collect entity pair observations for phrases and augmented predicates (which partially solves the search problem) [Berant+ 2013] [Lin+ 2012] [Fader+ 2011] 15M triples (of 15k phrases) (Barack Obama, was born in, Honolulu) (Albert Einstein, was born in, Ulm) (Barack Obama, lived in, Chicago) ... ... ClueWeb09 1 billion docs ReVerb 600M triples (of 60k augmented predicates) (BarackObama, PlaceOfBirth, Honolulu) (Albert Einstein, PlaceOfBirth, Ulm) (BarackObama, PlacesLived.Location, Chicago) ... ... 45
  • 46. SEMPRE: paraphrase ● Construct lexicon and alignment features based on entity pair cooccurrences [Berant+ 2013] 46
  • 47. SEMPRE: compositionality The semantic structure (logic form) is coupled with syntactic structure (surface patterns) through composition rules [Berant+ 2013] One derivation 47
  • 48. SEMPRE: search: bridging Motivation: individual words can be highly ambiguous • What government does Chile have? • What is Italy’s language? • Where is Beijing? • What is the cover price of X-men? Solution: the bridging operation generates predicates compatible with neighboring predicates [Berant+ 2013] 48
  • 49. SEMPRE: modeling Log linear model over derivations d given utterance x with expert designed features [Berant+ 2013] 49
  • 50. STAGG: paraphrase 3 matching models based on char-ngram conv nets (Sent2vec [Shen+ 14]) [Yih+ 2015] [Shen+ 2014] [Gabrilovich+ 2013]     Pattern-Chain: who voiced meg on <e> cast-actor Question-EntPred: who voiced meg on family guy Meg Griffin cast-actor ClueWeb12: voiced homer on <e> cast-actor 50
  • 51. STAGG: search Staged Query Graph Generation [Yih+ 2015] [Yang & Chang 2015] “Who first voiced Meg on Family Guy?” 1) decide the topic entity 2) decide the core inference chain 3) add type/aggregation constraints 51
  • 52. STAGG: search keeps up to N candidate states in the priority queue (N = 1000) +10 special rules to restrict the type/aggregation constraints [Yih+ 2015] Domain knowledge is often very important for structure search problems. Will revisit in WikiTableQ e.g. Consider “to” predicates (indicating the ending time of an event) only when the question contains “last”, “latest” or “newest” 52
  • 53. STAGG: modeling Learning to rank model (for query graph candidates) based on expert designed features +10 special features on the type/aggregation constraints [Yih+ 2015] [Burges 2010] 53
  • 54. Brief Summary & Preview DL leads to the death of feature engineering (but not domain knowledge) DL makes it easier to leverage pre-trained unsupervised models SEMPRE/STAGG This talk Paraphrase (semi-supervised) Co-occurrences collected from 1B docs (ClueWeb) and Freebase Embeddings trained from 840B text tokens (GloVe) Compositionality Relies on syntactic structure (text spans) and domain specific rules to constrain the generation of logic forms A (LISP like) language specify computations on KG Search priority queue & heuristic rules RL & a program interpreter with syntactical & semantic checks Modeling Expert designed features Deep learning 54
  • 55. WikiTableQuestions: Dataset Breadth ● No fixed schema: Tables at test time are not seen during training, needs to generalize based on column name. Depth ● More compositional questions, thus require longer programs ● More operations like arithmetic operations and aggregation operations [Pasupat & Liang, 2015] 55
  • 56. WikiTableQuestions: semantics [Pasupat & Liang, 2015] [Liang+ 2018] 56
  • 57. WikiTableQuestions: semantics * [Pasupat & Liang, 2015] [Liang+ 2018] 57
  • 58. WikiTableQuestions: search Feature engineering is dead. It is survived by program space design. [Pasupat & Liang, 2015] [Zhang, Pasupat & Liang, 2017] [Liang+ 2018] (when t-alternative (rule $AnchoredOr ($LEMMA_TOKEN) (FilterTokenFn lemma and or) (anchored 1)) ... (when t-movement (rule $AnchoredMovement ($LEMMA_TOKEN) (FilterTokenFn lemma next previous after before above below) (anchored 1)) ... (when t-comparison (rule $AnchoredMore ($LEMMA_TOKEN) (FilterTokenFn lemma more than least above after) (anchored 1)) ... (when t-superlative (rule $SuperlativeTrigger ($LEMMA_TOKEN) (FilterPosTagFn token JJR JJS RBR RBS) (anchored 1)) (rule $SuperlativeTrigger ($LEMMA_TOKEN) (FilterTokenFn lemma top first bottom last) (anchored 1)) ... (when t-arithmetic (rule $AnchoredSub ($LEMMA_TOKEN) (FilterTokenFn lemma difference between and much) (anchored 1)) … ... 58
  • 61. Neural Program Induction & Scalabilities [Reed & Freitas 2015] ● The learned operations are not as scalable and precise. ● Why not use existing modules that are scalable, precise and interpretable? ● Impressive works to show NN can learn addition and sorting, but... [Zaremba & Sutskever 2016] 61
  • 62. Plan ● Weak Supervision NLP ○ NLP, AI, software 2.0 ○ Semantics as a foreign language ○ Unsupervised learning ○ Knowledge representation (symbolism) ● Semantic Parsing Tasks ○ WebQuestionsSP, WikiTableQuestions ● Neural Symbolic Machines (ACL 2017) ○ Compositionality (short term memory) ○ Scalable KB inference (symbolism) ○ RL vs MLE ● Memory Augmented Policy Optimization (NIPS 2018) ○ Experience replay (long term memory & optimal updating strategy) ○ Systematic exploration ○ Memory Weight Clipping (unbiased cold start strategy) Access slides and join discussions at weakly-supervised-nlu google group Mobile Desktop
  • 63. A bit background on Seq2Seq ● Separate a sequence model in to encoder and decoder ● Improves a phrase-based SMT system by re-ranking top candidates ● Cannot perform well by itself due to the information bottleneck Encoder DecoderBottleneck [Hochreiter & Schmidhuber 1997] [Sutskever, Vinyals, Le 2014] 63
  • 64. [Bahdandau, Cho & Bengio 2015] [Cho 2017] 64
  • 65. [Bahdandau, Cho & Bengio 2015] [Cho 2017] 1. Don't generate from the ball of Z 2. Generate from a sequence of source token ids, which encodes the semantics of the target sentence 65
  • 66. Language to Logical Form with Neural Attention ● “compared to previous methods our model achieves similar or better performance .. with no hand-engineered .. features.” ● Relies on full supervision (labeled logical forms) [Dong & Lapata, 2016] 66
  • 67. LSTM & Lapata's scream [Lapata ACL 2017] ● LSTM has been applied to all kinds of NLP tasks and has greatly simplified system designs 67
  • 68. What now? Is NLP DEAD? SEMPRE/STAGG This talk Paraphrase (semi-supervised) Co-occurrences collected from 1B docs (ClueWeb) and Freebase Embeddings trained from 840B text tokens (GloVe) Compositionality Relies on syntactic structure (text spans) and domain specific rules to constrain the generation of logic forms A (LISP like) language specify computations on KG Search priority queue & heuristic rules RL & a program interpreter with syntactical & semantic checks Modeling Expert designed features Deep learning ● The need for symbolic operations and effective model optimization 68
  • 69. Connectionism vs Symbolism 69 a symbolic machine a sequence of symbols a neural controller The symbolic models represents elegant solutions to problems, and have been dominating AI for a very long time Once we have figured out how to train them, the connectionism approaches starts to win
  • 70. Symbolic Machines in Brains ● 2014 Nobel Prize in Physiology or Medicine awarded for ‘inner GPS’ research ● Positions are represented as discrete representations in animals' brains, which enable accurate and autonomous calculations [Stensola+ 2012] Brain Symbolic Modules Environment Grid spacing for all modules (M1–M4) in different animals Location cells & grid cells 70
  • 71. Programmer ComputerManager Program Results Question Answer Predefined Functions Neural Symbolic Machines Weak supervision Neural Symbolic Knowledge Base Abstract Scalable Precise Non-differentiable ACL [Liang+ 2017] 71
  • 72. Computer with Domain Specific Languages ● Lisp Interpreter ○ Program => exp1 exp2 … expn <END> ○ Exp => (f arg1 arg2 … argn ) ● What functions will be useful for the given task? ○ 10 operations for WebQuesitons ○ 22 different operations for WikiTableQuestions ■ hop ■ argmax, argmin ■ filter= , filter!= , filter> , filter< , filter>= , filter<= , filterin , filter!in , ■ first, last, previous, next ■ max, min, average, sum, mode, diff, same (hop m.russell_wilon /education) (hop v0 /institution) (filter= v1 m.univeristy /notable_types) <END> ACL [Liang+ 2017]
  • 73. IDEPen and paper Code Assistance to Prune Search Space ACL [Liang+ 2017] 73
  • 74. Code Assistance: Syntactic Constraint V0 R0 V1 R1 ... ... E0 Hop E1 Argmax ... ... P0 CityIn P1 BornIn ... ... ( (GO Decoder Vocab Functions: <10 Predicates: 23K Softmax Variables: <10 ACL [Liang+ 2017] 74
  • 75. Code Assistance: Syntactic Constraint V0 R0 V1 R1 ... ... E0 Hop E1 Argmax ... ... P0 CityIn P1 BornIn ... ... ( (GO Decoder Vocab Functions: <10 Predicates: 23K Softmax Variables: <10 Last token is ‘(’, so has to output a function name next. ACL [Liang+ 2017] 75
  • 76. Code Assistance: Semantic Constraint V0 R0 V1 R1 ... ... E0 Hop E1 Argmax ... ... P0 CityIn P1 BornIn ... ... Decoder Vocab Functions: <10 Predicates: 23K Softmax Variables: <10 Hop R0( ( Hop R0GO ACL [Liang+ 2017] 76
  • 77. Code Assistance: Semantic Constraint V0 R0 V1 R1 ... ... E0 Hop E1 Argmax ... ... P0 CityIn P1 BornIn ... ... Decoder Vocab Functions: <10 Predicates: 23K Softmax Variables: <10 Hop R0( ( Hop R0GO Valid Predicates: <100 Given definition of Hop, need to output a predicate that is connected to R2 (m.USA). ACL [Liang+ 2017] 77
  • 78. Key-Variable Memory for Semantic Compositionality ● Equivalent to a linearised bottom-up derivation of the recursive program Key Variable v1 R1(m.USA) Execute ( Argmax R2 Population ) Execute Return m.NYCKey Variable ... ... v3 R3(m.NYC) Key Variable v1 R1(m.USA) v2 R2(list of US cities) Execute ( Hop R1 !CityIn ) Hop R1 !CityIn( ) Largest city ( Hop R1in US GO !CityIn Argmax R2( )Population) R2 Population ReturnArgmax )( Entity Resolver 78
  • 79. Save Intermediate Values Key (Embedding) Variable (Symbol) Value (Data in Computer) V0 R0 m.USA V1 R1 [m.SF, m.NYC, ...] Hop R0 !CityIn( ) ( Hop R0GO !CityIn Computer Execution ResultExpression is finished. ACL [Liang+ 2017] 79
  • 80. Reuse Intermediate Values Key (Embedding) Variable (Symbol) Value (Data in Computer) V0 R0 m.USA V1 R1 [m.SF, m.NYC, ...] ) !CityIn Argmax() Argmax( Softmax ACL [Liang+ 2017] 80
  • 81. Augmented REINFORCE ● REINFORCE get stuck at local maxima ● Iterative ML training is not directly optimizing the F1 score ● Augmented REINFORCE obtains better performances 81
  • 82. State-of-the-Art on WebQuestionsSP ● First end-to-end neural network to achieve SOTA on semantic parsing with weak supervision over large knowledge base ● The performance is approaching SOTA with full supervision ACL [Liang+ 2017] 82
  • 83. Plan ● Weak Supervision NLP ○ NLP, AI, software 2.0 ○ Semantics as a foreign language ○ Unsupervised learning ○ Knowledge representation (symbolism) ● Semantic Parsing Tasks ○ WebQuestionsSP, WikiTableQuestions ● Neural Symbolic Machines (ACL 2017) ○ Compositionality (short term memory) ○ Scalable KB inference (symbolism) ○ RL vs MLE ● Memory Augmented Policy Optimization (NIPS 2018) ○ Experience replay (long term memory & optimal updating strategy) ○ Systematic exploration ○ Memory Weight Clipping (unbiased cold start strategy) Access slides and join discussions at weakly-supervised-nlu google group Mobile Desktop
  • 84. ● MLE optimizes the log likelihood of target sequences ● RL optimizes the expected reward under a stochastic policy What is RL? [Williams 1992] [Sutton & Barto 1998] Directly Optimizing The Expected Reward 84
  • 85. Challenges of applying RL ● Large search space (sparse rewards) ○ Supervised pretraining (MLE) ○ Systematic exploration [Houthooft+ 2017] ○ Curiosity [Schmidhuber 1991][Pathak2017] ● Credit assignment (delayed reward) ○ Bootstrapping ■ E.g., AlphaGo uses a value function to estimate the future reward ○ Rollout n-steps ● Train speed & stability (optimization) ○ Trust region approaches (e.g., PPO) ○ Experience replay Book [Sutton & Barto 1998] NIPS [Abbeel & Schulman 2016] Our focus today 85
  • 86. Efficiency challenge ● RL is still far from data efficient ○ E.g. the best learning algorithm (DeepMind RainbowDQN) “passes median human performance on 57 Atari games at about 18 million frames (around 90 hours) of gameplay, while most humans can pick up a game within a few minutes.” ● How to improve its efficiency? Credit to Alex Irpan 2018 from Google Brain Robotics 86
  • 87. Applying RL to NLP ● Benefits of RL ○ Weak supervision (e.g., expected answer, user clicks) ○ Directly optimizing the metric (e.g., F1, accuracy, BLEU etc.) ○ Work with structured hidden variables (e.g., logical forms/programs) ● Challenges with existing solutions ○ Large search space sparse reward often leads to slow and unstable training ○ Spurious reward often lead to biased solutions 87
  • 88. RL models generate their own training data ● Training sample management issue ○ Too many low quality examples ⇒ slow training ○ Boosting high reward experience ⇒ biased training Training samples LearnerActor Updated Model 88
  • 89. Complementary Learning Theory [McClelland+ 1995] [Kumaran+ 2016] Encoder Episodic memory Record & Replay Primary sensory and motor cortices Observations 89
  • 90. Most of the past experience are not helpful for improving the current model http://insideout.wikia.com/wiki/Long_Term_Memory 90
  • 91. Mammals learn from “interesting” dreams ● In early 2000s, scientists discovered that animals have complex dreams and are able to retain and recall long sequences of events while they are asleep. ● Recent studies indicate that by consolidating memory traces with high emotional / motivational value, "sleep and dreaming may offer a neurobehavioral substrate for the offline … learning” Image courtesy of Kote on Drawception.com, 2012 91
  • 92. Linear combination of maximum likelihood objective and expected return: λ log-likelihood on a top-k buffer + (1 - λ) expected return ● Not robust against spurious programs ● The composite objective is ad-hoc, and the gradient is biased Augment REINFORCE with Memory [Liang+ 2016] [Abolafia+ 2017]
  • 93. Spurious programs: right answer, wrong reason Which nation won the most silver medal? ● Correct program: (argmax rows “Silver”) (hop v1 “Nation”) ● Many spurious programs: (argmax rows “Gold”) (hop v1 “Nation”) (argmax rows “Bronze”) (hop v1 “Nation”) (argmin rows “Rank”) (hop v1 “Nation”) ...
  • 94. Combating spurious rewards ● Reinforce a rewarded experience only if the model (current policy) also thinks that it is the right thing to do Correctness of the behavior Experiences Model’s preference Reinforce of the behavior 94
  • 95. Memory Augmented Policy Optimization (MAPO) Memory Learner Actor Good return? Systematic Exploration Model checkpoint NIPS [Liang+ 2018] 95
  • 96. Lower the gradient variance without introducing bias Given a memory buffer of (sequence, return) pairs: , re-express expected return as, ● Importance sampling ○ Sample more frequently inside the buffer ○ Rejection sampling for samples outside the buffer. NIPS [Liang+ 2018]
  • 97. Optimal Sample Allocation Given that we want to apply stratified sampling to estimate the gradient of REINFORCE with baseline under 1/0 rewards. It can be shown that the optimal strategy is to allocate the same number of samples to reward vs no reward experiences NIPS [Liang+ 2018]Details Image source: Guy Harris, 2018 How to Give Feedback in a Non-Threatening Way
  • 98. Comparison of model update strategies On-policy optimization (REINFORCE) Iterative Maximum Likelihood (IML) Maximum Marginal Likelihood (MML) MAPO Correctness of the behavior Experiences & Reward Model’s preference NIPS [Liang+ 2018] 98
  • 99. Memory Weight Clipping ● Policy gradient methods usually suffer from a cold start problem, because the model probabilities P to good experiences are very small initially ● We adopt a clipping mechanism to ensure that the buffer probability is larger than a threshold ɑ NIPS [Liang+ 2018] 99
  • 100. Memory Weight Clipping Memory Actor max(P(B), α) 1- max(P(B), α) Trade off bias in the initial stage for faster training NIPS [Liang+ 2018]
  • 101. Training becomes less biased over time NIPS [Liang+ 2018] 101
  • 102. Comparison ● The shaded area represents the standard deviation of the dev accuracy ● REINFORCE does not work at all ● MAPO is slower but less biased NIPS [Liang+ 2018] 102
  • 103. SOTA results with weak supervision NIPS [Liang+ 2018] 103
  • 104. Actor 1 Actor 2 Actor n …... Learner Train set shard 1 Train set shard 2 …... Checkpoint queue Sample queue EvaluatorDev set Train set shard n Envs 1 Envs 2 Envs n Dev envs Scale up: Distributed Actor-Learner architecture [Liang et al, 2017; Espeholt et al, 2018; Liang et al, 2018]
  • 105. ● Weak Supervision NLP ○ NLP, AI, software 2.0 ○ Semantics as a foreign language ○ Unsupervised learning ○ Knowledge representation (symbolism) ● Semantic Parsing Tasks ○ WebQuestionsSP, WikiTableQuestions ● Neural Symbolic Machines (ACL 2017) ○ Compositionality (short term memory) ○ Scalable KB inference (symbolism) ○ RL vs MLE ● Memory Augmented Policy Optimization (NIPS 2018) ○ Experience replay (long term memory & optimal updating strategy) ○ Systematic exploration ○ Memory Weight Clipping (unbiased cold start strategy) Access slides and join discussions at weakly-supervised-nlu google group Mobile Desktop Thanks!