7. Why NLP is hard
• World Knowledge
– What’s behind Ginni?
– Required knowledge: who Ginni is, what ‘behind’ means
– Possible answers: ‘a sign’, ‘the IBM logo’, ‘multiple signs’
13. Type 1: Applications
• These are things you all want to do
– Natural Language Generation
– Summarization
– Dialog
– Machine Translation
– Question Answering
– Sentiment Analysis
14. Type 2: Tasks
• These are usually precursors to applications
– Part of Speech tagging
• Identify which part of speech a word is
– Lemmatizing
• Running → run, ran → run
– Multiword Expression (Idiom) Identification
• ‘Kick the bucket’ can’t become ‘kicking the bucket’
– Parsing
– Word sense disambiguation
• ‘bank’ as in money vs ‘bank’ as in river
– Coreference/Anaphora Resolution
• Sally didn’t know what to do with all the money she made by
becoming a Watson Ecosystem partner
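To make one of these tasks concrete, here is a minimal sketch of word sense disambiguation for 'bank', using a simplified gloss-overlap heuristic (in the spirit of the Lesk algorithm). The sense inventory, glosses, and stopword list are hypothetical and written only for this illustration; a real system would draw them from a lexical resource.

```python
import re

# Hypothetical mini sense inventory, invented for this example.
SENSES = {
    "bank": {
        "financial institution": "an institution where people deposit money and take out loans",
        "river edge": "the sloping land along the edge of a river or stream of water",
    }
}

# Small, assumed stopword list; these words carry no sense information.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "at",
             "i", "my", "where", "out", "along"}

def content_words(text):
    """Lowercase the text and keep only alphabetic, non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    return max(scores, key=scores.get)

print(disambiguate("bank", "I went to the bank to deposit money"))
print(disambiguate("bank", "We sat on the bank of the river"))
```

The heuristic only works when the sentence happens to share words with a gloss, which is one reason this task is far from trivial.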
16. How to Solve NLP Problems
• Sample problem: Relationship Extraction
• Input:
– Today I saw Ginni Rometty give an amazing talk about
Watson. She was a fantastic speaker, I want her to give a talk
at my organization.
• Goal: Extract all relations involving people.
• Solution: Chain a bunch of tasks together until you have
enough information to extract relations
– Going to focus on the pipeline process, not implementation
17. Sentence Splitting
• Input:
– Today I saw Ginni Rometty give an amazing talk about
Watson. She was a fantastic speaker, I want her to give a talk
at my organization.
• Output:
– Today I saw Ginni Rometty give an amazing talk about
Watson.
– She was a fantastic speaker, I want her to give a talk at
my organization.
• Food for thought:
– How would we split if it said Mrs. Rometty?
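The 'Mrs.' question can be explored with a small sketch. It compares a naive splitter that breaks after every period with one that re-joins pieces ending in a known abbreviation. The abbreviation list is an assumption and deliberately incomplete; this is an illustration of the problem, not how any particular toolkit does it.

```python
import re

# Assumed, incomplete list of abbreviations that end in a period.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof."}

def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def split_sentences(text):
    """Like naive_split, but glue a piece to the next one if it ends in an abbreviation."""
    sentences = []
    carry = ""
    for piece in naive_split(text):
        candidate = f"{carry} {piece}".strip()
        if not candidate:
            continue
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            carry = candidate          # not a real sentence end; keep accumulating
        else:
            sentences.append(candidate)
            carry = ""
    if carry:
        sentences.append(carry)
    return sentences

text = ("Today I saw Mrs. Rometty give an amazing talk about Watson. "
        "She was a fantastic speaker, I want her to give a talk at my organization.")

print(naive_split(text))      # breaks wrongly after "Mrs."
print(split_sentences(text))  # keeps "Mrs. Rometty" together
```

A fixed abbreviation list misses cases it has never seen and mishandles sentences that genuinely end in an abbreviation, so even this 'solved' task needs care.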
18. Part of Speech (POS) tagging
• Pop quiz: How many POS tags are there?
− 36 (in the Penn Treebank tag set)
• Today I saw Ginni Rometty give an amazing talk about Watson.
− Today = Adverb
• But compare: “Today is a great day”
− I = Pronoun
− Saw = Verb
• Why not the cutting tool?
− Ginni = Proper noun
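The 'saw' and 'today' ambiguities can be illustrated with a toy tagger. This is only a sketch under strong assumptions: the lexicon is hypothetical, the tag names follow the slide rather than a standard tag set, and the two context rules are hand-written for these examples. Real taggers learn such preferences from annotated data.

```python
# Hypothetical toy lexicon: each word lists its possible tags, most likely first.
LEXICON = {
    "today": ["Adverb", "Noun"],
    "i": ["Pronoun"],
    "saw": ["Verb", "Noun"],
    "bought": ["Verb"],
    "is": ["Verb"],
    "ginni": ["Proper noun"],
    "rometty": ["Proper noun"],
    "a": ["Determiner"],
    "great": ["Adjective"],
    "day": ["Noun"],
}

def first_guess(token):
    """Most likely tag for a token when context is ignored."""
    return LEXICON.get(token.lower(), ["Unknown"])[0]

def tag(tokens):
    """Tag each token, using neighbouring words to choose between readings."""
    tagged = []
    for i, token in enumerate(tokens):
        options = LEXICON.get(token.lower(), ["Unknown"])
        choice = options[0]
        prev_tag = tagged[-1][1] if tagged else None
        next_guess = first_guess(tokens[i + 1]) if i + 1 < len(tokens) else None
        if "Noun" in options:
            # Rule 1: after a determiner or adjective, prefer the noun reading.
            if prev_tag in ("Determiner", "Adjective"):
                choice = "Noun"
            # Rule 2: a sentence-initial word right before a verb is likely the subject.
            elif i == 0 and next_guess == "Verb":
                choice = "Noun"
        tagged.append((token, choice))
    return tagged

print(tag("Today I saw Ginni Rometty".split()))
print(tag("I bought a saw".split()))
print(tag("Today is a great day".split()))
```

In the first sentence 'saw' follows a pronoun and stays a verb; in the second it follows a determiner and becomes the cutting tool.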
24. Relationship Extraction
• Today I saw Ginni Rometty give an amazing talk about
Watson. She was a fantastic speaker, I want her to
give a talk at my organization.
• Which is right?
– Give (an amazing talk about Watson, Ginni Rometty)
– Give (an amazing talk, Ginni Rometty)
– Give (talk, Ginni Rometty)
• Extracting speaker (Ginni Rometty)
– How do we know that (she == Ginni Rometty) but (I != Ginni
Rometty)?
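One simple way to approach the (she == Ginni Rometty) question is a recency heuristic: first-person pronouns refer to the author, and a third-person pronoun refers to the most recently mentioned person of a compatible gender. The sketch below assumes a hypothetical table of known people and their genders, and a placeholder "AUTHOR" label; neither comes from the talk, and real coreference systems use much richer evidence.

```python
import re

# Hypothetical world knowledge: the people we know about and their gender.
KNOWN_PEOPLE = {"Ginni Rometty": "female"}

THIRD_PERSON = {"she": "female", "her": "female", "he": "male", "him": "male"}
FIRST_PERSON = {"i", "me", "my"}

def resolve_pronouns(text, author="AUTHOR"):
    """Return (pronoun, referent) pairs in the order the pronouns appear."""
    tokens = re.findall(r"[A-Za-z]+", text)
    resolved = []
    mentioned = []  # known people seen so far, most recent last
    i = 0
    while i < len(tokens):
        two_words = " ".join(tokens[i:i + 2])
        word = tokens[i].lower()
        if two_words in KNOWN_PEOPLE:
            mentioned.append(two_words)
            i += 2
            continue
        if word in FIRST_PERSON:
            resolved.append((tokens[i], author))
        elif word in THIRD_PERSON:
            # Walk backwards to find the most recent gender-compatible person.
            for person in reversed(mentioned):
                if KNOWN_PEOPLE[person] == THIRD_PERSON[word]:
                    resolved.append((tokens[i], person))
                    break
        i += 1
    return resolved

text = ("Today I saw Ginni Rometty give an amazing talk about Watson. "
        "She was a fantastic speaker, I want her to give a talk at my organization.")
print(resolve_pronouns(text))
```

Note that the sketch only avoids linking 'She' to 'Watson' because Watson is absent from the assumed table of people, which is exactly the kind of world knowledge the slides describe.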
25. Relationship Extraction
• How do we express that the talk hasn’t happened yet?
– What if the sentence was “I want her to give another talk at my
organization”?
• “She was a fantastic speaker”
– Was it this time only? Or is this a property of Ginni?
26. What did we learn?
• NLP is hard
– Language will always surprise you.
• Everything is a pipeline
– Part of Speech tagging → Parsing → Entity Detection →
Relationship Extraction
– What if we had identified “Ginni Rometty” as a verb?
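The 'verb' question can be made concrete with a toy pipeline showing how an early mistake propagates. This is a simplified sketch: it skips the parsing stage, uses a hypothetical tag table, and relies on one hand-written relation pattern. All function names are invented for the example.

```python
# Hypothetical tag table for the example sentence.
DEFAULT_TAGS = {
    "i": "Pronoun", "saw": "Verb", "ginni": "Proper noun", "rometty": "Proper noun",
    "give": "Verb", "a": "Determiner", "talk": "Noun",
}

def pos_tag(tokens, tags=DEFAULT_TAGS):
    """Stage 1: look up a tag for every token."""
    return [(t, tags.get(t.lower(), "Other")) for t in tokens]

def detect_entities(tagged):
    """Stage 2: group consecutive proper nouns; return (name, index after the name)."""
    entities, current = [], []
    for i, (token, tag) in enumerate(tagged):
        if tag == "Proper noun":
            current.append(token)
        elif current:
            entities.append((" ".join(current), i))
            current = []
    if current:
        entities.append((" ".join(current), len(tagged)))
    return entities

def extract_relations(tagged, entities):
    """Stage 3: emit Give(object, person) when an entity is directly followed by 'give'."""
    relations = []
    for name, end in entities:
        if end < len(tagged) and tagged[end][0].lower() == "give":
            nouns = [tok for tok, tag in tagged[end + 1:] if tag == "Noun"]
            if nouns:
                relations.append(f"Give({nouns[0]}, {name})")
    return relations

def run_pipeline(sentence, tags=DEFAULT_TAGS):
    tagged = pos_tag(sentence.split(), tags)
    return extract_relations(tagged, detect_entities(tagged))

sentence = "I saw Ginni Rometty give a talk"
print(run_pipeline(sentence))            # one relation found

# Simulate the tagging error from the slide: the name is tagged as a verb.
bad_tags = {**DEFAULT_TAGS, "ginni": "Verb", "rometty": "Verb"}
print(run_pipeline(sentence, bad_tags))  # no entity, so no relation
```

Nothing downstream can recover from the mis-tag, because the later stages only see what the earlier ones hand them.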
27. How do you build applications that
deal with language?
• See what progress has been made on your problem
– “basically solved”
– “good progress”
– “here be dragons”
28. What problems have we basically solved?
• Part of Speech tagging
– I went to the store → store = Noun
• Lemmatizing
– Running → run
• Morphological segmentation
– Running → run + ing
• Sentence Splitting
• Tokenizing
– Breaking sentences into ‘words’
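For a feel of what these low-level tasks involve, here is a deliberately crude sketch of tokenizing, morphological segmentation, and lemmatizing. The regular expression, the irregular-form table, and the suffix rule are all assumptions made for this example; they handle the slide's cases but fail on many ordinary words (for instance 'making').

```python
import re

def tokenize(sentence):
    """Split into word tokens (keeping simple apostrophes) and punctuation tokens."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

# Hypothetical lookup table for irregular forms.
IRREGULAR = {"ran": "run", "went": "go", "saw": "see"}

def segment(word):
    """Return (stem, suffix) for -ing forms, undoing consonant doubling."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        stem = w[:-3]
        # "runn" -> "run"; leave doubled vowels and stems like "tell" or "pass" alone.
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
            stem = stem[:-1]
        return stem, "ing"
    return w, ""

def lemmatize(word):
    """Use the irregular table first, then fall back to suffix stripping."""
    w = word.lower()
    return IRREGULAR.get(w, segment(w)[0])

print(tokenize("I went to the store."))
print(segment("Running"))
print(lemmatize("ran"), lemmatize("Running"))
```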
29. What problems have we made good progress on?
• Machine Translation
• Parsing
• Search
• Coreference Resolution
• Sentiment Analysis
• Relationship Extraction
• Word Sense Disambiguation
• Idiom Identification
31. More Reasons Why NLP is hard
• Accept that things will go wrong
– Nothing in NLP ever has 100% accuracy
• Accept that NLP numbers are uncomfortable
– 50% accuracy can be very good
– Going from 72% to 74% accuracy can be a HUGE deal
• Embrace Cognitive Computing
– “While they’ll have deep domain expertise, instead of replacing
human experts, cognitive computers will act as a decision
support system and help them make better decisions based
on the best available data, whether in healthcare, finance or
customer service.”
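One possible way to read the accuracy numbers above is to look at errors rather than accuracy, and to compare against guessing. The sketch below computes the relative error reduction for a 72% to 74% improvement and the chance accuracy for a task with 36 equally likely labels. The choice of 36 classes and of uniform guessing as the baseline are assumptions for illustration only.

```python
def relative_error_reduction(old_accuracy, new_accuracy):
    """Fraction of the old errors that the new system eliminates."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    return (old_error - new_error) / old_error

def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over num_classes labels."""
    return 1.0 / num_classes

# 72% -> 74% removes about 7% of all remaining errors.
print(f"{relative_error_reduction(0.72, 0.74):.1%}")

# With 36 equally likely labels, random guessing gets under 3%,
# so 50% accuracy would be far above chance.
print(f"{chance_accuracy(36):.1%}")
```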
32. It's difficult to do NLP if it's not your core business
competency
Are there any companies that can help?
38. Summary
• NLP is hard to tackle on your own
• But if your application involves users, NLP can provide
huge value
• The best way to get that value without all the hard work is
to become an Ecosystem Partner
39. What now?
• Explore the Watson APIs:
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/services-catalog.html
• Apply to become an Ecosystem Partner:
http://www.ibm.com/smarterplanet/us/en/ibmwatson/ecosystem.html