NOVA Data Science Meetup 1/19/2017 - Presentation 1
1. © 2015 International Business Machines Corporation
IBM
© 2015 IBM Corporation
January 2017
Frank Stein, Informs Certified Analytics Professional
Director of Analytics Solution Center
IBM
fstein@us.ibm.com
www.ibm.com/ascdc
Leveraging Information for Smarter Organizational Outcomes
Cognitive Computing: At the Cross-Roads of Data Science and Natural Language Processing
2.
Agenda
Part 1 Cognitive Computing (Frank Stein)
What is Cognitive Computing?
Real-Life Examples
Playing with Cognitive
Part 2 Statistical NLP (Mona Diab)
Q&A
3.
IBM Research: The journey to Watson
[Figure: IBM Research locations: China, Almaden, Austin, Tokyo, Zurich, India, Haifa, Ireland, Australia, Brazil, Africa]
Machine Learning
Natural Language Processing
High Performance Computing
Knowledge Representation and Reasoning
Question Answering Technology
Unstructured Information Management
Watson
4.
Businesses are “dying of thirst in an ocean of data”
80% of the world’s data today is unstructured
90% of the world’s data was created in the last two years
1 in 2 business leaders don’t have access to the data they need
5.
Data is growing exponentially – A Problem or Opportunity?
[Chart: structured data vs. unstructured data, 2010 to 2020; 44 zettabytes; marker: “You are here”]
6.
Watson answers a grand challenge
Can we design a computing system that rivals a human’s ability to answer
questions posed in natural language, interpreting meaning and context and
retrieving, analyzing and understanding vast amounts of information in real-time?
8.
Watson is ushering in a new era of computing . . .
[Timeline: 1900, 1950, 2011]
With the goal to create a new partnership that enhances, scales and accelerates human expertise.
9.
We Rely on Many Types of Analytics to Process Data
Descriptive: What happened; a single source of the truth
Predictive: What will happen and implications
Prescriptive: What should we do
Cognitive: The way we think
10. Three capabilities differentiate cognitive systems from traditional programmed computing systems…
Reasoning: They reason. They understand underlying ideas and concepts. They form hypotheses. They infer and extract concepts.
Learning: They never stop learning, getting more valuable with time. Advancing with each new piece of information, interaction, and outcome, they develop “expertise”.
Understanding: Cognitive systems understand like humans do.
… allowing them to interact with humans.
13.
Data, information, and expertise create the foundation.
Cognitive systems rely on collections of data and information. Examples include:
Analyst reports
Tweets
Wiretap transcripts
Battlefield docs
E-mails
Texts
Forensic reports
Newspapers
Blogs
Wiki
Court rulings
International crime database
Stolen vehicle data
Missing persons data
14.
Watson Advisor Cognitive Computing Pipeline Architecture
Unstructured Information Management Applications (UIMA)
[Pipeline diagram. Inputs: Questions; Ingested Corpus of User Domain Info. Stages: Question Analysis; Hypothesis Generation (Primary Search, Candidate Answers); Scoring (Evidence Retrieval, Answer Scoring, Contextual Scoring); Final Merging & Ranking (Trained Models). Outputs: Answers, Scores & Evidence.]
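The pipeline stages named on this slide (question analysis, primary search, candidate answers, evidence retrieval, scoring, final merging and ranking) can be illustrated with a toy sketch. This is not Watson's implementation: the three-document corpus, the regular-expression candidate generator, the two scoring features, and the hand-set weights standing in for "trained models" are all assumptions made up for illustration.

```python
import re

# Toy stand-in for the ingested corpus (invented for illustration).
CORPUS = {
    "doc1": "Watson won the Jeopardy quiz show in 2011.",
    "doc2": "Deep Blue defeated Garry Kasparov at chess in 1997.",
    "doc3": "Watson was developed by IBM Research.",
}
STOPWORDS = {"the", "a", "in", "at", "by", "was", "did", "of", "who", "when", "what"}
# Hand-set weights standing in for trained models (assumption).
WEIGHTS = {"overlap": 0.6, "type_match": 0.4}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_question(question):
    """Question analysis: content words plus a crude expected answer type."""
    tokens = tokenize(question)
    wants_number = bool(tokens) and tokens[0] == "when"
    return [t for t in tokens if t not in STOPWORDS], wants_number


def primary_search(keywords, corpus):
    """Primary search: passages sharing at least one keyword with the question."""
    return {d: t for d, t in corpus.items() if set(keywords) & set(tokenize(t))}


def generate_candidates(passages, keywords):
    """Hypothesis generation: capitalized phrases and 4-digit numbers not in the question."""
    pattern = r"\b(?:[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*|\d{4})\b"
    found = set()
    for text in passages.values():
        for match in re.findall(pattern, text):
            if not set(tokenize(match)) <= set(keywords):
                found.add(match)
    return found


def score_candidate(candidate, passages, keywords, wants_number):
    """Evidence retrieval and scoring: one feature per scorer."""
    evidence = [t for t in passages.values() if candidate in t]
    overlap = max(
        (len(set(keywords) & set(tokenize(t))) / len(keywords) for t in evidence),
        default=0.0,
    )
    type_match = 1.0 if candidate.isdigit() == wants_number else 0.0
    return {"overlap": overlap, "type_match": type_match}


def answer(question, corpus=CORPUS):
    """Final merging and ranking: weighted sum of features, best first."""
    keywords, wants_number = analyze_question(question)
    if not keywords:
        return []
    passages = primary_search(keywords, corpus)
    ranked = []
    for cand in generate_candidates(passages, keywords):
        feats = score_candidate(cand, passages, keywords, wants_number)
        confidence = sum(WEIGHTS[name] * value for name, value in feats.items())
        ranked.append((cand, round(confidence, 3)))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


print(answer("Who defeated Garry Kasparov at chess?"))
```

Each candidate comes back with a confidence score, which mirrors the slide's "Answers, Scores & Evidence" output; in a real system the weights would be learned from ground-truth question/answer pairs rather than set by hand.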
15.
Watson Knowledge Studio
Annotate: Identify mentions and relations in unstructured text.
Example text: “The main BlackEnergy executable being dropped from the Excel Spreadsheet (vba_macro.exe) executes an additional two binaries that it creates: FONTCACHE.DAT and runndll32.exe”
Annotated mentions:
Malware: BlackEnergy
Software: executable
Threat_Action: dropped
Software: Excel Spreadsheet
Indicator: vba_macro.exe
Software: binaries
Indicator: FONTCACHE.DAT
Indicator: runndll32.exe
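The mention annotations on this slide can be represented as typed character spans. The sketch below is a minimal illustration, assuming a simple dictionary lookup in place of the trained annotators that Watson Knowledge Studio produces; the `Mention` class and `annotate` function are hypothetical names, while the entity types and surface strings are copied from the slide.

```python
from dataclasses import dataclass

TEXT = (
    "The main BlackEnergy executable being dropped from the Excel "
    "Spreadsheet (vba_macro.exe) executes an additional two binaries "
    "that it creates: FONTCACHE.DAT and runndll32.exe"
)

# (entity type, surface form) pairs taken from the slide.
DICTIONARY = [
    ("Malware", "BlackEnergy"),
    ("Software", "executable"),
    ("Threat_Action", "dropped"),
    ("Software", "Excel Spreadsheet"),
    ("Indicator", "vba_macro.exe"),
    ("Software", "binaries"),
    ("Indicator", "FONTCACHE.DAT"),
    ("Indicator", "runndll32.exe"),
]


@dataclass(frozen=True)
class Mention:
    entity_type: str
    text: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def annotate(text, dictionary):
    """Find every occurrence of each dictionary entry and return typed spans."""
    mentions = []
    for entity_type, surface in dictionary:
        start = text.find(surface)
        while start != -1:
            end = start + len(surface)
            mentions.append(Mention(entity_type, surface, start, end))
            start = text.find(surface, end)
    return sorted(mentions, key=lambda m: m.start)


for m in annotate(TEXT, DICTIONARY):
    print(f"{m.entity_type:14} {m.text!r} [{m.start}:{m.end}]")
```

Storing offsets rather than only the strings keeps each mention tied to its position in the text, which is what a relation annotation between two mentions would need to refer to.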
16.
Teaching Watson
[Diagram labels: Domain content; Ingestion Pipeline; Watson Knowledge Studio (Define/train annotators); SIRE; Ground Truth Information; Q&A Training; Watson Runtime; Q/A; Factoid; Knowledge Canvassing; Knowledge Graph]
18.
What will you do with Cognitive?
Where do you need a deeper bond with your organization, client, constituent? E.g. Staples, Hilton (Pepper the Robot)
Where do you need to have everyone perform as an expert? (Augmenting Intelligence) E.g. Watson Oncology Advisor, Watson Teacher Advisor
Embedded Cognition: Whirlpool (kitchen appliances), Medtronic (insulin pump), GM (cars)
Cognitive Business Processes: Airbus Smarter Fleet Management
Discovery, Research: Watson for Drug Discovery with Ontario Brain Institute
19.
Watson Oncology Cognitive Assistant: Helping oncologists treat cancer patients
IBM Watson Oncology, built with Memorial Sloan Kettering
Attacking the cause of one in four deaths
Business problem: Need better individualized cancer treatment plans
Solution:
• Suggestions to help inform oncologists’ decisions based on 600K+ pieces of evidence and 2M pages of text from 42 publications
• Analyzes patient data against thousands of historical cases and trained through 5000+ Memorial Sloan Kettering MD and analyst hours
• Evolves with the fast-changing field
20.
Cognitive Creativity
Grammy-winning music producer Alex Da Kid used Watson’s technology to inspire his new song about heartbreak, “Not Easy.”
Watson analyzed the last five years of culture and music data.
To identify the most pervasive themes, the Watson AlchemyLanguage API was used to read and understand Nobel Peace Prize speeches, New York Times articles, etc.
The Watson Tone Analyzer API then ingested more than 2 million lines of related social content to understand the emotional sentiment.
Watson Beat, a cognitive technology that understands music and lets artists change the sound of a song based on the mood they want to express, was also used.
21.
Watson is available as a set of services delivered as APIs in the Cloud (bluemix.net)
…and more new Watson Services APIs continue to emerge frequently
22.
AlchemyLanguage
Twelve APIs around text analysis service functions, each of which uses sophisticated natural language processing techniques to analyze your content and add high-level semantic information
Entity Extraction: what are the entities (people, places, organizations, etc.) in text
Sentiment Analysis: how are people talking about the entities (positive, negative)
Keyword Extraction: identify important topics in content
Concept Tagging: high-level concepts in text (e.g. article is about monetary policy)
Relation Extraction: subject / action relations between entities
Taxonomy Classifier: hierarchical categorization (finance/personal finance/credit card)
Author Extraction: who wrote the article
Language Detection: what language is this written in
Text Extraction: extract the important parts of text within an article
Microformat Parsing: enhances webpage categorization and indexing and supports content discovery tasks
Feed Detection: discover new content, including blog posts, news articles and comment streams.
Linked Data Support: bring any content into the semantic web
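To make two of the functions listed above concrete, here is a small local sketch of keyword extraction and sentiment analysis. It does not call or reproduce the AlchemyLanguage API: the frequency-based keyword ranking, the tiny stopword list, and the hand-written positive/negative word lists are all assumptions chosen only to show the kind of input and output involved.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions, not from any Watson service).
STOPWORDS = {"the", "a", "an", "and", "but", "is", "was", "were", "of", "to", "in"}
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def extract_keywords(text, top_n=3):
    """Keyword extraction: most frequent non-stopword tokens, ties broken alphabetically."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS and len(t) > 2)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


def sentiment(text):
    """Sentiment analysis: count of positive words minus count of negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score


review = ("The support team was great and the support docs were helpful, "
          "but the app is slow.")
print(extract_keywords(review))
print(sentiment(review))
```

A production service would use trained statistical models rather than word lists, but the shape is the same: text in, structured semantic labels out.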
23.
Watson Data Platform with Machine Learning (new 2017)
Watson Data Platform allows employees to work together to gain insight from data.
Enables collaboration of Data Scientists, Data Engineers, Business Analysts and Developers
Provides data cleansing, visualization and sharing capabilities
Support for analytic notebooks
Supports R, Python, Scala, RStudio, Shiny, sparklyr (R interface to Spark), and Java
Watson Machine Learning, built on Apache Spark, can automatically build models on structured and unstructured information
Apache SparkML (also available from Bluemix.net)
Cognitive Assistance for Data Science technology scores machine learning algorithms against the data to recommend the best match
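The idea of scoring machine learning algorithms against the data to recommend the best match can be sketched in plain Python. This is only an illustration of the general technique (cross-validated model selection), not the Cognitive Assistance for Data Science technology: the three toy classifiers, the eight-point dataset, and the function names are all assumptions.

```python
import math
from collections import Counter

# Toy labeled data: ((feature1, feature2), label). Invented for illustration.
DATA = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"), ((1.1, 1.3), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.8, 3.1), "b"), ((3.1, 3.3), "b"),
]


def majority_class(train, point):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_centroid(train, point):
    """Predict the label whose mean feature vector is closest to the point."""
    grouped = {}
    for features, label in train:
        grouped.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


def nearest_neighbor(train, point):
    """Predict the label of the single closest training example."""
    return min(train, key=lambda row: math.dist(row[0], point))[1]


CANDIDATES = {
    "majority_class": majority_class,
    "nearest_centroid": nearest_centroid,
    "nearest_neighbor": nearest_neighbor,
}


def cross_val_accuracy(predict, data, folds=4):
    """k-fold cross-validation: every row is held out exactly once."""
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [row for i, row in enumerate(data) if i % folds != k]
        correct += sum(predict(train, x) == y for x, y in test)
    return correct / len(data)


def recommend(data, candidates=CANDIDATES):
    """Score each candidate on the data and return the best name with all scores."""
    scores = {name: cross_val_accuracy(fn, data) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores


best, scores = recommend(DATA)
print(best, scores)
```

On a platform built on Apache Spark the candidates would be Spark ML estimators and the data a distributed DataFrame, but the selection loop has the same shape: fit each candidate on held-in data, score on held-out data, recommend the top scorer.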
25.
MLK Speech Analyzed by Personality Insights
26.
Obama Farewell Speech Analyzed by Personality Insights
27.
What we’ve learned so far
Better Data = Better Outcomes: need to curate the ingested corpus and be careful about the ontology
Significant upfront work training the system, but it will pay off as the system improves over time
Cognitive and Cloud Services go together
Domain adaptation requires domain expertise
Address user anxiety over AI
Partnership on AI, established with Microsoft, Amazon, Google and Facebook, will conduct and publish research in such areas as ethics, fairness/inclusiveness, transparency, privacy, trustworthiness, reliability, and robustness