3. Augmented Intelligence: Helping humans make smarter decisions
HPE Big Data – Cognitive Analytics Platform for Text and Rich Media
John Horman – Chief Field Technologist, HPE Big Data
5. Great Expectations...
Asimov’s Three Laws of Robotics:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
6. Next attempt: neural networks
Attempt to imitate human learning
– Much closer match to human intelligence, contrasted well against rigid logic-based approaches
– Captures our fuzziness, ability to fail gracefully (i.e. guess)
– Very effective, but unfortunately still only applicable to specific applications such as speech recognition and image recognition.
7. Which brings us to today
Smart machines, but very specific domains
– Very often better than humans, but only in the specific tasks for which they were built
– However, they lack general intelligence, e.g. the ability to:
- Learn abstract concepts
- Think cleverly about strategy
- Compose flexible plans
- Make a wide range of ingenious logical deductions
- …
– Immense social change needed for human acceptance of AI and delegating control
9. How do you bridge the gap between data and outcomes?
Q1. How do you consume any data generated or understood by humans?
Q2. How do you identify key aspects and patterns to determine outcomes?
Q3. How do you automate to take action?
Data sources
Diverse Modern Apps
10. Augmented Intelligence
power apps for competitive advantage
Augmented Intelligence
powered by HPE
Artificial intelligence, machine learning and natural
language processing using advanced analytics functions.
11. Human data
Connected people, apps and things generating massive data in many forms
Machine data
Business data
10x faster growth than traditional business data
13. Human Information is made up of ideas, is diverse and has context
Why is processing human data different?
– Ideas don’t exactly match like data does; they have distance.
– Human Information is not static – it’s dynamic and lives everywhere.
– Legacy techniques have all fallen short.
Social Media Video Audio Email Texts Mobile
Transactional Data IT/OT Documents Search Engine Images
14. Using Cognitive Analysis to form a human-like understanding of content
HPE Natural Language Processing (NLP) engine
Fundamentally created to understand natural human language using probabilistic modeling and NLP algorithms
• Allows incoming data to dictate the model, not pre-defined rules, dictionaries, or semantic webs
Self-Learning / Machine Learning
• Updates as more data is added or removed
• Adapts to changing definitions or meaning
Fundamentally language-independent
• Treats words as symbols
Optimized with language packs
• Eduction, sentiment analysis, speech analytics
Information Theory and Bayesian Inference
16. Unfortunately, most existing structured data solutions are full of compromise
Traditional Enterprise Databases
– The original SQL databases did not envision today’s data volumes
– Vendors scrambling to handle bigger data volumes by tacking on Hadoop technologies and retrofitting legacy technologies
– Either use reduced data sets or eye-watering costs
Hadoop-Based Solutions
– Major Hadoop vendors strive to meet the standard with SQL on Hadoop
– NoSQL is incomplete SQL
– Analytics performance is very limited
– Not a substitute for a full implementation of SQL
Manage Huge Data Volumes
Deliver Fast Analytics
Compromise
17. HPE Augmented Intelligence Real-Time Analytics
– Leverages BI, ETL, Hadoop/MapReduce and OLTP investments
– No disk I/O bottleneck; simultaneously load & query
– Native DB-aware clustering on low-cost x86 Linux nodes
– ML algorithms, such as k-means and regression, built into the core engine
– Automatic setup, optimization, and DB management
– Up to 90% space reduction using 10+ algorithms
– 50x – 1000x faster than traditional RDBMS
– Scales from TB to PB with industry-standard hardware
– Simple integration with existing ETL and BI solutions
– SQL-99+ compliant
– Ultimate deployment flexibility
– Extended advanced analytics
– 24/7 Load & Query
21. HPE Haven OnDemand Combination APIs
Reusable Machine Learning building blocks for cognitive apps and services
Machine Learning API Combinations
Reduce Implementation Effort and Accelerate Development – 75% faster to build apps
Your Apps
1. Select 2. Copy & Paste
22. Search video as easily as text
Transform rich media into intelligent assets
Inquire: “Search your data”
Investigate: “Analyze your data”
Interact: “Personalize your data”
Improve: “Enhance your data”
– Live video or playback from archived footage
– On-screen text recognition
– Face identification
– Automatically generated transcript using speech recognition
– Speaker identification
– Timecode synchronization
– Automatic keyframe generation
Automate: Automatically create metadata, keyframes, transcriptions
Understand: Understand video footage and audio streams in real time
Act: Apply advanced analytics such as clustering and categorization, and link with other file types
23. Intuitive Knowledge Discovery for Self-Service Analytics
Visualization to simplify analytics workflow
– Topics Map
– Sunburst
– Result Comparison
– Rich Contextual View
Business Intelligence for Human Information (BIFHI)
24. HPE Virtual Assistant – Cognitive Chat Bot
An illustrative case study
A few ways to approach this:
1. Build a big long list of 5,000-10,000 Q&A pairs
Not really cognitive AI though, is it?
2. Build a cognitive solution that automatically extracts answers from data
Conceptually understands the ideas and meaning.
Seamlessly combines multiple analysis techniques (Probabilistic Conceptual Analysis, Machine Learning, Neural Networks, etc.)
This kind of full automation requires a platform with a few prerequisites:
1. Universal connectivity, out of the box
2. Automatic processing and fact extraction
3. Cognitive Analytics platform supporting all data formats and including a broad range of algorithms
25. HPE Augmented Intelligence automatically identifies and extracts facts from documents
ASOS Annual Report 2015
ASOS Summary
Chief Executive Officer = Nick Beighton
Total revenue growth = 18%
Profit before tax = £47.5m
Cash position = £119.2m
UK Retail sales = £473,885,000
Group total revenues = £1,150,788,000
…
• Language independent
• Automatic table recognition and field extraction
HPE Augmented Intelligence
26. HPE cognitive analytics is trained to understand user dialogue, and continues to learn from each user interaction
Intents: Loan, Multilateral Loan, Bilateral Loan, Trade Finance, Guarantee, Cash Management / ALM, Investing
User: “I’m interested in borrowing money to invest in a new production line”
HPE Virtual Assistant: “I’m not sure I completely understood you. Did you want a loan; or were you asking for a credit line, or securities account and brokerage service?”
Intent         Score
Loan           0.72
Credit Line    0.58
Investing      0.49
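The slide does not say how the assistant decides when to ask a clarifying question. One plausible reading is that it answers directly only when the top intent is both confident and clearly ahead of the runner-up. The sketch below illustrates that idea using the scores from the slide. The function name `choose_action` and the two threshold values are hypothetical, chosen for illustration, and are not taken from the HPE product.

```python
def choose_action(intent_scores, accept_threshold=0.8, min_margin=0.2):
    """Decide whether to answer directly or ask the user to clarify.

    intent_scores: mapping of intent name -> score in [0, 1].
    accept_threshold, min_margin: hypothetical tuning values (assumptions).
    Returns ("answer", [intent]) or ("clarify", [candidate intents, best first]).
    """
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return ("clarify", [])

    best_intent, best_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0

    # Answer only when the best intent is confident AND well separated.
    if best_score >= accept_threshold and best_score - runner_up_score >= min_margin:
        return ("answer", [best_intent])

    # Otherwise offer the candidates back to the user, most likely first.
    return ("clarify", [intent for intent, _ in ranked])


# Scores from the slide: no intent is a clear winner, so the bot asks.
slide_scores = {"Loan": 0.72, "Credit Line": 0.58, "Investing": 0.49}
action, candidates = choose_action(slide_scores)
print(action, candidates)
```

With the slide's scores the top intent (0.72) falls below the assumed acceptance threshold and is only 0.14 ahead of the next candidate, so the sketch returns the clarification path and lists the candidate intents in score order.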
28. Summary
• Artificial Intelligence is not here yet, and likely will not be for some decades at least
• Instead, the focus is on Augmented Intelligence – using machines to make people smarter and more effective
• The key to success and achieving business value is agility and innovation. Build fast, fail fast.
• Everything is derived from the data – never underestimate the importance of being able to access, ingest and process the raw data
• A broad range of analytic tools and algorithms is key to this agility and innovation. An open and transparent architecture is critical for futureproofing and allowing for further innovation.
• Only HPE’s pioneering AI platform is uniquely able to facilitate all of the above – through connectivity, breadth of analysis, and ease of application development and innovation.
Editor's notes
Transparent rings.
Logic, rule-based problem solving approach
Great early progress, but very difficult to extend to wider variety or harder problems.
-> combinatorial explosion, hardware limitations as well as programming
Chess example
If we still don’t fully know how the human brain works…
Simple neural network models known since the 50s
But took the recent advancement of computing power to make them practically feasible
IBM Watson – great if you want to play Jeopardy, but struggle to apply this to other applications
Tesla Autopilot – fantastic at driving, almost certainly better than humans, but not that great at conversation
Social acceptance: Tesla, travel booking example
Broad idea of AI is an abstraction, and maps down lower and lower to a wide set of precise functionality
Very clearly not a mature, commoditised area. Therefore success in Augmented Intelligence is achieved through agility and being able to leverage these many functions quickly and easily. Build fast, fail fast. Open architecture is key.
Connect: Thousands of connectors for text, video, image and audio available, designed to handle enterprise- and government-scale volumes of data. Available in over 155 languages, plus 55 languages for speech to text.
Process & Analyze - Established and proven technology to determine outcomes, based on machine learning and deep neural networks, to enrich data findings and identify key aspects and patterns
Build - Portfolio of hundreds of advanced analytics functions and APIs to automate and take action
Today, being data driven is about:
• Harnessing all the relevant data available today and in the future – including business, human and machine
• Democratizing the data by empowering and delivering insights for all stakeholders collaboratively in your organization – from LOB leadership, operations, line workers, etc. – irrespective of level or function, in real time, at the moments that matter
• Operationalizing analytics through many applications, resulting in better results across your entire business/operations
• Achieving greater value through insight and foresight analytics – answering ‘why did something happen?’ or ‘what will happen?’ instead of just reactively what happened, so you can take action and be proactive
In yesterday’s data driven enterprises, analytics and insights were limited to (and for) traditional business data – the data generated from business-process applications like CRM, ERP, HRM, and supply chain. But as we have all seen, the data landscape has been radically changing over the past few years – 90% of the data available today was created in just the last 2 years - and the landscape will continue to change due to the fastest growing data segments: Human and Machine.
Human data includes all the content we create – some of which is highly regulated for compliance purposes (contracts, legal docs), but much of it is social media, emails, call logs, images, audio, and video.
Machine data is the complete opposite of Human. It’s the high-velocity information generated by the computers, networks, and sensors embedded in just about everything—the Internet of Things.
Together, Human Data and Machine Data are growing 10x faster than traditional Business Data, and organizations that are data-driven are not only able to leverage this data to create new value, but they are able to bridge the interconnection of data across the silos and repositories for integrated intelligence.
For example, in retail – retailers can maximize customer loyalty across multiple channels by integrating data from real-time inventory, in-store location positioning sensors, RFID, and social media.
What do we mean by unstructured or human information? Well, anything that is produced by a human to be understood by a human – any of the information sources that we ourselves easily and naturally interact with every single day. The problem is, human information is very difficult for computers to process, and there are two key reasons for this:
1. The same word can have totally different meanings depending on the context, and these meanings can even change over time, or completely new meanings can appear suddenly. For example, consider the term “ground zero” – before 9/11 this would most likely have referred to the centre of a nuclear test site, but immediately afterwards it took on a new meaning as the former site of the World Trade Centre in New York. Now both meanings exist, but the most likely reference is to the latter.
2. Ideas and concepts expressed in human information have a notion of distance from each other. For example, we understand that a “coat” and a “jacket” are almost the same thing and may even be synonyms in certain contexts (e.g. a police report in which some witnesses describe a suspect as wearing a dark coat and others a dark jacket should be read as describing the same thing), whereas both are very distant from the concept of a “tree”. Understanding how closely related, or how different, pieces of human information are is essential to analysis and full understanding.
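To make the idea of "distance" between concepts concrete, here is a minimal sketch that represents each word by the words it co-occurs with and compares those context vectors with cosine similarity. This is a generic textbook technique used purely for illustration, not a description of how IDOL computes conceptual distance. The toy sentences and the names `context_vector` and `concept_distance` are invented for this example.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
SENTENCES = [
    "the suspect wore a dark coat with a hood",
    "the suspect wore a dark jacket with a hood",
    "she zipped her warm coat against the cold",
    "he zipped his warm jacket against the cold",
    "the old tree grew tall leaves in the park",
    "birds nest in the tree near the park",
]


def context_vector(term, sentences):
    """Count the words that share a sentence with `term`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if term in words:
            counts.update(w for w in words if w != term)
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def concept_distance(term_a, term_b, sentences=SENTENCES):
    """Distance in [0, 1]: 0 means identical contexts, 1 means nothing shared."""
    return 1.0 - cosine_similarity(
        context_vector(term_a, sentences), context_vector(term_b, sentences)
    )


print("coat vs jacket:", round(concept_distance("coat", "jacket"), 3))
print("coat vs tree:  ", round(concept_distance("coat", "tree"), 3))
```

On this toy corpus "coat" and "jacket" come out much closer to each other than either is to "tree", because they appear in nearly the same contexts. No dictionary or synonym list is supplied; the relationship falls out of the data.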
Both of these challenges are impossible to address with any keyword-based analysis platform, and as a result nearly all Big Data analysis solutions continue to focus on the traditional structured business data / BI analysis approach and just ignore the unstructured or human information. But human information makes up about 90% of the data in any organisation, so it is foolish to ignore such a significant proportion of data. Furthermore, human information is usually the key part of the data set that allows us to uncover and explain the “why” behind any patterns or trends discovered in the structured data – for example, hospital mortality figures may reveal a different survival rate for an operation over two sets of people, but only analysis of the relevant written consultant reports can reveal why that is and allow us to take action.
That is why there was a need to develop a new engine, from the ground up, specifically to process and understand human information in the same way that we as humans do – the IDOL engine. Because of this, IDOL is able to understand the actual different meaning of different words and concepts in the unstructured data, and to build an understanding of how and in what way these different concepts are related to each other.
So how does it do it? There are two key algorithms that were used to develop this unique capability – Bayesian Inference and Shannon’s Information Theory. Many of you will have heard of the first, but I’ll give a very simple outline of each and how we’ve developed them to work together to achieve this human understanding of unstructured data.
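As a rough illustration of the two ideas in isolation, the sketch below computes Shannon's self-information (rarer terms carry more information) and applies Bayes' rule to the "ground zero" example from earlier in these notes. All probabilities are made-up numbers for illustration, and the code shows only the textbook formulas. It makes no claim about how IDOL combines them internally.

```python
import math


def information_content(probability):
    """Shannon self-information in bits: -log2(p). Rare events score higher."""
    return -math.log2(probability)


def bayes_posterior(priors, likelihoods):
    """Bayes' rule: P(meaning | evidence) for each candidate meaning.

    priors:      P(meaning) before seeing the evidence.
    likelihoods: P(evidence | meaning).
    """
    unnormalised = {m: priors[m] * likelihoods[m] for m in priors}
    evidence = sum(unnormalised.values())
    return {m: p / evidence for m, p in unnormalised.items()}


# A common word tells us little; a rare word tells us a lot.
print(information_content(0.5))    # frequent term
print(information_content(0.001))  # rare term

# Which meaning of "ground zero" is intended, given that the surrounding
# text also mentions "Manhattan"? (All numbers are invented assumptions.)
priors = {"nuclear test site": 0.5, "World Trade Centre site": 0.5}
likelihoods = {"nuclear test site": 0.01, "World Trade Centre site": 0.20}
posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

Starting from equal priors, the context word shifts the posterior strongly towards the World Trade Centre meaning. Information content then indicates which context words deserve the most weight: the rarer and more distinctive the term, the more it should move the estimate.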
So that is how we use those two algorithms to gain a human-like understanding of unstructured information in just the same way that the human brain does. The probabilistic approach to Natural Language Processing and Machine Learning has several advantages over other approaches such as linguistic analysis or semantic webs.
Firstly, linguistic and semantic web approaches both have significant limitations in that they need a set of predefined rules, dictionaries or tuples (semantic web) in order to make any sense of what they are processing, and these rules need to be constantly updated. So while they can sometimes perform well in very specific, niche domains or use cases, as a general analysis tool they are extremely limited. In contrast, IDOL applies machine learning over the data itself, allowing it to dictate the model – and as new words appear, or existing words take on new meanings, these changes are immediately identified, learned and understood (in just the same way you can learn the meaning of a new word through its context, seeing it used several times). In Big Data analysis applications, this is absolutely essential – because it’s very rare that you know exactly what is in the data you want to analyse (hence why you're analyzing it), and indeed it’s very often these new, unusual concepts or outliers that are of particular value.
Secondly, IDOL’s probabilistic model also has the advantage of making the analysis process entirely language independent. This is because IDOL is not trying to break down a sentence into nouns and verbs in a completely language-dependent way; rather, IDOL looks at each word as a symbol or black box and builds up its understanding of how it is related to other symbols to determine its meaning. This means a single IDOL engine can natively index and understand human information in any of over 160 languages, irrespective of character set and so on.
Alison
Coupled with this, we offer rapid deploy services to help you eliminate challenges and get it right the first time. We are moving away from the build from scratch model to create
Our Methodology includes four phases:
Assess & Align: HPE consultants help you to identify key customer metrics that truly impact your bottom line and the data sources that contribute to these. We create an effective project plan based on blueprints that have been created based on best practices.
Map & Integrate: Our experts do the heavy lifting by normalizing and mapping your data into the Voice of the Customer solution. Our Rapid Deploy Package includes aggregating data from three data sources (social media, CRM, web analytics) in order to get you started. We configure the system to aggregate six months of historical data from each of these.
Configure & Connect: We configure your unique end user environment to meet the business needs identified in the Assess phase. We leverage our prebuilt reference queries to create the visualizations that are most meaningful to you, and connect the reference platform configuration to your chosen data sources.
Test & Deploy: Before go-live, we ensure that your solution is up and running properly. After a series of user acceptance tests, the solution is ready for production. HPE consultants will provide comprehensive documentation and hand-off. User training is conducted at the facility or remotely. Knowledge transfer is completed so that your team fully understands their implementation.
What is it?
A new end-user GUI provides a straightforward analytics workflow for diverse use cases. BIFHI incorporates visualization functionality such as a topic map to highlight key concepts; a sunburst diagram to enable easy filtering based upon extracted entities (e.g. people, place, company); result set comparison to examine how a change of search parameter may impact the outcome; and a rich contextual view, where the query result includes not only the document itself but also the metadata and other relevant information, such as documents by the same author or documents from around the same period.
Why does it matter?
The intuitive interface enables business users to perform self-service analytics and shortens time to insight. For example, the user can search for a topic, visualize the result breakdowns on the main panel, and refine the search parameters on the side panel with automated guidance based upon IDOL’s deep understanding of queried data, and see real-time result changes, all within the same window.
So this is what we built – 2-3 resources, 3-4 weeks
Simplicity of what you see hides the intelligence – not scripted
Fully cognitive chat bot
Ask it anything (across the data domain)
No manual data loading