Wed 1130 aasman_jans_color

When a relational database doesn't work

And why a graph database might help

Contents

• Franz and customers
• Two Use Cases
– Amdocs: a real-time semantic platform for telecom that knows everything about everyone in real time
– Real-time news and social network analysis using the Linked Open Data Cloud
• Scalability?
• Integration with other NoSQL databases – Solr, MongoDB

Franz Inc – Who We Are
• Private, founded 1984
• We are an AI and Semantic Technology company
• Out of Berkeley


[Figure: example graph with the nodes Bob, Craig, Alice and Bill]

How is it different from an RDB, and why is it more flexible?
• No schema
– Say whatever you want to say, but ontologies may constrain what you put in the triple store
• No link tables
– You can express one-to-many relationships directly
• No indexing choices
– You can add new data attributes (predicates) on the fly, and they are immediately available for querying because everything is automatically indexed
• Takes anything you give it: it is trivial to consume
– Rows and columns from an RDB, XML, RDF(S), OWL, text and extracted entities, JSON
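The "no schema, no link tables, no indexing choices" points can be illustrated with a toy in-memory triple store. This is a minimal sketch and does not show how AllegroGraph is implemented. The following are assumptions made for illustration:

- the class name `TinyTripleStore`;
- the predicates `knows` and `paysLateByDays`;
- the choice of three permutation indexes (SPO, POS, OSP).

Every triple is indexed in all three orders on insert. As a result, a one-to-many relationship is just several triples, and a brand-new predicate can be queried immediately.

```python
from collections import defaultdict

class TinyTripleStore:
    """Toy in-memory triple store (illustrative only, not AllegroGraph)."""

    def __init__(self):
        # Three permutation indexes, so any pattern with a bound term is a lookup.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple goes into all three indexes on insert: there is no schema
        # to alter and no index to declare when a new predicate shows up.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            for p2, objs in self.spo.get(s, {}).items():
                if p is None or p2 == p:
                    for o2 in objs:
                        if o is None or o2 == o:
                            yield (s, p2, o2)
        elif p is not None:
            for o2, subjs in self.pos.get(p, {}).items():
                if o is None or o2 == o:
                    for s2 in subjs:
                        yield (s2, p, o2)
        elif o is not None:
            for s2, preds in self.osp.get(o, {}).items():
                for p2 in preds:
                    yield (s2, p2, o)
        else:
            for s2, po in self.spo.items():
                for p2, objs in po.items():
                    for o2 in objs:
                        yield (s2, p2, o2)

store = TinyTripleStore()
# One-to-many without a link table: just add several triples.
store.add("Bob", "knows", "Alice")
store.add("Bob", "knows", "Craig")
store.add("Alice", "knows", "Bill")
# A brand-new predicate is queryable as soon as it is added.
store.add("Bob", "paysLateByDays", 2)

print(sorted(o for _, _, o in store.match("Bob", "knows")))  # ['Alice', 'Craig']
print(list(store.match(p="paysLateByDays")))                 # [('Bob', 'paysLateByDays', 2)]
print(sorted(s for s, _, _ in store.match(o="Alice")))       # ['Bob']
```

A relational design would need a separate link table for "knows" and an `ALTER TABLE` (plus an index decision) for the new attribute. Here both are just more triples.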

AllegroGraph: RDF Graph Store

[Figure: AllegroGraph architecture stack. Components shown: Backup/Restore, REST, Replication, Rules, Java, JavaScript, SPARQL, Prolog, Geo, SNA, Time, RDFS++, Clif++, Warm Failover, Security, Management, and a layer for session management, query engine and federation, all on top of a storage layer (compression, indexing, freetext, transactions).]

Use Case: Amdocs

Build a semantic platform that knows everything about everyone in real time.

Telco Call Center Volume Quadruples Since 2007

• On average, each call
– Lasts 10 minutes
– Goes through 68 screens
• One call costs 3 months' profit from that customer
• It's getting worse every day!

Typical Interaction Begins in the Dark

[Figure: "The unknown: why calling? How to help?" surrounded by the data sources Bill, Past Payments, Plan Calculator (avg peak usage), Device, Past Statements and Interactions (Memos)]

No real-time context, insight or guidance

High AHT (average handling time), poor FCR (first call resolution), low customer and agent satisfaction

AIDA Maps Events to Concepts

Events from many source systems are transformed into a set of related business concepts.

[Figure: many events (Interactions, Orders, Bills, Payments, Collections, Charge dispute, Pay instructions, Device activated, Device heartbeat, Subscriptions, Device changes) flow into a triple store with business concepts about the Customer]

Business concepts include:
• Subjective: "good payer"
• Individual patterns: "always pays 2 days late"
• Trends: "improving payer"
• Geospatial: "within 5 miles of the tower"
• Time: chronology of events, "within 5 minutes of an outage"
• Probability: "probably will call about the bill"
• Absence of occurrence: "missed payment"
• Relationship between: "friend of a friend"
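Several of the concept types above can be computed directly from event attributes. The sketch below derives three of them, under stated assumptions:

- Geospatial ("within 5 miles of the tower"), using the haversine great-circle distance with a mean Earth radius of 3958.8 miles.
- Time ("within 5 minutes of an outage").
- Absence of occurrence ("missed payment"), read here as no payment on or before the due date once that date has passed.

All function names and coordinates are hypothetical. The source does not show AIDA's actual rules.

```python
import math
from datetime import date, datetime, timedelta

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed value for the sketch)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_miles_of_tower(device, tower, miles=5.0):
    """Geospatial concept: is the device within `miles` of the tower?"""
    return haversine_miles(*device, *tower) <= miles

def within_minutes_of(event_time, outage_time, minutes=5):
    """Time concept: did the event happen within `minutes` of the outage (before or after)?"""
    return abs(event_time - outage_time) <= timedelta(minutes=minutes)

def missed_payment(due_date, payment_dates, as_of):
    """Absence of occurrence: the due date has passed with no payment on or before it."""
    return as_of > due_date and not any(d <= due_date for d in payment_dates)

tower = (37.85, -122.27)  # hypothetical tower coordinates
print(within_miles_of_tower((37.80, -122.27), tower))  # True  (about 3.5 miles)
print(within_miles_of_tower((37.75, -122.27), tower))  # False (about 6.9 miles)

outage = datetime(2010, 10, 4, 9, 0)
print(within_minutes_of(datetime(2010, 10, 4, 9, 3), outage))  # True

print(missed_payment(date(2010, 10, 1), [], as_of=date(2010, 10, 5)))  # True
```

Each result would then be stored as a new triple about the customer or device, alongside the raw events it was derived from.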

Events → Decision Engine → Actions

[Figure: AIDA architecture. Event data sources (operational systems such as RM, CRM and OMS, the network (NW) and Web 2.0) feed the Amdocs Event Collector. An SBA application server container hosts the Amdocs Integration Framework, event ingestion, scheduled events, an inference engine (business rules) and a Bayesian belief network, working with the AllegroGraph triple store DB and "Sesame".]

AIDA Event Collection

[Figure: Amdocs Event Collector pipeline: Event Sources → Collection → Parsing → Mapping → Publishing → Ingestion → Inference & Decision]

• Events are collected from many heterogeneous, configured event sources
– Phone calls, texting, video upload, roaming, etc.
– iTunes downloads, website interaction, media upload
– Emails, support calls
– Bill payment or non-payment
– Phones stop working or disconnect
• All events are fused and mapped into a single event knowledge base
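The collection, parsing, mapping and ingestion steps can be sketched in a few functions. This is a hypothetical illustration, not the Amdocs Event Collector: the two source formats, the `aida:` and `event:` prefixes and the property names are all invented. The sketch works in three stages:

1. Each source gets its own parser.
2. A mapping step expresses every record in one shared vocabulary.
3. Ingestion puts everything into a single set of triples, which stands in for the single event knowledge base.

```python
import json

# Collection: hypothetical raw events from two sources with different formats.
RAW_EVENTS = [
    ("billing", '{"cust": "C42", "type": "payment", "amount": 59.0, "date": "2010-10-03"}'),
    ("network", "C42|roaming|2010-10-04T09:15"),
]

def parse(source, raw):
    """Parsing: turn each source's native format into a plain dict."""
    if source == "billing":
        return json.loads(raw)
    if source == "network":
        cust, kind, ts = raw.split("|")
        return {"cust": cust, "type": kind, "date": ts}
    raise ValueError(f"unknown source: {source}")

def map_to_triples(event_id, record):
    """Mapping: express the record in (hypothetical) ontology terms."""
    ev = f"event:{event_id}"
    triples = [
        (ev, "rdf:type", f"aida:{record['type'].capitalize()}Event"),
        (ev, "aida:customer", f"customer:{record['cust']}"),
        (ev, "aida:date", record["date"]),
    ]
    if "amount" in record:
        triples.append((ev, "aida:amount", record["amount"]))
    return triples

def ingest(raw_events, kb):
    """Publishing + ingestion: every source lands in one knowledge base."""
    for i, (source, raw) in enumerate(raw_events):
        kb.update(map_to_triples(i, parse(source, raw)))
    return kb

kb = ingest(RAW_EVENTS, set())
print(len(kb))  # 7
```

Adding a new event source means adding a parser and a mapping. The knowledge base and everything that queries it stay unchanged.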

AIDA Semantic Inference
• Define rules that create higher-level concepts
– Event (mapping) rules: map event data into the domain ontology
– Automatic rules: compute new properties defined by the ontology
– On-demand rules: perform inference for the services
• Rules are triggered on event ingestion, on a service request, or on a schedule
• Semantic rule inference generates new triples from existing ones
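Rules "triggered upon event ingestion" that generate "new triples from existing ones" behave like naive forward chaining. After each batch of event triples arrives, the rules run repeatedly until they produce nothing new. This is a minimal sketch under that assumption, not AIDA's inference engine. The two rules, the predicate names and the threshold of two late payments are hypothetical. Scheduled and on-demand rules could reuse the same fixpoint loop, called from a timer or a service request instead of from `ingest`.

```python
from collections import Counter

def late_payer_rule(kb):
    """Hypothetical rule: a customer with two or more late payments is a 'LatePayer'."""
    late = Counter(
        cust for (pay, p, cust) in kb
        if p == "aida:customer" and (pay, "aida:timeliness", "Late") in kb
    )
    return [(c, "aida:paymentPattern", "LatePayer") for c, n in late.items() if n >= 2]

def likely_call_rule(kb):
    """Hypothetical rule chained on the first: late payers probably call about the bill."""
    return [(c, "aida:likelyCallReason", "bill")
            for (c, p, o) in kb if p == "aida:paymentPattern" and o == "LatePayer"]

class RuleEngine:
    def __init__(self, rules):
        self.rules = rules
        self.kb = set()

    def ingest(self, triples):
        """Add event triples, then fire all rules until no new triples appear."""
        self.kb.update(triples)
        while True:
            new = set()
            for rule in self.rules:
                new.update(t for t in rule(self.kb) if t not in self.kb)
            if not new:
                return
            self.kb |= new

engine = RuleEngine([late_payer_rule, likely_call_rule])
engine.ingest([("pay:1", "aida:customer", "customer:C42"), ("pay:1", "aida:timeliness", "Late")])
print(("customer:C42", "aida:paymentPattern", "LatePayer") in engine.kb)  # False
engine.ingest([("pay:2", "aida:customer", "customer:C42"), ("pay:2", "aida:timeliness", "Late")])
print(("customer:C42", "aida:likelyCallReason", "bill") in engine.kb)     # True
```

The second ingestion fires the first rule, and the new "LatePayer" triple then fires the second rule on the next pass of the loop.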

[Figure: domain ontology fragment. A Customer has Bills (Charges, Amount, Due Date), Payments with a Payment Pattern (Good/Bad; "Timeliness": Early/OnTime/Late; Improving/Worsening) and Devices (Model, Status).]

Semantic Inference – Using Business Rules to Generate High-Level Concepts
• AIDA provides "Late Payment", defined in the Workbench

Workbench for business rule construction
• Uses a sophisticated magnetic-block GUI for business analysts
• Rules are triggered to infer and generate new business concepts

Each business rule defines an attribute. This rule defines an attribute of the PaymentDetails class called timeliness:

    rule PaymentDetails.timeliness
    {
      if date within EarlyPeriod days after customerBill.billDate
        then timeliness = Early ;
      else if date not within LatePeriod days after customerBill.billDate
        then timeliness = Late ;
      else timeliness = OnTime ;
    }

Java code

All classes and their attributes are defined in the application ontology.
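The PaymentDetails.timeliness rule translates almost line for line into ordinary code. This is a sketch, not the Workbench's generated code. It makes two assumptions:

- EarlyPeriod and LatePeriod are set to 10 and 30 days.
- "Within N days after the bill date" means "no more than N days after it", so a payment made before the bill date also counts as Early.

```python
from datetime import date
from enum import Enum

class Timeliness(Enum):
    EARLY = "Early"
    ON_TIME = "OnTime"
    LATE = "Late"

EARLY_PERIOD = 10  # days; hypothetical value for EarlyPeriod
LATE_PERIOD = 30   # days; hypothetical value for LatePeriod

def timeliness(payment_date, bill_date, early_period=EARLY_PERIOD, late_period=LATE_PERIOD):
    """Mirror of rule PaymentDetails.timeliness: Early, then Late, else OnTime."""
    days_after_bill = (payment_date - bill_date).days
    if days_after_bill <= early_period:    # date within EarlyPeriod days after billDate
        return Timeliness.EARLY
    elif days_after_bill > late_period:    # date not within LatePeriod days after billDate
        return Timeliness.LATE
    else:
        return Timeliness.ON_TIME

bill = date(2010, 10, 1)
print(timeliness(date(2010, 10, 5), bill).value)   # Early  (4 days)
print(timeliness(date(2010, 10, 20), bill).value)  # OnTime (19 days)
print(timeliness(date(2010, 11, 15), bill).value)  # Late   (45 days)
```

The order of the branches matters, exactly as in the rule. The Early test runs first, so Late only applies to payments that are not already Early.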

Decisioning – Probabilistic Assessment
• AIDA also incorporates Bayesian Belief Networks (BBN)
• These are graphical models for reasoning under uncertainty
• An important part of decision making: the likelihood of something happening is estimated from how often it occurred in the past (until recently used primarily in medical research)
• Evidence consists of observations on certain nodes, leading to conclusions

[Figure: example network from Evidence to Conclusions, with the nodes Bill Payment Pattern, Payment Arrangement Setup and Expect Payment]
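To make "evidence on certain nodes leading to conclusions" concrete, here is a three-node network loosely based on the nodes in the figure. It uses exact inference by enumerating the joint distribution. Three caveats apply:

- All probabilities are invented for illustration.
- The structure, with Payment Arrangement Setup and Bill Payment Pattern both pointing to Expect Payment, is an assumption.
- Enumeration only scales to tiny networks. Larger ones need algorithms such as variable elimination or junction trees.

```python
from itertools import product

# Hypothetical numbers, not from the source.
P_ARRANGEMENT = 0.2   # P(arrangement_setup = True)
P_GOOD_PATTERN = 0.7  # P(good_payment_pattern = True)
# P(expect_payment = True | arrangement_setup, good_payment_pattern)
P_EXPECT = {
    (True, True): 0.95,
    (True, False): 0.75,
    (False, True): 0.85,
    (False, False): 0.30,
}
VARS = ("arrangement_setup", "good_payment_pattern", "expect_payment")

def joint(a, g, e):
    """Probability of one complete assignment, from the chain rule of the network."""
    pa = P_ARRANGEMENT if a else 1 - P_ARRANGEMENT
    pg = P_GOOD_PATTERN if g else 1 - P_GOOD_PATTERN
    pe = P_EXPECT[(a, g)] if e else 1 - P_EXPECT[(a, g)]
    return pa * pg * pe

def query(target, evidence):
    """P(target = True | evidence), by summing the joint over consistent worlds."""
    num = den = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(VARS, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world contradicts the observed evidence
        p = joint(*values)
        den += p
        if world[target]:
            num += p
    return num / den

print(round(query("expect_payment", {"arrangement_setup": True}), 3))       # 0.89
print(round(query("expect_payment", {}), 3))                                # 0.726
print(round(query("good_payment_pattern", {"expect_payment": False}), 3))   # 0.332
```

The last query runs "backwards" from an observed conclusion to a likely cause. Answering queries in both directions from the same model is what distinguishes a belief network from a fixed rule.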

Presenting insight to the CSR (customer service representative)

• Process opens the relevant screen for reference and action
• Prediction of the reason for the call, ranked by probability
• Presentation of recent interactions and events
• Prioritized recommended treatment and script

First application: CRM
Amdocs Guided Interaction Advisor

• First Call Resolution: increase of up to 15%
• Average Handling Time: reduction of up to 30%
• Training Costs: reduction of up to 25%

Triples all the way down

So why a triple store?

• Flexibility, flexibility and flexibility
– Change the schema on a daily basis
– Customers create new policies, which in turn create new schemas on the fly
• Needed to work with meaning
– RDF describes data
• Needed to be declarative for everything
– Most RTBI (real-time business intelligence) is a combination of data in the DB and Java variables in the application

Text Intelligence for DOD/IS

How would you do this with your standard search engine?

• Give me a newspaper text with a Republican and a Democrat who serve on two subcommittees that have the same parent committee.

• Which [Democrat|Republican] is most vocal about the oil spill disaster?

• Given this text, find all the other texts that have the same people and the same main topics but no Democrats.

• Which newspaper favors [Democrats|Republicans]?

• Which [Democrat|Republican|senator|representative] got the most attention in the last week?

• Give me the distribution of yesterday's most important topics.
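Against a graph, the first question is a join over a handful of triple patterns rather than a keyword search. The sketch below runs it in plain Python over an invented toy graph. It reads the question as follows: an article mentions a Republican and a Democrat who sit on two different subcommittees with the same parent committee. `has-people` is the predicate this deck uses for the people in an article. The predicates `party`, `member-of` and `parent`, and all the data, are hypothetical. In SPARQL the same question would be a basic graph pattern with a `FILTER (?s1 != ?s2)`.

```python
# Hypothetical toy graph: people, parties and committees are invented.
TRIPLES = {
    ("article:1", "has-people", "person:A"),
    ("article:1", "has-people", "person:B"),
    ("article:2", "has-people", "person:A"),
    ("article:2", "has-people", "person:C"),
    ("person:A", "party", "Republican"),
    ("person:B", "party", "Democrat"),
    ("person:C", "party", "Democrat"),
    ("person:A", "member-of", "sub:Energy"),
    ("person:B", "member-of", "sub:Water"),
    ("person:C", "member-of", "sub:Defense"),
    ("sub:Energy", "parent", "committee:Resources"),
    ("sub:Water", "parent", "committee:Resources"),
    ("sub:Defense", "parent", "committee:Armed"),
}

def objects(s, p):
    """All o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}

def matching_articles():
    # Roughly the pattern:
    #   ?art has-people ?r . ?r party "Republican" . ?r member-of ?s1 . ?s1 parent ?c .
    #   ?art has-people ?d . ?d party "Democrat"   . ?d member-of ?s2 . ?s2 parent ?c .
    #   FILTER (?s1 != ?s2)
    articles = {s for (s, p, _) in TRIPLES if p == "has-people"}
    hits = set()
    for art in articles:
        people = objects(art, "has-people")
        reps = {x for x in people if "Republican" in objects(x, "party")}
        dems = {x for x in people if "Democrat" in objects(x, "party")}
        for r in reps:
            for d in dems:
                for s1 in objects(r, "member-of"):
                    for s2 in objects(d, "member-of"):
                        if s1 != s2 and objects(s1, "parent") & objects(s2, "parent"):
                            hits.add(art)
    return sorted(hits)

print(matching_articles())  # ['article:1']
```

Article 2 is rejected because its Democrat sits on a subcommittee under a different parent committee. A text index alone has no way to express that constraint.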

The process

• Every day we spider more than 300 online newspapers and thousands of blogs

• We search specifically for all the members of the Senate, the House of Representatives and the executive branch

• We apply an entity extractor to the text and extract the main concepts
– About 150 triples per text…

• We hook up these concepts with a detailed database of each politician and with information from the Linked Open Data cloud

From News Article to

• People (has-people)
– And their roles
• Places (has-places)
– And the county, state and country they are in
• Organizations (has-organizations)
– Government departments, company names, etc.
• Main Categories (has-domains)
– Politics, sports, ministries, energy, finance, economics, ecology, oil, mining industry, etc.
• Main Concepts (has-main-groups)
– Other important nouns and phrases in a text
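Turning one article's extraction results into triples is mostly a matter of attaching each list to the article with the predicates named above. This is a minimal sketch, assuming the extractor returns a dict. The following are hypothetical:

- the dict keys;
- the `has-role` and `located-in` predicates;
- the sample entities.

A real article yields far more triples (about 150 per text, per the previous slide) than this toy example.

```python
# Predicate names from the slide; the dict keys on the left are hypothetical.
PREDICATES = {
    "people": "has-people",
    "places": "has-places",
    "organizations": "has-organizations",
    "domains": "has-domains",
    "main_groups": "has-main-groups",
}

def article_to_triples(article_id, extracted):
    """Attach every extracted entity to the article, plus roles and place containment."""
    art = f"article:{article_id}"
    triples = []
    for field, predicate in PREDICATES.items():
        for value in extracted.get(field, []):
            triples.append((art, predicate, value))
    # Hypothetical extra predicates: a person's role and the region a place is in.
    for person, role in extracted.get("roles", {}).items():
        triples.append((person, "has-role", role))
    for place, container in extracted.get("located_in", {}).items():
        triples.append((place, "located-in", container))
    return triples

# Hypothetical extractor output for one article.
extracted = {
    "people": ["person:JaneDoe"],
    "roles": {"person:JaneDoe": "Senator"},
    "places": ["place:Houma"],
    "located_in": {"place:Houma": "place:Louisiana"},
    "organizations": ["org:EPA"],
    "domains": ["energy", "ecology", "oil"],
    "main_groups": ["oil spill", "cleanup costs"],
}
triples = article_to_triples(1, extracted)
print(len(triples))  # 10
```

Because people and places become graph nodes rather than strings, they can later be linked to the politician database and to Linked Open Data resources.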

LOD cloud – Sept 22, 2010

[Figure: latest LOD cloud]

How scalable is this?

Queries

• The query planner now handles 99% of SPARQL 1.0 and automatically compiles it into a query graph flow language…

You can write this by hand if you want to optimize the query yourself.

This will actually work on Prolog with rules too!

Query performance notes: wins

• Indices are small enough to fit in the memory of conventional machines

• Simultaneous access to indices (see next slide)

• Pipeline architecture
– Stream-based processing (all nodes can be active in parallel; most nodes can begin before the end of the data is reached)
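The pipeline point, that most nodes can begin before the end of the data is reached, can be shown with Python generators. Each operator pulls one item at a time from the one before it, so a query that needs only the first few results stops reading the index early. This single-threaded sketch shows streaming, not the parallelism AllegroGraph's nodes have. The operator names, the `has-people` data and the party table are hypothetical.

```python
from itertools import islice

class CountingScan:
    """Source node: streams triples from an 'index' and counts how many it has read."""
    def __init__(self, triples):
        self.triples = triples
        self.read = 0

    def __iter__(self):
        for triple in self.triples:
            self.read += 1
            yield triple

def select_predicate(stream, predicate):
    """Filter node: pass on (subject, object) pairs for one predicate."""
    for s, p, o in stream:
        if p == predicate:
            yield s, o

def filter_party(stream, party_of, party):
    """Join-like node: keep (article, person) pairs whose person belongs to `party`."""
    for article, person in stream:
        if party_of.get(person) == party:
            yield article, person

# Hypothetical data: 100,000 article/person triples over ten people.
TRIPLES = [(f"article:{i}", "has-people", f"person:{i % 10}") for i in range(100_000)]
PARTY = {f"person:{i}": ("Democrat" if i % 2 else "Republican") for i in range(10)}

scan = CountingScan(TRIPLES)
pipeline = filter_party(select_predicate(scan, "has-people"), PARTY, "Democrat")
first_three = list(islice(pipeline, 3))  # acts like a LIMIT 3 at the end of the pipeline
print(first_three)  # [('article:1', 'person:1'), ('article:3', 'person:3'), ('article:5', 'person:5')]
print(scan.read < len(TRIPLES))  # True: only a handful of the 100,000 triples were read
```

A non-streaming plan would materialize every intermediate result before the next operator starts. In the pull-based version, work stops as soon as the consumer has what it needs.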
