Everyone talks about the challenges of managing big data, but applications built for the next decade will need more than “bigger” and “faster” versions of the relational databases (RDBMSs) that dominated at the end of the last century, or updates to the NoSQL databases popularized at the beginning of this one. They will need tools that are optimized to manage streaming data and structures that map naturally to knowledge representations such as ontologies and taxonomies.
In this webinar, participants will learn:
Why graph database usage is growing rapidly and what to look for from vendors
How transaction and analytic processing are converging in real time
How market leaders are building apps today with modern data management solutions (short case studies from a variety of industries)
Smart Data Webinar: Emerging Data Management Options
1. Copyright (c) 2016 by STORM Insights Inc. All Rights reserved.
Emerging Data Management Options
Adrian Bowles, PhD
Founder, STORM Insights, Inc.
info@storminsights.com
Basic Life Advice
“When the map and the terrain disagree, believe the terrain.”
Gause and Weinberg (Exploring Requirements)
It is the pervading law of all things organic, and inorganic, of all things physical
and metaphysical, of all things human and all things superhuman, of all true
manifestations of the head, of the heart, of the soul, that the life is recognizable in
its expression, that form ever follows function. That is the law.
Louis Sullivan: The Tall Office Building Artistically Considered, 1896
How You Think About a Domain…
…influences your choice of maps and models…
rules and representations…and required operations.
“To solve really hard problems, we'll have to use several different representations.
This is because each particular kind of data structure has its own virtues and
deficiencies, and none by itself would seem adequate for all the different functions
involved with what we call common sense.”
Marvin Minsky
What Do You Want/Need to Store?
How much? How complex? How fast?
What Do You Want/Need to DO With What You Store?
Do you need a graph database?
Options Include…
Files, tables, trees, queues, stacks, lists…
Hierarchical
RDBMS
Object DBMS
NoSQL
Graph
Perception/NLP
Problem Solving & Learning
Simple: deterministic, retrieve/calculate
Complex: probabilistic; hypothesize, test, rank, select
Creative: discover, generate
ORGANIZED Memory*
Input Class/Type
Visual
Text
Image
Aural
Speech
Music
Cues
Noise
Informative
Touch
Temperature
Tactile
Texture
Taste
Smell
Response Types
Visible (to the environment)
Verbal/NL Text
Behavioral (system changes)
Haptics/Touch/Proprioception
Invisible
Memory updates
*Corpus including data in taxonomies, ontologies, trees…
Graphs 101
A graph is a structure with vertices and edges.
[Figure: a graph with vertices a, b, c, d, e and edges Old Post Road, Cross Highway, Main Street, Shinbone Alley, Elk Road]
Edge properties:
Old Post Road: paved, 11 miles
Elk Road: dirt, 2 miles
Cross Highway: toll road, 250 miles
Main Street: 1 mile
Shinbone Alley: 0.5 miles
Vertex properties:
a: bus stop
b: gas station, Shell
c: elementary school
d: house
e: office building
Vertices and edges may be labeled, edges may be directed, and all may be stored and processed by properties represented as key/value pairs.
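The labeled road graph can be sketched as a property graph in plain Python, with every vertex and edge carrying a dictionary of key/value pairs. This is only an illustration: the property values come from the slide, but the key names (kind, brand, surface, toll, miles) and the endpoints assigned to each road are assumptions, because the original drawing is not recoverable.

```python
# Vertex properties from the slide, stored as key/value pairs.
# The key names ("kind", "brand") are hypothetical.
vertices = {
    "a": {"kind": "bus stop"},
    "b": {"kind": "gas station", "brand": "Shell"},
    "c": {"kind": "elementary school"},
    "d": {"kind": "house"},
    "e": {"kind": "office building"},
}

# Edge properties from the slide. The "from"/"to" endpoints are ASSUMED
# for illustration; the slide text does not say which vertices each road joins.
edges = [
    {"from": "a", "to": "b", "name": "Old Post Road", "surface": "paved", "miles": 11.0},
    {"from": "b", "to": "c", "name": "Elk Road", "surface": "dirt", "miles": 2.0},
    {"from": "a", "to": "e", "name": "Cross Highway", "toll": True, "miles": 250.0},
    {"from": "c", "to": "d", "name": "Main Street", "miles": 1.0},
    {"from": "d", "to": "e", "name": "Shinbone Alley", "miles": 0.5},
]


def edges_where(**wanted):
    """Return names of edges whose properties match every key/value given."""
    return [
        edge["name"]
        for edge in edges
        if all(edge.get(key) == value for key, value in wanted.items())
    ]


def short_edges(max_miles):
    """Return names of edges that are no longer than max_miles."""
    return [edge["name"] for edge in edges if edge["miles"] <= max_miles]
```

Because properties are just key/value pairs, a new attribute (a speed limit, say) can be attached to one edge without changing any of the others.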
Obvious structure is easy to process…
but most of the interesting stuff isn’t obvious to a computer.
Vertices, edges, properties should represent data with higher-level structure.
You Probably Already Think In Graphs if…
You watch detective shows
You know trivia about movies
You remember relationships between people
You took a biology class
You took a biology class or played 20 questions (“animal, mineral or vegetable?”)
Wikipedia contributors. "Taxonomy (biology)." Wikipedia,
The Free Encyclopedia. Wikipedia, The Free Encyclopedia,
11 May. 2016. Web. 12 May. 2016.
You watch detective shows
Typical crazy wall whiteboard - from Fargo.
A screen from IBM I2 Coplink
You know trivia about movies
IMDB
You remember relationships between people
Family Tree
LinkedIn Tree
Anonymized look at my desk/wall on a typical day.
Processes Can Be Represented As Graphs
A taxonomy represents the formal structure of classes or types of objects within a domain. Taxonomies are generally hierarchical and provide names
for each class in the domain. They may also capture the membership properties of each object in relation to the other objects. The rules of a specific
taxonomy are used to classify or categorize any object in the domain, so they must be complete, consistent, and unambiguous. This rigor in
specification should ensure that any newly discovered object must fit into one, and only one, category or object class.
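The "one, and only one, category" requirement can be sketched in Python. The three-class hierarchy, the property names (backbone, feeds_milk), and the membership rules below are hypothetical and deliberately simplistic; they only illustrate how complete and unambiguous rules let a classifier return exactly one class or fail loudly.

```python
# A tiny, hypothetical taxonomy: each class names its parent (None = root).
PARENT = {
    "animal": None,
    "vertebrate": "animal",
    "invertebrate": "animal",
    "mammal": "vertebrate",
    "other_vertebrate": "vertebrate",
}

# Membership rules for the leaf classes, written as predicates over an
# object's properties. The property names are assumptions for this sketch.
RULES = {
    "mammal": lambda obj: obj["backbone"] and obj["feeds_milk"],
    "other_vertebrate": lambda obj: obj["backbone"] and not obj["feeds_milk"],
    "invertebrate": lambda obj: not obj["backbone"],
}


def classify(obj):
    """Return the single leaf class that obj belongs to.

    Raises ValueError when the rules are incomplete (no class matches)
    or ambiguous (more than one class matches) for this object.
    """
    matches = [name for name, rule in RULES.items() if rule(obj)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one class, got {matches}")
    return matches[0]


def lineage(name):
    """Return the path from a class up to the root of the hierarchy."""
    path = []
    while name is not None:
        path.append(name)
        name = PARENT[name]
    return path
```

For example, `lineage(classify({"backbone": True, "feeds_milk": True}))` walks from the matched leaf class up to the root of the hierarchy.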
Taxonomies Evolve
The History of Autism in the Diagnostic & Statistical Manual of the American Psychiatric Association
1952 DSM I
1968 DSM II
1980 DSM III: Pervasive Developmental Disorder (PDD): Childhood onset PDD, Infantile Autism, Atypical Autism
1987 DSM III-R: Pervasive Developmental Disorder (PDD): PDD-NOS (Not Otherwise Specified), Autistic Disorder
1994 DSM IV and 2000 DSM IV-TR: Pervasive Developmental Disorder (PDD): PDD-NOS, Autistic Disorder, Asperger Disorder, Childhood Disintegrative Disorder, Rett Syndrome
2013 DSM V: Autism Spectrum Disorder (ASD)
An ontology provides more detail than a taxonomy, although the boundary between them in practice is somewhat fuzzy. An ontology should
comprehensively capture the common understanding – vocabulary, definitions, rules - of a community as it applies to a specific domain.
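Key Concept… Graphs have well-known mathematical properties:
e.g. If you represent a graph as an adjacency matrix M, then entry (i, j) of M^n is the number of paths of length n from vertex i to vertex j in the original graph (counting paths that revisit vertices).
[Figure: the graph on vertices a, b, c, d, e beside its matrix M, with rows and columns labeled a through e]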
[Figure: the same graph beside M^2, with rows and columns labeled a through e]
[Figure: the same graph beside M^3, with rows and columns labeled a through e]
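The matrix-power property can be checked with a short Python sketch. The matrix entries on the slides did not survive extraction, so the directed edge list below is a hypothetical five-vertex graph chosen for illustration, not the one in the figure. The code builds the adjacency matrix M and multiplies it by itself to count paths of a given length (paths here may revisit vertices).

```python
VERTICES = ["a", "b", "c", "d", "e"]

# Hypothetical directed edges, assumed for this example only.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

INDEX = {vertex: i for i, vertex in enumerate(VERTICES)}


def adjacency_matrix(edges):
    """Build M, where M[i][j] counts the edges from vertex i to vertex j."""
    size = len(VERTICES)
    matrix = [[0] * size for _ in range(size)]
    for source, target in edges:
        matrix[INDEX[source]][INDEX[target]] += 1
    return matrix


def mat_mul(x, y):
    """Multiply two square matrices of the same size."""
    size = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(matrix, n):
    """Return matrix raised to the power n, for n >= 1."""
    result = matrix
    for _ in range(n - 1):
        result = mat_mul(result, matrix)
    return result


def count_paths(source, target, length):
    """Number of paths with exactly `length` edges from source to target."""
    power = mat_pow(adjacency_matrix(EDGES), length)
    return power[INDEX[source]][INDEX[target]]
```

In this assumed graph there are two ways to get from a to d in two steps (through b or through c), so `count_paths("a", "d", 2)` returns 2.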
The Market is Ready for You Now With Options
Commercial
Open Source
As a Service
Wikipedia contributors. "Graph database." Wikipedia, The Free Encyclopedia, 11 May 2016. Web. 12 May 2016.
Property graph
RDF - Resource Description Framework, W3C specs for metadata modeling, now used in knowledge management
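The difference between the two models can be sketched in plain Python without any graph library. The vertex identifiers, the names, and the "ex:" prefix below are hypothetical. The sketch stores one relationship as a property graph, then flattens it into RDF-style (subject, predicate, object) triples.

```python
# Property graph: vertices and edges each carry key/value properties.
# All identifiers and values here are made up for illustration.
property_graph = {
    "vertices": {
        "v1": {"label": "person", "name": "Alice"},
        "v2": {"label": "person", "name": "Bob"},
    },
    "edges": [
        {"from": "v1", "to": "v2", "label": "knows", "since": 2016},
    ],
}


def to_triples(graph, prefix="ex:"):
    """Flatten a property graph into (subject, predicate, object) triples.

    Edge properties such as "since" are dropped in this sketch: a single
    triple has no slot for them, so RDF needs extra modeling (for example
    reification) to attach data to a relationship.
    """
    triples = []
    for vertex_id, properties in graph["vertices"].items():
        for key, value in properties.items():
            triples.append((prefix + vertex_id, prefix + key, value))
    for edge in graph["edges"]:
        triples.append(
            (prefix + edge["from"], prefix + edge["label"], prefix + edge["to"])
        )
    return triples


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
```

The `match` helper mimics, very loosely, the triple patterns used by RDF query languages: leave a position as None to treat it as a variable.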
No SQL? (as opposed to NoSQL) No problem
Emerging graph query/traversal languages
Gremlin - Open source (Apache2 license) graph traversal language, supported by Titan, Neo4j, Hadoop (Giraph), Hadoop (Spark), IBM…
Cypher - Neo4j, Objectivity…
SPARQL - SPARQL Protocol and RDF Query Language, a W3C standard for querying RDF
Wikipedia contributors. "Graph database." Wikipedia, The Free Encyclopedia, 11 May 2016. Web. 12 May 2016.
This chart is representative of the market, but incomplete.
Apache TinkerPop, TinkerPop, Apache, Apache feather logo, and Apache TinkerPop project logo are
either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
Apache TinkerPop™ is a graph computing framework for both
graph databases (OLTP) and graph analytic systems (OLAP).
“A graph is a structure composed of vertices and edges. Both vertices and edges
can have an arbitrary number of key/value-pairs called properties. Vertices denote
discrete objects such as a person, a place, or an event. Edges denote relationships
between vertices. For instance, a person may know another person, have been
involved in an event, and/or was recently at a particular place. Properties express
non-relational information about the vertices and edges. Example properties include
a vertex having a name, an age and an edge having a timestamp and/or a weight.
Together, the aforementioned graph is known as a property graph and it is the
foundational data structure of Apache TinkerPop.”
Apache TinkerPop™ is an open source, vendor-agnostic, graph computing
framework distributed under the commercial friendly Apache2 license. When a data
system is TinkerPop-enabled, its users are able to model their domain as a graph
and analyze that graph using the Gremlin graph traversal language.
Getting Started…
Why choose a graph database?
Speed to delivery when the data is naturally modeled as a graph
Simplifies multi-hop queries
Visualization? Baked-in
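To make the multi-hop point concrete, here is a minimal Python sketch of what such a query computes: every vertex within a given number of hops of a starting vertex. The "knows" relationships and names are hypothetical, and a breadth-first search over a dictionary stands in for a graph database's traversal engine.

```python
from collections import deque

# Hypothetical "knows" relationships, stored as adjacency lists.
KNOWS = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
    "eve": [],
    "fay": [],
}


def within_hops(graph, start, max_hops):
    """Return {vertex: hops} for each vertex reachable from start
    in at least 1 and at most max_hops hops (shortest hop count)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(current, []):
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    del hops[start]  # the start vertex itself is not a result
    return hops
```

In a relational schema the same question is typically written with one self-join per hop, which is part of why queries of this shape tend to be simpler to express against a graph model.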
Ask Yourself
Do you need an on-premises solution, or to manage your own database?
Lots of options; Neo4j is the market leader
Do you want graphs as a service?
IBM is offering graph as a service through Bluemix (in beta now)
Upcoming Webinar Dates & Topics
June 9 Advances in Natural Language Processing (NLP)
July 13 Modern AI and The Future of Work (With Steve Ardire)
adrian@storminsights.com Twitter @ajbowles Skype ajbowles
A hat-tip to Kamille Nixon…