Recent natural language processing advancements have propelled search engine and information retrieval innovations into the public spotlight. People want to be able to interact with their devices in a natural way. In this talk I will be introducing you to natural language search using a Neo4j graph database. I will show you how to interact with an abstract graph data structure using natural language and how this approach is key to future innovations in the way we interact with our devices.
2. We’ll be covering...
What is natural language search?
What do brains and graphs have in common?
How do you model time as a graph?
How do you model time-based events on a graph?
How do you anticipate natural language queries and map those to results?
How do you transform answers into questions?
3. What is Natural Language Search?
Natural language search means querying a database in your own natural language.
In a way, it is like programming a person with words (teaching, evangelism, sales pitches, planning, etc.).
5. What do brains and graphs have in common?
Networks condense a lot of information into small points.
Each small point gives us a place from which to explore and interpret the information around it.
Graphs, like brains, help us explore a lot of information relative to those points.
6. But what is a network?
A network is a representation, or model, of how information is interconnected.
A graph is the de facto mathematical structure for describing how interconnected a network is.
A graph database merges these two concepts into a persistent storage medium.
Networks (Information) + Graph (Mathematics) = Neo4j
7. Graph of people meeting people
Anne met Pam
Pam met Sally
Sally met Anne
John met Sally
8. Path Finding = Searching
Traversals are the key component when using a graph database.
A traversal models the pathways in a network by enumerating all possibilities.
A query returns the possibilities that meet its criteria.
(Neo4j’s Cypher Query Language)
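To make "path finding = searching" concrete, here is a minimal sketch in plain Python rather than Cypher. It loads the four "met" statements from the people graph and runs a breadth-first traversal between two people. Treating "met" as a two-way relationship, and the helper names `build_graph` and `shortest_path`, are assumptions made for illustration; this is not how Neo4j implements traversals.

```python
from collections import deque

# The "met" statements from the people graph, treated as undirected edges
# (an assumption: if Anne met Pam, then Pam met Anne).
MEETINGS = [("Anne", "Pam"), ("Pam", "Sally"), ("Sally", "Anne"), ("John", "Sally")]

def build_graph(edges):
    """Build an adjacency map: person -> set of people they met."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first traversal; returns a shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_graph(MEETINGS)
print(shortest_path(graph, "John", "Pam"))  # ['John', 'Sally', 'Pam']
```

The traversal enumerates candidate paths outward from John and returns the first one that meets the criterion (ending at Pam), which is the search pattern described above.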
10. Time-based traversals
Time is a hierarchical way of categorizing the linear sequence of global events.
Hours, minutes, seconds...
“Neo4j Meetup is at 6:00 PM on October 29th”
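One way to picture time as a hierarchy is to break a timestamp into levels, coarsest first, so that each level is a step along a path. The sketch below assumes year, month, day, and hour levels. The slide gives no year for the meetup, so the year in the example is a placeholder, and `time_path` is a hypothetical helper name.

```python
from datetime import datetime

def time_path(moment):
    """Break a timestamp into hierarchy levels, from coarsest to finest."""
    return [
        f"{moment.year:04d}",
        f"{moment.month:02d}",
        f"{moment.day:02d}",
        f"{moment.hour:02d}",
    ]

# The slide gives no year; 2013 is a placeholder chosen for illustration.
meetup = datetime(2013, 10, 29, 18, 0)
print("/".join(time_path(meetup)))  # 2013/10/29/18
```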
11. Time Scale Event Meta Model
Modeling events over time is easy in Neo4j.
Let’s go over the GraphGist for the Time Scale Event Meta Model:
http://gist.neo4j.org/?github-kbastani%2Fgists%2F%2Fmeta%2FTimeScaleEventMetaModel.adoc
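As a rough illustration of attaching events to a time hierarchy, the sketch below uses nested Python dictionaries in place of graph nodes. It is not the model from the GraphGist; the tree layout, the function names, the year, and the second event are all assumptions made up for the example.

```python
from datetime import datetime

def add_event(tree, moment, name):
    """Walk year -> month -> day -> hour, creating nodes as needed, then attach the event."""
    node = tree
    for key in (moment.year, moment.month, moment.day, moment.hour):
        node = node.setdefault("children", {}).setdefault(key, {})
    node.setdefault("events", []).append(name)

def find(tree, *keys):
    """Follow a partial time path (for example year, month, day) down the tree."""
    node = tree
    for key in keys:
        node = node.get("children", {}).get(key, {})
    return node

def events_under(node):
    """Collect every event attached at or below a node of the time tree."""
    found = list(node.get("events", []))
    for child in node.get("children", {}).values():
        found.extend(events_under(child))
    return found

tree = {}
add_event(tree, datetime(2013, 10, 29, 18), "Neo4j Meetup")    # year is a placeholder
add_event(tree, datetime(2013, 10, 30, 9), "Follow-up call")   # hypothetical event
print(events_under(find(tree, 2013, 10, 29)))  # ['Neo4j Meetup']
```

Asking for everything in October is the same traversal started one level higher: `events_under(find(tree, 2013, 10))`.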
16. Neo4j allows you to store information as a series of paths, and that is really valuable for giving a user options when it comes to search.
It starts with something I call a “search cache”.
17. Search Cache
A search cache is a repository of all relevant paths condensed into a hierarchical data store.
A hierarchical data store is like a folder tree: it maps a storage collection onto linear paths. (Dimensionality reduction)
An address is a hierarchy that reveals a path.
ex. http://www.neo4j.com/download
ex. > root\neo4j-community\bin\neo4j.sh
Natural language path:
> what is the matrix?
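A natural language query can be treated the same way as an address: each word becomes one step in a path. The sketch below is one possible normalization, assuming lowercase words joined with "/" in the style of the URL example; `query_to_path` is a hypothetical helper, not part of any library.

```python
import re

def query_to_path(query):
    """Lowercase the query, drop punctuation, and join the words into a path."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    return "/".join(words)

print(query_to_path("What is the matrix?"))  # what/is/the/matrix
```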
19. Type Ahead / Autocomplete
For search, it comes down to enumerating all possibilities and then mapping those paths to an action.
http://kbastani.github.io/predictive-autocomplete
Never process natural language search in real time. It is a hard problem, which means it will take time.
20. Distributed Caching Frameworks
Take a distributed approach to building out your search cache.
Use Neo4j to model your network, then enumerate all possibilities with a query and add each possibility to the search cache.
Distribute the load across a network of compute instances, as in MapReduce.
Implemented in C# at http://kbastani.github.io/predictive-autocomplete
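The map and reduce steps can be sketched on a single machine. In the example below, each start node is handed to a worker that enumerates every simple path beginning there (the map step), and the results are merged into one set that stands in for the search cache (the reduce step). A thread pool is used only as a stand-in for a network of compute instances, the graph is the people graph from earlier written as an in-memory dictionary, and none of this reflects the C# project linked above.

```python
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for the graph that would live in Neo4j.
GRAPH = {
    "Anne": ["Pam", "Sally"],
    "Pam": ["Anne", "Sally"],
    "Sally": ["Anne", "John", "Pam"],
    "John": ["Sally"],
}

def paths_from(start):
    """Map step: enumerate every simple path (no repeated node) that begins at `start`."""
    results, stack = [], [[start]]
    while stack:
        path = stack.pop()
        results.append("/".join(path))
        for neighbor in GRAPH[path[-1]]:
            if neighbor not in path:
                stack.append(path + [neighbor])
    return results

def build_cache(workers=2):
    """Reduce step: merge each worker's paths into one set that stands in for the cache."""
    cache = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for paths in pool.map(paths_from, GRAPH):
            cache.update(paths)
    return cache

cache = build_cache()
print("John/Sally/Anne/Pam" in cache)  # True
```

Because each start node is processed independently, the work splits cleanly across however many workers are available, and the merged result does not depend on the worker count.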
21. How do I build a search cache?
The best way to do this is with blob storage.
I use Windows Azure, but you can use any data storage as long as each path maps to a JSON file retrievable via an HTTP GET request.
ex. HTTP GET ../natural/language/search/is/cool
I am working on an open source project for this in C#.
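The path-to-JSON mapping can be tried out locally before involving any cloud storage. The sketch below uses a temporary directory as a stand-in for a blob container and a function call as a stand-in for the HTTP GET. The `index.json` file name, the payload shape, and the `put` and `get` helpers are assumptions for illustration and do not correspond to any Azure API.

```python
import json
import tempfile
from pathlib import Path

def put(root, path, payload):
    """Write the JSON payload for a cache path, one directory per word."""
    target = Path(root, *path.split("/"), "index.json")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload))

def get(root, path):
    """Stand-in for an HTTP GET: return the stored JSON, or None on a miss."""
    target = Path(root, *path.split("/"), "index.json")
    return json.loads(target.read_text()) if target.exists() else None

root = tempfile.mkdtemp()  # local directory standing in for a blob container
put(root, "natural/language/search/is/cool", {"results": ["example result"]})
print(get(root, "natural/language/search/is/cool"))  # {'results': ['example result']}
print(get(root, "natural/language"))                 # None
```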
23. How to transform answers into questions?
You already have a bunch of answers in natural language.
Each language has specific templates that allow you to transform an answer into a question.
“X is Y” -> “What is X?”
Is X a person? Then “Who is X?”
Add “What is X?” to the search cache.
Example: http://www.arktera.com/
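The "X is Y" template can be sketched with a regular expression. The example below handles only that one English pattern. The set of known people is a hypothetical stand-in for checking whether X is a Person in the graph, and `answer_to_question` is an illustrative name.

```python
import re

# Hypothetical list of names known to be people; a real system would
# look this up in the graph instead.
PEOPLE = {"Anne", "Pam", "Sally", "John"}

def answer_to_question(answer):
    """Turn an "X is Y" statement into "What is X?" or "Who is X?"."""
    match = re.match(r"^(?P<x>.+?)\s+is\s+(?P<y>.+?)\.?$", answer.strip())
    if match is None:
        return None  # the statement does not fit the "X is Y" template
    subject = match.group("x")
    pronoun = "Who" if subject in PEOPLE else "What"
    return f"{pronoun} is {subject}?"

print(answer_to_question("Neo4j is a graph database."))  # What is Neo4j?
print(answer_to_question("Anne is a friend of Pam."))    # Who is Anne?
```

Each generated question would then be added to the search cache, pointing back at the answer it came from.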