1. Mining OpenStack Community Network
with Neo4j
OpenStack Summit Atlanta
vBrownBag | May 13, 2014
Kamesh Raghavendra
kamesh@netapp.com
2. What?
Property graph representation of the global OpenStack
community network including:
• People: Developers, Customers, Service Operators
• Interactions: Mailing lists, Blueprints, Code check-ins
• Contexts: Location, Parent Organization, Project
Opened to the community in the form of canned and
ad hoc graph queries
4. Motivations
• Product strategy & management
• Discover patterns of OpenStack consumption & deployment
• Demographic trends across organizations, industry
verticals & geography
• Segment consumers by demography
• Analyze multi-faceted roles
• Community members act as consumers, developers &
service operators, often in several roles at the same time
5. OpenStack Data Sources Integrated
• Mail Archives [58,702: http://openstack.markmail.org/]
• Support Forum [10,344: https://ask.openstack.org/en/questions/]
• Bug Tracker [6,520: https://bugs.launchpad.net/openstack]
• Blueprints [6,311: https://blueprints.launchpad.net/openstack]
• More sources being integrated
6. OpenStack Network Graph Data Model
Hosted on Neo4j 2.0.3 Community Edition Server
Demographic Context
• Parent organization
• Country
• Industry Vertical
Interaction Context
• Project
• Sentiment
Person
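The data model above can be sketched as a tiny in-memory property graph. This is only an illustration, not the Neo4j schema itself: the chain COUNTRY, ORGANIZATION, DOMAIN, PERSON, interaction, Project is inferred from the sample queries in this deck, the `INTERACTION` label, the `SENTIMENT` property, the `NAME` property on DOMAIN, and all sample values are hypothetical, and `PropertyGraph` is a made-up helper class.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: labeled nodes with properties, undirected edges."""

    def __init__(self):
        self.nodes = {}               # node id -> (label, properties dict)
        self.adj = defaultdict(set)   # node id -> ids of adjacent nodes

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, a, b):
        # Relationships are stored undirected, like the `--` pattern in the queries.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node_id, label=None):
        """Adjacent node ids, optionally restricted to one label."""
        return sorted(
            n for n in self.adj[node_id]
            if label is None or self.nodes[n][0] == label
        )


g = PropertyGraph()
# Person plus demographic context (all sample values are hypothetical).
g.add_node("p1", "PERSON", FULL_NAME="Alice Example")
g.add_node("d1", "DOMAIN", NAME="example.com")
g.add_node("o1", "ORGANIZATION", NAME="ExampleCorp")
g.add_node("c1", "COUNTRY", NAME="Japan")
g.add_node("i1", "INDUSTRY", NAME="Telecom")
# Interaction plus interaction context (label and SENTIMENT are assumptions).
g.add_node("q1", "INTERACTION", TIMESTAMP="Sat May 10 09:15:00 2014",
           SENTIMENT="positive")
g.add_node("t1", "Project", TAGNAME="nova")

for a, b in [("p1", "d1"), ("d1", "o1"), ("o1", "c1"),
             ("o1", "i1"), ("p1", "q1"), ("q1", "t1")]:
    g.add_edge(a, b)

print(g.neighbors("o1"))                  # everything attached to the organization
print(g.neighbors("p1", label="DOMAIN"))  # the person's mail domain
```

Keeping demographic context on the organization side and interaction context on the interaction side means a person node stays small and both kinds of context can be reached by walking relationships.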
8. Sample Graph Queries
Which are the most popular OpenStack projects in Japan?
MATCH (C:COUNTRY)--()--()--()--(Q)--(N:Project)
WHERE C.NAME='Japan'
WITH C, COUNT(Q) AS Count, N ORDER BY Count DESC
RETURN N.TAGNAME, Count
Which are the most popular industries in the UK adopting OpenStack?
MATCH (I:INDUSTRY)--(O:ORGANIZATION)--(C:COUNTRY)
WHERE C.NAME='Uk'
WITH COUNT(I) AS S, I ORDER BY S DESC
RETURN I.NAME, S LIMIT 5
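To make the first query on this slide more concrete, here is a rough pure-Python equivalent of its logic: walk five relationships out from a country node, keep the paths that end on a Project node, and count them per project. It assumes the three anonymous nodes in the pattern are ORGANIZATION, DOMAIN and PERSON (as the other queries suggest); the `INTERACTION` label, the node ids, and the sample data are hypothetical, and `walk` and `popular_projects` are illustrative helpers, not part of Neo4j.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: node id -> (label, properties).
NODES = {
    "jp": ("COUNTRY", {"NAME": "Japan"}),
    "us": ("COUNTRY", {"NAME": "USA"}),
    "o1": ("ORGANIZATION", {"NAME": "ExampleCorp"}),
    "o2": ("ORGANIZATION", {"NAME": "OtherOrg"}),
    "d1": ("DOMAIN", {"NAME": "example.com"}),
    "d2": ("DOMAIN", {"NAME": "other.example"}),
    "p1": ("PERSON", {"FULL_NAME": "Alice Example"}),
    "p2": ("PERSON", {"FULL_NAME": "Bob Example"}),
    "p3": ("PERSON", {"FULL_NAME": "Carol Example"}),
    "q1": ("INTERACTION", {}), "q2": ("INTERACTION", {}),
    "q3": ("INTERACTION", {}), "q4": ("INTERACTION", {}),
    "nova": ("Project", {"TAGNAME": "nova"}),
    "neutron": ("Project", {"TAGNAME": "neutron"}),
}
EDGES = [
    ("jp", "o1"), ("o1", "d1"), ("d1", "p1"), ("d1", "p2"),
    ("p1", "q1"), ("p1", "q2"), ("p2", "q3"),
    ("q1", "nova"), ("q2", "nova"), ("q3", "neutron"),
    ("us", "o2"), ("o2", "d2"), ("d2", "p3"), ("p3", "q4"), ("q4", "nova"),
]


def walk(adj, start, hops):
    """Yield node paths of exactly `hops` relationships, never reusing a relationship."""
    stack = [([start], frozenset())]
    while stack:
        path, used = stack.pop()
        if len(path) - 1 == hops:
            yield path
            continue
        for nxt in adj[path[-1]]:
            edge = frozenset((path[-1], nxt))
            if edge not in used:
                stack.append((path + [nxt], used | {edge}))


def popular_projects(nodes, edges, country_name):
    """Count 5-hop paths from the named country to each Project, most frequent first."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    counts = Counter()
    for node_id, (label, props) in nodes.items():
        if label != "COUNTRY" or props.get("NAME") != country_name:
            continue
        for path in walk(adj, node_id, hops=5):
            end_label, end_props = nodes[path[-1]]
            if end_label == "Project":
                counts[end_props["TAGNAME"]] += 1
    return counts.most_common()


print(popular_projects(NODES, EDGES, "Japan"))
```

The walk refuses to reuse a relationship within one path, which mirrors how Cypher matches a pattern; without that check the undirected traversal could bounce back and forth between the country and its organization.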
9. More Sample Graph Queries
Who are the top 5 weekend contributors?
MATCH (O:ORGANIZATION)--(D:DOMAIN)--(P:PERSON)--(Q)
WHERE Q.TIMESTAMP=~".*Sat.*" OR Q.TIMESTAMP=~".*Sun.*"
WITH COUNT(Q) AS N, P, O ORDER BY N DESC
RETURN P.FULL_NAME AS Name, O.NAME AS Organization,
  N AS Weekend_Contributions LIMIT 5
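The weekend filter in this query can be illustrated in plain Python. Cypher's `=~` operator matches the whole string, which is why the patterns are wrapped in `.*`; the sketch below mirrors that with `re.fullmatch`. The timestamp format, the names, and the organizations are assumptions made for illustration, since the deck does not show how `TIMESTAMP` is stored beyond it containing a day abbreviation.

```python
import re
from collections import Counter

# Same patterns as the query; fullmatch mirrors Cypher's whole-string `=~`.
WEEKEND = re.compile(r".*Sat.*|.*Sun.*")

# Hypothetical rows: (person full name, organization name, interaction timestamp).
INTERACTIONS = [
    ("Alice Example", "ExampleCorp", "Sat May 10 09:15:00 2014"),
    ("Alice Example", "ExampleCorp", "Sun May 11 18:40:00 2014"),
    ("Bob Example", "OtherOrg", "Sat May 10 22:05:00 2014"),
    ("Bob Example", "OtherOrg", "Mon May 12 10:00:00 2014"),
    ("Carol Example", "OtherOrg", "Mon May 12 11:30:00 2014"),
]


def top_weekend_contributors(interactions, limit=5):
    """Return (name, organization, count) for the most active weekend contributors."""
    counts = Counter(
        (name, org)
        for name, org, timestamp in interactions
        if WEEKEND.fullmatch(timestamp)
    )
    return [(name, org, n) for (name, org), n in counts.most_common(limit)]


for name, org, n in top_weekend_contributors(INTERACTIONS):
    print(name, org, n)
```

Grouping by the (person, organization) pair corresponds to carrying both `P` and `O` through the `WITH` clause alongside the count.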
10. Road Ahead
• Enhance & automate data ETL
• Integrate more data sources
• Extract more contexts – sentiment, expertise, role
• Enhance query user experience – schema, syntax
• Offer popular queries as canned reports
Seeking early users & collaborators to accelerate
development