Kendall Clark, CEO of Clark & Parsia, LLC, presented an overview of the company's new RDF database, Stardog. Key points: Stardog is fast and lightweight, and it supports rich APIs, logical and statistical inference, and full-text search. It aims to be the fastest RDF database and supports OWL 2 reasoning and SPARQL queries. Stardog is currently in alpha testing; a private beta is planned for early April, ahead of a 1.0 release in mid-summer.
2. About C&P
• We build semantic technology infrastructure
and enterprise solutions
• Pellet, the leading OWL reasoner
• POPS Expertise Location system
• Bootstrapped since 2005
• Offices in DC and Cambridge, MA
• Government & enterprise customers
• First talk ever was at LOC in 2005 :)
Thursday, March 17, 2011 2
4. TLDR?
• Java RDF database (“quad store”) (no native code)
• Freemium model:
• enterprise & community editions
• OEM
• Performance for complex SPARQL queries
• Best available reasoning support
5. NoSQL and SemWeb
• Semweb is schemaless and schema-rich
• As agile as NoSQL stores
• More expressive than SQL
• Standards based
• Graph DBs are all ad hoc
• Query Language and, you know, joins
• Do you really want to write map-reduce
programs...only?! We sure don’t...!
6. Why another RDF DB?
• We’re scratching our itch for fast query for
integration & decision support apps
• aimed at db-reasoner “tweener” space
• operationally agile
• There’s a hole in the market; or: markets
are normal distributions (probably)
• Gives us a complete semantic application
platform
7. Commercial Market
• 6 products
• Technically homogeneous:
• Sagan-like scale obsession
• Mostly ad hoc reasoning
• Weak perf on complex queries
• Ho-hum feature sets & integrations
• See http://bit.ly/92P8eN for more
8. Stardog 1.0: Overview
• Fast
• Lightweight
• Rich API support
• Logical & statistical inference
• Transactions
• Full-text search
• Graph algorithms and path language
• awesome mascot!
9. Fast? No, Really Fast!
• First design goal in Stardog is performance
of complex SPARQL query eval on single
machine in the default configuration
• Next, total queries per second
• In-memory mode available, when needed
• Early testing is promising: fastest RDF DB
on SP2B benchmark. Often several times
faster.
10. Performance
• Do yr own testing; the only queries that
matter are yours; don’t trust, test.
• It’s not ready till it’s very, very fast.
• Flatten the RDF performance tax
• About 256 GB for ~2B triples in main-memory mode, i.e., $20k Dell box.
• When in doubt: Add. More. RAM.
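The sizing figure above implies a rough per-triple memory budget. A back-of-the-envelope sketch, assuming decimal gigabytes and taking the slide's ~2B triples at face value; the function names are ours for illustration, not part of Stardog, and real overhead will depend on the data:

```python
# Rough sizing arithmetic from the slide: ~256 GB of RAM for ~2B triples.
# ASSUMPTION: decimal units (1 GB = 10**9 bytes).

GB = 10**9

def bytes_per_triple(ram_gb: float, triples: float) -> float:
    """Average memory budget per triple implied by a RAM / triple-count pair."""
    return ram_gb * GB / triples

def ram_needed_gb(triples: float, per_triple_bytes: float) -> float:
    """RAM (in GB) needed for `triples` at a given per-triple budget."""
    return triples * per_triple_bytes / GB

budget = bytes_per_triple(256, 2e9)      # implied bytes per triple
estimate = ram_needed_gb(500e6, budget)  # same ratio applied to 500M triples
print(budget, estimate)
```

At the slide's ratio this works out to roughly 128 bytes per triple, so a 500M-triple dataset would need on the order of 64 GB in main-memory mode.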
11. Scalability
• Stardog 1.0: scale up
• Disk-based joins for very large
intermediate structures
• Triples compression
• Ideally efficient on-disk indices
• Stardog 2.0: scale out (shared-disk cluster)
• We think it’s easier to scale a fast DB than
to speed up a scalable one...
12. Lightweight
• ~34 KLOC for core system, ~10 KLOC of
tests (1034 unit tests)
• Trivially simple installation:
• copy JAR & restart servlet container
• If you’ve ever used Sesame...
• May run: embedded, client-server; main
memory or disk-backed modes; any
combination of these
13. Interfaces
• SNARL (Stardog Native API for RDF
Language)
• Avro RPC, esp. the low-level TCP transport (coming soon...), for Java & non-Java
• Sesame & Jena
• SPARQL Protocol (HTTP)
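Of the interfaces listed, the SPARQL Protocol is the one any HTTP client can use. A minimal sketch of building such a request with the Python standard library; the endpoint URL is a hypothetical placeholder (the slides do not give Stardog's actual path), and the request is constructed but not sent:

```python
# Build (but do not send) a SPARQL Protocol query request.
# ASSUMPTION: ENDPOINT is a made-up URL, not a documented Stardog path.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8080/example/query"  # hypothetical

def sparql_get_request(endpoint: str, query: str) -> Request:
    """Encode a query as a SPARQL Protocol GET request (`query` parameter)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_get_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running endpoint.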
14. Logical Inference
1. OWL 2 QL, EL, and RL “query-time”
reasoning
• No materialization (so: fast bulk loading)
• reasoning enabled per-query
2. OWL 2 DL reasoning via Pellet 3.0
• in-memory, schema reasoning
3. Integrity Constraint Validation via OWL 2
4. user-defined & SWRL rules
15. OWL validation of RDF
• Use OWL ontologies to validate RDF
instance data in Stardog.
• May be used as a guard to database
modifications (so, if resulting data is invalid,
transaction fails).
• W3C Member Submission to formalize this
approach; stay tuned for details.
• See http://clarkparsia.com/pellet/icv/ for
details
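The "guard" idea can be illustrated with a toy in-memory store: a commit is applied to a candidate copy, checked, and discarded if any constraint is violated. This is a sketch of the pattern only; the class, the constraint, and the `ex:` names are invented for illustration and are not Stardog's or Pellet's API:

```python
# Toy model of "validation as a guard": a commit is rejected if the resulting
# data violates a constraint. All names are illustrative, not Stardog's API.

RDF_TYPE = "rdf:type"

class ConstraintViolation(Exception):
    pass

def employees_have_names(triples):
    """Constraint: every ex:Employee must have at least one ex:name value."""
    employees = {s for (s, p, o) in triples if p == RDF_TYPE and o == "ex:Employee"}
    named = {s for (s, p, o) in triples if p == "ex:name"}
    return sorted(employees - named)  # offending individuals

class GuardedStore:
    def __init__(self, constraints):
        self.triples = set()
        self.constraints = constraints

    def commit(self, additions):
        """Apply additions atomically; fail and leave data untouched if invalid."""
        candidate = self.triples | set(additions)
        for check in self.constraints:
            offenders = check(candidate)
            if offenders:
                raise ConstraintViolation(f"{check.__name__}: {offenders}")
        self.triples = candidate

store = GuardedStore([employees_have_names])
store.commit([("ex:alice", RDF_TYPE, "ex:Employee"), ("ex:alice", "ex:name", "Alice")])
try:
    store.commit([("ex:bob", RDF_TYPE, "ex:Employee")])  # no ex:name: rejected
except ConstraintViolation as err:
    print("transaction failed:", err)
```

The second commit fails, so the store still holds only the two triples about `ex:alice`.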
16. OWL 2 Support
• Stardog 1.0: query-time, query rewriting
reasoner for SPARQL entailment regimes
• It will support all of OWL 2 QL, EL, and
RL, with exceptions:
• limited support for datatypes reasoning
• i.e., won’t support user-defined datatypes
• will depend on customer demand
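To make "query-time, query rewriting" concrete, here is a deliberately tiny sketch: instead of materializing inferred triples at load time, a type query is rewritten into a union over the class and its subclasses. It handles only a subclass hierarchy, far less than the OWL 2 profiles named above, and every name in it is an illustrative assumption rather than Stardog's implementation:

```python
# Toy query-time reasoning by query rewriting (no materialization).
# Only a subclass hierarchy is handled; data and names are illustrative.

SUBCLASS_OF = {            # child -> parent (the "schema")
    "ex:Manager": "ex:Employee",
    "ex:Employee": "ex:Person",
}

DATA = {                   # instance data, stored exactly as loaded
    ("ex:alice", "rdf:type", "ex:Manager"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:carol", "rdf:type", "ex:Person"),
}

def subclasses_of(cls):
    """Return cls plus every direct or indirect subclass."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def instances_of(cls, reasoning=False):
    """Answer '?x rdf:type cls'; with reasoning, rewrite into a union over subclasses."""
    targets = subclasses_of(cls) if reasoning else {cls}
    return sorted(s for (s, p, o) in DATA if p == "rdf:type" and o in targets)

print(instances_of("ex:Employee"))                  # ['ex:bob']
print(instances_of("ex:Employee", reasoning=True))  # ['ex:alice', 'ex:bob']
```

Because `DATA` is never modified, loading stays cheap and reasoning can be switched on or off per query, at the cost of a more expensive query.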
17. Statistical Inference
• Corleone is a machine learning system for
RDF and OWL
• Optimized for Stardog
• Multiple classifier & cluster algorithms
• Clusters (similarity) and classifies (predicts)
by RDF class & individual
• Machine learning must still be tuned; no
magic bullets
18. Transactions
• Supports optional ACID transactions on
database mutations
• 2-phase commit based on Java Transaction
API
• Tx’d writes 2x to 8x slower, depending on
lots of variables
• Writes may be asynchronous & queued
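For readers unfamiliar with two-phase commit, the protocol can be sketched in a few lines. This is the generic textbook protocol, not the Java Transaction API and not Stardog's code; the participant names are made up:

```python
# Minimal two-phase commit sketch (generic protocol, not the Java Transaction API).

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote on whether this resource can commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                    # phase 2: commit
            p.commit()
        return True
    for p in participants:                        # phase 2: roll back
        p.rollback()
    return False

ok = two_phase_commit([Participant("index"), Participant("log")])
failed = two_phase_commit([Participant("index"), Participant("log", can_commit=False)])
print(ok, failed)  # True False
```

The extra round of voting before any write becomes visible is one reason transactional writes cost more than non-transactional ones.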
19. Search
• Indexes RDF individuals and literals
• Results are 2-tuples (url|value, score)
• Based on Lucene: very fast, very scalable
• Can use 1 of 6 algorithms to partition RDF
individuals from a graph
• via SPARQL DESCRIBE hook
• Will be integrated with SPARQL syntax...
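The result shape described above, 2-tuples of (url|value, score), can be shown with a naive stand-in. The scoring here (fraction of query terms matched) is an assumption chosen for brevity and is not how Lucene ranks documents; the URLs and literals are invented:

```python
# Naive stand-in for full-text search over literals; NOT Lucene's scoring.
LITERALS = {
    "http://example.org/doc1": "fast RDF database for complex queries",
    "http://example.org/doc2": "OWL reasoner",
    "http://example.org/doc3": "RDF and OWL machine learning",
}

def search(term_string):
    """Return (url, score) 2-tuples, best first; score = fraction of query terms matched."""
    terms = term_string.lower().split()
    hits = []
    for url, text in LITERALS.items():
        words = set(text.lower().split())
        matched = sum(1 for t in terms if t in words)
        if matched:
            hits.append((url, matched / len(terms)))
    # Highest score first; ties broken by URL for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

print(search("rdf owl"))
```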
20. RDF as Graph
• SPARQL isn’t ideal for every use case
• Graph algorithm processing on RDF purely
as a graph
• Stardog supports Gremlin, the ad hoc
standard for graph database query
languages
• Gremlin makes graph algorithms easy to
write
• More optimized Gremlin support for 1.0
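As an example of treating RDF "purely as a graph", the sketch below runs a breadth-first shortest-path search over triples, using subjects and objects as nodes and ignoring predicates. It is plain Python rather than Gremlin, and the triples are invented; it only illustrates the kind of question that is awkward in SPARQL and simple as a graph algorithm:

```python
# Shortest path over RDF triples viewed as a directed graph (plain Python, not Gremlin).
from collections import deque

TRIPLES = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:worksWith", "ex:d"),
    ("ex:d", "ex:knows", "ex:c"),
    ("ex:c", "ex:knows", "ex:e"),
]

def shortest_path(triples, start, goal):
    """Breadth-first search over subject -> object edges, ignoring predicates."""
    adjacency = {}
    for s, _p, o in triples:
        adjacency.setdefault(s, []).append(o)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal not reachable from start

print(shortest_path(TRIPLES, "ex:a", "ex:e"))
```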
21. Implementations
[Architecture diagram; component labels as recoverable, top to bottom]
• Sesame, Jena, Empire
• Stardog API: HTTP API, Native API, Avro API
• Stardog Core: SPI, Runtime, Transactions, Stardog RDF, Query Exec, Plan API, Query Rewriting/Reasoning, Optimizer, Plan, Filter API, Index API, SPI
• CP Util, IO Util, Stardog Util, Sesame Ext
22. Status
• Stardog 0.4.6 alpha release to alpha testers
on 15 March 2011
• It feels damn good to ship code, even if it’s
just an alpha! :)
• Weekly updates till beta period starts, then
bimonthly updates till 1.0 release
23. The Private Beta
• Doin’ it old school: private beta, invitation
only
• Helps us keep commercial focus
• ~1 April to 30 May
• kendall@clarkparsia.com if yr interested:
give name, org, area of interest, etc.
• Rolling releases, new features, bug fixes, etc.
• ~90 organizations signed up for beta so far
24. Roadmap
• 1.0 in mid-Summer
• SPARQL 1.1, MRMW
• stored procedures in any JVM lang
• Shiro-based security layer
• native OWL 2 RL reasoner
• provenance API
• graph algorithms & an RDF path language
• performance improvements continuously