RavenDB is a schema-less document database that offers fully ACID transactions, fast and flexible search, replication, sharding, and a simple RESTful API wrapped by clients in a growing number of languages. In this session, we will discuss the experience of developing and maintaining a RavenDB-backed CMS for one of the largest news sites in the US.
We'll cover:
- Supporting rapid evolution of the content/data model.
- Indexing for full-text, map-reduce, geospatial and other types of search.
- Replicating and sharding across servers and data centers for high availability.
- Deploying with no downtime.
- Handling huge traffic spikes.
Raven server
• Schema-less document database with a RESTful API.
• Fully ACID; all writes saved to disk (ESENT).
• Indexing/queries executed with Lucene.NET.
• Easily extended with custom logic using “bundles”.
• Management UI provided in Silverlight.
• Host as a Windows Service, an IIS app, or embedded in your app.
Raven client
• .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.
• Wraps the HTTP API.
• Provides client-side caching, change notification, and LINQ querying.
• Easily extended with many, many hooks into almost all operations.
Raven licensing
• Open source: http://github.com/ravendb/ravendb
• License is AGPL (free) or commercial (paid).
• Exception: your project can use any OSI-approved license and still use Raven for free.
• Commercial licenses are based on max parallelism and RAM.
• Windows clustering support and storage compression/encryption are available with the Enterprise license only.
NBC News Digital network
• Includes nbcnews.com, today.com, and more.
• 1.2 billion pageviews/month.
• 140 million video streams/month.
• 58 million unique users/month.
• Traffic spikes up to 100x normal when big news events happen.
One of the largest US news sites
• Very fast page load required
• “Instant” publish time required
• 6 to 8 code deployments each day
• High availability: zero* downtime allowed
High availability is when the answer to “What’s the longest outage before you wind up in your boss’s office?” is < 5 seconds.
Some prerequisites for HA
• Rolling deployments and rollbacks.
• Apps and services decoupled physically and temporally.
• Designed for both auto-failover/recovery and manual reconfiguration by ops.
• Seamless scale-out by adding instances of any process.
• And more…
HA data: a private data cloud
• Data schema can evolve rapidly
• Apps shouldn’t know where data is
• Apps should talk to the closest data replica
• Apps should automatically find a new replica if the closest becomes unavailable
• Ops can add/remove replicas quickly and easily, without affecting any running apps
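The routing requirements above (closest replica first, automatic fallback, a replica list that ops can change) can be sketched as a small routing function. This is a minimal Python simulation under stated assumptions, not the Raven client's failover code. `Replica`, `latency_ms`, `ReplicaUnavailable`, and `read_with_failover` are hypothetical names, and measured latency stands in for "closest".

```python
from dataclasses import dataclass, field


class ReplicaUnavailable(Exception):
    """Raised when a replica (or every replica) cannot serve the request."""


@dataclass
class Replica:
    # Hypothetical model of one data replica, for illustration only.
    name: str
    latency_ms: float
    up: bool = True
    docs: dict = field(default_factory=dict)

    def load(self, doc_id):
        if not self.up:
            raise ReplicaUnavailable(self.name)
        return self.docs.get(doc_id)


def read_with_failover(replicas, doc_id):
    """Try replicas closest-first; return (replica name, doc) from the first that answers."""
    for replica in sorted(replicas, key=lambda r: r.latency_ms):
        try:
            return replica.name, replica.load(doc_id)
        except ReplicaUnavailable:
            continue  # this replica is down; try the next closest one
    raise ReplicaUnavailable("no replica available")


shared = {"posts/1": {"title": "Hello"}}
replicas = [
    Replica("dc-east", latency_ms=2, docs=dict(shared)),
    Replica("dc-west", latency_ms=40, docs=dict(shared)),
]
print(read_with_failover(replicas, "posts/1"))  # served by dc-east
replicas[0].up = False
print(read_with_failover(replicas, "posts/1"))  # fails over to dc-west
```

The app only calls `read_with_failover`, so in this sketch ops can change the replica list without touching application code.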
Why we chose RavenDB
• Schema-less document database allows rapid change.
• Fully ACID model fit business needs.
• Strong replication functionality supported HA needs.
• Easily customizable on both client and server.
• Easily deployed and managed.
• First-class .NET client.
Current* state of Raven usage
• Raven used behind:
   • NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, Xbox, Roku.
   • A growing number of sections of nbcnews.com and today.com.
• Raven usage stats:
   • ~10 million docs, with 1000s of new docs/day.
   • 10s of writes/sec.
   • 100s of reads/sec (after 3 layers of caching).
Client-side caching
• Each doc is cached as long as memory is available.
• Requests include an If-Modified-Since header.
• A 304 Not Modified response saves bandwidth.
• Aggressive caching avoids the round-trip entirely. Its duration is tunable by ops at runtime (our custom addition).
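Here is a rough model of the two caching layers described above:

- a conditional request, which lets the server answer 304 Not Modified when the cached copy is still current;
- an aggressive-caching window, during which the client skips the request altogether.

Everything here is hypothetical (`FakeServer`, `CachingClient`, `aggressive_seconds`). The server compares the etag the client cached against the document's current etag. The exact HTTP headers Raven exchanges are not modeled.

```python
import itertools
import time


class FakeServer:
    """Hypothetical stand-in for the database endpoint; not Raven's API."""

    def __init__(self):
        self._etags = itertools.count(1)
        self.docs = {}       # doc_id -> (etag, body)
        self.requests = 0    # round-trips actually received

    def put(self, doc_id, body):
        self.docs[doc_id] = (next(self._etags), body)

    def get(self, doc_id, cached_etag=None):
        """Conditional GET: returns (304, etag, None) if the client's copy is current."""
        self.requests += 1
        etag, body = self.docs[doc_id]
        if etag == cached_etag:
            return 304, etag, None  # Not Modified: no body sent
        return 200, etag, body


class CachingClient:
    def __init__(self, server, aggressive_seconds=0.0, clock=time.monotonic):
        self.server = server
        self.aggressive_seconds = aggressive_seconds  # ops could change this at runtime
        self.clock = clock
        self.cache = {}  # doc_id -> (etag, body, fetched_at); unbounded in this sketch

    def load(self, doc_id):
        entry = self.cache.get(doc_id)
        now = self.clock()
        # Aggressive caching: inside the window, skip the round-trip entirely.
        if entry is not None and now - entry[2] < self.aggressive_seconds:
            return entry[1]
        status, etag, body = self.server.get(doc_id, entry[0] if entry else None)
        if status == 304:
            body = entry[1]  # server confirmed our copy is current
        self.cache[doc_id] = (etag, body, now)
        return body


server = FakeServer()
server.put("posts/1", {"title": "v1"})
client = CachingClient(server)
client.load("posts/1")  # 200: full document
client.load("posts/1")  # 304: cached body reused
```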
Raven sharding
• You define the sharding strategy – a method.
• Raven manages storing each doc to the correct instance and fanning out queries/merging results.
• No auto-rebalancing of shards if you change the number of instances.
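A sharding strategy is just a function from a document to a shard. The sketch below assumes a hash-of-Id strategy and three in-memory shards. `shard_by_hash` and `ShardedStore` are hypothetical. They only illustrate two things: routing each write to one shard, and fanning a query out to all shards before merging the results.

```python
import zlib


def shard_by_hash(doc_id, shard_names):
    """Example strategy (hypothetical): stable hash of the Id, modulo shard count."""
    return shard_names[zlib.crc32(doc_id.encode()) % len(shard_names)]


class ShardedStore:
    def __init__(self, shard_names, strategy):
        self.names = list(shard_names)
        self.shards = {name: {} for name in self.names}
        self.strategy = strategy

    def store(self, doc_id, doc):
        # Each write goes to exactly one shard, chosen by the strategy.
        self.shards[self.strategy(doc_id, self.names)][doc_id] = doc

    def load(self, doc_id):
        return self.shards[self.strategy(doc_id, self.names)].get(doc_id)

    def query(self, predicate, sort_key):
        # Fan out to every shard, then merge and re-sort the partial results.
        results = []
        for docs in self.shards.values():
            results.extend(d for d in docs.values() if predicate(d))
        return sorted(results, key=sort_key)


store = ShardedStore(["shard-a", "shard-b", "shard-c"], shard_by_hash)
for i in range(6):
    store.store(f"posts/{i}", {"n": i, "section": "us" if i % 2 else "world"})
print([d["n"] for d in store.query(lambda d: d["section"] == "us", lambda d: d["n"])])  # [1, 3, 5]
```

`shard_by_hash` depends on the number of shards. Add one and many existing Ids map to a different shard, while nothing moves the documents already stored. That is what "no auto-rebalancing" means in practice, and a reason to settle the strategy before going live.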
Raven indexing and querying
• All queries are performed against indexes.
• Indexes can be predefined or auto-created.
• Indexing/queries are executed in Lucene.NET:
   • Fielded.
   • Full text with built-in or custom analyzers.
   • Geo-spatial.
   • Map-reduce.
• Result transformers can load other docs.
• Query with LINQ or Lucene syntax.
• Indexes may be stale. You can force a wait for non-stale results. (Danger! Primarily for unit tests.)
• Projections occur on the server, reducing data on the wire.
• Super-cool stuff: eval patching, index scripts.
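Of the index types listed, map-reduce is the one whose mechanics are least obvious. This Python sketch shows only the shape of the computation, counting posts per tag. It is not Raven's index definition syntax, and `map_posts`, `reduce_results`, and the `tags` field are hypothetical.

```python
from collections import defaultdict


def map_posts(doc):
    # Map: emit one {"tag", "count": 1} entry per tag on each post.
    for tag in doc.get("tags", []):
        yield {"tag": tag, "count": 1}


def reduce_results(entries):
    # Reduce: group by tag and sum counts. Output has the same shape as the
    # map output, so the reduce can be applied again to partial results.
    totals = defaultdict(int)
    for entry in entries:
        totals[entry["tag"]] += entry["count"]
    return [{"tag": tag, "count": count} for tag, count in sorted(totals.items())]


def build_index(docs):
    return reduce_results(entry for doc in docs for entry in map_posts(doc))


posts = [{"tags": ["politics", "us"]}, {"tags": ["us"]}, {"tags": []}]
print(build_index(posts))
```

Because reduce output can be fed back into reduce, partial results computed over different batches of documents can be combined. In general, that is what makes incremental re-reduction possible as documents change.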
Indexing catch-22
• Indexes need to be up to date before a client is allowed to talk to a replica.
• Indexes are created by the client app:
   • Static: CreateIndexes() at startup scans assemblies for index classes.
   • Dynamic: created when a client issues a query.
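One way to break the catch-22 is to gate a new replica behind a readiness check. First create the indexes, then hold traffic back until the replica reports that none are stale. In this sketch `get_stale_indexes` is a hypothetical callable that you would back with the server's stats endpoint. The real endpoint and its response format are not shown.

```python
import time


def wait_until_indexes_fresh(get_stale_indexes, timeout_s=300.0, poll_s=1.0,
                             clock=time.monotonic, sleep=time.sleep):
    """Poll until no index is stale or the timeout expires.

    get_stale_indexes: hypothetical callable returning the names of stale indexes.
    Returns True if the replica is ready for client traffic, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if not get_stale_indexes():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A deployment script would call this for a new replica and add the replica to the apps' replica list only when it returns True.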
Updating a static index – a pain
• Define the new index, with no code using it.
• Deploy and allow the new index to build.
• Redeploy with code using the new index.
• Redeploy after deleting the old index definition.
• Delete the old index on each replica.
Consistency
• Operations by Id are consistent (within a single Raven server):
   • Load()
   • Store()
   • Delete()
• Queries are only eventually consistent (“eventually” is measured in milliseconds).
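A toy model of the consistency rules above. Writes are visible to load-by-Id immediately. A query answers from an index that a background indexer updates later, and it reports whether its result might be stale. `TinyStore` and its methods are hypothetical, and the real indexer runs continuously rather than on an explicit `run_indexer()` call.

```python
class TinyStore:
    """Toy model, not Raven: documents update at once, the index lags behind."""

    def __init__(self):
        self.docs = {}      # doc_id -> document
        self.index = {}     # section -> set of doc Ids
        self.pending = []   # Ids written since the indexer last ran

    def store(self, doc_id, doc):
        self.docs[doc_id] = doc       # by-Id state is updated immediately
        self.pending.append(doc_id)   # index update is deferred

    def load(self, doc_id):
        return self.docs.get(doc_id)  # consistent: reads the document itself

    def query(self, section):
        """Answer from the index; also report whether it may be stale."""
        return sorted(self.index.get(section, set())), bool(self.pending)

    def run_indexer(self):
        # Stand-in for background indexing.
        for doc_id in self.pending:
            for ids in self.index.values():
                ids.discard(doc_id)  # drop any entry from the doc's previous version
            self.index.setdefault(self.docs[doc_id]["section"], set()).add(doc_id)
        self.pending.clear()


db = TinyStore()
db.store("posts/1", {"section": "us"})
print(db.load("posts/1"))  # visible at once
print(db.query("us"))      # empty and flagged stale until the indexer runs
db.run_indexer()
print(db.query("us"))
```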
Raven replication
• Eventual consistency – replication is asynchronous, in the background.
• All replication is one-way and managed by the source.
• Can enable transitive replication – useful for new instances.
• Set a W value to wait until a write has replicated to a minimum number of instances, or until a timeout (v2.5).
• Client will auto-failover to replication destinations, configurable for reads only or for reads and writes.
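The W value amounts to "return once at least W destinations have confirmed the write, or give up after a timeout". Below is a minimal Python sketch of that waiting logic. It assumes each destination is represented by a blocking callable. `write_with_w` and `ReplicationQuorumError` are hypothetical and are not the v2.5 client API.

```python
import concurrent.futures as cf
import time


class ReplicationQuorumError(Exception):
    """Fewer than W destinations confirmed the write in time."""


def write_with_w(replicate_fns, w, timeout_s):
    """Return once `w` destinations confirm; raise if that doesn't happen in `timeout_s`.

    Each callable pushes the write to one destination and returns once it is
    stored there (hypothetical stand-ins for replication).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicate_fns))
    futures = [pool.submit(fn) for fn in replicate_fns]
    confirmed = 0
    try:
        for fut in cf.as_completed(futures, timeout=timeout_s):
            if fut.exception() is None:  # failed destinations don't count
                confirmed += 1
                if confirmed >= w:
                    return confirmed
    except cf.TimeoutError:
        pass  # fall through and report how many confirmed in time
    finally:
        pool.shutdown(wait=False)  # stragglers keep replicating in the background
    raise ReplicationQuorumError(f"{confirmed} of {w} required confirmations")


def fast_replica():
    return None  # confirms immediately


def slow_replica():
    time.sleep(0.2)  # e.g. a distant data center


print(write_with_w([fast_replica, fast_replica, slow_replica], w=2, timeout_s=1.0))
```

In this sketch, replication to the remaining destinations continues after the function returns. W only controls how long the caller waits.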
Etags
• Sequential GUIDs.
• Unique for every write to a database.
• Used for client-side caching, concurrency control, and replication.
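The sketch below imitates sequential GUID etags with `uuid.UUID(int=n)` and uses them for optimistic concurrency. A write that carries an expected etag is rejected if the document has changed since that etag was read. `EtagStore` and `ConcurrencyError` are hypothetical names. Raven's actual etag layout is not modeled, only the "unique and increasing per write" property.

```python
import itertools
import uuid


class ConcurrencyError(Exception):
    """The document changed since the caller last read it."""


class EtagStore:
    """Toy store: every write gets the next sequential GUID-shaped etag."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.docs = {}  # doc_id -> (etag, body)

    def put(self, doc_id, body, expected_etag=None):
        if expected_etag is not None:
            current = self.docs.get(doc_id)
            if current is None or current[0] != expected_etag:
                raise ConcurrencyError(f"{doc_id} was modified by another writer")
        etag = uuid.UUID(int=next(self._counter))  # unique and increasing per write
        self.docs[doc_id] = (etag, body)
        return etag

    def get(self, doc_id):
        return self.docs[doc_id]  # (etag, body)


store = EtagStore()
first = store.put("posts/1", {"title": "Draft"})
second = store.put("posts/1", {"title": "Final"}, expected_etag=first)  # accepted
try:
    store.put("posts/1", {"title": "Stale edit"}, expected_etag=first)
except ConcurrencyError as err:
    print("rejected:", err)
```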
The replication conversation
Source: What’s the last etag I replicated to you?
Destination: 42
Source: I’m up to 49, so here’s a POST with some docs in it.
Destination: Got ’em.
Source: What’s the last etag I replicated to you?
Destination: 49
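The conversation above maps onto a short loop on the source. This Python simulation uses a hypothetical `Node` class and integer etags instead of GUIDs. Each destination remembers the last etag it received from each source, and the source sends everything newer, in batches.

```python
class Node:
    """Toy replication source/destination with integer etags (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.last_etag = 0
        self.log = []             # (etag, doc_id, body) for local writes, in order
        self.docs = {}
        self.replicated_from = {}  # source name -> last etag received from it

    def write(self, doc_id, body):
        self.last_etag += 1
        self.log.append((self.last_etag, doc_id, body))
        self.docs[doc_id] = body

    # Destination side: "What's the last etag I replicated to you?"
    def last_etag_from(self, source_name):
        return self.replicated_from.get(source_name, 0)

    def receive_batch(self, source_name, batch):
        for _etag, doc_id, body in batch:
            self.docs[doc_id] = body  # not re-logged, so not passed further along
        if batch:
            self.replicated_from[source_name] = batch[-1][0]

    # Source side: one round of the conversation.
    def replicate_to(self, dest, batch_size=100):
        since = dest.last_etag_from(self.name)
        batch = [entry for entry in self.log if entry[0] > since][:batch_size]
        dest.receive_batch(self.name, batch)
        return len(batch)


source, dest = Node("source"), Node("dest")
for i in range(49):
    source.write(f"posts/{i}", {"n": i})
source.replicate_to(dest, batch_size=42)  # dest is now at 42
source.replicate_to(dest)                 # "I'm up to 49": sends the remaining 7
print(dest.last_etag_from("source"))
```

Replicated documents are not added to the destination's own log, so this sketch behaves like replication with the transitive option turned off.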
Multi-master replication
• Replication from each instance to all other instances.
• Any instance could receive writes.
• Reduce replication conflicts by forcing writes to a single “master”.
• Handle conflicts in your app or with a custom server bundle – in our case, a “last in wins” bundle.
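With multi-master replication, the same document can arrive from two instances with different contents. A "last in wins" policy can be expressed as a single comparison. This is a sketch of the policy only, not the server bundle, and the `Version` shape (origin, modification time, body) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Version:
    """One conflicting copy of a document (hypothetical shape)."""
    origin: str            # instance that accepted this write
    modified_at: datetime  # when that write happened
    body: dict


def last_in_wins(conflicts):
    """Keep the most recently modified copy.

    Ties on time are broken by origin name so every replica picks the same winner.
    """
    return max(conflicts, key=lambda v: (v.modified_at, v.origin))


east = Version("east", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), {"title": "Old"})
west = Version("west", datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), {"title": "New"})
print(last_in_wins([east, west]).body)
```

There are two trade-offs. The losing write is discarded silently, and "last" is only as trustworthy as the servers' clocks. Both are reasons to funnel writes through a single master.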
Id generation
• Null Id and the collection tag can be extracted: the client generates the Id with Hi-Lo.
• Null Id received at the server: the server assigns a GUID.
• Id ending in / received at the server: the server appends an auto-increment integer.
• Otherwise: the value in the object is used.
• A server prefix protects against edge-case failures.
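Below, the Id rules above are sketched in Python. The Hi-Lo part assumes the server keeps one counter per collection tag and hands each client a whole range of Ids. The client then needs one round-trip per range rather than per document. `HiLoServer`, `HiLoGenerator`, `server_assign_id`, and the range size are hypothetical, and the server prefix is not modeled.

```python
import uuid


class HiLoServer:
    """Hypothetical server-side counter: one 'hi' value per collection tag."""

    def __init__(self, capacity=32):
        self.capacity = capacity  # Ids per range; an arbitrary choice here
        self.next_hi = {}         # collection tag -> next range number

    def reserve_range(self, tag):
        hi = self.next_hi.get(tag, 0)
        self.next_hi[tag] = hi + 1
        low = hi * self.capacity + 1
        return low, low + self.capacity - 1  # inclusive range now owned by one client


class HiLoGenerator:
    """Client side: hand out Ids locally, one server call per exhausted range."""

    def __init__(self, server, tag):
        self.server, self.tag = server, tag
        self.current, self.max = 1, 0  # empty range forces a reservation on first use

    def next_id(self):
        if self.current > self.max:
            self.current, self.max = self.server.reserve_range(self.tag)
        value = self.current
        self.current += 1
        return f"{self.tag}/{value}"


def server_assign_id(doc_id, counters):
    """Server-side rules for Ids the client did not generate."""
    if doc_id is None:
        return str(uuid.uuid4())              # null Id: a GUID
    if doc_id.endswith("/"):
        counters[doc_id] = counters.get(doc_id, 0) + 1
        return f"{doc_id}{counters[doc_id]}"  # "posts/" -> "posts/1", "posts/2", ...
    return doc_id                             # otherwise: keep the supplied Id


hilo = HiLoServer(capacity=3)
a, b = HiLoGenerator(hilo, "posts"), HiLoGenerator(hilo, "posts")
print(a.next_id(), a.next_id(), b.next_id(), a.next_id(), a.next_id())
# posts/1 posts/2 posts/4 posts/3 posts/7
```

Two clients sharing one server never collide, because each range is handed out once. The cost is gaps in the numbering when a client stops with part of its range unused.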
Raven operations tasks
• Control where reads and writes go (implemented in a custom DocumentStore wrapper).
• Control aggressive caching time.
• Deploy new instances with replication.
• Backup – but probably never restore in production.
• Copy indexes.
• Monitor with stats endpoints.
Keep in mind…
• Modeling/versioning
• Replication
• Client failover
• Consistency
• Concurrency control
• Indexing and updates
• Id generation
• Caching
More info on Raven
• http://ravendb.net
• GitHub: http://github.com/ravendb
• Ayende’s blog: http://ayende.com
• RavenDB Google group
• @RavenDB on Twitter
• Me: @jtbennett on Twitter
Many thanks to:
You.
NoSql NOW!
Huge.
Rhinos: @ayende, @synhershko.
Peacocks: @benlakey, @johncoder, @pkdotnet, Colin Hicks, Peter Durham, Bryan Wheeler.