Delivering big content at NBC News with RavenDB

NoSql NOW! 2013
Delivering big content at NBC News
with RavenDB

•  Schema-less document database with RESTful API.
•  Fully ACID and all writes saved to disk (ESENT).
•  Indexing/queries executed with Lucene.NET.
•  Easily extended with custom logic using “bundles”.
•  Management UI provided in Silverlight.
•  Host as Windows Service, IIS app, or embedded in your app.
Raven server

•  .NET client provided. Third-party clients exist for JavaScript, PHP, and Ruby.
•  Wraps the HTTP API.
•  Provides client-side caching, change notification, LINQ querying.
•  Easily extended with many, many hooks into almost all operations.
Raven client

•  Open source: http://github.com/ravendb/ravendb
•  License is AGPL (free) or commercial (paid).
•  Exception: your project can use any OSI-approved license and still use Raven for free.
•  Commercial licenses based on max parallelism and RAM.
•  Windows clustering support and storage compression/encryption available with Enterprise license only.
Raven licensing

•  Includes nbcnews.com, today.com and more.
•  1.2 billion pageviews/month.
•  140 million video streams/month.
•  58 million unique users/month.
•  Traffic spikes up to 100x normal when big news events happen.
NBC News Digital network

•  Very fast page load required
•  “Instant” publish time required
•  6 to 8 code deployments each day
•  High availability: zero* downtime allowed
One of the largest US news sites

High availability is when the answer to “What’s the longest outage before you wind up in your boss’s office?” is under 5 seconds.

Credit: Mitch Canter @studionashvegas http://twitpic.com/z13bw

•  Rolling deployments and rollbacks.
•  Apps and services decoupled physically and temporally.
•  Designed for both auto-failover/recovery and manual reconfiguration by ops.
•  Seamless scale out by adding instances of any process.
•  And more…
Some prerequisites for HA

•  Data schema can evolve rapidly
•  Apps shouldn’t know where data is
•  Apps should talk to the closest data replica
•  Apps should automatically find a new replica if the closest one becomes unavailable
•  Ops can add/remove replicas quickly and easily, without affecting any running apps
HA data: a private data cloud
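The "closest replica, then fail over" requirements above reduce to a simple routing rule: try replicas in order of proximity and fall through to the next one on failure. A minimal Python sketch, assuming a hypothetical `Replica` record with a measured `latency_ms` and a `fetch` callable; this is not RavenDB's client API, which handles failover internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ReplicaUnavailable(Exception):
    """Raised when a replica cannot serve a request."""


@dataclass
class Replica:
    # Hypothetical replica descriptor: name, measured latency, and a fetch function.
    name: str
    latency_ms: float
    fetch: Callable[[str], Dict]


def read_with_failover(replicas: List[Replica], doc_id: str) -> Dict:
    """Try replicas closest-first; fall through to the next one if a replica is down."""
    errors = []
    for replica in sorted(replicas, key=lambda r: r.latency_ms):
        try:
            return replica.fetch(doc_id)
        except ReplicaUnavailable as exc:
            errors.append(f"{replica.name}: {exc}")
    raise ReplicaUnavailable("all replicas failed: " + "; ".join(errors))


def unreachable(doc_id: str) -> Dict:
    raise ReplicaUnavailable("connection refused")


docs = {"articles/1": {"title": "Breaking news"}}
replicas = [
    Replica("remote", 40.0, lambda doc_id: {"served_by": "remote", **docs[doc_id]}),
    Replica("local", 2.0, unreachable),  # the closest replica is down
]
print(read_with_failover(replicas, "articles/1"))
# → {'served_by': 'remote', 'title': 'Breaking news'}
```

Because the replica list is plain data here, ops could add or remove entries at runtime without the app knowing where the data lives.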

•  Schema-less document database allows rapid change.
•  Fully ACID model fit business needs.
•  Strong replication functionality supported HA needs.
•  Easily customizable on both client and server.
•  Easily deployed and managed.
•  First class .NET client.
Why we chose RavenDB

•  Raven used behind:
    •  NBC News and TODAY apps: Windows 8, iOS, Android, Windows Phone, Xbox, Roku.
    •  A growing number of sections of nbcnews.com and today.com.
•  Raven usage stats:
    •  ~10 million docs, with thousands of new docs/day.
    •  10s of writes/sec.
    •  100s of reads/sec (after 3 layers of caching).
Current* state of Raven usage

•  Each doc cached as long as memory available.
•  Requests include If-Modified-Since header.
•  304 Not Modified response saves bandwidth.
•  Aggressive caching avoids the round-trip entirely. Tunable by ops at runtime (custom).
Client-side caching
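The caching bullets above describe two layers: a conditional request that may come back 304 Not Modified, and an aggressive mode that skips the request entirely for a tunable window. A minimal in-process Python sketch of both, assuming a hypothetical `FakeServer`; the integer stamp stands in for the date a real server would compare against If-Modified-Since, and none of these names are Raven's.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeServer:
    """In-memory stand-in for a document server that honors If-Modified-Since."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, dict]] = {}
        self.version = 0    # monotonically increasing "last modified" stamp
        self.requests = 0   # counts round-trips that actually reached the server

    def put(self, doc_id: str, body: dict) -> None:
        self.version += 1
        self.docs[doc_id] = (self.version, body)

    def get(self, doc_id: str, if_modified_since: Optional[int]) -> Tuple[int, int, Optional[dict]]:
        self.requests += 1
        modified, body = self.docs[doc_id]
        if if_modified_since is not None and modified <= if_modified_since:
            return 304, modified, None  # Not Modified: no body on the wire
        return 200, modified, body


class CachingClient:
    def __init__(self, server: FakeServer, now: Callable[[], float],
                 aggressive_seconds: float = 0.0) -> None:
        self.server = server
        self.now = now
        self.aggressive_seconds = aggressive_seconds  # ops can change this at runtime
        self.cache: Dict[str, Tuple[int, dict, float]] = {}  # id -> (stamp, body, fetched_at)
        self.not_modified = 0

    def load(self, doc_id: str) -> dict:
        cached = self.cache.get(doc_id)
        if cached and self.now() - cached[2] < self.aggressive_seconds:
            return cached[1]  # aggressive caching: no round-trip at all
        status, modified, body = self.server.get(doc_id, cached[0] if cached else None)
        if status == 304:
            self.not_modified += 1
            body = cached[1]  # reuse the cached copy
        self.cache[doc_id] = (modified, body, self.now())
        return body


clock = [0.0]
server = FakeServer()
server.put("articles/1", {"title": "v1"})
client = CachingClient(server, now=lambda: clock[0])
client.load("articles/1")        # 200: full body fetched and cached
client.load("articles/1")        # 304: body served from the cache
client.aggressive_seconds = 30   # ops switch on aggressive caching
client.load("articles/1")        # served from the cache, no request sent
server.put("articles/1", {"title": "v2"})
clock[0] = 31.0                  # aggressive window has expired
print(client.load("articles/1"), server.requests, client.not_modified)
# → {'title': 'v2'} 3 1
```

The trade-off is visible in the sketch: a 304 still costs a round-trip but saves bandwidth, while aggressive caching saves the round-trip at the cost of possibly serving a stale copy until the window expires.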

•  You define the sharding strategy: a method that picks the instance for each doc.
•  Raven stores each doc on the correct instance, fans queries out across instances, and merges the results.
•  No auto-rebalancing of shards if you change the number of instances.
Raven sharding
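To make the sharding bullets concrete, here is a minimal Python sketch of a user-defined strategy (a stable hash of the id modulo the shard count) plus query fan-out and merge. The `shard_for` function and `ShardedStore` class are hypothetical illustrations, not Raven's sharding API.

```python
import zlib
from typing import Callable, Dict, List


def shard_for(doc_id: str, shards: List[str]) -> str:
    """Hypothetical sharding strategy: stable CRC32 hash of the id, modulo the shard count."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


class ShardedStore:
    def __init__(self, shards: List[str]) -> None:
        self.shards = shards
        self.data: Dict[str, Dict[str, dict]] = {name: {} for name in shards}

    def store(self, doc_id: str, doc: dict) -> None:
        self.data[shard_for(doc_id, self.shards)][doc_id] = doc

    def load(self, doc_id: str) -> dict:
        return self.data[shard_for(doc_id, self.shards)][doc_id]

    def query(self, predicate: Callable[[dict], bool], sort_key: str) -> List[dict]:
        # Fan the query out to every shard, then merge and sort the partial results.
        hits = [doc for shard in self.data.values() for doc in shard.values() if predicate(doc)]
        return sorted(hits, key=lambda d: d[sort_key])


store = ShardedStore(["shard-a", "shard-b", "shard-c"])
for i in range(10):
    store.store(f"articles/{i}", {"n": i, "section": "news" if i % 2 else "today"})
print([d["n"] for d in store.query(lambda d: d["section"] == "news", "n")])
# → [1, 3, 5, 7, 9]

# Nothing rebalances: with a fourth shard, the strategy would send any id in
# `moved` to a shard that does not hold it.
grown = store.shards + ["shard-d"]
moved = [i for i in range(10)
         if shard_for(f"articles/{i}", grown) != shard_for(f"articles/{i}", store.shards)]
print(len(moved), "of 10 ids would now resolve to a different shard")
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, and a sharding strategy must map an id to the same shard every time.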

•  All queries are performed against indexes.
•  Indexes can be predefined or auto-created.
•  Indexing/queries are executed in Lucene.NET.
    •  Fielded.
    •  Full text with built-in or custom analyzers.
    •  Geo-spatial.
    •  Map-reduce.
•  Result transformers can load other docs.
•  Query with LINQ or Lucene syntax.
•  Indexes may be stale. Can force wait for non-stale results (Danger! Primarily for unit tests.)
•  Projections occur on server, reducing data on the wire.
•  Super-cool stuff: eval patching, index scripts.
Raven indexing and querying
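Map-reduce is one of the index types listed above. A minimal Python sketch of the idea, counting documents per section: the map emits one entry per doc, and the reduce groups by key and sums. This only illustrates the technique; Raven index definitions are written in the .NET client, not in Python, and the field names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def map_docs(docs: Iterable[dict]) -> List[dict]:
    # Map: emit one entry per document, keyed by section.
    return [{"section": d["section"], "count": 1} for d in docs]


def reduce_entries(entries: Iterable[dict]) -> List[dict]:
    # Reduce: group by key and sum. The output has the same shape as the map
    # output, so reduce can be re-applied to its own partial results.
    totals: Dict[str, int] = defaultdict(int)
    for e in entries:
        totals[e["section"]] += e["count"]
    return [{"section": k, "count": v} for k, v in sorted(totals.items())]


batch1 = [{"section": "news"}, {"section": "today"}, {"section": "news"}]
batch2 = [{"section": "news"}, {"section": "sports"}]

all_at_once = reduce_entries(map_docs(batch1 + batch2))
incremental = reduce_entries(reduce_entries(map_docs(batch1)) + reduce_entries(map_docs(batch2)))
print(all_at_once)
# → [{'section': 'news', 'count': 3}, {'section': 'sports', 'count': 1}, {'section': 'today', 'count': 1}]
```

Keeping the reduce output in the same shape as the map output is what lets an indexer fold in new batches of documents incrementally instead of recomputing from scratch; `incremental` equals `all_at_once` for that reason.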

•  Need indexes up to date before letting a client talk to a replica.
•  Indexes are created by the client app:
    •  Static: CreateIndexes() at startup scans assemblies for index classes.
    •  Dynamic: auto-created when a client issues a query.
Indexing catch-22
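One way out of the catch-22 is a readiness gate that ops check before routing clients to a new replica: every index the app needs must exist and must have caught up with the latest document write. A minimal Python sketch, assuming a hypothetical `ReplicaStats` snapshot with integer etags and made-up index names; this is not the schema of Raven's stats endpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicaStats:
    # Hypothetical snapshot of a replica's statistics (not Raven's actual stats schema).
    last_doc_etag: int
    last_indexed_etag: Dict[str, int] = field(default_factory=dict)


def stale_indexes(stats: ReplicaStats, required: List[str]) -> List[str]:
    """Indexes the app needs that are missing or behind the latest document write."""
    return [name for name in required
            if stats.last_indexed_etag.get(name, -1) < stats.last_doc_etag]


def ready_for_traffic(stats: ReplicaStats, required: List[str]) -> bool:
    return not stale_indexes(stats, required)


new_replica = ReplicaStats(last_doc_etag=500,
                           last_indexed_etag={"Articles/BySection": 500, "Videos/ByDate": 320})
required = ["Articles/BySection", "Videos/ByDate", "Articles/Search"]
print(stale_indexes(new_replica, required))      # → ['Videos/ByDate', 'Articles/Search']
print(ready_for_traffic(new_replica, required))  # → False
```

A missing index (`Articles/Search` here) counts as stale, which covers the case where the definition has not yet been created on the new replica.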

•  Define new index, with no code using it.
•  Deploy and allow new index to build.
•  Redeploy with code using the new index.
•  Redeploy after deleting old index definition.
•  Delete old index on each replica.
Updating a static index – a pain

•  Operations by Id are consistent (within a single Raven server):
    •  Load()
    •  Store()
    •  Delete()
•  Queries are only eventually consistent (“eventually” is measured in milliseconds).
Consistency
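The split above (writes by Id are visible immediately, queries go through an index that catches up in the background) can be simulated in a few lines. A minimal Python sketch with a hypothetical `ToyDatabase`; `run_indexer()` stands in for the background indexing thread so the behavior is deterministic.

```python
from typing import Dict, List, Tuple


class ToyDatabase:
    """Documents are written synchronously; the index catches up later."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.etag = 0
        self.pending: List[str] = []          # ids written but not yet indexed
        self.by_section: Dict[str, set] = {}  # toy index: section -> ids (inserts only)
        self.indexed_etag = 0

    def store(self, doc_id: str, doc: dict) -> None:
        self.etag += 1
        self.docs[doc_id] = doc               # visible to load() immediately
        self.pending.append(doc_id)

    def load(self, doc_id: str) -> dict:
        return self.docs[doc_id]

    def run_indexer(self) -> None:
        # Stands in for the background indexing thread.
        for doc_id in self.pending:
            self.by_section.setdefault(self.docs[doc_id]["section"], set()).add(doc_id)
        self.pending.clear()
        self.indexed_etag = self.etag

    def query(self, section: str) -> Tuple[List[str], bool]:
        is_stale = self.indexed_etag < self.etag
        return sorted(self.by_section.get(section, set())), is_stale


db = ToyDatabase()
db.store("articles/1", {"section": "news"})
print(db.load("articles/1"))  # → {'section': 'news'}
print(db.query("news"))       # → ([], True)
db.run_indexer()
print(db.query("news"))       # → (['articles/1'], False)
```

Returning a staleness flag with every query result mirrors the trade-off on the slide: the app can accept a slightly stale answer, or wait for the flag to clear, which is why forcing non-stale results is best kept to tests.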

•  Eventual consistency – replication is async in background.
•  All replication is one-way and managed by source.
•  Can enable transitive replication, useful for new instances.
•  Set a W value to ensure a write reaches a minimum number of instances, or time out (v2.5).
•  Client will auto-failover to replication destinations, configurable to reads only or reads and writes.
Raven replication
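The W-value bullet describes a "wait for W replicas or time out" rule layered on top of async replication. A minimal Python sketch of that rule, assuming a hypothetical `replicated_etags` callable that reports the last etag each destination has received; this is not the v2.5 client API, only the waiting logic.

```python
import time
from typing import Callable, List


def wait_for_replication(write_etag: int,
                         replicated_etags: Callable[[], List[int]],
                         w: int,
                         timeout_s: float,
                         poll_s: float = 0.05,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until at least `w` destinations have replicated `write_etag`, or time out.

    Returns how many destinations confirmed; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        confirmed = sum(1 for e in replicated_etags() if e >= write_etag)
        if confirmed >= w:
            return confirmed
        if clock() >= deadline:
            raise TimeoutError(f"only {confirmed} of {w} replicas have etag {write_etag}")
        sleep(poll_s)


# Simulated destinations catching up on successive polls.
progress = iter([[40, 41, 0], [42, 41, 0], [42, 42, 0]])
print(wait_for_replication(42, lambda: next(progress), w=2, timeout_s=1.0, poll_s=0.0))  # → 2
```

The clock and sleep functions are injectable so the timeout path can be tested without real waiting. Note that a timeout does not undo the write; it only tells the caller the durability target was not confirmed in time.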

•  Sequential guids.
•  Unique for every write to a database.
•  Used for caching in client, concurrency control, and
replication.
Etags
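Concurrency control is one of the three uses of etags listed above. A minimal Python sketch of optimistic concurrency, modeling each sequential guid as a 128-bit UUID built from a counter; `EtagStore` and `expected_etag` are hypothetical names, not Raven's API.

```python
import itertools
import uuid
from typing import Dict, Optional, Tuple


class ConcurrencyException(Exception):
    pass


class EtagStore:
    """Every write gets the next sequential etag (a UUID built from a counter)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.docs: Dict[str, Tuple[uuid.UUID, dict]] = {}

    def load(self, doc_id: str) -> Tuple[uuid.UUID, dict]:
        return self.docs[doc_id]

    def put(self, doc_id: str, doc: dict, expected_etag: Optional[uuid.UUID] = None) -> uuid.UUID:
        current = self.docs.get(doc_id)
        # Reject the write if the caller read a version that is no longer current.
        if expected_etag is not None and (current is None or current[0] != expected_etag):
            raise ConcurrencyException(f"{doc_id} was modified by someone else")
        etag = uuid.UUID(int=next(self._counter))
        self.docs[doc_id] = (etag, doc)
        return etag


store = EtagStore()
store.put("articles/1", {"title": "Draft"})
etag, doc = store.load("articles/1")               # editor A reads
store.put("articles/1", {"title": "Edited by B"})  # editor B saves first
try:
    store.put("articles/1", {"title": "Edited by A"}, expected_etag=etag)
except ConcurrencyException as exc:
    print(exc)                      # → articles/1 was modified by someone else
print(store.load("articles/1")[0])  # → 00000000-0000-0000-0000-000000000002
```

Because etags only ever increase, the same value can also serve as a cache validator and as a replication high-water mark.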

Source: What’s the last etag I replicated to you?
Destination: 42
Source: I’m up to 49, so here’s a POST with some docs in it.
Destination: Got ‘em.
Source: What’s the last etag I replicated to you?
Destination: 49
The replication conversation
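The conversation above can be sketched as a pull of "everything after your last etag" driven by the source. A minimal Python sketch with integer etags and hypothetical `Source`/`Destination` classes; for simplicity it sends every logged write after the high-water mark rather than only the latest version of each doc.

```python
from typing import Dict, List, Tuple


class Destination:
    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.last_etag_from: Dict[str, int] = {}  # per-source high-water mark

    def last_etag(self, source: str) -> int:
        return self.last_etag_from.get(source, 0)

    def receive(self, source: str, batch: List[Tuple[int, str, dict]]) -> None:
        for _etag, doc_id, doc in batch:
            self.docs[doc_id] = doc
        if batch:
            self.last_etag_from[source] = batch[-1][0]


class Source:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[Tuple[int, str, dict]] = []  # (etag, id, doc) in etag order

    def write(self, doc_id: str, doc: dict) -> None:
        self.log.append((len(self.log) + 1, doc_id, doc))

    def replicate_to(self, dest: Destination, batch_size: int = 100) -> int:
        since = dest.last_etag(self.name)            # "What's the last etag I replicated to you?"
        batch = [e for e in self.log if e[0] > since][:batch_size]
        dest.receive(self.name, batch)               # "Here's a POST with some docs in it."
        return len(batch)


src, dst = Source("raven-1"), Destination()
for i in range(1, 50):
    src.write(f"articles/{i % 7}", {"rev": i})
dst.receive("raven-1", src.log[:42])  # an earlier round already delivered etags 1..42
print(dst.last_etag("raven-1"))       # → 42
print(src.replicate_to(dst))          # → 7
print(dst.last_etag("raven-1"))       # → 49
```

Storing the high-water mark on the destination, keyed by source, makes the exchange safe to repeat: a second round with nothing new sends an empty batch.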

•  Replication from each instance to all other instances.
•  Any instance could receive writes.
•  Reduce replication conflicts by forcing writes to a single “master”.
•  Handle conflicts in your app or with a custom server bundle (in our case, a “last in wins” bundle).
Multi-master replication
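The "last in wins" policy mentioned above can be expressed as a pure function over the conflicting versions. A minimal Python sketch, assuming a hypothetical `Version` record carrying a write timestamp and source name; it illustrates the policy only and is not the bundle's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    # Hypothetical shape of a conflicting version: origin, write time, content.
    source: str
    last_modified: float  # e.g. server timestamp of the write
    body: dict


def resolve_last_in_wins(conflicts: List[Version]) -> Version:
    """Keep the most recently written version; break exact ties by source name so
    every replica running this resolver picks the same winner."""
    return max(conflicts, key=lambda v: (v.last_modified, v.source))


a = Version("raven-east", 1000.5, {"headline": "Storm nears coast"})
b = Version("raven-west", 1002.0, {"headline": "Storm makes landfall"})
print(resolve_last_in_wins([a, b]).body["headline"])  # → Storm makes landfall
```

The deterministic tie-break matters because each replica resolves conflicts independently. The known weakness of wall-clock "last in wins" is clock skew between servers, which can let an older write win.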

•  Null Id and a collection tag can be extracted: the client generates the Id with Hi-Lo.
•  Null Id received at server: the server assigns a guid.
•  Id ending in / received at server: the server appends an auto-increment integer.
•  Otherwise: use the value in the object.
•  Server prefix protects against edge-case failures.
Id generation
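A minimal Python sketch of the Id rules above: a client-side Hi-Lo generator that reserves a block of ids per round-trip, and a hypothetical `server_assign_id` helper for the server-side cases. All names are illustrative, the guid is a random `uuid4`, and the server-prefix protection is not modeled.

```python
import uuid
from typing import Dict, Optional


class FakeHiLoServer:
    """Stand-in for the server-side counter that hands out the next 'hi' per collection."""

    def __init__(self) -> None:
        self.hi: Dict[str, int] = {}

    def next_hi(self, tag: str) -> int:
        self.hi[tag] = self.hi.get(tag, 0) + 1
        return self.hi[tag]


class HiLoIdGenerator:
    """Client-side Hi-Lo: reserve `capacity` ids per round-trip, then hand them out locally."""

    def __init__(self, server: FakeHiLoServer, tag: str, capacity: int = 32) -> None:
        self.server, self.tag, self.capacity = server, tag, capacity
        self.current = 0
        self.max = 0  # empty range, so the first call reserves a block

    def next_id(self) -> str:
        if self.current >= self.max:
            hi = self.server.next_hi(self.tag)  # the only round-trip
            self.current = (hi - 1) * self.capacity
            self.max = hi * self.capacity
        self.current += 1
        return f"{self.tag}/{self.current}"


_auto_increment: Dict[str, int] = {}


def server_assign_id(requested: Optional[str]) -> str:
    """Hypothetical helper mirroring the server-side rules on the slide."""
    if requested is None:
        return str(uuid.uuid4())  # null id: generate a guid
    if requested.endswith("/"):
        _auto_increment[requested] = _auto_increment.get(requested, 0) + 1
        return f"{requested}{_auto_increment[requested]}"  # trailing '/': append an integer
    return requested  # otherwise: use the id as given


server = FakeHiLoServer()
client_a = HiLoIdGenerator(server, "articles", capacity=3)
client_b = HiLoIdGenerator(server, "articles", capacity=3)
print(client_a.next_id(), client_a.next_id(), client_b.next_id())
# → articles/1 articles/2 articles/4
print(server_assign_id("videos/"), server_assign_id("videos/"))
# → videos/1 videos/2
```

Two clients sharing one counter never collide because each reserved block is disjoint; the cost is gaps in the sequence when a client exits before using its whole block.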

•  Control where reads and writes go. Implemented in a custom DocumentStore wrapper.
•  Control aggressive caching time.
•  Deploy new instances with replication.
•  Backup – but probably never restore in production.
•  Copy indexes.
•  Monitor with stats endpoints.
Raven operations tasks

•  Modeling/versioning
•  Replication
•  Client failover
•  Consistency
•  Concurrency control
•  Indexing and updates
•  Id generation
•  Caching
Keep in mind…

•  http://ravendb.net
•  GitHub: http://github.com/ravendb
•  Ayende’s blog: http://ayende.com
•  RavenDB Google group
•  @RavenDB on Twitter
•  Me: @jtbennett on Twitter
More info on Raven

Many thanks to:
You.
NoSql NOW!
Huge.
Rhinos:
@ayende, @synhershko.
Peacocks:
@benlakey, @johncoder, @pkdotnet,
Colin Hicks, Peter Durham, Bryan Wheeler.

hugeinc.com
info@hugeinc.com
45 Main St. #220 Brooklyn, NY 11201
+1 718 625 4843
