2. AGENDA
• Who Betfair are
• Why Couchbase was chosen at Betfair
• What it is being used for
• Some thoughts on adopting NoSQL in Enterprises
• Q&A session later on today
6. IN NUMBERS
• 4.0m+ funded accounts
• 140 locations
• 30,000 bets placed in one minute
• 120,000+ requests per second
• £288m funds on deposit
• £2.2bn Mobile FY12
8. DATA AT BETFAIR
• >30,000 markets that can change every 100ms
• Jurisdictionally sensitive navigation
• Multiple web applications for multiple channels
• Large volumes of data from other products
• Transactional data
• Operational monitoring too - large amount of logging data
9. TECHNOLOGY AT BETFAIR
Application Stack
• JVM heavy
• Linux on commodity hardware
• Heavy use of Virtualisation/Private Cloud
Data Storage Stack
• Oracle
• Some Informix & MySQL
• NoSQL
10. RELATIONAL DATABASES
We love Oracle!
The lifeblood of our transaction system
Highly performant
Well understood
Resilient
We run other databases, but they are effectively integrated products
11. BUT…
Impedance mismatch with object orientated languages
Object models possible in RDBMS but at what cost?
Must have serious skills at this scale
Scaling not easy
Often very heavyweight solution
Integration with Continuous Delivery?
So what about NoSQL?
13. NOSQL
Matches well to object orientated languages
Inherently scalable
Very fast look ups
Integrates very well with Continuous Delivery
Combines to give a lower time to delivery
15. WHY SO MANY?
Different categories of NoSQL, therefore different usage: K/V, Document, Columnar
Some are wrapped by other products
• CouchDB & Chef
• HBase & OpenTSDB
But what about cases where we have direct usage?
What were the selection criteria for these solutions?
16. THE PRESSURE OF DELIVERY
Just finished a cycle of high product delivery focus
Time to step back and reassess the selections
But without negatively affecting current product delivery!
17. STRATEGIC REVIEW
Good News
We had a fair amount of experience with different NoSQL solutions
Bad News
Fairly certain that some of the uses were less than optimal
18. ADOPTION AND ASSESSMENT PROCESS
• What were our use cases?
• What would be the optimum solutions?
19. NOSQL ASSESSMENT PROCESS
• Background/Maturity of the technology
• Data Model Category
• Consistency Model Requirements
• Performance
• Replication strategy (inc. Concurrency Control)
• Caching Model
• Query Model
• Integration with Continuous Delivery
21. INITIAL USE CASES FOR NOSQL
Web Tier Persistence
• Session and Cross session storage – e.g. Betslip
• Memcached
• Strong consistency
• Cookie abuse
• Cassandra as current solution
22. INITIAL USE CASES FOR NOSQL
User Preferences
• Historically tied to customer account
• Map of keys and values
• Multiple channels with multiple applications
• RDBMS as current solution
23. CURRENT ARCHITECTURE
Server side rendered content
SOA Data Services exposed
Supports >200,000 concurrent users
27. COUCHBASE PERFORMANCE
• Seriously fast
• Highly deterministic
• Cache ejection/eviction
• Avoids Cold Cache on offlined instances
• Ideal for our architecture – virtualisation/private cloud
• Far better option than our current solution
28. COUCHBASE SCALING
• Inherently scalable
• Impressive ability to add nodes under load
• Manual rebalance gives control for highly loaded applications
• Replica promotion avoids failure cascades under load
29. COUCHBASE SCHEMA FLEXIBILITY
• Giving the developers ownership of the data storage
• Decouples data migration from application deployment
• Important requirement for Feature Throttles
• Removes many of the requirements for having DB devs/DBAs
• Allows preferences to deal with A/B tests
30. OTHER COUCHBASE FEATURES
• Multi-tenancy when required
• Stable and Resilient
• Great ease of use for both Devs and Ops
• Enterprise support
• Elastic Search integration
• Secured with a Service Layer
32. COUCHBASE DEPLOYMENTS
• Version 1.8 in production, some 2.0 in pre-prod
• 3 instance clusters for individual web applications
• Larger (4-6) instance clusters for service storage
• We are about 6 months in with our production instances
36. COUCHBASE AT BETFAIR
Couchbase is now our strategic document NoSQL solution
• Session state
• Cross session state
• Service Persistence for key-based Entities
• Familiarity will likely see this extend out into other areas
37. INTRODUCING NOSQL IN ENTERPRISE
AKA CULTURE HACKING WITH NOSQL
• Remember it's an umbrella term - non-experts will ask why we need so many
different types of NoSQL
• Remember the business benefits
• Present the business with both the use cases you want to adopt NoSQL for and
the assessment of the candidates
• When you can use it, get it out there ASAP in a low risk way
• It’s not about choosing what’s cool, it’s about choosing what’s best for the
business
38. THANK YOU!
Martin Anderson @mdjanderson
http://betfair.jobs
Editor's notes
Hello there everyone. Those who registered early - Tim and Abe - gazumped by business. My name is Martin Anderson. I’m currently a Technical Consultant working for Betfair Australia out of Sydney, but before that I was the Chief Site Architect at Betfair for 2 years, and I’ve been with the business for almost 4 years. My main responsibility has been heading up the complete replacement of our web tier for Betfair.com. That was a brand new platform for all our web channels, both desktop and mobile, including the introduction of Continuous Delivery and NoSQL.
So this is what I hope to cover in this talk:
• Some background on Betfair
• Why Couchbase was chosen at Betfair
• Some thoughts on adopting NoSQL in Enterprises
It’s worth mentioning that there is a Q&A session at 5pm, so you can catch me there, or feel free to grab me during the conference.
Before we go into why Betfair selected Couchbase and how we use it, we need to know a bit more about who Betfair are, what they do and what technologies they use to do this. So who are Betfair? Betfair was created in 1999 as a startup between a developer and a city trader around the concept of a betting exchange: exactly like a stock exchange but with bets rather than shares. Since then the company has grown to be one of the largest online gambling companies in the world, offering not just the exchange but also sportsbook betting, casino, arcade, bingo and poker. It is very much a dot-com success and very much a British one, with the headquarters in Hammersmith, although there are development offices in Romania, Portugal and Australia.
We have a lot of products but the main one that we are known for is the betting exchange. Unlike a normal bookmaker, where you can only back an outcome like I want [...], you are able to lay it too. Laying is effectively just taking a back bet from another person. Size-wise, we do a fair bit of business, and that means that there is a fair amount of data flying around. Here are some numbers for you.
This all comes from a volume of bets that exceeds the combined volumes of all the stock exchanges in Europe. My favourite is that 20% of customers admitted that they have used their mobile to bet at a wedding. We are practically a bank: we deal with massive volumes of money, so people are very interested in our site staying up, being secure and being fast. The company has development centers in the UK, US, Portugal, Romania and Aus. We have a whole host of products, not just the exchange, and of course our products have very strict rules from regulators. There is a massive amount of complexity. The complexity is not just around data volumes and the speed at which we have to process them, but also that we offer multiple products across multiple channels in multiple jurisdictions with oversight from multiple regulators. But we are going to focus on data.
So what sort of data are we looking at? Pretty much the full gamut.
• Market data: new markets are created all the time and they need to be surfaced on the site when they are
• Market pricing data: this is the one that changes every 100ms
• The navigational hierarchy data is actually a Directed Acyclic Graph that needs to be correct for each user. As an example, Italian users will have specific markets only for them, while Danish users cannot be offered events like horse racing since an animal is involved.
• Transactional data: of course, since people are placing bets
• Operational monitoring: we are big exponents of DevOps and making sure that we know what’s happening in the business. Because the system is not simple, this is the only way we can know what is going on. Over 500GB of data per month just from logging, not including the rest of operational monitoring.
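To make the jurisdiction point concrete, here is a minimal sketch of filtering a navigation graph per user. Everything in it is an assumption made for illustration: the `NavNode` structure, the node names and the two-letter jurisdiction codes are hypothetical and are not taken from Betfair's systems.

```python
from dataclasses import dataclass, field


@dataclass
class NavNode:
    """One node in the navigation graph (hypothetical structure)."""
    name: str
    # Empty set means "visible everywhere"; otherwise only these jurisdictions.
    only_in: frozenset = frozenset()
    # Jurisdictions where this node must never be shown.
    blocked_in: frozenset = frozenset()
    children: list = field(default_factory=list)


def visible(node, jurisdiction):
    """Apply the block list first, then the allow list."""
    if jurisdiction in node.blocked_in:
        return False
    return not node.only_in or jurisdiction in node.only_in


def navigation_for(root, jurisdiction):
    """Return the node names reachable from root for one jurisdiction.

    A hidden node also hides everything below it on that path. The
    `seen` set stops a shared child being listed twice, which is what
    makes this a DAG walk rather than a tree walk.
    """
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node.name in seen or not visible(node, jurisdiction):
            continue
        seen.add(node.name)
        order.append(node.name)
        stack.extend(reversed(node.children))
    return order


# Illustrative data: Football is shared by two parents, so this is a DAG.
italian_specials = NavNode("Italian Specials", only_in=frozenset({"IT"}))
football = NavNode("Football", children=[italian_specials])
horse_racing = NavNode("Horse Racing", blocked_in=frozenset({"DK"}))
in_play = NavNode("In-Play", children=[football])
home = NavNode("Home", children=[horse_racing, football, in_play])

danish_view = navigation_for(home, "DK")
italian_view = navigation_for(home, "IT")
```

With this data the Danish view omits Horse Racing and the Italian view is the only one that contains Italian Specials.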
Java is not cool but:
• Good knowledge already at Betfair
• Real concurrency: great for heavy server workload
• Large community
• Great toolset
• Operations teams understand Java: stats, GC logs, deployment process
Oracle
WE LOVE ORACLE!
• The lifeblood of our transaction system: in fact our core exchange business is based around Oracle
• Highly performant: this might be surprising, but just because something is an RDBMS it is not inherently slow
• Well understood: we have a lot of experience with this. We are comfortable using it and know what to do when things don’t go the way we planned
• Resilient: given that we are a bank in many ways, how happy would you be if your bank went down? We are in the business of staying up
• Impedance mismatch with object orientated languages: the rise (and fall?) of technology like Hibernate and other ORMs highlights this. When you are developing, there is a clear break between your application logic and the persistence technology with an RDBMS.
• Object models possible in RDBMS but at what cost? You can solve this issue, but what are the costs, both in development and then in ongoing maintenance, as you fit a square peg into a round hole?
• Must have serious skills at this scale: we are one of the top 5 hottest Oracle databases in the world.
• Scaling not easy: clustering and sharding are easy to say, not so easy to do.
• Integration with Continuous Delivery? We deploy at least once a week.
I don’t want to go on too much about Continuous Delivery but I firmly believe that it is no longer an optional requirement for software development. One of the fundamental tenets of CD is that your process is automated. For this to happen it needs to be deterministic, and one of the easiest ways to guarantee this is to make the process simple. Unfortunately, things like database migrations and green/blue deployments are inherently complex even with tooling like DBDeploy.
So why should we use NoSQL? Well, the reasons are these… The time from concept to cash.
So why so many? From one perspective, since NoSQL is an umbrella term, you would naturally expect to have multiple types. Some of these technologies are dependencies of other technologies: for example, the deployment tool Chef uses CouchDB, and OpenTSDB, which we use extensively for monitoring, uses HBase under the hood. So what about direct usage, where our applications are directly using these technologies?
• Coherence: distributed caching in various tiers
• Memcached: distributed caching in web tier
• Cassandra: storage in web and service tier
• MongoDB: storage in prototypes and caching in Australia
• Redis: high speed sorted set delivery in US Exchange
Do we understand why we chose that solution?
It’s fairly common for large organizations to cycle between product delivery and then delivering efficiencies/optimisations on those products, especially in an Agile world. We were just coming off the back of not just the delivery of a new web platform but actually a raft of new deliveries across multiple products, channels and even countries. This meant that sometimes our technology was chosen based on what is currently supported rather than
Good News: We’ve had experience with K/V, Document and Columnar stores and seen how these things break.
Bad News: Cassandra is a great piece of tech. Very good for high writes but not optimal for read heavy or even equal read/writes, especially when you want strong consistency. Since the client is unaware of the server topology, you need to have quorum (explain) read/writes to achieve this. You get intermittent high p95 unless you go to SSDs or front it with Memcached.
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
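The quorum point can be shown with a little arithmetic. In a store with tunable consistency, a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (N). The sketch below only illustrates that rule; the function names are made up for this example and are not part of any client library.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


# With three replicas, quorum reads plus quorum writes always overlap,
# while single-replica reads and writes can miss each other.
replicas = 3
quorum_size = quorum(replicas)
quorum_is_consistent = reads_see_latest_write(replicas, quorum_size, quorum_size)
one_one_is_consistent = reads_see_latest_write(replicas, 1, 1)
```

The cost is that every strongly consistent read has to wait for a majority of replicas to answer, which is one plausible source of the tail latency mentioned above.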
Pretty simple really, since there are only two questions. The trick is to remember that your use cases need to include your future use cases, not just the ones you have now, and that your optimum solution needs to answer a whole raft of questions, not just the obvious ones.
Web Tier Persistence
• Session storage: easy to do with Memcached or Coherence, but we want cross session as well, so we need persistence
• Strong consistency at scale with high performance
• Known issues of cold cache and deployment complexity
• Too many people try to use and abuse cookies for this: not secure, size constraints, so you get pipe-separated magic values that quickly become just magic numbers
• Mobile usage avoids issues with things like low powered devices
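As an illustration of the cookie problem, the sketch below contrasts a pipe-separated cookie value with a keyed document holding the same betslip entry. The field names and values are invented for the example; the point is only that positional values break as soon as the layout changes, while a keyed document tolerates extra fields.

```python
import json

# Hypothetical field layout: in the cookie, position is the only schema.
BETSLIP_FIELDS = ("market_id", "selection_id", "stake", "side")


def parse_betslip_cookie(value: str) -> dict:
    """Split a pipe-separated cookie; any change of layout breaks it."""
    parts = value.split("|")
    if len(parts) != len(BETSLIP_FIELDS):
        raise ValueError("unexpected cookie layout")
    return dict(zip(BETSLIP_FIELDS, parts))


def parse_betslip_document(raw: str) -> dict:
    """Read named fields from a JSON document; unknown fields are ignored."""
    doc = json.loads(raw)
    return {name: str(doc[name]) for name in BETSLIP_FIELDS}


cookie = "1.234|5678|5.00|B"
document = json.dumps({
    "market_id": "1.234",
    "selection_id": "5678",
    "stake": "5.00",
    "side": "B",
    "channel": "mobile",  # an extra field does not disturb older readers
})

from_cookie = parse_betslip_cookie(cookie)
from_document = parse_betslip_document(document)
```

Appending a fifth value to the cookie makes `parse_betslip_cookie` raise, whereas the document reader keeps working when `channel` is added.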
User Preferences
• Simple settings for how you want your experience to be
• This is effectively a map of key values but can be a multimap
• Dates back to when there was a single product on a single channel
• Now multi-product on multiple channels, so an obvious need to be separate
• Why? They move at different speeds with different deployment frequencies
• Also, feature throttles for rollout and A/B testing: lazy load the data and let the application be in control
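A rough sketch of how per-application preferences and a feature throttle could fit together is shown below. It assumes an in-memory dictionary as a stand-in for the key/value store, and the key format, default values and hashing scheme are all hypothetical choices made for this example.

```python
import hashlib

# Hypothetical defaults owned by the application, not by the database.
DEFAULTS = {"odds_format": "decimal", "one_click_betting": False}


class PreferenceStore:
    """In-memory stand-in for a key/value store (assumption for the sketch)."""

    def __init__(self):
        self._docs = {}

    def get(self, key):
        return self._docs.get(key)

    def set(self, key, doc):
        self._docs[key] = doc


def pref_key(account_id, application):
    """One document per account per application, so apps deploy independently."""
    return f"prefs::{application}::{account_id}"


def load_preferences(store, account_id, application):
    """Lazy load: a missing document or missing key falls back to defaults."""
    stored = store.get(pref_key(account_id, application)) or {}
    return {**DEFAULTS, **stored}


def in_rollout(account_id, feature, percent):
    """Deterministically place an account in one of 100 buckets per feature."""
    digest = hashlib.sha256(f"{feature}:{account_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent


store = PreferenceStore()
before = load_preferences(store, 42, "sportsbook")
store.set(pref_key(42, "sportsbook"), {"odds_format": "fractional"})
after = load_preferences(store, 42, "sportsbook")
```

Because nothing is written until a user actually changes a setting, a new preference key can ship with the application without a data migration.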
Server side rendered content
SOA Data Services exposed
Supports >200,000 concurrent users
22 MINUTE CHECK
The importance of the blocking calls at the top being fast
Remember to loop back and talk about task deduplication
• Mention the performance work done by Altoros
• Client being aware of the server topology means that you have more deterministic behaviour
• Ejection onto disk allows great overflow from RAM
• Memcached replacement
• Re Cassandra: mention the two Netflix articles on their website. I will tweet them afterwards.
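The topology point can be sketched as follows: if the client hashes each key to a fixed partition and holds a map from partitions to nodes, every request goes straight to the owning node in one hop. This is a simplified illustration and not Couchbase's actual hashing or partition map; the partition count, node names and round-robin assignment are assumptions.

```python
import zlib

NUM_PARTITIONS = 8  # illustrative; real clusters use far more partitions


def partition_for(key: str) -> int:
    """Hash a key to a fixed partition; this never changes with cluster size."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    """Assign partitions to nodes round-robin (a stand-in for the server's map)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, partition_map) -> str:
    """One local lookup tells the client exactly which node owns the key."""
    return partition_map[partition_for(key)]


three_nodes = build_partition_map(["node-a", "node-b", "node-c"])
four_nodes = build_partition_map(["node-a", "node-b", "node-c", "node-d"])

owner_before = node_for("betslip::42", three_nodes)
owner_after = node_for("betslip::42", four_nodes)
```

Adding a node only changes the partition map; the key-to-partition hash stays the same, so the client needs nothing more than an updated map after a rebalance.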
Very important for developers. Empowers them to own the entire stack from top to bottom.
We tend to have dedicated stacks for various reasons, including independence, compliance and regulation, but we also have multi-tenancy. We’ve found it very stable: for example, we have had no examples of data loss with Couchbase, which is not something I can say for other solutions (both HBase NameNode SPOF and Cassandra read-only VM with no hinted handoff blew the stack). We’ve had no trouble getting the devs up to speed, or the ops guys who support it. For example, it’s been great that we had some work in Aus where the full env was not yet ready, so the guys spun up some local instances so they could just get to work. For a large organisation like ourselves, having experts we can call on is a great help.
Here are some examples of Couchbase in use. This is for our sportsbook. You can see the spiky nature of the demand, as it is skewed towards events that happen from midday to early evening and especially on the weekend. Total doc data size is around 2.5 GB.
Here’s another bucket for the same application. This one has slightly higher ops per second and a data set of 3 GB.
Smaller data set here: it’s under 1 GB. These are just a sample of our Couchbase usage but it’s fairly representative. If you have any specific questions on our Couchbase instances, please come and find me later.
Session and cross session storage: you can do funky things like share session data across channels, e.g. add a bet to your betslip on your desktop and then access it on your mobile device.
Storage: like user preferences.
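The desktop-to-mobile betslip example might look something like the sketch below, which separates per-session state (keyed by session id) from cross session state (keyed by account id). A plain dictionary stands in for the store, and the key prefixes and field names are hypothetical.

```python
# A plain dict stands in for the key/value store in this sketch.
store = {}


def session_key(session_id):
    """Session state lives and dies with one login on one channel."""
    return f"session::{session_id}"


def betslip_key(account_id):
    """Cross session state is keyed by account, so every channel sees it."""
    return f"betslip::{account_id}"


def add_to_betslip(store, account_id, selection):
    store.setdefault(betslip_key(account_id), []).append(selection)


def betslip_for_session(store, session_id):
    """Resolve the session to its account, then load the shared betslip."""
    account_id = store[session_key(session_id)]["account_id"]
    return list(store.get(betslip_key(account_id), []))


# The same customer is logged in on two channels at once.
store[session_key("desktop-abc")] = {"account_id": 42, "channel": "desktop"}
store[session_key("mobile-xyz")] = {"account_id": 42, "channel": "mobile"}

selection = {"market_id": "1.234", "selection_id": "5678", "side": "B"}
add_to_betslip(store, 42, selection)  # added from the desktop session

mobile_view = betslip_for_session(store, "mobile-xyz")
```

Dropping either session document leaves the betslip intact, which is the difference between session and cross session storage.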
• K/V, Document, Structured Data, Columnar, Graph: each has its own use case, the sweet spot where it works best
• For us it was delivering faster with fewer resources, e.g. DB Dev, DBA
• This should just be putting down on paper your thoughts on the topic, so it’s not a wasted exercise
• Ideally find something with ephemeral data, where going bang does not bring down your site
• For us, Couchbase has shown itself to be the best document NoSQL store for our business
Any questions that have interesting answers I will either tweet the answer or tweet a link. Same goes for links that relate to what I’ve covered today. Thank you very much.