SlideShare une entreprise Scribd logo
1  sur  49
Evolving from
 Relational to
Document Store
           Graham Tackley
Web Platform Team Lead, guardian.co.uk
“It is not the strongest of the species that
survives, nor the most intelligent. It is the one
       that is most adaptable to change.”
Early Period

         circa ’95

The “Lash It Together” era
Early Period (95, the “Lash It Together” era)


 Perl, CGI, apache

  Experimental
Manual processes
Bespoke software

 RDBMS, scripts
  & static files
Mid Period

      circa ’00

The “Vendor CMS” era
Mid Period: 2000s (The “Vendor CMS era”)


 Vignette / AOLserver
 TCL, Apache, Oracle

  Platform for online
       publishing

Initially scales well with
acceleration in delivery
        of features
Mid Period: 2000s (The “Vendor CMS era”)


 Surprise! Vendor’s CMS
doesn’t do what we want!

 Mish-mash in templates:
 HTML, JavaScript, TCL,
      SQL, PL-SQL

No model in app tier, only
in RDBMS schema created
    in Oracle Designer
Mid Period: 2000s (The “Vendor CMS era”)
Mid Period: 2000s (The “Vendor CMS era”)
Mid Period: 2000s (The “Vendor CMS era”)



After a few years, very
   difficult to extend

  Database schema
becomes fixed due to
  dependencies in
     templates
Mid Period: 2000s (The “Vendor CMS era”)




If you can’t change the
        system:
Modern Period

       circa ’05-09

The “J2EE Monolithic” era
Web server       Web server         Web server



  I bring you NEWS!!!
App server      App server          App server




                 Oracle


         CMS                  Data feeds
Web server         Web server         Web server
              Modern java app
  I bring you NEWS!!!
App server      App server            App server
           Spring / Hibernate

                 DDD / TDD

             Strong Oracle in java
                    model

 Database abstracted away with ORM
         CMS                    Data feeds
Problems
Each release involves schema upgrade

Schema upgrade = downtime for journalists
Complexity still increasing:

               300+ tables,
  10,000 lines of hibernate XML config
1,000 domain objects mapped to database
   70,000 lines of domain object code
      Very tight binding to database
ORM not really masking complexity:

 Database has strong influence on domain model: many
domain objects made more complex mapping joins in
                      RDBMS

Complex hibernate features used, interceptors, proxies

               Complex caching strategy
ORM not really masking complexity:

 Database has strong influence on domain model: many
domain objects made more complex mapping joins in
                      RDBMS

Complex hibernate features used, interceptors, proxies

               Complex caching strategy


                     And:
  We still hand code most queries in SQL!
Load becoming an issue

RDBMS difficult to scale
Partial NoSQL

       circa ’09-10

The “Sticking Plaster” era
Introduce yet more caching to patch up load problems




                       Text




                   Introduction of memcached
Decouple applications from database by building APIs

Power APIs using alternative, more scalable technologies

         APIs used to scale out database reads

               Writes still go to RDBMs
Mutualised news!
               Content API
       http://content.guardianapis.com

Read only access to 10 years worth of content
Core
                             Api
   Web servers

                            Solr/API
    App servers
                            Solr/API
Memcached (20Gb)
                            Solr/API

     oracle        Solr
                            Solr/API

                            Solr/API

      CMS                 Cloud, EC2
Mutualised news!
We’ve solved our load problem (for now)

                 but

       Increased our complexity
Mutualised news!
      We now have 3 models!

           RDBMS tables

            Java Objects

             JSON API
Mutualised news!
Mutualised news!
Mutualised news!
Mutualised news! understandable
      JSON API is simple and

Multiple domain concepts expressed in single document

     Can be designed in forwardly extensible way
Mutualised news! understandable
      JSON API is simple and

Multiple domain concepts expressed in single document

     Can be designed in forwardly extensible way



What if the JSON API was our primary model?
Full NoSQL

    in development

The “It’s the future!” era
MongoDB

 Mutualised news! database
     Document oriented
       Stores parsed JSON documents

        Can express complex queries

      Can be flexible about consistency

Malleable schema: can easily change at runtime

    Can work at both large & small scales
Flexible Schema


Mutualised news!
Flexible Schema


Mutualised news!
Flexible Schema


Mutualised news!
Can easily represent different classes of tag as
                 documents

    Both documents can be inserted into
             same collection

    Far simpler than equivalent hibernate
       mapped subclass configuration
Flexible Schema

        Simple to query:
Mutualised news!
Flexible Schema

              Simple to query:
Mutualised news!
            Query operators:
  $ne, $nin, $all, $exists, $gt, $lt, $gte ...
Modifying the schema


Mutualised news!
Modifying the schema


Mutualised news!
Modifying the schema


Mutualised news!
The first project: Identity

Current login/registration system still in TCL/PL-SQL

          3M+ users in relational database

          Very complex schema + PL-SQL

               New system required

      Can we migrate from Oracle to NoSql?
Build API that can support both backends


    Registration app         guardian.co.uk




                       API             This bit is hard!


                                       Oracle
Build API that can support both backends


    Registration app         guardian.co.uk




                       API



         MongoDB                       Oracle
Migrate using API & decommision


 Registration app         guardian.co.uk




                    API



      MongoDB
Add new stuff!


    Registration app           guardian.co.uk




                         API



MongoDB                Solr?                    Redis?
Summary: Evolving to Document Store

       Relational databases are dead

   Ruthlessly reject relational complexity

     Think hard about your json model

            Plan your evolution


       http://content.guardianapis.com
 graham.tackley@guardian.co.uk - @tackers

Contenu connexe

Tendances

AWS Update | London - Overview of New Releases
AWS Update | London - Overview of New ReleasesAWS Update | London - Overview of New Releases
AWS Update | London - Overview of New ReleasesAmazon Web Services
 
Journey Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesJourney Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesAmazon Web Services
 
Design, Deploy, and Optimize Microsoft SQL Server on AWS
Design, Deploy, and Optimize Microsoft SQL Server on AWSDesign, Deploy, and Optimize Microsoft SQL Server on AWS
Design, Deploy, and Optimize Microsoft SQL Server on AWSAmazon Web Services
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
 
Scalable Database Options on AWS
Scalable Database Options on AWSScalable Database Options on AWS
Scalable Database Options on AWSAmazon Web Services
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
AWS Summit 2011: Designing Fault Tolerant Applicatons
AWS Summit 2011: Designing Fault Tolerant ApplicatonsAWS Summit 2011: Designing Fault Tolerant Applicatons
AWS Summit 2011: Designing Fault Tolerant ApplicatonsAmazon Web Services
 
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...Michael Elder
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...Amazon Web Services
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Distributed Design and Architecture of Cloud Foundry
Distributed Design and Architecture of Cloud FoundryDistributed Design and Architecture of Cloud Foundry
Distributed Design and Architecture of Cloud FoundryDerek Collison
 
Aws platform overview
Aws platform overviewAws platform overview
Aws platform overviewVinay Yelluri
 
Getting Started with Amazon EC2 Container Service
Getting Started with Amazon EC2 Container ServiceGetting Started with Amazon EC2 Container Service
Getting Started with Amazon EC2 Container ServiceAmazon Web Services
 
Transforming Software Development
Transforming Software DevelopmentTransforming Software Development
Transforming Software DevelopmentAmazon Web Services
 
Aplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAmazon Web Services
 
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Amazon Web Services
 
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS SummitDesign, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS SummitAmazon Web Services
 

Tendances (20)

AWS Update | London - Overview of New Releases
AWS Update | London - Overview of New ReleasesAWS Update | London - Overview of New Releases
AWS Update | London - Overview of New Releases
 
Journey Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesJourney Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application Services
 
AWS for Startups
AWS for StartupsAWS for Startups
AWS for Startups
 
Design, Deploy, and Optimize Microsoft SQL Server on AWS
Design, Deploy, and Optimize Microsoft SQL Server on AWSDesign, Deploy, and Optimize Microsoft SQL Server on AWS
Design, Deploy, and Optimize Microsoft SQL Server on AWS
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
Scalable Database Options on AWS
Scalable Database Options on AWSScalable Database Options on AWS
Scalable Database Options on AWS
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
AWS Summit 2011: Designing Fault Tolerant Applicatons
AWS Summit 2011: Designing Fault Tolerant ApplicatonsAWS Summit 2011: Designing Fault Tolerant Applicatons
AWS Summit 2011: Designing Fault Tolerant Applicatons
 
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
Streaming Movies brings you Streamlined Applications -- How Adopting Netflix ...
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...
Design, Deploy, and Optimize Microsoft SQL Server on AWS - WIN306 - re:Invent...
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Distributed Design and Architecture of Cloud Foundry
Distributed Design and Architecture of Cloud FoundryDistributed Design and Architecture of Cloud Foundry
Distributed Design and Architecture of Cloud Foundry
 
Aws platform overview
Aws platform overviewAws platform overview
Aws platform overview
 
Getting Started with Amazon EC2 Container Service
Getting Started with Amazon EC2 Container ServiceGetting Started with Amazon EC2 Container Service
Getting Started with Amazon EC2 Container Service
 
Transforming Software Development
Transforming Software DevelopmentTransforming Software Development
Transforming Software Development
 
Aplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuarios
 
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
Building Fault Tolerant Applications in the cloud - AWS Summit 2012 - NYC
 
AWSome Day Brasil - Junho 2020
AWSome Day Brasil - Junho 2020AWSome Day Brasil - Junho 2020
AWSome Day Brasil - Junho 2020
 
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS SummitDesign, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
 

En vedette

Social Insights on the UK Newspaper Industry
Social Insights on the UK Newspaper IndustrySocial Insights on the UK Newspaper Industry
Social Insights on the UK Newspaper IndustryBrandwatch
 
Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.codefluency
 
NoSql presentation
NoSql presentationNoSql presentation
NoSql presentationMat Wall
 
Why we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukWhy we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukGraham Tackley
 
Are Content Strategists the Next Corporate Rock Stars?
Are Content Strategists the Next Corporate Rock Stars?Are Content Strategists the Next Corporate Rock Stars?
Are Content Strategists the Next Corporate Rock Stars?Mark Fidelman
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 

En vedette (9)

Redis at the guardian
Redis at the guardianRedis at the guardian
Redis at the guardian
 
Neuropsicologia
NeuropsicologiaNeuropsicologia
Neuropsicologia
 
The New Guardian
The New GuardianThe New Guardian
The New Guardian
 
Social Insights on the UK Newspaper Industry
Social Insights on the UK Newspaper IndustrySocial Insights on the UK Newspaper Industry
Social Insights on the UK Newspaper Industry
 
Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.Riak: A friendly key/value store for the web.
Riak: A friendly key/value store for the web.
 
NoSql presentation
NoSql presentationNoSql presentation
NoSql presentation
 
Why we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.ukWhy we chose mongodb for guardian.co.uk
Why we chose mongodb for guardian.co.uk
 
Are Content Strategists the Next Corporate Rock Stars?
Are Content Strategists the Next Corporate Rock Stars?Are Content Strategists the Next Corporate Rock Stars?
Are Content Strategists the Next Corporate Rock Stars?
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 

Similaire à Moving from Relational to Document Store

Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukQ con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukRoger Xia
 
No SQL at The Guardian
No SQL at The GuardianNo SQL at The Guardian
No SQL at The GuardianMat Wall
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewPierre Baillet
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinAmazon Web Services
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinIan Massingham
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012Amazon Web Services
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerMichael Rys
 
Microservice-based software architecture
Microservice-based software architectureMicroservice-based software architecture
Microservice-based software architectureArangoDB Database
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraJeff Bollinger
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Scale, baby, scale
Scale, baby, scaleScale, baby, scale
Scale, baby, scaleJulien SIMON
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsSteven Francia
 
Viridians on Rails
Viridians on RailsViridians on Rails
Viridians on RailsViridians
 
What is Amazon Web Services & How to Start to deploy your apps ?
What is Amazon Web Services & How to Start to deploy your apps ?What is Amazon Web Services & How to Start to deploy your apps ?
What is Amazon Web Services & How to Start to deploy your apps ?Sébastien ☁ Stormacq
 
Modern Architectures with Spring and JavaScript
Modern Architectures with Spring and JavaScriptModern Architectures with Spring and JavaScript
Modern Architectures with Spring and JavaScriptmartinlippert
 
NWCloud Cloud Track - Best Practices for Architecting in the Cloud
NWCloud Cloud Track - Best Practices for Architecting in the CloudNWCloud Cloud Track - Best Practices for Architecting in the Cloud
NWCloud Cloud Track - Best Practices for Architecting in the Cloudnwcloud
 
DynamoDB Gluecon 2012
DynamoDB Gluecon 2012DynamoDB Gluecon 2012
DynamoDB Gluecon 2012Appirio
 

Similaire à Moving from Relational to Document Store (20)

Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukQ con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
 
No SQL at The Guardian
No SQL at The GuardianNo SQL at The Guardian
No SQL at The Guardian
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 
Microservice-based software architecture
Microservice-based software architectureMicroservice-based software architecture
Microservice-based software architecture
 
Understanding Database Options
Understanding Database OptionsUnderstanding Database Options
Understanding Database Options
 
Minnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with CassandraMinnebar 2013 - Scaling with Cassandra
Minnebar 2013 - Scaling with Cassandra
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Scale, baby, scale
Scale, baby, scaleScale, baby, scale
Scale, baby, scale
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS Applications
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Viridians on Rails
Viridians on RailsViridians on Rails
Viridians on Rails
 
What is Amazon Web Services & How to Start to deploy your apps ?
What is Amazon Web Services & How to Start to deploy your apps ?What is Amazon Web Services & How to Start to deploy your apps ?
What is Amazon Web Services & How to Start to deploy your apps ?
 
Modern Architectures with Spring and JavaScript
Modern Architectures with Spring and JavaScriptModern Architectures with Spring and JavaScript
Modern Architectures with Spring and JavaScript
 
NWCloud Cloud Track - Best Practices for Architecting in the Cloud
NWCloud Cloud Track - Best Practices for Architecting in the CloudNWCloud Cloud Track - Best Practices for Architecting in the Cloud
NWCloud Cloud Track - Best Practices for Architecting in the Cloud
 
DynamoDB Gluecon 2012
DynamoDB Gluecon 2012DynamoDB Gluecon 2012
DynamoDB Gluecon 2012
 

Plus de Graham Tackley

Newsgeist 2017 - Ignite Talk
Newsgeist 2017 - Ignite TalkNewsgeist 2017 - Ignite Talk
Newsgeist 2017 - Ignite TalkGraham Tackley
 
How elasticsearch powers the Guardian's newsroom
How elasticsearch powers the Guardian's newsroomHow elasticsearch powers the Guardian's newsroom
How elasticsearch powers the Guardian's newsroomGraham Tackley
 
Scala: simplifying development at guardian.co.uk
Scala: simplifying development at guardian.co.ukScala: simplifying development at guardian.co.uk
Scala: simplifying development at guardian.co.ukGraham Tackley
 
Java to Scala: Why & How
Java to Scala: Why & HowJava to Scala: Why & How
Java to Scala: Why & HowGraham Tackley
 
LSUG: How we (mostly) moved from Java to Scala
LSUG: How we (mostly) moved from Java to ScalaLSUG: How we (mostly) moved from Java to Scala
LSUG: How we (mostly) moved from Java to ScalaGraham Tackley
 

Plus de Graham Tackley (6)

Newsgeist 2017 - Ignite Talk
Newsgeist 2017 - Ignite TalkNewsgeist 2017 - Ignite Talk
Newsgeist 2017 - Ignite Talk
 
How elasticsearch powers the Guardian's newsroom
How elasticsearch powers the Guardian's newsroomHow elasticsearch powers the Guardian's newsroom
How elasticsearch powers the Guardian's newsroom
 
Scala: simplifying development at guardian.co.uk
Scala: simplifying development at guardian.co.ukScala: simplifying development at guardian.co.uk
Scala: simplifying development at guardian.co.uk
 
Java to Scala: Why & How
Java to Scala: Why & HowJava to Scala: Why & How
Java to Scala: Why & How
 
LSUG: How we (mostly) moved from Java to Scala
LSUG: How we (mostly) moved from Java to ScalaLSUG: How we (mostly) moved from Java to Scala
LSUG: How we (mostly) moved from Java to Scala
 
Java to scala
Java to scalaJava to scala
Java to scala
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Moving from Relational to Document Store

  • 1. Evolving from Relational to Document Store Graham Tackley Web Platform Team Lead, guardian.co.uk
  • 2. “It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.”
  • 3. Early Period circa ’95 The “Lash It Together” era
  • 4. Early Period (95, the “Lash It Together” era) Perl, CGI, apache Experimental Manual processes Bespoke software RDBMS, scripts & static files
  • 5. Mid Period circa ’00 The “Vendor CMS” era
  • 6. Mid Period: 2000s (The “Vendor CMS era”) Vignette / AOLserver TCL, Apache, Oracle Platform for online publishing Initially scales well with acceleration in delivery of features
  • 7. Mid Period: 2000s (The “Vendor CMS era”) Surprise! Vendor’s CMS doesn’t do what we want! Mish-mash in templates: HTML, JavaScript, TCL, SQL, PL-SQL No model in app tier, only in RDBMS schema created in Oracle Designer
  • 8. Mid Period: 2000s (The “Vendor CMS era”)
  • 9. Mid Period: 2000s (The “Vendor CMS era”)
  • 10. Mid Period: 2000s (The “Vendor CMS era”) After a few years, very difficult to extend Database schema becomes fixed due to dependencies in templates
  • 11. Mid Period: 2000s (The “Vendor CMS era”) If you can’t change the system:
  • 12. Modern Period circa ’05-09 The “J2EE Monolithic” era
  • 13.
  • 14. Web server Web server Web server I bring you NEWS!!! App server App server App server Oracle CMS Data feeds
  • 15. Web server Web server Web server Modern java app I bring you NEWS!!! App server App server App server Spring / Hibernate DDD / TDD Strong Oracle in java model Database abstracted away with ORM CMS Data feeds
  • 17. Each release involves schema upgrade Schema upgrade = downtime for journalists
  • 18. Complexity still increasing: 300+ tables, 10,000 lines of hibernate XML config 1,000 domain objects mapped to database 70,000 lines of domain object code Very tight binding to database
  • 19. ORM not really masking complexity: Database has strong influence on domain model: many domain objects made more complex mapping joins in RDBMS Complex hibernate features used, interceptors, proxies Complex caching strategy
  • 20. ORM not really masking complexity: Database has strong influence on domain model: many domain objects made more complex mapping joins in RDBMS Complex hibernate features used, interceptors, proxies Complex caching strategy And: We still hand code most queries in SQL!
  • 21. Load becoming an issue RDBMS difficult to scale
  • 22. Partial NoSQL circa ’09-10 The “Sticking Plaster” era
  • 23. Introduce yet more caching to patch up load problems Text Introduction of memcached
  • 24. Decouple applications from database by building APIs Power APIs using alternative, more scalable technologies APIs used to scale out database reads Writes still go to RDBMs
  • 25. Mutualised news! Content API http://content.guardianapis.com Read only access to 10 years worth of content
  • 26. Core Api Web servers Solr/API App servers Solr/API Memcached (20Gb) Solr/API oracle Solr Solr/API Solr/API CMS Cloud, EC2
  • 27. Mutualised news! We’ve solved our load problem (for now) but Increased our complexity
  • 28. Mutualised news! We now have 3 models! RDBMS tables Java Objects JSON API
  • 32. Mutualised news! understandable JSON API is simple and Multiple domain concepts expressed in single document Can be designed in forwardly extensible way
  • 33. Mutualised news! understandable JSON API is simple and Multiple domain concepts expressed in single document Can be designed in forwardly extensible way What if the JSON API was our primary model?
  • 34. Full NoSQL in development The “It’s the future!” era
  • 35. MongoDB Mutualised news! database Document oriented Stores parsed JSON documents Can express complex queries Can be flexible about consistency Malleable schema: can easily change at runtime Can work at both large & small scales
  • 38. Flexible Schema Mutualised news! Can easily represent different classes of tag as documents Both documents can be inserted into same collection Far simpler than equivalent hibernate mapped subclass configuration
  • 39. Flexible Schema Simple to query: Mutualised news!
  • 40. Flexible Schema Simple to query: Mutualised news! Query operators: $ne, $nin, $all, $exists, $gt, $lt, $gte ...
  • 44. The first project: Identity Current login/registration system still in TCL/PL-SQL 3M+ users in relational database Very complex schema + PL-SQL New system required Can we migrate from Oracle to NoSql?
  • 45. Build API that can support both backends Registration app guardian.co.uk API This bit is hard! Oracle
  • 46. Build API that can support both backends Registration app guardian.co.uk API MongoDB Oracle
  • 47. Migrate using API & decommision Registration app guardian.co.uk API MongoDB
  • 48. Add new stuff! Registration app guardian.co.uk API MongoDB Solr? Redis?
  • 49. Summary: Evolving to Document Store Relational databases are dead Ruthlessly reject relational complexity Think hard about your json model Plan your evolution http://content.guardianapis.com graham.tackley@guardian.co.uk - @tackers

Notes de l'éditeur

  1. g.co.uk: leading liberal news website40m uniques p/m, half from US\n
  2. publishing newspaper for nearly 200 years; website for 15 yrs\nadapting to change is critical - start with history of fighting relational dbs & attempts to tame; then how we’re moving to mongo\n
  3. \n
  4. Ancient system\nScripts & database\nBespoke software, changes difficult\n
  5. \n
  6. Site oriented to broadcast publishing model\nCMS helps. No longer lashing things together \n\n
  7. Template & rdbms oriented design, and TCL = no real domain model\nHeavyweight schema change process\n\n
  8. This is from a TEMPLATE!\nscroll down to reveal HTML\n(about 10,000 of these)\n
  9. bottom of template\nabout 10,000 of these!\n\n
  10. Can’t change schema easily, to many dependencies in templates\n\n
  11. dodo\ne.g. at start just articles; now video, interactives, audio, galleries, live blogs...\n
  12. \n
  13. “Web 2.0”, community, RSS, discoverability, tagging.\n\n
  14. Very standard 3 tier application\nScale application servers on load\nCaching local to application server at first. Memcached added later\nRead heavy, broadcast model. Almost no writes compared to reads\n\n
  15. Very standard 3 tier application\nScale application servers on load\nCaching local to application server at first. Memcached added later (in next era!)\nRead heavy, broadcast model. Almost no writes compared to reads\n\n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. Talk: beginning to use NoSql in real organisation. Change in journalism affecting platform\n\n
  22. My team spent 2 yrs making relational db inconsitant:\nmuch better to deliever 30 sec old data in 200ms, than up to data data in 5 secs\nWe don’t have a scale problem with current application & model\n\n\n
  23. Talk: beginning to use NoSql in real organisation. Change in journalism affecting platform\n\n
  24. Content api - both for external people (engaging with the internet) and for us. \nMost new stuff from content api, e.g. m.guardian, iPhone, iPad etc.\n
  25. Introduction of memcached & Solr\nSolr hosted in the cloud (EC2)\n
  26. “Out” service\n
  27. \n
  28. Most recent content\n\n
  29. Most recent content with tags, fields\n(this is pretty well how we went live with the content api)\n\n\n
  30. Single article with media\nExtensible schema, eg: adding geotagging to images. Hard in DB, easy in JSON\nThis document represents at least 30 database tables!\n\n\n
  31. \n\n
  32. \n
  33. Not a million miles from a RDBMS\nSimpler\n
  34. Experiments with mongodb & content API\nGuardian site categorises content with tags\nTone tag represents “editorial tone” of content\n\n
  35. Different tag types can have different schemas\nKeywords (subjects) are in a section, music / madonna\n\n
  36. \n\n
  37. \n\n
  38. \n\n
  39. Suppose we want to add external musicbrainz ID to tag?\nAn update can modify the schema at runtime. No downtime.\n\n
  40. Where clause: id\n$push atomically ads external reference onto tag\n\n
  41. Resulting document now looks like this\nWe haven’t had to change the code yet - old code still works, no downtime\n\n
  42. Migration project, not green fields\n(SKIP IF LESS THAN 10 MINS TO GO!)\n\n
  43. REST API\nMapped initially just to oracle, then (next slide) to both datastores\nIntegration tested\n\n
  44. API supports both data stores - lazy migration\nCurrently writing this in Scala + Casbah - so far 60-70% less code for mongo version\n\n
  45. Then batch migration and bye bye oracle\n
  46. In the future?\n\n
  47. 1. we will never use a relational database for a new project\n2. kill: orm; memcached etc\n3. spend the time thinking about how your problem space should be defined\n4. things will change - plan for it!\n