MongoDB at the energy frontier
    Valentin Kuznetsov, Cornell University

    MongoNYC, May, 2012


Monday, May 21, 12                           1
Outline


    ✤   CMS :: LHC :: CERN

    ✤   Data Aggregation System and MongoDB

    ✤   Experience

    ✤   Summary




CMS :: LHC :: CERN




         The Large Hadron Collider is located at CERN, Geneva, Switzerland
       CMS is one of the 4 experiments that probe our knowledge of particle
                   interactions and search for new physics

CMS :: LHC :: CERN




                     Compact Muon Solenoid (CMS)

CMS :: LHC :: CERN




                     Typical proton-proton collision in the CMS detector
CMS :: LHC :: CERN

    ✤   40 countries, 172 institutions, more than 3000 scientists

    ✤   The CMS experiment produces a few PB of real data each year, and we
        collect ~TB of metadata

    ✤   CMS relies on GRID infrastructure for data processing and uses 100+
        computing centers worldwide

    ✤   CMS software consists of 4M lines of C++ (framework), 2M lines of
        Python (data management), plus Java, Perl, etc.

         ✤    ORACLE, MySQL, SQLite, NoSQL

Dilemma

    How can I find my data?

    [Figure: the CMS data services a user has to choose from: GenDB, LumiDB,
     Data Quality, Phedex, DBS, PSetDB, SiteDB, Overview, RunDB]

Motivations
  ✤    Users want to query different data services without knowing
       about their existence

  ✤    Users want to combine information from different data services

  ✤    Some users may have domain knowledge, but they need to query X
       services, using Y interfaces and dealing with Z data formats, to
       get their data

  [Figure: Data Aggregation System. Data services and the keys they expose:
     RunSummary:       run, trigger, detector, ...
     DataQuality:      trigger, ecal, hcal, ...
     LumiDB:           lumi, luminosity, hltpath
     Phedex:           block, file, block.replica, file.replica, se, node, ...
     DBS:              run, file, block, site, config, tier, dataset, lumi,
                       parameters, ...
     GenDB:            generator, xsection, process, decay, ...
     SiteDB:           site, admin, site.status, ..
     Overview:         country, node, region, ..
     Parameter Set DB: CMSSW parameters
     Service A ... E:  param1, param2, ..
   Links between services are labeled with shared keys: run, lumi, block,
   site, MC id, pset]

Implementation idea

                     ✤   When we talk we may use different
                         languages (English, French, etc.) or
                         different conventions (pounds vs kg)

                     ✤   In order to establish communication
                         we use a translation, a dictionary, or a
                         thesaurus




Implementation idea




Pros
    ✤   Separate data management from discovery service

    ✤   Data are safe and secure

    ✤   Pluggable architecture (new translations)

    ✤   Users never have to deal with interfaces, naming and schema conflicts,
        data formats, or security policies

    ✤   Information is aggregated in real time across distributed services

    ✤   Data consistency checks for free

    ✤   DB and API changes are transparent to end users
Cons
    ✤   DAS does not own the data

         ✤    lots of writes/reads/translations

    ✤   Data services are the real bottleneck

         ✤    nothing is guaranteed: a service can go down, we have no control
              over its performance, the requested data can be very large, etc.

         ✤    cache often and preemptively


                              MongoDB to the rescue!

Data Aggregation System
    [Figure: DAS architecture. Labeled components:
       DAS robot:        fetch popular queries/APIs, invoke the same
                         API(params), update cache periodically
       DAS mapping:      map data-service output to DAS records
       DAS cache, DAS merge
       DAS Analytics:    record query, API call to Analytics
       DAS core:         parser, mapping, aggregator, plugins
       data-services:    runsum, lumidb, phedex, sitedb, dbs
       DAS Cache server: DAS core, CPU core, RESTful interface
       DAS web server:   UI]

Mapping DB
    ✤   Holds translations between user keywords and data-service APIs,
        resolves naming conflicts, etc.

         ✤    a city=Ithaca query translates into a Google API call

              {'das2api': [{'api_param': 'q', 'das_key': 'city.name', 'pattern': ''}],
               'daskeys': [{'key': 'city', 'map': 'city.name', 'pattern': ''}],
               'expire': 3600,
               'format': 'JSON',
               'params': {'output': 'json', 'q': 'required'},
               'system': 'google_maps',
               'url': 'http://maps.google.com/maps/geo',
               'urn': 'google_geo_maps'}
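
As a rough illustration of how such a mapping record could drive the translation, the sketch below turns the user query city=Ithaca into the URL of the data-service call. Only the record itself comes from the slide. The helper names `to_das_query` and `build_api_call` are hypothetical and are not taken from the DAS code base, and the 'pattern' fields are ignored here.

```python
from urllib.parse import urlencode

# Mapping record as shown on the slide.
MAPPING = {
    'das2api': [{'api_param': 'q', 'das_key': 'city.name', 'pattern': ''}],
    'daskeys': [{'key': 'city', 'map': 'city.name', 'pattern': ''}],
    'expire': 3600,
    'format': 'JSON',
    'params': {'output': 'json', 'q': 'required'},
    'system': 'google_maps',
    'url': 'http://maps.google.com/maps/geo',
    'urn': 'google_geo_maps',
}


def to_das_query(mapping, user_key, value):
    """Map a user keyword (e.g. 'city') to its DAS key (e.g. 'city.name')."""
    for entry in mapping['daskeys']:
        if entry['key'] == user_key:
            return {entry['map']: value}
    raise KeyError('no mapping for user key %r' % user_key)


def build_api_call(mapping, das_query):
    """Build the data-service URL for a DAS query (hypothetical helper)."""
    params = dict(mapping['params'])
    filled = set()
    for rule in mapping['das2api']:
        if rule['das_key'] in das_query:
            params[rule['api_param']] = das_query[rule['das_key']]
            filled.add(rule['api_param'])
    # Parameters marked 'required' in the record must come from the query.
    missing = [name for name, default in mapping['params'].items()
               if default == 'required' and name not in filled]
    if missing:
        raise ValueError('missing required API parameters: %s' % missing)
    return mapping['url'] + '?' + urlencode(sorted(params.items()))


url = build_api_call(MAPPING, to_das_query(MAPPING, 'city', 'Ithaca'))
print(url)  # http://maps.google.com/maps/geo?output=json&q=Ithaca
```

Keeping the translation in data rather than in code is what makes the architecture pluggable: supporting a new service would mostly mean adding another record of this shape.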
Analytics DB

    ✤   Keeps track of user queries and data-service API calls

        {'api': {'params': {'q': 'Ithaca', 'output': 'json'}, 'name': 'google_geo_maps'},
         'qhash': '7272bdeac45174823d3a4ea240c124ec',
         'system': 'google_maps',
         'counter': 5}

    ✤   Used by DAS analytics daemons to pre-fetch “hot” queries

               ✤     ValueHotSpot: looks up data by popular values

               ✤     KeyHotSpot: looks up data by popular keys

               ✤     QueryMaintainer: keeps a given query always in cache
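
A minimal sketch of this bookkeeping, using a plain dict as a stand-in for the analytics collection. The record layout follows the slide. The function names are hypothetical, and using md5 over a canonical JSON form for `qhash` is an assumption (the slide only shows a 32-character hex string). Against a real MongoDB collection, the counter update would more likely be an upsert with `$inc`.

```python
import hashlib
import json


def query_hash(system, api_name, params):
    """Hash of a canonical JSON form of the call (md5 is an assumption)."""
    canonical = json.dumps(
        {'system': system, 'name': api_name, 'params': params},
        sort_keys=True)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()


def record_call(analytics, system, api_name, params):
    """Insert the record on first use, otherwise bump its counter."""
    qhash = query_hash(system, api_name, params)
    record = analytics.setdefault(qhash, {
        'api': {'params': dict(params), 'name': api_name},
        'qhash': qhash,
        'system': system,
        'counter': 0,
    })
    record['counter'] += 1
    return record


def hot_queries(analytics, limit=10):
    """Most frequently used calls first, as a pre-fetch daemon might want."""
    ranked = sorted(analytics.values(), key=lambda rec: rec['counter'],
                    reverse=True)
    return ranked[:limit]


analytics = {}  # stand-in for the analytics collection
for city in ['Ithaca', 'Ithaca', 'Geneva', 'Ithaca']:
    record_call(analytics, 'google_maps', 'google_geo_maps',
                {'q': city, 'output': 'json'})

top = hot_queries(analytics, limit=1)[0]
print(top['api']['params']['q'], top['counter'])  # Ithaca 3
```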


Caching DB
    ✤   Data returned by data-service providers are translated into JSON
        and stored in the cache collection

         ✤    naming translations are performed at this level

    ✤   Records from the cache collection are grouped by a common key, e.g.
        city.name, and merged into the merge collection
        cache collection:
            {'city': {'name': 'Ithaca', 'lat':42, 'lng':-76}}
            {'city': {'name': 'Ithaca', 'zip':14850}}

        merge collection:
            {'city': {'name': 'Ithaca', 'lat':42, 'lng':-76, 'zip':14850}}
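
The merge step on a common key can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the DAS implementation: the function names are hypothetical, lists stand in for the two collections, and on conflicting values the later record simply wins.

```python
import copy


def get_dotted(record, dotted_key):
    """Return record['a']['b'] for dotted_key 'a.b'."""
    value = record
    for part in dotted_key.split('.'):
        value = value[part]
    return value


def deep_merge(target, source):
    """Recursively copy source into target; later values win on conflict."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target


def merge_records(cache_records, common_key):
    """Group cache records by common_key and merge each group into one."""
    merged = {}
    for record in cache_records:
        group = merged.setdefault(get_dotted(record, common_key), {})
        deep_merge(group, record)
    return list(merged.values())


cache = [
    {'city': {'name': 'Ithaca', 'lat': 42, 'lng': -76}},
    {'city': {'name': 'Ithaca', 'zip': 14850}},
]
print(merge_records(cache, 'city.name'))
# [{'city': {'name': 'Ithaca', 'lat': 42, 'lng': -76, 'zip': 14850}}]
```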

DAS workflow

    ✤   Query parser

    ✤   Query DAS merge collection

         ✤    Query DAS cache collection

               ✤     invoke call to data service

               ✤     write to analytics

    ✤   Aggregate results

    ✤   Represent results on web UI or via command line interface

    [Figure: flowchart. query -> DAS core parser (with DAS logging) -> query
     DAS merge: if yes, go to Aggregator; if no, query DAS cache: if yes,
     go to DAS merge; if no, query data-services using DAS Mapping, fill
     DAS cache and DAS merge, and write to DAS Analytics. Aggregator ->
     results -> Web UI]
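
The lookup order in this workflow can be sketched as a small tiered cache. Plain dicts stand in for the merge, cache and analytics collections. `das_lookup` and `call_services` are hypothetical names, and the merge step is reduced to a copy, so this only illustrates the control flow, not the actual DAS code.

```python
def das_lookup(qhash, merge, cache, analytics, call_services):
    """Return merged records for a query hash, filling the caches on a miss.

    merge, cache and analytics are dicts standing in for MongoDB
    collections; call_services is a hypothetical callable that contacts
    the data services and returns a list of records.
    """
    if qhash in merge:                      # hit in the merge collection
        return merge[qhash]
    if qhash not in cache:                  # miss in the cache collection
        cache[qhash] = call_services(qhash)             # invoke data services
        analytics[qhash] = analytics.get(qhash, 0) + 1  # write to analytics
    merge[qhash] = list(cache[qhash])       # merge step, simplified to a copy
    return merge[qhash]


service_calls = []


def fake_services(qhash):
    """Pretend data service that remembers how often it was called."""
    service_calls.append(qhash)
    return [{'city': {'name': 'Ithaca'}}]


merge, cache, analytics = {}, {}, {}
first = das_lookup('q1', merge, cache, analytics, fake_services)
second = das_lookup('q1', merge, cache, analytics, fake_services)
print(len(service_calls))  # 1, the second lookup is served from the merge dict
```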
Example




DAS QL & MongoDB QL

    ✤   The DAS Query Language is built on top of MongoDB QL; it expresses
        MongoDB QL in human-readable form

    ✤   UI level:

        block dataset=/a/b/c | grep block.size | count(block.size)

    ✤   DB level:

        col.find(spec={'dataset.name':'/a/b/c'}, fields=['block.size']).count()


    ✤   We enrich the QL with additional filters (grep, sort, unique) and
        implement a set of coroutines for aggregator functions
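
To make the UI-to-DB translation concrete, here is a toy parser for the query shown above. It is a sketch under assumptions: the real DAS QL grammar is richer, the `DAS_KEY_MAP` table is a hard-coded stand-in for the Mapping DB, and `parse_das_query` is a hypothetical name.

```python
import re

# Stand-in for the Mapping DB: user keyword -> DAS key (assumption).
DAS_KEY_MAP = {'dataset': 'dataset.name'}


def parse_das_query(text):
    """Split a DAS QL string into select keys, a MongoDB-style spec,
    projected fields (grep) and aggregator calls such as count(...)."""
    head, *pipes = [part.strip() for part in text.split('|')]
    select_keys, spec = [], {}
    for token in head.split():
        if '=' in token:
            key, value = token.split('=', 1)
            spec[DAS_KEY_MAP.get(key, key)] = value
        else:
            select_keys.append(token)
    fields, aggregators = [], []
    for pipe in pipes:
        call = re.fullmatch(r'(\w+)\((.+)\)', pipe)
        if call:
            aggregators.append((call.group(1), call.group(2)))
        elif pipe.startswith('grep '):
            fields.extend(name.strip() for name in pipe[5:].split(','))
        else:
            raise ValueError('unsupported filter: %r' % pipe)
    return {'select': select_keys, 'spec': spec,
            'fields': fields, 'aggregators': aggregators}


parsed = parse_das_query(
    'block dataset=/a/b/c | grep block.size | count(block.size)')
print(parsed['spec'])         # {'dataset.name': '/a/b/c'}
print(parsed['fields'])       # ['block.size']
print(parsed['aggregators'])  # [('count', 'block.size')]
```

The `spec` and `fields` values correspond to the arguments of the `col.find(...)` call at the DB level, while the aggregator list would be applied to the returned records.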

DAS & MongoDB

    ✤   DAS works with 15 distributed data-services

         ✤    their sizes vary; on average O(100 GB)

    ✤   DAS uses 40 MongoDB collections

         ✤    caching, mapping, analytics, logging (normal, capped, and GridFS
              collections)

    ✤   DAS inserts/deletes O(1M) records on a daily basis

    ✤   We operate on a single 64-bit Linux node with 8 CPUs, 24 GB of RAM
        and 1 TB of disk space; sharding was tested but is not enabled

MongoDB benefits

    ✤   Fast I/O and a schema-less database are ideal for a cache
        implementation

         ✤    you're not limited to a key:value approach

    ✤   A flexible query language makes it possible to build a
        domain-specific QL

         ✤    stay on par with SQL

    ✤   No administrative costs for the DB

         ✤    easy to install and maintain


MongoDB issues (ver 2.0.X)
    ✤   We were unable to store DAS queries directly in the analytics
        collection because of the dot constraint on field names,
        e.g. {'a.b':1}

         ✤    queries <=> storage format {'key':'a.b', 'value':1}

    ✤   SCons is not suitable for a fully controlled build environment

         ✤    it removes $PATH/$LD_LIBRARY_PATH for compiler commands and
              forces the use of -L/lib64; as a result, we used wrappers

    ✤   Uncompressed field names and limitations with pagination/aggregation

         ✤    should be addressed by the new MongoDB aggregation framework
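
The workaround for the dot constraint can be sketched as a pair of conversion helpers. The slide only shows the storage format for a single key. Storing a multi-key query as a list of such pairs, and the names `encode_query` and `decode_query`, are assumptions made for this illustration.

```python
def encode_query(query):
    """{'a.b': 1} -> [{'key': 'a.b', 'value': 1}], so that no field name
    in the stored document contains a dot."""
    return [{'key': key, 'value': value}
            for key, value in sorted(query.items())]


def decode_query(pairs):
    """Inverse of encode_query: rebuild the original query dict."""
    return {pair['key']: pair['value'] for pair in pairs}


stored = encode_query({'a.b': 1})
print(stored)                # [{'key': 'a.b', 'value': 1}]
print(decode_query(stored))  # {'a.b': 1}
```

Only top-level keys are handled here; a query whose values are themselves dicts with dotted keys would need the same treatment applied recursively.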
Tradeoffs

    ✤   Query collisions: DAS does not own the data and there are no
        transactions, so we rely on a query status and update it accordingly

    ✤   Index choice: initially one per select key, later one per query hash

    ✤   Storage size: we trade off storage against data flexibility and
        naming conventions

    ✤   Speed: we trade off simple data access against a conglomerate of
        restrictions (naming, security policies, interfaces, etc.), but we
        are tuning our data-service APIs based on query patterns


Results

    ✤   The service has been in production for over one year

    ✤   Users are authenticated via GRID certificates, and DAS uses a proxy
        server to pass credentials to back-end services

    ✤   A single query request yields a few thousand records and is resolved
        within a few seconds

    ✤   The pluggable architecture allows you to query your own service(s)

         ✤    unit tests are done against public data-services, e.g. Google, IP
              look-up, etc.

NoSQL @ CERN

    ✤   MongoDB is used by other experiments at CERN

         ✤    logging, monitoring, data analytics

    ✤   MongoDB is not the only NoSQL solution used at CERN

         ✤    One size does not fit all

         ✤    CouchDB, Cassandra, HBase, etc.

    ✤   There is an ongoing discussion between the experiments and CERN IT
        about the adoption of NoSQL

Summary
    ✤   The CMS experiment built the Data Aggregation System as an intelligent
        cache for querying distributed data services

    ✤   MongoDB is used as the DAS back-end

    ✤   During the first year of operation we did not experience any
        significant problems

    ✤   I'd like to thank the MongoDB team and its community for their
        constant support

    ✤   Questions? Contact: vkuznet@gmail.com

         ✤    https://github.com/vkuznet/DAS/
Back-up slides




From query to results

    [Figure: Query -> API lookup -> Data service generators (one per data
     service) -> Merge results -> Aggregators]

    ✤   Query: block dataset=/a/b/c, expressed as a MongoDB spec

    ✤   Mapping DB holds relationships

    ✤   Caching DB holds service records

    ✤   Merge DB holds merged records


MongoDB as a fast and queryable cacheMongoDB as a fast and queryable cache
MongoDB as a fast and queryable cache
 

Similaire à MongoDB at the energy frontier

Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java DatabaseBalancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java DatabaseBen Stopford
 
2010-dec-08 HL7 Detailed Clinical Modelling and Architecture
2010-dec-08 HL7 Detailed Clinical Modelling and Architecture2010-dec-08 HL7 Detailed Clinical Modelling and Architecture
2010-dec-08 HL7 Detailed Clinical Modelling and ArchitectureMichael van der Zel
 
Apache Camel: The Swiss Army Knife of Open Source Integration
Apache Camel: The Swiss Army Knife of Open Source IntegrationApache Camel: The Swiss Army Knife of Open Source Integration
Apache Camel: The Swiss Army Knife of Open Source Integrationprajods
 
Hummingbird - Open Source for Small Satellites - GSAW 2012
Hummingbird - Open Source for Small Satellites - GSAW 2012Hummingbird - Open Source for Small Satellites - GSAW 2012
Hummingbird - Open Source for Small Satellites - GSAW 2012Logica_hummingbird
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Ed Kohlwey
 
Software Defined Data Centers - June 2012
Software Defined Data Centers - June 2012Software Defined Data Centers - June 2012
Software Defined Data Centers - June 2012Brent Salisbury
 
Capacity planning in mobile data networks experiencing exponential growth in ...
Capacity planning in mobile data networks experiencing exponential growth in ...Capacity planning in mobile data networks experiencing exponential growth in ...
Capacity planning in mobile data networks experiencing exponential growth in ...Dr. Kim (Kyllesbech Larsen)
 
Services Oriented Infrastructure in a Web2.0 World
Services Oriented Infrastructure in a Web2.0 WorldServices Oriented Infrastructure in a Web2.0 World
Services Oriented Infrastructure in a Web2.0 WorldLexumo
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Toolsboorad
 
The Potential Impact of Software Defined Networking SDN on Security
The Potential Impact of Software Defined Networking SDN on SecurityThe Potential Impact of Software Defined Networking SDN on Security
The Potential Impact of Software Defined Networking SDN on SecurityBrent Salisbury
 
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...Channy Yun
 
Decade architecture discussion 20110311
Decade architecture discussion 20110311Decade architecture discussion 20110311
Decade architecture discussion 20110311chenlijiang
 
Realtime Apache Hadoop at Facebook
Realtime Apache Hadoop at FacebookRealtime Apache Hadoop at Facebook
Realtime Apache Hadoop at Facebookparallellabs
 
OpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosOpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosBrent Salisbury
 
Zotonic presentation Erlang Camp Boston, august 2011
Zotonic presentation Erlang Camp Boston, august 2011Zotonic presentation Erlang Camp Boston, august 2011
Zotonic presentation Erlang Camp Boston, august 2011Arjan
 

Similaire à MongoDB at the energy frontier (20)

Data Aggregation System
Data Aggregation SystemData Aggregation System
Data Aggregation System
 
Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java DatabaseBalancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database
 
2010-dec-08 HL7 Detailed Clinical Modelling and Architecture
2010-dec-08 HL7 Detailed Clinical Modelling and Architecture2010-dec-08 HL7 Detailed Clinical Modelling and Architecture
2010-dec-08 HL7 Detailed Clinical Modelling and Architecture
 
Apache Camel: The Swiss Army Knife of Open Source Integration
Apache Camel: The Swiss Army Knife of Open Source IntegrationApache Camel: The Swiss Army Knife of Open Source Integration
Apache Camel: The Swiss Army Knife of Open Source Integration
 
Dancing with the Elephant
Dancing with the ElephantDancing with the Elephant
Dancing with the Elephant
 
Hummingbird - Open Source for Small Satellites - GSAW 2012
Hummingbird - Open Source for Small Satellites - GSAW 2012Hummingbird - Open Source for Small Satellites - GSAW 2012
Hummingbird - Open Source for Small Satellites - GSAW 2012
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
 
Software Defined Data Centers - June 2012
Software Defined Data Centers - June 2012Software Defined Data Centers - June 2012
Software Defined Data Centers - June 2012
 
Capacity planning in mobile data networks experiencing exponential growth in ...
Capacity planning in mobile data networks experiencing exponential growth in ...Capacity planning in mobile data networks experiencing exponential growth in ...
Capacity planning in mobile data networks experiencing exponential growth in ...
 
Services Oriented Infrastructure in a Web2.0 World
Services Oriented Infrastructure in a Web2.0 WorldServices Oriented Infrastructure in a Web2.0 World
Services Oriented Infrastructure in a Web2.0 World
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Tools
 
Xrm xensummit
Xrm xensummitXrm xensummit
Xrm xensummit
 
NoSQL
NoSQLNoSQL
NoSQL
 
The Potential Impact of Software Defined Networking SDN on Security
The Potential Impact of Software Defined Networking SDN on SecurityThe Potential Impact of Software Defined Networking SDN on Security
The Potential Impact of Software Defined Networking SDN on Security
 
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...
The Construction of the Internet Geological Data System Using WWW+Java+DB Tec...
 
Decade architecture discussion 20110311
Decade architecture discussion 20110311Decade architecture discussion 20110311
Decade architecture discussion 20110311
 
Realtime Apache Hadoop at Facebook
Realtime Apache Hadoop at FacebookRealtime Apache Hadoop at Facebook
Realtime Apache Hadoop at Facebook
 
OpenStack and OpenFlow Demos
OpenStack and OpenFlow DemosOpenStack and OpenFlow Demos
OpenStack and OpenFlow Demos
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Zotonic presentation Erlang Camp Boston, august 2011
Zotonic presentation Erlang Camp Boston, august 2011Zotonic presentation Erlang Camp Boston, august 2011
Zotonic presentation Erlang Camp Boston, august 2011
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

MongoDB at the energy frontier

  • 1. MongoDB at the energy frontier Valentin Kuznetsov, Cornell University MongoNYC, May, 2012 Monday, May 21, 12 1
  • 2. Outline ✤ CMS :: LHC :: CERN ✤ Data Aggregation System and MongoDB ✤ Experience ✤ Summary
  • 3. CMS :: LHC :: CERN The Large Hadron Collider is located at CERN, Geneva, Switzerland. CMS is one of the 4 experiments that probe our knowledge of particle interactions and search for new physics.
  • 4. CMS :: LHC :: CERN Compact Muon Solenoid (CMS)
  • 5. CMS :: LHC :: CERN Typical proton-proton collision in the CMS detector
  • 6. CMS :: LHC :: CERN ✤ 40 countries, 172 institutions, more than 3000 scientists ✤ The CMS experiment produces a few PB of real data each year and we collect ~TB of meta-data ✤ CMS relies on GRID infrastructure for data processing and uses 100+ computing centers worldwide ✤ CMS software consists of 4M lines of C++ (framework), 2M lines of Python (data management), plus Java, Perl, etc. ✤ ORACLE, MySQL, SQLite, NoSQL
  • 7. Dilemma: How can I find my data? [Diagram of data services: GenDB, LumiDB, Data Quality, Phedex, DBS, PSetDB, SiteDB, Overview, RunDB]
  • 8. Motivations ✤ Users want to query different data services without knowing about their existence ✤ Users want to combine information from different data services ✤ Some users may have domain knowledge, but they need to query X services, using Y interface and dealing with Z data formats to get our data [Diagram: the Data Aggregation System connects data services through shared keys (run, lumi, block, MC id, site, pset). RunSummary: run, trigger, detector, ... DataQuality: trigger, ecal, hcal, ... LumiDB: lumi, luminosity, hltpath. Phedex: block, file, block.replica, file.replica, se, node, ... DBS: run, file, block, site, config, tier, dataset, lumi, parameters, ... GenDB: generator, xsection, process, decay, ... SiteDB: site, admin, site.status, ... Overview: country, node, region, ... Parameter Set DB: CMSSW parameters. Services A to E: param1, param2, ...]
  • 9. Implementation idea ✤ When we talk we may use different languages (English, French, etc.) or different conventions (pounds vs kg) ✤ In order to establish communication we use translation, dictionary, thesaurus
  • 11. Pros ✤ Separate data management from discovery service ✤ Data are safe and secure ✤ Pluggable architecture (new translations) ✤ Users never have to deal with interfaces, naming and schema conflicts, data formats, or security policies ✤ Information is aggregated in real time over distributed services ✤ Data consistency checks for free ✤ DB and API changes are transparent to end-users
  • 12. Cons ✤ DAS does not own the data ✤ lots of writes/reads/translations ✤ Data-services are the real bottleneck ✤ nothing is guaranteed, e.g. a service can go down, we have no control over its performance, requested data can be really large, etc. ✤ cache often and preemptively ✤ MongoDB to the rescue!
  • 13. Data Aggregation System [Architecture diagram. DAS web server: UI and RESTful interface. DAS cache server: DAS core with parser, mapping, aggregator, and plugins for the data-services (runsum, lumidb, phedex, sitedb, dbs). Databases: DAS mapping (map data-service output to DAS records), DAS cache, DAS merge, DAS Analytics (record query, API call to Analytics). DAS robot: fetch popular queries/APIs, invoke the same API(params), update cache periodically.]
  • 14. Mapping DB ✤ Holds translations between user keywords and data-service APIs, resolves naming conflicts, etc. ✤ The query city=Ithaca translates into a Google API call: {'das2api': [{'api_param': 'q', 'das_key': 'city.name', 'pattern': ''}], 'daskeys': [{'key': 'city', 'map': 'city.name', 'pattern': ''}], 'expire': 3600, 'format': 'JSON', 'params': {'output': 'json', 'q': 'required'}, 'system': 'google_maps', 'url': 'http://maps.google.com/maps/geo', 'urn': 'google_geo_maps'}
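
The mapping record above can be read as a two-step translation: user keyword to DAS key, then DAS key to API parameter. The following is a minimal sketch of that idea, not DAS code. The helper name `build_api_call` is hypothetical, and treating the value 'required' as a placeholder that must be filled is an assumption drawn from the record.

```python
from urllib.parse import urlencode

# Mapping record copied from the slide.
MAPPING = {
    'das2api': [{'api_param': 'q', 'das_key': 'city.name', 'pattern': ''}],
    'daskeys': [{'key': 'city', 'map': 'city.name', 'pattern': ''}],
    'expire': 3600, 'format': 'JSON',
    'params': {'output': 'json', 'q': 'required'},
    'system': 'google_maps',
    'url': 'http://maps.google.com/maps/geo',
    'urn': 'google_geo_maps',
}

def build_api_call(mapping, das_query):
    """Translate a user query such as {'city': 'Ithaca'} into a URL (hypothetical helper)."""
    # User keyword -> DAS key, e.g. 'city' -> 'city.name'.
    key_to_das = {d['key']: d['map'] for d in mapping['daskeys']}
    # DAS key -> API parameter, e.g. 'city.name' -> 'q'.
    das_to_param = {d['das_key']: d['api_param'] for d in mapping['das2api']}
    params = dict(mapping['params'])
    for key, value in das_query.items():
        params[das_to_param[key_to_das[key]]] = value
    # Assumption: 'required' marks a parameter the user must supply.
    missing = [name for name, value in params.items() if value == 'required']
    if missing:
        raise ValueError('missing required parameters: %s' % missing)
    return mapping['url'] + '?' + urlencode(sorted(params.items()))

url = build_api_call(MAPPING, {'city': 'Ithaca'})
```

The URL is only built here, never fetched, so the sketch runs without network access.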
  • 15. Analytics DB ✤ Keeps track of user queries and data-service API calls: {'api': {'params': {'q': 'Ithaca', 'output': 'json'}, 'name': 'google_geo_maps'}, 'qhash': '7272bdeac45174823d3a4ea240c124ec', 'system': 'google_maps', 'counter': 5} ✤ Used by DAS analytics daemons to pre-fetch “hot” queries ✤ ValueHotSpot looks up data by popular values ✤ KeyHotSpot looks up data by popular keys ✤ QueryMaintainer keeps a given query always in cache
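
The counter in the analytics record above suggests a simple pattern: hash each API call, increment a counter per hash, and let a daemon pick the calls whose counter passes a threshold. Below is a sketch of that pattern using a plain dictionary in place of a MongoDB collection. The class `Analytics`, the threshold, and the use of MD5 over canonical JSON for `qhash` are assumptions; the slide shows only a 32-character hex hash, not how it is computed.

```python
import hashlib
import json

def qhash(query):
    """Hash a query dict; MD5 over sorted-key JSON is an assumed scheme."""
    canonical = json.dumps(query, sort_keys=True)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()

class Analytics:
    """In-memory stand-in for the analytics collection (hypothetical)."""

    def __init__(self):
        self.records = {}

    def record(self, system, name, params):
        """Count one API call, creating its record on first use."""
        api = {'name': name, 'params': params}
        key = qhash({'system': system, 'api': api})
        rec = self.records.setdefault(
            key, {'api': api, 'qhash': key, 'system': system, 'counter': 0})
        rec['counter'] += 1
        return rec

    def hot(self, threshold):
        """Return records called at least `threshold` times."""
        return [r for r in self.records.values() if r['counter'] >= threshold]

analytics = Analytics()
for _ in range(5):
    last = analytics.record('google_maps', 'google_geo_maps',
                            {'q': 'Ithaca', 'output': 'json'})
analytics.record('google_maps', 'google_geo_maps',
                 {'q': 'Geneva', 'output': 'json'})
hot_calls = analytics.hot(threshold=5)
```

Against a real collection the same effect would typically come from an upsert with an increment, but that part is left out to keep the sketch dependency-free.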
  • 16. Caching DB ✤ Data coming from data-service providers are translated into JSON and stored in the cache collection ✤ naming translations are performed at this level ✤ Data records from the cache collection are processed on a common key, e.g. city.name, and merged into the merge collection ✤ cache collection: {'city': {'name': 'Ithaca', 'lat':42, 'lng':-76}} and {'city': {'name': 'Ithaca', 'zip':14850}} ✤ merge collection: {'city': {'name': 'Ithaca', 'lat':42, 'lng':-76, 'zip':14850}}
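
The merge step described above can be illustrated with the two Ithaca records from the slide. This is a minimal sketch that assumes every record carries the common key and that later records simply add or overwrite fields; the function names are hypothetical and the real DAS merge logic may differ.

```python
def get_dotted(record, dotted_key):
    """Read a nested value using dot notation, e.g. 'city.name'."""
    value = record
    for part in dotted_key.split('.'):
        value = value[part]
    return value

def merge_records(records, common_key):
    """Group records by the value of `common_key` and union their fields."""
    top = common_key.split('.')[0]
    merged = {}
    for record in records:
        key_value = get_dotted(record, common_key)
        target = merged.setdefault(key_value, {top: {}})
        target[top].update(record[top])
    return list(merged.values())

# The two cache-collection records shown on the slide.
cache = [
    {'city': {'name': 'Ithaca', 'lat': 42, 'lng': -76}},
    {'city': {'name': 'Ithaca', 'zip': 14850}},
]
merge = merge_records(cache, 'city.name')
```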
  • 17. DAS workflow ✤ Query parser ✤ Query DAS merge collection ✤ Query DAS cache collection ✤ invoke call to data service ✤ write to analytics ✤ Aggregate results ✤ Represent results on web UI or via command line interface [Flowchart: query -> DAS core (parser, DAS logging) -> query DAS merge; on a hit, return results; on a miss, query DAS cache; on a cache miss, call data-services through Mapping and write to DAS cache; cached records go to DAS merge -> Aggregator (DAS Analytics) -> results -> Web UI]
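
The lookup order in the workflow above (merge collection first, then cache collection, then the data services) can be sketched with dictionaries standing in for the collections. Everything here is illustrative: `das_lookup`, the fake services, and the flat record layout are assumptions, and query parsing and expiry are left out.

```python
def das_lookup(query, merge, cache, services, analytics):
    """Return results for `query`, calling data services only on a cache miss."""
    if query in merge:                        # merged records already exist
        return merge[query]
    if query not in cache:                    # nothing cached: call the services
        cache[query] = []
        for name, call in services.items():
            analytics.append((query, name))   # log every API call
            cache[query].extend(call(query))
    merged = {}
    for record in cache[query]:               # aggregate the cached records
        merged.update(record)
    merge[query] = [merged]
    return merge[query]

calls = []

def fake_geo(query):
    """Stand-in for a geo data service."""
    calls.append(query)
    return [{'name': 'Ithaca', 'lat': 42, 'lng': -76}]

def fake_zip(query):
    """Stand-in for a zip-code data service."""
    calls.append(query)
    return [{'name': 'Ithaca', 'zip': 14850}]

merge_db, cache_db, analytics_db = {}, {}, []
services = {'geo': fake_geo, 'zip': fake_zip}
first = das_lookup('city=Ithaca', merge_db, cache_db, services, analytics_db)
second = das_lookup('city=Ithaca', merge_db, cache_db, services, analytics_db)
```

The second call is answered from the merge store, so the fake services are each invoked once.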
  • 19. DAS QL & MongoDB QL ✤ The DAS Query Language is built on top of MongoDB QL; it represents MongoDB QL in human-readable form ✤ UI level: block dataset=/a/b/c | grep block.size | count(block.size) ✤ DB level: col.find(spec={'dataset.name': '/a/b/c'}, fields=['block.size']).count() ✤ We enrich the QL with additional filters (grep, sort, unique) and implement a set of coroutines for aggregator functions
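
To make the UI-level and DB-level forms above concrete, here is a small sketch that splits a DAS-style query into a MongoDB-style spec, a field list, and aggregator calls. It is not the DAS parser: the grammar handled is only the one in the slide's example, and the `DAS_KEYS` table is hypothetical apart from the dataset to dataset.name pair taken from the slide.

```python
import re

# Hypothetical keyword map; the slide only shows dataset -> dataset.name.
DAS_KEYS = {'dataset': 'dataset.name'}

def parse_das_query(text):
    """Split 'select key=value | grep f | agg(f)' into its parts."""
    head, *pipes = [part.strip() for part in text.split('|')]
    tokens = head.split()
    select = [t for t in tokens if '=' not in t]
    spec = {}
    for token in tokens:
        if '=' in token:
            key, value = token.split('=', 1)
            spec[DAS_KEYS.get(key, key)] = value
    fields, aggregators = [], []
    for pipe in pipes:
        match = re.fullmatch(r'(\w+)\((.+)\)', pipe)
        if match:                             # e.g. count(block.size)
            aggregators.append((match.group(1), match.group(2)))
        elif pipe.startswith('grep '):        # e.g. grep block.size
            fields.extend(f.strip() for f in pipe[5:].split(','))
    return {'select': select, 'spec': spec,
            'fields': fields, 'aggregators': aggregators}

parsed = parse_das_query(
    'block dataset=/a/b/c | grep block.size | count(block.size)')
```

The resulting `spec` and `fields` are the two arguments the slide passes to `col.find`, and the aggregator tuple corresponds to the trailing `count`.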
  • 20. DAS & MongoDB ✤ DAS works with 15 distributed data-services ✤ their sizes vary, on average O(100GB) ✤ DAS uses 40 MongoDB collections ✤ caching, mapping, analytics, logging (normal, capped, gridfs collections) ✤ DAS inserts/deletes O(1M) records on a daily basis ✤ We operate on a single 64-bit Linux node with 8 CPUs, 24 GB of RAM and 1 TB of disk space; sharding was tested, but it is not enabled
  • 21. MongoDB benefits ✤ Fast I/O and a schema-less database are ideal for a cache implementation ✤ you’re not limited by the key:value approach ✤ The flexible query language allows us to build a domain-specific QL ✤ stay on par with SQL ✤ No administrative costs with the DB ✤ easy to install and maintain
  • 22. MongoDB issues (ver 2.0.X) ✤ We were unable to store DAS queries directly in the analytics collection, due to the dot constraint, e.g. {'a.b':1} ✤ queries <=> storage format {'key':'a.b', 'value':1} ✤ Scons is not suitable in a fully controlled build environment ✤ it removes $PATH/$LD_LIBRARY_PATH for compiler commands; it forces the use of -L/lib64. As a result we used wrappers. ✤ Uncompressed field names and limitations with pagination/aggregation ✤ should be addressed in the new MongoDB aggregation framework
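
The workaround for the dot constraint above amounts to moving each dotted key out of the field-name position and into a value. A minimal round-trip sketch for flat queries follows; the function names are hypothetical, and nested queries would need extra handling that is not shown.

```python
def to_storage(query):
    """Turn {'a.b': 1} into [{'key': 'a.b', 'value': 1}] so no field name contains a dot."""
    return [{'key': key, 'value': value} for key, value in query.items()]

def from_storage(pairs):
    """Rebuild the original query from its storage form."""
    return {pair['key']: pair['value'] for pair in pairs}

query = {'a.b': 1, 'dataset.name': '/a/b/c'}
stored = to_storage(query)
restored = from_storage(stored)
```

In the stored form the only field names are 'key' and 'value', which is what makes the document acceptable to a server that rejects dots in field names.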
  • 23. Tradeoffs ✤ Query collisions: DAS does not own the data and there are no transactions, so we rely on query status and update it accordingly ✤ Index choice: initially one per select key, later one per query hash ✤ Storage size: we trade off storage vs data flexibility vs naming conventions ✤ Speed: we trade off simple data access vs a conglomerate of restrictions (naming, security policies, interfaces, etc.), but we tune our data-service APIs based on query patterns
  • 24. Results ✤ The service has been in production for over one year ✤ Users are authenticated via GRID certificates and DAS uses a proxy server to pass credentials to back-end services ✤ A single query request yields a few thousand records and is resolved within a few seconds ✤ The pluggable architecture allows you to query your own service(s) ✤ unit tests are done against public data-services, e.g. Google, IP look-up, etc.
  • 25. NoSQL @ CERN ✤ MongoDB is used by other experiments at CERN ✤ logging, monitoring, data analytics ✤ MongoDB is not the only NoSQL solution used at CERN ✤ One size does not fit all ✤ CouchDB, Cassandra, HBase, etc. ✤ There is an ongoing discussion between the experiments and CERN IT about adoption of NoSQL
  • 26. Summary ✤ The CMS experiment built the Data Aggregation System as an intelligent cache to query distributed data-services ✤ MongoDB is used as the DAS back-end ✤ During the first year of operation we did not experience any significant problems ✤ I’d like to thank the MongoDB team and its community for their constant support ✤ Questions? Contact: vkuznet@gmail.com ✤ https://github.com/vkuznet/DAS/
  • 33. From query to results [Diagram: Query (block dataset=/a/b/c) -> MongoDB spec -> API lookup -> three data services, each feeding a generator and an Aggregator -> Merge -> results. Mapping DB holds relationships; Caching DB holds service records; Merge DB holds merged records.]