How We Use MongoDB in Our Advertising System

This talk covers why we chose MongoDB to store billions of documents on a replica set of only 3 nodes, and why we chose MongoDB over MySQL as the data store for our reports.

Why and How I use MongoDB
{name: "Macro Huang",
 github: "macrohuang",
 location: "Beijing",
 email: ["macrohuang.whu@gamil.com", "macrohuang@126.com"]
}

Content
Where do I use MongoDB
Why I choose MongoDB
How I use MongoDB
What is the benefit of using MongoDB
Q&A

Where—Report System
• Data never changes after it is imported
• Results are pre-computed before being imported into MongoDB
• Good query performance
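
The pre-compute step could look roughly like the sketch below. The slides do not show the actual pipeline, so the event fields (`day`, `ad_id`, `type`) and the function name are assumptions made for illustration only.

```python
from collections import defaultdict


def precompute_daily_report(raw_events):
    """Roll raw events up into one summary document per (day, ad_id).

    Field names are hypothetical; the point is that aggregation happens
    before import, so the stored documents never need updating.
    """
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for event in raw_events:
        bucket = totals[(event["day"], event["ad_id"])]
        if event["type"] == "impression":
            bucket["impressions"] += 1
        elif event["type"] == "click":
            bucket["clicks"] += 1
    # One ready-to-insert document per key, in a stable order.
    return [
        {"day": day, "ad_id": ad_id, **counts}
        for (day, ad_id), counts in sorted(totals.items())
    ]


events = [
    {"day": "2012-05-01", "ad_id": 7, "type": "impression"},
    {"day": "2012-05-01", "ad_id": 7, "type": "impression"},
    {"day": "2012-05-01", "ad_id": 7, "type": "click"},
    {"day": "2012-05-01", "ad_id": 9, "type": "impression"},
]
report_docs = precompute_daily_report(events)
```

Each resulting dictionary is a finished report row, so a report query becomes a plain lookup rather than an aggregation at read time.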

Where—Log System
• Very large total data volume
• Data grows very quickly
• Data never changes
• No transactions required
• Queries are only slightly complex

Why—Heavy VS Thin
• An RDB is powerful, but too heavyweight for some requirements

Why—Speed
• An RDB performs poorly when the data set is huge
• {"CPU": "Intel(R) Core(TM)2 Duo CPU E7200 @ 2.53GHz",
   "RAM": "8G DDR2 667",
   "Disk": "SATA",
   "OS": "Redhat 5.5"}

Why—Query QPS
• Query QPS with 50 million records

Why—Insert QPS
• Insert QPS with 50 million records

Why—Insert TPS
• Insert TPS with 50 million records

Why—Insert CPU
• CPU usage while inserting 50 million records

Why—Easy
• Learning curve

Why—Extend
• An RDB is hard to scale out, while MongoDB is easy

Why– Other
• Huge data volume that grows fast
• No transactions needed
• Queries are only slightly complex
• Index support
• Support for multiple programming languages
• Auto-sharding
• Map/Reduce support
• GridFS

How –Rule
• The rule is THINK and DESIGN

How – ODM
• Use an ODM to minimize the learning cost
• Spring Data, Morm (https://github.com/macrohuang/mongo-orm), and so on

How – Cluster
• Always use a replica set
• Read from a secondary (however, you will need to deal with possible eventual consistency, depending on the write concern)

Kill them first, they are copied from me.
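
As a rough illustration of the cluster advice, the sketch below builds a MongoDB connection string that names the replica set, prefers secondaries for reads, and asks for majority-acknowledged writes. `replicaSet`, `readPreference`, and `w` are standard connection string options; the host names, the set name `rs0`, and the helper function are made up for the example.

```python
from urllib.parse import urlencode


def build_replica_set_uri(hosts, replica_set,
                          read_preference="secondaryPreferred",
                          write_concern="majority"):
    """Build a connection string for a replica set (hypothetical helper)."""
    options = urlencode({
        "replicaSet": replica_set,          # name of the replica set
        "readPreference": read_preference,  # route reads to secondaries when possible
        "w": write_concern,                 # how many members must acknowledge a write
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


uri = build_replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    "rs0",
)
```

Even with a majority write concern, a read served by a secondary can lag behind the primary, so the application still has to tolerate slightly stale results.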

How – Replica Set
• Always assign a priority to each replica set member
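
A minimal sketch of what assigning priorities means in practice: a replica set configuration document lists its members, and each member can carry a `priority` value. Members with a higher priority are preferred in elections, and a member with priority 0 never becomes primary. The hosts, the set name, and the helper below are assumptions for illustration.

```python
def build_replica_set_config(set_name, hosts_with_priority):
    """Return a replica set configuration document with explicit priorities."""
    return {
        "_id": set_name,
        "members": [
            {"_id": member_id, "host": host, "priority": priority}
            for member_id, (host, priority) in enumerate(hosts_with_priority)
        ],
    }


config = build_replica_set_config("rs0", [
    ("db1.example.com:27017", 2),  # preferred primary
    ("db2.example.com:27017", 1),  # fallback primary
    ("db3.example.com:27017", 0),  # never elected primary
])
```

A document of this shape is what the shell helper `rs.initiate()` accepts when the set is first created.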

How – Key design
• Keep document keys (field names) short to save space
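
The saving can be estimated with simple arithmetic. In BSON every field name is stored inside every document as a NUL-terminated string, so each character removed from a key is saved once per document. The field names below are hypothetical, and the estimate covers uncompressed BSON only; on-disk compression, where enabled, would shrink the difference.

```python
def bson_key_overhead(field_names):
    """Bytes one BSON document spends on field names (UTF-8 bytes + NUL each)."""
    return sum(len(name.encode("utf-8")) + 1 for name in field_names)


long_keys = ["advertiser_id", "impressions", "clicks", "created_at"]
short_keys = ["a", "i", "c", "t"]

saved_per_doc = bson_key_overhead(long_keys) - bson_key_overhead(short_keys)

# Scaled to the 2 billion documents mentioned for the log system.
saved_total_gb = saved_per_doc * 2_000_000_000 / 1e9
```

With these example names the difference is 36 bytes per document, or about 72 GB across 2 billion documents, before replication multiplies it.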

How – _id
• Customize your own _id

How – Sort rule
• Never sort a large result set on an unindexed field
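
The toy model below is not MongoDB code; it only illustrates why the sort rule matters. An index keeps entries in sorted order as documents arrive, so a sorted query with a limit just reads the first few index entries. Without an index, every matching document has to be sorted first. The class and the `clicks` field are invented for the example.

```python
import bisect


class ToyCollection:
    """In-memory stand-in for a collection with one index on 'clicks'."""

    def __init__(self):
        self.docs = []
        self.index = []  # sorted list of (clicks, position in self.docs)

    def insert(self, doc):
        self.docs.append(doc)
        # The index is kept sorted on every insert.
        bisect.insort(self.index, (doc["clicks"], len(self.docs) - 1))

    def sorted_with_index(self, limit):
        # Reads only `limit` index entries, however large the collection is.
        return [self.docs[pos] for _, pos in self.index[:limit]]

    def sorted_without_index(self, limit):
        # Sorts every document before the limit can be applied.
        return sorted(self.docs, key=lambda d: d["clicks"])[:limit]


coll = ToyCollection()
for clicks in [5, 1, 3, 1]:
    coll.insert({"clicks": clicks})

with_index = coll.sorted_with_index(2)
without_index = coll.sorted_without_index(2)
```

Both paths return the same documents; the difference is how much work is done per query as the result set grows.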

How – Index
• Keep all your indexes in RAM (for maximum performance, but not required)
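
One way to check this rule, sketched under assumptions: collection statistics report a `totalIndexSize` in bytes, and the sum over all collections can be compared with the memory available. The namespaces, sizes, and the 80% headroom factor below are illustrative values, not figures from the talk.

```python
def indexes_fit_in_ram(collection_stats, ram_bytes, headroom=0.8):
    """Compare total index size with a share of RAM.

    `headroom` is a hypothetical safety factor that leaves room for the
    working set and the operating system.
    """
    total_index_bytes = sum(s["totalIndexSize"] for s in collection_stats)
    return total_index_bytes <= ram_bytes * headroom, total_index_bytes


GIB = 1024 ** 3
stats = [
    {"ns": "ads.report", "totalIndexSize": 3 * GIB},
    {"ns": "ads.log", "totalIndexSize": 2 * GIB},
]
fits, total = indexes_fit_in_ram(stats, ram_bytes=8 * GIB)
```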

What – TPS
• Report transactions (business transactions)
[Chart: transactions per second, Oracle vs. MongoDB; annotated +29%]

What – Response time
• Report system max response time
[Chart: max response time, Oracle vs. MongoDB; annotated -28.55%]

What – CPU I/O
• Report system CPU I/O wait
[Chart: CPU I/O wait, Oracle vs. MongoDB; annotated -43%]

What – CPU
• Report system CPU idle
[Chart: CPU idle, Oracle vs. MongoDB; annotated +80%]

What – Log system
• A 3-node replica set storing 2 billion documents
• About 12 million new documents every day
• 100 thousand query requests, with an average response time within 2 seconds
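
To put the log system figures in perspective, the arithmetic below derives two numbers from them. It assumes the daily growth is constant and spread evenly over the day, which the slides do not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

total_documents = 2_000_000_000
new_documents_per_day = 12_000_000

# Average insert rate if the daily volume were spread evenly over 24 hours.
avg_inserts_per_second = new_documents_per_day / SECONDS_PER_DAY

# Days of growth at the stated rate that 2 billion documents correspond to.
days_of_growth = total_documents / new_documents_per_day
```

Under those assumptions the average comes to roughly 139 inserts per second, and 2 billion documents correspond to about 167 days of growth.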
