SlideShare a Scribd company logo
1 of 50
Download to read offline
Architectural
Anti Patterns
Notes on Data Distribution and Handling Failures

Gleicon Moraes
$whoami

work: http://locaweb.com

site: http://7co.cc

blog: http://zenmachine.wordpress.com

code: http://github.com/gleicon

RestMQ: http://restmq.com

twitter: @gleicon
Motivation


• User group meetings with friends to NoSQL:br, QConSP and now OSCon Data !
• Consulting cases
• Large volumes of data / High concurrent access
• Distribution and Disaster recovery
• Architecture
• Operations
• Live migrations
Required Listening
                     Tracklist

                       Elephant Talk
                       Frame by Frame
                       Matte Kudasai (please wait)
                       Indiscipline
                       Thela Hun Ginjeet (heat in the jungle)
                       The Sheltering Sky
                       Discipline
                       Matte Kudasai (Bonus Track – alternate mix)
Agenda
  Elephant Talk - Why Anti Patterns ?

  Frame by Frame - DML/DDL Anti Patterns

  Matte Kudasai (please wait) - Architectural Anti Patterns

  Indiscipline - Abusing the RDBMS

  Thela Hun Ginjeet (Heat in the jungle) - A word about Ops

  The Sheltering Sky - Architecture and Migration

  Discipline - The discussion over data organization + Questions
Elephant Talk
(Why Anti Patterns ?)
Why Anti Patterns



 Architectural Anti Patterns for Data Handling or 'Please don't do it with [MySQL|PGSQL|
           Oracle|SQLServer]' or 'Things that you shouldn't use a RDBMS for'.
Still, why Anti Patterns ?
  Mostly result of innocent assumptions

  You catch them at the server crashing or customer complaints

  NoSQL databases are not safe from it. Mistaken assumptions are like water, finds a way
   around everything.
Quick Question

“What is the main MySQL tuning technique used in panic situations ?”
Quick Question

“What is the main MySQL tuning technique used in panic situations ?”


A: To install PostgreSQL
RDBMS Anti Patterns
News Flash: Not everything fits on a RDBMS.

The Great Architectural Staple (tm), used as:

•  Message Queues
•  Job Schedulers
•  Distributed Counters
•  Distributed Lock
•  Document Storage
•  Filesystem
•  Cache
•  Logger
•  Workflow Orchestrator
Frame by Frame
(DML/DDL Anti Patterns)
The Eternal Tree
Description

Queries around a single table, self referencing an id or key.

Example

Most threaded discussion examples uses something like a table which contains all threads
and answers, relating to each other by an id.

Discussion

At any given point the application code will contain its own tree-traversal algorithm to manage
this.

Also most cases have a limit in depth for performance reasons. Any kind of sharding requires
that all comments be at the same instance.
The Eternal Tree
Table

ID                 Parent-ID         Author             Text
1                  0                 gleicon            Hello World
2                  1                 elvis              Shout!

Same data stored as a document

{ thread_id:1, title: 'the meeting', author: 'gleicon', replies:[
     {
       'author': elvis, text:'shout', replies:[{...}]
     }
   ]
}

The same sharding and indexing issues still apply.
Dynamic Table Creation
Discussion

To avoid huge tables, one must come with a "dynamic schema".

Example

A Document storage/management company, which is adding new facilities over the
country. For each storage facility, a new table is created, structured like this:
Item_id          Row              Column            Description
1                10               20                Cat food
2                12               32                Trout
Discussion

To query the total capacity of a set of facilities over a given state, one query to an index table
would have to be issued, and then one query for each facility on its own table.
The Alignment
Description

Extra columns are created but not used immediately. Usually they are named as a1, a2, a3,
a4 and called padding.

Example

Data with variable attributes or highly populated tables which broke on migrations when
adding an extra column in a 150M lines database and it took 2 days to run an ALTER TABLE

Discussion

Having no used columns might feel hackish as when one used padding on ANSI C structs,
but probably the best alternative to variable attributes is a document based database.

Specially if the attributes are local to each document
Hierarchical Sharding
Description
Using databases to shard data inside the same server/instance.

Example
Email accounts management. Distinct databases inside a RDBMS ranging from A to Z, each
database has tables for users starting with the proper letter. Each table has user data.

> show databases;
abcdefghijklmnopqrstuwxz
> use a
> show tables;
...alberto alice alma ... (and a lot more)
Hierarchical Sharding
Discussion

There is no way to query anything in common for all users without application side
processing. JOINs are not possible.

RDBMS have all tools to deal with this particular case of 'different clients and their data'

The letters A and M (at least for Brazilian Portuguese) will alone have more names than the
rest.
Embedded Lists
Description

As data complexity grows, one thinks that it's proper for the application to handle different
data structures embedded in a single cell or row.

Example

Flags for feature administration

> select group_field from that_email_sharded_database.user
"a@email1.net, b@email1.net,c@email2.net"

> select flags from email_feature_admin where id = 2
1|0|1|1|0|
Embedded Lists
Discussion

The popular 'Let's use commas to separe it'. You may find distinct separators as |, -, [] and so
on. Embedding CSV fields would fall on this category

Apart from space optimization, a very common argument is to save space.

A proper alternative is a clone of proper SETS and LISTS on K/V stores as Redis, acting as a
property storage. Or even a row for each flag.
Matte Kudasai
(Architectural Anti Patterns)
Heroes are hard to find
Table as Cache
Description

A workaround for complex queries which stores resultsets in a different table using the query -
or part of it as a key.

Discussion

Complex and recurrent queries that demand the result to be stored in a separated table, so
they result can be retrieved quickly.

Queries with many JOINs or complex data dictionaries, in late stage development or legacy
patching.

Also, it de-normalizes the data and demands cache maintenance as purging and invalidation.

In-memory cache vs DB backed cache access time
Table as Queue
Description

The use of a table, or more than one table, to hold messages to be processed, instead of a
Message Broker.

Discussion

Table/views with rows ordered by time of creation

Demands application polling and cleaning

Race condition when more than one client is polling

Wrapped on stored procedures/triggers to signal new items

BONUS: global procedure to generate job/message IDs
Table as Log File
Description

Logs written straight to one or more tables fully or in a round-robin fashion.

Discussion

Concurrent access in the same schema than application data

From time to time it needs to be purged/truncated

Ordered/Indexed by date, to avoid grep'ing log files

Usually application/transaction log
Scheme only fits in an A0 sheet
Description

High specialized and normalized data dictionaries

Example

The product description is local to a 'child' table, but the main expiration ref is represented by
the main product table.

Main_Product_Table
ID               Dept               Due_date        Expires_in

     Product_A           Product_B
ID    Desc              ID   Desc
Scheme only fits in an A0 sheet
Discussion

Extreme specialization creates tables which converges to key/value model.

To model the data scheme after modeling the applications objects

The normal form get priority over common sense to avoid de-normalization

Extreme (as in extreme sports) JOINs required and views to hide the complexity.
Your ORM
Description

Expectations over the use of an ORM over database performance and best practices.

Discussion

Your ORM issue full queries for each dataset iteration

Performance is not what was expected as the data volume grows

ORMs are trying to bridge two things with distinct impedance (domain(set) of each column ->
Objects atributes)
Indiscipline
(Abusing the RDBMS)
Stoned Procedures
Description

On the use of stored procedures to hold application logic and event signaling outside of the
database context. Triggers and Stored procedures combined to orchestrate applications

Discussion

Stored Procedures and Triggers has the magic property of vanishing of our memories and
being impossible to keep versioned. Also, they are hard to test properly.

Event handling on different architecture piece

Bundled Map/Reduce as stoned procedures for non-RDBMS ?
Distributed Global Locking
Description

On trying to emulate JAVA's synchronize in a distributed manner, initially as a ref counter

Example

> select COALESCE(GET_LOCK('my_lock',0 ),0 )

Discussion

RDBMS are not the proper place to locking (or ref counters, which may act as soft
locks)

Embedded in a magic classes called DistributedSynchronize, ClusterSemaphore,
DistributedCluster, SynchronizedSemaphore
Throttle Control
Description

On delegating to the database the control and access tracking to a given resource

Example

A sequence of statements is issued, varying from a update...select to a transaction block
using a stored procedure:

> select count(access) from throttle_ctl where ip_addr like ...
> update .... or begin ... commit
Throttle Control
Discussion

For each request would have to check on a transaction block.

Usually mixed with a table-based ACL.

Using memcached (or any other k/v store) to add and set expire time:

if (add 'IPADDR:YYYY:MM:DD:HH', 1) < your_limit: do_stuff()
Theela Hun Ginjeet
   (A word about Ops)
A word about operations
The obligatory quote ritual




"Without time constraints, any migration is easy, regardless data size"
   (Out of production environment, events flow in a different time)
Email Data stored on Databases
•  Email traffic accounting ~4500 msgs/sec in, ~2000 msgs out

•  10 GB SPAM/NOT SPAM tokenized data per day

•  ~ 300 GB logfile data/day

•  80 Billion DNS queries per day / 3.1 Gb/s traffic

•  ~1k req/sec tcptable queries (message routing)

•  0.8 PB of data migrated over mixed quality network.

•  Application logs on MySQL / High concurrent data on PostgreSQL
Fail seven times, stand up eight
The Sheltering Sky
(Architecture and Migration)
Cache

Think if the data you use aren't already de-normalized somewhere (cached)

Are you dependent on cache ? Does your application fails when there is no warm cache ?
Does it just slows down ? MTTR ?
Migration

How to put and get data back from the database ?
SQL queries, CSV, binary file, HTTP, Protocol Buffers

How to change the schema ? Is there a schema ?

Rough calculation to stream 1 Terabyte over a 1 Gigabit/sec link, assuming you have a
dedicated network segment for that and constant throughput of 70% (~ 716.8 Mbps/sec):
3h15m

Not considering:

•  protocol overhead
•  concurrent traffic (live migrations)
•  indexing
Architecture

Most of the anti-patterns signals that there are architectural issues instead of only database
issues.

Call it NoSQL, non relational, or any other name, but assemble a toolbox to deal with different
data types.
Discipline
(Data Organization)
The obligatory quote ritual




"Without operational requirements, it's a personal choice about data organization"
Data Organization - Storing
MongoDB                                                 Redis
{
 'id': 1,                                               SET book:1 {'title' : 'Diving into Python',
 'title' : 'Diving into Python',                        'author': 'Mark Pilgrim'}
 'author': 'Mark Pilgrim',                              SET book:2 { 'title' : 'Programing Erlang',
 'tags': ['python','programming', 'computing']          'author': 'Joe Armstrong'}
}                                                       SET book:3 { 'title' : 'Programing in Haskell',
                                                        'author': 'Graham Hutton'}
{
 'id':2,
 'title' : 'Programing Erlang',                         SADD tag:python 1
 'author': 'Joe Armstrong',                             SADD tag:erlang 2
 'tags': ['erlang','programming', 'computing',          SADD tag:haskell 3
'distributedcomputing', 'FP']                           SADD tag:programming 1 2 3
}                                                       SADD tag computing 1 2 3
                                                        SADD tag:distributedcomputing 2
{                                                       SADD tag:FP 2 3
 'id':3,
 'title' : 'Programing in Haskell',
 'author': 'Graham Hutton',
 'tags': ['haskell','programming', 'computing', 'FP']
}
Data Organization - Querying
MongoDB                                              Redis

Search tags for erlang or haskell:                   Search tags for erlang and haskell:

db.books.find({"tags":                               SINTER 'tag:erlang' 'tag:haskell'
     { $in: ['erlang', 'haskell']                    0 results
   }
})                                                   Search for programming and computing tags:

Search tags for erlang AND haskell (no results)      SINTER 'tag:programming' 'tag:computing'
                                                     3 results: 1, 2, 3
db.books.find({"tags":
     { $all: ['erlang', 'haskell']                   SUNION 'tag:erlang' 'tag:haskell'
   }                                                 2 results: 2 and 3
})
                                                     SDIFF 'tag:programming' 'tag:haskell'
This search yields 3 results                         2 results: 1 and 2 (haskell is excluded)
db.books.find({"tags":
     { $all: ['programming', 'computing']
   }
})
Obligatory book ref
Obligatory book ref
Questions ?
Thanks !

More Related Content

What's hot

Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 
Database Architecture and Basic Concepts
Database Architecture and Basic ConceptsDatabase Architecture and Basic Concepts
Database Architecture and Basic ConceptsTony Wong
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
Ado.net &amp; data persistence frameworks
Ado.net &amp; data persistence frameworksAdo.net &amp; data persistence frameworks
Ado.net &amp; data persistence frameworksLuis Goldster
 
How Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill SetsHow Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill SetsChad Petrovay
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedAnant Kumar
 
Introduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQLIntroduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQLArangoDB Database
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflakeSivakumar Ramar
 
Vb.net session 05
Vb.net session 05Vb.net session 05
Vb.net session 05Niit Care
 
Comparison between rdbms and nosql
Comparison between rdbms and nosqlComparison between rdbms and nosql
Comparison between rdbms and nosqlbharati k
 
Introduction to ADO.NET
Introduction to ADO.NETIntroduction to ADO.NET
Introduction to ADO.NETrchakra
 
Data warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and visionData warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and visionKlaudiia Jacome
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPMax Neunhöffer
 

What's hot (20)

Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Database Architecture and Basic Concepts
Database Architecture and Basic ConceptsDatabase Architecture and Basic Concepts
Database Architecture and Basic Concepts
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
Ado.net &amp; data persistence frameworks
Ado.net &amp; data persistence frameworksAdo.net &amp; data persistence frameworks
Ado.net &amp; data persistence frameworks
 
T-SQL Overview
T-SQL OverviewT-SQL Overview
T-SQL Overview
 
How Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill SetsHow Clean is your Database? Data Scrubbing for all Skill Sets
How Clean is your Database? Data Scrubbing for all Skill Sets
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme based
 
Introduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQLIntroduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQL
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
 
MS SQL Server
MS SQL ServerMS SQL Server
MS SQL Server
 
Vb.net session 05
Vb.net session 05Vb.net session 05
Vb.net session 05
 
Comparison between rdbms and nosql
Comparison between rdbms and nosqlComparison between rdbms and nosql
Comparison between rdbms and nosql
 
ADO.NET by ASP.NET Development Company in india
ADO.NET by ASP.NET  Development Company in indiaADO.NET by ASP.NET  Development Company in india
ADO.NET by ASP.NET Development Company in india
 
Introduction to ADO.NET
Introduction to ADO.NETIntroduction to ADO.NET
Introduction to ADO.NET
 
Discover database
Discover databaseDiscover database
Discover database
 
Data warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and visionData warehouse 2.0 and sql server architecture and vision
Data warehouse 2.0 and sql server architecture and vision
 
Chap14 ado.net
Chap14 ado.netChap14 ado.net
Chap14 ado.net
 
Ado.net
Ado.netAdo.net
Ado.net
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTP
 
NoSQL
NoSQLNoSQL
NoSQL
 

Viewers also liked

L'esprit de l'escalier
L'esprit de l'escalierL'esprit de l'escalier
L'esprit de l'escalierGleicon Moraes
 
Semi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisSemi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisGleicon Moraes
 
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...Gleicon Moraes
 
You shall not get excited
You shall not get excitedYou shall not get excited
You shall not get excitedx697272
 
Database Anti Patterns
Database Anti PatternsDatabase Anti Patterns
Database Anti PatternsRobert Treat
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaSGleicon Moraes
 

Viewers also liked (9)

L'esprit de l'escalier
L'esprit de l'escalierL'esprit de l'escalier
L'esprit de l'escalier
 
Semi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisSemi Automatic Sentiment Analysis
Semi Automatic Sentiment Analysis
 
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
 
Patterns of fail
Patterns of failPatterns of fail
Patterns of fail
 
NoSql Introduction
NoSql IntroductionNoSql Introduction
NoSql Introduction
 
You shall not get excited
You shall not get excitedYou shall not get excited
You shall not get excited
 
Locaweb cloud and sdn
Locaweb cloud and sdnLocaweb cloud and sdn
Locaweb cloud and sdn
 
Database Anti Patterns
Database Anti PatternsDatabase Anti Patterns
Database Anti Patterns
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
 

Similar to Architectural Anti Patterns - Notes on Data Distribution and Handling Failures

Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handlingGleicon Moraes
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingGleicon Moraes
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data ManagementCloudbells.com
 
NoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseNoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseMohammad Shaker
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, HowIgor Moochnick
 
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...Bill Wilder
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Design Considerations For Storing With Windows Azure
Design Considerations For Storing With Windows AzureDesign Considerations For Storing With Windows Azure
Design Considerations For Storing With Windows AzureEric Nelson
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesUlf Wendel
 
05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.pptAnandKonj1
 
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'sankarapu posibabu
 
No SQL Databases.ppt
No SQL Databases.pptNo SQL Databases.ppt
No SQL Databases.pptssuser8c8fc1
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designCalpont
 

Similar to Architectural Anti Patterns - Notes on Data Distribution and Handling Failures (20)

Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handling
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handling
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
7 data management design
7 data management design7 data management design
7 data management design
 
No sql
No sqlNo sql
No sql
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
NoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseNoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to Couchbase
 
No sq lv2
No sq lv2No sq lv2
No sq lv2
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
Cloud Architecture Patterns for Mere Mortals - Bill Wilder - Vermont Code Cam...
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Design Considerations For Storing With Windows Azure
Design Considerations For Storing With Windows AzureDesign Considerations For Storing With Windows Azure
Design Considerations For Storing With Windows Azure
 
Data massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodesData massage: How databases have been scaled from one to one million nodes
Data massage: How databases have been scaled from one to one million nodes
 
05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt
 
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
No SQL Databases sdfghjkl;sdfghjkl;sdfghjkl;'
 
No SQL Databases.ppt
No SQL Databases.pptNo SQL Databases.ppt
No SQL Databases.ppt
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 

More from Gleicon Moraes

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebramGleicon Moraes
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsGleicon Moraes
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...Gleicon Moraes
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Gleicon Moraes
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityGleicon Moraes
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by AccidentGleicon Moraes
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Gleicon Moraes
 
RestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueRestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueGleicon Moraes
 
NoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsNoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsGleicon Moraes
 

More from Gleicon Moraes (11)

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebram
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devops
 
API Gateway report
API Gateway reportAPI Gateway report
API Gateway report
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs Scalability
 
Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by Accident
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
 
RestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueRestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message Queue
 
NoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsNoSQL and SQL Anti Patterns
NoSQL and SQL Anti Patterns
 
Redis
RedisRedis
Redis
 

Recently uploaded

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Architectural Anti Patterns - Notes on Data Distribution and Handling Failures

  • 1. Architectural Anti Patterns Notes on Data Distribution and Handling Failures Gleicon Moraes
  • 2. $whoami work: http://locaweb.com site: http://7co.cc blog: http://zenmachine.wordpress.com code: http://github.com/gleicon RestMQ: http://restmq.com twitter: @gleicon
  • 3. Motivation • User group meetings with friends to NoSQL:br, QConSP and now OSCon Data ! • Consulting cases • Large volumes of data / High concurrent access • Distribution and Disaster recovery • Architecture • Operations • Live migrations
  • 4. Required Listening Tracklist   Elephant Talk   Frame by Frame   Matte Kudasai (please wait)   Indiscipline   Thela Hun Ginjeet (heat in the jungle)   The Sheltering Sky   Discipline   Matte Kudasai (Bonus Track – alternate mix)
  • 5. Agenda   Elephant Talk - Why Anti Patterns ?   Frame by Frame - DML/DDL Anti Patterns   Matte Kudasai (please wait) - Architectural Anti Patterns   Indiscipline - Abusing the RDBMS   Thela Hun Ginjeet (Heat in the jungle) - A word about Ops   The Sheltering Sky - Architecture and Migration   Discipline - The discussion over data organization + Questions
  • 7. Why Anti Patterns Architectural Anti Patterns for Data Handling or 'Please don't do it with [MySQL|PGSQL| Oracle|SQLServer]' or 'Things that you shouldn't use a RDBMS for'.
  • 8. Still, why Anti Patterns ?   Mostly result of innocent assumptions   You catch them at the server crashing or customer complaints   NoSQL databases are not safe from it. Mistaken assumptions are like water, finds a way around everything.
  • 9. Quick Question “What is the main MySQL tuning technique used in panic situations ?”
  • 10. Quick Question “What is the main MySQL tuning technique used in panic situations ?” A: To install PostgreSQL
  • 11. RDBMS Anti Patterns News Flash: Not everything fits on a RDBMS. The Great Architectural Staple (tm), used as: •  Message Queues •  Job Schedulers •  Distributed Counters •  Distributed Lock •  Document Storage •  Filesystem •  Cache •  Logger •  Workflow Orchestrator
  • 12. Frame by Frame (DML/DDL Anti Patterns)
  • 13. The Eternal Tree Description Queries around a single table, self referencing an id or key. Example Most threaded discussion examples uses something like a table which contains all threads and answers, relating to each other by an id. Discussion At any given point the application code will contain its own tree-traversal algorithm to manage this. Also most cases have a limit in depth for performance reasons. Any kind of sharding requires that all comments be at the same instance.
  • 14. The Eternal Tree Table ID Parent-ID Author Text 1 0 gleicon Hello World 2 1 elvis Shout! Same data stored as a document { thread_id:1, title: 'the meeting', author: 'gleicon', replies:[ { 'author': elvis, text:'shout', replies:[{...}] } ] } The same sharding and indexing issues still apply.
  • 15. Dynamic Table Creation Discussion To avoid huge tables, one must come with a "dynamic schema". Example A Document storage/management company, which is adding new facilities over the country. For each storage facility, a new table is created, structured like this: Item_id Row Column Description 1 10 20 Cat food 2 12 32 Trout Discussion To query the total capacity of a set of facilities over a given state, one query to an index table would have to be issued, and then one query for each facility on its own table.
  • 16. The Alignment Description Extra columns are created but not used immediately. Usually they are named as a1, a2, a3, a4 and called padding. Example Data with variable attributes or highly populated tables which broke on migrations when adding an extra column in a 150M lines database and it took 2 days to run an ALTER TABLE Discussion Having no used columns might feel hackish as when one used padding on ANSI C structs, but probably the best alternative to variable attributes is a document based database. Specially if the attributes are local to each document
  • 17. Hierarchical Sharding Description Using databases to shard data inside the same server/instance. Example Email accounts management. Distinct databases inside a RDBMS ranging from A to Z, each database has tables for users starting with the proper letter. Each table has user data. > show databases; abcdefghijklmnopqrstuwxz > use a > show tables; ...alberto alice alma ... (and a lot more)
  • 18. Hierarchical Sharding Discussion There is no way to query anything in common for all users without application side processing. JOINs are not possible. RDBMS have all tools to deal with this particular case of 'different clients and their data' The letters A and M (at least for Brazilian Portuguese) will alone have more names than the rest.
  • 19. Embedded Lists Description As data complexity grows, one thinks that it's proper for the application to handle different data structures embedded in a single cell or row. Example Flags for feature administration > select group_field from that_email_sharded_database.user "a@email1.net, b@email1.net,c@email2.net" > select flags from email_feature_admin where id = 2 1|0|1|1|0|
  • 20. Embedded Lists Discussion The popular 'Let's use commas to separe it'. You may find distinct separators as |, -, [] and so on. Embedding CSV fields would fall on this category Apart from space optimization, a very common argument is to save space. A proper alternative is a clone of proper SETS and LISTS on K/V stores as Redis, acting as a property storage. Or even a row for each flag.
  • 22. Heroes are hard to find
  • 23. Table as Cache Description A workaround for complex queries which stores resultsets in a different table using the query - or part of it as a key. Discussion Complex and recurrent queries that demand the result to be stored in a separated table, so they result can be retrieved quickly. Queries with many JOINs or complex data dictionaries, in late stage development or legacy patching. Also, it de-normalizes the data and demands cache maintenance as purging and invalidation. In-memory cache vs DB backed cache access time
  • 24. Table as Queue Description The use of a table, or more than one table, to hold messages to be processed, instead of a Message Broker. Discussion Table/views with rows ordered by time of creation Demands application polling and cleaning Race condition when more than one client is polling Wrapped on stored procedures/triggers to signal new items BONUS: global procedure to generate job/message IDs
  • 25. Table as Log File Description Logs written straight to one or more tables fully or in a round-robin fashion. Discussion Concurrent access in the same schema than application data From time to time it needs to be purged/truncated Ordered/Indexed by date, to avoid grep'ing log files Usually application/transaction log
  • 26. Scheme only fits in an A0 sheet Description High specialized and normalized data dictionaries Example The product description is local to a 'child' table, but the main expiration ref is represented by the main product table. Main_Product_Table ID Dept Due_date Expires_in Product_A Product_B ID Desc ID Desc
  • 27. Scheme only fits in an A0 sheet Discussion Extreme specialization creates tables which converges to key/value model. To model the data scheme after modeling the applications objects The normal form get priority over common sense to avoid de-normalization Extreme (as in extreme sports) JOINs required and views to hide the complexity.
  • 28. Your ORM Description Expectations over the use of an ORM over database performance and best practices. Discussion Your ORM issue full queries for each dataset iteration Performance is not what was expected as the data volume grows ORMs are trying to bridge two things with distinct impedance (domain(set) of each column -> Objects atributes)
  • 30. Stoned Procedures Description On the use of stored procedures to hold application logic and event signaling outside of the database context. Triggers and Stored procedures combined to orchestrate applications Discussion Stored Procedures and Triggers has the magic property of vanishing of our memories and being impossible to keep versioned. Also, they are hard to test properly. Event handling on different architecture piece Bundled Map/Reduce as stoned procedures for non-RDBMS ?
  • 31. Distributed Global Locking Description On trying to emulate JAVA's synchronize in a distributed manner, initially as a ref counter Example > select COALESCE(GET_LOCK('my_lock',0 ),0 ) Discussion RDBMS are not the proper place to locking (or ref counters, which may act as soft locks) Embedded in a magic classes called DistributedSynchronize, ClusterSemaphore, DistributedCluster, SynchronizedSemaphore
  • 32. Throttle Control Description On delegating to the database the control and access tracking to a given resource Example A sequence of statements is issued, varying from a update...select to a transaction block using a stored procedure: > select count(access) from throttle_ctl where ip_addr like ... > update .... or begin ... commit
  • 33. Throttle Control Discussion For each request would have to check on a transaction block. Usually mixed with a table-based ACL. Using memcached (or any other k/v store) to add and set expire time: if (add 'IPADDR:YYYY:MM:DD:HH', 1) < your_limit: do_stuff()
  • 34. Theela Hun Ginjeet (A word about Ops)
  • 35. A word about operations
  • 36. The obligatory quote ritual "Without time constraints, any migration is easy, regardless data size" (Out of production environment, events flow in a different time)
  • 37. Email Data stored on Databases •  Email traffic accounting ~4500 msgs/sec in, ~2000 msgs out •  10 GB SPAM/NOT SPAM tokenized data per day •  ~ 300 GB logfile data/day •  80 Billion DNS queries per day / 3.1 Gb/s traffic •  ~1k req/sec tcptable queries (message routing) •  0.8 PB of data migrated over mixed quality network. •  Application logs on MySQL / High concurrent data on PostgreSQL
  • 38. Fail seven times, stand up eight
  • 40. Cache Think if the data you use aren't already de-normalized somewhere (cached) Are you dependent on cache ? Does your application fails when there is no warm cache ? Does it just slows down ? MTTR ?
  • 41. Migration How to put and get data back from the database ? SQL queries, CSV, binary file, HTTP, Protocol Buffers How to change the schema ? Is there a schema ? Rough calculation to stream 1 Terabyte over a 1 Gigabit/sec link, assuming you have a dedicated network segment for that and constant throughput of 70% (~ 716.8 Mbps/sec): 3h15m Not considering: •  protocol overhead •  concurrent traffic (live migrations) •  indexing
  • 42. Architecture Most of the anti-patterns signals that there are architectural issues instead of only database issues. Call it NoSQL, non relational, or any other name, but assemble a toolbox to deal with different data types.
  • 44. The obligatory quote ritual "Without operational requirements, it's a personal choice about data organization"
  • 45. Data Organization - Storing MongoDB Redis { 'id': 1, SET book:1 {'title' : 'Diving into Python', 'title' : 'Diving into Python', 'author': 'Mark Pilgrim'} 'author': 'Mark Pilgrim', SET book:2 { 'title' : 'Programing Erlang', 'tags': ['python','programming', 'computing'] 'author': 'Joe Armstrong'} } SET book:3 { 'title' : 'Programing in Haskell', 'author': 'Graham Hutton'} { 'id':2, 'title' : 'Programing Erlang', SADD tag:python 1 'author': 'Joe Armstrong', SADD tag:erlang 2 'tags': ['erlang','programming', 'computing', SADD tag:haskell 3 'distributedcomputing', 'FP'] SADD tag:programming 1 2 3 } SADD tag computing 1 2 3 SADD tag:distributedcomputing 2 { SADD tag:FP 2 3 'id':3, 'title' : 'Programing in Haskell', 'author': 'Graham Hutton', 'tags': ['haskell','programming', 'computing', 'FP'] }
  • 46. Data Organization - Querying MongoDB Redis Search tags for erlang or haskell: Search tags for erlang and haskell: db.books.find({"tags": SINTER 'tag:erlang' 'tag:haskell' { $in: ['erlang', 'haskell'] 0 results } }) Search for programming and computing tags: Search tags for erlang AND haskell (no results) SINTER 'tag:programming' 'tag:computing' 3 results: 1, 2, 3 db.books.find({"tags": { $all: ['erlang', 'haskell'] SUNION 'tag:erlang' 'tag:haskell' } 2 results: 2 and 3 }) SDIFF 'tag:programming' 'tag:haskell' This search yields 3 results 2 results: 1 and 2 (haskell is excluded) db.books.find({"tags": { $all: ['programming', 'computing'] } })