9. over 40% increase from last year in QPS (25K last year)
additional 30K QPS moving over from Postgres
10. 4TB InnoDB Buffer Pool
1/3 of RAM is not dedicated to the pool (OS, disk, network buffers, etc.)
17. DATA
sync prod to dev, until prod data gets too big
http://www.flickr.com/photos/uwwresnet/6280880034/sizes/l/in/photostream/
18. Some Approaches
subsets of data
subsets have to end somewhere (a shop has favorites that are connected to people, connected to shops, etc.)
generated data can be time-consuming to fake
22. Edge Cases
what about testing edge cases and difficult-to-diagnose bugs?
it's hard to model the same data set that produced a user-facing bug
http://www.flickr.com/photos/sovietuk/141381675/sizes/l/in/photostream/
23. Perspective
another issue is testing problems at scale: complex and large gobs of data
a real social network ecosystem can be difficult to generate (favorites, follows)
(activity feed and “similar items” search give better results)
http://www.flickr.com/photos/donsolo/2136923757/sizes/l/in/photostream/
24. Prod Dev?
what most people do before data gets too big
almost 2 days to sync 20TB over a 1Gbps link, about 5 hrs over 10Gbps
bringing the prod dataset to dev meant expensive hardware and maintenance, keeping parity with prod, and applying schema changes would take at least as long
25. Use Production
(sometimes)
so we did what we saw as the last resort: we used production
not for greenfield development, more for mature features and diagnosing bugs
we still have a dev database, but the data is sparse and unreliable
27. goes without saying this can be dangerous
it's also difficult to do right; we've been working on this for a year
http://www.flickr.com/photos/stuckincustoms/432361985/sizes/l/in/photostream/
53. dangerous/unnecessary queries
(DEV) etsy_rw@jgoulah [test]> select * from fred_test;
ERROR 9001 (E9001): Selects from tables must have where clauses
-- filter dangerous queries (queries without a WHERE)
-- remove unnecessary queries (instead of DELETE, use a flag; ALTER statements don't run from dev)
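The filter described above could be sketched roughly like this. This is an illustrative approximation, not Etsy's actual implementation; the function name, regexes, and error strings are assumptions modeled on the error shown on the slide.

```python
import re

# Dev-side query filter sketch: reject SELECT/DELETE without a WHERE
# clause, refuse ALTER entirely. Hypothetical names and messages.
DANGEROUS = re.compile(r"^\s*(SELECT|DELETE)\b", re.IGNORECASE)
HAS_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)
BLOCKED = re.compile(r"^\s*ALTER\b", re.IGNORECASE)

def check_query(sql):
    """Return None if the query may run from dev, else an error string."""
    if BLOCKED.match(sql):
        return "ERROR: ALTER statements do not run from dev"
    m = DANGEROUS.match(sql)
    if m and not HAS_WHERE.search(sql):
        return "ERROR 9001 (E9001): %ss from tables must have where clauses" % m.group(1).capitalize()
    return None
```

In a real setup this check would sit in the database access layer, in front of every statement leaving a dev host.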
54. known in/egress funnel
we know where all of the queries from dev originate
http://www.flickr.com/photos/medevac71/4875526920/sizes/l/in/photostream/
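One way to make query origins knowable is to tag every statement with a tracing comment identifying the request, shard, and calling script; the format below mirrors the comment visible in the relay-log example on slide 100, but the function and its signature are assumptions, not Etsy's code.

```python
# Sketch: prepend an origin-tracing comment to every query so statements
# reaching prod from dev can always be traced back to their source.
# Format modeled on the /* [request] [shard] [script] */ comment seen
# in the relay-log example; the helper itself is hypothetical.
def tag_query(sql, request_id, shard, script):
    return "/* [%s] [%s] [%s] */ %s" % (request_id, shard, script, sql)
```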
60. stealth data
hiding data from users
(favorites go on the dev and prod shard; making sure test users/shops don't show up)
http://www.flickr.com/photos/davidyuweb/8063097077/sizes/h/in/photostream/
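The read-side half of the stealth-data idea might look roughly like this: test accounts living in prod are known, and display paths filter them out. The flag set and function are entirely hypothetical.

```python
# Sketch: keep a set of known test-account ids and drop their rows
# from anything user-facing, so stealth data never shows up in prod.
# Ids and names here are made up for illustration.
STEALTH_USER_IDS = {42, 1001}

def visible_favorites(favorites):
    """Filter out favorites belonging to stealth (test) users."""
    return [f for f in favorites if f["user_id"] not in STEALTH_USER_IDS]
```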
84. Delayed Slaves
pt-slave-delay watches a slave and starts and stops its replication SQL thread as necessary to hold it behind the master
http://www.flickr.com/photos/xploded/141295823/sizes/o/in/photostream/
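A typical invocation might look like the following; the hostname is a placeholder, and the delay/interval values are examples rather than the ones used in this deployment.

```shell
# Hold the slave's SQL thread so it stays 4 hours behind the master,
# checking once per minute (hostname is a placeholder).
pt-slave-delay --delay 4h --interval 1m h=dbslave-delayed
```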
85. Delayed Slaves
role of the delayed slave
also a source for BCP
(business continuity planning: prevention of and recovery from threats)
99. > SHOW SLAVE STATUS
Relay_Log_File: dbslave-relay.007178
Relay_Log_Pos: 8666654
on delayed slave
get the relay position
100. mysql> show relaylog events in "dbslave-relay.007178" from 8666654 limit 1\G
*************************** 1. row *******************
Log_name: dbslave-relay.007178
Pos: 8666654
Event_type: Query
Server_id: 1016572
End_log_pos: 8666565
Info: use `etsy_shard`; /*
[CVmkWxhD7gsatX8hLbkDoHk29iKo] [etsy_shard_001_B] [/
your/activity/index.php] */ UPDATE `news_feed_stats`
SET `time_last_viewed` = 1366406780, `update_time` =
1366406780 WHERE `owner_id` = 30793071 AND
`owner_type_id` = 2 AND `feed_type` = 'owner'
2 rows in set (0.00 sec)
on delayed slave
show relaylog events shows statements from the relay log
pass the relay log and position to start from
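The Info field above carries the query plus its tracing comment; pulling the metadata back out could look like this. The regex is an assumption about the comment format, based only on the example shown on this slide.

```python
import re

# Sketch: extract the /* [request_id] [shard] [script] */ tracing
# metadata from the Info field of a relay-log Query event.
INFO_COMMENT = re.compile(
    r"/\*\s*\[([^\]]*)\]\s*\[([^\]]*)\]\s*\[([^\]]*)\]\s*\*/\s*(.*)",
    re.DOTALL,
)

def parse_info(info):
    """Return a dict of tracing fields plus the SQL, or None if untagged."""
    m = INFO_COMMENT.search(info)
    if not m:
        return None
    request_id, shard, script, sql = m.groups()
    return {"request_id": request_id, "shard": shard,
            "script": script, "sql": sql}
```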
101. filter bad queries
cycle through all the logs, analyzing Query events
Rotate events point to the next log file
the last relay log points to the master's binlog (its server_id is the master's, and the binlog coordinates match master_log_file/pos)
http://www.flickr.com/photos/chriswaits/6607823843/sizes/l/in/photostream/
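The walk the slide describes can be sketched in pure Python over mocked event rows; in reality the events would come from repeated SHOW RELAYLOG EVENTS calls, and all names here are illustrative.

```python
# Sketch: cycle through relay logs, collecting Query events and
# following Rotate events to the next log file. When the final Rotate
# points at the master's binlog (a name we don't hold locally), the
# walk naturally stops -- matching "last relay log points to binlog
# master". Events are mocked as (event_type, server_id, info) tuples.
def collect_queries(logs, start_log):
    """logs: dict mapping relay-log name -> list of (type, server_id, info)."""
    queries, log = [], start_log
    while log in logs:
        next_log = None
        for ev_type, server_id, info in logs[log]:
            if ev_type == "Query":
                queries.append(info)
            elif ev_type == "Rotate":
                next_log = info  # Rotate's info names the next log file
        if next_log == log:      # guard against self-reference
            break
        log = next_log
    return queries
```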
110. testing "dry" writes
testing how the application runs a "dry" write --
in read-only mode, an exception is thrown with the exact query it would have attempted to run, the values it tried to use, etc.
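The mechanism above might be approximated like this; the class and method names are assumptions, not Etsy's actual DB layer.

```python
# Sketch of a "dry" write: in read-only mode the connection, instead of
# executing a write, raises an exception carrying the exact query and
# bind values it would have run.
class DryWriteAttempt(Exception):
    def __init__(self, sql, params):
        super().__init__("dry write: %s with %r" % (sql, params))
        self.sql, self.params = sql, params

class Connection:
    def __init__(self, read_only=True):
        self.read_only = read_only

    def execute(self, sql, params=()):
        if self.read_only and not sql.lstrip().upper().startswith("SELECT"):
            raise DryWriteAttempt(sql, params)
        # ...real execution would happen here...
```

Catching DryWriteAttempt in dev shows exactly what the application would have written, without touching the data.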
111. search ads campaign consistency
starting campaigns and maintaining consistency for the entire ad system is nearly impossible in dev
search ads data is stored in more than a dozen DB tables, and state changes are driven by a combination of browsers triggering ads, sellers managing their campaigns, and a slew of crons running anywhere from once per 5 minutes to once a month
e.g. to test pausing campaigns that run out of money mid-day, we can pull large numbers of campaigns from prod and operate on those to verify that the data will still be consistent
112. google product listing ads
GPLA is where we syndicate our listings to google to be used in google product search ads
we can test edge cases in GPLA syndication where it would be difficult to recreate the state in dev
113. testing prototypes
features like similar items search give better results in production because of the amount of data
this allowed us to test the quality of the listings a prototype was displaying
114. performance testing
need a real data set to test pages like treasury search with lots of threads/avatars/etc.
the dev data is too sparse: xhprof traces don't mean anything, and missing avatars change perf characteristics
115. hadoop generated datasets
datasets produced from hadoop (recommendations for users, or statistics about usage)
but since hadoop is prod data, it's for prod users/listings/shops, so we have to check against prod
syncing to dev would fill the dev dbs and the data wouldn't line up (because it's prod data)