Replacing Datacenter Oracle with Global Apache Cassandra on AWS

July 11, 2011
Adrian Cockcroft
@adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
  
Netflix Inc.

With more than 23 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows.

International Expansion

We plan to expand into an additional market in the second half of 2011… If the second market meets our expectations… we will continue to invest and expand aggressively in 2012.

Source: http://ir.netflix.com
  
Building a Global Netflix Service

Netflix Cloud Migration
Data Migration to Cassandra
Highly Available and Globally Distributed Data
Backups and Archives in the Cloud
Monitoring Cassandra
Contributions and Organization
  
Why Use Public Cloud?

Frictionless Deployment (JFDI)

Things We Don’t Do

Better Business Agility
  
Data Center

Netflix could not build new datacenters fast enough

Capacity growth is accelerating, unpredictable
Product launch spikes - iPhone, Wii, PS3, XBox
  
23 Million Customers

2011-Q1 year/year customers +69%

[Chart: customers by quarter, 2009Q2 to 2011Q1; vertical axis 0 to 25]

Source: http://ir.netflix.com
  
Out-Growing Data Center
http://techblog.netflix.com/2011/02/redesigning-netflix-api.html

37x Growth Jan 2010-Jan 2011

[Chart label: Datacenter Capacity]
  
Netflix.com is now ~100% Cloud

Account sign-up is currently being moved to cloud
All international product is cloud based
USA specific logistics remains in the Datacenter
  	
  
Netflix Choice was AWS with our own platform and tools

Unique platform requirements and extreme agility and flexibility
  
Leverage AWS Scale
“the biggest public cloud”

AWS investment in features and automation

Use AWS zones and regions for high availability, scalability and global deployment
  
We want to use clouds, we don’t have time to build them

Public cloud for agility and scale

AWS because they are big enough to allocate thousands of instances per hour when we need to
  
Netflix Deployed on AWS

Content: Video Masters, EC2, S3, CDN
Logs: S3, EMR Hadoop, Hive, Business Intelligence
Play: DRM, CDN routing, Bookmarks, Logging
WWW: Sign-Up, Search, Movie Choosing, Ratings
API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone
  
Port to Cloud Architecture

Short term investment, long term payback!
Pay down technical debt
Robust patterns
  
Transition

•  The Goals
    –  Faster, Scalable, Available and Productive

•  Anti-patterns and Cloud Architecture
    –  The things we wanted to change and why

•  Data Migration
    –  Minimizing datacenter dependencies
  
	
  
Datacenter Anti-Patterns

What do we currently do in the datacenter that prevents us from meeting our goals?
  
                       	
  
Old Datacenter vs. New Cloud Arch

Central SQL Database          Distributed Key/Value NoSQL
Sticky In-Memory Session      Shared Memcached Session
Chatty Protocols              Latency Tolerant Protocols
Tangled Service Interfaces    Layered Service Interfaces
Instrumented Code             Instrumented Service Patterns
Fat Complex Objects           Lightweight Serializable Objects
Components as Jar Files       Components as Services
  
The Central SQL Database

•  Datacenter has central Oracle databases
    –  Everything in one place is convenient until it fails
    –  Customers, movies, history, configuration
•  Schema changes require downtime

Anti-pattern impacts scalability, availability
  
The Distributed Key-Value Store

[Image label: DBA]

•  Cloud has many key-value data stores
    –  More complex to keep track of, do backups etc.
    –  Each store is much simpler to administer
    –  Joins take place in Java code
•  No schema to change, no scheduled downtime
•  Latency for typical queries
    –  Memcached is dominated by network latency <1ms
    –  Cassandra replication takes a few milliseconds
    –  Oracle for simple queries is a few milliseconds
    –  SimpleDB replication and REST auth overheads >10ms
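
The point that joins move out of the database and into application code can be illustrated with a minimal sketch. It is written in Python for brevity rather than Java, the two dictionaries merely stand in for separate key-value stores, and the names `customers`, `viewing_history` and `join_history` are hypothetical, not taken from the talk.

```python
# Hypothetical stand-ins for two independent key-value stores.
customers = {
    "c1": {"name": "Alice"},
    "c2": {"name": "Bob"},
}
viewing_history = {
    "c1": ["Movie A", "Movie B"],
    "c3": ["Movie C"],
}


def join_history(customer_ids):
    """Combine customer records with viewing history, one lookup per store.

    This plays the role of a SQL inner join, done in application code.
    """
    rows = []
    for cid in customer_ids:
        customer = customers.get(cid)
        titles = viewing_history.get(cid)
        if customer is None or titles is None:
            continue  # inner-join semantics: skip keys missing on either side
        for title in titles:
            rows.append((cid, customer["name"], title))
    return rows


result = join_history(["c1", "c2", "c3"])
```

Only `c1` exists in both stores, so it is the only customer that contributes rows. The cost of this approach is one round trip per store per key, which is why the per-store latencies listed above matter.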
  
Data Migration to Cassandra

Transitional Steps

•  Bidirectional Replication
    –  Oracle to SimpleDB
    –  Queued reverse path using SQS
    –  Backups remain in Datacenter via Oracle
•  New Cloud-Only Data Sources
    –  Cassandra based
    –  No replication to Datacenter
    –  Backups performed in the cloud
  
API

[Diagram labels: AWS EC2; Front End Load Balancer; Discovery Service; API Proxy; API etc.; Load Balancer; Component Services; API; SQS; Oracle (x3); Cassandra; memcached (x2); Replication; EC2 Internal Disks; S3; SimpleDB; Netflix Data Center]
  
Cutting the Umbilical

•  Transition Oracle Data Sources to Cassandra
    –  Offload Datacenter Oracle hardware
    –  Free up capacity for growth of remaining services
•  Transition SimpleDB+Memcached to Cassandra
    –  Primary data sources that need backup
    –  Keep simple use cases like configuration service
•  New challenges
    –  Backup, restore, archive, business continuity
    –  Business Intelligence integration
  
API

[Diagram labels: AWS EC2; Front End Load Balancer; Discovery Service; API Proxy; Load Balancer; Component Services; API; memcached; Cassandra; EC2 Internal Disks; Backup; S3; SimpleDB]
  
High Availability

•  Cassandra stores 3 local copies, 1 per zone
    –  Synchronous access, durable, highly available
    –  Read/Write One fastest, least consistent - ~1ms
    –  Read/Write Quorum 2 of 3, consistent - ~3ms
•  AWS Availability Zones
    –  Separate buildings
    –  Separate power etc.
    –  Close together
	
  
Remote Copies

•  Cassandra duplicates across AWS regions
    –  Asynchronous write, replicates at destination
    –  Doesn’t directly affect local read/write latency
•  Global Coverage
    –  Business agility
    –  Follow AWS…
•  Local Access
    –  Better latency
    –  Fault Isolation

[Diagram: four locations, each labeled 3]
  
    	
  
Cassandra Backup

[Diagram: ring of Cassandra nodes with S3 Backup]

•  Full Backup
    –  Cron on each node
    –  Snapshot -> tar.gz -> S3
•  Incremental
    –  SSTable write triggers copy to S3
•  Continuous
    –  Scrape commit log
    –  Write to EBS every 30s
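
The "Snapshot -> tar.gz -> S3" step of the full backup can be sketched as below. This is an illustration under stated assumptions, not the tooling used in the talk: taking the Cassandra snapshot itself and the real S3 upload are outside the sketch, `upload` is a hypothetical callable supplied by the caller, and the demo uses a fake snapshot directory.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_snapshot(snapshot_dir: Path, out_dir: Path) -> Path:
    """Pack a snapshot directory into <name>.tar.gz and return the archive path."""
    archive = out_dir / f"{snapshot_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(snapshot_dir, arcname=snapshot_dir.name)
    return archive


def backup(snapshot_dir: Path, out_dir: Path, upload) -> Path:
    """Archive the snapshot, then hand the archive to `upload` (hypothetical S3 step)."""
    archive = archive_snapshot(snapshot_dir, out_dir)
    upload(archive)
    return archive


# Demo with a fake snapshot and a stub uploader that only records the path.
uploaded = []
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "snapshot-demo"
    snap.mkdir()
    (snap / "data.db").write_bytes(b"sstable bytes")
    archive = backup(snap, Path(tmp), uploaded.append)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
```

Passing the uploader in as a parameter keeps the archive step testable without cloud credentials; a cron entry on each node would call `backup` with a real upload function.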
  
Cassandra Restore

[Diagram: ring of Cassandra nodes with S3 Backup]

•  Full Restore
    –  Replace previous data
•  New Ring from Backup
    –  New name old data
    –  One line command!
  
Cassandra Data Extraction

[Diagram: ring of Brisk nodes with S3 Backup]

•  Business Intelligence
    –  Re-normalize data using Hadoop job
•  Daily Extraction
    –  Create Brisk ring
    –  Extract backup
    –  Run Hadoop job
    –  Remove Brisk ring
    –  Under 1hr…
  
Cassandra Online BI

[Diagram: ring of Cassandra and Brisk nodes with S3 Backup]

•  Intra-Day Extraction
    –  Use split Brisk ring
    –  Size each separately
    –  Hourly Hadoop job
  
Cassandra Archive

Appropriate level of paranoia needed…

•  Archive could be un-readable
    –  Base on restored S3 backup and BI extracted data
•  Archive could be stolen
    –  Encrypt archive
•  AWS East Region could have a problem
    –  Copy data to AWS West
•  Production AWS Account could have an issue
    –  Separate Archive account with no-delete S3 ACL
•  AWS S3 could have a global problem
    –  Create an extra copy on a different cloud vendor
  
Tools and Automation

•  Developer and Build Tools
    –  Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
    –  Builds, creates .war file, .rpm, bakes AMI and launches

•  Custom Netflix Application Console
    –  AWS Features at Enterprise Scale (hide the AWS security keys!)
    –  Auto Scaler Group is unit of deployment to production

•  Open Source + Support
    –  Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS
    –  DataStax support for Cassandra, AWS support for Hadoop via EMR

•  Monitoring Tools
    –  DataStax OpsCenter for monitoring Cassandra
    –  AppDynamics – Developer focus for cloud http://appdynamics.com
  
Developer Migration

•  Detailed SQL to NoSQL Transition Advice
    –  Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems
    –  Blog - http://practicalcloudcomputing.com/
    –  Download Paper PDF - http://bit.ly/bhOTLu
•  Mark Atwood, "Guide to NoSQL, redux"
    –  YouTube http://youtu.be/zAbFRiyT3LU
  
Cloud Operations

Cassandra Use Cases
Model Driven Architecture
Capacity Planning & Monitoring
Chaos Monkey
  
Cassandra Use Cases

•  Key by Customer
    –  Several separate Cassandra rings, read-intensive
    –  Sized to fit in memory using m2.4xl Instances
•  Key by Customer:Movie – e.g. Viewing History
    –  Growing fast, write intensive – m1.xl instances
    –  Sized to hold hot data in memory only
•  Large scale data logging – lots of writes
    –  Column data expires after time period
    –  Working on using distributed counters…
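
Two of the ideas above, a composite Customer:Movie key and column data that expires after a time period, can be sketched with an in-memory stand-in. This is not Cassandra client code; `row_key` and `ExpiringColumns` are hypothetical names, and the injected clock exists only so the expiry can be shown deterministically.

```python
import time


def row_key(customer_id: str, movie_id: str) -> str:
    """Composite 'customer:movie' key, as in the viewing-history example."""
    return f"{customer_id}:{movie_id}"


class ExpiringColumns:
    """In-memory stand-in for columns that expire after a time period."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}

    def put(self, key, value, ttl_seconds):
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired data on read
            return None
        return value


now = [1000.0]  # fake clock so the demo does not need to sleep
store = ExpiringColumns(clock=lambda: now[0])
key = row_key("cust42", "movie7")
store.put(key, "position=00:31:10", ttl_seconds=60)
fresh = store.get(key)
now[0] += 61
expired = store.get(key)
```

The value is readable while its time-to-live lasts and disappears afterwards without an explicit delete, which is what makes expiry attractive for write-heavy logging data.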
  
Model Driven Architecture

•  Datacenter Practices
    –  Lots of unique hand-tweaked systems
    –  Hard to enforce patterns

•  Model Driven Cloud Architecture
    –  Perforce/Ivy/Jenkins based builds for everything
    –  Every production instance is a pre-baked AMI
    –  Every application is managed by an Autoscaler

Every change is a new AMI
  
Netflix Platform Cassandra AMI

•  Tomcat server
    –  Always running, registers with platform
    –  Manages Cassandra state, tokens, backups
•  SimpleDB configuration
    –  Stores token slots and options
    –  Avoids circular “bootstrap problems”
•  Removed Root Disk Dependency on EBS
    –  Use S3 backed AMI for stateful services
    –  Normally use EBS backed AMI for fast provisioning
  
Netflix App Console

Auto Scale Group Configuration
  
Chaos Monkey

•  Make sure systems are resilient
    –  Allow any instance to fail without customer impact
•  Chaos Monkey hours
    –  Monday-Thursday 9am-3pm random instance kill
•  Application configuration option
    –  Apps now have to opt-out from Chaos Monkey
•  Computers (Datacenter or AWS) randomly die
    –  Fact of life, but too infrequent to test resiliency
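
The rules above (a Monday-Thursday 9am-3pm window, a random instance kill, opt-out per application) can be expressed as a small selection function. This is a sketch of the policy only, not Netflix's implementation: the data shapes and the names `in_chaos_hours` and `pick_victim` are assumptions, and the actual termination call is left out.

```python
import random
from datetime import datetime


def in_chaos_hours(now: datetime) -> bool:
    """Monday-Thursday, 9am up to (not including) 3pm."""
    return now.weekday() <= 3 and 9 <= now.hour < 15


def pick_victim(instances, opted_out_apps, now, rng=random):
    """Return one instance id to terminate, or None if nothing is eligible.

    `instances` maps instance id -> application name (hypothetical shape).
    """
    if not in_chaos_hours(now):
        return None
    eligible = sorted(i for i, app in instances.items() if app not in opted_out_apps)
    return rng.choice(eligible) if eligible else None


fleet = {"i-1": "api", "i-2": "api", "i-3": "billing"}
tuesday_10am = datetime(2011, 7, 12, 10, 0)
saturday_10am = datetime(2011, 7, 16, 10, 0)
victim = pick_victim(fleet, {"billing"}, tuesday_10am, random.Random(0))
skipped = pick_victim(fleet, {"billing"}, saturday_10am)
```

Inside the window the function returns one of the `api` instances and never the opted-out `billing` instance; outside the window it returns `None`. Restricting kills to staffed hours means failures surface while engineers are available to respond.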
  
Capacity Planning & Monitoring

Capacity Planning in Clouds
(a few things have changed…)

•  Capacity is expensive
•  Capacity takes time to buy and provision
•  Capacity only increases, can’t be shrunk easily
•  Capacity comes in big chunks, paid up front
•  Planning errors can cause big problems
•  Systems are clearly defined assets
•  Systems can be instrumented in detail
•  Depreciate assets over 3 years (reservations!)
  
Data Sources

External Testing
•  External URL availability and latency alerts and reports – Keynote
•  Stress testing - SOASTA

Request Trace Logging
•  Netflix REST calls – Chukwa to DataOven with GUID transaction identifier
•  Generic HTTP – AppDynamics service tier aggregation, end to end tracking

Application logging
•  Tracers and counters – log4j, tracer central, Chukwa to DataOven
•  Trackid and Audit/Debug logging – DataOven, AppDynamics GUID cross reference

JMX Metrics
•  Application specific real time – DataStax OpsCenter, AppDynamics
•  Service and SLA percentiles – AppDynamics, Epic logged to DataOven

Tomcat and Apache logs
•  Stdout logs – S3 – DataOven
•  Standard format Access and Error logs – S3 – DataOven

JVM
•  Garbage Collection – AppDynamics
•  Memory usage, call stacks, resource/call - AppDynamics

Linux
•  System CPU/Net/RAM/Disk metrics – AppDynamics
•  SNMP metrics – Epic, Network flows – boundary.com

AWS
•  Load balancer traffic – Amazon CloudWatch, SimpleDB usage stats
•  System configuration - CPU count/speed and RAM size, overall usage - AWS
  
AppDynamics

How to look deep inside your cloud applications

•  Automatic Monitoring
    –  Base AMI bakes in all monitoring tools
    –  Outbound calls only – no discovery/polling issues
    –  Inactive instances removed after a few days

•  Incident Alarms (deviation from baseline)
    –  Business Transaction latency and error rate
    –  Alarm thresholds discover their own baseline
    –  Email contains URL to Incident Workbench UI
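
One simple way an alarm threshold can "discover its own baseline" is to compare each new measurement with the mean and standard deviation of recent history. The sketch below shows that idea only; it is an assumption for illustration and not a description of how AppDynamics computes its baselines. The name `is_deviation` and the 3-sigma default are hypothetical choices.

```python
from statistics import mean, pstdev


def is_deviation(history, value, sigmas=3.0):
    """Flag `value` if it lies more than `sigmas` standard deviations from the
    mean of `history` (the learned baseline)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > sigmas * spread


latencies_ms = [20, 22, 19, 21, 20, 18, 21, 19]
normal = is_deviation(latencies_ms, 22)
alarm = is_deviation(latencies_ms, 60)
```

Here the history averages 20 ms with a spread of a little over 1 ms, so 22 ms stays within the baseline while 60 ms raises the alarm. No threshold was configured by hand; it follows from the observed data.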
  
AppDynamics Monitoring of Cassandra – Automatic Discovery

DataStax OpsCenter
  
Netflix Contributions to Cassandra

•  Cassandra as a mutable toolkit
    –  Cassandra is in Java, pluggable, well structured
    –  Netflix has a building full of Java engineers….
•  Actual Contributions delivered in 0.8
    –  First prototype of off-heap row cache (Vijay)
    –  Incremental backup SSTable write callback
•  Work In Progress
    –  AWS integration and backup using Tomcat helper
    –  Total re-write of Hector Java client library (Eran)
  
Netflix “NoOps” Organization

Marketing & Advertising Site for Customer Acquisition
Member Site Personalization for Customer Retention

Cloud Ops Reliability Engineering: Cassandra, AWS
Database Engineering: Cassandra, AWS
Build Tools and Automation: Perforce, Jenkins, AWS
Platform Development: Cassandra, AWS
Cloud Performance: Cassandra, AWS
Cloud Solutions: Cassandra, AWS
  
Takeaway

Netflix is using Cassandra on AWS as a key infrastructure component of its globally distributed streaming product.

http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
  
Amazon Cloud Terminology Reference
     See http://aws.amazon.com/ This is not a full list of Amazon Web Service features

•  AWS – Amazon Web Services (common name for Amazon cloud)
•  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•  EC2 – Elastic Compute Cloud
    –  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
    –  Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
    –  Reserved Instances – pre-paid to reduce cost for long term usage
    –  Availability Zone – datacenter with own power and cooling hosting cloud instances
    –  Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan
•  ASG – Auto Scaling Group (instances booting from the same AMI)
•  S3 – Simple Storage Service (http access)
•  EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
•  RDS – Relational Database Service (managed MySQL master and slaves)
•  SDB – Simple Data Base (hosted http based NoSQL data store)
•  SQS – Simple Queue Service (http based message queue)
•  SNS – Simple Notification Service (http and email based topics and messages)
•  EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•  ELB – Elastic Load Balancer
•  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•  VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud)
•  IAM – Identity and Access Management (fine grain role based security keys)
  

Oracle Clusterware Node Management and Voting Disks
 
Data Stores @ Netflix
Data Stores @ NetflixData Stores @ Netflix
Data Stores @ Netflix
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engine
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration Service
 

En vedette

Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...DataStax
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache CassandraRobert Stupp
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseDataStax Academy
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query languageCourtney Robinson
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...DataStax Academy
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 

En vedette (8)

Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
 
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax EnterpriseSolr & Cassandra: Searching Cassandra with DataStax Enterprise
Solr & Cassandra: Searching Cassandra with DataStax Enterprise
 
Cql – cassandra query language
Cql – cassandra query languageCql – cassandra query language
Cql – cassandra query language
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 

Similaire à Migrating Netflix from Datacenter Oracle to Global Cassandra

Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Adrian Cockcroft
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsAdrian Cockcroft
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qconYiwei Ma
 
Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012CLOUDIAN KK
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud ArchitectureAdrian Cockcroft
 
KT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk AhnKT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk AhnHui Cheng
 
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerAWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerScality
 
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSBridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSconfluent
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with SparkVincent GALOPIN
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...HostedbyConfluent
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAmazon Web Services
 
클라우드 시대 완벽한 데이터 관리 방법
클라우드 시대 완벽한 데이터 관리 방법 클라우드 시대 완벽한 데이터 관리 방법
클라우드 시대 완벽한 데이터 관리 방법 오라클 클라우드
 

Similaire à Migrating Netflix from Datacenter Oracle to Global Cassandra (20)

Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Netflix in the cloud 2011
Netflix in the cloud 2011Netflix in the cloud 2011
Netflix in the cloud 2011
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and Ops
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qcon
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012
 
Netflix Global Cloud Architecture
Netflix Global Cloud ArchitectureNetflix Global Cloud Architecture
Netflix Global Cloud Architecture
 
KT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk AhnKT ucloud storage, by Jaesuk Ahn
KT ucloud storage, by Jaesuk Ahn
 
Am 02 osac_kt_swift
Am 02 osac_kt_swiftAm 02 osac_kt_swift
Am 02 osac_kt_swift
 
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerAWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
 
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSBridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWS
 
클라우드 시대 완벽한 데이터 관리 방법
클라우드 시대 완벽한 데이터 관리 방법 클라우드 시대 완벽한 데이터 관리 방법
클라우드 시대 완벽한 데이터 관리 방법
 
OCI Overview
OCI OverviewOCI Overview
OCI Overview
 

Plus de Adrian Cockcroft

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Adrian Cockcroft
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Adrian Cockcroft
 

Plus de Adrian Cockcroft (20)

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with Notes
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search Roadshow
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
 
Gluecon keynote
Gluecon keynoteGluecon keynote
Gluecon keynote
 
Dystopia as a Service
Dystopia as a ServiceDystopia as a Service
Dystopia as a Service
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
NetflixOSS Meetup
NetflixOSS MeetupNetflixOSS Meetup
NetflixOSS Meetup
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWS
 
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3)
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Migrating to Public Cloud
Migrating to Public CloudMigrating to Public Cloud
Migrating to Public Cloud
 

Dernier

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceMartin Humpolec
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 

Dernier (20)

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Things you didn't know you can use in your Salesforce
Things you didn't know you can use in your SalesforceThings you didn't know you can use in your Salesforce
Things you didn't know you can use in your Salesforce
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 

Migrating Netflix from Datacenter Oracle to Global Cassandra

  • 1. Replacing Datacenter Oracle with Global Apache Cassandra on AWS. July 11, 2011. Adrian Cockcroft, @adrianco, #netflixcloud. http://www.linkedin.com/in/adriancockcroft
  • 2. Netflix Inc. With more than 23 million subscribers in the United States and Canada, Netflix, Inc. is the world’s leading Internet subscription service for enjoying movies and TV shows. International Expansion: We plan to expand into an additional market in the second half of 2011… If the second market meets our expectations… we will continue to invest and expand aggressively in 2012. Source: http://ir.netflix.com
  • 3. Building a Global Netflix Service: Netflix Cloud Migration; Data Migration to Cassandra; Highly Available and Globally Distributed Data; Backups and Archives in the Cloud; Monitoring Cassandra; Contributions and Organization
  • 4. Why Use Public Cloud?
  • 8. Data Center: Netflix could not build new datacenters fast enough. Capacity growth is accelerating, unpredictable. Product launch spikes - iPhone, Wii, PS3, XBox
  • 9. 23 Million Customers. 2011-Q1 year/year customers +69%. [Chart: customers in millions (axis 0 to 25) by quarter, 2009Q2 through 2011Q1.] Source: http://ir.netflix.com
  • 10. Out-Growing Data Center. http://techblog.netflix.com/2011/02/redesigning-netflix-api.html [Chart labels: 37x Growth Jan 2010-Jan 2011; Datacenter Capacity.]
  • 11. Netflix.com is now ~100% Cloud. Account sign-up is currently being moved to cloud. All international product is cloud based. USA specific logistics remains in the Datacenter.
  • 12. Netflix Choice was AWS with our own platform and tools. Unique platform requirements and extreme agility and flexibility
  • 13. Leverage AWS Scale: “the biggest public cloud”. AWS investment in features and automation. Use AWS zones and regions for high availability, scalability and global deployment
  • 14. We want to use clouds, we don’t have time to build them. Public cloud for agility and scale. AWS because they are big enough to allocate thousands of instances per hour when we need to
  • 15. Netflix Deployed on AWS. [Diagram, five columns] Content: Video Masters, EC2, S3, CDN. Logs: S3, EMR Hadoop, Hive, Business Intelligence. Play: DRM, CDN routing, Bookmarks, Logging. WWW: Sign-Up, Search, Movie Choosing, Ratings. API: Metadata, Device Config, TV Movie Choosing, Mobile iPhone.
  • 16. Port to Cloud Architecture. Short term investment, long term payback! Pay down technical debt. Robust patterns
  • 17. Transition • The Goals – Faster, Scalable, Available and Productive • Anti-patterns and Cloud Architecture – The things we wanted to change and why • Data Migration – Minimizing datacenter dependencies
  • 18. Datacenter Anti-Patterns. What do we currently do in the datacenter that prevents us from meeting our goals?
  • 19. Old Datacenter vs. New Cloud Arch: Central SQL Database vs. Distributed Key/Value NoSQL; Sticky In-Memory Session vs. Shared Memcached Session; Chatty Protocols vs. Latency Tolerant Protocols; Tangled Service Interfaces vs. Layered Service Interfaces; Instrumented Code vs. Instrumented Service Patterns; Fat Complex Objects vs. Lightweight Serializable Objects; Components as Jar Files vs. Components as Services
  • 20. The Central SQL Database • Datacenter has central Oracle databases – Everything in one place is convenient until it fails – Customers, movies, history, configuration • Schema changes require downtime. Anti-pattern impacts scalability, availability
  • 21. The Distributed Key-Value Store • Cloud has many key-value data stores – More complex to keep track of, do backups etc. – Each store is much simpler to administer – Joins take place in Java code • No schema to change, no scheduled downtime • Latency for typical queries – Memcached is dominated by network latency <1ms – Cassandra replication takes a few milliseconds – Oracle for simple queries is a few milliseconds – SimpleDB replication and REST auth overheads >10ms
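The point on slide 21 that joins move out of the database and into application code can be illustrated with a small sketch. It is written in Python for brevity, although the slide refers to Java. The dictionaries stand in for two separate key-value stores, and the function name, keys and sample data are all hypothetical.

```python
def join_history_with_titles(history_rows, titles_by_id):
    """Attach a title to each viewing-history row by key lookup."""
    joined = []
    for row in history_rows:
        title = titles_by_id.get(row["movie_id"])
        if title is None:
            continue  # inner-join semantics: skip rows with no matching title
        joined.append({**row, "title": title["name"]})
    return joined


# Plain dicts stand in for two key-value stores (hypothetical data).
titles = {"m1": {"name": "Movie One"}, "m2": {"name": "Movie Two"}}
history = [
    {"customer_id": "c1", "movie_id": "m1"},
    {"customer_id": "c1", "movie_id": "m3"},
]
joined_rows = join_history_with_titles(history, titles)
```

Each lookup in a real deployment would be a network call to a store, so the latency figures on the slide apply per lookup rather than per join.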
  • 22. Data Migration to Cassandra
  • 23. Transitional Steps • Bidirectional Replication – Oracle to SimpleDB – Queued reverse path using SQS – Backups remain in Datacenter via Oracle • New Cloud-Only Data Sources – Cassandra based – No replication to Datacenter – Backups performed in the cloud
  • 24. [Diagram labels: API; AWS EC2; Front End Load Balancer; Discovery Service; API Proxy; API etc.; Load Balancer; Component Services; SQS; Oracle; Cassandra; memcached; Replication; EC2 Internal Disks; S3; SimpleDB; Netflix Data Center]
  • 25. Cutting the Umbilical • Transition Oracle Data Sources to Cassandra – Offload Datacenter Oracle hardware – Free up capacity for growth of remaining services • Transition SimpleDB+Memcached to Cassandra – Primary data sources that need backup – Keep simple use cases like configuration service • New challenges – Backup, restore, archive, business continuity – Business Intelligence integration
  • 26. [Diagram labels: API; AWS EC2; Front End Load Balancer; Discovery Service; API Proxy; Load Balancer; Component Services; memcached; Cassandra; EC2 Internal Disks; Backup; S3; SimpleDB]
  • 27. High Availability • Cassandra stores 3 local copies, 1 per zone – Synchronous access, durable, highly available – Read/Write One fastest, least consistent - ~1ms – Read/Write Quorum 2 of 3, consistent - ~3ms • AWS Availability Zones – Separate buildings – Separate power etc. – Close together
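The "Quorum 2 of 3" figure on slide 27 follows from the usual majority rule. The sketch below shows that arithmetic and the standard overlap condition that explains why quorum reads and writes are consistent while reads and writes at level One are not. The function names are illustrative and are not part of any Cassandra client API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # three local copies, one per availability zone
quorum_size = quorum(rf)
quorum_is_consistent = read_sees_latest_write(rf, quorum_size, quorum_size)
one_is_consistent = read_sees_latest_write(rf, 1, 1)
```

With three copies, a quorum write and a quorum read each touch two replicas, so at least one replica is common to both. At level One the read may land on a replica that has not yet received the write.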
  • 28. Remote Copies • Cassandra duplicates across AWS regions – Asynchronous write, replicates at destination – Doesn’t directly affect local read/write latency • Global Coverage – Business agility – Follow AWS… • Local Access – Better latency – Fault Isolation
  • 29. Cassandra Backup • Full Backup – Cron on each node – Snapshot -> tar.gz -> S3 • Incremental – SSTable write triggers copy to S3 • Continuous – Scrape commit log – Write to EBS every 30s [Diagram: ring of Cassandra nodes with S3 Backup]
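The full-backup path on slide 29 (snapshot -> tar.gz -> S3) can be sketched as follows. This is a minimal illustration and not Netflix's tooling: the snapshot directory is assumed to already exist (for example, created by a cron job on the node), and `upload` is a hypothetical callable standing in for whatever S3 client is used. The demo at the end fakes a snapshot in a temporary directory.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_snapshot(snapshot_dir: Path, out_path: Path) -> Path:
    """Pack a snapshot directory into a gzip-compressed tar file."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(snapshot_dir, arcname=snapshot_dir.name)
    return out_path


def backup_snapshot(snapshot_dir: Path, work_dir: Path, upload) -> Path:
    """Archive the snapshot, then hand it to `upload(path, key)`.

    `upload` is an assumed interface; the object key layout is also
    an assumption made for this sketch.
    """
    archive = archive_snapshot(snapshot_dir, work_dir / f"{snapshot_dir.name}.tar.gz")
    upload(archive, f"backups/{archive.name}")
    return archive


# Demo with a fake snapshot and an upload stub that records the object key.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    snap = root / "snapshot-1"
    snap.mkdir()
    (snap / "data.db").write_bytes(b"sstable bytes")
    uploaded = []
    archive = backup_snapshot(snap, root, lambda path, key: uploaded.append(key))
    with tarfile.open(archive) as tar:
        member_names = sorted(tar.getnames())
```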
  • 30. Cassandra Restore • Full Restore – Replace previous data • New Ring from Backup – New name old data – One line command! [Diagram: ring of Cassandra nodes with S3 Backup]
  • 31. Cassandra Data Extraction • Business Intelligence – Re-normalize data using Hadoop job • Daily Extraction – Create Brisk ring – Extract backup – Run Hadoop job – Remove Brisk ring – Under 1hr… [Diagram: ring of Brisk nodes with S3 Backup]
  • 32. Cassandra Online BI • Intra-Day Extraction – Use split Brisk ring – Size each separately – Hourly Hadoop job [Diagram: ring of Cassandra and Brisk nodes with S3 Backup]
  • 33. Cassandra Archive. Appropriate level of paranoia needed… • Archive could be un-readable – Base on restored S3 backup and BI extracted data • Archive could be stolen – Encrypt archive • AWS East Region could have a problem – Copy data to AWS West • Production AWS Account could have an issue – Separate Archive account with no-delete S3 ACL • AWS S3 could have a global problem – Create an extra copy on a different cloud vendor
  • 34. Tools and Automation • Developer and Build Tools – Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory – Builds, creates .war file, .rpm, bakes AMI and launches • Custom Netflix Application Console – AWS Features at Enterprise Scale (hide the AWS security keys!) – Auto Scaler Group is unit of deployment to production • Open Source + Support – Apache, Tomcat, Cassandra, Hadoop, OpenJDK, CentOS – Datastax support for Cassandra, AWS support for Hadoop via EMR • Monitoring Tools – Datastax Opscenter for monitoring Cassandra – AppDynamics – Developer focus for cloud http://appdynamics.com
  • 35. Developer Migration • Detailed SQL to NoSQL Transition Advice – Sid Anand - QConSF Nov 5th – Netflix’ Transition to High Availability Storage Systems – Blog - http://practicalcloudcomputing.com/ – Download Paper PDF - http://bit.ly/bhOTLu • Mark Atwood, “Guide to NoSQL, redux” – YouTube http://youtu.be/zAbFRiyT3LU
  • 36. Cloud Operations: Cassandra Use Cases; Model Driven Architecture; Capacity Planning & Monitoring; Chaos Monkey
  • 37. Cassandra Use Cases • Key by Customer – Several separate Cassandra rings, read-intensive – Sized to fit in memory using m2.4xl Instances • Key by Customer:Movie – e.g. Viewing History – Growing fast, write intensive – m1.xl instances – Sized to hold hot data in memory only • Large scale data logging – lots of writes – Column data expires after time period – Working on using distributed counters…
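Two ideas from slide 37, a composite Customer:Movie row key and column data that expires after a time period, can be modelled in a few lines. This is an in-memory toy under stated assumptions, not Cassandra's storage engine or client API: the class, method names and key format are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
def viewing_history_key(customer_id: str, movie_id: str) -> str:
    """Build a composite row key in the Customer:Movie style."""
    return f"{customer_id}:{movie_id}"


class ExpiringColumns:
    """Toy column store where each column carries its own time-to-live."""

    def __init__(self):
        self._columns = {}  # name -> (value, expiry timestamp in seconds)

    def put(self, name, value, ttl_seconds, now):
        self._columns[name] = (value, now + ttl_seconds)

    def get(self, name, now):
        entry = self._columns.get(name)
        if entry is None or now >= entry[1]:
            return None  # missing or expired
        return entry[0]


row_key = viewing_history_key("c42", "m7")
log = ExpiringColumns()
log.put("event-1", "play", ttl_seconds=60, now=1000)
before_expiry = log.get("event-1", now=1030)
after_expiry = log.get("event-1", now=1060)
```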
  • 38. Model Driven Architecture • Datacenter Practices – Lots of unique hand-tweaked systems – Hard to enforce patterns • Model Driven Cloud Architecture – Perforce/Ivy/Jenkins based builds for everything – Every production instance is a pre-baked AMI – Every application is managed by an Autoscaler. Every change is a new AMI
  • 39. Netflix Platform Cassandra AMI • Tomcat server – Always running, registers with platform – Manages Cassandra state, tokens, backups • SimpleDB configuration – Stores token slots and options – Avoids circular “bootstrap problems” • Removed Root Disk Dependency on EBS – Use S3 backed AMI for stateful services – Normally use EBS backed AMI for fast provisioning
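Slide 39 mentions token slots stored in SimpleDB but does not say how the tokens are derived. As a hedged illustration only, the sketch below computes evenly spaced tokens for a ring, assuming Cassandra's RandomPartitioner token range of 0 to 2^127. The function name and the slot-to-token mapping are assumptions, not the platform's actual scheme.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed partitioner)


def initial_tokens(node_count: int) -> list[int]:
    """Return one evenly spaced token per slot, slot 0 first."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [slot * RING_SIZE // node_count for slot in range(node_count)]


tokens = initial_tokens(4)
```

Keeping the slot-to-token table outside the ring itself means a replacement instance can look up its token before Cassandra is running, which is one way to read the "bootstrap problems" remark.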
  • 41. Auto Scale Group Configuration
  • 42. Chaos Monkey • Make sure systems are resilient – Allow any instance to fail without customer impact • Chaos Monkey hours – Monday-Thursday 9am-3pm random instance kill • Application configuration option – Apps now have to opt-out from Chaos Monkey • Computers (Datacenter or AWS) randomly die – Fact of life, but too infrequent to test resiliency
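The scheduling rule on slide 42 (Monday-Thursday, 9am-3pm, opt-out per application) can be sketched as below. This only models the selection logic. It is not the Chaos Monkey implementation: the function names are hypothetical and nothing here terminates an instance.

```python
import random
from datetime import datetime


def in_chaos_hours(now: datetime) -> bool:
    """Monday (weekday 0) to Thursday (weekday 3), from 9am up to 3pm."""
    return now.weekday() <= 3 and 9 <= now.hour < 15


def pick_victim(instances, opted_out, now, rng=random):
    """Choose one random instance to kill, or None if nothing qualifies."""
    if not in_chaos_hours(now):
        return None
    candidates = [i for i in instances if i not in opted_out]
    return rng.choice(candidates) if candidates else None


monday_morning = datetime(2011, 7, 11, 10, 0)  # a Monday
friday_morning = datetime(2011, 7, 15, 10, 0)  # a Friday
victim = pick_victim(["api-1", "api-2"], {"api-1"}, monday_morning)
skipped = pick_victim(["api-1", "api-2"], set(), friday_morning)
```

Restricting kills to staffed hours follows the slide's schedule: failures are forced at a time when engineers are present to observe the result.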
  • 43. Capacity Planning & Monitoring
  • 44. Capacity Planning in Clouds (a few things have changed…) • Capacity is expensive • Capacity takes time to buy and provision • Capacity only increases, can’t be shrunk easily • Capacity comes in big chunks, paid up front • Planning errors can cause big problems • Systems are clearly defined assets • Systems can be instrumented in detail • Depreciate assets over 3 years (reservations!)
  • 45. Data Sources. External Testing: External URL availability and latency alerts and reports – Keynote; Stress testing - SOASTA. Request Trace Logging: Netflix REST calls – Chukwa to DataOven with GUID transaction identifier; Generic HTTP – AppDynamics service tier aggregation, end to end tracking. Application logging: Tracers and counters – log4j, tracer central, Chukwa to DataOven; Trackid and Audit/Debug logging – DataOven, Appdynamics GUID cross reference. JMX Metrics: Application specific real time – Datastax Opscenter, Appdynamics; Service and SLA percentiles – Appdynamics, Epic logged to DataOven. Tomcat and Apache logs: Stdout logs – S3 – DataOven; Standard format Access and Error logs – S3 – DataOven. JVM: Garbage Collection – Appdynamics; Memory usage, call stacks, resource/call - AppDynamics. Linux: system CPU/Net/RAM/Disk metrics – AppDynamics; SNMP metrics – Epic, Network flows – boundary.com. AWS: Load balancer traffic – Amazon Cloudwatch, SimpleDB usage stats; System configuration - CPU count/speed and RAM size, overall usage - AWS
  • 46. AppDynamics. How to look deep inside your cloud applications • Automatic Monitoring – Base AMI bakes in all monitoring tools – Outbound calls only – no discovery/polling issues – Inactive instances removed after a few days • Incident Alarms (deviation from baseline) – Business Transaction latency and error rate – Alarm thresholds discover their own baseline – Email contains URL to Incident Workbench UI
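Slide 46 describes alarms that fire on deviation from a self-discovered baseline but gives no formula. One simple way to picture the idea, purely as an assumption and not AppDynamics' algorithm, is to treat the mean of recent samples as the baseline and alarm when a new sample is more than a few standard deviations away.

```python
from statistics import mean, stdev


def is_deviation(history, latest, sigmas=3.0):
    """Flag `latest` if it is more than `sigmas` standard deviations
    from the mean of `history` (the assumed baseline)."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    return abs(latest - baseline) > sigmas * spread


latency_ms = [100, 102, 98, 101, 99]  # hypothetical recent latencies
normal_sample = is_deviation(latency_ms, 103)
incident_sample = is_deviation(latency_ms, 150)
```

Because the threshold is derived from the data, a service that normally runs at 100 ms and one that normally runs at 5 ms can share the same rule without hand-tuned limits.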
  • 47. AppDynamics Monitoring of Cassandra – Automatic Discovery
  • 49. Netflix Contributions to Cassandra • Cassandra as a mutable toolkit – Cassandra is in Java, pluggable, well structured – Netflix has a building full of Java engineers…. • Actual Contributions delivered in 0.8 – First prototype of off-heap row cache (Vijay) – Incremental backup SSTable write callback • Work In Progress – AWS integration and backup using Tomcat helper – Total re-write of Hector Java client library (Eran)
  • 50. Netflix “NoOps” Organization. [Org chart labels: Marketing & Advertising Site for Customer Acquisition; Member Site Personalization for Customer Retention; Cloud Ops Reliability Engineering; Build Tools and Automation; Database Engineering; Platform Development; Cloud Performance; Cloud Solutions; underlying tools: Perforce, Jenkins, Cassandra, AWS]
  • 51. Takeaway: Netflix is using Cassandra on AWS as a key infrastructure component of its globally distributed streaming product. http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud
  • 52. Amazon Cloud Terminology Reference. See http://aws.amazon.com/ This is not a full list of Amazon Web Service features • AWS – Amazon Web Services (common name for Amazon cloud) • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code) • EC2 – Elastic Compute Cloud – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations. – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept. – Reserved Instances – pre-paid to reduce cost for long term usage – Availability Zone – datacenter with own power and cooling hosting cloud instances – Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan • ASG – Auto Scaling Group (instances booting from the same AMI) • S3 – Simple Storage Service (http access) • EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance) • RDS – Relational Database Service (managed MySQL master and slaves) • SDB – Simple Data Base (hosted http based NoSQL data store) • SQS – Simple Queue Service (http based message queue) • SNS – Simple Notification Service (http and email based topics and messages) • EMR – Elastic Map Reduce (automatically managed Hadoop cluster) • ELB – Elastic Load Balancer • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB) • VPC – Virtual Private Cloud (extension of enterprise datacenter network into cloud) • IAM – Identity and Access Management (fine grain role based security keys)