Globally Distributed Cloud Applications at Netflix
October 2012
Adrian Cockcroft
@adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
  
Adrian Cockcroft
•  Director, Architecture for Cloud Systems, Netflix Inc.
      –  Previously Director for Personalization Platform
•  Distinguished Availability Engineer, eBay Inc. 2004-7
      –  Founding member of eBay Research Labs
•  Distinguished Engineer, Sun Microsystems Inc. 1988-2004
      –  2003-4 Chief Architect High Performance Technical Computing
      –  2001 Author: Capacity Planning for Web Services
      –  1999 Author: Resource Management
      –  1995 & 1998 Author: Sun Performance and Tuning
      –  1996 Japanese Edition of Sun Performance and Tuning
             •  SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)
•  More
      –  Twitter @adrianco – Blog http://perfcap.blogspot.com
      –  Presentations at http://www.slideshare.net/adrianco
  
The Netflix Streaming Service
Now in USA, Canada, Latin America, UK, Ireland, Sweden, Denmark, Norway and Finland
  
US Non-Member Web Site
Advertising and Marketing Driven

Member Web Site
Personalization Driven

Streaming Device API
Netflix Ready Devices
From: May 2008
To: May 2010

Content Delivery Service
Distributed storage nodes controlled by Netflix cloud services
  
Abstract
•  Netflix on Cloud – What, Why and When
•  Globally Distributed Architecture
•  Open Source Components
  
Why Use Cloud?

Things we don’t do
  
What Netflix Did
•  Moved to SaaS
    –  Corporate IT – OneLogin, Workday, Box, Evernote…
    –  Tools – Pagerduty, AppDynamics, EMR (Hadoop)
•  Built our own PaaS
    –  Customized to make our developers productive
    –  Large scale, global, highly available, leveraging AWS
•  Moved incremental capacity to IaaS
    –  No new datacenter space since 2008 as we grew
    –  Moved our streaming apps to the cloud
  
Keeping up with Developer Trends
                                               In production at Netflix
•  Big Data/Hadoop                             2009
•  AWS Cloud                                   2009
•  Application Performance Management          2010
•  Integrated DevOps Practices                 2010
•  Continuous Integration/Delivery             2010
•  NoSQL                                       2010
•  Platform as a Service; Fine grain SOA       2010
•  Social coding, open development/github      2011
  
AWS specific feature dependence…
  
                      	
  
                     	
  
Portability vs. Functionality
•  Portability – the Operations focus
   –  Avoid vendor lock-in
   –  Support datacenter based use cases
   –  Possible operations cost savings
•  Functionality – the Developer focus
   –  Less complex test and debug, one mature supplier
   –  Faster time to market for your products
   –  Possible developer time/cost savings
  
Functional PaaS
•  IaaS base - all the features of AWS
    –  Very large scale, mature, global, evolving rapidly
    –  ELB, Autoscale, VPC, SQS, EIP, EMR, etc, etc.
    –  E.g. Large files (TB) and multipart writes in S3
•  Functional PaaS – Netflix added features
    –  Continuous build/deploy, SOA, HA patterns
    –  Asgard console, Monkeys, Big data tools
    –  Cassandra/Zookeeper data store automation
  
How Netflix Works
[Diagram. Labels: Consumer Electronics; AWS Cloud Services; CDN Edge Locations; Customer Device (PC, PS3, TV…); Web Site or Discovery API; Streaming API; OpenConnect CDN Boxes; User Data; Personalization; DRM; QoS Logging; CDN Management and Steering; Content Encoding]
  
Component Services
(Simplified view using AppDynamics)

Web Server Dependencies Flow
(Home page business transaction as seen by AppDynamics)
[Diagram. Labels: Start Here; Cassandra; memcached; Web service; S3 bucket]

One Request Snapshot
(captured because it was unusually slow)
  
Current Architectural Patterns for Availability
•  Isolated Services
   –  Resilient Business logic
•  Three Balanced Availability Zones
   –  Resilient to Infrastructure outage
•  Triple Replicated Persistence
   –  Durable distributed Storage
•  Isolated Regions
   –  US and EU don’t take each other down
  
Isolated Services
Test With Chaos Monkey, Latency Monkey

Three Balanced Availability Zones
Test with Chaos Gorilla
[Diagram: Load Balancers spanning Zone A, Zone B and Zone C; each zone holds Cassandra and Evcache Replicas]
  
Triple Replicated Persistence
Cassandra maintenance affects individual replicas
[Diagram: Load Balancers spanning Zone A, Zone B and Zone C; each zone holds Cassandra and Evcache Replicas]
  
Isolated Regions
[Diagram: US-East Load Balancers and EU-West Load Balancers, each spanning its own Zone A, Zone B and Zone C; each zone holds Cassandra Replicas]
  
Failure Modes and Effects
Failure Mode           Probability   Mitigation Plan
Application Failure    High          Automatic degraded response
AWS Region Failure     Low           Wait for region to recover
AWS Zone Failure       Medium        Continue to run on 2 out of 3 zones
Datacenter Failure     Medium        Migrate more functions to cloud
Data store failure     Low           Restore from S3 backups
S3 failure             Low           Restore from remote archive
  
Netflix Deployed on AWS
Year   Area      Components
2009   Content   Content Management; EC2 Encoding; S3 Petabytes; CDNs, ISPs, Terabits, Customers
2009   Logs      S3 Terabytes; EMR; Hive & Pig; Business Intelligence
2010   Play      DRM; CDN routing; Bookmarks; Logging
2010   WWW       Sign-Up; Search Solr; Movie Choosing; Ratings
2010   API       Metadata; Device Config; TV Movie Choosing; Social Facebook
2011   CS        International CS lookup; Diagnostics & Actions; Customer Call Log; CS Analytics
  
Cloud Architecture Patterns
Where do we start?
  
Datacenter to Cloud Transition Goals
•  Faster
     –  Lower latency than the equivalent datacenter web pages and API calls
     –  Measured as mean and 99th percentile
     –  For both first hit (e.g. home page) and in-session hits for the same user
•  Scalable
     –  Avoid needing any more datacenter capacity as subscriber count increases
     –  No central vertically scaled databases
     –  Leverage AWS elastic capacity effectively
•  Available
     –  Substantially higher robustness and availability than datacenter services
     –  Leverage multiple AWS availability zones
     –  No scheduled down time, no central database schema to change
•  Productive
     –  Optimize agility of a large development team with automation and tools
     –  Leave behind complex tangled datacenter code base (~8 year old architecture)
     –  Enforce clean layered interfaces and re-usable components
  
Netflix Datacenter vs. Cloud Arch
Datacenter                     Cloud
Central SQL Database           Distributed Key/Value NoSQL
Sticky In-Memory Session       Shared Memcached Session
Chatty Protocols               Latency Tolerant Protocols
Tangled Service Interfaces     Layered Service Interfaces
Instrumented Code              Instrumented Service Patterns
Fat Complex Objects            Lightweight Serializable Objects
Components as Jar Files        Components as Services
  
Cassandra on AWS
A highly available and durable deployment pattern
  
Cassandra Service Pattern
[Diagram, Appdynamics Service Flow Visualization. Labels: Service REST Clients; Data Access REST Service; Astyanax Cassandra Client; Cassandra Cluster, Managed by Priam, Between 6 and 72 nodes; Datacenter Update Flow]
  
Production Deployment
Totally Denormalized Data Model
Over 50 Cassandra Clusters
Over 500 nodes
Over 30TB of daily backups
Biggest cluster 72 nodes
1 cluster over 250K writes/s
  
Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware

1.  Client writes to local coordinator
2.  Coordinator writes to other zones
3.  Nodes return ack
4.  Data written to internal commit log disks (no more than 10 seconds later)

If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.

[Diagram: Token Aware Clients surrounded by six Cassandra nodes with disks, two each in Zone A, Zone B and Zone C; arrows numbered 1 to 4 match the steps above]
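
The choice between waiting for one node, a quorum, or all nodes can be sketched in a few lines of Python. This is an illustration only, not Astyanax or Cassandra code: the function names `required_acks` and `write_succeeds` are hypothetical, and the quorum size assumes the usual majority rule (half the replicas, rounded down, plus one), which the slide does not spell out.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Return how many replica acks a write waits for at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority of replicas; an assumed rule, not stated on the slide.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def write_succeeds(level: str, replication_factor: int, acks_received: int) -> bool:
    """True when enough replicas have acknowledged the write."""
    return acks_received >= required_acks(level, replication_factor)


if __name__ == "__main__":
    # Three replicas, one per availability zone, with one zone unreachable.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, write_succeeds(level, replication_factor=3, acks_received=2))
```

With three replicas and one zone down, a write that waits for one node or a quorum still succeeds, while a write that waits for all nodes does not; in this sketch the missing replica is left to hinted handoff.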
  
Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum

1.  Client writes to local replicas
2.  Local write acks returned to Client which continues when 2 of 3 local nodes are committed
3.  Local coordinator writes to remote coordinator.
4.  When data arrives, remote coordinator node acks and copies to other remote zones
5.  Remote nodes ack to local coordinator
6.  Data flushed to internal commit log disks (no more than 10 seconds later)

If a node or region goes offline, hinted handoff completes the write when the node comes back up.
Nightly global compare and repair jobs ensure everything stays consistent.

[Diagram: US Clients and EU Clients, each region with six Cassandra nodes with disks spread across Zone A, Zone B and Zone C; 100+ms latency between the regions; arrows numbered 1 to 6 match the steps above]
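
Step 2 above, where the client continues once 2 of 3 local nodes have committed, can be sketched as follows. This is a hypothetical illustration rather than Cassandra's implementation: the function name `local_quorum_met`, the region keys `us-east` and `eu-west`, and the dictionary of ack counts are all assumptions made for the example.

```python
def local_quorum_met(acks_by_region: dict, local_region: str,
                     replicas_per_region: int = 3) -> bool:
    """Return True once a majority of the local region's replicas have acked.

    Acks from other regions are ignored, so the client never waits on the
    100+ms cross-region hop.
    """
    needed = replicas_per_region // 2 + 1
    return acks_by_region.get(local_region, 0) >= needed


if __name__ == "__main__":
    # Two local acks are enough even though the remote region has not answered.
    print(local_quorum_met({"us-east": 2, "eu-west": 0}, "us-east"))
    # Remote acks do not substitute for local ones.
    print(local_quorum_met({"us-east": 1, "eu-west": 3}, "us-east"))
```

The design point the sketch tries to capture is that client latency depends only on the local region, while the remote copy completes in the background.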
  
ETL for Cassandra
•  Data is de-normalized over many clusters!
•  Too many to restore from backups for ETL
•  Solution – read backup files using Hadoop
•  Aegisthus
      –  http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
      –  High throughput raw SSTable processing
      –  Re-normalizes many clusters to a consistent view
      –  Extract, Transform, then Load into Teradata
  
Benchmarks and Scalability
  
Cloud Deployment Scalability
New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s
Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)

  Min.  1st Qu.  Median    Mean  3rd Qu.    Max.
  41.0    104.2   149.0   171.8    215.8   562.0
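
As a quick check on the figures above, the launch window and the average launch rate can be computed from the two timestamps on the slide. This is only a worked-arithmetic sketch; the helper names `elapsed_seconds` and `launch_rate` are made up for the example.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def launch_rate(instances: int, seconds: float) -> float:
    """Average instances brought up per second over the window."""
    return instances / seconds


if __name__ == "__main__":
    window = elapsed_seconds("21:38:52", "21:46:32")
    print(window)                              # seconds in the 7m40s window
    print(round(launch_rate(500, window), 2))  # average instances per second
```

The window is 460 seconds, so 500 instances works out to a little over one instance per second on average.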
Scalability from 48 to 288 nodes on AWS
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
[Chart: Client Writes/s by node count – Replication Factor = 3. Y axis 0 to 1,200,000 writes/s; X axis 0 to 350 nodes. Data labels in increasing order: 174373, 366828, 537172, 1099837]
Used 288 of m1.xlarge
4 CPU, 15 GB RAM, 8 ECU
Cassandra 0.86
Benchmark config only existed for about 1hr
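
The scaling claim can be checked with a little arithmetic. The sketch below assumes that the smallest and largest data labels on the chart (174373 and 1099837 writes/s) belong to the 48-node and 288-node runs named in the slide title; the slide does not give node counts for the two middle points, so they are left out. The function names are hypothetical.

```python
def writes_per_node(total_writes_per_s: float, nodes: int) -> float:
    """Average client writes per second handled per node."""
    return total_writes_per_s / nodes


def scaling_efficiency(small: tuple, large: tuple) -> float:
    """Throughput ratio divided by node ratio; 1.0 means perfectly linear."""
    (small_nodes, small_writes), (large_nodes, large_writes) = small, large
    return (large_writes / small_writes) / (large_nodes / small_nodes)


if __name__ == "__main__":
    # Assumed pairing of chart labels with the node counts in the slide title.
    smallest = (48, 174373)
    largest = (288, 1099837)
    print(round(writes_per_node(smallest[1], smallest[0])))
    print(round(writes_per_node(largest[1], largest[0])))
    print(round(scaling_efficiency(smallest, largest), 3))
```

Under that assumption, six times the nodes delivered slightly more than six times the client write rate, which is consistent with the near-linear scaling the chart shows.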
  
Cassandra on AWS
The Past                              The Future
•  Instance: m2.4xlarge               •  Instance: hi1.4xlarge
•  Storage: 2 drives, 1.7TB           •  Storage: 2 SSD volumes, 2TB
•  CPU: 8 Cores, 26 ECU               •  CPU: 8 HT cores, 35 ECU
•  RAM: 68GB                          •  RAM: 64GB
•  Network: 1Gbit                     •  Network: 10Gbit
•  IOPS: ~500                         •  IOPS: ~100,000
•  Throughput: ~100Mbyte/s            •  Throughput: ~1Gbyte/s
•  Cost: $1.80/hr                     •  Cost: $3.10/hr
  
Cassandra Disk vs. SSD Benchmark
Same Throughput, Lower Latency, Half Cost

Availability and Resilience
  
Chaos Monkey
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
•  Computers (Datacenter or AWS) randomly die
    –  Fact of life, but too infrequent to test resiliency
•  Test to make sure systems are resilient
    –  Allow any instance to fail without customer impact
•  Chaos Monkey hours
    –  Monday-Friday 9am-3pm random instance kill
•  Application configuration option
    –  Apps now have to opt-out from Chaos Monkey
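
The schedule and opt-out rules on this slide can be illustrated with a short sketch. This is not the Chaos Monkey source: the function names, the instance dictionaries and the `opted_out_apps` set are hypothetical, and treating 3pm as the exclusive end of the window is an assumption.

```python
import random
from datetime import datetime


def in_chaos_hours(now: datetime) -> bool:
    """Monday-Friday, from 9am up to (not including) 3pm, in local time."""
    return now.weekday() < 5 and 9 <= now.hour < 15


def pick_victim(instances, opted_out_apps, rng):
    """Pick one random instance whose app has not opted out, or None."""
    candidates = [i for i in instances if i["app"] not in opted_out_apps]
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    fleet = [
        {"id": "i-0001", "app": "api"},
        {"id": "i-0002", "app": "api"},
        {"id": "i-0003", "app": "billing"},
    ]
    now = datetime(2012, 10, 1, 10, 0)  # a Monday morning
    if in_chaos_hours(now):
        # "billing" is a made-up example of an app that has opted out.
        print(pick_victim(fleet, opted_out_apps={"billing"}, rng=random.Random()))
```

Restricting kills to working hours means engineers are at their desks when an instance disappears, and making opt-out the exception keeps every other application exposed by default.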
  
Responsibility and Experience
•  Make developers responsible for failures
    –  Then they learn and write code that doesn’t fail
•  Use Incident Reviews to find gaps to fix
    –  Make sure it’s not about finding “who to blame”
•  Keep timeouts short, fail fast
    –  Don’t let cascading timeouts stack up
•  Make configuration options dynamic
    –  You don’t want to push code to tweak an option
  
Resilient Design – Circuit Breakers
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
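
For readers unfamiliar with the pattern, a circuit breaker can be sketched in a few lines. The class below is a generic, minimal illustration and not the implementation described in the linked post: the class name, the thresholds and the injected `clock` are all assumptions chosen for the example.

```python
import time


class CircuitBreaker:
    """Fail fast to a fallback after repeated failures of a dependency."""

    def __init__(self, failure_threshold=3, reset_after_seconds=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency entirely and return the fallback.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                return fallback()
            # Reset window elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_after_seconds=1.0)

    def flaky():
        raise RuntimeError("dependency down")

    for _ in range(3):
        print(breaker.call(flaky, lambda: "degraded response"))
```

Once the breaker opens, callers get the degraded response immediately instead of waiting on a timeout, which is one way to keep cascading timeouts from stacking up.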
  
Distributed Operational Model
•  Developers
   –  Provision and run their own code in production
   –  Take turns to be on call if it breaks (pagerduty)
   –  Configure autoscalers to handle capacity needs
•  DevOps and PaaS (aka NoOps)
   –  DevOps is used to build and run the PaaS
   –  PaaS constrains Dev to use automation instead
   –  PaaS puts more responsibility on Dev, with tools
  
Culture

Unconventional Culture
See culture deck at http://jobs.netflix.com
•  Brave/Aggressive from the top down
•  Focus on talent density above everything
•  Reduce process, remove complexity
•  Freedom and Responsibility
•  One product focus for the whole company
•  (almost) full information sharing across co.
•  Simplified manager’s role
  
Manager’s Role
•  Hiring, Architecture, Project Management
•  No vacation policy to track
•  (Almost) no remote employees or contractors
•  No bonuses to allocate
•  No expenses to approve
•  Pay mark to market handled at VP level
  
Netflix Organization
DevOps Org Reporting into Product Group, not ITops
CEO – Reed Hastings
CPO – Chief Product Officer – Neil Hunt
VP - Cloud and Platform Engineering - Yury
[Org chart. Team labels: Architecture; Platform and Persistence Engineering; Cloud Solutions; Cloud Ops Reliability Engineering; Personalization Platform and Performance Eng; Membership and Billing; Data Science Platform]
[Responsibility labels: Future planning, Security Arch, Efficiency; Base Platform, Zookeeper, Cassandra Ops; Monitoring, Monkeys, Build Tools; Alert Routing, Incident Lifecycle; Metadata, Benchmarking, Memcached; Data sources, Vault processing; Business Intelligence]
[Tool and infrastructure labels: AWS VPC, Hyperguard, Powerpoint :); AWS Instances; AWS API; PagerDuty; Cassandra; Hadoop on EMR]
  
Build Your Own PaaS

Components
•  Continuous build framework turns code into AMIs
•  AWS accounts for test, production, etc.
•  Cloud access gateway
•  Service registry
•  Configuration properties service
•  Persistence services
•  Monitoring, alert forwarding
•  Backups, archives
  
Netflix Open Source Strategy
•  Release PaaS Components git-by-git
    –  Source at github.com/netflix – we build from it…
    –  Intros and techniques at techblog.netflix.com
    –  Blog post or new code every few weeks
•  Motivations
    –  Give back to Apache licensed OSS community
    –  Motivate, retain, hire top engineers
    –  “Peer pressure” code cleanup, external contributions
  
Instance creation
[Diagram labels: Bakery & Build tools; Asgard; Base AMI; Instance; Autoscaling scripts; Application Code; Odin. Stages: Image baked; ASG / Instance started; Instance Running]
  
Application Launch
[Diagram labels: Governator (Guice); Eureka; Async logging; Archaius; Entrypoints; Servo. Stages: Application initializing; Registering, configuration]
  
Runtime
[Diagram labels: Astyanax; Priam; Curator; Chaos Monkey; Latency Monkey; NIWS LB; Exhibitor; Janitor Monkey; REST client; Cass JMeter; Dependency Command; Explorers. Groups: Calling other services; Managing service; Resiliency aids]
  
Open Source Projects

Legend: Github / Techblog; Apache Contributions; Techblog Post; Coming Soon

Priam - Cassandra as a Service                          | Exhibitor - Zookeeper as a Service     | Servo and Autoscaling Scripts
Astyanax - Cassandra client for Java                    | Curator - Zookeeper Patterns           | Honu - Log4j streaming to Hadoop
CassJMeter - Cassandra test suite                       | EVCache - Memcached as a Service       | Circuit Breaker - Robust service pattern
Cassandra Multi-region EC2 datastore support            | Eureka / Discovery - Service Directory | Asgard - AutoScaleGroup based AWS console
Aegisthus - Hadoop ETL for Cassandra                    | Archaius - Dynamics Properties Service | Chaos Monkey - Robustness verification
Explorers                                               | EntryPoints                            | Latency Monkey
Governator - Library lifecycle and dependency injection | Server-side latency/error injection    | Janitor Monkey
Odin - Workflow orchestration                           | REST Client + mid-tier LB              | Bakeries and AMI
Async logging                                           | Configuration REST endpoints           | Build dynaslaves
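
The "Circuit Breaker" entry above names a robust service pattern. As a hedged illustration, here is a generic sketch of that pattern: after a run of failures the breaker stops calling the dependency for a while, then lets one trial call through. This is not the Netflix component. `CircuitBreaker`, `CircuitOpenError`, the thresholds, and the fake clock are assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_seconds=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                raise CircuitOpenError("dependency calls suspended")
            # Reset window has passed: allow one trial call (half-open).
            # A single further failure reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_seconds=10.0, clock=lambda: fake_now[0])


def failing_call():
    raise RuntimeError("dependency is down")


outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_call)
    except CircuitOpenError:
        outcomes.append("rejected")  # breaker short-circuited the call
    except RuntimeError:
        outcomes.append("failed")    # call reached the dependency and failed

fake_now[0] = 11.0                   # reset window has elapsed
outcomes.append(breaker.call(lambda: "ok"))
```

Failing fast while the circuit is open keeps callers from piling up on a dependency that is already struggling, which is the resiliency benefit the pattern is meant to provide.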
  
Roadmap for 2012
•  More resiliency and improved availability
•  More automation, orchestration
•  “Hardening” the platform, code clean-up
•  Lower latency for web services and devices
•  IPv6 – now running in prod, rollout in process
•  More open sourced components
•  See you at AWS Re:Invent in November…
  
Takeaway

Netflix has built and deployed a scalable global Platform as a Service.

Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.

http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix

http://www.linkedin.com/in/adriancockcroft

@adrianco #netflixcloud
  
Amazon Cloud Terminology Reference
     See http://aws.amazon.com/. This is not a full list of Amazon Web Services features.

•  AWS – Amazon Web Services (common name for Amazon cloud)
•  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•  EC2 – Elastic Compute Cloud
    –  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
    –  Instance – a running computer system. Ephemeral: when it is de-allocated, nothing is kept.
    –  Reserved Instances – pre-paid to reduce cost for long term usage
    –  Availability Zone – datacenter with own power and cooling hosting cloud instances
    –  Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
•  ASG – Auto Scaling Group (instances booting from the same AMI)
•  S3 – Simple Storage Service (http access)
•  EBS – Elastic Block Store (network disk filesystem that can be mounted on an instance)
•  RDS – Relational Database Service (managed MySQL master and slaves)
•  DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
•  SQS – Simple Queue Service (http based message queue)
•  SNS – Simple Notification Service (http and email based topics and messages)
•  EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•  ELB – Elastic Load Balancer
•  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•  VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
•  DirectConnect – secure pipe from AWS VPC to external datacenter
•  IAM – Identity and Access Management (fine grain role based security keys)
  

Contenu connexe

Tendances

Deploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic BeanstalkDeploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic BeanstalkAmazon Web Services
 
From Monolithic to Microservices
From Monolithic to Microservices From Monolithic to Microservices
From Monolithic to Microservices Amazon Web Services
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Amazon Web Services
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Araf Karsh Hamid
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon Web Services
 
Accelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAccelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAmazon Web Services
 
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Amazon Web Services
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
 
Docker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshopDocker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshopSathish VJ
 
Introduction to AWS Cloud Computing
Introduction to AWS Cloud ComputingIntroduction to AWS Cloud Computing
Introduction to AWS Cloud ComputingAmazon Web Services
 
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS SummitKubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS SummitAmazon Web Services
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 

Tendances (20)

Deploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic BeanstalkDeploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
Deploy, Manage, and Scale your Apps with AWS Elastic Beanstalk
 
From Monolithic to Microservices
From Monolithic to Microservices From Monolithic to Microservices
From Monolithic to Microservices
 
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW...
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
 
AWS Cloud Watch
AWS Cloud WatchAWS Cloud Watch
AWS Cloud Watch
 
Amazon CloudFront 101
Amazon CloudFront 101Amazon CloudFront 101
Amazon CloudFront 101
 
Accelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdfAccelerate Your Cloud Migration Journey.pdf
Accelerate Your Cloud Migration Journey.pdf
 
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
What is AWS?
What is AWS?What is AWS?
What is AWS?
 
Introduction to Amazon EKS
Introduction to Amazon EKSIntroduction to Amazon EKS
Introduction to Amazon EKS
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Docker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshopDocker and Kubernetes 101 workshop
Docker and Kubernetes 101 workshop
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Introduction to AWS Cloud Computing
Introduction to AWS Cloud ComputingIntroduction to AWS Cloud Computing
Introduction to AWS Cloud Computing
 
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS SummitKubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
Kubernetes on AWS with Amazon EKS - MAD301 - New York AWS Summit
 
Zuul @ Netflix SpringOne Platform
Zuul @ Netflix SpringOne PlatformZuul @ Netflix SpringOne Platform
Zuul @ Netflix SpringOne Platform
 
AWS EC2
AWS EC2AWS EC2
AWS EC2
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 

En vedette

Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconAdrian Cockcroft
 
Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017
Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017
Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017Amazon Web Services
 
Trust in news sources and opinions on the CBC
Trust in news sources and opinions on the CBCTrust in news sources and opinions on the CBC
Trust in news sources and opinions on the CBCjasonmeyers
 
ENT101 Embracing the Cloud - AWS re: Invent 2012
ENT101 Embracing the Cloud - AWS re: Invent 2012ENT101 Embracing the Cloud - AWS re: Invent 2012
ENT101 Embracing the Cloud - AWS re: Invent 2012Amazon Web Services
 
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012Amazon Web Services
 
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012Amazon Web Services
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Philip Fisher-Ogden
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)Eva Tse
 

En vedette (11)

Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at GlueconNetflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
 
Reliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at AirbnbReliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at Airbnb
 
Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017
Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017
Building an Investment Case for Mass Migrations to AWS - AWS Summit SG 2017
 
Trust in news sources and opinions on the CBC
Trust in news sources and opinions on the CBCTrust in news sources and opinions on the CBC
Trust in news sources and opinions on the CBC
 
ENT101 Embracing the Cloud - AWS re: Invent 2012
ENT101 Embracing the Cloud - AWS re: Invent 2012ENT101 Embracing the Cloud - AWS re: Invent 2012
ENT101 Embracing the Cloud - AWS re: Invent 2012
 
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
ARC203 Highly Available Architecture at Netflix - AWS re: Invent 2012
 
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)
 
Culture
CultureCulture
Culture
 

Similaire à Netflix Global Cloud Architecture

The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source PlatformRuslan Meshenberg
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformAdrian Cockcroft
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSFAdrian Cockcroft
 
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSAcquia
 
Studio of the Future: Production Workflow in AWS
Studio of the Future: Production Workflow in AWSStudio of the Future: Production Workflow in AWS
Studio of the Future: Production Workflow in AWSControl Group
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qconYiwei Ma
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Adrian Cockcroft
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumAdrian Cockcroft
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qconYiwei Ma
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAmazon Web Services
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...IndicThreads
 
Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012CLOUDIAN KK
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Adrian Cockcroft
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraAdrian Cockcroft
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsMark Slingsby
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram Chinta
 
SRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerSRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerAmazon Web Services
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Amazon Web Services
 

Similaire à Netflix Global Cloud Architecture (20)

The Netflix Open Source Platform
The Netflix Open Source PlatformThe Netflix Open Source Platform
The Netflix Open Source Platform
 
SV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source PlatformSV Forum Platform Architecture SIG - Netflix Open Source Platform
SV Forum Platform Architecture SIG - Netflix Open Source Platform
 
Architectures for High Availability - QConSF
Architectures for High Availability - QConSFArchitectures for High Availability - QConSF
Architectures for High Availability - QConSF
 
Netflix and Open Source
Netflix and Open SourceNetflix and Open Source
Netflix and Open Source
 
Running High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWSRunning High Availability Websites with Acquia and AWS
Running High Availability Websites with Acquia and AWS
 
Studio of the Future: Production Workflow in AWS
Studio of the Future: Production Workflow in AWSStudio of the Future: Production Workflow in AWS
Studio of the Future: Production Workflow in AWS
 
Netflix keynote-adrian-qcon
Netflix keynote-adrian-qconNetflix keynote-adrian-qcon
Netflix keynote-adrian-qcon
 
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write...
 
Netflix in the Cloud at SV Forum
Netflix in the Cloud at SV ForumNetflix in the Cloud at SV Forum
Netflix in the Cloud at SV Forum
 
Netflix web-adrian-qcon
Netflix web-adrian-qconNetflix web-adrian-qcon
Netflix web-adrian-qcon
 
AWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go SquaredAWS for Start-ups - Case Study - Go Squared
AWS for Start-ups - Case Study - Go Squared
 
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
Scalable Architecture on Amazon AWS Cloud - Indicthreads cloud computing conf...
 
Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012Cloudian_Cassandra Summit 2012
Cloudian_Cassandra Summit 2012
 
Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)Cloud Architecture Tutorial - Running in the Cloud (3of3)
Cloud Architecture Tutorial - Running in the Cloud (3of3)
 
Migrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global CassandraMigrating Netflix from Datacenter Oracle to Global Cassandra
Migrating Netflix from Datacenter Oracle to Global Cassandra
 
Cloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web AppsCloud Computing & Scaling Web Apps
Cloud Computing & Scaling Web Apps
 
Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1Ram chinta hug-20120922-v1
Ram chinta hug-20120922-v1
 
SRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerSRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and Docker
 
Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20Aws webcast - Scaling on AWS 13 08-20
Aws webcast - Scaling on AWS 13 08-20
 

Plus de Adrian Cockcroft

Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesYow Conference Dec 2013 Netflix Workshop Slides with Notes
Yow Conference Dec 2013 Netflix Workshop Slides with NotesAdrian Cockcroft
 
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...Adrian Cockcroft
 
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...
CMG2013 Workshop: Netflix Cloud Native, Capacity, Performance and Cost Optimi...Adrian Cockcroft
 
Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Bottleneck analysis - Devopsdays Silicon Valley 2013
Bottleneck analysis - Devopsdays Silicon Valley 2013Adrian Cockcroft
 
Netflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowNetflix Global Applications - NoSQL Search Roadshow
Netflix Global Applications - NoSQL Search RoadshowAdrian Cockcroft
 
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)
Gluecon 2013 - Netflix Cloud Native Tutorial Details (part 2)Adrian Cockcroft
 
Gluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionGluecon 2013 - NetflixOSS Cloud Native Tutorial Introduction
Gluecon 2013 - NetflixOSS Cloud Native Tutorial IntroductionAdrian Cockcroft
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAdrian Cockcroft
 
Cassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSCassandra Performance and Scalability on AWS
Cassandra Performance and Scalability on AWSAdrian Cockcroft
 
Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Cloud Architecture Tutorial - Why and What (1of 3)
Cloud Architecture Tutorial - Why and What (1of 3) Adrian Cockcroft
 
Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Cloud Architecture Tutorial - Platform Component Architecture (2of3)
Cloud Architecture Tutorial - Platform Component Architecture (2of3)Adrian Cockcroft
 
Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Netflix Velocity Conference 2011
Netflix Velocity Conference 2011Adrian Cockcroft
 
Performance architecture for cloud connect
Performance architecture for cloud connectPerformance architecture for cloud connect
Performance architecture for cloud connectAdrian Cockcroft
 
Cmg06 utilization is useless
Cmg06 utilization is uselessCmg06 utilization is useless
Cmg06 utilization is uselessAdrian Cockcroft
 
Netflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsNetflix on Cloud - combined slides for Dev and Ops
Netflix on Cloud - combined slides for Dev and OpsAdrian Cockcroft
 

Plus de Adrian Cockcroft (20)


Netflix Global Cloud Architecture

  • 1. Globally Distributed Cloud Applications at Netflix. October 2012. Adrian Cockcroft, @adrianco #netflixcloud, http://www.linkedin.com/in/adriancockcroft
  • 2. Adrian Cockcroft
      •  Director, Architecture for Cloud Systems, Netflix Inc.
          –  Previously Director for Personalization Platform
      •  Distinguished Availability Engineer, eBay Inc. 2004-7
          –  Founding member of eBay Research Labs
      •  Distinguished Engineer, Sun Microsystems Inc. 1988-2004
          –  2003-4 Chief Architect High Performance Technical Computing
          –  2001 Author: Capacity Planning for Web Services
          –  1999 Author: Resource Management
          –  1995 & 1998 Author: Sun Performance and Tuning
          –  1996 Japanese Edition of Sun Performance and Tuning
              •  SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)
      •  More
          –  Twitter @adrianco – Blog http://perfcap.blogspot.com
          –  Presentations at http://www.slideshare.net/adrianco
  • 3. The Netflix Streaming Service: Now in USA, Canada, Latin America, UK, Ireland, Sweden, Denmark, Norway and Finland
  • 4. US Non-Member Web Site: Advertising and Marketing Driven
  • 5. Member Web Site: Personalization Driven
  • 6. Streaming Device API. Netflix Ready Devices, from May 2008 to May 2010
  • 7. Content Delivery Service: Distributed storage nodes controlled by Netflix cloud services
  • 8. Abstract
      •  Netflix on Cloud – What, Why and When
      •  Globally Distributed Architecture
      •  Open Source Components
  • 11. What Netflix Did
      •  Moved to SaaS
          –  Corporate IT – OneLogin, Workday, Box, Evernote…
          –  Tools – Pagerduty, AppDynamics, EMR (Hadoop)
      •  Built our own PaaS
          –  Customized to make our developers productive
          –  Large scale, global, highly available, leveraging AWS
      •  Moved incremental capacity to IaaS
          –  No new datacenter space since 2008 as we grew
          –  Moved our streaming apps to the cloud
  • 12. Keeping up with Developer Trends (year in production at Netflix)
      •  Big Data/Hadoop: 2009
      •  AWS Cloud: 2009
      •  Application Performance Management: 2010
      •  Integrated DevOps Practices: 2010
      •  Continuous Integration/Delivery: 2010
      •  NoSQL: 2010
      •  Platform as a Service; Fine grain SOA: 2010
      •  Social coding, open development/github: 2011
  • 13. AWS specific feature dependence…
  • 14. Portability vs. Functionality
      •  Portability – the Operations focus
          –  Avoid vendor lock-in
          –  Support datacenter based use cases
          –  Possible operations cost savings
      •  Functionality – the Developer focus
          –  Less complex test and debug, one mature supplier
          –  Faster time to market for your products
          –  Possible developer time/cost savings
  • 15. Functional PaaS
      •  IaaS base - all the features of AWS
          –  Very large scale, mature, global, evolving rapidly
          –  ELB, Autoscale, VPC, SQS, EIP, EMR, etc, etc.
          –  E.g. Large files (TB) and multipart writes in S3
      •  Functional PaaS – Netflix added features
          –  Continuous build/deploy, SOA, HA patterns
          –  Asgard console, Monkeys, Big data tools
          –  Cassandra/Zookeeper data store automation
  • 16. How Netflix Works
      [Diagram. Labels: Consumer Electronics; User Data; AWS Cloud Services; Web Site or Discovery API; Personalization; CDN Edge Locations; DRM; Customer Device (PC, PS3, TV…); Streaming API; QoS Logging; CDN Management and Steering; OpenConnect CDN Boxes; Content Encoding]
  • 17. Component Services (simplified view using AppDynamics)
  • 18. Web Server Dependencies Flow (home page business transaction as seen by AppDynamics)
      [Diagram. Labels: Start Here; Web service; memcached; Cassandra; S3 bucket]
  • 19. One Request Snapshot (captured because it was unusually slow)
  • 20. Current Architectural Patterns for Availability
      •  Isolated Services
          –  Resilient Business logic
      •  Three Balanced Availability Zones
          –  Resilient to Infrastructure outage
      •  Triple Replicated Persistence
          –  Durable distributed Storage
      •  Isolated Regions
          –  US and EU don’t take each other down
  • 21. Isolated Services: Test With Chaos Monkey, Latency Monkey
  • 22. Three Balanced Availability Zones: Test with Chaos Gorilla
      [Diagram. Load Balancers in front of Zone A, Zone B and Zone C; each zone holds Cassandra and Evcache Replicas]
  • 23. Triple Replicated Persistence: Cassandra maintenance affects individual replicas
      [Diagram. Load Balancers in front of Zone A, Zone B and Zone C; each zone holds Cassandra and Evcache Replicas]
  • 24. Isolated Regions
      [Diagram. US-East Load Balancers and EU-West Load Balancers, each in front of its own Zone A, Zone B and Zone C; every zone holds Cassandra Replicas]
  • 25. Failure Modes and Effects
      Failure Mode | Probability | Mitigation Plan
      Application Failure | High | Automatic degraded response
      AWS Region Failure | Low | Wait for region to recover
      AWS Zone Failure | Medium | Continue to run on 2 out of 3 zones
      Datacenter Failure | Medium | Migrate more functions to cloud
      Data store failure | Low | Restore from S3 backups
      S3 failure | Low | Restore from remote archive
  • 26. Netflix Deployed on AWS
      •  Content (2009): Content Management; EC2 Encoding; S3 Petabytes; CDNs, ISPs, Terabits, Customers
      •  Logs (2009): S3 Terabytes; EMR; Hive & Pig; Business Intelligence
      •  Play (2010): DRM; CDN routing; Bookmarks; Logging
      •  WWW (2010): Sign-Up; Search, Solr; Movie Choosing; Ratings
      •  API (2010): Metadata; Device Config; TV Movie Choosing; Social, Facebook
      •  CS (2011): International CS lookup; Diagnostics & Actions; Customer Call Log; CS Analytics
  • 27. Cloud Architecture Patterns: Where do we start?
  • 28. Datacenter to Cloud Transition Goals
      •  Faster
          –  Lower latency than the equivalent datacenter web pages and API calls
          –  Measured as mean and 99th percentile
          –  For both first hit (e.g. home page) and in-session hits for the same user
      •  Scalable
          –  Avoid needing any more datacenter capacity as subscriber count increases
          –  No central vertically scaled databases
          –  Leverage AWS elastic capacity effectively
      •  Available
          –  Substantially higher robustness and availability than datacenter services
          –  Leverage multiple AWS availability zones
          –  No scheduled down time, no central database schema to change
      •  Productive
          –  Optimize agility of a large development team with automation and tools
          –  Leave behind complex tangled datacenter code base (~8 year old architecture)
          –  Enforce clean layered interfaces and re-usable components
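The "Faster" goal on slide 28 is measured as mean and 99th percentile. A minimal sketch of that measurement follows, assuming latency samples in milliseconds and the nearest-rank percentile definition (the slides do not say which definition was used). The function name and the sample values are hypothetical.

```python
import statistics


def latency_summary(samples_ms):
    """Return (mean, p99) for a list of latency samples in milliseconds.

    p99 uses the nearest-rank definition: the smallest sample such that
    at least 99% of all samples are less than or equal to it.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # ceil(0.99 * n) in integer arithmetic, as a 1-based rank
    rank = (99 * len(ordered) + 99) // 100
    return statistics.mean(ordered), ordered[rank - 1]


# Hypothetical data: 98 fast requests and 2 slow outliers.
samples = [20.0] * 98 + [900.0, 1000.0]
mean_ms, p99_ms = latency_summary(samples)
print(f"mean={mean_ms:.1f} ms, p99={p99_ms:.1f} ms")
```

The two numbers answer different questions: the mean moves with overall load, while the 99th percentile exposes the slow tail that a mean can hide.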
  • 29. Netflix Datacenter vs. Cloud Arch
      Datacenter | Cloud
      Central SQL Database | Distributed Key/Value NoSQL
      Sticky In-Memory Session | Shared Memcached Session
      Chatty Protocols | Latency Tolerant Protocols
      Tangled Service Interfaces | Layered Service Interfaces
      Instrumented Code | Instrumented Service Patterns
      Fat Complex Objects | Lightweight Serializable Objects
      Components as Jar Files | Components as Services
  • 30. Cassandra on AWS: A highly available and durable deployment pattern
  • 31. Cassandra Service Pattern
      [Diagram, AppDynamics Service Flow Visualization. Labels: Service REST Clients; Data Access REST Service; Astyanax Cassandra Client; Cassandra Cluster, Managed by Priam, Between 6 and 72 nodes; Datacenter Update Flow]
  • 32. Production Deployment: Totally Denormalized Data Model
      •  Over 50 Cassandra Clusters
      •  Over 500 nodes
      •  Over 30TB of daily backups
      •  Biggest cluster 72 nodes
      •  1 cluster over 250K writes/s
  • 33. Astyanax - Cassandra Write Data Flows: Single Region, Multiple Availability Zone, Token Aware
      [Diagram. Token Aware Clients writing to Cassandra nodes with disks in Zone A, Zone B and Zone C]
      1.  Client writes to local coordinator
      2.  Coordinator writes to other zones
      3.  Nodes return ack
      4.  Data written to internal commit log disks (no more than 10 seconds later)
      •  If a node goes offline, hinted handoff completes the write when the node comes back up.
      •  Requests can choose to wait for one node, a quorum, or all nodes to ack the write
      •  SSTable disk writes and compactions occur asynchronously
  • 34. Data Flows for Multi-Region Writes: Token Aware, Consistency Level = Local Quorum
      [Diagram. US Clients and EU Clients, each writing to Cassandra nodes with disks in Zone A, Zone B and Zone C of their own region; 100+ms latency between regions]
      1.  Client writes to local replicas
      2.  Local write acks returned to Client, which continues when 2 of 3 local nodes are committed
      3.  Local coordinator writes to remote coordinator
      4.  When data arrives, remote coordinator node acks and copies to other remote zones
      5.  Remote nodes ack to local coordinator
      6.  Data flushed to internal commit log disks (no more than 10 seconds later)
      •  If a node or region goes offline, hinted handoff completes the write when the node comes back up.
      •  Nightly global compare and repair jobs ensure everything stays consistent.
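Slides 33 and 34 say a request can wait for one node, a quorum, or all nodes, and that a Local Quorum write continues once 2 of 3 local nodes have committed. The arithmetic behind that can be sketched as below, assuming the usual Cassandra quorum rule of floor(RF / 2) + 1. The function names are hypothetical and this does not call Astyanax or any Cassandra client.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acks a write waits for at a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def write_acknowledged(consistency_level, replication_factor, replicas_up):
    """True if enough replicas are reachable to acknowledge the write."""
    return replicas_up >= required_acks(consistency_level, replication_factor)


# Replication factor 3, one replica per availability zone, one zone down.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, 3), write_acknowledged(level, 3, 2))
```

With one replica in each of three zones, a quorum write still succeeds when a whole zone is lost, which matches the "2 out of 3 zones" mitigation on slide 25. Only the ALL level would block.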
  • 35. ETL for Cassandra
      •  Data is de-normalized over many clusters!
      •  Too many to restore from backups for ETL
      •  Solution – read backup files using Hadoop
      •  Aegisthus
          –  http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
          –  High throughput raw SSTable processing
          –  Re-normalizes many clusters to a consistent view
          –  Extract, Transform, then Load into Teradata
  • 37. Cloud Deployment Scalability
      •  New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s
      •  Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)
      Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max.
      41.0 | 104.2 | 149.0 | 171.8 | 215.8 | 562.0
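The scale-up on slide 37 can be turned into a launch rate with a few lines of arithmetic. This sketch uses only the timestamps and instance count quoted on the slide; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps and instance count as quoted on the slide.
start = datetime.strptime("21:38:52", "%H:%M:%S")
end = datetime.strptime("21:46:32", "%H:%M:%S")
instances = 500

elapsed_s = (end - start).total_seconds()      # 7m40s expressed in seconds
rate_per_min = instances / elapsed_s * 60      # average launches per minute

print(f"{elapsed_s:.0f} s elapsed, about {rate_per_min:.0f} instances per minute")
```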
  • 38. Scalability from 48 to 288 nodes on AWS
      http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
      [Chart. Client Writes/s by node count – Replication Factor = 3. Data labels in increasing node count: 174373, 366828, 537172, 1099837. Axes: 0 to 1200000 writes/s against 0 to 350 nodes]
      •  Used 288 of m1.xlarge: 4 CPU, 15 GB RAM, 8 ECU
      •  Cassandra 0.86
      •  Benchmark config only existed for about 1hr
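A quick check of how close the benchmark on slide 38 is to linear scaling. The sketch assumes the smallest data label (174373 writes/s) belongs to the 48-node run and the largest (1099837 writes/s) to the 288-node run. That pairing is inferred from the slide title, and the intermediate points are left out because the slide text does not give their node counts.

```python
# Endpoints of the benchmark; the node-to-throughput pairing is an assumption.
small = {"nodes": 48, "writes_per_s": 174_373}
large = {"nodes": 288, "writes_per_s": 1_099_837}

node_ratio = large["nodes"] / small["nodes"]
throughput_ratio = large["writes_per_s"] / small["writes_per_s"]

per_node_small = small["writes_per_s"] / small["nodes"]
per_node_large = large["writes_per_s"] / large["nodes"]

print(f"nodes x{node_ratio:.1f}, writes/s x{throughput_ratio:.2f}")
print(f"writes/s per node: {per_node_small:.0f} at 48, {per_node_large:.0f} at 288")
```

Under that assumption, six times the nodes gave slightly more than six times the client write rate, so per-node throughput did not degrade as the cluster grew.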
  • 39. Cassandra on AWS
      Attribute | The Past | The Future
      Instance | m2.4xlarge | hi1.4xlarge
      Storage | 2 drives, 1.7TB | 2 SSD volumes, 2TB
      CPU | 8 Cores, 26 ECU | 8 HT cores, 35 ECU
      RAM | 68GB | 64GB
      Network | 1Gbit | 10Gbit
      IOPS | ~500 | ~100,000
      Throughput | ~100Mbyte/s | ~1Gbyte/s
      Cost | $1.80/hr | $3.10/hr
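The comparison on slide 39 is easier to read as price-performance. A small sketch using the approximate figures from the slide, assuming 1 Gbyte/s is 1000 Mbyte/s; the dictionary keys and helper name are illustrative.

```python
# Approximate per-instance figures as quoted on the slide.
past = {"name": "m2.4xlarge", "iops": 500, "mbyte_per_s": 100, "usd_per_hr": 1.80}
future = {"name": "hi1.4xlarge", "iops": 100_000, "mbyte_per_s": 1000, "usd_per_hr": 3.10}


def per_dollar(instance, metric):
    """Amount of a metric bought by one dollar per hour of instance cost."""
    return instance[metric] / instance["usd_per_hr"]


iops_gain = per_dollar(future, "iops") / per_dollar(past, "iops")
throughput_gain = per_dollar(future, "mbyte_per_s") / per_dollar(past, "mbyte_per_s")

print(f"IOPS per dollar-hour improves about {iops_gain:.0f}x")
print(f"Throughput per dollar-hour improves about {throughput_gain:.1f}x")
```

These ratios are per instance. The "Half Cost" claim on slide 40 concerns a whole benchmark cluster, which these numbers alone do not reproduce.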
  • 40. Cassandra Disk vs. SSD Benchmark: Same Throughput, Lower Latency, Half Cost
  • 42. Chaos Monkey
      http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
      •  Computers (Datacenter or AWS) randomly die
          –  Fact of life, but too infrequent to test resiliency
      •  Test to make sure systems are resilient
          –  Allow any instance to fail without customer impact
      •  Chaos Monkey hours
          –  Monday-Friday 9am-3pm random instance kill
      •  Application configuration option
          –  Apps now have to opt-out from Chaos Monkey
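The policy on slide 42 (random instance kills, Monday to Friday 9am to 3pm, with apps opting out) can be sketched as a pure selection function. This is an illustration under stated assumptions, not the released Chaos Monkey code: the instance records, the opt-out set and the function names are hypothetical, and nothing here calls AWS.

```python
import random
from datetime import datetime


def in_chaos_hours(now):
    """True on Monday-Friday between 9am and 3pm local time."""
    return now.weekday() < 5 and 9 <= now.hour < 15


def pick_victim(instances, opted_out_apps, now, rng=random):
    """Pick one instance id to terminate, or None if nothing is eligible.

    Every app is eligible unless it has explicitly opted out.
    """
    if not in_chaos_hours(now):
        return None
    candidates = [inst for inst in instances if inst["app"] not in opted_out_apps]
    if not candidates:
        return None
    return rng.choice(candidates)["id"]


fleet = [
    {"id": "i-1", "app": "api"},
    {"id": "i-2", "app": "web"},
]
monday_10am = datetime(2012, 10, 1, 10, 0)
print(pick_victim(fleet, opted_out_apps={"api"}, now=monday_10am))
```

Restricting kills to working hours means engineers are at their desks when a weakness surfaces, and making opt-out the exception keeps resilience testing the default.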
  • 43. Responsibility and Experience
      •  Make developers responsible for failures
          –  Then they learn and write code that doesn’t fail
      •  Use Incident Reviews to find gaps to fix
          –  Make sure it’s not about finding “who to blame”
      •  Keep timeouts short, fail fast
          –  Don’t let cascading timeouts stack up
      •  Make configuration options dynamic
          –  You don’t want to push code to tweak an option
  • 44. Resilient Design – Circuit Breakers
      http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
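The circuit breaker pattern named on slide 44, together with the "fail fast" advice on slide 43, can be sketched generically as follows. This is not the Netflix implementation described in the linked techblog post. The class name, thresholds and fallback convention are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, fallback=None):
        # While open, skip the dependency entirely and fail fast.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("dependency calls are suspended")
            # Reset period has passed: let one trial call through (half-open).
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each dependency call, for example `breaker.call(fetch_ratings, user_id, fallback=lambda: [])` with a hypothetical `fetch_ratings`, so a failing service produces a degraded response instead of a stack of waiting threads.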
  • 45. Distributed Operational Model
      •  Developers
          –  Provision and run their own code in production
          –  Take turns to be on call if it breaks (pagerduty)
          –  Configure autoscalers to handle capacity needs
      •  DevOps and PaaS (aka NoOps)
          –  DevOps is used to build and run the PaaS
          –  PaaS constrains Dev to use automation instead
          –  PaaS puts more responsibility on Dev, with tools
  • 47. Unconventional Culture
      See culture deck at http://jobs.netflix.com
      •  Brave/Aggressive from the top down
      •  Focus on talent density above everything
      •  Reduce process, remove complexity
      •  Freedom and Responsibility
      •  One product focus for the whole company
      •  (almost) full information sharing across co.
      •  Simplified manager's role
  • 48. Manager's Role
      •  Hiring, Architecture, Project Management
      •  No vacation policy to track
      •  (Almost) no remote employees or contractors
      •  No bonuses to allocate
      •  No expenses to approve
      •  Pay mark to market handled at VP level
  • 49. Netflix Organization: DevOps Org Reporting into Product Group, not ITops
      •  CEO – Reed Hastings
      •  CPO – Chief Product Officer – Neil Hunt
      •  VP - Cloud and Platform Engineering - Yury
      [Org chart of the teams under the VP. Labels: Platform and Persistence; Cloud Ops; Reliability; Personalization; Membership and Billing; Data Science; Architecture; Cloud Solutions; Performance Eng; Future planning; Base Platform; Monitoring; Metadata; Alert Routing; Data sources; Business Intelligence; Security Arch; Zookeeper; Monkeys; Benchmarking; Incident Lifecycle; Vault processing; Efficiency; Cassandra Ops; Build Tools; Memcached; AWS VPC; AWS Instances; Hyperguard; PagerDuty; Cassandra; Hadoop on EMR; AWS API; Powerpoint :)]
  • 50. Build Your Own PaaS
  • 51. Components
      •  Continuous build framework turns code into AMIs
      •  AWS accounts for test, production, etc.
      •  Cloud access gateway
      •  Service registry
      •  Configuration properties service
      •  Persistence services
      •  Monitoring, alert forwarding
      •  Backups, archives
  • 52. Netflix Open Source Strategy
      •  Release PaaS Components git-by-git
          –  Source at github.com/netflix – we build from it…
          –  Intros and techniques at techblog.netflix.com
          –  Blog post or new code every few weeks
      •  Motivations
          –  Give back to Apache licensed OSS community
          –  Motivate, retain, hire top engineers
          –  “Peer pressure” code cleanup, external contributions
  • 53. Instance creation
      [Diagram. Stages: Image baked; ASG / Instance started; Instance Running. Labels: Bakery & Build tools; Base AMI; Application Code; Asgard; Odin; Autoscaling scripts; Instance]
  • 54. Application Launch
      [Diagram. Stages: Application initializing; Registering, configuration. Labels: Governator (Guice); Async logging; Entrypoints; Servo; Eureka; Archaius]
  • 55. Runtime
      [Diagram. Groups: Calling other services; Managing service; Resiliency aids. Labels: Astyanax; Priam; Curator; Exhibitor; NIWS LB; REST client; Cass JMeter; Explorers; Chaos Monkey; Latency Monkey; Janitor Monkey; Dependency Command]
  • 56. Open Source Projects
      Legend: Github / Techblog; Apache Contributions; Techblog Post; Coming Soon
      •  Priam – Cassandra as a Service
      •  Exhibitor – Zookeeper as a Service
      •  Servo and Autoscaling Scripts
      •  Astyanax – Cassandra client for Java
      •  Curator – Zookeeper Patterns
      •  Honu – Log4j streaming to Hadoop
      •  CassJMeter – Cassandra test suite
      •  EVCache – Memcached as a Service
      •  Circuit Breaker – Robust service pattern
      •  Cassandra Multi-region EC2 datastore support
      •  Eureka / Discovery – Service Directory
      •  Asgard - AutoScaleGroup based AWS console
      •  Aegisthus – Hadoop ETL for Cassandra
      •  Archaius – Dynamics Properties Service
      •  Chaos Monkey – Robustness verification
      •  Latency Monkey – Server-side latency/error injection
      •  Governator - Library lifecycle and dependency injection
      •  Odin – Workflow orchestration
      •  Explorers; EntryPoints; Janitor Monkey; REST Client + mid-tier LB; Bakeries and AMI; Async logging; Configuration REST endpoints; Build dynaslaves
  • 57. Roadmap for 2012
      •  More resiliency and improved availability
      •  More automation, orchestration
      •  “Hardening” the platform, code clean-up
      •  Lower latency for web services and devices
      •  IPv6 – now running in prod, rollout in process
      •  More open sourced components
      •  See you at AWS Re:Invent in November…
  • 58. Takeaway
      Netflix has built and deployed a scalable global Platform as a Service.
      Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
      http://github.com/Netflix
      http://techblog.netflix.com
      http://slideshare.net/Netflix
      http://www.linkedin.com/in/adriancockcroft
      @adrianco #netflixcloud
  • 59. Amazon Cloud Terminology Reference
      See http://aws.amazon.com/. This is not a full list of Amazon Web Service features.
      •  AWS – Amazon Web Services (common name for Amazon cloud)
      •  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
      •  EC2 – Elastic Compute Cloud
          –  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
          –  Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
          –  Reserved Instances – pre-paid to reduce cost for long term usage
          –  Availability Zone – datacenter with own power and cooling hosting cloud instances
          –  Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
      •  ASG – Auto Scaling Group (instances booting from the same AMI)
      •  S3 – Simple Storage Service (http access)
      •  EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
      •  RDS – Relational Database Service (managed MySQL master and slaves)
      •  DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
      •  SQS – Simple Queue Service (http based message queue)
      •  SNS – Simple Notification Service (http and email based topics and messages)
      •  EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
      •  ELB – Elastic Load Balancer
      •  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
      •  VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
      •  DirectConnect – secure pipe from AWS VPC to external datacenter
      •  IAM – Identity and Access Management (fine grain role based security keys)