Infrastructure Considerations : Design : "webops"

"Preparing for the future"

By : ~/Piyush

• 5+ years of experience designing, setting up, testing and running production web systems in varied deployment environments
• Experience setting up colocation IDCs with Active-Active DR sites for India’s No. 1 OTA
• Experience working on public cloud platforms like AWS and setting up private cloud infrastructure
• …Generation G : Gamification /engineer/
• Tags: techie, open source enthusiast, engineer, geek, DevOps, web ops, security, Tripper (MMYT), Ex-Nextag-ian :)

• Scalable
• Robust and always available
• Manageable
• Resilient
• Operationally visible (monitor everything)
• Cost-effective

• Avoid unnecessary change by selecting a long-term supported distribution on which to base your platform:
◦ RHEL / CentOS
◦ Ubuntu LTS (Long Term Support)
◦ Debian Stable

My preference:
RHEL / CentOS (Red Hat stability and yum win out)

• Use your capacity model, checked against your SLAs and cost constraints, to decide how to build the infrastructure:
◦ 100% dedicated hardware (self-managed or outsourced)
◦ 100% cloud (consider AWS or Rackspace)
◦ Hybrid
• Cloud success relies on automating key service-management processes to optimize the run-time operation of dynamic workloads in a shared-resource environment.
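Whichever option you pick, the capacity model eventually becomes arithmetic like the sketch below, which sizes one tier from its expected peak load. The 70% utilization target, the single N+1 spare and the traffic figures are illustrative assumptions, not numbers from this deck.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float,
                   target_utilization: float = 0.7, spares: int = 1) -> int:
    """Estimate how many identical servers one tier needs at peak load.

    peak_rps           -- expected peak requests/second from the capacity model
    rps_per_server     -- throughput one server sustained in a load test
    target_utilization -- share of that throughput we plan to use, leaving
                          headroom for spikes (0.7 is an assumed value)
    spares             -- extra servers so the tier survives a failure (N+1)
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = rps_per_server * target_utilization
    return math.ceil(peak_rps / usable_rps) + spares


# Hypothetical figures: 12,000 req/s at peak, 1,500 req/s per app server.
app_servers = servers_needed(12_000, 1_500)
print(app_servers)  # 13
```

Running the same calculation with the per-server throughput and price of dedicated boxes versus cloud instances gives a first-cut comparison of the three options against your SLA and budget.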

• Split each service (layer) onto its own set of servers for easier scale-out and management:
◦ Traffic management (both global and local traffic management)
◦ Application servers
◦ Data store servers
◦ Email services
◦ + Minimize distribution of state:
▪ Keep the number of services that need persistent storage (e.g. data services) to a minimum, which makes backups and management easier

• Use redundant pairs (of devices/appliances), HA clustering or failover to ensure availability of services:
◦ Minimum downtime
◦ Application and service redundancy, with load-balanced clusters on the primary site and on the DR site too
◦ Database HA, plus backup and recovery for the data store (MySQL)
◦ Choose and implement the best-suited failover strategy
◦ Redundant network paths on each node (on servers: Linux NIC bonding)
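The selection logic behind a simple active/passive failover strategy can be sketched as follows. The node names and the health probe are hypothetical placeholders; dedicated HA tooling also has to handle fencing and split-brain, which this sketch deliberately ignores.

```python
from typing import Callable, Optional, Sequence


def pick_active(nodes: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Active/passive failover: return the highest-priority healthy node.

    `nodes` is ordered by preference (primary, then standby, then DR site).
    `is_healthy` is a caller-supplied probe such as a TCP connect or an
    HTTP health-check request; it is a placeholder in this sketch.
    """
    for node in nodes:
        if is_healthy(node):
            return node
    return None  # nothing healthy: time to page a human


# Hypothetical node names; the primary is simulated as down.
down = {"db-primary"}
active = pick_active(["db-primary", "db-standby", "dr-db"],
                     lambda node: node not in down)
print(active)  # db-standby
```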

◦ Dev, QA and staging platforms (both application and network) to prove application and configuration changes before they go live in production.
◦ Most live-site issues are due to the lack of a similarly configured environment for Dev / QA / Staging testing.
◦ Lab environments:
▪ Performance/stress lab
▪ Experimentation lab (support for A/B or multivariate experiments with live traffic)
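Running A/B experiments on live traffic requires that each user lands in the same variant on every request, whichever server handles it. One common technique, sketched below as an assumption rather than the deck's own method, is to hash the experiment name and user id into a bucket; the experiment name and user ids are hypothetical.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to one variant of an experiment.

    Hashing "<experiment>:<user_id>" gives every user a stable bucket on any
    server without shared state, and buckets differ between experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical experiment name and user ids.
split = Counter(assign_variant(f"user{i}", "new-search-page") for i in range(10_000))
print(split)
```

Because no state is stored, every app server in the cluster makes the same decision, and the split should come out close to 50/50 over many users.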

• Virtualization is key here :) ...actually, this is what is changing the world, not the cloud!!
• + Select the right virtualization technology
• Use network boot and installer tools, or templated provisioning, to build servers identically:
◦ PXE boot + Kickstart
◦ VMware ESXi templates / Citrix XenServer
◦ Amazon AMIs (EC2)
◦ OpenNebula

• Package management: YUM repositories (distribution + your own)
• Create your own repository servers for both packages and code
• Use configuration management tools to deploy configuration automatically from a central location:
◦ Puppet / Facter
◦ Chef
◦ CFEngine (Nova)
◦ RANCID (network devices)
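The core idea these tools share is idempotent convergence: describe the desired state, change only what differs, and do nothing when the system already matches. The sketch below applies that idea to a single file; `ensure_file` and the motd path are hypothetical, and this is not how any of the listed tools is actually implemented.

```python
import tempfile
from pathlib import Path


def ensure_file(path: Path, content: str) -> bool:
    """Converge a file to the desired content; return True if it changed.

    Running it a second time with the same content changes nothing, which
    is the property that makes centrally pushed configuration safe to re-run.
    """
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False  # already in the desired state
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(content, encoding="utf-8")
    tmp.replace(path)  # rename is atomic on POSIX: readers never see a partial file
    return True


MOTD = "This host is managed centrally; local edits will be overwritten.\n"
with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "etc" / "motd"  # hypothetical managed file
    first = ensure_file(target, MOTD)
    second = ensure_file(target, MOTD)
print(first, second)  # True False
```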

• Use a central service for identity and password management:
◦ OpenLDAP
◦ Active Directory
◦ TACACS+ (network devices)
• Keep proper accounting/audit logging

• Inventory management:
◦ Use Facter facts + CMDB-based inventory management
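Feeding a CMDB from Facter can be as simple as mapping selected facts to inventory columns. The sketch assumes JSON in the shape produced by `facter --json` using legacy flat fact names (newer Facter releases nest many of these under keys such as `os` and `networking`); the sample values and the CMDB column names are hypothetical.

```python
import json

# Sample shaped like `facter --json` output (legacy flat fact names, made-up values).
SAMPLE_FACTS = """{
  "fqdn": "web01.example.com",
  "operatingsystem": "CentOS",
  "operatingsystemrelease": "6.4",
  "processorcount": 8,
  "memorysize": "15.58 GB"
}"""

# Hypothetical CMDB columns, each fed by one Facter fact.
CMDB_FIELDS = {
    "hostname": "fqdn",
    "os": "operatingsystem",
    "os_release": "operatingsystemrelease",
    "cpus": "processorcount",
    "memory": "memorysize",
}


def to_cmdb_record(facts_json: str) -> dict:
    """Turn one host's facts into a CMDB row; missing facts become None."""
    facts = json.loads(facts_json)
    return {column: facts.get(fact) for column, fact in CMDB_FIELDS.items()}


record = to_cmdb_record(SAMPLE_FACTS)
print(record["hostname"], record["cpus"])  # web01.example.com 8
```

Leaving missing facts as `None` rather than dropping them keeps gaps in the inventory visible.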

◦ Version control:
▪ SVN / Git
◦ Use continuous integration and deployment tools to test and release software:
▪ Jenkins (Hudson) / Go
▪ Capistrano / Fabric
◦ ...Deploy more frequently, to build confidence in the whole system's change management
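Frequent deploys are easier to trust when rollback is instant. A layout in the style of Capistrano keeps each build in `releases/<name>/` and points a `current` symlink at the live one; the sketch below switches that symlink atomically. It assumes a POSIX filesystem, and the timestamped release names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def activate_release(app_root: Path, release_name: str) -> Path:
    """Atomically point app_root/current at app_root/releases/<release_name>."""
    release_dir = app_root / "releases" / release_name
    if not release_dir.is_dir():
        raise FileNotFoundError(release_dir)
    tmp_link = app_root / "current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()  # leftover from an interrupted deploy
    tmp_link.symlink_to(release_dir)
    # rename() replaces the old symlink in one step on POSIX, so requests
    # see either the old release or the new one, never a missing path.
    os.replace(tmp_link, app_root / "current")
    return release_dir


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for name in ("20130101120000", "20130102120000"):  # hypothetical releases
        (root / "releases" / name).mkdir(parents=True)
    activate_release(root, "20130101120000")
    activate_release(root, "20130102120000")  # deploy the new build
    live = os.readlink(root / "current")
    activate_release(root, "20130101120000")  # roll back
    rolled_back = os.readlink(root / "current")
```

Rolling back is just activating the previous release name, which is what makes small, frequent releases cheap to undo.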

• Capture as much data as possible, from site availability and external dependency checks down to much more detailed data.
• Store time-series data for trend analysis, and alert when thresholds are breached:
◦ CPU / RAM / IO / network usage per server
◦ Application metrics
◦ Disk space usage
◦ Network bandwidth
◦ MySQL metrics
◦ etc.

• So, the source could be anything: DB, logs, SNMP, HTTP, etc.
• + Real-time reporting on top of it (dashboards)
• + Real-time data extraction
• Tools to consider:
◦ Ganglia / Centreon / Nagios
◦ OpManager for URL monitoring
◦ Selenium RC-based checks (functional tests), etc.
• Alert on both minimum and maximum thresholds (OK, WARN, CRITICAL)!
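A check that alerts on both lower and upper thresholds can be sketched as below, using the exit-code convention of Nagios-compatible plugins (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The metric and all threshold values are hypothetical.

```python
# Exit codes used by Nagios-compatible plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check(value, warn_min=None, warn_max=None, crit_min=None, crit_max=None):
    """Classify a metric against optional lower and upper thresholds."""
    def outside(low, high):
        return (low is not None and value < low) or (high is not None and value > high)

    if outside(crit_min, crit_max):
        return CRITICAL
    if outside(warn_min, warn_max):
        return WARNING
    return OK


# Hypothetical metric: active MySQL connections. Too few may mean the app tier
# lost DB connectivity; too many means we are approaching max_connections.
connections = 470
status = check(connections, warn_min=5, warn_max=400, crit_min=1, crit_max=480)
print(f"{NAMES[status]} - connections={connections}")  # WARNING - connections=470
# A real plugin would finish with sys.exit(status) so the scheduler sees the code.
```

The minimum thresholds matter as much as the maximums: a metric dropping to zero is often the first sign of an outage.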

• Continue to plan your resource requirements based on growth expectations, new features and performance targets
• Use data from:
◦ Your monitoring system!
◦ Business requirements
• Continuously improve:
◦ Profile applications and reduce resource usage (DTrace)
◦ Review performance against the capacity model
◦ Feed a “Top 10” hitlist back to developers, e.g. slow queries
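Turning monitoring data into a growth forecast can start with a straight-line fit. The sketch fits disk usage against time by ordinary least squares and projects when the volume fills up; the linear-growth assumption and the sample figures are hypothetical, and real traffic often grows non-linearly or seasonally.

```python
from typing import Optional, Sequence


def days_until_full(days: Sequence[float], used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Fit used_gb = a + b * day by least squares and extrapolate to capacity.

    Returns the number of days after the last sample until the fitted line
    reaches capacity_gb, or None if usage is flat or shrinking.
    """
    n = len(days)
    if n < 2 or n != len(used_gb):
        raise ValueError("need at least two (day, usage) pairs")
    mean_x = sum(days) / n
    mean_y = sum(used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb))
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return (capacity_gb - intercept) / slope - days[-1]


# Hypothetical weekly disk-usage samples for a 1,000 GB MySQL data volume.
remaining = days_until_full([0, 7, 14, 21, 28], [400, 414, 428, 442, 456], 1000)
print(remaining)  # 272.0
```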

• Varnish Cache
◦ Reverse proxy; flexible configuration with inline C support
• Nginx
◦ Event-based / lightweight
◦ Runs more than 8% of the web
• PHP-FPM
◦ The best FastCGI implementation available for PHP
• MySQL server tuning / optimization
• Caching: in-memory data stores (Memcached / Redis)
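The usual way an application uses Memcached or Redis is the cache-aside pattern: read from the cache, and on a miss load from the database and populate the cache with a TTL. In the sketch below an in-process dictionary stands in for the real cache server, and `load_fare` is a hypothetical slow query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for Memcached/Redis: get/set with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached(cache: TTLCache, key: str, ttl: float, load: Callable[[], Any]):
    """Cache-aside: serve from cache, else load from the DB and populate.

    None is treated as a miss, so a loader that returns None is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl)
    return value


db_calls = []


def load_fare():  # hypothetical slow MySQL query
    db_calls.append(1)
    return {"route": "DEL-BOM", "fare": 4500}


cache = TTLCache()
for _ in range(3):
    cached(cache, "fare:DEL-BOM", ttl=60, load=load_fare)
print(len(db_calls))  # 1
```

Three requests cost one database query; the TTL bounds how stale a cached value can get.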

• As a first exercise, have an IT infrastructure and application threat model done along with a risk assessment; then consider:
◦ HIDS (OSSEC) / iptables
◦ WAF (web application firewall)
◦ IPS (intrusion prevention system)
◦ Linux hardening
◦ DLP (data leakage prevention)
◦ Data encryption considerations with respect to data classification
• Security monitoring and attack detection
• The key thing is to enable continuous compliance, e.g. PCI-DSS for an e-commerce site.
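One building block of a HIDS such as OSSEC is file integrity checking: record digests of sensitive files, then report anything added, removed or modified. The sketch shows only that idea; the file names are hypothetical, and it assumes the baseline would be stored somewhere an attacker on the host cannot rewrite.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def snapshot(paths: Iterable[Path]) -> Dict[str, str]:
    """Record a SHA-256 digest for each watched file that exists."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in paths if p.is_file()
    }


def diff(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files added, removed or modified since the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(k for k in baseline.keys() & current.keys()
                           if baseline[k] != current[k]),
    }


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "passwd").write_text("root:x:0:0:root:/root:/bin/bash\n")
    (root / "sshd_config").write_text("PermitRootLogin no\n")
    watched = [root / "passwd", root / "sshd_config"]  # hypothetical watch list
    baseline = snapshot(watched)
    (root / "sshd_config").write_text("PermitRootLogin yes\n")  # simulated tampering
    changes = diff(baseline, snapshot(watched))
    modified_names = [Path(p).name for p in changes["modified"]]
print(modified_names)  # ['sshd_config']
```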

• Diagnosing, troubleshooting and fixing production issues
• Change management and delivery
• Automate as much as possible, with centralized management of scripts etc.
• Backup/restore: always run test drills for them
• Don’t reinvent the wheel; go with proven, solid technologies when you can
• Last :) Keep re-architecting the infrastructure (even small things) to optimize efficiency (every 6 months) ...learn from mistakes (yours and others’ too :))
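A restore drill proves that a backup can actually be restored, not just that it was written. The sketch archives a directory, restores it elsewhere and compares file digests; the directory and file names are hypothetical, and a full drill would also load the data into a staging database and run application checks against it.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def restore_drill(source: Path, workdir: Path) -> bool:
    """Back up `source`, restore it elsewhere, and verify the restore matches."""
    archive = workdir / "backup.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="data")
    restore_dir = workdir / "restore"
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safe "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(restore_dir, filter="data")
        else:
            tar.extractall(restore_dir)
    return tree_digests(source) == tree_digests(restore_dir / "data")


with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "mysql-dump"  # hypothetical backup source
    src.mkdir()
    (src / "orders.sql").write_text("INSERT INTO orders VALUES (1);\n")
    work = Path(d) / "drill"
    work.mkdir()
    drill_ok = restore_drill(src, work)
print("restore drill passed" if drill_ok else "restore drill FAILED")
```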

Questions, if any!!

Ping me on:

IRC /freenode/ : PiyushK ##infra-talk
Gtalk: piykumar
Twitter @piykumar
