A private cloud platform
based on CloudFoundry
TRANSLATED VERSION
@Weiyu Wang(王炜煜),Operations Department @Baidu
weibo.com/wwy1640
2013-7-19
Outline	

Background and Objectives
Practice and Reform (Parts 1 and 2)
Processes and Standards
Reform Operations
Future plans
1. Background and Objectives
Operation and PaaS	

Storage
Servers
Networking
O/S
Middleware
Virtualization
Data
Applications
Runtime
OP (SRE), operations
PaaS (and IaaS)
Objectives	

Automation
Business life cycle management, for example modification, monitoring, fault handling, and so on.
Elastic resource utilization.
Standardization
Workflow
Instance standards
System environment, runtime, framework
Unification
Integrate third-party services, for example DB, cache, logging, FS, and so on.
Linkage with other system platforms
Why CloudFoundry?
Automation
Standardization
Unification
Machine Management (the downstream department)
Automation
Standardization
Unification
Why CF?

Automation
Unification
Standardization
2. Practice and Reform (Part 1)
Java, based on CF 1.0
Java Apps

•  Number of product categories > 100
•  Apps > 200
•  Instances > 2,000
•  Average memory per instance: 10 GB
•  Average total daily PV > 1 billion
•  Developers and testers working on the apps > 700
•  Tomcat 5/6/7, JDK 1.5/1.6, standalone
Implementation and Preparation

•  Relevant modifications based on CentOS
✓  Deploy each CF component independently
+  Analyze BOSH and Chef; implement the deployment on physical machines
✓  OS environment initialization
+  Replace apt-get with yum
✓  Port Ubuntu commands to CentOS
+  DEA (v1.0): agent.rb, secure.rb
yum install -y make gcc gcc-c++ kernel-devel.x86_64 openssl-devel.x86_64 \
  libxml2.x86_64 libxml2-devel.x86_64 libxslt.x86_64 libxslt-devel.x86_64 \
  git.x86_64 sqlite.x86_64 ruby-sqlite3.x86_64 sqlite-devel.x86_64 \
  unzip.x86_64 zip.x86_64 ruby-devel.x86_64 ruby-mysql.x86_64 \
  mysql-devel.x86_64 curl-devel.x86_64 postgresql-libs.x86_64 \
  postgresql-devel.x86_64 zlib-devel.x86_64 readline-devel.x86_64 \
  ImageMagick.x86_64 ImageMagick-devel.x86_64 php-magickwand.x86_64
Cluster capacity assessment

•  Number of instances, NATS capacity assessment
✓  With fewer than 100 instances hosted by a single DEA, the instance count has little effect on the load on the NATS server
✓  By a conservative estimate, a single NATS server can host 330 DEAs, with 5 to 30 instances per DEA
✓  Multiple NATS servers, extensible
[Chart: delay (ms) against number of DEAs (10 ~ 340), for 5 ~ 30 instances per single DEA; critical line at 330 DEAs]
In-cluster component redundancy, LB design

•  NATS
✓  Cluster: multiple NATS servers with synchronized heartbeat
✓  The client caches messages. If the network is cut, it keeps trying to reconnect.
✓  Load is balanced across multiple NATS servers (client > 0.5.beta.6)
[Diagram: NATS-Client (caching messages) connects to NATS-Server1 or NATS-Server2, chosen from a random list]
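The client-side behaviour listed above (random server list, reconnect, caching messages while disconnected) can be sketched as follows. This is a minimal illustration and not the NATS client's real code: the class `FailoverClient`, the `send_fn` callback, and the server names are all hypothetical.

```python
import random
from collections import deque


class FailoverClient:
    """Hypothetical client: random server order, buffers messages while offline."""

    def __init__(self, servers, send_fn, seed=None):
        self._rng = random.Random(seed)
        self.servers = list(servers)
        self._rng.shuffle(self.servers)   # random list spreads clients across servers
        self.send_fn = send_fn            # send_fn(server, msg); raises ConnectionError on failure
        self.pending = deque()            # messages cached until some server accepts them

    def publish(self, msg):
        self.pending.append(msg)
        return self.flush()

    def flush(self):
        """Try each server in turn; undelivered messages stay cached for the next call."""
        for server in self.servers:
            try:
                while self.pending:
                    self.send_fn(server, self.pending[0])
                    self.pending.popleft()   # drop a message only after a successful send
                return True
            except ConnectionError:
                continue                     # this server is unreachable, try the next one
        return False
```

A caller would invoke `flush()` again on a timer to retry while every server is down; the retry interval is left out of the sketch.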
Multiple cluster redundant design
•  Multiple independent clusters, logically independent
✓  First-layer switch: modify the DNS A record. Multiple domain names (CNAMEs to this A record) then switch uniformly to a different cluster
✓  Second-layer switch: modify the “interface layer” (at the application layer it can be understood simply as an Nginx reverse proxy)
✓  Ensure the (stateless) apps have enough capacity, or expand capacity quickly, to prevent overload when traffic switches back
[Diagram: two clusters, each with Baidu GateWay, Front End, Router, and app1; the formal domain names are CNAMEs to an A record that points at one of the clusters]
www.baidu.com CNAME www.a.shifen.com.
www.baidu.cn CNAME www.a.shifen.com.
www.a.shifen.com. A 119.75.218.77
www.a.shifen.com. A 119.75.217.56
Core components, distributed	

Router_1
NATS_1
Router
NATS
CC
HM
Stager
DEA
PG_DB
Redis
Framework (CF 1.0)

[Architecture diagram: Baidu GateWay / Front End in front of Router and Router (Cluster 02); CC, UAA, API Bridge, HM, Stager, and DB connected through NATS; DEAs hosting jvm instances; peripheral services: Logging, Name Service, Monitoring, File Persistence]
New features

•  Support RPC: a single instance with multiple ports
✓  One instance can open multiple ports, and an API returns its IP and ports in real time
✓  Linkage with the “name service”: the dynamic IP/port mapping is synchronized with the name
✓  The RPC caller connects to the instance directly by name
Support RPC, single instance with multiple ports

[Diagram: on the DEA server, Instance01 and Instance02 each expose ports. The API Bridge publishes the ip:port pairs to the NS server as TXT records; the RPC caller's NS client resolves a domain to ip:port pairs. ip_local_port_range: 10000 ~ 60000. Port pool (with a freeze period after allocation): 61000 ~ 65000]
New features

•  Support JMX
✓  An API returns the IP and JConsole port in real time, so JMX data can be collected in real time.
DEA
Support JMX

Instance01: JConsole port
Instance02: JConsole port
{
  "instances": [
    {
      "index": 0,
      "state": "RUNNING",
      "since": 438249600,
      "jconsole_ip": "10.1.1.1",
      "jconsole_port": 61111
    },
    {
      "index": 1,
      "state": "RUNNING",
      "since": 438249600,
      "jconsole_ip": "10.1.1.1",
      "jconsole_port": 62222
    }
  ]
}
Monitoring Metrics
CpuUseRate
DaemonThreadCount
MemPool_OldGen_UseRate
NonHeapMemoryUsage_used
TotalCompilationTime
TotalPeakThreadCount
TotalStartedThreadCount
UnloadedClassCount
GC_Major_Frequency
GC_Major_Time
… …
Stager:
java
-Dcom.sun.management.jmxremote.port={VCAP_JCONSOLE_PORT}
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
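A collector could turn the API response shown above into a list of JMX endpoints roughly like this. The sketch assumes the response has exactly the fields shown on the slide; the function name `jmx_endpoints` is hypothetical.

```python
import json


def jmx_endpoints(payload):
    """Return (index, "ip:port") for every RUNNING instance in the API response."""
    doc = json.loads(payload)
    return [
        (inst["index"], f'{inst["jconsole_ip"]}:{inst["jconsole_port"]}')
        for inst in doc["instances"]
        if inst["state"] == "RUNNING"
    ]


# Sample taken from the slide's example response.
sample = """
{"instances": [
  {"index": 0, "state": "RUNNING", "since": 438249600,
   "jconsole_ip": "10.1.1.1", "jconsole_port": 61111},
  {"index": 1, "state": "RUNNING", "since": 438249600,
   "jconsole_ip": "10.1.1.1", "jconsole_port": 62222}
]}
"""
print(jmx_endpoints(sample))
```

Each returned address is what a JMX client would connect to in order to read metrics such as the ones listed under Monitoring Metrics.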
New features

•  Enhancement to health monitoring
✓  Layer 7 detection
✓  Detection of the number of file handles
Enhancement to health monitor

[Diagram: on the DEA server, DEA agent.rb checks each instance's HTTP availability and file handles alongside CPU, MEM, DISK …… and reports to the Health Manager]
DEA (v1.0), logical enhancement

•  Port management
✓  Description
+  A single DEA hosts multiple instances and assigns ports and starts instances in parallel. There is no critical section, so instances compete for ports
✓  Solution
+  Follow the logic of DEA (v2.0) (note: that is DEA_NG, which is not compatible with CF 1.0)
+  Define ip_local_port_range as 10000~61000, the range of dynamic ports
+  Reserve 61001~65000 as the ports assigned by DEA scheduling
+  For each assigned port, add a “[release time, port num]” data structure
+  Delaying the release of a port resolves the port competition
✓  Note
+  CF 2.0 resolves this problem in the same way.
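The delayed-release idea above can be sketched as a small port pool. This is an illustration only, not the DEA's actual Ruby code: the class `PortPool` and the 60-second default freeze period are assumptions; only the 61001~65000 range and the (release time, port) structure come from the slide.

```python
import heapq


class PortPool:
    """Hypothetical DEA port pool with a freeze period after release."""

    def __init__(self, low=61001, high=65000, freeze_seconds=60):
        self.free = list(range(low, high + 1))  # ascending list is already a valid heap
        self.frozen = []                         # heap of (release_time, port)
        self.freeze_seconds = freeze_seconds

    def _thaw(self, now):
        # Return ports to the free heap once their freeze period has elapsed.
        while self.frozen and self.frozen[0][0] + self.freeze_seconds <= now:
            _, port = heapq.heappop(self.frozen)
            heapq.heappush(self.free, port)

    def allocate(self, now):
        self._thaw(now)
        if not self.free:
            raise RuntimeError("no free port")
        return heapq.heappop(self.free)

    def release(self, port, now):
        # The port is not reusable until freeze_seconds after `now`.
        heapq.heappush(self.frozen, (now, port))
```

Because a released port stays frozen for a while, a new instance cannot be handed a port that the previous owner may still be closing.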
DEA (v1.0), logical enhancement

•  Instance resource information management
✓  Description
+  The du command takes a long time to calculate disk usage, so the results of the commands that follow it are inconsistent
+  The CPU utilization calculation does not take the number of cores into account
✓  Solution
+  Adjust the order of the related commands
+  Divide the CPU utilization by the number of cores
✓  Note
+  CF 2.0 has resolved this problem.
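The core-count fix amounts to one division. A minimal sketch, assuming the agent samples cumulative CPU seconds for an instance at two points in time; the function and parameter names are hypothetical:

```python
def cpu_percent(cpu_seconds_start, cpu_seconds_end, wall_seconds, cores):
    """CPU utilization of one instance, normalized to 0..100 across all cores."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    used = cpu_seconds_end - cpu_seconds_start
    # Without the division by cores, a busy multi-core instance reports over 100%.
    return 100.0 * used / (wall_seconds * cores)
```

For example, 20 CPU seconds consumed over 10 wall-clock seconds on a 4-core machine is 50% of the machine, not 200%.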
New features
(Linkage with peripheral systems)

•  File persistence
✓  Use MFS (Moose File System)
✓  The DEA deploys the MFS client and mounts /mfs/path for instances to use
✓  The MFS service provides an HTTP interface for retrieving the data
•  URL-based routing to distinguish apps
✓  foo.baidu.com/app1 → app1.foo.baidu.com
✓  foo.baidu.com/app2 → app2.foo.baidu.com
•  Monitoring linkage
✓  Throughout the app life cycle, interact with the external monitoring system's API so that monitoring items are modified automatically
•  The SDK
✓  Automatic release (wraps vmc)
✓  View files
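The URL-based routing rule above can be sketched as a pure function. Whether the app prefix is stripped from the forwarded path is not stated on the slide; stripping it is an assumption here, and `path_to_app_host` is a hypothetical name.

```python
def path_to_app_host(url):
    """Map 'foo.baidu.com/app1/x' to ('app1.foo.baidu.com', '/x')."""
    host, _, path = url.partition("/")
    app, _, rest = path.partition("/")
    if not app:
        return host, "/"          # no app prefix: leave the host unchanged
    return f"{app}.{host}", "/" + rest
```

The front-end layer would apply this mapping before handing the request to the router, which then routes on the host name as usual.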
Summary of key reform points (CF v1.0)
•  Relevant reforms based on CentOS
•  NATS cluster usage; NATS client retry and cache
•  RPC support: single instance with multiple ports
•  Dynamic JMX and JConsole support
•  Enhanced health monitoring
•  Port management
•  Instance resource information management
•  Peripheral components: file persistence, monitoring linkage, URI routing, the SDK
2. Practice and Reform (Part 2)
C/C++, based on CF 2.0
Several key problems of C/C++ apps

•  Container runtime and resource isolation
✓  Kernel/GNU
✓  Resource isolation
✓  Snapshot, core dump
•  Single instance, multiple processes
✓  Health monitoring
✓  Execution order of the processes
✓  Communication within an instance and between processes
✓  Multiple ports
✓  Isomorphism of multiple instances
Several key problems of C/C++ apps

•  Big instances
✓  Large number of instances (100,000)
✓  Large amount of data (2 TB for a single instance)
✓  High memory usage (100 GB for a single instance)
✓  Long start time (30 minutes)
✓  Heavy traffic (200 million PV per day for a single instance)
✓  Prevent insufficient resources when an instance drifts
•  App communication
✓  Network-layer communication: authorization, flow control
✓  Output files need to be fetched from outside
✓  Input files need to be pushed from outside
✓  RPC uses non-HTTP protocols that carry no PATH information, so it cannot be routed
Instance OS-level environment preparation

•  Container runtime environment
✓  The kernel is the same as the host machine's
✓  Build the container's file system environment
warden/warden/root/linux/rootfs/setup.sh
if grep -q -i centos /etc/issue
then
  exec "$(dirname "$0")/centos.sh" "$@"
fi
Relationship between container and host machine

[Diagram: on the host machine, the DEA drives Warden, which handles networking (bridge, NAT, firewall, flow control). Each container has its own process tree under init, read-only mounts of usr/, lib/, and etc/, a read-write mount of xxx/, a network interface on a subnet, cgroup limits for CPU and MEM, and its own namespace]
Package management

•  Buildpack API
✓  Detect: check
✓  Compile: environment preparation
+  Directory structure
+  Program files and the supporting programs they need
+  Startup script, which ensures the startup order of the processes …
+  Monitor script, which runs periodically and checks the health of the whole instance
✓  Release: information to publish
✓  Procfile: parameter passing (e.g. port)
✓  .profile.d: environment variables
Points to enhance in health monitoring

•  Self-defined monitor scripts
✓  A self-defined monitor script is published together with the instance and periodically updates the content of stat_file
✓  The DEA checks stat_file periodically
[Diagram: inside the instance, monitor.sh watches process-1 and process-2 and writes stat_file; the DEA reads stat_file and reports to the HM]
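The DEA-side check of stat_file might look like the sketch below. The slide does not give the file's format, so two assumptions are made and should be read as such: the monitor script writes `ok` on the first line when every process is healthy, and a file that has not been updated within `max_age_seconds` means the monitor script itself has stopped. The function name is hypothetical.

```python
import os
import time


def instance_healthy(stat_file, max_age_seconds=30, now=None):
    """Healthy if stat_file exists, was updated recently, and starts with 'ok'."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(stat_file)
        with open(stat_file) as fh:
            status = fh.readline().strip()   # assumed format: status word on line 1
    except OSError:
        return False                          # missing or unreadable file counts as unhealthy
    return age <= max_age_seconds and status == "ok"
```

Checking the file's age as well as its content covers the case where monitor.sh dies and leaves a stale "ok" behind.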
Reform to apps

•  For RPC, support the NS client
✓  A dynamic configuration file replaces routing
✓  Port management, freeze time
•  Input/output files
✓  Input files must be actively fetched from outside
✓  Output files are pushed to a transit location (e.g. cloud storage) or to an NS-based service
•  Multiple process management, startup scripts
✓  Control the startup order of multiple processes
✓  Process control
•  File persistence
✓  Remote logging
✓  Use cloud storage
Framework (CF 2.0)

[Architecture diagram: Baidu GateWay / Front End; gorouter (not applicable to RPC), also for Cluster 02; CC, UAA, API Bridge, HM, and DB connected through NATS; the DEA runs Warden containers, each holding process-1 and process-2, with an NS Client; peripheral services: Logging, Name Service, Monitoring, File Persistence]
Reform Summary (CF v2.0)

•  Relevant reforms based on CentOS
•  Container’s environment order
•  Buildpack’s order
•  RPC support: single instance, multiple ports
•  Enhanced health monitoring
•  Peripheral: file persistence, monitoring linkage, URI routing, SDK
3. Processes and Standards
Working Process Description

Review
•  Standard
•  Capacity
•  SLA
Access
•  Org relationship
•  Name info
•  Operation info
Process approval
•  Authorization apply
•  Name apply
•  Release opt
Release update
•  Pre-release
•  Gray scale
•  Rollback
Failure handling
•  Availability
•  Security
•  Issue management
Standard and Capacity Example

•  Standard information collection
✓  App-related names and contacts (R&D, QA, operations, related managers, and so on)
✓  Runtime is isolated from the container's version
✓  Stateless, RPC, URI routing
✓  Dynamic and static files are separated
✓  File persistence
•  Capacity information collection
✓  PV, QPS
✓  CPU, memory, disk, bandwidth, and restart time of a single instance
✓  Number of instances
SLA examples

•  Service object
✓  Java applications (“app” for short below)
✓  Apps that conform to the standard
•  Service hours
✓  24×365, all year round
•  Communication channels
✓  Mail, telephone, contact information
•  Stability-related indicators
✓  Core components: availability > 99.99% (by month), MTTR < 20 mins, MTBF > 5 days
✓  Control services: availability > 99.95% (over the whole year)
✓  The app's own SLA is not degraded by the platform itself.
✓  Note: problems of the app itself are beyond the scope of the SLA, for example bugs, capacity forecast errors, failures of external systems (e.g. DB, cache), and so on
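To make the availability targets above concrete, the allowed downtime can be worked out directly. The sketch assumes a 30-day month and a 365-day year; the function name is hypothetical.

```python
def downtime_budget_minutes(availability_percent, period_days):
    """Maximum downtime, in minutes, allowed by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


monthly = downtime_budget_minutes(99.99, 30)    # core components, per 30-day month
yearly = downtime_budget_minutes(99.95, 365)    # control services, per year
print(f"{monthly:.2f} min/month, {yearly:.1f} min/year")
```

Under those assumptions the budget is about 4.3 minutes per month for core components and about 263 minutes per year for control services.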
Organization, Layer

• Product line (Org)
• Module (Space)
• Group (APP)
• Version (APP-*)
[Diagram: Product line-1 (Org) contains Module-1 (Space), which contains Group-1 (A) and Group-2 (B). Each group holds Instance, v1 and Instance, v2: A-1 and A-2 for Group-1, B-1 and B-2 for Group-2. Product line-2 and Module-2 sit alongside. The instances inside a dashed frame are multiple instances of one app.]
Further encapsulation to CC

Product line (Org): OrgName
Module (Space): OrgName_SpaceName
Module group: OrgName_SpaceName_GroupTag
Module version: OrgName_SpaceName_GroupTag_VersionTag
Instance (unique id): OrgName_SpaceName_GroupTag_VersionTag_Index
GroupTag, VersionTag

• GroupTag
•  Distinguishes different dimensions: configuration number, data center, rack …
• VersionTag
•  Distinguishes program, data, configuration file, and so on
•  Consists of a four-part version number and a timestamp
• Instance full name, for example
•  Org_Space_GroupA_1-1-1-1-438249600_1
•  Org_Space_GroupB_1-1-1-1-438249600_1
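The naming scheme above can be expressed as a pair of helper functions. This is a sketch under the assumption that no level contains an underscore and that the version tag is always four numbers plus a timestamp joined by hyphens, as in the example names; the function names are hypothetical.

```python
def instance_name(org, space, group_tag, version, timestamp, index):
    """Join the naming levels with '_'; version is four numbers, e.g. (1, 1, 1, 1)."""
    if len(version) != 4:
        raise ValueError("version must have four numbers")
    version_tag = "-".join(str(n) for n in version) + f"-{timestamp}"
    return "_".join([org, space, group_tag, version_tag, str(index)])


def parse_instance_name(name):
    """Split a full name back into its levels (assumes no '_' inside any level)."""
    org, space, group_tag, version_tag, index = name.split("_")
    *version, timestamp = version_tag.split("-")
    return org, space, group_tag, tuple(int(n) for n in version), int(timestamp), int(index)
```

Calling `instance_name("Org", "Space", "GroupA", (1, 1, 1, 1), 438249600, 1)` reproduces the first example name on the slide.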
Examination, approval and release

•  Submit the release form for approval
✓  App information (program version, capacity information, related instructions, and so on)
✓  Approval (related managers and the people who should be informed)
✓  Operator, operating time
✓  Monitoring information (monitoring and control strategy, contacts, and so on)
•  Start the release operation and add monitoring
✓  Before release, the related approval processes must pass
✓  Operator, program version, MD5, time information, and so on must be consistent with the approval
✓  Only a release that is consistent and has passed the processes can go out
✓  After a successful release, add monitoring
[Flow: release form → approval → release app → add monitoring]
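The consistency check between an approved form and the actual release can be sketched as below. The dictionary keys (`approved`, `operator`, `version`, `time`, `md5`) and both function names are hypothetical; the slide only states which pieces of information must match.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of the release package."""
    return hashlib.md5(data).hexdigest()


def release_allowed(approval, request, package: bytes):
    """Allow a release only if it matches the approved form field by field."""
    if not approval.get("approved"):
        return False, "approval process not passed"
    for field in ("operator", "version", "time"):
        if approval.get(field) != request.get(field):
            return False, f"{field} differs from approval"
    if approval.get("md5") != md5_of(package):
        return False, "md5 differs from approval"
    return True, "ok"
```

Comparing the MD5 of the package actually being released, rather than a value supplied by the operator, is what ties the approval to the exact build.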
Pre-release, release, rollback

[Diagram: app_v1, app_v2, and app_v3 each run instance01 and instance02. A version is reachable at its own domain name (for example app_v1.paas.baidu.com, app_v3.paas.baidu.com); the formal domain name app.baidu.com is mapped to one version.]
Generic domain name, map/unmap, multiple versions of the app
Forward: release
Backward: roll back
Pre-release: offline observation on the internal network
Basic gray-scale release

[Diagram: app.baidu.com is mapped to instances of app_v1, app_v2, and app_v3 at the same time; app_v1 is also reachable at app_v1.paas.baidu.com. Adding app_v2 instance03 raises the share of app_v2.]
1. Point one formal domain name at multiple apps at the same time
2. Adjust the ratio of instance counts to adjust the proportion of gray-scale traffic
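The relationship between instance counts and traffic share is simple arithmetic, assuming the router spreads requests evenly over every instance mapped to the domain (an assumption; the slide does not describe the balancing algorithm). The function name is hypothetical.

```python
def traffic_share(instances_per_version):
    """Expected traffic fraction per version under even per-instance balancing."""
    total = sum(instances_per_version.values())
    if total == 0:
        raise ValueError("no instances mapped to the domain")
    return {v: n / total for v, n in instances_per_version.items()}
```

With three instances of the old version and one of the new version mapped to app.baidu.com, the new version would receive a quarter of the traffic; adding or removing instances moves that ratio.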
“The path to sermon”: popularizing the platform

•  The medal: who owns the other half?
✓  App support
+  New services need to follow the PaaS standards and way of thinking
+  Old services need R&D to reform them and QA to run regression tests
✓  Peripheral support
+  DB, cache, storage, interface, security, monitoring, and so on
•  Make the benefits clear and establish a win-win ecosystem
✓  Deliver faster, save more resources, and make things simpler
✓  One-stop, all-in-one service; popularize hand in hand
Some solutions:

•  Give users (app developers) a noble, imperial experience
✓  Provide dedicated service for important apps
✓  Give important managers complete and timely communication, such as reports
✓  The principle is “capitalism” rather than “socialism”
•  Event “marketing”
✓  E.g. “struts2 0day”
+  Actively cooperate with R&D and QA on identifying the issue, repairing it, and rolling out the fix
+  Actively report progress and manage the event
+  Afterwards, actively promote and take part in the discussion and decision making, for example with the security and architecture groups
+  The principle is “win-win” rather than shirking responsibility
4. Reform Operations
Reform operations

“NoOps”
PaaS (and IaaS) overall functionality
>= traditional operations work
Storage
Servers
Networking
O/S
Middleware
Virtualization
Data
Applications
Runtime
OP (SRE), operations
PaaS (and IaaS)
How to reform, example

• Automatic fault recovery
✓  Add a health monitoring mechanism on top of traditional monitoring
✓  Instances restart and “drift” automatically
✓  Reduce traditional alarms and manpower
+  Alarm only when automatic recovery fails
[Diagram: the monitor watches the whole instance by name (name_1) and obtains the current ip:port through the health monitor API; real instance_1 at ip:port becomes “instance after drifting_1”]
•  “Drift” is a normal phenomenon; it does not raise an alarm
•  An alarm is needed only when “drift” fails
•  Monitoring is refined to the instance level: each check looks up the instance by name and gets back its ip:port
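The alarm policy above can be sketched as a small decision function. The callbacks `restart`, `drift`, and `alarm` are hypothetical stand-ins for platform operations, and trying a restart before a drift is an assumption; the slide only says that an alarm is raised when automatic recovery fails.

```python
def handle_unhealthy(instance, restart, drift, alarm):
    """Try restart, then drift; raise an alarm only when both fail."""
    if restart(instance):
        return "restarted"
    if drift(instance):
        return "drifted"          # drift is normal and does not alarm
    alarm(f"automatic recovery failed for {instance}")
    return "alarmed"
```

Because the monitor addresses the instance by name, a successful drift needs no change on the monitoring side: the next lookup simply returns the new ip:port.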
How to reform, example

•  More agile
✓  Let developers forget about servers and stop thinking in terms of resources
✓  Complete configuration management and automatic deployment
✓  Release, pre-release, and rollback are extremely simple and need no extra complex deployment tool
✓  Elastic scaling is extremely simple
✓  Use buildpacks to compile in the cloud and run directly
•  All-in-one, one-stop experience
✓  From submitting the release form through release to modifying monitoring, the working process is fully automatic
✓  Integrate third-party services and unify the management entry point
5. Future plans
Future plans

• Feedback to the community
•  For private cloud functions, do our best to wrap the native components (based on CF 2.0) and open-source the new components
•  Where changes affect the native components, do our best to merge them into the master branch
•  Write more documents and tips, and actively take part in community discussion
• Development orientation
•  Large applications (big instances)
•  Intelligent scheduling
•  Information security
•  Further continuous integration
•  UI
We are hiring !
@Weiyu Wang(王炜煜)
weibo.com/wwy1640
Thanks
TRANSLATED VERSION

Contenu connexe

Tendances

Tier 2 net app baseline design standard revised nov 2011
Tier 2 net app baseline design standard   revised nov 2011Tier 2 net app baseline design standard   revised nov 2011
Tier 2 net app baseline design standard revised nov 2011
Accenture
 
C L113
C L113C L113
C L113
Novell
 
Perf stat windows
Perf stat windowsPerf stat windows
Perf stat windows
Accenture
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Community
 
An Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ ServersAn Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ Servers
Quantel
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Tommy Lee
 

Tendances (20)

Tier 2 net app baseline design standard revised nov 2011
Tier 2 net app baseline design standard   revised nov 2011Tier 2 net app baseline design standard   revised nov 2011
Tier 2 net app baseline design standard revised nov 2011
 
C L113
C L113C L113
C L113
 
Difference between cluster image package show-repository and system image get
Difference between cluster image package show-repository and system image getDifference between cluster image package show-repository and system image get
Difference between cluster image package show-repository and system image get
 
RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)RHCE (RED HAT CERTIFIED ENGINEERING)
RHCE (RED HAT CERTIFIED ENGINEERING)
 
Perf stat windows
Perf stat windowsPerf stat windows
Perf stat windows
 
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store Ceph Day Beijing: Big Data Analytics on Ceph Object Store
Ceph Day Beijing: Big Data Analytics on Ceph Object Store
 
An Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ ServersAn Integrated Asset Management Solution For Quantel sQ Servers
An Integrated Asset Management Solution For Quantel sQ Servers
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
제4회 한국IBM과 함께하는 난공불락 오픈소스 인프라 세미나-CRUI
제4회 한국IBM과 함께하는 난공불락 오픈소스 인프라 세미나-CRUI제4회 한국IBM과 함께하는 난공불락 오픈소스 인프라 세미나-CRUI
제4회 한국IBM과 함께하는 난공불락 오픈소스 인프라 세미나-CRUI
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
Implementing distributed mclock in ceph
Implementing distributed mclock in cephImplementing distributed mclock in ceph
Implementing distributed mclock in ceph
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
Barman (PostgreSql) manual
Barman (PostgreSql) manualBarman (PostgreSql) manual
Barman (PostgreSql) manual
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
Ibm spectrum scale fundamentals workshop for americas part 5 spectrum scale_c...
Ibm spectrum scale fundamentals workshop for americas part 5 spectrum scale_c...Ibm spectrum scale fundamentals workshop for americas part 5 spectrum scale_c...
Ibm spectrum scale fundamentals workshop for americas part 5 spectrum scale_c...
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File System
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
 
DLM knowledge-sharing
DLM knowledge-sharingDLM knowledge-sharing
DLM knowledge-sharing
 
Db2 recovery IDUG EMEA 2013
Db2 recovery IDUG EMEA 2013Db2 recovery IDUG EMEA 2013
Db2 recovery IDUG EMEA 2013
 
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A...
Avoiding Chaos:  Methodology for Managing Performance in a Shared Storage A...Avoiding Chaos:  Methodology for Managing Performance in a Shared Storage A...
Avoiding Chaos: Methodology for Managing Performance in a Shared Storage A...
 

En vedette

Soc st. seafering traders ch3 3 assessment
Soc st. seafering traders ch3 3 assessmentSoc st. seafering traders ch3 3 assessment
Soc st. seafering traders ch3 3 assessment
vickytg123
 
Iata codes
Iata codesIata codes
Iata codes
unit30
 
Regeneración Natural y Arificial
Regeneración Natural y ArificialRegeneración Natural y Arificial
Regeneración Natural y Arificial
Nombre Apellidos
 

En vedette (20)

The Church of Jesus Christ of Latter-day Saints Customer Presentation
The Church of Jesus Christ of Latter-day Saints Customer PresentationThe Church of Jesus Christ of Latter-day Saints Customer Presentation
The Church of Jesus Christ of Latter-day Saints Customer Presentation
 
A tradução especializada: Um motor de desenvolvimento
A tradução especializada: Um motor de desenvolvimentoA tradução especializada: Um motor de desenvolvimento
A tradução especializada: Um motor de desenvolvimento
 
St Valentine’S Day
St Valentine’S DaySt Valentine’S Day
St Valentine’S Day
 
Voiplegal 111107142756-phpapp01
Voiplegal 111107142756-phpapp01Voiplegal 111107142756-phpapp01
Voiplegal 111107142756-phpapp01
 
Create a Tagul World Cloud for your Blog
Create a Tagul World Cloud for your BlogCreate a Tagul World Cloud for your Blog
Create a Tagul World Cloud for your Blog
 
Compositional Techniques of Chiptune Music
Compositional Techniques of Chiptune MusicCompositional Techniques of Chiptune Music
Compositional Techniques of Chiptune Music
 
Capital vs revenue transactions
Capital vs revenue transactionsCapital vs revenue transactions
Capital vs revenue transactions
 
Soc st. seafering traders ch3 3 assessment
Soc st. seafering traders ch3 3 assessmentSoc st. seafering traders ch3 3 assessment
Soc st. seafering traders ch3 3 assessment
 
Trabalho modulo IV
Trabalho modulo IVTrabalho modulo IV
Trabalho modulo IV
 
BIBLIA CATOLICA, ANTIGUO TESTAMENTO, JUECES, PARTE 10 DE 47
BIBLIA CATOLICA, ANTIGUO TESTAMENTO, JUECES, PARTE 10 DE 47BIBLIA CATOLICA, ANTIGUO TESTAMENTO, JUECES, PARTE 10 DE 47
BIBLIA CATOLICA, ANTIGUO TESTAMENTO, JUECES, PARTE 10 DE 47
 
resposta do capitulo 15
resposta do capitulo 15resposta do capitulo 15
resposta do capitulo 15
 
The Task-based Teaching
The Task-based TeachingThe Task-based Teaching
The Task-based Teaching
 
Iata codes
Iata codesIata codes
Iata codes
 
How to live stream your event on YouTube using wirecast.
How to live stream your event on YouTube using wirecast. How to live stream your event on YouTube using wirecast.
How to live stream your event on YouTube using wirecast.
 
como o cerebro aprende
como o cerebro aprendecomo o cerebro aprende
como o cerebro aprende
 
Anatomy of Female Pelvic Slideshow (in Malay)
Anatomy of Female Pelvic Slideshow (in Malay)Anatomy of Female Pelvic Slideshow (in Malay)
Anatomy of Female Pelvic Slideshow (in Malay)
 
Maglev
MaglevMaglev
Maglev
 
Regeneración Natural y Arificial
Regeneración Natural y ArificialRegeneración Natural y Arificial
Regeneración Natural y Arificial
 
Industrial disputes settlement machinery
Industrial disputes settlement machineryIndustrial disputes settlement machinery
Industrial disputes settlement machinery
 
Infosys Financial Analysis
Infosys Financial AnalysisInfosys Financial Analysis
Infosys Financial Analysis
 

Similaire à Baidu cloudfoundry english

WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2
 
FlowER Erlang Openflow Controller
FlowER Erlang Openflow ControllerFlowER Erlang Openflow Controller
FlowER Erlang Openflow Controller
Holger Winkelmann
 

Similaire à Baidu cloudfoundry english (20)

Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
 
RAC - Test
RAC - TestRAC - Test
RAC - Test
 
Sap basis administrator user guide
Sap basis administrator   user guideSap basis administrator   user guide
Sap basis administrator user guide
 
Eranea's solution and technology for mainframe migration / transformation : d...
Eranea's solution and technology for mainframe migration / transformation : d...Eranea's solution and technology for mainframe migration / transformation : d...
Eranea's solution and technology for mainframe migration / transformation : d...
 
The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
SUSE Expert Days 2017 FUJITSU
SUSE Expert Days 2017 FUJITSUSUSE Expert Days 2017 FUJITSU
SUSE Expert Days 2017 FUJITSU
 
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
Cruz:Application-Transparent Distributed Checkpoint-Restart on Standard Oper...Cruz:Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
WSO2 Customer Webinar: WEST Interactive’s Deployment Approach and DevOps Prac...
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond Panel
 
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and PerformanceVMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
VMworld 2013: Architecting VMware Horizon Workspace for Scale and Performance
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012
 
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
 
C Cure Users Group Presentation Final 4
C Cure Users Group Presentation Final 4C Cure Users Group Presentation Final 4
C Cure Users Group Presentation Final 4
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 
PROSE
PROSEPROSE
PROSE
 
Oracle Drivers configuration for High Availability, is it a developer's job?
Oracle Drivers configuration for High Availability, is it a developer's job?Oracle Drivers configuration for High Availability, is it a developer's job?
Oracle Drivers configuration for High Availability, is it a developer's job?
 
FlowER Erlang Openflow Controller
FlowER Erlang Openflow ControllerFlowER Erlang Openflow Controller
FlowER Erlang Openflow Controller
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 

Baidu cloudfoundry english

  • 1. A private cloud platform based on CloudFoundry. TRANSLATED VERSION. @Weiyu Wang (王炜煜), Operations Department @Baidu, weibo.com/wwy1640, 2013-7-19
  • 2. Outline: Background and Objectives; Practice and Reform (Parts 1 and 2); Processes and Standards; Reforming Operations; Future Plans
  • 3. 1. Background and Objectives
  • 5. Objectives
      - Automation: business life-cycle management (for example changes, monitoring, and fault handling); elastic resource utilization.
      - Standardization: workflow; instance standards; system environment, runtime, and framework.
      - Unification: integrate third-party services (for example DB, cache, logging, and file storage); link with other platform systems.
  • 6. Why CloudFoundry? Automation, Standardization, Unification. Machine Management (the downstream department): Automation, Standardization, Unification.
  • 8. 2. Practice and Reform (Part 1): Java, based on CF 1.0
  • 9. Java Apps
      - Product categories: > 100
      - Apps: > 200
      - Instances: > 2,000
      - Average memory per instance: 10 GB
      - Average daily page views (PV): > 1 billion
      - Developers and testers working on the apps: > 700
      - Stack: Tomcat 5/6/7, JDK 1.5/1.6, standalone
  • 10. Implementation and Preparation
      - Modifications to run on CentOS:
          - Deploy each CF component independently: analyzed BOSH and Chef, then implemented deployment on physical machines.
          - OS environment initialization: apt-get replaced with yum.
          - Ubuntu commands ported to CentOS: DEA (v1.0), agent.rb, secure.rb.
      - Package installation:
        yum install -y make gcc gcc-c++ kernel-devel.x86_64 openssl-devel.x86_64 libxml2.x86_64 libxml2-devel.x86_64 libxslt.x86_64 libxslt-devel.x86_64 git.x86_64 sqlite.x86_64 ruby-sqlite3.x86_64 sqlite-devel.x86_64 unzip.x86_64 zip.x86_64 ruby-devel.x86_64 ruby-mysql.x86_64 mysql-devel.x86_64 curl-devel.x86_64 postgresql-libs.x86_64 postgresql-devel.x86_64 zlib-devel.x86_64 readline-devel.x86_64 ImageMagick.x86_64 ImageMagick-devel.x86_64 php-magickwand.x86_64
  • 11. Cluster capacity assessment
      - Number of instances and NATS capacity:
          - The number of instances hosted by a single DEA (< 100) has little effect on the load on the NATS server.
          - By a conservative estimate, a single NATS server can host 330 DEAs with 5 to 30 instances each.
          - Multiple NATS servers make the cluster extendable.
      - Chart: delay (ms) against the number of DEAs (10 to 340) and the number of instances per DEA (5 to 30); the critical line is at 330 DEAs.
  • 12. In-cluster component redundancy and load-balancing design
      - NATS:
          - Cluster of multiple NATS servers with synchronized heartbeats.
          - The client caches messages; if the network is cut, it keeps trying to reconnect.
          - Multiple NATS servers are load balanced (client > 0.5.beta.6).
      - Diagram: the NATS client (caching messages) connects to NATS-Server1 or NATS-Server2 from a randomized server list.
  • 13. Multi-cluster redundancy design
      - Multiple independent clusters, logically independent:
          - First-layer switch: modify the DNS A record; the multiple domain names that CNAME to this A record all switch to a different cluster together.
          - Second-layer switch: modify the "interface layer" (in terms of its application-layer function, it can be thought of simply as an Nginx reverse proxy).
          - Ensure the (stateless) app has enough capacity, or can expand quickly, to prevent overload when traffic switches back.
      - Diagram: A record → Baidu Gateway / Front End → Router → app1, once per cluster; the formal domain names are CNAMEs.
      - DNS example:
        www.baidu.com CNAME www.a.shifen.com.
        www.baidu.cn CNAME www.a.shifen.com.
        www.a.shifen.com. A 119.75.218.77
        www.a.shifen.com. A 119.75.217.56
  • 15. Framework (CF 1.0). Components in the diagram: Baidu Gateway / Front End, Router, Router (Cluster 02), CC, UAA, API Bridge, HM, Stager, DEA (hosting multiple JVM instances), NATS, DB, Logging, Name Service, Monitoring, File Persistence.
  • 16. New features
      - RPC support, with multiple ports per instance:
          - One instance can open multiple ports, and an API returns its IP and ports in real time.
          - Linked with the "name service": the dynamic IP/port mappings are synchronized to the name.
          - The RPC caller connects to the instance directly by name.
  • 17. DEA server: RPC support, multiple ports per instance (diagram)
      - Instances on the DEA server (Instance01:port, Instance02:port) → API Bridge → NS server (TXT record: ip:port, ip:port) → NS client on the RPC caller (domain → ip:port, ip:port).
      - ip_local_port_range: 10000 ~ 60000
      - Port pool (with a freeze period after allocation): 61000 ~ 65000
  • 18. New features
      - JMX support: an API returns the IP and JConsole port in real time, so JMX data can be collected in real time.
  • 19. DEA: JMX support
      - Each instance exposes a JConsole port (Instance01: JConsole port, Instance02: JConsole port).
      - Example API response:
        { "instances": [
            { "index": 0, "state": "RUNNING", "since": 438249600, "jconsole_ip": "10.1.1.1", "jconsole_port": 61111 },
            { "index": 1, "state": "RUNNING", "since": 438249600, "jconsole_ip": "10.1.1.1", "jconsole_port": 62222 }
        ] }
      - Monitoring metrics: CpuUseRate, DaemonThreadCount, MemPool_OldGen_UseRate, NonHeapMemoryUsage_used, TotalCompilationTime, TotalPeakThreadCount, TotalStartedThreadCount, UnloadedClassCount, GC_Major_Frequency, GC_Major_Time, ...
      - Stager:
        java -Dcom.sun.management.jmxremote.port={VCAP_JCONSOLE_PORT} -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  • 20. New features
      - Health monitor enhancements:
          - Layer-7 (application-layer) detection.
          - Detection of the number of open file handles.
  • 21. Enhancement to health monitor (diagram). On the DEA server, the DEA's agent.rb checks each instance (HTTP availability, CPU, memory, disk, file handles, ...) and reports to the Health Manager.
  • 22. DEA (v1.0), logic enhancement
      - Port management
          - Description: a single DEA assigns ports to, and starts, multiple instances in parallel; there is no critical line, but ports can be contended.
          - Solution:
              - Follow the logic of DEA v2.0 (note: that is DEA_NG, which is not compatible with CF 1.0).
              - Set ip_local_port_range to 10000~61000 as the dynamic port range.
              - Reserve 61001~65000 for ports assigned by DEA scheduling.
              - For each assigned port, keep a "[release time, port number]" record.
              - Port contention is resolved by delaying the release of a port.
          - Note: CF 2.0 has resolved this problem with the same method.
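The delayed-release idea on the port management slide can be sketched as follows. This is a minimal illustration, not the DEA's actual code: the `PortPool` class, its method names, and the 60-second default freeze period are assumptions (the slides give only the 61001~65000 range and the "[release time, port number]" record).

```python
import heapq
import threading
import time


class PortPool:
    """Hands out ports from a reserved range; released ports stay frozen for a while."""

    def __init__(self, first=61001, last=65000, freeze_seconds=60.0, clock=time.monotonic):
        self._free = list(range(first, last + 1))  # an ascending list is already a min-heap
        self._frozen = []                          # min-heap of (release_time, port) records
        self._freeze = freeze_seconds              # hypothetical freeze period
        self._clock = clock
        self._lock = threading.Lock()              # instances are started in parallel

    def _thaw(self):
        # Move ports whose freeze period has expired back to the free heap.
        now = self._clock()
        while self._frozen and self._frozen[0][0] + self._freeze <= now:
            _, port = heapq.heappop(self._frozen)
            heapq.heappush(self._free, port)

    def allocate(self):
        with self._lock:
            self._thaw()
            if not self._free:
                raise RuntimeError("no free port outside the freeze period")
            return heapq.heappop(self._free)

    def release(self, port):
        # Record [release time, port]; the port is reusable only after the freeze period.
        with self._lock:
            heapq.heappush(self._frozen, (self._clock(), port))
```

Because a released port cannot be handed out again until its freeze period ends, a slow-to-exit old instance and a newly started one are unlikely to contend for the same port.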
  • 23. DEA (v1.0), logic enhancement
      - Instance resource information management
          - Description:
              - The du command takes a long time to calculate disk usage, so the results of the commands that run after it are not consistent.
              - The CPU utilization calculation does not take the number of cores into account.
          - Solution:
              - Adjust the order of the related commands.
              - Divide the CPU utilization by the number of cores.
          - Note: CF 2.0 has resolved this problem.
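The CPU fix above amounts to one extra division. A minimal sketch, assuming the usage is sampled as CPU seconds consumed over a wall-clock interval (the function and parameter names are illustrative, not taken from the DEA source):

```python
def cpu_percent(cpu_seconds_delta, wall_seconds_delta, cores):
    """CPU utilization in percent, normalized by the number of cores."""
    if wall_seconds_delta <= 0 or cores <= 0:
        raise ValueError("interval and core count must be positive")
    # Without the division by cores, a busy multi-core instance reports more than 100%.
    return 100.0 * cpu_seconds_delta / (wall_seconds_delta * cores)


# 8 CPU-seconds used during a 2-second interval on an 8-core host
print(cpu_percent(8.0, 2.0, 8))
```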
  • 24. New features (linkage with peripheral systems)
      - File persistence
          - Uses MFS (Moose File System).
          - The DEA deploys the MFS client and mounts /mfs/path for instances to use.
          - The MFS service provides an HTTP interface for retrieving the data.
      - URL-based routing to distinguish apps
          - foo.baidu.com/app1 → app1.foo.baidu.com
          - foo.baidu.com/app2 → app2.foo.baidu.com
      - Monitoring linkage
          - Throughout the app's life cycle, the platform calls the external monitoring system's API so that monitoring items are modified automatically.
      - The SDK
          - Automatic release (wraps vmc).
          - File viewing.
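The URL-based routing rule above (foo.baidu.com/app1 → app1.foo.baidu.com) can be sketched as a small rewrite function. This is an illustration only: the name `route_host` is hypothetical, and stripping the app prefix from the forwarded path is an assumption the slides do not confirm.

```python
def route_host(host, path):
    """Map host + first path segment to a per-app host; return (backend_host, remaining_path)."""
    first, _, rest = path.lstrip("/").partition("/")
    if not first:
        # No app segment in the path: leave the request on the original host.
        return host, "/"
    return f"{first}.{host}", "/" + rest


print(route_host("foo.baidu.com", "/app1"))
print(route_host("foo.baidu.com", "/app2/status"))
```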
  • 25. Summary of key reform points (CF v1.0)
      - Modifications to run on CentOS
      - NATS cluster usage; NATS client retry and caching
      - RPC support, with multiple ports per instance
      - Dynamic JMX and JConsole support
      - Enhanced health monitoring
      - Port management
      - Instance resource information management
      - Peripheral components: file persistence, monitoring linkage, URI routing, the SDK
  • 26. 2. Practice and Reform (Part 2): C/C++, based on CF 2.0
  • 27. Several key problems of C/C++ apps
      - Container runtime and resource isolation
          - Kernel/GNU
          - Resource isolation
          - Snapshots, core dumps
      - Single instance, multiple processes
          - Health monitoring
          - Execution order of the processes
          - Communication within the instance and between processes
          - Multiple ports
          - Keeping multiple instances isomorphic
  • 28. Several key problems of C/C++ apps
      - Big instances
          - Large number of instances (on the order of 100,000)
          - Large amount of data (2 TB for a single instance)
          - High memory usage (100 GB for a single instance)
          - Long start time (30 minutes)
          - Heavy traffic (a single instance serves 200 million PV per day)
          - Preventing insufficient resources when an instance drifts
      - App communication
          - Network-layer communication: authorization and flow control
          - Output files must be retrievable from outside
          - Input files must be pushed in from outside
          - RPC uses non-HTTP protocols that carry no PATH information, so it cannot be routed
  • 29. Instance's OS-level environment preparation
      - Container runtime environment
          - The kernel is the same as the host machine's.
          - Build the container's file environment.
      - warden/warden/root/linux/rootfs/setup.sh:
        if grep -q -i centos /etc/issue
        then
          exec $(dirname $0)/centos.sh $@
        fi
  • 30. Relationship between container and host machine (diagram)
      - Host: DEA and Warden; networking (bridge, NAT, firewall, flow control).
      - Each container: its own process tree (init and child processes); read-only mounts of usr/, lib/, and etc/; a read-write mount of xxx/; its own network interface (subnet); cgroup limits for CPU and memory; its own namespace.
  • 31. Package management
      - Buildpack API
          - detect: check.
          - compile: environment preparation.
              - Directory structure.
              - Program files and the supporting programs they need.
              - Startup script, which ensures the startup order of the processes.
              - Monitor script, which runs periodically and checks the health of the whole instance.
          - release: information to publish.
          - Procfile: parameter passing (e.g. port).
          - .profile.d: environment variables.
  • 32. Health monitor enhancement
      - Self-defined monitor scripts
          - A self-defined monitor script is published together with the instance and periodically updates the content of stat_file.
          - The DEA checks stat_file periodically.
      - Diagram: inside the instance, monitor.sh watches process-1 and process-2 and writes stat_file; the DEA reads stat_file and reports to the HM.
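The DEA-side half of the stat_file check might look like the sketch below. The slides do not give the file format or the check interval, so both are assumptions here: the monitor script is assumed to write the single word `OK`, and the instance counts as healthy only if the file was also updated within `max_age_seconds`.

```python
import os
import time


def instance_healthy(stat_file, max_age_seconds=30.0, now=None):
    """Return True if the monitor script reported OK recently enough."""
    try:
        mtime = os.path.getmtime(stat_file)
        with open(stat_file, encoding="utf-8") as fh:
            status = fh.read().strip()
    except OSError:
        # A missing or unreadable stat_file is treated as unhealthy.
        return False
    now = time.time() if now is None else now
    # A stale file means the monitor script itself has stopped running.
    return status == "OK" and (now - mtime) <= max_age_seconds
```

Checking the file's age as well as its content catches the case where every process, including the monitor script, has died and the last written status was still `OK`.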
  • 33. Changes required in the app
      - RPC: support the NS client
          - A dynamic configuration file replaces routing.
          - Port management, with a freeze time.
      - Input/output files
          - The app must actively fetch input files from outside.
          - Output files are pushed to a transit store (e.g. cloud storage) or served through an NS-based service.
      - Multi-process management and startup scripts
          - Control the startup order of the multiple processes.
          - Process control.
      - File persistence
          - Remote logging.
          - Use cloud storage.
  • 34. Framework (CF 2.0). Components in the diagram: Baidu Gateway / Front End, gorouter (not applicable to RPC), (Cluster 02), CC, UAA, API Bridge, HM, DEA, Warden, containers (each running process-1 and process-2), NS Client, NATS, DB, Logging, Name Service, Monitoring, File Persistence.
  • 35. Reform summary (CF v2.0)
      - Modifications to run on CentOS
      - Container environment order
      - Buildpack order
      - RPC support, with multiple ports per instance
      - Enhanced health monitoring
      - Peripheral components: file persistence, monitoring linkage, URI routing, SDK
  • 36. 3. Processes and Standard
  • 37. Working process description
      - Review: standard, capacity, SLA
      - Access: org relationship, name info, operation info
      - Process approval: authorization application, name application, release operation
      - Release and update: pre-release, gray-scale release, rollback
      - Failure handling: availability, security, issue management
  • 38. Standard and capacity example
      - Standard information collected
          - App names and the related contact people (R&D, QA, operations, the relevant managers, and so on)
          - Runtime is isolated from the container's version
          - Stateless, RPC, URI routing
          - Dynamic and static files are separated
          - File persistence
      - Capacity information collected
          - PV, QPS
          - Per instance: CPU, memory, disk, bandwidth, restart time
          - Number of instances
  • 39. SLA examples
      - Service object
          - Java applications ("APP" for short below)
          - APPs that conform to the standard
      - Service hours
          - 24×365, all year round
      - Communication channels
          - Mail, telephone, contact-person information
      - Stability indicators
          - Core components: availability > 99.99% (per month), MTTR < 20 minutes, MTBF > 5 days
          - Control services: availability > 99.95% (over the whole year)
          - The APP's own SLA is not degraded by the platform itself.
          - Note: problems of the APP itself are outside the scope of the SLA, for example bugs, capacity forecast errors, and failures of external systems (e.g. DB, cache).
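To make the availability targets above concrete, the arithmetic below converts an availability percentage into a downtime budget. It is only a worked calculation; the 30-day month and 365-day year are assumptions, since the slides do not define the measurement window precisely.

```python
def downtime_budget_minutes(availability_percent, period_days):
    """Maximum downtime, in minutes, that still meets the availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - availability_percent) / 100.0


# Core components: 99.99% per month (30-day month assumed)
print(round(downtime_budget_minutes(99.99, 30), 2))
# Control services: 99.95% per year (365-day year assumed)
print(round(downtime_budget_minutes(99.95, 365), 2))
```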
  • 40. Organization and layers
      - Layers: Product line (Org) > Module (Space) > Group (APP) > Version (APP-*)
      - Example hierarchy: Product line-1 and Product line-2 (Org); Module-1 and Module-2 (Space); Group-1 (A) and Group-2 (B).
      - Instances: Instance, version 1 (APP-1-1 / A-1); Instance, version 2 (APP-1-2 / A-2); Instance, version 1 (APP-2-1 / B-1); Instance, version 2 (APP-2-2 / B-2).
      - Each dashed frame in the diagram is one APP with multiple instances.
  • 41. Further encapsulation of the CC
      - Product line (Org): OrgName
      - Module (Space): OrgName_SpaceName
      - Module group: OrgName_SpaceName_GroupTag
      - Module version: OrgName_SpaceName_GroupTag_VersionTag
      - Instance (unique ID): OrgName_SpaceName_GroupTag_VersionTag_Index
  • 42. GroupTag and VersionTag
      - GroupTag
          - Distinguishes groups along different dimensions: configuration number, computer room, rack, ...
      - VersionTag
          - Distinguishes program, data, configuration files, and so on.
          - Consists of a four-part version number and a timestamp.
      - Instance full name, for example:
          - Org_Space_GroupA_1-1-1-1-438249600_1
          - Org_Space_GroupB_1-1-1-1-438249600_1
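The naming scheme on the two slides above can be sketched as a pair of build and parse helpers. The helper names are hypothetical, and the sketch assumes that no individual component contains an underscore, which the slides do not state.

```python
from typing import NamedTuple


class InstanceName(NamedTuple):
    org: str
    space: str
    group_tag: str
    version_tag: str
    index: int


def make_version_tag(major, minor, patch, build, timestamp):
    """Four-part version number plus timestamp, joined with hyphens."""
    return f"{major}-{minor}-{patch}-{build}-{timestamp}"


def format_instance_name(name):
    """OrgName_SpaceName_GroupTag_VersionTag_Index"""
    return "_".join([name.org, name.space, name.group_tag, name.version_tag, str(name.index)])


def parse_instance_name(full_name):
    parts = full_name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated parts, got {len(parts)}")
    org, space, group_tag, version_tag, index = parts
    return InstanceName(org, space, group_tag, version_tag, int(index))


name = InstanceName("Org", "Space", "GroupA", make_version_tag(1, 1, 1, 1, 438249600), 1)
print(format_instance_name(name))
```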
  • 43. Examination, approval, and release
      - Fill in the release form and get approval
          - APP information (program version, capacity information, related instructions, and so on)
          - Approval (the relevant managers and the people who need to be informed)
          - Operator and operating time
          - Monitoring information (monitoring and control strategy, contact people, and so on)
      - Perform the release and add monitoring
          - Before release, the related approval processes must have passed.
          - Operator, program version, MD5, time information, and so on must be consistent with what was approved.
          - Only a release that is consistent and has passed the process can go out.
          - After a successful release, add the monitoring.
      - Flow: release form → approval → release APP → add monitoring
  • 44. Pre-release, release, rollback (diagram)
      - app_v1, app_v2, and app_v3 each run instance01 and instance02.
      - Versioned domain names (app_v1.paas.baidu.com, app_v3.paas.baidu.com) are used for pre-release: offline observation on the internal network.
      - The generic domain name app.baidu.com is mapped and unmapped across the multiple versions of the app: moving it forward is a release, moving it back is a rollback.
  • 45. Basic gray-scale release
      - 1. Point one formal domain name (app.baidu.com) at multiple app versions at the same time.
      - 2. Adjust the ratio of instance counts between the versions to adjust the ratio of gray-scale traffic.
      - Diagram: app_v1 (instance01, instance02; app_v1.paas.baidu.com), app_v2 (instance01, instance02, instance03), app_v3 (instance01, instance02); app.baidu.com.
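The relationship between instance counts and gray-scale traffic can be illustrated with a short calculation. It assumes that the router spreads requests evenly across all instances mapped to the domain name, which the slides imply but do not state; the function name and the example counts are illustrative.

```python
def traffic_share(instances_per_version):
    """Expected fraction of traffic per version under even per-instance balancing."""
    total = sum(instances_per_version.values())
    if total == 0:
        raise ValueError("no instances are mapped to the domain name")
    return {version: count / total for version, count in instances_per_version.items()}


# Three instances of the current version and one of the new version
print(traffic_share({"app_v2": 3, "app_v3": 1}))
```

Scaling the new version up, or the old version down, shifts the ratio without touching the domain mapping.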
  • 46. "The path to sermon": popularizing the platform
      - Who owns the other half of the medal?
          - App support
              - New services need to follow the PaaS standards and way of thinking.
              - Old services need R&D to adapt them and QA to run regression tests.
          - Peripheral support
              - DB, cache, storage, interfaces, security, monitoring, and so on
      - Make the benefits clear and establish a win-win ecosystem
          - Deliver faster, save more resources, and make things simpler.
          - A one-stop, all-in-one service, popularized hand in hand.
  • 47. Some solutions
      - Give users (app developers) the royal treatment
          - Provide dedicated service for important apps.
          - Keep important managers informed through complete and timely communication, such as reports.
          - The principle is "capitalism" rather than "socialism".
      - Event "marketing"
          - E.g. the "struts2 0day"
              - Actively cooperate with R&D and QA on identifying, repairing, and rolling out fixes for the issue.
              - Actively report progress and manage the event.
              - Afterwards, actively drive and take part in the discussion and decision-making, for example with the security and architecture groups.
              - The principle is "win-win" rather than shirking responsibility.
  • 49. Reforming operations
      - "NoOps": overall PaaS (and IaaS) functionality >= traditional operations work
      - Stack in the diagram: Applications, Data, Runtime, Middleware, O/S, Virtualization, Servers, Storage, Networking; covered by OP (SRE) operations and by PaaS (and IaaS).
  • 50. How to reform: example
      - Automatic fault recovery
          - Add a health monitoring mechanism on top of traditional monitoring.
          - Instances restart and "drift" automatically.
          - Reduce traditional alarms and manpower: an alarm is raised only when automatic recovery fails.
      - Diagram: the monitor looks up the whole instance by name (name_1 → ip:port, ...) through the health monitor API, which points to the real instance_1 (ip:port) or to instance_1 after drifting.
      - Notes
          - "Drift" is a normal phenomenon and does not raise an alarm.
          - An alarm is needed only when "drift" fails.
          - Instance monitoring is redefined: each check goes by name, then detects and returns ip:port.
  • 51. How to reform: example
      - More agile
          - Let developers forget about servers instead of being resource oriented.
          - Complete configuration management and automatic deployment.
          - Release, pre-release, and rollback are extremely simple and need no extra complex deployment tool.
          - Elastic scaling is extremely simple.
          - With buildpacks, code is compiled in the cloud and run directly.
      - An all-in-one, one-stop experience
          - From the release form through release to modifying the monitoring, the working process is fully automatic.
          - Third-party services are integrated behind a unified management entrance.
  • 53. Future plans
      - Feedback to the community
          - For private-cloud functions, do our best to wrap the native components (based on CF 2.0), then open-source the new components.
          - Where changes affect the native components, do our best to merge them into the master branch.
          - Write more documentation and tips, and actively take part in community communication.
      - Development orientation
          - Large applications (big instances)
          - Intelligent scheduling
          - Information security
          - Further continuous integration
          - UI
  • 54. We are hiring! @Weiyu Wang (王炜煜), weibo.com/wwy1640. Thanks.