SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
Making Nova Infrastructure
         HA/Fault Tolerant
This discussion will focus only on infrastructure and has
               nothing to do with guests.



http://etherpad.openstack.org/making-nova-components-ha
Goals of discussion

1. Discuss changes that are needed for each of the
   components.
2. Not solve the issues today.
3. Leave with ideas for blueprints or topics of discussion on the
   mailing list.
My reference implementation
based on where nova is today,
       Cactus release.
MySQL Active/Passive setup

   Two nodes
   DRDB or shared DAS for file system
   Heartbeat and/or Pacemaker to control resource fail over

Wanted to use stock MySQL included from distributions
  No patches
  No recompiling
  No complex setups
RabbitMQ Active/Passive

   Cluster != HA
   Active/Passive
      DRDB or DAS for shared filesystem
      Heartbeat and/or Pacemaker to control resource fail over

RabbitMQ seems limited in scaling options

http://www.rabbitmq.com/pacemaker.html
http://www.rabbitmq.com/clustering.html
Managers Active/Passive

1. Nova-API
     Should have option to run with real web server
2. Nova-Network
3. Nova-Scheduler
4. Nova-compute
     Hard to make HA as its only one per node
NTT’s Goal

 High Availability of Nova-Network Service
Use of Open Source Software’s
 Linux-HA (Heartbeat)
 Keepalived
Design Approaches
Active/Standby using Linux-HA
    In this approach, we are trying to switch over from primary network
    server to standby network server using heartbeat as a HA software.

Detecting failover criteria
Failover will be detected by the peer system in following cases
     Heartbeat service is down
     Network failure( ethernet interface on which the udp packets are sent
     by the heartbeats is down i.e. eth0, eth1 and so on)P
     Powered off
     nova-network service is not running on primary network server
     syslog has reported any fatal error
Configure Heartbeat
You must configure three files to get heartbeat to work: authkeys, ha.cf and
haresources.
    /etc/ha.d/ha.cf - the global cluster configuration
    /etc/ha.d/authkeys - a file containing keys for mutual node authentication
    /etc/ha.d/haresources - This file specifies the services for the cluster and
    who the default owner is
Cluster Configuration
Nova Network Configuration
 Nova.conf
      Routing source IP address should be same on both primary and standby
      network server.
 --routing_source_ip=10.2.3.120
      Host flag settings should be same on both primary and standby network
      server. All network messages communication happens using rabbitmq
      on topic network.<hostname>. By default hostname is the hostname of
      the server on which nova-network server is running if host flag is not
      provided. After failover we don’t want hostname to differ from the primary
      network server so it should use host flag to get rid of this problem. Most
      importantly, administrator will have to use the same host to create
      networks using nova-manage.
 --host=somehostname
Test HA
  Start the heartbeat service on the primary and then on the standby
  network server
  After the heartbeat starts successfully, you should see a new network
  interface with the IP address that you configured in the haresources file
  on the primary network server.
  Run new VM instance and associate floating ip address 10.2.3.104 to an
  instance.
  Ping to 10.2.3.104 from your laptop. Ping should be successful.
  Now simulate failover by simply stopping heartbeat on the primary
  network server
  You should see nova-network service will be running on the standby
  network server and nova-network service will be stopped on the primary
  server.
  Verify that you can still ping to 10.2.3.104. If ping is successful that
  means you are successful in implementing HA of the network server.
Disconnection Time
   The disconnection time should vary depending upon deadtime parameter
   you have set in the ha.cf. To find out the actual network disconnection
   time, I have taken 3 samples using tcpdump (tcpdump -i eth1 -n icmp)
   with different deadtime settings. But shockingly I got unexpected results.


Dead time      Ping Reply Ping Reply              Time difference (in
(in            Stop       Response                Seconds)
Seconds)
     4          15:20:28     15:20:56                         28
     4          15:21:47     15:22:14                         27
     4          15:23:46     15:24:26                         40
     3          16:23:07     16:23:54                         47
     3          16:25:16     16:26:03                         47
     3          16:26:59     16:27:46                         47
     2          16:40:54     16:41:19                         25
     2          16:41:56     16:42:20                         24
     2          16:43:02     16:43:26                         24
Problems
The failover happens exactly after 4/3/2 seconds but ARP table in the neighboring
machines are not updated with the new MAC address of the standby network
server. It takes lot of time to update the ARP table if we don’t broadcasts a
gratuitous ARP to inform the neighboring machines about the change of MAC
address for the IP.
I tried broadcasting a gratuitous ARPs to router (in our setup 10.2.3.1 is the router)
just before starting the nova network service and recorded the disconnection time
again.
ARP command
arping -b -c 1 -s <floating Ip address> -I <public interface> <router ip address>
$arping -b -c 1 -s 10.2.3.104 -I eth1 10.2.3.1

Amazingly, this time the disconnection time recorded was in the range of 4-5
seconds for 4/3/2 deadtime parameter.
Possible Changes to the nova-network
We may need to send gratuitous ARPs for following IP addresses:-
- Routing source IP address
- Floating IP Addresses
     Gateway address of each vlan(which is assigned to each bridge)

Instead of handling these gratuitous ARPs externally when the failover happens,
we think we should make changes to the nova-network source code to address all
of the above 3 points. Even if we modify the source code to send the ARP
messages, there will be a down time of few seconds. However, this modification
sound reasonable than the current implementation in terms of HA.

Note: The down time will depend on many factors like number of VM instances,
vlans etc. For POC purpose, we have used 2 VM instances and one Vlan.
Discussion

Database
   Ideas?
   Concerns?
   replacements?
Discussion

RabbitMQ
  Ideas?
  Concerns?
  replacements?
Discussion

Nova-network
  Ideas?
  Concerns?
  Multi-master?
Discussion

Nova-scheduler
  Ideas?
  Concerns?
  Multi-master?
Discussion

Nova-api
  Ideas?
  Concerns?
  run multiple instances with lb in front?

Contenu connexe

Tendances

Netfilter: Making large iptables rulesets scale
Netfilter: Making large iptables rulesets scaleNetfilter: Making large iptables rulesets scale
Netfilter: Making large iptables rulesets scalebrouer
 
Configuration IPTables On CentOS 8
Configuration IPTables On CentOS 8Configuration IPTables On CentOS 8
Configuration IPTables On CentOS 8Kaan Aslandağ
 
Wireshar training
Wireshar trainingWireshar training
Wireshar trainingLuke Luo
 
Basics of firewall, ebtables, arptables and iptables
Basics of firewall, ebtables, arptables and iptablesBasics of firewall, ebtables, arptables and iptables
Basics of firewall, ebtables, arptables and iptablesPrzemysław Piotrowski
 
Cumulus Networks: Automating Network Configuration
Cumulus Networks: Automating Network ConfigurationCumulus Networks: Automating Network Configuration
Cumulus Networks: Automating Network ConfigurationCumulus Networks
 
Athenticated smaba server config with open vpn
Athenticated smaba server  config with open vpnAthenticated smaba server  config with open vpn
Athenticated smaba server config with open vpnChanaka Lasantha
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationCumulus Networks
 
Introduction to tcp ip linux networking
Introduction to tcp ip   linux networkingIntroduction to tcp ip   linux networking
Introduction to tcp ip linux networkingSreenatha Reddy K R
 
TCPdump-Wireshark
TCPdump-WiresharkTCPdump-Wireshark
TCPdump-WiresharkHarsh Singh
 
Packet Tracer: Routing protocols EIGRP and OSPF
Packet Tracer: Routing protocols EIGRP and OSPFPacket Tracer: Routing protocols EIGRP and OSPF
Packet Tracer: Routing protocols EIGRP and OSPFRafat Khandaker
 
Networking Puzzle
Networking PuzzleNetworking Puzzle
Networking PuzzleAalok Shah
 
OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)
OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)
OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)Denny K
 
Network configuration
Network configurationNetwork configuration
Network configurationengshemachi
 

Tendances (20)

Netfilter: Making large iptables rulesets scale
Netfilter: Making large iptables rulesets scaleNetfilter: Making large iptables rulesets scale
Netfilter: Making large iptables rulesets scale
 
Configuration IPTables On CentOS 8
Configuration IPTables On CentOS 8Configuration IPTables On CentOS 8
Configuration IPTables On CentOS 8
 
Wireshar training
Wireshar trainingWireshar training
Wireshar training
 
Basics of firewall, ebtables, arptables and iptables
Basics of firewall, ebtables, arptables and iptablesBasics of firewall, ebtables, arptables and iptables
Basics of firewall, ebtables, arptables and iptables
 
Iptables
IptablesIptables
Iptables
 
Cumulus Networks: Automating Network Configuration
Cumulus Networks: Automating Network ConfigurationCumulus Networks: Automating Network Configuration
Cumulus Networks: Automating Network Configuration
 
Athenticated smaba server config with open vpn
Athenticated smaba server  config with open vpnAthenticated smaba server  config with open vpn
Athenticated smaba server config with open vpn
 
Commands
CommandsCommands
Commands
 
Chapter7ccna
Chapter7ccnaChapter7ccna
Chapter7ccna
 
Iptables in linux
Iptables in linuxIptables in linux
Iptables in linux
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After Configuration
 
Introduction to tcp ip linux networking
Introduction to tcp ip   linux networkingIntroduction to tcp ip   linux networking
Introduction to tcp ip linux networking
 
TCPdump-Wireshark
TCPdump-WiresharkTCPdump-Wireshark
TCPdump-Wireshark
 
Lession2 Xinetd
Lession2 XinetdLession2 Xinetd
Lession2 Xinetd
 
Packet Tracer: Routing protocols EIGRP and OSPF
Packet Tracer: Routing protocols EIGRP and OSPFPacket Tracer: Routing protocols EIGRP and OSPF
Packet Tracer: Routing protocols EIGRP and OSPF
 
Networking Puzzle
Networking PuzzleNetworking Puzzle
Networking Puzzle
 
Tcpdump
TcpdumpTcpdump
Tcpdump
 
OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)
OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)
OSTU - Quickstart Guide for Wireshark (by Tony Fortunato)
 
F5 tcpdump
F5 tcpdumpF5 tcpdump
F5 tcpdump
 
Network configuration
Network configurationNetwork configuration
Network configuration
 

Similaire à Nova HA

CCNA Packet Tracer 1.6.1
CCNA Packet Tracer 1.6.1CCNA Packet Tracer 1.6.1
CCNA Packet Tracer 1.6.1Rafat Khandaker
 
Linux Networking Commands
Linux Networking CommandsLinux Networking Commands
Linux Networking Commandstmavroidis
 
Implementing an IPv6 Enabled Environment for a Public Cloud Tenant
Implementing an IPv6 Enabled Environment for a Public Cloud TenantImplementing an IPv6 Enabled Environment for a Public Cloud Tenant
Implementing an IPv6 Enabled Environment for a Public Cloud TenantShixiong Shang
 
Mikrotik link redundancy solution
Mikrotik link redundancy solution Mikrotik link redundancy solution
Mikrotik link redundancy solution S M Tipu
 
Sharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linuxSharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linuxjasembo
 
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...idsecconf
 
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, DockerUnder the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, DockerDocker, Inc.
 
Run Run Trema Test
Run Run Trema TestRun Run Trema Test
Run Run Trema TestHiroshi Ota
 
[오픈소스컨설팅] Linux Network Troubleshooting
[오픈소스컨설팅] Linux Network Troubleshooting[오픈소스컨설팅] Linux Network Troubleshooting
[오픈소스컨설팅] Linux Network TroubleshootingOpen Source Consulting
 
INFA 620Laboratory 4 Configuring a FirewallIn this exercise.docx
INFA 620Laboratory 4 Configuring a FirewallIn this exercise.docxINFA 620Laboratory 4 Configuring a FirewallIn this exercise.docx
INFA 620Laboratory 4 Configuring a FirewallIn this exercise.docxcarliotwaycave
 
Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1 Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1 Sam Kim
 
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...Salem Trabelsi
 
Free radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmapleFree radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmapleChanaka Lasantha
 
CMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAM
CMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAMCMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAM
CMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAMHamesKellor
 
Training Day Slides
Training Day SlidesTraining Day Slides
Training Day Slidesadam_merritt
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation ToolsEdwin Beekman
 

Similaire à Nova HA (20)

Layer2&arp
Layer2&arpLayer2&arp
Layer2&arp
 
CCNA Packet Tracer 1.6.1
CCNA Packet Tracer 1.6.1CCNA Packet Tracer 1.6.1
CCNA Packet Tracer 1.6.1
 
Linux Networking Commands
Linux Networking CommandsLinux Networking Commands
Linux Networking Commands
 
Implementing an IPv6 Enabled Environment for a Public Cloud Tenant
Implementing an IPv6 Enabled Environment for a Public Cloud TenantImplementing an IPv6 Enabled Environment for a Public Cloud Tenant
Implementing an IPv6 Enabled Environment for a Public Cloud Tenant
 
Mikrotik link redundancy solution
Mikrotik link redundancy solution Mikrotik link redundancy solution
Mikrotik link redundancy solution
 
Sharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linuxSharing your-internet-connection-on-linux
Sharing your-internet-connection-on-linux
 
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
 
6005679.ppt
6005679.ppt6005679.ppt
6005679.ppt
 
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, DockerUnder the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
Under the Hood with Docker Swarm Mode - Drew Erny and Nishant Totla, Docker
 
Run Run Trema Test
Run Run Trema TestRun Run Trema Test
Run Run Trema Test
 
[오픈소스컨설팅] Linux Network Troubleshooting
[오픈소스컨설팅] Linux Network Troubleshooting[오픈소스컨설팅] Linux Network Troubleshooting
[오픈소스컨설팅] Linux Network Troubleshooting
 
INFA 620Laboratory 4 Configuring a FirewallIn this exercise.docx
INFA 620Laboratory 4 Configuring a FirewallIn this exercise.docxINFA 620Laboratory 4 Configuring a FirewallIn this exercise.docx
INFA 620Laboratory 4 Configuring a FirewallIn this exercise.docx
 
Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1 Make container without_docker_6-overlay-network_1
Make container without_docker_6-overlay-network_1
 
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...2.5.1.2 packet tracer   configure cisco routers for syslog, ntp, and ssh oper...
2.5.1.2 packet tracer configure cisco routers for syslog, ntp, and ssh oper...
 
Free radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmapleFree radius billing server with practical vpn exmaple
Free radius billing server with practical vpn exmaple
 
CMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAM
CMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAMCMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAM
CMIT 350 FINAL EXAM CCNA CERTIFICATION PRACTICE EXAM
 
Dhcp confg
Dhcp confgDhcp confg
Dhcp confg
 
Training Day Slides
Training Day SlidesTraining Day Slides
Training Day Slides
 
Network Automation Tools
Network Automation ToolsNetwork Automation Tools
Network Automation Tools
 
net work iTM3
net work iTM3net work iTM3
net work iTM3
 

Plus de Open Stack

OpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overviewOpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overviewOpen Stack
 
OpenStack Swift overview oscon2011
OpenStack Swift overview oscon2011OpenStack Swift overview oscon2011
OpenStack Swift overview oscon2011Open Stack
 
Dell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCONDell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCONOpen Stack
 
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry introEMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry introOpen Stack
 
OpenStack Technology Overview
OpenStack Technology OverviewOpenStack Technology Overview
OpenStack Technology OverviewOpen Stack
 
JCO Conference OpenStack
JCO Conference OpenStackJCO Conference OpenStack
JCO Conference OpenStackOpen Stack
 
OpenStack 101 Technical Overview
OpenStack 101 Technical OverviewOpenStack 101 Technical Overview
OpenStack 101 Technical OverviewOpen Stack
 
Nebula james Williams
Nebula james WilliamsNebula james Williams
Nebula james WilliamsOpen Stack
 
Open stack dashboard diablo
Open stack dashboard   diabloOpen stack dashboard   diablo
Open stack dashboard diabloOpen Stack
 
Snapshot clone-boot-presentation-final
Snapshot clone-boot-presentation-finalSnapshot clone-boot-presentation-final
Snapshot clone-boot-presentation-finalOpen Stack
 
Opening Presentation
Opening PresentationOpening Presentation
Opening PresentationOpen Stack
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Open Stack
 
Swift container sync
Swift container syncSwift container sync
Swift container syncOpen Stack
 
The site architecture you can edit
The site architecture you can editThe site architecture you can edit
The site architecture you can editOpen Stack
 
Mach Technology
Mach Technology Mach Technology
Mach Technology Open Stack
 
OpenStack on Intel
OpenStack on IntelOpenStack on Intel
OpenStack on IntelOpen Stack
 
Operating the Hyperscale Cloud
Operating the Hyperscale CloudOperating the Hyperscale Cloud
Operating the Hyperscale CloudOpen Stack
 
Openstack and eBay
Openstack and eBay Openstack and eBay
Openstack and eBay Open Stack
 
OpenStack Opportunity - Citrix
OpenStack Opportunity - CitrixOpenStack Opportunity - Citrix
OpenStack Opportunity - CitrixOpen Stack
 
PaaS on Openstack
PaaS on OpenstackPaaS on Openstack
PaaS on OpenstackOpen Stack
 

Plus de Open Stack (20)

OpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overviewOpenStack Boston User Group, OpenStack overview
OpenStack Boston User Group, OpenStack overview
 
OpenStack Swift overview oscon2011
OpenStack Swift overview oscon2011OpenStack Swift overview oscon2011
OpenStack Swift overview oscon2011
 
Dell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCONDell Crowbar and OpenStack at OSCON
Dell Crowbar and OpenStack at OSCON
 
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry introEMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
EMEA OpenStack Day, July 13th 2011 in London - Jim Curry intro
 
OpenStack Technology Overview
OpenStack Technology OverviewOpenStack Technology Overview
OpenStack Technology Overview
 
JCO Conference OpenStack
JCO Conference OpenStackJCO Conference OpenStack
JCO Conference OpenStack
 
OpenStack 101 Technical Overview
OpenStack 101 Technical OverviewOpenStack 101 Technical Overview
OpenStack 101 Technical Overview
 
Nebula james Williams
Nebula james WilliamsNebula james Williams
Nebula james Williams
 
Open stack dashboard diablo
Open stack dashboard   diabloOpen stack dashboard   diablo
Open stack dashboard diablo
 
Snapshot clone-boot-presentation-final
Snapshot clone-boot-presentation-finalSnapshot clone-boot-presentation-final
Snapshot clone-boot-presentation-final
 
Opening Presentation
Opening PresentationOpening Presentation
Opening Presentation
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011
 
Swift container sync
Swift container syncSwift container sync
Swift container sync
 
The site architecture you can edit
The site architecture you can editThe site architecture you can edit
The site architecture you can edit
 
Mach Technology
Mach Technology Mach Technology
Mach Technology
 
OpenStack on Intel
OpenStack on IntelOpenStack on Intel
OpenStack on Intel
 
Operating the Hyperscale Cloud
Operating the Hyperscale CloudOperating the Hyperscale Cloud
Operating the Hyperscale Cloud
 
Openstack and eBay
Openstack and eBay Openstack and eBay
Openstack and eBay
 
OpenStack Opportunity - Citrix
OpenStack Opportunity - CitrixOpenStack Opportunity - Citrix
OpenStack Opportunity - Citrix
 
PaaS on Openstack
PaaS on OpenstackPaaS on Openstack
PaaS on Openstack
 

Dernier

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 

Dernier (20)

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Nova HA

  • 1. Making Nova Infrastructure HA/Fault Tolerant This discussion will focus only on infrastructure and has nothing to do with guests. http://etherpad.openstack.org/making-nova-components-ha
  • 2. Goals of discussion 1. Discuss changes that are needed for each of the components. 2. Not solve the issues today. 3. Leave with ideas for blueprints or topics of discussion on the mailing list.
  • 3. My reference implementation based on where nova is today, Cactus release.
  • 4. MySQL Active/Passive setup Two nodes DRDB or shared DAS for file system Heartbeat and/or Pacemaker to control resource fail over Wanted to use stock MySQL included from distributions No patches No recompiling No complex setups
  • 5. RabbitMQ Active/Passive Cluster != HA Active/Passive DRDB or DAS for shared filesystem Heartbeat and/or Pacemaker to control resource fail over RabbitMQ seems limited in scaling options http://www.rabbitmq.com/pacemaker.html http://www.rabbitmq.com/clustering.html
  • 6. Managers Active/Passive 1. Nova-API Should have option to run with real web server 2. Nova-Network 3. Nova-Scheduler 4. Nova-compute Hard to make HA as its only one per node
  • 7. NTT’s Goal High Availability of Nova-Network Service
  • 8. Use of Open Source Software’s Linux-HA (Heartbeat) Keepalived
  • 9. Design Approaches Active/Standby using Linux-HA In this approach, we are trying to switch over from primary network server to standby network server using heartbeat as a HA software. Detecting failover criteria Failover will be detected by the peer system in following cases Heartbeat service is down Network failure( ethernet interface on which the udp packets are sent by the heartbeats is down i.e. eth0, eth1 and so on)P Powered off nova-network service is not running on primary network server syslog has reported any fatal error
  • 10. Configure Heartbeat You must configure three files to get heartbeat to work: authkeys, ha.cf and haresources. /etc/ha.d/ha.cf - the global cluster configuration /etc/ha.d/authkeys - a file containing keys for mutual node authentication /etc/ha.d/haresources - This file specifies the services for the cluster and who the default owner is
  • 12. Nova Network Configuration Nova.conf Routing source IP address should be same on both primary and standby network server. --routing_source_ip=10.2.3.120 Host flag settings should be same on both primary and standby network server. All network messages communication happens using rabbitmq on topic network.<hostname>. By default hostname is the hostname of the server on which nova-network server is running if host flag is not provided. After failover we don’t want hostname to differ from the primary network server so it should use host flag to get rid of this problem. Most importantly, administrator will have to use the same host to create networks using nova-manage. --host=somehostname
  • 13. Test HA Start the heartbeat service on the primary and then on the standby network server After the heartbeat starts successfully, you should see a new network interface with the IP address that you configured in the haresources file on the primary network server. Run new VM instance and associate floating ip address 10.2.3.104 to an instance. Ping to 10.2.3.104 from your laptop. Ping should be successful. Now simulate failover by simply stopping heartbeat on the primary network server You should see nova-network service will be running on the standby network server and nova-network service will be stopped on the primary server. Verify that you can still ping to 10.2.3.104. If ping is successful that means you are successful in implementing HA of the network server.
  • 14. Disconnection Time The disconnection time should vary depending upon deadtime parameter you have set in the ha.cf. To find out the actual network disconnection time, I have taken 3 samples using tcpdump (tcpdump -i eth1 -n icmp) with different deadtime settings. But shockingly I got unexpected results. Dead time Ping Reply Ping Reply Time difference (in (in Stop Response Seconds) Seconds) 4 15:20:28 15:20:56 28 4 15:21:47 15:22:14 27 4 15:23:46 15:24:26 40 3 16:23:07 16:23:54 47 3 16:25:16 16:26:03 47 3 16:26:59 16:27:46 47 2 16:40:54 16:41:19 25 2 16:41:56 16:42:20 24 2 16:43:02 16:43:26 24
  • 15. Problems The failover happens exactly after 4/3/2 seconds but ARP table in the neighboring machines are not updated with the new MAC address of the standby network server. It takes lot of time to update the ARP table if we don’t broadcasts a gratuitous ARP to inform the neighboring machines about the change of MAC address for the IP. I tried broadcasting a gratuitous ARPs to router (in our setup 10.2.3.1 is the router) just before starting the nova network service and recorded the disconnection time again. ARP command arping -b -c 1 -s <floating Ip address> -I <public interface> <router ip address> $arping -b -c 1 -s 10.2.3.104 -I eth1 10.2.3.1 Amazingly, this time the disconnection time recorded was in the range of 4-5 seconds for 4/3/2 deadtime parameter.
  • 16. Possible Changes to the nova-network We may need to send gratuitous ARPs for following IP addresses:- - Routing source IP address - Floating IP Addresses Gateway address of each vlan(which is assigned to each bridge) Instead of handling these gratuitous ARPs externally when the failover happens, we think we should make changes to the nova-network source code to address all of the above 3 points. Even if we modify the source code to send the ARP messages, there will be a down time of few seconds. However, this modification sound reasonable than the current implementation in terms of HA. Note: The down time will depend on many factors like number of VM instances, vlans etc. For POC purpose, we have used 2 VM instances and one Vlan.
  • 17. Discussion Database Ideas? Concerns? replacements?
  • 18. Discussion RabbitMQ Ideas? Concerns? replacements?
  • 19. Discussion Nova-network Ideas? Concerns? Multi-master?
  • 20. Discussion Nova-scheduler Ideas? Concerns? Multi-master?
  • 21. Discussion Nova-api Ideas? Concerns? run multiple instances with lb in front?