Torturing OpenNebula for Fun and Profit
Carlo Daffara - NodeWeaver
● Ensuring that the platform runs well in an uncontrolled environment
requires some attention to design (focused on the target) and lots
of testing
● Some basic principles:
○ “Perfection is finally attained not when there is no longer
anything to add, but when there is no longer anything to take
away” - Antoine de Saint-Exupéry
○ Complexity may be necessary at scale, but not for every
application. Every piece that is added may break at some
point
Source: Werner Vogels, real-time graph of microservice dependencies at http://amazon.com in 2008.
● If you ever ask the user for something, she becomes part of the
system to be tested! …
● … which means that in principle, you should never ask the user for
information that may be obtained in some other (automated) way
● The user may not understand, may not be there, may be confused
by all the knobs and dials, or may be deliberately destructive
● Testing must be done on the complete system -
software+hardware+configs …
● … because software faults are more common than hardware ones
● Faults are complex: stop, corruption, limping…
● Trust only what you measure (as Grace Hopper said: "One
accurate measurement is worth a thousand expert opinions.")
● We model our system as a Petri Net
● We run a group of NodeWeaver images (within NodeWeaver),
each with a set of disks attached to emulate local storage &
multiple virtual ethernet links
● Within each emulated node, we run a small set of CentOS images,
which receive the number of FIO runs and the kind of emulated
workload through contextualization
● And we run our little chaos monkey process (actually, some bash
scripts)
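
A chaos-monkey driver of this kind can be sketched in a few lines. The version below is only an illustration in Python (the talk's own scripts are bash and are not shown); the fault names, the node names and the `plan_chaos` helper are hypothetical. It builds a seeded, replayable schedule and only prints what it would do.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical fault table: names and actions are placeholders, not the
# talk's actual scripts. Each action only returns a description (dry run).
FAULTS: Dict[str, Callable[[str], str]] = {
    "disk_detach": lambda node: f"detach a disk from {node}",
    "link_down": lambda node: f"take a network link down on {node}",
    "hard_reset": lambda node: f"hard-reset {node}",
}


def plan_chaos(nodes: List[str], rounds: int, max_wait_s: float,
               seed: int) -> List[Tuple[float, str, str]]:
    """Return a reproducible schedule of (wait_seconds, node, fault) tuples."""
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    schedule = []
    for _ in range(rounds):
        wait = rng.uniform(0.0, max_wait_s)  # random pause before the fault
        node = rng.choice(nodes)
        fault = rng.choice(sorted(FAULTS))
        schedule.append((wait, node, fault))
    return schedule


if __name__ == "__main__":
    for wait, node, fault in plan_chaos(["node1", "node2", "node3"], 5, 30.0, seed=42):
        print(f"after {wait:5.1f}s: {FAULTS[fault](node)}")
```

Seeding the generator is a design choice of this sketch: it lets a fault sequence that exposed a bug be replayed exactly.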
● Disks:
○ detach disk, then destroy it
○ detach disk, then attach an empty disk
○ detach, wait (random), then reattach
○ inject random data into a random file within the disk image
○ inject random data into the disk image
● Network: virsh domif-setlink (up, down) to simulate a faulty cable
(hint: https://dev.opennebula.org/issues/3219 pretty pleeeease... )
● Virtual Node: hard reset + full time cluster reset
● Future: wrong BIOS clock (through qemu -rtc base=XXXX), IPMI
emulation, packet loss/latency/bandwidth (through NetEm: only
25MB!)
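
Two of the injectors above are easy to sketch. The Python below is an assumption-laden illustration, not the scripts used in the talk: `corrupt_image` and `link_command` are hypothetical helper names, and the link helper only builds the `virsh domif-setlink` argument list instead of running it. Point it at throwaway test images only.

```python
import os
import random
import tempfile
from typing import List


def corrupt_image(path: str, n_bytes: int, rng: random.Random) -> int:
    """Overwrite n_bytes at a random offset of the file; return the offset."""
    size = os.path.getsize(path)
    if n_bytes <= 0 or n_bytes > size:
        raise ValueError("n_bytes must be between 1 and the file size")
    offset = rng.randrange(0, size - n_bytes + 1)
    garbage = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    with open(path, "r+b") as f:  # r+b: modify in place, never truncate
        f.seek(offset)
        f.write(garbage)
    return offset


def link_command(domain: str, interface: str, up: bool) -> List[str]:
    """Build (but do not run) the virsh call that flips a virtual link."""
    return ["virsh", "domif-setlink", domain, interface, "up" if up else "down"]


if __name__ == "__main__":
    # Demo on a scratch file standing in for a disk image.
    fd, image = tempfile.mkstemp(suffix=".img")
    os.write(fd, b"\0" * 4096)
    os.close(fd)
    print("corrupted 64 bytes at offset", corrupt_image(image, 64, random.Random(7)))
    print(" ".join(link_command("one-42", "vnet0", up=False)))  # names are made up
    os.remove(image)
```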
● What we discovered:
○ The underlying filesystem is hugely important
○ EXT4 handles most of it, XFS works (but recovery may be very
slow), BTRFS dies in horrible ways, ZFS barely notices
○ Using MySQL as the OpenNebula DB: roughly every 25 crashes it
requires some work, and roughly every 150 crashes it requires
non-trivial manual effort
○ Our custom SQLite (with WAL) survives happily (we
compensate for the lack of concurrency with a query sequencer)
○ LizardFS is highly tolerant of multiple, parallel failures - disk,
network, whatever
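
The slides do not show how the query sequencer works. One plausible reading, sketched below in Python under that assumption, is a single thread that owns the SQLite connection (in WAL mode) and executes queued statements one at a time; the `QuerySequencer` class and its methods are hypothetical names.

```python
import queue
import sqlite3
import threading
from typing import Any, Dict, Tuple


class QuerySequencer:
    """Funnel all writes through one thread that owns the connection."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._jobs: "queue.Queue" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        conn = sqlite3.connect(self._db_path)
        conn.execute("PRAGMA journal_mode=WAL")  # write-ahead logging
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel posted by close()
                break
            sql, params, done = job
            try:
                with conn:  # commit on success, roll back on error
                    conn.execute(sql, params)
            except sqlite3.Error as exc:
                done["error"] = exc
            finally:
                done["event"].set()  # wake the caller in every case
        conn.close()

    def execute(self, sql: str, params: Tuple[Any, ...] = ()) -> None:
        """Queue one statement and block until the writer has run it."""
        done: Dict[str, Any] = {"event": threading.Event(), "error": None}
        self._jobs.put((sql, params, done))
        done["event"].wait()
        if done["error"] is not None:
            raise done["error"]

    def close(self) -> None:
        self._jobs.put(None)
        self._thread.join()
```

Any number of caller threads can use `execute()`; because only the worker thread touches the connection, writes never contend for the database lock.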
● We took advantage of the exceptionally simple host probe
mechanism to add extra information that is used by the
platform and the recovery heuristics
● Adding new probes takes very little time and effort, thanks to
OpenNebula's simplicity
● We continue to add probes (for example, the P-value for
predicted user experience) and use background processes to add
forecasts
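
As a rough illustration of why probes are cheap to add: an OpenNebula host probe is an executable whose standard output is a list of KEY=VALUE lines. The Python sketch below only formats such lines; the attribute names (`DISK_ERRORS`, `FS_TYPE`) and the `format_probe` helper are made up for the example and are not the probes described in the talk.

```python
from typing import Dict, Union


def format_probe(metrics: Dict[str, Union[int, float, str]]) -> str:
    """Render metrics as KEY=VALUE lines, quoting string values."""
    lines = []
    for key, value in metrics.items():
        name = key.upper()
        if isinstance(value, str):
            escaped = value.replace('"', '\\"')  # keep embedded quotes safe
            lines.append(f'{name}="{escaped}"')
        else:
            lines.append(f"{name}={value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A real probe would measure these values; here they are hard-coded.
    print(format_probe({"disk_errors": 0, "fs_type": "zfs"}))
```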
Conclusions
● OpenNebula works exceptionally well under torture, both in
virtual and physical testing
● LizardFS is amazingly resilient (CRC everywhere helps)...
● ...especially on ZFS with its transaction groups
● Chaos-monkey testing does not guarantee that every possible
fault path is tested…
● ...yet it helps in finding paths that we never thought about, but
that our customers will surely find
Thanks!
Carlo Daffara
carlo.daffara@nodeweaver.eu
@cdaffara

OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffara, NodeWeaver
