SOFTWARE ARCHITECTURE IN THE AGE OF CLOUD COMPUTING
JAROSLAV GERGIC
Industrial Keynote
16th European Conference on Software Architecture (ECSA), Prague, 19 – 23 September 2022
AGENDA
INTRODUCTION
CLOUD SCALE COMPUTING
ARCHITECTING CLOUD SCALE SAAS
CLOSING THOUGHTS
SUMMARY
JAROSLAV GERGIC
Always busy building the next big thing,
now living in the confluence of
cybersecurity, machine learning,
and cloud computing.
2022  1995:
Cisco, GoodData, Ariba, IBM Research, Reuters, Mobil
Server, LCS International
Mentoring: StartupYard, JIC, MSIC
CLOUD COMPUTING
LET’S DEFINE THE TERM
PUBLIC CLOUDS IN CLOUD COMPUTING
ENTERPRISE SCALE
is no longer the summit of software architecture
LET’S TALK CLOUD SCALE
“Cloud Computing ≠ Public Cloud”
LET’S TALK CLOUD SCALE
“Software as a Service (SaaS)”
HOW BIG IS CLOUD SCALE?
B2C
Serve millions or billions of users
• Facebook
• YouTube
• TikTok
• Seznam.cz
B2B
Serve (tens of) thousands of businesses
• Salesforce
• Dropbox*)
• WorkDay
• GoodData*)
B2B SAAS
Cloud scale is at least three orders of magnitude bigger than enterprise scale, because you need to serve thousands of enterprises.
PUBLIC CLOUD VS. PRIVATE DC
Reverse Migration
• Both Dropbox and GoodData originally started on AWS
• As they grew, they sought to reduce costs
• GoodData migrated to Rackspace managed hosting in 2014
• Dropbox migrated to its own datacenters in 2016
B2B
(tens of) thousands of businesses
• Salesforce
• Dropbox*)
• WorkDay
• GoodData*)
PUBLIC CLOUD VS. PRIVATE DC
Public Cloud
• developer productivity
• time to market
• smaller scale
• high-margin product
Private Datacenter
• operational costs
• steady state product
• extreme scale
• margins under pressure
CLOUD SCALE SAAS ARCHITECTURE
WHAT DOES IT TAKE TO ARCHITECT CLOUD SCALE SOFTWARE AS A SERVICE?
ARCHITECTING CLOUD SAAS: ASPECTS TO CONSIDER
• Scalability
• Costs
• SLAs
• Security & Compliance
• Productivity
SCALABILITY
• Horizontal scaling
• Distributed computing
• Redundancy and Fault Tolerance
• Elastic workloads
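To make horizontal scaling a little more concrete, the sketch below shows one way to assign tenants to nodes so that adding a node moves only a fraction of them. It uses rendezvous (highest-random-weight) hashing, which is an assumption chosen for illustration and is not named in the talk; the node and tenant names are hypothetical.

```python
import hashlib


def _score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(nodes: list[str], key: str) -> str:
    """Return the node with the highest weight for this key."""
    if not nodes:
        raise ValueError("at least one node is required")
    return max(nodes, key=lambda node: _score(node, key))


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]         # hypothetical node names
    tenants = [f"tenant-{i}" for i in range(10)]   # hypothetical tenant ids
    before = {t: pick_node(nodes, t) for t in tenants}
    after = {t: pick_node(nodes + ["node-d"], t) for t in tenants}
    moved = [t for t in tenants if before[t] != after[t]]
    # Only tenants that the new node wins change placement.
    print(f"{len(moved)} of {len(tenants)} tenants moved to node-d")
```

The design choice here is that placement depends only on the key and the current node list, so no central lookup table (itself a potential singleton) is needed.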
SINGLE CAUSE OF FAILURE
(vs. Single Point of Failure)
SINGLE CAUSE OF FAILURE
• DNS issue
• credentials rotation
• kernel update
• networking issue
• infrastructure-level configuration change
Beware of ubiquitous things that seemingly always work fine!
AVOIDING THE PITFALLS
• avoid singletons at any cost*)
• always think of the blast radius when any component, service or piece of underlying infrastructure fails
• pro tip: check out a service mesh such as Istio (https://istio.io/)
  • allows us to operate multiple interconnected Kubernetes (K8s) clusters
*) there can be only one!
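The blast-radius question above can be approximated mechanically once service dependencies are written down. The following is a minimal sketch, assuming a hypothetical dependency map (service name to the set of things it depends on); it walks the inverted graph to find everything that is directly or transitively affected when one component fails.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges: component -> services that depend on it.
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


if __name__ == "__main__":
    # Hypothetical topology, for illustration only.
    topology = {
        "api": {"auth", "dns"},
        "auth": {"db", "dns"},
        "reports": {"db"},
        "db": set(),
        "dns": set(),
    }
    print(sorted(blast_radius(topology, "dns")))      # ['api', 'auth']
    print(sorted(blast_radius(topology, "reports")))  # []
```

A ubiquitous dependency such as DNS shows up here with a large affected set, which is the "single cause of failure" pattern in miniature.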
CAPACITY PLANNING
Why would I need to do capacity planning in a public cloud? Isn’t it elastic by design?
COSTS
• Gross Margin in SaaS
  • Gross Margin = (Revenue – COGS) / Revenue
  • COGS – Cost of Goods Sold
• What is COGS in SaaS?
  • All costs needed to operate your SaaS offering.
  • HW, SW, operations, support
• What is the benchmark Gross Margin in SaaS?
  • 80%
[Figure: the 80% benchmark illustrated with $10 of revenue and $2 of COGS]
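The gross margin formula from the slide is easy to check numerically. The sketch below is illustrative only; reading the $10 and $2 figures as revenue and COGS is an assumption, and the function names are made up for this example.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross Margin = (Revenue - COGS) / Revenue, as a fraction."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


def max_cogs_for_margin(revenue: float, target_margin: float) -> float:
    """Largest COGS that still meets the target gross margin."""
    return revenue * (1.0 - target_margin)


if __name__ == "__main__":
    # Assumed reading of the slide: $10 revenue, $2 COGS.
    print(f"margin: {gross_margin(10.0, 2.0):.0%}")
    print(f"COGS budget at 80%: ${max_cogs_for_margin(10.0, 0.80):.2f}")
```

Turning the formula around, as `max_cogs_for_margin` does, gives an operating-cost budget per unit of revenue, which is how the benchmark constrains architecture decisions.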
COST SAVINGS
STORIES FROM THE TRENCHES
LINUX KERNEL TUNING
Low-level Linux kernel settings such as huge pages and NUMA options led to a 35–40% performance boost for the prevailing workloads.
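Before tuning, it helps to see what a host is currently configured to do. The sketch below only inspects huge-page settings through the standard Linux `/proc/meminfo` and `/sys/kernel/mm/transparent_hugepage/enabled` files; it does not cover NUMA options and does not reproduce the specific settings behind the numbers above, which the talk does not list.

```python
from pathlib import Path
from typing import Optional


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into integers."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def active_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode, e.g. 'madvise' from 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        for key in ("HugePages_Total", "HugePages_Free", "Hugepagesize"):
            print(key, info.get(key))
    if thp.exists():
        print("THP mode:", active_thp_mode(thp.read_text()))
```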
REGULAR EXPRESSIONS 101
Parsing input data at cloud scale…
On multiple occasions we hit performance issues with 3rd party regex libraries in different programming languages.
The improvement was > 10x.
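The slide does not say what was changed to get the improvement. As an illustration of one common class of fix (an assumption, not the author's method), the sketch below parses a hypothetical key=value telemetry line twice: once with a regular expression compiled a single time at import, and once with plain string operations that avoid the regex engine entirely. Which variant is faster depends on the language, library and input, so it should be measured rather than assumed.

```python
import re

# Hypothetical telemetry line; the format is assumed for illustration.
LINE = "ts=2022-09-19T10:00:00Z host=web-01 status=200 latency_ms=12"

# Compiled once at import time instead of on every call.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_with_regex(line: str) -> dict[str, str]:
    """Extract key=value pairs using the precompiled pattern."""
    return dict(PAIR.findall(line))


def parse_with_split(line: str) -> dict[str, str]:
    """Extract key=value pairs using only str.split and str.partition."""
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    # Both parsers should agree on well-formed input.
    print(parse_with_regex(LINE) == parse_with_split(LINE))
```

For a real comparison, `timeit` over a representative sample of production input would be the next step.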
ELASTIC SCALING WITH SPOT INSTANCES
Use case:
• A stateful compute and memory intensive workload driven by incoming telemetry flow.
Solution:
• Fleet of inexpensive spot instances coupled with ML-based capacity predictor.
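The shape of such a predictor-driven scaler can be sketched without the ML part. The example below is a deliberately simple stand-in: it forecasts load with an exponentially weighted moving average instead of a trained model, ignores the stateful nature of the workload, and uses hypothetical parameter names and numbers throughout.

```python
import math


def forecast_ewma(samples: list[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; recent samples weigh more."""
    if not samples:
        raise ValueError("at least one sample is required")
    level = samples[0]
    for value in samples[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


def instances_needed(events_per_sec: float,
                     per_instance_capacity: float,
                     headroom: float = 0.3) -> int:
    """Instances required to absorb the forecast plus a safety margin."""
    required = events_per_sec * (1.0 + headroom) / per_instance_capacity
    return max(1, math.ceil(required))


if __name__ == "__main__":
    recent_load = [100.0, 200.0]             # hypothetical events/sec samples
    predicted = forecast_ewma(recent_load)   # 150.0 with alpha=0.5
    print(instances_needed(predicted, per_instance_capacity=50.0))  # 4
```

The headroom parameter matters more with spot instances than with on-demand ones, since capacity can be reclaimed by the provider at short notice.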
SLAS
• SLAs – Service Level Agreements
  • Uptime, Latency, Throughput
  • Recovery Time/Point Objectives (RTO/RPO)
• Requires supporting infrastructure
  • Monitoring – metrics, dashboards
  • Logging – instrumentation, troubleshooting, auditing
  • Alerting – 24/7 reliable notification with duty rotation and escalation paths
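An uptime commitment translates directly into a downtime budget, which is what the monitoring and alerting above have to protect. The sketch below does that arithmetic; the 30-day period and the listed targets are example values, not figures from the talk.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime target over the period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime over 30 days allows {budget:.1f} minutes of downtime")
```

At 99.9% the budget is roughly 43 minutes per 30 days, which leaves little room for manual detection and escalation.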
SECURITY & COMPLIANCE
~30% of R&D effort
Security
• Threat Modeling
• Vulnerability Management
• Access Controls
• Supply chain attack prevention
• Security Monitoring
Compliance
• SOC 2, ISO 27001, HIPAA, GDPR, Accessibility
• SOC – System and Organization Controls
• SOC 2 – Security, Availability, Processing Integrity, Confidentiality, or Privacy
• Objectives -> Controls -> Assessments
• PII protection
(DEVELOPER) PRODUCTIVITY
• Continuous Integration / Continuous Delivery pipelines (CI/CD)
• Development, Testing and Release Processes
• Quality Assurance, Cycle Time
• Making sure the above scale to many R&D teams – avoiding bottlenecks.
ARCHITECTING CLOUD SAAS: ASPECTS TO CONSIDER
Scalability
• Horizontal scaling
• Distributed computing
• Redundancy and Fault Tolerance
• Elastic workloads
Costs
• Gross Margin
• Profiling
• Performance Tuning
• Cost Optimization
• Capacity Management
SLAs
• Uptime
• Latency
• Throughput
• RTO / RPO
• Monitoring
• Logging
• Alerting
Security & Compliance
• Threat Modeling
• Vulnerability Management
• Supply Chain Management
• Security Monitoring
• SOC 2, ISO 27001
• GDPR, PII
Productivity
• CI / CD
• Development
• Testing
• Release
• DevOps
• Scalability
WHAT NEXT?
NOW, WHEN I AM DONE ARCHITECTING AND BUILDING MY CLOUD SCALE SAAS OFFERING?
EVOLVING: PUBLIC CLOUD
• Periodically review and benchmark new instance types.
• Review, evaluate and benchmark new services provided by the vendor.
• Issue recommendations and develop blueprints for R&D teams.
• Plan migration.
• Rinse & Repeat.
EVOLVING: PRIVATE DC
• Periodically perform capacity planning and maintain HW order book based on up-to-date predictions.
• Periodically review and benchmark new HW generations. Negotiate prices with the vendor(s).
• Issue recommendations and develop blueprints for R&D teams.
• Plan migration.
• Rinse & Repeat.
START ME UP!
PRODUCTION WORKLOADS IN PUBLIC CLOUD
NEWCOMER GUIDE 2022 EDITION
WHERE TO START?
https://googlecloudcheatsheet.withgoogle.com/
PUBLIC CLOUDS IN 2022
• All three leading public cloud providers (AWS, Azure, GCP) exhibit increasing complexity.
• It is relatively easy to spin up proofs of concept or play with technologies.
• But launching production workloads is a whole different story. It is not just about getting started, but also about doing things right.
SUMMARY
SOFTWARE ARCHITECTURE IN THE AGE OF CLOUD COMPUTING
The cloud-era software architect needs to accommodate not only functional requirements and customer-defined throughput and performance requirements at cloud scale, but also a large set of non-functional requirements related to cybersecurity, compliance, developer productivity, and, most notably, the financial/cost characteristics, which at cloud scale can make or break a software-as-a-service company. The role of the software architect has thus become interdisciplinary by nature: the architect's mental model needs to accommodate all of the above non-functional aspects while maintaining a full picture across all layers of the software stack, from user-facing features all the way down to the operating system and the underlying hardware platform.
Interdisciplinary “decathlon”
OUTLOOK
SIMPLIFICATION
There are vast opportunities for simplification and codification of best practices as the cloud computing industry matures.
HYBRID CLOUD
Hybrid cloud deployments will become more prevalent to protect margins.
QUESTIONS?
THANK YOU
Jaroslav Gergic
@jgergic jaroslavgergic
https://cognitive.cisco.com/
https://conf.researchr.org/home/ecsa-2022
