ASE Sept 6, 2018 @_jon_bell_
Nipun Arora, Jonathan Bell, Franjo Ivancic, Gail Kaiser and Baishakhi Ray
Dropbox, George Mason University, Google and Columbia University
Fork Parikshan on GitHub
Replay without Recording of Production Bugs for Service Oriented Applications
Performance Bugs
[Figure: App throughput over time (average throughput vs. time), viewed by the developer]
Why is performance getting worse?
Debugging Production Bugs
• What CAN we do?
• Inspect logs
• Heap dumps and telemetry
• Sample application inputs and performance
• What CAN’T we do?
• Anything that introduces more overhead (can’t use debugger)
• Anything that might impact application correctness (can’t change code)
Debugging Production Failures in Development
[Figure: App throughput over time (average throughput vs. time) for SOA App [Production] and SOA App [Testing]]
Bug does not appear in debugging/testing environment
Problem: Distributed App State
[Diagram: Developer and SOA App [Production]: DNS, Apache/NGINX, three Glassfish App Servers, Cache, Database]
Each component has its own accumulated state; it’s unknown which
component(s) are buggy and which state is relevant!
Live Debugging
• What if our developer could attach their favorite debugging tools
directly to the production environment?
• Would allow existing state-of-the-art tools used to debug programs in
the lab to be applied directly to field failures
• But, we are constrained from making modifications or introducing
latency to production service
Live Debugging with Parikshan
[Diagram: Developer, SOA App [Production], and SOA App [Debugging]; each environment contains DNS, Apache/NGINX, three Glassfish App Servers, Cache, and Database]
“Debug” environment mirrors production, contains the same bad state that caused the bug
Challenge: Maintaining Synchronization
[Diagram: SOA App [Production] and SOA App [Debugging] side by side (DNS, Apache/NGINX, Glassfish App Servers, Cache, Database); the components of the debugging copy no longer match production]
The moment after it’s created, the debug environment will diverge!
Parikshan
• To enable live debugging, address two key challenges:
• 1: How to create the debug container?
• 2: How to keep the debug container in sync with production?
Key Insight: Containers are Everywhere!
Parikshan adopts live migration technology from containers/VMs to do live cloning
[Diagram: Live Migration: a container running a Glassfish App Server, shown with two physical machines]
Live Cloning
• Live Cloning vs Live Migration
• Encapsulate both containers in different networks but have internal network ports or addresses remain the same for the processes running within each container
• Live Cloning starts both the original container and the debug container at the end of the cloning
[Diagram: two physical machines, each hosting a container running a Glassfish App Server, with NAT]
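The addressing scheme above can be pictured with a minimal Python sketch. The `ContainerEndpoint` record, the `clone_endpoint` helper, and all addresses are hypothetical illustrations, not Parikshan's code: both containers keep the same internal address and port, while NAT exposes them under different external addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerEndpoint:
    """Hypothetical record of how one container is addressed (illustrative only)."""
    name: str
    internal_addr: tuple  # (ip, port) seen by processes inside the container
    external_addr: tuple  # (ip, port) seen from outside, after NAT


def clone_endpoint(original, new_name, new_external):
    """Model a live clone: the internal address is copied unchanged,
    only the NAT-visible external address differs."""
    return ContainerEndpoint(new_name, original.internal_addr, new_external)


# Made-up addresses for illustration.
production = ContainerEndpoint("production", ("10.0.0.2", 8080), ("192.0.2.10", 80))
debug = clone_endpoint(production, "debug", ("192.0.2.11", 80))

# Processes inside either container observe the same address...
same_inside = production.internal_addr == debug.internal_addr
# ...while the two containers stay distinguishable from outside.
distinct_outside = production.external_addr != debug.external_addr
print(same_inside, distinct_outside)
```

Keeping the internal view identical is what lets the cloned processes resume without noticing that they now run in a second container.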
Parikshan Relies on Network Duplication
Key insight: ditch traditional high-fidelity record and replay (thread scheduling decisions, system calls)
[Diagram: Users, SOA App [Production], SOA App [Debugging]]
Network Duplication
1: Replica environment created with live cloning (e.g. of VM or container)
2: Install network proxies
3: Monitor responses for divergence
Asynchronous duplicator buffers requests, allowing the debug environment to be completely paused (until buffer is full)
Developer debugs without fear of breaking production
[Diagram: Users, Network Duplicator (Asynchronous), Buffer, SOA App [Production], SOA App [Debugging], Network Aggregator, Developer]
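The buffering behaviour described above can be sketched as a small single-threaded Python model. This is not Parikshan's implementation: the `AsyncDuplicator` class and its method names are hypothetical, the two environments are stand-in callables, and what happens when the buffer overflows is an assumption (the sketch only records that the debug side missed input).

```python
import queue


class AsyncDuplicator:
    """Toy model of an asynchronous network duplicator (illustrative only)."""

    def __init__(self, production, debug, buffer_size):
        self.production = production   # callable: request -> response
        self.debug = debug             # callable: request -> response
        self.buffer = queue.Queue(maxsize=buffer_size)
        self.overflowed = False        # True once a request could not be buffered

    def forward(self, request):
        """Serve the user from production and copy the request for the debug side."""
        response = self.production(request)
        try:
            self.buffer.put_nowait(request)  # never blocks production traffic
        except queue.Full:
            self.overflowed = True           # assumption: just note the lost input
        return response

    def drain(self):
        """Replay buffered requests once the debug environment is resumed."""
        responses = []
        while not self.buffer.empty():
            responses.append(self.debug(self.buffer.get_nowait()))
        return responses


dup = AsyncDuplicator(production=str.upper, debug=str.upper, buffer_size=2)

# The debug side is "paused": requests pile up in the buffer.
user_responses = [dup.forward(r) for r in ["a", "b"]]

# The developer resumes the debug side and it catches up.
debug_responses = dup.drain()

# Three more requests while paused: the third no longer fits in the buffer.
for r in ["c", "d", "e"]:
    dup.forward(r)
```

Users only ever see responses from `production`; the debug side consumes the same requests later, which is why it can be paused for as long as the buffer has room.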
Detecting Divergence
Network aggregator checks packets on responses to measure user-perceived divergence
[Diagram: Users, Network Duplicator (Asynchronous), Buffer, SOA App [Production], SOA App [Debugging], Network Aggregator with Response History, Developer]
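A rough way to picture the response check is to compare the production response history with what the debug environment returns. The sketch below is an assumption-laden illustration: the function names are hypothetical, and pairing responses by position and comparing SHA-256 digests are choices made here, not details given in the slides.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Fingerprint a response payload so histories can be compared cheaply."""
    return hashlib.sha256(payload).hexdigest()


def divergence_ratio(production_history, debug_responses):
    """Fraction of paired responses whose payloads differ.

    Responses are paired by position; unmatched trailing responses are
    ignored, since the debug side may lag behind production.
    """
    pairs = list(zip(production_history, debug_responses))
    if not pairs:
        return 0.0
    mismatches = sum(1 for p, d in pairs if digest(p) != digest(d))
    return mismatches / len(pairs)


# Made-up payloads for illustration.
history = [b"200 OK: alice", b"200 OK: bob", b"404 Not Found"]
replica = [b"200 OK: alice", b"500 Server Error"]
ratio = divergence_ratio(history, replica)
print(ratio)
```

A ratio near zero suggests the debug environment still behaves like production from the user's point of view; a rising ratio suggests it may need to be re-cloned.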
Is network data enough to
reproduce real bugs?
Network Data IS Often Enough!
• 217 real-world bugs from issue trackers: Apache (45), MySQL (96), HDFS (76)
• Only excluded: feature requests, misunderstandings, etc.
Key Takeaway:
• Approx. 80% semantic bugs, 6% non-deterministic
• Manually confirmed these bugs could be triggered by network input
[Chart: bug categories (Performance, Semantic, Concurrency, Resource Leak) for Apache HTTPD, MySQL, and HDFS]
Reproducing Real Bugs
Category | Bug ID | Application | Symptom/Cause | Deterministic | Crash | Trigger
Performance Bugs | MySQL #15811 | mysql-5.0.15 | Bug caused due to multiple calls in a loop | Yes | No | Repeated insert into table
 | MySQL #26527 | mysql-5.1.14 | Load data is slow in a partitioned table | Yes | No | Create table with partition and load data
 | MySQL #49491 | Mysql-5.1.38 | Calculation of hash values inefficient | Yes | No | MySQL client select requests
 | Redis #614 | Redis-2.6.0 | Master + replica, not replicated correctly | Yes | No | Set up replication, push and pop some elements
Resource Leaks | Redis #417 | Redis-2.4.9 | Memory leak in master | Yes | No | Concurrent key set requests
 | Redis #487 | Redis-2.6.14 | Keys* command duplicates or omits keys | Yes | No | Set keys to execute specific set of requests
Semantic Bugs | Cassandra #5225 | Cassandra-1.5.2 | Missing columns from wide row | Yes | No | Fetch columns from Cassandra
 | Cassandra #1837 | Cassandra-0.7.0 | Deleted columns become available after flush | Yes | No | Insert, delete and flush columns
 | Redis #761 | Redis-2.6.0 | Crash with a large integer input | Yes | Yes | Query for an input of a large integer
Reproducing Real Bugs
Category | Bug ID | Application | Symptom/Cause | Deterministic | Crash | Trigger
Concurrency Bugs | Apache #25520 | Httpd-2.0.4 | Per-child buffer management not thread safe | No | No | Continuous concurrent requests initiated by the client
 | Apache #21287 | Httpd-2.0.48, Php-4.4.1 | Dangling pointer due to atomicity violation | No | Yes | Concurrent requests initiated by the client
 | MySQL #644 | Mysql-4.1 | Data race leading to crash | No | Yes | Concurrent select queries
 | MySQL #169 | Mysql-3.23 | Race condition leading to out-of-order logging | No | No | Delete and insert requests
 | MySQL #791 | Mysql-4.0 | Race-visible in logging | No | No | Concurrent flush log and insert requests
Configuration Bug | Redis #957 | Redis-2.6.11 | Slave cannot sync with master | Yes | No | Load a very large database
 | HDFS #1904 | Hdfs-0.23.0 | Create a directory in wrong location | Yes | No | Create new directory
End-to-End Evaluation
• Recreated a partial Wikipedia DB and ran a workload trace from 2008 (while maintaining a replica)
• Difference in latencies between the proxy and duplicate was found to be less than 2%
Microbenchmark: Network Forwarding
• (Bonus, not in paper)
• Measured network-level performance using iPerf
• Native Mode: Direct network communication
• Proxy Mode: Network communication via a Proxy
• Duplication Mode: Network communication with a proxy duplicating traffic
• The bandwidth difference
between the proxy and
duplication was less than
0.5%
• The latency difference
between http ping requests
was less than 0.3%
[Chart: Throughput (Mbps) in Native, Proxy, and Duplication modes; data labels shown: 934, 943]
Benchmarks: Live Cloning
• Measured the time to do a live clone on five
applications, plus an empty container (“Basic”)
• Suspend time ranged 2-3 seconds for Apache,
Thttpd, ~10 for TradeBeans/TradeSoap (large
heaps in JVM)
• Note time is mostly in doing the actual copy
[Chart: Time to clone, broken down by step]
Replay without Recording of Production Bugs for
Service Oriented Applications
Nipun Arora, Jonathan Bell, Franjo Ivancic, Gail Kaiser and Baishakhi Ray
Dropbox, George Mason University, Google and Columbia University
https://github.com/Programming-Systems-Lab/parikshan

Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 

Replay without Recording of Production Bugs for Service Oriented Applications

  • 1. ASE, Sept 6, 2018, @_jon_bell_. Replay without Recording of Production Bugs for Service Oriented Applications. Nipun Arora, Jonathan Bell, Franjo Ivancic, Gail Kaiser and Baishakhi Ray; Dropbox, George Mason University, Google and Columbia University. Fork Parikshan on GitHub.
  • 2. Performance Bugs. [Figure: developer watching a plot of app throughput over time (axes: time, average throughput).] Why is performance getting worse?
  • 3. Debugging Production Bugs
      • What CAN we do?
          • Inspect logs
          • Heap dumps and telemetry
          • Sample application inputs and performance
      • What CAN’T we do?
          • Anything that introduces more overhead (can’t use debugger)
          • Anything that might impact application correctness (can’t change code)
  • 4. Debugging Production Failures in Development. [Figure: two plots of app throughput over time (axes: time, average throughput), one for SOA App [Production] and one for SOA App [Testing].] Bug does not appear in debugging/testing environment. (Developer: #!?)
  • 5. Problem: Distributed App State. [Figure: developer and SOA App [Production], made up of DNS, Apache/NGINX, three Glassfish app servers, a cache and a database.] Each component has its own accumulated state; it’s unknown which component(s) are buggy and which state is relevant!
  • 6. Live Debugging
      • What if our developer could attach their favorite debugging tools directly to the production environment?
      • This would allow existing state-of-the-art tools used to debug programs in the lab to be applied directly to field failures
      • But we are constrained from making modifications or introducing latency to the production service
  • 7. Live Debugging with Parikshan. [Figure: developer, SOA App [Production] and SOA App [Debugging]; each environment has DNS, Apache/NGINX, a cache, a database and three Glassfish app servers.] “Debug” environment mirrors production and contains the same bad state that caused the bug.
  • 8. Challenge: Maintaining Synchronization. [Figure: SOA App [Production] and SOA App [Debugging], each with DNS, Apache/NGINX, Glassfish app servers, a cache and a database.] The moment after it’s created, the debug environment will diverge! (Developer: #!?)
  • 9. Parikshan
      • To enable live debugging, address two key challenges:
          • 1: How to create the debug container?
          • 2: How to keep the debug container in sync with production?
  • 10. Key Insight: Containers are Everywhere! Parikshan adopts live migration technology from containers/VMs to do live cloning. [Figure: live migration of a container running a Glassfish app server between physical machines.]
  • 11. Live Cloning
      • Live Cloning vs Live Migration
      • Encapsulate both containers in different networks, but keep the internal network ports and addresses the same for the processes running within each container
      • Live Cloning starts both the original container and the debug container at the end of the cloning
      [Figure: NAT; two physical machines, each hosting a container running a Glassfish app server.]
  • 12. Parikshan Relies on Network Duplication. Key insight: ditch traditional high-fidelity record and replay (thread scheduling decisions, system calls). [Figure: users, SOA App [Production] and SOA App [Debugging].]
  • 13. Network Duplication
      • 1: Replica environment created with live cloning (e.g. of VM or container)
      • 2: Install network proxies
      • 3: Monitor responses for divergence
      • Asynchronous duplicator buffers requests, allowing the debug environment to be completely paused (until the buffer is full)
      • Developer debugs without fear of breaking production
      [Figure: users, network duplicator (asynchronous) with buffer, network aggregator, SOA App [Production], SOA App [Debugging] and developer.]
  • 14. Detecting Divergence. Network aggregator checks packets on responses to measure user-perceived divergence. [Figure: users, network duplicator (asynchronous) with buffer, network aggregator with response history, SOA App [Production], SOA App [Debugging] and developer.]
  • 15. Is network data enough to reproduce real bugs?
  • 16. Network Data IS Often Enough!
      • 217 real-world bugs from issue trackers: Apache (45), MySQL (96), HDFS (76)
      • Only excluded: feature requests, misunderstandings, etc.
      • Key takeaway:
          • Approx. 80% semantic bugs, 6% non-deterministic
          • Manually confirmed these bugs could be triggered by network input
      [Figure: bug categories (performance, semantic, concurrency, resource leak) for Apache HTTPD, MySQL and HDFS.]
  • 17. Reproducing Real Bugs
      Category | Bug ID | Application | Symptom/Cause | Deterministic | Crash | Trigger
      Performance Bugs | MySQL #15811 | mysql-5.0.15 | Bug caused by multiple calls in a loop | Yes | No | Repeated insert into table
      Performance Bugs | MySQL #26527 | mysql-5.1.14 | Load data is slow in a partitioned table | Yes | No | Create table with partition and load data
      Performance Bugs | MySQL #49491 | Mysql-5.1.38 | Calculation of hash values inefficient | Yes | No | MySQL client select requests
      Performance Bugs | Redis #614 | Redis-2.6.0 | Master + replica, not replicated correctly | Yes | No | Set up replication, push and pop some elements
      Resource Leaks | Redis #417 | Redis-2.4.9 | Memory leak in master | Yes | No | Concurrent key set requests
      Resource Leaks | Redis #487 | Redis-2.6.14 | Keys* command duplicates or omits keys | Yes | No | Set keys to execute specific set of requests
      Semantic Bugs | Cassandra #5225 | Cassandra-1.5.2 | Missing columns from wide row | Yes | No | Fetch columns from Cassandra
      Semantic Bugs | Cassandra #1837 | Cassandra-0.7.0 | Deleted columns become available after flush | Yes | No | Insert, delete and flush columns
      Semantic Bugs | Redis #761 | Redis-2.6.0 | Crash with a large integer input | Yes | Yes | Query for an input of a large integer
  • 18. Reproducing Real Bugs
      Category | Bug ID | Application | Symptom/Cause | Deterministic | Crash | Trigger
      Concurrency Bugs | Apache #25520 | Httpd-2.0.4 | Per-child buffer management not thread safe | No | No | Continuous concurrent requests initiated by the client
      Concurrency Bugs | Apache #21287 | Httpd-2.0.48 Php-4.4.1 | Dangling pointer due to atomicity violation | No | Yes | Concurrent requests initiated by the client
      Concurrency Bugs | MySQL #644 | Mysql-4.1 | Data race leading to crash | No | Yes | Concurrent select queries
      Concurrency Bugs | MySQL #169 | Mysql-3.23 | Race condition leading to out-of-order logging | No | No | Delete and insert requests
      Concurrency Bugs | MySQL #791 | Mysql-4.0 | Race-visible in logging | No | No | Concurrent flush log and insert requests
      Configuration Bug | Redis #957 | Redis-2.6.11 | Slave cannot sync with master | Yes | No | Load a very large database
      Configuration Bug | HDFS #1904 | Hdfs-0.23.0 | Create a directory in wrong location | Yes | No | Create new directory
  • 19. End-to-End Evaluation
      • Recreated a partial Wikipedia DB and ran a workload trace from 2008 (while maintaining a replica)
      • The difference in latencies between the proxy and duplicate was found to be less than 2%
  • 20. Microbenchmark: Network Forwarding
      • (Bonus, not in paper)
      • Measured network-level performance using iPerf
          • Native Mode: Direct network communication
          • Proxy Mode: Network communication via a proxy
          • Duplication Mode: Network communication with a proxy duplicating traffic
      • The bandwidth difference between the proxy and duplication was less than 0.5%
      • The latency difference between HTTP ping requests was less than 0.3%
      [Figure: bar chart of throughput (Mbps) for Native, Proxy and Duplication modes; visible data labels: 934, 943.]
  • 21. Benchmarks: Live Cloning
      • Measured the time to do a live clone on five applications, plus an empty container (“Basic”)
      • Suspend time ranged 2-3 seconds for Apache and Thttpd, ~10 seconds for TradeBeans/TradeSoap (large heaps in JVM)
      • Note that most of the time is spent doing the actual copy
      [Figure: time to clone, broken down by step.]
  • 22. Replay without Recording of Production Bugs for Service Oriented Applications Nipun Arora, Jonathan Bell, Franjo Ivancic, Gail Kaiser and Baishakhi Ray Dropbox, George Mason University, Google and Columbia University https://github.com/Programming-Systems-Lab/parikshan
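
The network duplication and divergence check described on slides 13 and 14 can be sketched roughly as follows. This is a minimal in-process Python illustration, not Parikshan's implementation: the `Duplicator` class, its method names, the sequence numbering and the drop-on-overflow policy are all assumptions made for the example. The slides only state that the asynchronous buffer lets the debug environment pause until the buffer is full, and that responses are compared to measure divergence.

```python
import queue


class Duplicator:
    """Hypothetical sketch: forward requests to production, buffer copies for the debug replica."""

    def __init__(self, production, debug, buffer_size):
        self.production = production      # callable: request -> response
        self.debug = debug                # callable: request -> response
        self.buffer = queue.Queue(maxsize=buffer_size)
        self.history = {}                 # seq -> production response (response history)
        self.next_seq = 0
        self.dropped = 0

    def handle(self, request):
        """Serve the user from production; the replica never delays this path."""
        seq = self.next_seq
        self.next_seq += 1
        response = self.production(request)
        self.history[seq] = response
        try:
            self.buffer.put_nowait((seq, request))
        except queue.Full:
            # Assumption: when the buffer is full the copy is dropped and counted.
            # The slides do not say what the real tool does at this point.
            self.dropped += 1
            del self.history[seq]
        return response

    def drain(self):
        """Replay buffered requests on the replica; return sequence numbers that diverged."""
        diverged = []
        while True:
            try:
                seq, request = self.buffer.get_nowait()
            except queue.Empty:
                break
            expected = self.history.pop(seq)
            if self.debug(request) != expected:
                diverged.append(seq)
        return diverged


if __name__ == "__main__":
    dup = Duplicator(str.upper, lambda r: "BUG" if r == "b" else r.upper(), buffer_size=2)
    for req in ("a", "b", "c"):
        dup.handle(req)
    print("dropped:", dup.dropped, "diverged:", dup.drain())
```

Calling `drain()` only when the developer resumes the replica mimics the "completely paused" debug environment: production responses are returned immediately, while the replica catches up later from the buffer. A real deployment would do this at the network level with separate proxy processes rather than in-process callables.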