Measure() or die()

By Arik Lerner
Team Lead Automation & Performance/Resilience

Bio
- 3.5 years at Liveperson
- 2 years - Reporting Platform
- 1.5 years - Team Lead Automation & Performance/Resilience
- Interests: Private pilot on Cessna 172

Meetup Agenda
➔ How we monitor with e2e testing
➔ E2E Products & Personas
➔ The Awakening of the End2End Data
➔ Architecture & Life cycle

About Liveperson
Liveperson transforms the connection between brands and consumers.

Our Scale
3BN visits/month
200BN API calls/month
2 PB of data a year
1.5M concurrent visits

Our Engineering
~200 people in R&D
Constant innovation
Multiple Technologies
Fast release cycle

We monitor Liveperson services with e2e tests that simulate real business scenarios
➔ Indicate real business problems
➔ Show service availability through the consumer's eyes
➔ Alert and require immediate action
➔ Give insight into our business services

E2E Scenario Example
Agent Login: the agent enters the system
Visitor enters the site
Visitor initiates a chat
Agent Chat

E2E customers' expectations
➔ Stability == TRUST
➔ Investigatable
➔ Service Coverage
➔ Scale

Kibana - HAR statistics & Aggregation

E2E Personas
Production specialist
PMO
Management

Production Specialist User Story
This is Yossi.
When Yossi gets up in the morning,
Yossi looks at the E2E RT dashboard.
Yossi recognizes a failure.
Yossi opens the E2E debug center tools.
Yossi is smart!
Be like Yossi.

PMO User Story
This is Michal.
Before any software deployment,
if the dashboard failure rate is below 3%,
Michal has a GO for deployment.
Michal is smart!
Be like Michal.
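
Michal's rule can be sketched as a small gate. This is a minimal illustration, not the team's actual tooling: the function names and the "NO-GO" label are assumptions, and only the 3% threshold comes from the slide.

```python
def failure_rate(total_runs: int, failed_runs: int) -> float:
    """Return the share of failed runs as a fraction between 0 and 1."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return failed_runs / total_runs


def deployment_decision(total_runs: int, failed_runs: int,
                        threshold: float = 0.03) -> str:
    """Return "GO" only when the failure rate is strictly below the threshold."""
    if failure_rate(total_runs, failed_runs) < threshold:
        return "GO"
    return "NO-GO"  # label assumed for illustration


print(deployment_decision(1000, 12))  # 1.2% failure rate
print(deployment_decision(1000, 45))  # 4.5% failure rate
```

A rate of exactly 3% is treated as a NO-GO here, because the slide says "below 3%".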

Management story
This is Eli.
When Eli gets up in the morning,
Eli looks at the dashboard statistics.
Eli can see the health and availability
of each Data Center.
Eli is smart!
Be like Eli.

What KPIs do I need to measure?
KPIs
➔ Total failure rate
◆ Filter by Data Center
◆ Filter by business flow
Widgets
➔ Trend to understand service stability
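
A failure rate that can be filtered by Data Center or business flow might be computed along these lines. The run records and their field names (`data_center`, `flow`, `passed`) are hypothetical and only stand in for whatever the test framework reports.

```python
from collections import defaultdict

# Hypothetical run records; field names are illustrative only.
RUNS = [
    {"data_center": "UK", "flow": "ChatFlow", "passed": True},
    {"data_center": "UK", "flow": "ChatFlow", "passed": False},
    {"data_center": "US", "flow": "ChatFlow", "passed": True},
    {"data_center": "US", "flow": "LoginFlow", "passed": True},
]


def total_failure_rate(runs):
    """Fraction of all runs that failed."""
    return sum(1 for r in runs if not r["passed"]) / len(runs)


def failure_rate_by(runs, key):
    """Failure rate per distinct value of `key` (e.g. data center or flow)."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        totals[run[key]] += 1
        if not run["passed"]:
            failures[run[key]] += 1
    return {value: failures[value] / totals[value] for value in totals}


print(total_failure_rate(RUNS))
print(failure_rate_by(RUNS, "data_center"))
print(failure_rate_by(RUNS, "flow"))
```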

What KPIs do I need to measure?
KPIs
➔ Total chat failure rate
➔ Total missing engagements
➔ Total login failures
➔ Average login response time
Widgets
➔ Failure cause breakdown
➔ Client location root cause
➔ Test scenario failures
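
Two of these items, the failure cause breakdown and the average login response time, can be sketched as follows. The result records, their field names, and the cause strings are assumptions made for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical test results; field names and causes are illustrative only.
RESULTS = [
    {"scenario": "AgentLogin", "passed": True, "login_ms": 820, "failure_cause": None},
    {"scenario": "AgentLogin", "passed": False, "login_ms": None, "failure_cause": "login timeout"},
    {"scenario": "ChatFlow", "passed": False, "login_ms": 1180, "failure_cause": "missing engagement"},
    {"scenario": "ChatFlow", "passed": False, "login_ms": 1000, "failure_cause": "missing engagement"},
]


def failure_cause_breakdown(results):
    """Count failed runs per failure cause."""
    return Counter(r["failure_cause"] for r in results if not r["passed"])


def average_login_ms(results):
    """Average login response time over the runs that measured one."""
    samples = [r["login_ms"] for r in results if r["login_ms"] is not None]
    return mean(samples) if samples else None


print(failure_cause_breakdown(RESULTS))
print(average_login_ms(RESULTS))
```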

The Awakening of the End2End Data

Start collecting the data!
@Test → Raw Data Output
➔ Build failures/successes
➔ Failure cause
➔ Business flows
➔ Test duration
➔ Client location
➔ Data Center location
➔ Account
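
One way to picture the raw data output is a single record per test run, serialized as JSON. This is a sketch only: the `RunRecord` class and its field names are hypothetical, chosen to mirror the items listed on the slide.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RunRecord:
    """One e2e test run; the fields mirror the slide's list (names assumed)."""
    passed: bool
    failure_cause: Optional[str]
    business_flow: str
    duration_ms: int
    client_location: str
    data_center: str
    account: str


record = RunRecord(
    passed=False,
    failure_cause="login timeout",
    business_flow="ChatFlow",
    duration_ms=42000,
    client_location="US",
    data_center="UK",
    account="qa12345",
)

# One JSON line per run is easy to ship to a log pipeline later.
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```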

HAR (HTTP Archive)
➔ Logging web browser traffic
The HTTP Archive format, or HAR, is a JSON-formatted archive file format for logging a web browser's interaction with a site. The common extension for these files is .har.
The specification for the HTTP Archive (HAR) format defines an archival format for HTTP transactions that a web browser can use to export detailed performance data about the web pages it loads. The specification is produced by the Web Performance Working Group of the World Wide Web Consortium (W3C). It is in draft form and is a work in progress.
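
Because a HAR file is plain JSON, it can be read with a standard JSON parser. The sketch below summarizes a tiny hand-written HAR document; the URLs and numbers are made up, and only the `log.entries[].time`, `request.url` and `response.status` fields of the HAR format are used.

```python
import json

# A minimal, hand-written HAR document (values are made up).
HAR_TEXT = """
{"log": {"version": "1.2",
         "creator": {"name": "example", "version": "0"},
         "entries": [
  {"startedDateTime": "2016-01-01T10:00:00.000Z", "time": 120,
   "request": {"method": "GET", "url": "https://www.example.com/"},
   "response": {"status": 200}},
  {"startedDateTime": "2016-01-01T10:00:00.200Z", "time": 480,
   "request": {"method": "POST", "url": "https://www.example.com/login"},
   "response": {"status": 500}}
]}}
"""


def summarize_har(har):
    """Return request count, total time, error count and slowest URL."""
    entries = har["log"]["entries"]
    slowest = max(entries, key=lambda e: e["time"], default=None)
    return {
        "requests": len(entries),
        "total_ms": sum(e["time"] for e in entries),
        "errors": sum(1 for e in entries if e["response"]["status"] >= 400),
        "slowest_url": slowest["request"]["url"] if slowest else None,
    }


summary = summarize_har(json.loads(HAR_TEXT))
print(summary)
```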

HAR proxy diagram
Selenium WebDriver → Proxy on port XXX → www.Liveperson.com
Each request passes through the proxy, which produces the HAR
Based on BrowserMob embedded proxy server
Code snippet - adding proxy into Selenium
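
The slide's own snippet is not reproduced in this text. As a rough sketch of the same idea in Python, assuming the `selenium` and `browsermob-proxy` packages and a local BrowserMob Proxy install (the `capture_har` and `chrome_proxy_argument` helpers and the `"e2e-run"` label are hypothetical names):

```python
def chrome_proxy_argument(proxy_address: str) -> str:
    """Build the Chrome switch that routes browser traffic through the proxy."""
    return f"--proxy-server={proxy_address}"


def capture_har(url: str, browsermob_path: str) -> dict:
    """Load `url` in Chrome through BrowserMob Proxy and return the HAR."""
    # Imported lazily so the helper above works without these packages.
    from browsermobproxy import Server
    from selenium import webdriver

    server = Server(browsermob_path)  # path to the browsermob-proxy launcher
    server.start()
    try:
        proxy = server.create_proxy()
        options = webdriver.ChromeOptions()
        # proxy.proxy is the "host:port" address of the created proxy.
        options.add_argument(chrome_proxy_argument(proxy.proxy))
        driver = webdriver.Chrome(options=options)
        try:
            proxy.new_har("e2e-run")  # start recording under this label
            driver.get(url)
            return proxy.har  # recorded traffic as a HAR dictionary
        finally:
            driver.quit()
    finally:
        server.stop()


print(chrome_proxy_argument("localhost:8081"))
```

Calling `capture_har` needs a running browser and the proxy binary, so only the argument helper is exercised above.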

Pop quiz:
• N scenarios
• Running from M locations
• Running to X Data Centers
• Yields HAR Data
Question: how do we investigate the data for an entire Farm, Location, Scenario, etc.?
Answer: aggregation.
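
Aggregation here means grouping the per-request records by one or more dimensions and computing statistics per group. A minimal in-memory sketch, assuming flattened records with hypothetical `farm`, `location`, `scenario` and `time_ms` fields:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flattened HAR records, one per HTTP request.
RECORDS = [
    {"scenario": "ChatFlow", "location": "US", "farm": "UK", "time_ms": 100},
    {"scenario": "ChatFlow", "location": "US", "farm": "UK", "time_ms": 300},
    {"scenario": "ChatFlow", "location": "UK", "farm": "UK", "time_ms": 50},
    {"scenario": "LoginFlow", "location": "US", "farm": "US", "time_ms": 200},
]


def aggregate(records, *dimensions):
    """Group records by the given dimensions and compute basic statistics."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        groups[key].append(record["time_ms"])
    return {
        key: {"count": len(times), "avg_ms": mean(times), "max_ms": max(times)}
        for key, times in groups.items()
    }


print(aggregate(RECORDS, "farm"))
print(aggregate(RECORDS, "scenario", "location"))
```

The same grouping is what an Elasticsearch terms aggregation does at scale; the in-memory version only illustrates the idea.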

Start with collecting the data!
@Test → Raw Data Output: Metadata + HAR
{
"metaData": {
"Testname": "ChatFlow",
"Account": "qa12345",
"ClientLocation": "US",
"DataCenter": "UK"
}
}
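
Attaching the test metadata to every HAR entry makes each HTTP record searchable by test, account, location and Data Center. A sketch of that merge, assuming one flat document per entry (the `to_documents` helper and the `url`, `status`, `time_ms` output fields are illustrative choices):

```python
METADATA = {
    "Testname": "ChatFlow",
    "Account": "qa12345",
    "ClientLocation": "US",
    "DataCenter": "UK",
}

# A minimal HAR structure with made-up values.
HAR = {"log": {"entries": [
    {"time": 120,
     "request": {"url": "https://www.example.com/"},
     "response": {"status": 200}},
    {"time": 480,
     "request": {"url": "https://www.example.com/login"},
     "response": {"status": 500}},
]}}


def to_documents(metadata, har):
    """Return one flat document per HAR entry, enriched with the metadata."""
    documents = []
    for entry in har["log"]["entries"]:
        document = dict(metadata)  # copy so the shared metadata is not mutated
        document.update({
            "url": entry["request"]["url"],
            "status": entry["response"]["status"],
            "time_ms": entry["time"],
        })
        documents.append(document)
    return documents


for doc in to_documents(METADATA, HAR):
    print(doc)
```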

Jenkins Slaves (@Test) → HAR files (Files Output) → HAR Processor (Get Json, Send data) → Kafka (topic e2e) → Logstash + Elasticsearch → Kibana Dashboard
Code snippet send message into Kafka
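
The slide's own snippet is not reproduced in this text. A rough sketch of the sending step in Python: `send_documents` accepts any producer object with `send(topic, value)` and `flush()` methods, which is the shape of kafka-python's `KafkaProducer`. The `RecordingProducer` class is a hypothetical in-memory stand-in so the example runs without a broker; only the topic name `e2e` comes from the slides.

```python
import json


def encode(document: dict) -> bytes:
    """Serialize one document as UTF-8 JSON bytes."""
    return json.dumps(document, sort_keys=True).encode("utf-8")


def send_documents(producer, topic, documents):
    """Send each document to `topic`, flush, and return how many were sent."""
    count = 0
    for document in documents:
        producer.send(topic, encode(document))
        count += 1
    producer.flush()  # block until buffered messages are handed off
    return count


class RecordingProducer:
    """In-memory stand-in for a real producer (illustration only)."""

    def __init__(self):
        self.sent = []
        self.flushed = False

    def send(self, topic, value):
        self.sent.append((topic, value))

    def flush(self):
        self.flushed = True


producer = RecordingProducer()
sent = send_documents(producer, "e2e", [{"Testname": "ChatFlow", "time_ms": 120}])
print(sent, producer.sent)
```

With kafka-python installed, `KafkaProducer(bootstrap_servers="localhost:9092")` could be passed in place of the stand-in; the broker address is an assumption.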

Our benefits
➔ Data Retention - 30 days
➔ Ability to query and aggregate over the data for investigation
➔ Ability to build dashboards
➔ Access to the data through Elasticsearch APIs
ELK & HAR Downsides
➔ Complicated queries over Kibana
➔ ELK setup & maintenance
➔ On a response timeout, the HAR shows an enormous number (needs to be handled in code)
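
Handling the timeout values in code could look like the sketch below, which flags implausible `time` values before they skew averages. The 120-second ceiling and the `time_valid` field are assumptions; the slides do not say which limit or convention was actually used.

```python
MAX_PLAUSIBLE_MS = 120_000  # assumed ceiling; tune to the real timeout settings


def sanitize_entry_time(entry, max_ms=MAX_PLAUSIBLE_MS):
    """Return a copy of the entry with an implausible `time` nulled and flagged."""
    cleaned = dict(entry)
    time_ms = entry.get("time")
    if time_ms is None or time_ms < 0 or time_ms > max_ms:
        cleaned["time"] = None        # keep the record, drop the bogus number
        cleaned["time_valid"] = False
    else:
        cleaned["time_valid"] = True
    return cleaned


print(sanitize_entry_time({"time": 350}))
print(sanitize_entry_time({"time": 9_999_999_999}))
```

Keeping the record and marking it, rather than dropping it, preserves the failure for investigation while letting dashboards exclude it from timing statistics.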

What other E2E outputs do we have?
@Test → More Output
➔ BDD Reports
➔ Video
➔ Logs
➔ Browser console logs

Code snippet
BDD - Behaviour Driven Development

[Architecture diagram. Components shown: Jenkins Master, Jenkins Master DR, Jenkins Slaves running @Test, DC-1, DC-2, DC-N, MySql DB, KAFKA + ELK, Kibana service, E2E Reports, RT Dashboard, HAR data, e2e data, Production metrics, Graphite, Zabbix, Grafana]

E2E Test Lifecycle
DEV → QA → Staging → Production

E2E @ Scale
➔ 1.5M HTTP traffic records per day
➔ 200K runs per day
➔ 60 Jenkins slave machines
➔ 28 scenarios
➔ 6 client locations
➔ 6 Regions

What to take home?
➔ Monitor your Data Centers from the consumer's point of view
➔ Collect data
➔ Give the data business meaning

YouTube.com/LivePersonDev
Twitter.com/LivePersonDev
Facebook.com/LivePersonDev
Slideshare.net/LivePersonDev
