In this meetup, Arik Lerner, Liveperson Team Lead of Java Automation, Performance & Resilience, will talk about how we measure our services with End2End testing, which has become one of the most critical monitoring tools at LP.
Over 200K tests run per day, providing statistics and insights into problems as they happen.
Arik will go through the different topics and stages of the journey and share the details that led to the current results.
Some of the topics on the menu: The Awakens of the End2End Insights
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How we debug our services. Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future: correlating application logs with End2End data
• Our tools: Selenium, Jenkins and cutting-edge technologies such as Kafka & ELK (Elasticsearch, Logstash and Kibana)
In this meetup, Arik will also host Ali AbuAli, NOC Team Leader, who will talk about how he uses e2e in his day-to-day work.
Measure() or die()
2. By Arik Lerner
Team Lead Automation & Performance/Resilience
Measure() OR Die();
3. - 3.5 years at Liveperson
- 2 years - Reporting Platform
- 1.5 years - Team Lead Automation & Performance/Resilience
- Interests: Private pilot on Cessna 172
Bio
5. ➔ How we monitor with e2e testing
➔ E2E Products & Personas
➔ The Awakens of the End2End Data
➔ Architecture & Life cycle
Meetup Agenda
10. We monitor Liveperson services
with e2e tests that simulate
real business scenarios
➔ Indicate real business problems
➔ Service availability through the consumer's eyes
➔ Alert and require immediate action
➔ Insight into our business services
11. Agent login: agent enters the system
Visitor enters the site
Visitor initiates chat
Agent chat
E2E Scenario Example
18. This is Yossi.
When Yossi gets up in the morning
Yossi looks at the E2E RT dashboard
Yossi recognizes a failure
Yossi enters the E2E debug center tools
Yossi is smart!
Be like Yossi.
Production Specialist User Story
19. PMO User Story
This is Michal.
Before any software deployment
When dashboard failure rate is below 3%
Michal has a GO for deployment
Michal is smart!
Be like Michal.
20. Management story
This is Eli.
When Eli gets up in the morning
Eli looks at the Dashboard statistics
Eli can see the health and availability
of each Data Center
Eli is smart!
Be like Eli.
22. ➔ Total failure rate
◆ Filter for each Data Center
◆ Filter for each business flow
KPIs
➔ Trend to understand service stability
Widgets
What KPIs do I need to measure?
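A minimal Java sketch of the failure-rate KPI with per-Data-Center and per-business-flow filters. The `TestRun` class, its field names, and the sample values are hypothetical and only illustrate the calculation; the 3% gate mirrors the PMO user story and is not taken from any real configuration.

```java
import java.util.List;
import java.util.function.Predicate;

final class FailureRateKpi {

    /** One e2e test run. Field names are illustrative, not the team's actual schema. */
    static final class TestRun {
        final String dataCenter;
        final String businessFlow;
        final boolean failed;

        TestRun(String dataCenter, String businessFlow, boolean failed) {
            this.dataCenter = dataCenter;
            this.businessFlow = businessFlow;
            this.failed = failed;
        }
    }

    /** Assumed threshold, taken from the "below 3%" rule in the PMO user story. */
    static final double GO_THRESHOLD_PERCENT = 3.0;

    /** Percentage of failed runs among the runs matching the filter; 0.0 if none match. */
    static double failureRatePercent(List<TestRun> runs, Predicate<TestRun> filter) {
        long total = 0;
        long failed = 0;
        for (TestRun run : runs) {
            if (filter.test(run)) {
                total++;
                if (run.failed) {
                    failed++;
                }
            }
        }
        return total == 0 ? 0.0 : 100.0 * failed / total;
    }

    /** True when the failure rate is strictly below the assumed deployment threshold. */
    static boolean isGoForDeployment(double failureRatePercent) {
        return failureRatePercent < GO_THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        List<TestRun> runs = List.of(
                new TestRun("UK", "ChatFlow", false),
                new TestRun("UK", "ChatFlow", true),
                new TestRun("US", "ChatFlow", false),
                new TestRun("US", "Login", false));

        double total = failureRatePercent(runs, run -> true);
        double uk = failureRatePercent(runs, run -> run.dataCenter.equals("UK"));
        double login = failureRatePercent(runs, run -> run.businessFlow.equals("Login"));

        System.out.println("Total failure rate: " + total + "%");
        System.out.println("UK failure rate: " + uk + "%");
        System.out.println("Login failure rate: " + login + "%");
        System.out.println("GO for deployment: " + isGoForDeployment(total));
    }
}
```

Passing the filter as a predicate keeps one calculation for the total rate and for every Data Center or business flow slice.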
23. ➔ Total chat failure rate
➔ Total missing engagements
➔ Total login failures
➔ Average login response time
KPIs
➔ Failure cause breakdown
➔ Client location root cause
➔ Test scenario failures
Widgets
What KPIs do I need to measure?
26. Start collecting the data!
➔ Get build failures/success
➔ Get failure cause
➔ Business flows
➔ Test duration
➔ Client location
➔ Data Center location
➔ Account
@Test
Raw Data Output
27. The HTTP Archive format or HAR, is a JSON-formatted archive file format for logging of a web browser's
interaction with a site. The common extension for these files is .har.
The specification for the HTTP Archive (HAR) format defines an archival format for HTTP transactions that can
be used by a web browser to export detailed performance data about web pages it loads. The specification for
this format is produced by the Web Performance Working Group of the World Wide Web Consortium (W3C).
The specification is in draft form and is a work in progress.
HAR (HTTP Archive)
➔ Logging web browser traffic
28. HAR proxy diagram
Selenium WebDriver -> Proxy on port XXX -> www.Liveperson.com
Request passes through proxy, which records the HAR
Based on BrowserMob embedded proxy server
Code snippet - adding proxy into Selenium
29. • N scenarios
• Running from M locations
• Running against X Data Centers
• Yields HAR data
Question: how do we investigate the data for an entire farm, location, scenario, etc.?
Answer: aggregation.
Pop quiz:
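The aggregation answer can be sketched in Java as a group-by over HAR-derived samples. This is only an illustration under assumptions: the `HarSample` class, its fields, and the numbers are hypothetical, and in the setup described in this talk the aggregation is done in Elasticsearch and Kibana rather than in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class HarAggregation {

    /** One timing sample taken from a HAR file, tagged with test metadata (hypothetical shape). */
    static final class HarSample {
        final String scenario;
        final String clientLocation;
        final String dataCenter;
        final long totalTimeMs;

        HarSample(String scenario, String clientLocation, String dataCenter, long totalTimeMs) {
            this.scenario = scenario;
            this.clientLocation = clientLocation;
            this.dataCenter = dataCenter;
            this.totalTimeMs = totalTimeMs;
        }
    }

    /** Average total time per group; the key function picks the dimension to group by. */
    static Map<String, Double> averageTimeBy(List<HarSample> samples,
                                             Function<HarSample, String> key) {
        return samples.stream().collect(
                Collectors.groupingBy(key, TreeMap::new,
                        Collectors.averagingLong(sample -> sample.totalTimeMs)));
    }

    public static void main(String[] args) {
        List<HarSample> samples = List.of(
                new HarSample("ChatFlow", "US", "UK", 200),
                new HarSample("ChatFlow", "US", "UK", 400),
                new HarSample("Login", "IL", "US", 100));

        // The same samples, sliced by three different dimensions.
        System.out.println("By Data Center: " + averageTimeBy(samples, s -> s.dataCenter));
        System.out.println("By location:    " + averageTimeBy(samples, s -> s.clientLocation));
        System.out.println("By scenario:    " + averageTimeBy(samples, s -> s.scenario));
    }
}
```

Tagging every sample with its scenario, client location, and Data Center is what makes slicing by any of the three possible later.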
30. Start with collecting the data!
@Test
Raw Data Output
{
  "metaData": {
    "Testname": "ChatFlow",
    "Account": "qa12345",
    "ClientLocation": "US",
    "DataCenter": "UK"
  }
}
Metadata, HAR
31. Jenkins Slaves (three shown) running @Test -> HAR files output
HAR Processor: get JSON, send data
Kafka (topic e2e)
Logstash + Elasticsearch
Kibana Dashboard
Code snippet send message into Kafka
32. Our benefits
➔ Data Retention - 30 days
➔ Ability to query and aggregate over the data for investigation
➔ Ability to build dashboards
➔ Access to the data through Elasticsearch APIs
ELK & HAR Downsides
➔ Complicated queries over Kibana
➔ ELK setup & maintenance
➔ When a response times out, the HAR shows an enormous number (needs to be handled in code)
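One possible way to handle the timeout downside in code, as a hedged Java sketch: values above an assumed ceiling are counted as suspected timeouts and kept out of the averages. The class name and the 60-second ceiling are hypothetical, and the talk does not say how the team handled this.

```java
import java.util.List;

final class HarTimeoutFilter {

    /** Assumed ceiling for a believable timing; pick a value that matches the test timeout. */
    static final long MAX_PLAUSIBLE_MS = 60_000;

    /** A value above the ceiling is treated as a timeout artifact, not a real measurement. */
    static boolean isSuspectedTimeout(long timeMs) {
        return timeMs > MAX_PLAUSIBLE_MS;
    }

    /** Usable values are non-negative (HAR uses -1 for timings that do not apply) and under the ceiling. */
    static boolean isUsable(long timeMs) {
        return timeMs >= 0 && !isSuspectedTimeout(timeMs);
    }

    /** Average over usable values only; 0.0 when there are none. */
    static double averageUsableMs(List<Long> timesMs) {
        return timesMs.stream()
                .filter(HarTimeoutFilter::isUsable)
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0);
    }

    /** Number of values that look like timeout artifacts, so they can be reported separately. */
    static long countSuspectedTimeouts(List<Long> timesMs) {
        return timesMs.stream().filter(HarTimeoutFilter::isSuspectedTimeout).count();
    }

    public static void main(String[] args) {
        List<Long> timesMs = List.of(120L, 180L, 9_000_000_000L, -1L);
        System.out.println("Average of usable timings (ms): " + averageUsableMs(timesMs));
        System.out.println("Suspected timeouts: " + countSuspectedTimeouts(timesMs));
    }
}
```

Counting the suspected timeouts separately keeps them visible as failures instead of letting them distort the response-time averages.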
33. What other E2E outputs do we have?
@Test
More Output
BDD Reports
Video
Logs
Browser console logs
36. Architecture diagram, components:
Jenkins Master, Jenkins Master DR
Jenkins Slaves (nine shown) running @Test across DC-1, DC-2 ... DC-N
MySql DB, KAFKA + ELK, Kibana service
Graphite, Zabbix, Grafana
Data: E2E Reports, HAR data, e2e data, production metrics
RT Dashboard