Escalation Prediction on Defects Database
Guide: Dr. K. V. Subramaniam
Project author: Chetan Hireholi, 01FM14ESE006
Problem statement
• Determine what leads to an Escalation by interpreting the Defects Corpus of the customer support cases
• Alert on an Escalation based on the nature of the Defects, correlate the Escalations with the defects discovered by customers, and find the trigger point that leads to such an Escalation
Data Source
Two databases were used: the Incident Database and the CRs Database.
• The Incident Database: contains the customer support cases.
• The CRs Database: an internally used database that details the cases which were Change Requests.
Data Cleansing
The data in the Incidents and CRs Databases had many discrepancies (e.g., rows not in order, special characters in the Date field, and multiple spellings of the same company name, viz. Boeing, Boeing Inc.).
Tools such as OpenRefine and Microsoft Excel helped in removing such discrepancies.
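The company-name clean-up described above can be illustrated in code. The slides used OpenRefine and Excel; the following is only a minimal Python sketch of the same idea, and the suffix list and the `normalize_company` helper are assumptions made for illustration, not part of the original project.

```python
import re

# Hypothetical list of legal-form suffixes; extend it to match the real data.
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "ag", "corp", "company", "co"}

def normalize_company(name: str) -> str:
    """Return a canonical key for a company name (illustrative rule set)."""
    # Lower-case, turn punctuation into spaces, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Drop a leading article.
    if len(tokens) > 1 and tokens[0] == "the":
        tokens = tokens[1:]
    # Strip trailing legal-form suffixes, always keeping at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens).upper()

names = ["Boeing", "Boeing Inc.,", "THE BOEING COMPANY"]
canonical = {n: normalize_company(n) for n in names}
```

All three spellings map to the same key, so counts per customer are no longer split across variants. Real data would likely need manual review of the merged groups, as OpenRefine's clustering workflow encourages.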
[Bar chart: Incident Database, total cases per escalation level. Green: 3831; Red: 125; Yellow: 329]
Understanding the workflow
Algorithms
1. J48 Decision Tree:
   a. Attributes selected: Escalation, Expectation, Modules, Severity.
   b. [Pie chart: J48 Decision Tree, correctly vs. incorrectly classified instances; values shown: 20.779 and 70.22]
2. Naïve Bayes (RED & YELLOW corpus):
   a. Attributes selected: Escalation, Expectation, Modules, Severity.
   b. Probability distribution for:
      i. RED Escalation: 0.242 (24.2%)
      ii. YELLOW Escalation: 0.758 (75.8%)
      iii. When Escalation is RED, it is more likely that the Severity is URGENT, with probability 0.449 (44.9%)
      iv. When Escalation is YELLOW, it is more likely that the Severity is HIGH, with probability 0.634 (63.4%)
3. Simple K-Means method:
   a. Cluster 1 formed: YELLOW, Investigate Issue & Hotfix required, Installation, High
   b. Cluster 2 formed: RED, Investigate Issue, Installation, High
Motivation to do Textual Analysis:
The discussion between the client and the developer is captured in the ‘Comments’ attribute of the Incidents Database. Analyzing it can unearth additional information about the defects (viz. what triggered the escalation, the initial escalation of a defect, the nature of the client, etc.). This led to the use of R for text mining.
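The Naïve Bayes figures reported above are of two kinds: a class prior (how often RED or YELLOW occurs) and a per-class conditional probability (how often a Severity value occurs within RED or within YELLOW). As a minimal sketch of how such numbers are estimated from counts, the following uses a handful of hypothetical records; the data, the single-attribute restriction, and the absence of smoothing are all assumptions, and the tool actually used for the slides may compute its estimates differently.

```python
from collections import Counter

# Hypothetical (escalation, severity) pairs; NOT taken from the Incident Database.
records = [
    ("RED", "URGENT"), ("RED", "URGENT"), ("RED", "HIGH"),
    ("YELLOW", "HIGH"), ("YELLOW", "HIGH"), ("YELLOW", "HIGH"),
    ("YELLOW", "MEDIUM"), ("YELLOW", "URGENT"),
]

class_counts = Counter(esc for esc, _ in records)
pair_counts = Counter(records)

def prior(escalation):
    """P(escalation): share of records with this escalation level."""
    return class_counts[escalation] / len(records)

def likelihood(severity, escalation):
    """P(severity | escalation), estimated from raw counts (no smoothing)."""
    return pair_counts[(escalation, severity)] / class_counts[escalation]

def posterior(severity):
    """P(escalation | severity) via Bayes' rule for one attribute.

    Assumes the severity value was seen at least once in `records`.
    """
    scores = {e: prior(e) * likelihood(severity, e) for e in class_counts}
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}
```

With several attributes (Expectation, Modules, Severity), a Naïve Bayes classifier multiplies one such likelihood per attribute, treating the attributes as independent given the escalation level.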
Text Mining using R
Why R over NLTK (Python)?
• Easy to code, abundant packages
• Faster preprocessing of the text
Mining the e-mail dump
1. Create the corpus (RED, YELLOW & GREEN)
2. Preprocess the text (remove punctuation, stop words, numbers, noise)
3. Apply the ‘tm’ package for text mining the corpus
4. Extract graphs and word clouds of the trigger points that are causing Escalations
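The preprocessing and frequency-counting steps above can be sketched in a few lines. The slides used R's ‘tm’ package; the Python version below is a swapped-in analogue using only the standard library, and the stop-word list, function names, and sample comments are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "for",
              "this", "we", "in", "on", "as"}

def preprocess(text):
    """Lower-case, strip punctuation and numbers, and drop stop words."""
    text = text.lower()
    # Keep ASCII letters and whitespace only; digits and punctuation become spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [w for w in text.split() if w not in STOP_WORDS]

def term_frequencies(corpus):
    """Count how often each remaining term occurs across all documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(preprocess(doc))
    return counts

# Hypothetical comments standing in for one escalation corpus.
red_corpus = [
    "Please escalate: the agent crashed 3 times!",
    "Support, please treat this escalation as urgent.",
]
top_terms = term_frequencies(red_corpus).most_common(3)
```

The resulting term counts are the input a word cloud or frequency plot would be drawn from; building one corpus per escalation level (RED, YELLOW, GREEN) allows the frequent terms to be compared across levels.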
Results from Text mining
Final escalation state = GREEN; observations made prior to RED
[Figure annotations: most frequently used words; the affected module]
Final escalation state = GREEN; observations made prior to YELLOW
[Figure annotations: aiding words / prefix-postfix; most frequently used words; words with the highest frequency mined]
Final escalation state = YELLOW; observations made prior to RED (only 4 cases)
[Figure annotation: developer who is associated with the bug/incident]
Final escalation state = RED; observations made prior to RED (incidents jumped to RED from the YELLOW state)
[Figure annotations: most frequently used words; the affected module]
Observations made on RED corpus
(The whole RED escalated dump)
The term “escalation” used along with “please” and “support” indicates that the escalation is RED or that it will get converted to RED.
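The RED-corpus observation above suggests a simple keyword rule for alerting. The sketch below flags a comment when all three terms are present; requiring all three, matching whole words only, and the function name `has_red_signal` are assumptions, since the slides do not specify how the co-occurrence was measured.

```python
import re

# Trigger terms come from the observation above; the "all three present" rule is an assumption.
TRIGGER_TERMS = ("escalation", "please", "support")

def has_red_signal(comment):
    """Return True when every trigger term appears as a whole word."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return all(term in words for term in TRIGGER_TERMS)

flagged = has_red_signal("Please raise an escalation with support")
```

A rule like this is only a heuristic; it would need to be validated against the labelled RED, YELLOW, and GREEN corpora before being used for alerts.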
Observations made on GREEN corpus
(The whole GREEN escalated dump)
The use of “please” is not frequent, which in turn indicates that there are not many RED escalations in the incident history.
Escalation count on the defect dump
[Bar chart: total cases per escalation level. Green: 3831; Red: 125; Yellow: 329]
Other observations made on Incidents
• For RED cases:
  ◦ Where SEVERITY is URGENT, the average number of days for a case to get escalated = 13.56 days
  ◦ Where SEVERITY is HIGH, the average number of days for a case to get escalated = 25.29 days
  ◦ Where SEVERITY is MEDIUM, the average number of days for a case to get escalated = 19.66 days
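Averages like the ones above come from grouping cases by severity and taking the mean time to escalation. The following is a minimal sketch with hypothetical cases; measuring the interval from the opening date to the escalation date is an assumption, as the slides do not state which date fields were used.

```python
from collections import defaultdict
from datetime import date

# Hypothetical RED cases: (severity, date opened, date escalated).
cases = [
    ("URGENT", date(2015, 1, 1), date(2015, 1, 11)),
    ("URGENT", date(2015, 2, 1), date(2015, 2, 21)),
    ("HIGH", date(2015, 3, 1), date(2015, 3, 31)),
]

def average_days_to_escalation(rows):
    """Mean number of days between opening and escalation, per severity."""
    durations = defaultdict(list)
    for severity, opened, escalated in rows:
        durations[severity].append((escalated - opened).days)
    return {sev: sum(d) / len(d) for sev, d in durations.items()}

averages = average_days_to_escalation(cases)
```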
Analyzing Incidents: Customers vs Escalations
RHEINENERGIE, HEWLETT PACKARD, DEUTSCHE BUNDESBANK: highest number of RED escalations
[Bar chart: total RED escalations per customer; bar values range from 1 to 4]
Analyzing Incidents: Modules vs Escalations
• Total RED escalations: 125/6433. The chart below shows the modules with the highest number of escalations.
Ops - Action Agent (opcacta) & Installation: highest number of RED escalations
[Bar chart: total RED escalations per module; bar values, in descending order: 31, 12, 9, 9, 8, 6, 5, 5, 4, 4, 4, 3, 3, 2 (×7), 1 (×8)]
Analyzing Incidents: S/w release vs Escalations
[Bar chart: count of ESCALATION per software release; the values are tabulated below]
S/w release    Count of ESCALATION
8.6            28
11.14          12
11.02          12
11.03          11
11             11
11.11          10
11.13          9
8.60.501       7
11.12          6
11.04          6
11.01          6
11.1           3
unknown        3
8.53 patch     1
Grand Total    125
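As a quick arithmetic check on the release table above, the counts can be summed and turned into shares. The sketch below uses the table values as given; the variable names are illustrative.

```python
# Counts per software release, copied from the table above.
release_counts = {
    "8.6": 28, "11.14": 12, "11.02": 12, "11.03": 11, "11": 11,
    "11.11": 10, "11.13": 9, "8.60.501": 7, "11.12": 6, "11.04": 6,
    "11.01": 6, "11.1": 3, "unknown": 3, "8.53 patch": 1,
}

total = sum(release_counts.values())
# Share of all escalations per release, in percent with one decimal.
shares = {rel: round(100 * n / total, 1) for rel, n in release_counts.items()}
top_release = max(release_counts, key=release_counts.get)
```

The rows add up to the stated grand total of 125, and release 8.6 alone accounts for 22.4% of the escalations in this table.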
Analyzing Incidents: OS vs Escalations
[Bar chart: RED escalations per operating system; the largest bar is 83, and all remaining bars are between 1 and 4]
Analyzing Incidents: Developer vs Escalations
prasad.m.k_hp.com: handled a high number of escalations
[Bar chart: total escalations per developer; bar values, in descending order: 29, 15, 10, 10, 8, 8, 5 (×4), 3, 3, 2 (×5), 1 (×9)]
Analyzing CR data
Escalations in CR
[Bar chart: count of ESCALATION per value; the values are tabulated below]
ESCALATION     Count of ESCALATION
N              10219
Showstopper    75
Y              93
Grand Total    10387
Note: For Defects or CRs (QCCR), Showstopper is marked for defects that are must-fixes or that need an immediate fix for a release.
Analyzing CRs: Customers vs Escalations
TATA CONSULTANCY SERVICES LTD: highest “Showstopper” escalations
Allegis, NORTHROP GRUMMAN, PepperWeed: highest escalations
[Bar chart: count of “Showstopper” and “Y” escalations per customer; bar values are 1 or 2]
Analyzing CRs: Modules vs Escalations
Ops - Monitor Agent (opcmona) & Installation: highest “Showstopper” escalations
Installation & Lcore – Other: highest escalations
[Bar chart: count of “Showstopper” and “Y” escalations per module]
Analyzing CRs: S/w release vs Escalations
Release 11: highest number of “Showstopper” and “Y” escalations
[Bar chart: count of “Showstopper” and “Y” escalations per software release]
Analyzing CRs: OS vs Escalations
Windows (version number not clear): highest number of escalations, both “Showstopper” and “Y”
[Bar chart: count of “Showstopper” and “Y” escalations per operating system]
Note: Submitters of CRs tend to fill in the OS field as they see fit. Some choose the exact version where the issue was seen or reported; others choose only a high-level value. No strict rules were observed.
Analyzing CRs: Developer vs Escalations
swati.sinha_hp.com: handled the highest number of Showstopper escalations
umesh.sharoff_hp.com: handled the highest number of escalations
[Bar chart: count of “Showstopper” and “Y” escalations per developer]
Company behavior analysis: RHEINENERGIE
(Had maximum RED escalations)
28 incident cases
Patterns observed:
◦ 6 RED escalations
◦ RED escalations account for 6 of the 28 incidents; 21.43% chance that a logged incident will be a RED escalation
◦ Most reported modules:
  ◦ Ops - Monitor Agent (opcmona) (7 nos.); 3 of them were RED escalated
  ◦ Installation (6 nos.)
  ◦ Perf – Collector (3 nos.)
◦ Average number of days to handle a single incident: 73.5 days
◦ Number of incidents that moved to CR: 15; 53.57% of the incidents moved to CRs
  ◦ All 6 RED escalations moved to CR
  ◦ 8 GREEN escalations moved to CR
  ◦ 1 YELLOW escalation moved to CR
Company behavior analysis: APPLE INC
27 incident cases
Patterns observed:
No RED escalations ever
Mostly contains GREEN escalations (19/27); 70.37% chance that a logged incident will be a GREEN escalation
Most reported modules:
◦ Ops Monitor Agent (4 nos.)
◦ Perf Collector (3 nos.)
◦ Installation, Ops - Action Agent, Ops - Ops Agent, Perf Other (2 nos. each)
Average number of days to handle a single incident: 463.777 days
Number of incidents that moved to CR: 10; 37.04% of the incidents moved to CRs
Company behavior analysis: BOEING
33 incident cases
Patterns observed:
1 RED escalation
Mostly contains GREEN escalations (31/33); 93.94% chance that a logged incident will be a GREEN escalation
Most reported modules:
◦ Installation (7 nos.)
◦ Perf Collector, Other (5 nos.)
◦ Perf GlancePlus (4 nos.)
◦ Perf ARM (RED escalation); 3% chance that an incident will be a RED escalation
Average number of days to handle a single incident: 399.322 days
Number of incidents that moved to CR: 22; 66.67% of the incidents moved to CRs
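The percentages in the three company analyses above are simple ratios of the stated counts. As a worked check, the sketch below recomputes them from the incident, escalation, and moved-to-CR counts given on the slides; the helper `pct` and the tuple layout are illustrative choices.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)

# (company, incidents, escalations of the highlighted colour, incidents moved to CR),
# using the counts stated on the slides.
rows = [
    ("RHEINENERGIE", 28, 6, 15),  # 6 RED escalations
    ("APPLE INC", 27, 19, 10),    # 19 GREEN escalations
    ("BOEING", 33, 31, 22),       # 31 GREEN escalations
]

# Maps company -> (escalation share in %, moved-to-CR share in %).
summary = {name: (pct(k, n), pct(cr, n)) for name, n, k, cr in rows}
```

For RHEINENERGIE this gives 21.43% RED and 53.57% moved to CR; for APPLE INC, 70.37% GREEN and 37.04% moved to CR; for BOEING, 93.94% GREEN and 66.67% moved to CR.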
Other observations made on Incidents
• DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE do not match
[Line chart: DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE plotted per case for about 121 cases; y-axis from -400 to 500]

A Machine Learning approach to predict Software Defects

  • 1. Escalation Prediction on Defects Database. Project author: Chetan Hireholi, 01FM14ESE006. Guide: Dr. K. V. Subramaniam.
  • 2. Problem statement: (1) Determine what led to an escalation by interpreting the defects corpus of the customer support cases. (2) Alert on an escalation based on the nature of the defects, correlate escalations with the defects discovered by customers, and find the trigger point that leads to such an escalation.
  • 3. Data Source: two databases. The Incident Database contained the customer support cases. The CRs Database is an internally used database that details the cases which were Change Requests.
  • 4. Data Cleansing: The data in the Incidents and CRs databases had many discrepancies (e.g. rows not in order, special characters in the date field, multiple spellings of company names such as Boeing and Boeing Inc.). Tools such as OpenRefine and Microsoft Excel helped remove these discrepancies. [Bar chart: Incident Database escalation counts: Green 3831, Red 125, Yellow 329]
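
The company-name cleanup described on slide 4 was done with OpenRefine and Excel. As a rough illustration of the same idea, here is a minimal Python sketch that groups name variants under one normalized key. The suffix list, the function names and the sample names are assumptions made for this example, not part of the original project.

```python
import re
from collections import defaultdict

# Legal-form suffixes to strip; this list is an illustrative assumption.
LEGAL_SUFFIXES = {"inc", "ltd", "gmbh", "ag", "corp", "co"}


def normalize_company(name: str) -> str:
    """Return a lowercase key with punctuation and trailing legal suffixes removed."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Drop trailing suffixes, but never remove the last remaining token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_variants(names):
    """Map each normalized key to the list of raw spellings that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_company(name)].append(name)
    return dict(groups)


# Hypothetical raw values, modelled on the "Boeing, Boeing Inc.," example.
raw_names = ["Boeing", "Boeing Inc.,", "BOEING INC", "Apple Inc"]
groups = group_variants(raw_names)
for key, variants in groups.items():
    print(key, "<-", variants)
```

A key-based grouping like this only catches spelling and suffix differences; names that differ more substantially would still need manual review.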
  • 6. Algorithms: 1. J48 Decision Tree: attributes selected: Escalation, Expectation, Modules, Severity. [Chart: J48 Decision Tree, correctly vs. incorrectly classified; values shown: 20.779 and 70.22] 2. Naïve Bayes (RED & YELLOW corpus): a. Attributes selected: Escalation, Expectation, Modules, Severity. b. Probability distribution: i. RED escalation: 0.242 (24.2%). ii. YELLOW escalation: 0.758 (75.8%). iii. When the escalation is RED, the most likely Severity is URGENT, with probability 0.449 (44.9%). iv. When the escalation is YELLOW, the most likely Severity is HIGH, with probability 0.634 (63.4%). 3. Simple K-Means: a. Cluster 1: YELLOW, Investigate Issue & Hotfix required, Installation, High. b. Cluster 2: RED, Investigate Issue, Installation, High. Motivation for textual analysis: the discussion between the client and the developer is captured in the ‘Comments’ attribute of the Incidents Database. Analyzing it can unearth additional information about the defects (e.g. what triggered the escalation, the initial escalation of a defect, the nature of the client). This led to the use of R for text mining.
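
The Naïve Bayes figures on slide 6 are class priors (e.g. P(RED)) and per-class conditional probabilities (e.g. P(Severity = URGENT | RED)). The sketch below shows how such values can be estimated from categorical records by simple counting. It is not the tool the author used; the function name and the toy records are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def fit_categorical_nb(rows, target):
    """Estimate class priors and per-class attribute-value probabilities by counting."""
    class_counts = Counter(row[target] for row in rows)
    total = len(rows)
    priors = {label: n / total for label, n in class_counts.items()}

    # value_counts[(class_label, attribute)][value] = number of matching rows
    value_counts = defaultdict(Counter)
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(row[target], attr)][value] += 1

    conditionals = {
        key: {value: n / class_counts[key[0]] for value, n in counter.items()}
        for key, counter in value_counts.items()
    }
    return priors, conditionals


# Hypothetical toy records; the real corpus is not reproduced here.
rows = [
    {"Escalation": "RED", "Severity": "URGENT"},
    {"Escalation": "YELLOW", "Severity": "HIGH"},
    {"Escalation": "YELLOW", "Severity": "HIGH"},
    {"Escalation": "YELLOW", "Severity": "MEDIUM"},
]
priors, conditionals = fit_categorical_nb(rows, "Escalation")
print(priors)
print(conditionals[("YELLOW", "Severity")])
```

With real data, an attribute value never seen for a class gets probability zero under plain counting, which is why practical implementations usually add a smoothing term.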
  • 7. Text Mining using R: Why R over NLTK (Python)? Easy to code, abundant packages, faster pre-processing of the text. Mining the e-mail dump: (1) create the corpus (RED, YELLOW & GREEN); (2) pre-process the text (remove punctuation, stop words, numbers, noise); (3) apply the ‘tm’ package to text-mine the corpus; (4) extract graphs and word clouds of the trigger points that are causing escalations.
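
The pre-processing and term-counting steps listed on slide 7 were carried out with R's ‘tm’ package. For readers who want to see the steps spelled out, here is a minimal Python sketch of the same pipeline using only the standard library. The stop-word list, the function names and the sample comments are assumptions made for this example; a real run would use a full stop-word list and the actual corpus.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real pipelines use a much longer one.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "this", "we", "for", "on", "in"}


def preprocess(text):
    """Lowercase, strip punctuation and numbers, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [token for token in text.split() if token not in STOP_WORDS]


def term_frequencies(documents):
    """Count how often each remaining term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


# Hypothetical comments standing in for one escalation corpus.
red_comments = [
    "Please escalate this issue, support is needed!",
    "Escalation raised on 12/03: please provide support.",
]
freq = term_frequencies(red_comments)
print(freq.most_common(5))
```

The resulting counts are the kind of input from which frequency graphs and word clouds are drawn.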
  • 9. Final escalation state = GREEN; observations made prior to RED. [Figure callouts: most frequently used words; the affected module]
  • 10. Final escalation state = GREEN; observations made prior to YELLOW. [Figure callouts: aiding words (prefix/postfix); most frequently used words; words with highest frequency mined]
  • 11. Final escalation state = YELLOW; observations made prior to RED (only 4 cases). [Figure callout: developer who is associated with the bug/incident]
  • 12. Final escalation state = RED; observations made prior to RED (incidents that jumped to RED from the YELLOW state). [Figure callouts: most frequently used words; the affected module]
  • 13. Observations made on RED corpus (The whole RED escalated dump) The term “escalation” used along with “please” and “support” indicates that the escalation is RED or it will get converted to RED
  • 14. Observations made on GREEN corpus (the whole GREEN escalated dump): The use of “please” is not frequent, which in turn indicates that not many RED escalations are happening in the incident history. [Bar chart: escalation count on the defect dump: Green 3831, Red 125, Yellow 329]
  • 15. Other observations made on Incidents. For RED cases, the average number of days for a case to get escalated: Severity URGENT: 13.56 days; Severity HIGH: 25.29 days; Severity MEDIUM: 19.66 days.
  • 16. Analyzing Incidents: Customers vs Escalations. RHEINENERGIE, HEWLETT PACKARD, DEUTSCHE BUNDESBANK: highest number of RED escalations. [Bar chart: total RED escalations per customer]
  • 17. Analyzing Incidents: Modules vs Escalations. Total RED escalations: 125/6433. Ops - Action Agent (opcacta) & Installation: highest number of RED escalations. [Bar chart: total RED escalations per module]
  • 18. Analyzing Incidents: S/w release vs Escalations. Count of ESCALATION by release: 8.6: 28; 11.14: 12; 11.02: 12; 11.03: 11; 11: 11; 11.11: 10; 11.13: 9; 8.60.501: 7; 11.12: 6; 11.04: 6; 11.01: 6; 11.1: 3; unknown: 3; 8.53 patch: 1; Grand Total: 125.
  • 19. Analyzing Incidents: OS vs Escalations. [Bar chart: RED escalations per OS]
  • 20. Analyzing Incidents: Developer vs Escalations. prasad.m.k_hp.com: handled a high number of escalations. [Bar chart: total escalations per developer]
  • 21. Analyzing CR data: total escalations in CR. Count of ESCALATION: N: 10219; Showstopper: 75; Y: 93; Grand Total: 10387. Note: For defects or CRs (QCCR), Showstopper is marked for defects that are must-fixes or need an immediate fix for a release.
  • 22. Analyzing CRs: Customers vs Escalations. TATA CONSULTANCY SERVICES LTD: highest “Showstopper” escalations. Allegis, NORTHROP GRUMMAN, PepperWeed: highest escalations. [Bar chart: Showstopper and Y escalations per customer]
  • 23. Analyzing CRs: Modules vs Escalations. Ops - Monitor Agent (opcmona) & Installation: highest “Showstopper” escalations. Installation & Lcore - Other: highest escalations. [Bar chart: Showstopper and Y escalations per module]
  • 24. Analyzing CRs: S/w release vs Escalations. Release 11: highest number of “Showstopper” and “Y” escalations. [Bar chart: Showstopper and Y escalations per software release]
  • 25. Analyzing CRs: OS vs Escalations. Windows (version number not clear): highest number of escalations, both “Showstopper” and “Y”. [Bar chart: Showstopper and Y escalations per OS] Note: Submitters of CRs tend to fill in the OS field as they see fit. Some choose the exact version where the issue was seen or reported; others choose only a high-level OS name. No strict rules were observed.
  • 26. Analyzing CRs: Developer vs Escalations. swati.sinha_hp.com: handled the highest number of Showstopper escalations. umesh.sharoff_hp.com: handled the highest number of escalations. [Bar chart: Showstopper and Y escalations per developer]
  • 27. Company behavior analysis: RHEINENERGIE (had the maximum RED escalations). 28 incident cases. Patterns observed: 6 RED escalations (6/28); 21.43% chance that a logged incident will be a RED escalation. Most reported modules: Ops - Monitor Agent (opcmona) (7 nos.), 3 of which were RED escalated; Installation (6 nos.); Perf - Collector (3 nos.). Average number of days to handle a single incident: 73.5 days. Number of incidents which moved to CR: 15; 53.57% of the incidents moved to CRs. All 6 RED escalations moved to CR; 8 GREEN escalations moved to CR; 1 YELLOW escalation moved to CR.
  • 28. Company behavior analysis: APPLE INC. 27 incident cases. Patterns observed: no RED escalations ever. Mostly contains GREEN escalations (19/27); 70.37% chance that a logged incident will be a GREEN escalation. Most reported modules: Ops Monitor Agent (4 nos.); Perf Collector (3 nos.); Installation, Ops - Action Agent, Ops - Ops Agent, Perf Other (2 nos. each). Average number of days to handle a single incident: 463.777 days. Number of incidents which moved to CR: 10; 37.03% of the incidents moved to CRs.
  • 29. Company behavior analysis: BOEING. 33 incident cases. Patterns observed: 1 RED escalation. Mostly contains GREEN escalations (31/33); 93.93% chance that a logged incident will be a GREEN escalation. Most reported modules: Installation (7 nos.); Perf Collector, Other (5 nos.); Perf GlancePlus (4 nos.); Perf ARM (RED escalation), 3% chance that an incident will be a RED escalation. Average number of days to handle a single incident: 399.322 days. Number of incidents which moved to CR: 22; 66.66% of the incidents moved to CRs.
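
The per-company percentages on slides 27 to 29 are simple ratios of counts. The sketch below recomputes the RED-escalation and moved-to-CR shares from the counts stated on those slides. The helper name and the dictionary layout are assumptions made for this example. Values are rounded to two decimals, so the last digit can differ from the figures quoted on the slides.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Counts as stated on the company slides.
companies = {
    "RHEINENERGIE": {"incidents": 28, "red": 6, "moved_to_cr": 15},
    "APPLE INC": {"incidents": 27, "red": 0, "moved_to_cr": 10},
    "BOEING": {"incidents": 33, "red": 1, "moved_to_cr": 22},
}

rates = {
    name: {
        "red_pct": share(c["red"], c["incidents"]),
        "cr_pct": share(c["moved_to_cr"], c["incidents"]),
    }
    for name, c in companies.items()
}

for name, r in rates.items():
    print(f"{name}: RED {r['red_pct']}%, moved to CR {r['cr_pct']}%")
```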
  • 30. Other observations made on Incidents: DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE do not match. [Line chart: DIFFERENCE_INITIAL_CLOSED vs DAYS_SUPPORT_TO_CPE per incident]