Stefan Wallin's PhD Presentation, 23 Feb 2013.
http://pure.ltu.se/portal/sv/publications/rethinking-network-management-solutions%28524ec0f6-7cb3-45bd-b350-72a21f0b7c6e%29.html
4. Main Thesis
• Use domain-specific languages to specify alarm and service models
  - Explicit knowledge
  - Text-based representation
• Use data-mining and self-learning to capture “hard-to-model” things
  - Tacit knowledge
5. Research Structure
[Diagram, labels in order of appearance]
Service Models: Configuration Changes, Service Type, Status Calculation, Service Type, Component, Device Type, Constraints
Alarm Models: Alarm Type, Causality, Alarm Type, Constraints
6. Problems and Contributions
Alarm Models
• Defined a Domain-Specific Language, BASS, for specifying alarm models
  - Model Quality
  - Automatic Correlation
• Data-Mining and Self-Learning to assign alarm severity levels
Service Models
• Domain-Specific Languages for Service Management
  - Defined SALmon for monitoring
  - Test of IETF YANG for Service Configuration
[Diagram: research structure, as on slide 5]
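7. Attacking the Problems
[Diagram: me in the middle, exchanging Challenges, Solutions, and Validations with Service Providers and Equipment Vendors, and working on Solutions with Computer Science specialists; output goes to Journals and Conferences]
Computer Science specialists from
• LTU
• Data Ductus
• Tail-f
• YALTS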
8. Publication Overview
Journals
• IEEE: IT Professional
• Springer: Journal of Network and Systems Management
• John Wiley & Sons: International Journal of Network Management
• Inderscience: International Journal of Business Intelligence and Data-Mining
• Springer: Telecommunications Systems
Conferences/Workshops
• IFIP ManWeek
• IEEE IM
• IEEE NOMS
• Usenix LISA
• IEEE AINA TeNAS
• IEEE SOSE
9. Contents
• Problems? – Input from Service Providers
• The Alarm Problem
  - Alarm Solutions: BASS, Alarm prioritization
• The Service Management Problem
  - Service Management Solutions: Monitoring with SALmon, Configuring with IETF YANG
• Conclusions and Future Work
• Acknowledgements
14. Alarm Chain
[Diagram: Managed System (Resource States → Alarms → Alarm Notifications) → Management System (Estimated Alarms → Estimated Resource States)]
Alarm notification contents:
• Alarm Type
• Resource
• Severity
• Raise / Clear
• Text
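A minimal sketch of the management-system end of this chain, assuming each notification carries the five fields listed on the slide and that the estimated alarm state is keyed by resource and alarm type. The class names, field names, and sample values are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmNotification:
    """One notification with the five fields listed on the slide."""
    alarm_type: str
    resource: str
    severity: str
    raised: bool  # True = raise, False = clear
    text: str


class EstimatedAlarmStates:
    """Management-system view: estimated alarm state per (resource, alarm type)."""

    def __init__(self):
        self._active = {}

    def apply(self, n: AlarmNotification) -> None:
        key = (n.resource, n.alarm_type)
        if n.raised:
            # A repeated raise only refreshes the existing entry.
            self._active[key] = n
        else:
            self._active.pop(key, None)

    def active(self):
        return sorted(self._active)


states = EstimatedAlarmStates()
states.apply(AlarmNotification("link-down", "router1/eth0", "major", True, "Link down"))
states.apply(AlarmNotification("link-down", "router1/eth0", "major", True, "Link down"))
states.apply(AlarmNotification("fan-failure", "router1/fan2", "minor", True, "Fan stopped"))
states.apply(AlarmNotification("link-down", "router1/eth0", "major", False, "Link up"))
print(states.active())
```

Keying the state on resource and alarm type means a repeated raise does not create a second entry, which is one way to keep the estimate close to the real alarm state when devices repeat notifications.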
15. The Alarm Problem
“Most network elements […] does not have the notion of an alarm state. Devices emit event notifications whenever an implementor thought this is a good idea”
“[around] 40 percent of the alarms are considered to be redundant as many alarms appear at the same time for one ‘fault’. Many alarms are also repeated [...]. One alarm had for example appeared 65000 times in today’s browser. Correlation is hardly used even if it [is] supported by the systems, [current correlation level is] 1-2 % maybe.”
16. The Alarm Problem
• Too many: > 1 / sec
• Which ones are relevant?
• Several alarms for the same fault
• Wrong severity levels
• Interpreting meaning and impact
17. Interpreting an Alarm
*A0628/546 /08-07-01/10 H 38/ N=0407/TYP=ICT/CAT=SI
/EVENT=DAL/NCEN=AMS1
/AM=SMTA7/AGEO=S1-TR03-B06-A085-R000
/TEXAL=IND RECEPTION/COMPL.INF: /AF=URMA7/ICTQ7
AGCA=S1-TR03-B06-A085-R117/DAT=08-07-01/HRS=10-38-14
/AMET=07-020-01 /AFLR=175-011/PLS/CRC=NACT
/NSAE=186/NSGE=186/NIND=14/INDI=956/NSDT=0
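To show why such text is hard to use for automation, here is a rough sketch that pulls the `KEY=VALUE` pairs out of the alarm above with a regular expression. The parsing rule is an assumption made for illustration, not a documented format: the header, the `COMPL.INF:` part, and bare tokens such as `ICTQ7` and `PLS` are simply skipped, and the meaning of each key remains unknown without vendor documentation.

```python
import re

RAW_ALARM = """*A0628/546 /08-07-01/10 H 38/ N=0407/TYP=ICT/CAT=SI
/EVENT=DAL/NCEN=AMS1
/AM=SMTA7/AGEO=S1-TR03-B06-A085-R000
/TEXAL=IND RECEPTION/COMPL.INF: /AF=URMA7/ICTQ7
AGCA=S1-TR03-B06-A085-R117/DAT=08-07-01/HRS=10-38-14
/AMET=07-020-01 /AFLR=175-011/PLS/CRC=NACT
/NSAE=186/NSGE=186/NIND=14/INDI=956/NSDT=0"""

# Assumed rule: an upper-case key, "=", then everything up to the next "/".
FIELD = re.compile(r"([A-Z]+)=([^/]*)")


def parse_fields(raw):
    """Return the KEY=VALUE pairs found in the raw alarm text."""
    return {key: value.strip() for key, value in FIELD.findall(raw)}


fields = parse_fields(RAW_ALARM)
for key in ("TYP", "EVENT", "NCEN", "TEXAL", "DAT", "HRS"):
    print(key, "=", fields[key])
```

Even after parsing, the operator still has to know what `TYP=ICT` or `EVENT=DAL` means, which is the gap a formal alarm model is meant to close.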
22. Research Structure
[Diagram repeated from slide 5]
23. Alarms Today
We have:
• Alarm interface standards
  - Envelope, the parameters
• Alarm documentation
  - Informal documents for humans
What we do not have:
• Formal alarm definitions that can be used for automation
  - The contents of the envelope
  - “Alarm Model”
24. Alarm Model
BASS
Alarm Types
Predicates
Constraints
- Information
- Semantic
26. BASS Prototype and Validation
[Diagram: Alarm Doc from Real Vendor → BASS (.alarm) → Documentation, Graphs, Correlation Rules; Alarm DB with Alarms from Real Operator → Correlation → Correlated / Uncorrelated; Feedback]
• Information Constraints
• Semantic Constraints
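A small sketch of how causality between alarm types could drive correlation, splitting alarms into correlated and uncorrelated sets. This is not BASS syntax; the causality table, the alarm type names, and the same-resource rule are assumptions chosen only to illustrate the idea.

```python
# Hypothetical causality model: cause alarm type -> alarm types it can cause.
CAUSES = {
    "power-failure": {"link-down", "fan-failure"},
    "link-down": {"bgp-session-lost"},
}


def correlate(active_alarms):
    """Split (resource, alarm_type) pairs into correlated and uncorrelated.

    An alarm counts as correlated when another active alarm on the same
    resource has a type that the model lists as its cause.
    """
    correlated, uncorrelated = [], []
    for resource, alarm_type in active_alarms:
        has_cause = any(
            other_resource == resource
            and alarm_type in CAUSES.get(other_type, set())
            for other_resource, other_type in active_alarms
        )
        (correlated if has_cause else uncorrelated).append((resource, alarm_type))
    return correlated, uncorrelated


alarms = [
    ("site-a", "power-failure"),
    ("site-a", "link-down"),
    ("site-a", "bgp-session-lost"),
    ("site-b", "link-down"),
]
correlated, uncorrelated = correlate(alarms)
print("correlated:", correlated)
print("uncorrelated:", uncorrelated)
```

Because the rules are derived from a model rather than written by hand, changing the causality table is enough to change the correlation behavior.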
29. Alarm Monitoring
Data-Mining and Self-Learning
Assigning Correct Severity Levels by Learning from Experts
30. Research Structure
[Diagram repeated from slide 5]
31. Learning Alarm Priorities
[Diagram: Databases from a real service provider (Alarm System, Trouble Ticket System) supply Alarm and Priority for Training a Neural Network, which is then used to Suggest Priority for an Alarm]
32. Result
Distribution of Errors
• Neural network correct in 53 %
• Original severity correct in 11 %
[Chart: Percentage of Alarms versus Magnitude of Error (too high / too low), for the neural network and the original severity]
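A sketch of how such an error distribution could be computed, assuming priorities are integers where a larger number means a higher priority. The function name and the sample values are made up for illustration and are not the data behind the results above.

```python
from collections import Counter


def error_distribution(assigned, expert):
    """Return {signed error: percentage of alarms}.

    Error = assigned priority - expert priority, so a positive value means
    the assigned priority was too high and a negative value too low.
    """
    if len(assigned) != len(expert):
        raise ValueError("both lists must have the same length")
    counts = Counter(a - e for a, e in zip(assigned, expert))
    total = len(assigned)
    return {err: 100.0 * n / total for err, n in sorted(counts.items())}


# Made-up priorities on a 1..5 scale, for illustration only.
expert_priority = [1, 2, 3, 3, 4, 5, 2, 1]
assigned_priority = [1, 3, 3, 1, 4, 4, 2, 2]

dist = error_distribution(assigned_priority, expert_priority)
for err, pct in dist.items():
    print(f"error {err:+d}: {pct:.1f} % of alarms")
```

The entry for error 0 is the share of alarms where the assigned priority matches the expert's, which is the figure the two bullets compare.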
34. Service Management
“Services are not currently managed well in any suite of applications and require a tremendous amount of work to maintain”
“Service models are becoming more and more important”
“Focus on service management - bringing this up to 40% from [the] current level of 5-10%”
“Managing services must be the focus of the future development, while pushing network management into a supporting role”
35. Complex Structures
[Diagram: “Service Models” drawn as UML classes (Class Name, Attribute, Operation) feeding Configuration, Software Implementation, and Monitoring]
Interpretations and Tedious Mappings
37. Research Structure
[Diagram repeated from slide 5]
38. My Two Tracks for Service Management
• Configuration Changes: IETF YANG
• Status Calculation: SALmon
1 Model the Services
2 Express the transformations
[Diagram labels: Service Type, Service Type, Component, Device Type]
39. Simplified Structures
[Diagram: Models → Configuration; Models → Monitoring]
Remove room for interpretations and automate mappings
41. SALmon Test
• The TR-126 model could be executed
• Compact complete model
• Easy to change in one place
[Screenshot: SLA and Service monitor UI]
42. My Two Tracks for Service Management
• Configuration Changes: IETF YANG
• Status Calculation: SALmon
1 Model the Services
2 Express the transformations
[Diagram labels: Service Type, Service Type, Component, Device Type; Released 2010]
43. Service Configuration and Activation
IETF defined YANG as a data-modeling language for managing devices
• “Replacing SNMP MIBs”
Thesis:
• YANG can be used to model services, not only devices
• Service Configuration as a YANG-to-YANG transform
Work:
• Service Modeling projects at service providers
• Service Activation product, Tail-f NCS
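The idea of service configuration as a model-to-model transform can be sketched as below, with plain dictionaries standing in for data trees that would be described by a service model and a device model. This is neither YANG nor the Tail-f NCS API; the VPN service, its fields, and the device names are hypothetical.

```python
def vpn_service_to_device_config(service):
    """Map one service instance to per-device configuration trees."""
    devices = {}
    for endpoint in service["endpoints"]:
        device = devices.setdefault(endpoint["device"], {"interfaces": []})
        device["interfaces"].append({
            "name": endpoint["interface"],
            "description": f"vpn {service['name']}",
            "vlan": service["vlan"],
        })
    return devices


# Service-level data: what the operator wants, with no device details.
service = {
    "name": "acme",
    "vlan": 100,
    "endpoints": [
        {"device": "pe1", "interface": "ge-0/0/1"},
        {"device": "pe2", "interface": "ge-0/0/3"},
    ],
}

config = vpn_service_to_device_config(service)
for device, tree in config.items():
    print(device, tree)
```

The transform is the only place that knows how a service maps to devices, so a change in the service instance can be recomputed into device changes without manual mapping.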
44. SALmon and YANG
                | SALmon           | IETF YANG                               | Comment
Model Structure | Object Oriented  | Tree                                    | Tree structures more suited for rendering
Purpose         | Operational Data | Configuration Data and Operational Data |
Time-Series     |                  |                                         |
Calculations    | Functional       | -                                       | YANG to YANG mapping in Java for imperative configuration; XPATH possible to express aggregation
Constraints     | -                | XPATH                                   |
46. Conclusions
For Research
• Closer cooperation with equipment and service providers
• Network management is in need of computer science
For Network Equipment Providers
• Provide models (in a form) that can be used for automation
• Interface quality
For Service Providers
• Overcome current practice of incomplete illustrations and free-form documents
• Model the offered services
• Knowledge management
47. Future Work
• SALmon features represented in YANG
  - Language extensions or as models
  - Time-series
  - Functional calculations
  - XPATH
  - Database representation
• Imperative activation as part of the model?
• More knowledge management by using data-mining and self-learning
[Diagram: research structure, as on slide 5]
48. Errata
Paper C :
Says trivial approach is correct in 17 % of the cases
Should be 11 %
Section 2 :
Wrong “T”, should be:
49. Thank You !
Klacke Wikström Jörgen Öfjell
Håkan Millroth Johan Ehnmark
Martin Björklund Christer Åhlund Andreas Jonsson
Seb Strollo Johan Nordlander Ulrik Forsgren
Johan Bevemyr Viktor Leijon Magnus Karlsson
Joakim Grebenö Robert Brännström Leif Landén
Chris Williams Karl Andersson
Daniel Granlund
Dan Johansson
Nicklas Bystedt
Mikael Börjesson
EU Funded Magneto Project: Sidath Handurukande
Equipment Vendors and Service Providers: Test Data