GLORIAD's New Measurement
and Monitoring System for
Addressing Individual
Customer-based Performance
across a Global Network Fabric
APAN Meeting
January 16, 2013
Greg Cole
Principal Investigator
GLORIAD
gcole@gloriad.org
Presentation
During the past year, GLORIAD has been working on a new system for
measuring and monitoring global network infrastructure focused less on "links"
and more on addressing needs of individual users. To accomplish its goal of
actively improving global infrastructure for individual customers, the new system
is designed to:
(1) understand the network needs and requirements of a global customer base
by actively studying utilization; (2) identify poor performance of individual
applications by constantly (and in near-real-time) analyzing information on such
per-flow metrics as load, packet loss, jitter and routing asymmetries; (3) mitigate
poor performance of applications by identifying fabric weaknesses; and (4) build richly
visual analysis applications such as GLORIAD-Earth and the new GloTOP to
help make sense of the enormous volume of data.
To realize this new model of measurement and monitoring (focused less on links
and more on individual customers), GLORIAD has recently moved from its
netflow-based system (used since 1998 and storing approximately 1 million
records per day) to a new, much more detailed system – collecting, storing and
analyzing 200-400 million network utilization records per day – based on
deployment of open-source Argus software (www.qosient.com/argus). The talk
will focus on the benefits and the technical challenges of this new and actively
evolving work.
International infrastructure (circuits) ..
“No GLIF no GLORIAD”
Partners: SURFnet, NORDUnet, CSTnet (China), e-ARENA (Russia), KISTI (Korea), CANARIE (Canada), SingaREN,
ENSTInet (Egypt), Tata Institute of Fundamental Research / Bangalore science community, NLR/Internet2/NASA/FedNets, CERN/LHC
Sponsors: US NSF ($18.5M, 1998-2015), Tata ($6M), USAID ($3.5M, 2011-2013); all international partners (~$240M, 1998-2015)
History: 1994 US-Russia Friends and Partners; 1996 US-Russia Civic Networking; 1997 US-Russia MIRnet; 2004 GLORIAD;
2009 GLORIAD/Taj; 2011 GLORIAD/Africa
Thank you GLORIAD-US Team
(Anita, Harika, Karen, Kim, Naveen, Predrag, Susie)
GLORIAD Metrics
Utilization
Performance
Operations
Security
(“you can’t manage [or improve] what you can’t measure”
– quoting a wise NSF program official)
(“it’s all about ‘situational awareness’ and instrumenting
towards that goal”)
Utilization Monitoring
(in (near) real-time)
“Top Talkers” (example sketch below)
Protocol utilization
Application identification/utilization
Traffic Analysis (DNS, etc.)
Real-time (alerts, etc.)
Historical timeline analysis
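Since the back end is Perl glue around the Argus client tools, here is a minimal sketch of how a “top talkers” report might be pulled from an Argus archive. The archive path, field list and the racluster/rasort pipeline shape are illustrative assumptions, not GLORIAD's exact tooling.

#!/usr/bin/env perl
# Sketch: a "top talkers" report from an Argus archive, aggregating
# flows by source address with racluster and ordering by bytes with
# rasort. Archive path and fields are illustrative assumptions.
use strict;
use warnings;

my $archive = '/argus/archive/today.arg';   # hypothetical archive file

open(my $ra, '-|',
     "racluster -r $archive -m saddr -w - | rasort -m bytes -s saddr bytes pkts")
  or die "cannot run racluster/rasort: $!";

my $count = 0;
while (my $line = <$ra>) {
    print $line;
    last if ++$count == 20;   # cap the report at 20 records
}
close $ra;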
Utilization Monitoring
0"
20,000"
40,000"
60,000"
80,000"
100,000"
120,000"
140,000"
160,000"
180,000"
200,000"
2004)01"
2004)04"
2004)07"
2004)10"
2005)01"
2005)04"
2005)07"
2005)10"
2006)01"
2006)04"
2006)07"
2006)10"
2007)01"
2007)04"
2007)07"
2007)10"
2008)01"
2008)04"
2008)07"
2008)10"
2009)01"
2009)04"
2009)07"
2009)10"
2010)01"
2010)04"
2010)07"
2010)10"
2011)01"
2011)04"
2011)07"
2011)10"
2012)01"
Gigabytes*
GLORIAD*U2liza2on,*2004;2012*
by*Country*Source*of*Traffic*
Other"
Poland"
Sweden"
Romania"
Netherlands"
Brazil"
Italy"
France"
Taiwan"
Norway"
Great"Britain"(UK)"
Hong"Kong"
Singapore"
Germany"
Switzerland"
Russian"FederaOon"
Canada"
Korea"(South)"
China"
United"States"
!(500.0)!
!'!!!!
!500.0!!
!1,000.0!!
!1,500.0!!
!2,000.0!!
!2,500.0!!
!3,000.0!!
!3,500.0!!
!4,000.0!!
!4,500.0!!
1999'06!1999'08!1999'10!1999'12!2000'02!2000'06!2000'08!2000'10!2000'12!2001'02!2001'04!2001'06!2001'08!2001'10!2001'12!2002'02!2002'04!2002'06!2002'10!2002'12!2003'02!2003'04!2003'06!2003'08!2003'10!2003'12!
%"of"Traffic"/"Monthly"
GLORIAD/MirNET"Traffic"1999=2003"
Russian!Federa:on!
United!States!
Germany!
Switzerland!
Taiwan!
Israel!
Great!Britain!(UK)!
Poland!
Netherlands!
France!
Japan!
Sweden!
Other!
Utilization Monitoring
(in (near) real-time)
Thank you
CSTnet and KISTI (special thanks to
Tong, Haina, Chunjing, Jiangning,
Xiaodan, Gang, Lei, Hui, Dongkyun,
Buseung)
Utilization Monitoring
(in (near) real-time)
[Six screenshot slides of the real-time utilization monitoring tools]
Performance Monitoring
(in (near) real-time)
Key theme: we want to address
real performance needs *before*
users have to figure out who in
the world to call about their “bad
connection” - or before they
decide that the “R&E Internet” is
not adequate to their needs -
i.e., proactive performance
mitigation (instead of reactive).
Another theme: we want to
develop tools, technologies and
experience that can be used
throughout the global network
fabric (local, campus, regional,
national, international) - the real
“home” for these tools will
ultimately be the local network
operators who live closest to the
customers.
New GloTop Application
New dvNOC Application
dvNOC System
Joint effort by US, China, Korea, Nordic teams (and,
now, new GLORIAD/Taj partners)
Based on solid measurement infrastructure,
information management and information sharing
Fueled by the open-source Argus system of flow
monitoring (5 second updates on all flows, 200-400
million flow-records/day; handles multi-G flow rates
with room to spare; see the feed sketch below)
Focused on (1) understanding utilization, (2)
improving performance systemically, (3) ensuring
appropriate use, (4) distributing (decentralizing)
operations and management of R&E networks
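As context for how a dvNOC-style tool might consume those 5-second flow updates, a minimal Perl sketch that tails a live ra feed from a radium collector and flags large flows with packet loss. The host name, thresholds and field list are assumptions for illustration.

#!/usr/bin/env perl
# Sketch: tail a live Argus feed and flag large, lossy flows -- the kind
# of near-real-time signal a dvNOC-style tool consumes.
# Host, thresholds and fields are illustrative assumptions.
use strict;
use warnings;

open(my $flows, '-|',
     'ra -S radium.example.org:561 -s stime saddr daddr bytes ploss - tcp')
  or die "cannot run ra: $!";

while (<$flows>) {
    my ($stime, $saddr, $daddr, $bytes, $ploss) = split;
    next unless defined $ploss and $ploss =~ /^[\d.]+$/;  # skip headers
    next unless $bytes > 1_000_000 and $ploss > 1;  # big flow, >1% loss
    print "poor performer: $saddr -> $daddr ($bytes bytes, $ploss% loss)\n";
}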
Former Metrics Data Source
“Taj” Measurement/Monitoring Update
Picture of GLORIAD/Taj's new “nprobe” network
measurement device. Hardware: Dell PowerEdge R410
server - 8-core Intel processor, 10GE Intel fiber card (ixgbe
driver). A network utilization and performance measurement
box designed to run at 10G line speed; the project improves and
extends the open-source nprobe netflow emitter software to emit
extended netflow records, including detailed information on packet
retransmissions. Software base: Luca Deri’s nprobe.
The two screenshots above illustrate data generated from the Taj project’s new “nprobe” boxes deployed in Chicago and Seattle. The first
illustrates top flows on the network; the second illustrates large flows suffering from poor performance (i.e., high packet retransmits). This
data was formerly generated from GLORIAD’s packeteer system (limited to 1 Gbps circuit capacity).
2012 Transition to Argus
http://www.qosient.com/argus/
We use Luca Deri’s pf_ring underneath
(and we’re also exploring FreeBSD’s netmap)
Argus
Flexible open-source software packet sensors to
generate network flow records at line rate, for
operations, performance and security.
Comprehensive, not statistical, bi-directional,
with many flow models allowing you to track any
network traffic, not just 5-tuple IP traffic.
Support for large scale collection, data
processing, storage and archiving, sharing,
visualization, with analytics, aggregation,
geospatial, netspatial analysis.
Argus
(author: Carter Bullard)
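To make the sensor/collector model above concrete, a minimal sketch of starting an argus sensor and attaching a reader; the interface name is an assumption, and 561 is the conventional Argus service port.

#!/usr/bin/env perl
# Sketch: start an argus sensor on a monitored interface and read its
# flow records locally. Interface and port are illustrative; opening
# the interface typically requires root privileges.
use strict;
use warnings;

# Start the sensor as a daemon serving records on port 561
# (-i: interface, -P: access port, -d: daemonize).
system('argus', '-i', 'eth0', '-P', '561', '-d') == 0
    or die "argus failed to start: $?";

# Attach a reader; -S names the argus (or radium) source to read from.
open(my $ra, '-|', 'ra -S localhost:561 -s stime saddr daddr dport bytes')
    or die "cannot run ra: $!";
print while <$ra>;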
Current GLORIAD-US Deployment of Argus
KNOXVILLE RADIUM SERVER
Apple Xserve -
1) Processors - 2 x 2.93GHz Quad-
Core Intel Xeon
2) Memory - 24GB (6x4GB)
3) Hard drive - 1TB Serial ATA
4) OS - OS X 10.8
Argus Analysis Tools
(running on various (mostly Apple) machines)
SEATTLE ARGUS NODE
DELL R410 servers  -  
1) Processors - 2 x Intel Xeon
X5670, 2.93GHz (quad core)
2) Memory - 8 GB (4 x 2GB)
UDIMMs
3) Hard drive - 500GB SAS
4) Intel 82599EB 10G NIC
5) OS - CentOS 6
6) modified for PF_RING
7) running argus daemon sending
data to radium server in Knoxville
Seattle Force-10 Router
10G SPAN port
CHICAGO ARGUS NODE
DELL R410 servers  -  
1) Processors - 2 x Intel Xeon
X5670, 2.93GHz (quad core)
2) Memory - 8 GB (4 x 2GB)
UDIMMs
3) Hard drive - 500GB SAS
4) Intel 82599EB 10G NIC
5) OS - CentOS 6
6) modified for PF_RING
7) running argus daemon sending
data to radium server in Knoxville
Chicago Force-10 Router
10G SPAN port
Near-future GLORIAD-US Deployment of Argus
SEATTLE ARGUS NODE
DELL R410 servers  -  
1) Processors - 2 x Intel Xeon
X5670, 2.93GHz (quad core)
2) Memory - 8 GB (4 x 2GB)
UDIMMs
3) Hard drive - 500GB SAS
4) Intel 82599EB 10G NIC
5) OS - CentOS 6
6) modified for PF_RING
7) running argus daemon sending
data to radium server in Knoxville
Seattle Force-10 Router
10G SPAN port
CHICAGO ARGUS NODE
DELL R410 servers  -  
1) Processors - 2 x Intel Xeon
X5670, 2.93GHz (quad core)
2) Memory - 8 GB (4 x 2GB)
UDIMMs
3) Hard drive - 500GB SAS
4) Intel 82599EB 10G NIC
5) OS - CentOS 6
6) modified for PF_RING
7) running argus daemon sending
data to radium server in Knoxville
Chicago Force-10 Router
10G SPAN port
[Both 10G SPAN ports crossed out - use taps instead]
• Local Storage
• Local Analysis Hardware
• Ability to handle much more capacity
• Local Storage
• Local Analysis Hardware
• Ability to handle much more capacity
KNOXVILLE RADIUM SERVER
Apple Xserve -
1) Processors - 2 x 2.93GHz Quad-
Core Intel Xeon
2) Memory - 24GB (6x4GB)
3) Hard drive - 1TB Serial ATA
4) OS - OS X 10.8
Argus Analysis Tools
(running on various (mostly Apple) machines)
Big Farm of Cisco-provided
Blade Servers
Fast Analysis
Parallel Database Architecture
Why all this power?
• Preparing the data for this
graph from 250G argus archive
(which helped a large
international R&E network
systemically address a huge
performance problem) took me
3 days with our current setup
• We want any of our partners
to be able to do this in 3 minutes
(or less)
• We want “room” to better
research the area of
performance, operations and
security analytics with our
international partners
But we’re still designing for lesser
needs as well (targeting single 1G and
10G networks)
(Linux / Mac OS X / FreeBSD)
Current Process
The Chicago Argus Node and Seattle Argus Node each send a ~3 Mbps stream of flow records to the Knoxville Radium Server (chained radium servers), which feeds:
• a racluster process → mysql archive of top users (10 seconds) → live apps (glo-earth, glo-top, dvnoc)
• a rastream process (5 minutes) → disk archive (~300 million recs / day) and mySQL archive (1.8 million recs / day) → analysis applications
• additional “ad hoc” processing
(a sketch of the rastream branch follows below)
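A hedged sketch of the rastream branch of this pipeline: split the radium feed into 5-minute files and invoke a handler as each file closes. The host, archive path and the post-rotation script are illustrative assumptions; the -B hold-down buffer lets late records land in the right bin.

#!/usr/bin/env perl
# Sketch of the 5-minute archive branch: rastream rotates output every
# 5 minutes and calls a handler when each file closes.
# Host, paths and buffer length are illustrative.
use strict;
use warnings;

my $cmd = 'rastream -S radium.example.org:561 -M time 5m -B 15s '
        . '-w /argus/archive/%Y/%m/%d/argus.%H.%M.%S '   # strftime path
        . '-f /usr/local/bin/process_5min.sh';           # hypothetical hook
exec($cmd) or die "cannot exec rastream: $!";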
New Process
(pipeline diagram repeated from the previous slide)
New Process
(Dec/2012-Jan/2013)
Argus Nodes (for GLORIAD currently, Chicago and Seattle)
→ Argus Data (from Argus Nodes to a Core Radium Collector)
→ 32 core Cisco Blade Server (FreeBSD) with 128G RAM, 5T RAID storage, running a “farm” of Perl/POE/IKC Daemons for Near-Realtime Analytics and Local Storage of Data (“Top Users”, DNS Analysis, Bad Performers, Link Analytics, BGP Analysis, ICMP Analysis, Scan Analysis, ...)
→ User Tools for Analysis, Operational Support and Visualization (dvNOC, GloTOP, GLOEarth, Ticketing System, NOC Access, ...)
More detail ..
“Farm” of Perl/POE/IKC Daemons - Near-Realtime Analytics and Local Storage of Data
(“Top Users”, DNS Analysis, Bad Performers, Link Analytics, BGP Analysis, ICMP Analysis, Scan Analysis, ...)
User Tools for Analysis and Visualization
(dvNOC, GloTOP, GLOEarth, Web Reports, NOC Access, ...)
• Built with Runrev LiveCode
• Multi-platform (Mac, Windows, Linux, iOS, Android)
• Event-driven, graphic/media-rich applications
• Perl POE event-loop, event-driven programming for “cooperative multi-tasking”
• IKC for inter-kernel communications between “animals”
• Daemonized (fast)
• Use MySQL (or any other) for long-term storage; SQLite for local (fast) in-memory database
• Each “animal” on the “farm” is autonomous and very specialized
• Most read from a single argus RABINS stream (a minimal daemon sketch follows below)
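As referenced above, a minimal sketch of one such “animal”: a POE session that follows a rabins stream (here 5-second bins keyed by source address) and accumulates in-memory counters. The radium host, bin size and fields are assumptions; the IKC wiring between animals is omitted for brevity.

#!/usr/bin/env perl
# Sketch of one "animal": a daemonizable POE session that follows a
# live Argus rabins stream and keeps per-source byte counters in memory.
# Host, bin size and field list are illustrative assumptions.
use strict;
use warnings;
use POE qw(Wheel::Run);

POE::Session->create(
    inline_states => {
        _start => sub {
            my ($kernel, $heap) = @_[KERNEL, HEAP];
            # rabins aggregates the live feed into 5-second bins by saddr
            $heap->{ra} = POE::Wheel::Run->new(
                Program     => 'rabins -S radium.example.org:561 '
                             . '-M time 5s -m saddr -s saddr bytes',
                StdoutEvent => 'got_record',
            );
            $kernel->sig_child($heap->{ra}->PID, 'reaped');
        },
        got_record => sub {
            my ($heap, $line) = @_[HEAP, ARG0];
            my ($saddr, $bytes) = split ' ', $line;
            $heap->{bytes_by_src}{$saddr} += $bytes
                if defined $bytes and $bytes =~ /^\d+$/;
        },
        reaped => sub { delete $_[HEAP]{ra} },   # child exited
    },
);

POE::Kernel->run;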
All of the software, tools,
data specifications, etc. are being
“Github’d”
(right thing to do (argus, perl, mysql,
sqlite are all open)
and
we want people to help us ..)
GLORIAD github
“Operationalizing”
this Data
New dvNOC Application
“REQUEST TRACKER”
FED BY DATA FROM MONITORING SYSTEMS
Poor-Performance Analysis
[Screenshot: under-performing flow from source x.x.3.226 to destination x.x.244.210, seen at Chicago]
Active monitoring
system - My TraceRoute (MTR)
Harika Tandra, GLORIAD
Active monitoring system
• For each under-performing flow identified, MTR runs are
triggered to source and destination IPs
• Triggered in near-real-time upon detection of the flow; thus, test
packets are sent under network conditions similar to those
seen by the real traffic
• Combining the two (runs to source and to destination) gives approximate end-to-end performance (a trigger sketch follows below)
Harika Tandra, GLORIAD
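A minimal sketch of the trigger logic described above, assuming the flagged flow's endpoints arrive from the flow-analysis daemon; mtr's --report batch mode with a fixed cycle count stands in for the production probing policy.

#!/usr/bin/env perl
# Sketch: when a flow is flagged as under-performing, launch mtr report
# runs toward both endpoints to capture path conditions close in time
# to the event. Cycle count and endpoint values are illustrative.
use strict;
use warnings;

sub probe_path {
    my ($target) = @_;
    # --report: batch mode; -n: numeric output; -c: probe cycles
    return scalar `mtr --report -n -c 10 $target 2>/dev/null`;
}

# $src/$dst would come from the flow-analysis daemon (hypothetical values)
my ($src, $dst) = ('x.x.3.226', 'x.x.244.210');
for my $host ($src, $dst) {
    print "=== mtr to $host ===\n", probe_path($host);
}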
Example network graphs for a
few end hosts in U.S.
Harika Tandra, GLORIAD
Example network graphs for a
few end hosts in China
Representation :
• Graph node - router in paths discovered by MTR.
• Rect. node - the end host.
• Node label -
• 1st line - value of cost function
• 2nd line - IP (anonymized)
• 3rd line - avg. % packet loss at the node.
• Color map ranges from yellow through orange to red;
• this graph is color-mapped based on the ‘avg.
% packet loss’ value.
• Edge labels: ‘A-B’, where
• A => total number of mtr runs through the
parent-to-child node.
• B => number of runs in which there was non-
zero packet loss.
• Gray nodes are nodes which saw no packet loss.
Harika Tandra, GLORIAD
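For illustration, a small sketch that emits a Graphviz DOT file following the representation above (loss-colored nodes, ‘A-B’ edge labels). The input hashes are hypothetical stand-ins for data gathered from the MTR runs; render with `dot -Tpng`.

#!/usr/bin/env perl
# Sketch: emit a Graphviz DOT graph of MTR-discovered paths, coloring
# router nodes by average % packet loss. Input data is hypothetical.
use strict;
use warnings;

# %edges: "parent-child" => [ total mtr runs, runs with non-zero loss ]
# %loss : node => average % packet loss observed at that node
my %edges = ('10.0.0.1-10.0.0.2' => [42, 5]);
my %loss  = ('10.0.0.1' => 0, '10.0.0.2' => 3.2);

print "digraph paths {\n";
for my $node (sort keys %loss) {
    # gray for loss-free nodes; yellow -> orange -> red as loss grows
    my $color = $loss{$node} == 0 ? 'gray'
              : $loss{$node} < 2  ? 'yellow'
              : $loss{$node} < 5  ? 'orange'
              :                     'red';
    printf qq{  "%s" [style=filled fillcolor=%s label="%s\\n%.1f%%"];\n},
           $node, $color, $node, $loss{$node};
}
for my $pair (sort keys %edges) {
    my ($a, $b) = split /-/, $pair;
    my ($runs, $lossy) = @{ $edges{$pair} };
    printf qq{  "%s" -> "%s" [label="%d-%d"];\n}, $a, $b, $runs, $lossy;
}
print "}\n";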
Data Model
Use MySQL (partly for the benefit of its fast MyISAM and HEAP (in-memory) tables),
but also now using SQLite for local autonomous analysis engines (very
fast, especially with :memory: tables); a storage sketch follows at the end of this section
Use BerkeleyDB for some things (tying perl hashes to disk data stores)
Two large databases
pflow - primary IP addresses, AS numbers, domains, all large flows (1998 -
current (~1.4 billion records)), support tables (IP mapping tables,
country codes, world regions, science disciplines, protocols, services, etc.)
summary - various tables to enable fast search/retrieval of flow
information
Experimenting with Argus rasql tools (powerful)
Using Argus ralabel (with geoip) for live labeling of all flow updates with
country codes, asnums, lat/long, etc.
Looking at hadoop, others for parallel capabilities
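As referenced above, a minimal sketch of the two-tier storage idea: per-record updates into a SQLite :memory: table, periodically rolled up into a MySQL archive. Table names, columns and credentials are illustrative, not GLORIAD's pflow/summary schema; requires DBD::SQLite and DBD::mysql.

#!/usr/bin/env perl
# Sketch: fast in-memory SQLite table for one analysis engine, flushed
# periodically to a long-term MySQL archive. Schema is hypothetical.
use strict;
use warnings;
use DBI;

# Fast local store for one autonomous analysis engine
my $lite = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                        { RaiseError => 1 });
$lite->do(q{CREATE TABLE flows (saddr TEXT, daddr TEXT, bytes INTEGER)});

my $ins = $lite->prepare(q{INSERT INTO flows VALUES (?, ?, ?)});
$ins->execute('10.0.0.1', '10.0.0.2', 123456);   # per-record updates

# Periodic flush to the long-term MySQL archive (credentials illustrative)
my $mysql = DBI->connect('dbi:mysql:database=pflow;host=db.example.org',
                         'user', 'secret', { RaiseError => 1 });
my $rows = $lite->selectall_arrayref(
    q{SELECT saddr, daddr, SUM(bytes) FROM flows GROUP BY saddr, daddr});
my $up = $mysql->prepare(
    q{INSERT INTO flow_summary (saddr, daddr, bytes) VALUES (?, ?, ?)});
$up->execute(@$_) for @$rows;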
Key Database Management Tables
Summary: Core Technologies
Argus as passive monitor (formerly Packeteer
and then nprobe), running on top of pf_ring (or
FreeBSD’s netmap)
MTR as active monitor
MySQL as underlying database (exploring
alternatives now) along with SQLite and
BerkeleyDB
RunRev’s LiveCode for front-end client
development (we formerly used Flash)
(someday this should be html5 apps (?))
Perl/POE/IKC for back-end “cooperative
multitasking” server
Summary
Work builds on efforts since 1999
Argus offers a *lot* of advantages over
netflow or sflow
Data management problem *is* solvable
We hope to encourage an open, global
community effort to deploy common
standards and tools addressing metrics for
R&E network performance, operations and
security
