Visualization Lifecycle

datainsight San Francisco 2011
Raffael Marty
“Transform a dataset into a captivating story.”



     ‣ Assess
     ‣ Parse
     ‣ Clean
     ‣ Visualize

[Diagram labels: "You're on your own", "Art", "Visualization Tools and Libraries"]

pixlcloud | collect. visualize. understand.                                         Copyright (c) 2011
Audience
[Figure: audience map positioned along Beginner/Expert, Technical/Overview, and Boring/Fun]

Visualization Process
[Figure: Data Sources (files, database) -> (Data Store) -> parsing -> Structured Data (filtering, aggregation, cleansing) -> feature selection, visualization -> Visual Representation. Contextual Data feeds into the pipeline, and the process runs in iterations.]


Data Sources
     ‣ File: XML, JSON, CSV, TSV
     ‣ Database: mysql -u root -p mydatabase < dump.sql
     ‣ API: curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
        ‣ Factual
        ‣ Freebase
        ‣ Infochimps
        ‣ OpenStreetMap
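Once a file has been pulled from one of these sources, turning it into records usually takes only a few lines. The sketch below is an illustration in Python (the slides themselves use shell tools) that reads CSV, TSV, and JSON with the standard library only; XML would need a parser such as `xml.etree.ElementTree` and is left out. The function name `read_records` is hypothetical.

```python
import csv
import io
import json

def read_records(text, fmt):
    """Return a list of dicts from CSV, TSV, or JSON text (hypothetical helper)."""
    if fmt == "json":
        data = json.loads(text)
        # A single JSON object is wrapped so callers always get a list.
        return data if isinstance(data, list) else [data]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        # The first line is treated as the header row.
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    raise ValueError(f"unsupported format: {fmt}")

print(read_records("src,dst\n10.0.0.1,10.0.0.2\n", "csv"))
print(read_records('[{"src": "10.0.0.1"}]', "json"))
```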




Explore Data
     ‣ What is the data about?
     ‣ What are the data features/columns?
     ‣ Is there a common structure in the data?
     ‣ What are the data types?

        Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0

        May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
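A quick exploratory parse helps answer the questions above for these two sample lines: most features are KEY=VALUE pairs, plus a few bare flags such as DROPPED, DF, and SYN. The Python sketch below assumes that the syslog header ends at " kernel: " and that keys are upper-case, which holds for both samples but is not guaranteed for other log sources; the names `explore` and `KV` are hypothetical.

```python
import re

# KEY=VALUE tokens as they appear in the iptables-style lines above;
# an empty value such as "OUT=" is kept as an empty string.
KV = re.compile(r"\b([A-Z]+)=(\S*)")

def explore(line):
    """Split one kernel log line into syslog header, KEY=VALUE fields, and bare flags."""
    # Assumption: the syslog header ends at " kernel: ", as in both samples.
    header, _, body = line.partition(" kernel: ")
    fields = dict(KV.findall(body))
    flags = [token for token in body.split() if "=" not in token]
    return header, fields, flags

sample = (
    "Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
    "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
    "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP "
    "SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0"
)
header, fields, flags = explore(sample)
print(header)
print(sorted(fields))
print(flags)
```

Listing the keys this way shows at a glance which columns exist and which ones (like OUT) are often empty.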



Parsing and Normalization
     ‣ Parsing
        ‣ extraction of entities / features
        ‣ imposing structure
        ‣ often uses regexes
     ‣ Normalize
        ‣ field normalization
        ‣ term normalization: block, deny, dropped
     ‣ Generate a common output format for visualization tools (e.g., CSV)

Three firewall log entries for the same connection:
        Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
        Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
        Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
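Field and term normalization can be sketched as two lookup tables applied to whatever a per-vendor parser extracted. In this Python sketch, the vendor-side keys in `samples` (`verdict`, `prefix`, `sport`, and so on) and the common column names are hypothetical; only the values are taken by hand from the three lines above (the first line shows no destination port, so that column stays empty).

```python
import csv
import io

# Term normalization: vendor-specific verbs collapse to one action value.
ACTION_TERMS = {"block": "block", "deny": "block", "dropped": "block", "pass": "pass"}

# Field normalization: hypothetical vendor field names -> common column names.
FIELD_NAMES = {
    "action": "action", "verdict": "action", "prefix": "action",
    "src": "src_ip", "SRC": "src_ip",
    "sport": "src_port", "SPT": "src_port",
    "dport": "dst_port", "DPT": "dst_port",
}
COLUMNS = ["action", "src_ip", "src_port", "dst_port"]

def normalize(record):
    """Rename fields to common columns and map action terms to one vocabulary."""
    out = {}
    for key, value in record.items():
        column = FIELD_NAMES.get(key)
        if column is None:
            continue  # drop fields that have no common column
        if column == "action":
            value = ACTION_TERMS.get(value.lower(), "unknown")
        out[column] = value
    return out

def to_csv(records):
    """Write normalized records as CSV; missing columns become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(normalize(record))
    return buf.getvalue()

# Values hand-extracted from the three sample lines above.
samples = [
    {"action": "block", "src": "212.251.89.126", "sport": "3859"},
    {"verdict": "Deny", "src": "212.251.89.126", "sport": "3859", "dport": "135"},
    {"prefix": "DROPPED", "SRC": "212.251.89.126", "SPT": "3859", "DPT": "135"},
]
print(to_csv(samples), end="")
```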

Parser
Raw:
        Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
        Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
        Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)

Regex / Parser:
        (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)

Normalized (CSV):
        Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
        Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
        Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
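The slide's regex can be run as-is to turn the raw lines into the CSV rows shown. A minimal Python sketch using the standard `re` and `csv` modules; `PF_LINE` and `parse_to_csv` are hypothetical names, and lines that do not match are counted rather than silently lost, since other log formats need their own regex.

```python
import csv
import io
import re

# The regex from the slide, with its backslash escapes in place.
PF_LINE = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)

def parse_to_csv(lines):
    """Apply PF_LINE to each raw line; return (CSV text, number of skipped lines)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    skipped = 0
    for line in lines:
        match = PF_LINE.match(line)
        if match is None:
            skipped += 1  # a different log format would need its own regex
            continue
        writer.writerow(match.groups())  # one CSV column per capture group
    return buf.getvalue(), skipped

raw = [
    "Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)",
    "not a pf log line",
]
csv_text, skipped = parse_to_csv(raw)
print(csv_text, end="")
print(skipped)
```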




UNIX Tools
     ‣ grep
        ‣ grep -v "foo" file
     ‣ awk
        ‣ awk -F, '{printf("%s,%s\n", $2, $1)}'
        ‣ awk -F, -v OFS=, '{print $2,$1}'
     ‣ sed
        ‣ sed -e 's/fubar/foobar/g' filename
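For readers who prefer a script over a pipeline, here is a rough Python analogue of `grep -v foo | awk -F, -v OFS=, '{print $2,$1}'`. It is a sketch, not an exact equivalent: the filter is a fixed-string match (like `grep -F`, not a regex), and the `csv` module honours quoted fields, which awk's plain comma splitting does not. The function name `grep_v_swap` is hypothetical.

```python
import csv
import io

def grep_v_swap(text, pattern):
    """Drop lines containing pattern, then swap the first two comma-separated columns."""
    kept = [line for line in text.splitlines() if pattern not in line]  # like grep -v (fixed string)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(kept):  # like awk -F, but quote-aware
        first = row[0] if row else ""
        second = row[1] if len(row) > 1 else ""  # awk prints an empty $2 when it is missing
        writer.writerow([second, first])  # like print $2,$1 with OFS=,
    return out.getvalue()

print(grep_v_swap("a,b\nfoo,x\nc,d\n", "foo"), end="")
```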




Regular Expression Resources
     ‣   http://regexlib.com
     ‣   http://www.regular-expressions.info
     ‣   http://gskinner.com/RegExr




Data Cleansing
     ‣ Filter
     ‣ Normalize (see Parsing and Normalization)
     ‣ Aggregation
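Filtering and aggregation often come down to "keep the rows that matter, then count them per key". A minimal Python sketch with `collections.Counter`; the record layout (`action`, `src_ip`) and the sample records are hypothetical.

```python
from collections import Counter

def filter_and_aggregate(records, action="block", key="src_ip"):
    """Filter records to one action, then aggregate them into counts per key value."""
    kept = (r for r in records if r.get("action") == action)  # filter step
    return Counter(r[key] for r in kept if key in r)            # aggregation step

# Hypothetical normalized records.
records = [
    {"action": "block", "src_ip": "10.0.0.1"},
    {"action": "block", "src_ip": "10.0.0.1"},
    {"action": "pass", "src_ip": "10.0.0.3"},
    {"action": "block", "src_ip": "10.0.0.2"},
]
print(filter_and_aggregate(records).most_common())
```

Aggregating before visualizing keeps the number of plotted items manageable, at the cost of losing per-event detail.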



Load CSV into Database
    # mysql -u <user> -p
    mysql> create database data;
    mysql> use data;
    mysql> create table set1 (id int, address varchar(20), ...);
    mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1
           FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

Sometimes you just load your data into a tool, and you can omit this step.
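The same load step can be tried without a MySQL server. The sketch below swaps in SQLite through Python's built-in `sqlite3` module (a different database, chosen only because it needs no setup); it keeps only the two columns named on the slide rather than guessing the elided ones, and `load_csv` is a hypothetical helper.

```python
import csv
import io
import sqlite3

def load_csv(conn, csv_text):
    """Create table set1 if needed and bulk-insert two-column CSV rows into it."""
    # Only the two columns named on the slide; the elided "..." columns are not guessed.
    conn.execute("CREATE TABLE IF NOT EXISTS set1 (id INTEGER, address VARCHAR(20))")
    rows = csv.reader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for illustration
load_csv(conn, "1,10.0.0.1\n2,10.0.0.2\n")
print(conn.execute("SELECT COUNT(*) FROM set1").fetchone()[0])
```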



Contextual Data
     ‣ Either dump it into a database or query it via API calls to augment your data
     ‣ IP -> geo mapping
     ‣ Information about countries
     ‣ Port number -> service name
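The port-to-service mapping is a simple example of augmenting records with context. This Python sketch checks a small local table first and then falls back to `socket.getservbyport`, which consults the operating system's services database and raises `OSError` for unknown ports. The table contents and the names `KNOWN_SERVICES`, `service_name`, and `add_service` are hypothetical.

```python
import socket

# Hypothetical local lookup table; a larger one could live in the database instead.
KNOWN_SERVICES = {53: "domain", 80: "http", 135: "epmap", 443: "https"}

def service_name(port, proto="tcp"):
    """Map a port number to a service name, or "unknown" if none is found."""
    if port in KNOWN_SERVICES:
        return KNOWN_SERVICES[port]
    try:
        return socket.getservbyport(port, proto)  # OS services database
    except OSError:
        return "unknown"

def add_service(record):
    """Return a copy of the record with a dst_service field added."""
    augmented = dict(record)
    augmented["dst_service"] = service_name(int(record["dst_port"]))
    return augmented

print(add_service({"dst_ip": "212.254.110.98", "dst_port": "135"}))
```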


Feature Selection
     ‣ What are the fields you are interested in?
     ‣ Compute new fields
        ‣ start time, end time -> duration
        ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
        ‣ Entropy: $H(X) = \mathrm{E}[I(X)] = -\sum_{x} p(x) \log p(x)$
     ‣ Dimensionality reduction
        ‣ See Bryan's talk!
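The three computed fields listed above can each be derived in a line or two. This Python sketch uses `datetime` for the duration (assuming both timestamps fall on the same day), `ipaddress.ip_network(..., strict=False)` for the subnet, and base-2 logarithms so entropy comes out in bits; the function names are hypothetical.

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime

def duration_seconds(start, end, fmt="%H:%M:%S"):
    """New field: end time minus start time, in seconds (same-day timestamps only)."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

def subnet(ip, prefix_len):
    """New field: the network an address belongs to, e.g. 10.2.4.2 -> 10.0.0.0/8."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

def entropy(values):
    """Shannon entropy H(X) = -sum p(x) log2 p(x) of the observed values, in bits."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(duration_seconds("09:14:46", "09:15:01"))        # 15.0
print(subnet("10.2.4.2", 8), subnet("192.168.1.2", 24))  # 10.0.0.0/8 192.168.1.0/24
print(entropy(["80", "80", "443", "443"]))              # 1.0
```

A column whose entropy is near zero carries little information and is often a candidate to drop.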




Choose Your Poison




Ode to the Pie




A Good Visual
     ‣ Choose the right graph
     ‣ Reduce non-data ink
     ‣ Simultaneous views
     ‣ Interactivity




Visual Transformations
     ‣ keep iterating on visual transformations; change the
        ‣ color
        ‣ shape
        ‣ features displayed
     ‣ add new fields?
     ‣ add more context?
     ‣ is the output expressive?
     ‣ capture the output and prettify it for presentation

Data Visualization Tools and Libraries
Tools and Libraries
     ‣ http://datainsightsf.com/resources/
        ‣ Choose what's appropriate!
     ‣ Data Analysis and Visualization Linux (DAVIX)
        ‣ davix.secviz.org
     ‣ GraphViz
        ‣ graphviz.org
     ‣ AfterGlow (CSV -> DOT)
        ‣ afterglow.sf.net
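The CSV-to-DOT step that AfterGlow performs can be illustrated in a few lines. This is a minimal Python sketch of the idea, not AfterGlow's implementation or options: each `source,target` row becomes one directed edge, duplicate edges are collapsed, and the result can be rendered with GraphViz (for example `dot -Tpng`). The function names are hypothetical.

```python
import csv
import io

def dot_id(name):
    """Quote a node name for DOT; inside quoted strings DOT treats \\" as the only escape."""
    return '"' + name.replace('"', '\\"') + '"'

def csv_to_dot(csv_text):
    """Turn source,target CSV rows into a directed DOT graph, one edge per distinct pair."""
    edges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:
            edges.append((row[0], row[1]))
    lines = ["digraph G {"]
    for source, target in dict.fromkeys(edges):  # de-duplicates while keeping order
        lines.append(f"  {dot_id(source)} -> {dot_id(target)};")
    lines.append("}")
    return "\n".join(lines)

print(csv_to_dot("10.0.0.1,10.0.0.2\n10.0.0.1,10.0.0.2\n10.0.0.3,10.0.0.2\n"))
```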


Libraries
     ‣ Reporting Libraries
        ‣ HighCharts
        ‣ Flot
        ‣ Google Chart API
        ‣ Open Flash Chart
        ‣ jQuery Sparklines
        ‣ Polymaps
     ‣ Visualization Libraries
        ‣ TheJIT
        ‣ Graphael
        ‣ Protovis
        ‣ ProcessingJS
        ‣ Flare
        ‣ D3
HighCharts
     ‣ Click-through
     ‣ On load
        ‣ near real-time updates
     ‣ Zoom
     www.highcharts.com

Google Visualization API
     http://code.google.com/apis/visualization/interactive_charts.html
     ‣ JavaScript
     ‣ Based on DataTable objects
     ‣ Many chart types
     ‣ Playground
        ‣ http://code.google.com/apis/ajax/playground
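Data prepared in earlier steps has to reach the browser in a shape the charts accept. As far as we understand the Google Visualization API, a DataTable can be constructed from a JavaScript object literal with `cols` (each with an id, label, and type such as "string" or "number") and `rows` (each holding a `c` array of `{v: value}` cells). The Python sketch below emits that shape as JSON under this assumption; the helper name and sample data are hypothetical, so check the current API reference before relying on it.

```python
import json

def to_datatable_json(columns, rows):
    """columns: list of (label, type) pairs; rows: list of value tuples (hypothetical helper)."""
    table = {
        "cols": [
            {"id": f"c{i}", "label": label, "type": kind}
            for i, (label, kind) in enumerate(columns)
        ],
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    }
    return json.dumps(table)

print(to_datatable_json([("Source IP", "string"), ("Blocked", "number")],
                        [("10.0.0.1", 12), ("10.0.0.2", 3)]))
```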

ProtoVis
     ‣ JavaScript-based visualization library
     ‣ Charting
     ‣ Treemaps
     ‣ Box plots
     ‣ Parallel coordinates
     ‣ etc.
     http://vis.stanford.edu/protovis/

TheJIT (http://thejit.org/)

     ‣ JavaScript InfoVis Toolkit
     ‣ Interactive
     ‣ Link graphs




Processing
     ‣ Visualization library
     ‣ Java-based
     ‣ Interactive (event handling)
     ‣ A number of libraries to
        ‣ draw in OpenGL
        ‣ read XML files
     ‣ Processing.js
        ‣ JavaScript
        ‣ HTML5 Canvas
        ‣ WebGL
        ‣ Web IDE
     http://processing.org/
     http://processingjs.org/

Visualization Tools
     ‣ Gephi
     ‣ R
     ‣ Matlab
     ‣ Mondrian
     ‣ PicViz
     ‣ Treemap 4.1
     ‣ Google Earth

Gephi (http://gephi.org)


     ‣ reads: CSV, DOT, etc.
     ‣ graph analysis algorithms
     ‣ highly interactive




PicViz




                                                   http://www.wallinfire.net/picviz/

Treemap 4.1




                                                    http://www.cs.umd.edu/hcil/treemap/

Google Earth
     ‣ KML data format for encoding data
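Google Earth reads KML, an XML format in which each Placemark carries a name and a Point whose coordinates are written as longitude,latitude[,altitude]. A minimal Python sketch using the standard library's ElementTree; the function name `to_kml` and the sample point are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def to_kml(points):
    """points: iterable of (name, latitude, longitude) tuples -> KML document string."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML orders coordinates as longitude,latitude[,altitude].
        ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")

print(to_kml([("Example", 37.7749, -122.4194)]))
```

Swapping latitude and longitude is a common mistake; it silently moves points to the wrong place on the globe.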




pixlcloud (buy now)
collect. visualize. understand.

@raffaelmarty

Contenu connexe

Tendances

Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
Bhavendra Chavan
 
Natural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonNatural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry Hamon
Grammarly
 

Tendances (20)

Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
Big data analytics in healthcare industry
Big data analytics in healthcare industryBig data analytics in healthcare industry
Big data analytics in healthcare industry
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Natural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonNatural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry Hamon
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Privacy Engineering
Privacy EngineeringPrivacy Engineering
Privacy Engineering
 
The 2012 Industry Digitization Index
The 2012 Industry Digitization IndexThe 2012 Industry Digitization Index
The 2012 Industry Digitization Index
 
Data Observability.pptx
Data Observability.pptxData Observability.pptx
Data Observability.pptx
 
Healthcare in the Metaverse.pdf
Healthcare in the Metaverse.pdfHealthcare in the Metaverse.pdf
Healthcare in the Metaverse.pdf
 
Airline Analysis of Data Using Hadoop
Airline Analysis of Data Using HadoopAirline Analysis of Data Using Hadoop
Airline Analysis of Data Using Hadoop
 
Importance of data analytics for business
Importance of data analytics for businessImportance of data analytics for business
Importance of data analytics for business
 
Getting Started With Digitisation
Getting Started With DigitisationGetting Started With Digitisation
Getting Started With Digitisation
 
PPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikem
PPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikemPPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikem
PPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikem
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Data Monetization Framework
Data Monetization FrameworkData Monetization Framework
Data Monetization Framework
 
Big data project management
Big data project managementBig data project management
Big data project management
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

En vedette

En vedette (6)

Analytic Journeys from Predictive Analytics World
Analytic Journeys from Predictive Analytics WorldAnalytic Journeys from Predictive Analytics World
Analytic Journeys from Predictive Analytics World
 
Cyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock InsightCyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock Insight
 
AfterGlow
AfterGlowAfterGlow
AfterGlow
 
Security Insights at Scale
Security Insights at ScaleSecurity Insights at Scale
Security Insights at Scale
 
Workshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityWorkshop: Big Data Visualization for Security
Workshop: Big Data Visualization for Security
 
Gephi Quick Start
Gephi Quick StartGephi Quick Start
Gephi Quick Start
 

Similaire à Visualization Lifecycle

breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redacted
Ryan Breed
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Spark Summit
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
mjfrankli
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
DataWorks Summit
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Databricks
 

Similaire à Visualization Lifecycle (20)

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redacted
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 
Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail Files
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
 
Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17
 
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 

Plus de Raffael Marty

AI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are DangerousAI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are Dangerous
Raffael Marty
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
Raffael Marty
 

Plus de Raffael Marty (20)

Exploring the Defender's Advantage
Exploring the Defender's AdvantageExploring the Defender's Advantage
Exploring the Defender's Advantage
 
Extended Detection and Response (XDR) An Overhyped Product Category With Ulti...
Extended Detection and Response (XDR)An Overhyped Product Category With Ulti...Extended Detection and Response (XDR)An Overhyped Product Category With Ulti...
Extended Detection and Response (XDR) An Overhyped Product Category With Ulti...
 
How To Drive Value with Security Data
How To Drive Value with Security DataHow To Drive Value with Security Data
How To Drive Value with Security Data
 
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
 
Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?
 
Understanding the "Intelligence" in AI
Understanding the "Intelligence" in AIUnderstanding the "Intelligence" in AI
Understanding the "Intelligence" in AI
 
Security Chat 5.0
Security Chat 5.0Security Chat 5.0
Security Chat 5.0
 
AI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are DangerousAI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are Dangerous
 
AI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are DangerousAI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are Dangerous
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & Visualization
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & Visualization
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big Data
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?
 
Visualization for Security
Visualization for SecurityVisualization for Security
Visualization for Security
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?
 
DAVIX - Data Analysis and Visualization Linux
DAVIX - Data Analysis and Visualization LinuxDAVIX - Data Analysis and Visualization Linux
DAVIX - Data Analysis and Visualization Linux
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big Data
 

Dernier

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Dernier (20)

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise

Visualization Lifecycle

  • 1. Visualization Lifecycle (datainsight, San Francisco 2011), Raffael Marty
  • 2. “Transform a dataset into a captivating story.”
      ‣ Assess
      ‣ Parse
      ‣ Clean
      ‣ Visualize
    (Diagram labels: "You're on your own", "Art", "Visualization Tools and Libraries")
    pixlcloud | collect. visualize. understand. Copyright (c) 2011
  • 3. Audience (diagram: two axes, Beginner to Expert and Technical to Overview, with regions labeled Fun and Boring)
  • 4. Visualization Process (diagram): Data Sources -> (Data Store: files, database) -> parsing -> Structured Data (filtering, aggregation, cleansing) -> feature selection and visualization -> Visual Representation, with Contextual Data feeding in and the process repeated in iterations
  • 5. Data Sources
      ‣ File: XML, JSON, CSV, TSV
      ‣ Database
          mysql -u root -p mydatabase < dump.sql
      ‣ API
          curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
          ‣ Factual
          ‣ Freebase
          ‣ Infochimps
          ‣ OpenStreetMap
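As a minimal sketch of the file branch above, the Python below reads CSV, TSV, or JSON text into one list-of-dicts shape using only the standard library. The helper name load_records and its fmt argument are hypothetical; XML is left out because its layout differs from source to source.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse CSV, TSV, or JSON text into a list of dicts (hypothetical helper)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "tsv":
        return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single JSON object or an array of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")


csv_rows = load_records("src,dst\n10.1.222.31,10.1.222.202\n", "csv")
json_rows = load_records('[{"src": "10.1.222.31", "dst": "10.1.222.202"}]', "json")
print(csv_rows == json_rows)  # True
```

Bringing every source into the same row shape early keeps the later parse, clean, and visualize steps independent of where the data came from.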
  • 6. Explore Data
      ‣ What is the data about?
      ‣ What are the data features/columns?
      ‣ Is there a common structure in the data?
      ‣ What are the data types?
    Sample data:
      Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0
      May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
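One way to start answering these questions for KEY=VALUE logs like the iptables lines above is to pull every key out and guess a type for its value. This is a rough exploratory sketch, not a parser: features and guess_type are hypothetical helpers, and bare flags such as DF or SYN are deliberately not captured by the regular expression.

```python
import re

LINE = ("Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
        "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
        "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
        "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0")

# KEY=VALUE pairs; an empty value (e.g. "OUT=") yields an empty string.
KV = re.compile(r"(\w+)=(\S*)")


def features(line):
    """Map each KEY=VALUE token in a log line to its raw string value."""
    return dict(KV.findall(line))


def guess_type(value):
    """Crude type guess for exploration: int, hex, empty, or str."""
    if value.isdigit():
        return "int"
    if value.startswith("0x"):
        return "hex"
    return "empty" if value == "" else "str"


fields = features(LINE)
for key, value in fields.items():
    print(f"{key:7} {guess_type(value):5} {value}")
```

Running the same two functions over a larger sample quickly shows which keys are always present, which are sometimes empty, and which look numeric.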
  • 7. Parsing and Normalization
      ‣ Parsing
          ‣ extraction of entities / features
          ‣ imposing structure
          ‣ often uses regexes
      ‣ Normalize
          ‣ field normalization
          ‣ term normalization: block, deny, dropped
      ‣ Generate a common output format for vis-tools (e.g., CSV)
    Sample data (three firewall log formats):
      Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
      Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
      Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
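Field and term normalization can be sketched as follows, assuming the fields have already been extracted from the PIX and iptables lines above (the pf line is left out because no destination appears in it). The column names and the helpers from_pix, from_iptables, and to_csv are hypothetical; only the block/deny/dropped equivalence comes from the slide.

```python
import csv
import io

# Term normalization: vendor-specific verbs mapped onto one action vocabulary.
ACTION_TERMS = {"block": "block", "deny": "block", "dropped": "block", "pass": "pass"}
COLUMNS = ["time", "action", "src", "sport", "dst", "dport"]


def from_pix(time, verb, src, dst):
    """Field normalization for PIX-style 'ip/port' endpoints."""
    src_ip, src_port = src.split("/")
    dst_ip, dst_port = dst.split("/")
    return {"time": time, "action": verb, "src": src_ip, "sport": src_port,
            "dst": dst_ip, "dport": dst_port}


def from_iptables(time, verb, fields):
    """Field normalization for iptables-style KEY=VALUE fields."""
    return {"time": time, "action": verb, "src": fields["SRC"], "sport": fields["SPT"],
            "dst": fields["DST"], "dport": fields["DPT"]}


def to_csv(records):
    """Apply term normalization and emit one common CSV format."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    for rec in records:
        writer.writerow(dict(rec, action=ACTION_TERMS[rec["action"].lower()]))
    return out.getvalue()


records = [
    from_pix("Oct 13 20:00:43", "Deny", "212.251.89.126/3859", "212.254.110.98/135"),
    from_iptables("Oct 13 20:00:43", "DROPPED",
                  {"SRC": "212.251.89.126", "SPT": "3859",
                   "DST": "212.254.110.98", "DPT": "135"}),
]
print(to_csv(records), end="")
```

Both vendor records come out as the same CSV row, which is the point of normalizing before handing data to a visualization tool.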
  • 8. Parser
    Raw:
      Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
      Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
      Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
    Regex / Parser:
      (.*) rule ([-\d]+/\d+)(.*?): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)
    Normalized (CSV):
      Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
      Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
      Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
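The pf parser on this slide can be run with Python's re and csv modules as sketched below. It assumes the regex's backslashes are restored (\d, \w, and escaped dots) and, to match the slide's normalized output, strips the parentheses around "match"; lines the regex does not match are silently skipped rather than reported.

```python
import csv
import io
import re

# The slide's pf regex with backslashes restored.
PF_RULE = re.compile(
    r"(.*) rule ([-\d]+/\d+)(.*?): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)

RAW = """\
Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
"""


def parse(lines):
    """Yield one list of fields per matching line; non-matching lines are skipped."""
    for line in lines:
        m = PF_RULE.match(line)
        if m:
            fields = list(m.groups())
            fields[2] = fields[2].strip("()")  # "(match)" -> "match"
            yield fields


out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(parse(RAW.splitlines()))
print(out.getvalue(), end="")
```

Printing the result reproduces the three normalized CSV rows listed on the slide. Using csv.writer instead of a plain join means a field that ever contains a comma gets quoted instead of shifting the columns.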
  • 9. UNIX Tools
      ‣ grep
          cat file | grep -v "foo"
      ‣ awk
          awk -F, '{printf("%s,%s\n",$2,$1);}'
          awk -F, -v OFS=, '{print $2,$1}'
      ‣ sed
          sed -e 's/fubar/foobar/g' filename
  • 10. Regular Expression Resources
      ‣ http://regexlib.com
      ‣ http://www.regular-expressions.info
      ‣ http://gskinner.com/RegExr
  • 11. Data Cleansing
      ‣ Filter
      ‣ Normalize (see earlier)
      ‣ Aggregation
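Filtering and aggregation can be sketched with a list comprehension and collections.Counter. The rows below are hypothetical sample values shaped like the normalized firewall data above (source, destination, destination port, action); aggregation here means collapsing repeated events into one row with a count.

```python
from collections import Counter

# Hypothetical normalized rows: (src, dst, dport, action).
rows = [
    ("212.251.89.126", "212.254.110.98", "135", "block"),
    ("212.251.89.126", "212.254.110.98", "135", "block"),
    ("195.141.69.45", "62.2.32.250", "53", "pass"),
    ("10.1.222.31", "10.1.222.202", "9111", "block"),
]

# Filter: keep only the rows relevant to the question (here, blocked traffic).
blocked = [r for r in rows if r[3] == "block"]

# Aggregate: one row per (src, dst, dport) with an event count.
counts = Counter((src, dst, dport) for src, dst, dport, _ in blocked)
for (src, dst, dport), n in counts.most_common():
    print(src, dst, dport, n, sep=",")
```

The count column can then drive a visual property such as edge thickness or node size.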
  • 12. Load CSV into Database
    Sometimes you just load your data into a tool, and you can omit this step.
      # mysql -u <user> -p
      mysql> create database data;
      mysql> use data;
      mysql> create table set1 (id int, address varchar(20), ...);
      mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
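For experimenting without a MySQL server, the same load step can be sketched with Python's built-in sqlite3 module. This swaps in SQLite for MySQL (LOAD DATA LOCAL INFILE is MySQL-specific), the in-memory INPUT stands in for input_file, and the two-column table is a trimmed, hypothetical version of set1.

```python
import csv
import io
import sqlite3

# Stand-in for 'input_file': hypothetical (id, address) rows.
INPUT = io.StringIO("1,10.1.222.31\n2,10.1.222.202\n3,212.251.89.126\n")

conn = sqlite3.connect(":memory:")  # SQLite used here in place of MySQL
conn.execute("CREATE TABLE set1 (id INTEGER, address VARCHAR(20))")
# csv.reader yields one list of strings per line; the INTEGER column converts ids.
conn.executemany("INSERT INTO set1 VALUES (?, ?)", csv.reader(INPUT))
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM set1").fetchone()[0]
print(total)  # 3
```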
  • 13. Contextual Data
      ‣ Either dump it into a DB or use it via API calls to augment the data
      ‣ IP -> Geo mapping
      ‣ Information about countries
      ‣ Port number -> service name
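Augmenting records with contextual data can be as simple as a lookup keyed on an existing field. In the sketch below, PORT_SERVICES is a small hypothetical table (53 and 135 use their IANA service names) and augment is a hypothetical helper; a real pipeline might instead load such a table into the database or, for ports, consult the local services database through socket.getservbyport.

```python
# Hypothetical contextual table: destination port -> service name.
PORT_SERVICES = {53: "domain", 135: "epmap", 443: "https"}


def augment(row, services=PORT_SERVICES):
    """Return a copy of row with a 'service' field derived from 'dport'."""
    return dict(row, service=services.get(int(row["dport"]), "unknown"))


rows = [
    {"src": "195.141.69.45", "dst": "62.2.32.250", "dport": "53"},
    {"src": "212.251.89.126", "dst": "212.254.110.98", "dport": "135"},
    {"src": "10.1.222.31", "dst": "10.1.222.202", "dport": "9111"},
]
for row in map(augment, rows):
    print(row["dport"], row["service"])
```

Returning a copy rather than mutating the input keeps the raw parsed rows available if the contextual table changes between iterations.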
  • 14. Feature Selection
      ‣ What are the fields you are interested in?
      ‣ Compute new fields
          ‣ start time, end time -> duration
          ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
      ‣ Entropy: H(X) = E[I(X)], where I(X) = -log p(X) is the information content of X
      ‣ Dimensionality reduction
          ‣ See Bryan's talk!
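The derived fields above can be computed with the standard library, as sketched below. The function names are hypothetical, times are assumed to be HH:MM:SS strings on the same day, and entropy is estimated from observed frequencies in bits, i.e. H(X) = -Σ p(x) log2 p(x), which is E[I(X)] with I(x) = -log2 p(x).

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%H:%M:%S"):
    """New field from two existing ones: end time minus start time, in seconds."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def subnet(ip, prefix):
    """Collapse an address into its enclosing network, e.g. a /24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def entropy(values):
    """Empirical Shannon entropy in bits: sum of -p(x) * log2 p(x)."""
    n = len(values)
    return sum(-p * math.log2(p) for p in (c / n for c in Counter(values).values()))


print(duration_seconds("09:14:46", "09:16:01"))  # 75.0
print(subnet("10.2.4.2", 8))                     # 10.0.0.0/8
print(subnet("192.168.1.2", 24))                 # 192.168.1.0/24
print(entropy(["53", "53", "135", "9111"]))      # 1.5
```

Entropy is useful as a feature because it summarizes how spread out a field is: a column with a single repeated value scores 0, while many distinct values score high.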
  • 15. Choose Your Poison
  • 16. Ode to the Pie
  • 17. A Good Visual
      ‣ Choose the right graph
      ‣ Simultaneous views
      ‣ Reduce non-data ink
      ‣ Interactivity
  • 18. Visual Transformations
      ‣ Keep iterating on visual transformations; change:
          ‣ color
          ‣ shape
          ‣ features displayed
      ‣ Add new fields?
      ‣ Add more context?
      ‣ Is the output expressive?
      ‣ Capture the output and prettify it for presentation
  • 20. Tools and Libraries
      ‣ http://datainsightsf.com/resources/
          ‣ Choose what's appropriate!
      ‣ Data Analysis and Visualization LInuX (DAVIX)
          ‣ davix.secviz.org
      ‣ GraphViz
          ‣ graphviz.org
      ‣ AfterGlow (CSV -> DOT)
          ‣ afterglow.sf.net
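To show what a CSV-to-DOT step produces, here is a minimal stand-in for two-column (source, target) input. It is not AfterGlow and does not reproduce its options or color rules; csv_to_dot and quote are hypothetical helpers that de-duplicate edges and quote node names so that dotted IP addresses are valid DOT identifiers.

```python
import csv
import io


def quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def csv_to_dot(text):
    """Turn two-column CSV (source,target) into a directed Graphviz graph."""
    edges = []
    for row in csv.reader(io.StringIO(text)):
        # Skip short rows and repeated edges, keeping first-seen order.
        if len(row) >= 2 and (row[0], row[1]) not in edges:
            edges.append((row[0], row[1]))
    lines = ["digraph G {"]
    lines += [f"  {quote(src)} -> {quote(dst)};" for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


print(csv_to_dot("195.141.69.45,62.2.32.250\n"
                 "195.141.69.45,192.134.0.49\n"
                 "195.141.69.45,62.2.32.250\n"))
```

Saved to a file, the output can be rendered with GraphViz, for example dot -Tpng graph.dot -o graph.png.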
  • 21. Libraries
      ‣ Reporting Libraries
      ‣ Visualization Libraries
      ‣ HighCharts, TheJIT, Flot, Graphael, Google Chart API, Protovis, Open Flash Chart, ProcessingJS, jQuery Sparklines, Flare, Polymaps, D3
  • 22. HighCharts (www.highcharts.com)
      ‣ Click-Through
      ‣ On load
          ‣ near real-time updates
      ‣ Zoom
  • 23. Google Visualization API (http://code.google.com/apis/visualization/interactive_charts.html)
      ‣ JavaScript
      ‣ Based on DataTable objects
      ‣ Many graphs
      ‣ Playground
          ‣ http://code.google.com/apis/ajax/playground
  • 24. ProtoVis ‣ JavaScript-based visualization library ‣ Charting ‣ Treemaps ‣ BoxPlots ‣ Parallel Coordinates ‣ etc. http://vis.stanford.edu/protovis/
  • 25. TheJIT http://thejit.org/ ‣ JavaScript InfoVis Toolkit ‣ Interactive ‣ Link Graphs
  • 26. Processing
      ‣ Visualization library
      ‣ Java based
      ‣ Interactive (event handling)
      ‣ A number of libraries to
          ‣ draw in OpenGL
          ‣ read XML files
      ‣ Processing JS
          ‣ JavaScript
          ‣ HTML 5 Canvas
          ‣ WebGL
          ‣ http://processingjs.org/
      ‣ Web IDE
    http://processing.org/
  • 27. Visualization Tools ‣ Gephi ‣ R ‣ Matlab ‣ Mondrian ‣ PicViz ‣ Treemap 4.1 ‣ Google Earth
  • 28. Gephi http://gephi.org ‣ reads: CSV, DOT, etc. ‣ graph analysis algorithms ‣ highly interactive
  • 29. PicViz http://www.wallinfire.net/picviz/
  • 30. Treemap 4.1 http://www.cs.umd.edu/hcil/treemap/
  • 31. Google Earth ‣ KML data format for encoding data
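A minimal sketch of encoding points as KML with xml.etree.ElementTree, for example to view geo-mapped IP addresses in Google Earth. The to_kml helper is hypothetical and the San Francisco coordinates are only an example; note that KML lists longitude before latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points):
    """Build a KML document with one Placemark per (name, lat, lon) tuple."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are "longitude,latitude[,altitude]".
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(to_kml([("San Francisco", 37.7749, -122.4194)]))
```

Writing the returned string to a .kml file is enough for Google Earth to open it.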
  • 32. pixlcloud buy now collect. visualize. understand. @raffaelmarty