Visualizing Software Behavior
                   Wu Yongzheng




14/Sep/2011          NUS SoC CSTalks     1
Problems
• Software is complex
      –   Large codebase
      –   Interaction between components
      –   Components from different vendors
      –   Closed source, closed API
• Why understand software?
      – As developer => fewer bugs
      – As administrator => diagnosis
      – Curiosity?
• Execution trace contains software behavior
  information, but it’s huge.
Software Traces
• Types of traces
      – Instruction trace: records machine instructions
      – Call trace: records function calls
      – System call trace: records system calls
      – Software logs: important events
• System trace
      – System call trace from all processes
      – Mainly resource usage, system & process
        interaction

WinResMon
• WinResMon: our trace recorder.
• Works in Windows
• Types of events:
      – File: open, read, write, close, rename, …
      – Registry: open, get value, set value, delete, …
      – Network: connect, listen, send, receive, …
      – Process/thread: create, terminate.


Information (fields) in an Event
•   PID/TID                   Process/thread ID
•   Program name              Path of program’s EXE
•   User name/group           Process’ owner
•   Start/end time            Event timing in CPU ticks
•   Operation type            E.g. file open
•   Parameter                 Type dependent. E.g.
      – file path, system call flags, registry path
      – IP address
• Call stack trace            Call stack in user process
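The fields listed above could be modeled as a small record type. This is a minimal Python sketch: the class name `Event`, the field names, and the sample values are illustrative assumptions, not WinResMon's actual record format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One trace event; field names are hypothetical and mirror the slide."""
    pid: int
    tid: int
    program: str       # path of the program's EXE
    user: str          # owner of the process
    start: int         # start time in CPU ticks
    end: int           # end time in CPU ticks
    op: str            # operation type, e.g. "file_open"
    param: str         # type dependent: file path, registry path, IP address
    stack: tuple = ()  # call stack in the user process

    @property
    def duration(self) -> int:
        """Time the event took, in CPU ticks."""
        return self.end - self.start

# Made-up sample event.
e = Event(pid=1234, tid=1, program=r"c:\windows\system32\xcopy.exe",
          user="alice", start=1000, end=1450,
          op="file_open", param=r"c:\src\a.txt")
print(e.op, e.duration)
```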
Why visualize System Traces
• Software is complex
      – Interaction between modules, other software
• Software can be closed source, but interaction
  is open
• Humans are good at detecting
      – Repeated pattern
      – Anomaly


What is DotPlot?
[Figure: dot plot of Trace X (horizontal axis) against Trace Y (vertical axis)]
Trace X: E A C B E E E D C
Trace Y: A C B C D E B C E
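A dot plot of the two example traces can be computed with a nested comparison. The sketch below is a minimal illustration using plain equality as the match test; the function name `dotplot` is an assumption. Matching subsequences show up as diagonal runs, for example "A C B", which occurs in both traces.

```python
def dotplot(trace_x, trace_y, match=lambda a, b: a == b):
    """Return a matrix m where m[j][i] is True when trace_y[j] matches trace_x[i]."""
    return [[match(x, y) for x in trace_x] for y in trace_y]

trace_x = list("EACBEEEDC")   # horizontal axis
trace_y = list("ACBCDEBCE")   # vertical axis
plot = dotplot(trace_x, trace_y)

# Text rendering: '#' marks a match, '.' marks no match.
print("  " + " ".join(trace_x))
for y, row in zip(trace_y, plot):
    print(y + " " + " ".join("#" if hit else "." for hit in row))
```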
An Example


[Figure: visualization comparing MS PowerPoint, MS Word, OO Word, and OO PowerPoint]




Elements of VDP
1: Extended DotPlot
2,3: Axis Histogram
4,5: Barcode
Extended DotPlot
              • Matching Rule
                 – Define whether two events match
                 – By fields: e.g. “if PIDs and resource paths are
                   the same”, “if program names are the same”
              • DP Coloring Rule
                 – Define color for matched events
                 – Traditional DP uses black only
                 – Use RGB model on black background, CMY
                   on white background
                 – Use regular expressions to specify events
                 – E.g. “.*file_open.*”→blue. “.*reg_.*”→cyan

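The matching rule and the coloring rule above can be sketched as two small functions. This is only an illustration: the dictionary-based event layout, the helper names, the text form that the regular expressions are applied to, and the default color are all assumptions. The two patterns are the ones quoted on the slide.

```python
import re

def make_matcher(fields):
    """Matching rule: two events match when all the named fields are equal."""
    return lambda a, b: all(a[f] == b[f] for f in fields)

# Coloring rule: the first regular expression that matches the event text wins.
COLOR_RULES = [
    (re.compile(r".*file_open.*"), "blue"),
    (re.compile(r".*reg_.*"), "cyan"),
]

def event_text(event):
    # Assumed text form: selected fields joined by spaces.
    return " ".join(str(event[k]) for k in ("program", "pid", "op", "path"))

def color_of(event, rules=COLOR_RULES, default="black"):
    text = event_text(event)
    for pattern, color in rules:
        if pattern.match(text):
            return color
    return default

# Made-up sample events.
a = {"program": "notepad.exe", "pid": 7, "op": "file_open", "path": r"c:\a.txt"}
b = {"program": "notepad.exe", "pid": 9, "op": "file_open", "path": r"c:\a.txt"}
c = {"program": "notepad.exe", "pid": 7, "op": "reg_get_value", "path": r"HKLM\Software"}

same_pid_and_path = make_matcher(["pid", "path"])
same_program = make_matcher(["program"])
print(same_pid_and_path(a, b), same_program(a, b), color_of(a), color_of(c))
```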
Event-ordered and Time-ordered
• Each event takes a different amount of time
• The meaning/unit of each axis differs between the two orderings
[Figure panels: Event-ordered (left), Time-ordered (right)]


Axis Histogram
              – Ticks mark unit time (e.g. 1 second)
              – Histogram
                 • Event density (time-ordered)
                 • Time spent (event-ordered)




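The two histograms named above could be computed roughly as follows. This sketch assumes events are `(start, end)` pairs in ticks, sorted by start time. The function names, bin sizes, and sample values are made up for illustration.

```python
def event_density(events, bin_ticks):
    """Time-ordered axis: number of events starting in each time bin."""
    t0 = events[0][0]
    t_last = max(start for start, _end in events)
    bins = [0] * ((t_last - t0) // bin_ticks + 1)
    for start, _end in events:
        bins[(start - t0) // bin_ticks] += 1
    return bins

def time_spent(events, bin_events):
    """Event-ordered axis: total duration of each group of consecutive events."""
    return [sum(end - start for start, end in events[i:i + bin_events])
            for i in range(0, len(events), bin_events)]

events = [(0, 2), (3, 4), (5, 30), (40, 41), (42, 43), (44, 45)]
print(event_density(events, bin_ticks=20))  # [3, 0, 3]
print(time_spent(events, bin_events=2))     # [3, 26, 2]
```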
Barcode

              • One dimensional
              • Highlight user chosen events
                 • E.g. file_open → red
              • One or more (e.g. three below)
              • Barcode coloring rules




Example 1: File Copying
Self-comparison, event-ordered
xcopy copying 8 files: 1MB, 10KB, 10MB, 100KB, 1MB, 10KB, 10MB and 100KB
DP match: operation + parameter (pathname)
DP color: magenta → source; cyan → destination; black → other
[Figure labels: File Operation; Source/Dst File Operation; Registry Operation]
File Size

File size is visible
The two 1MB and two 10MB files are visible
The two 10KB and two 100KB files are visible only when zoomed in



Zooming in




                                  DP color : magenta → source;
                                  cyan → destination; black →
                                  other


A Surprise: Registry Operations

                                     So many registry operations for
                                     a console application




                                            Registry Operation

Another Surprise: DLLs
DLLs are accessed as files, but are neither source nor destination.
More time is spent on DLLs than on a 1MB file.
[Figure labels: DLLs; File Operation; Source/Dst File Operation]



Example 2: Software Build
X: succeeded; Y: failed due to missing .c file
DP match: program + operation + value (pathname)
DP color: black → any
Bar1 color: black → nmake.exe
Bar2 color: cyan → cl.exe; magenta → link.exe
Bar3 color: cyan → reading .c files; magenta → reading .h files
Number of Executions
X: 4 compiles (cl.exe), 1 link (link.exe)
Y: 3 compiles, 0 links
Y: Third compile doesn’t read .c or .h files.
Bar2 color: cyan → cl.exe; magenta → link.exe
Bar3 color: cyan → reading .c files; magenta → reading .h files
Similarity & Difference
The two traces are similar.
The Y (failed) trace terminates earlier, right before reading the .c file.




Different Matching Rule




[Figure panels: matching by Operation Type (left), matching by Program Name (right)]

Example 3: Two Idle Windows Machines
                      •    Time-ordered
                      •    1 hour each
                      •    Different time
                      •    About 750K events
                           each




Anomaly & Repeated Pattern
                          •    Periodic pattern
                          •    Most events in R1
                          •    Most time in R2 alike
                          •    Easily spot anomaly & regular pattern
[Figure regions: R1, R2]




Zoom In


[Figure: zoomed-in view showing regions R1 and R2]




R1: Windows Update
   • Similar events (darker area) are by Windows Auto Updater
   • More file operations, fewer registry operations
magenta → wuauclt.exe (Windows Update)
[Figure labels: File Operation; Registry Operation]
Visualizing Module Dependencies
• The problem
      – There’s a vulnerability in X. Which software uses X?
      – Why does my software use X? I never call it.
      – Is it safe to uninstall X?
• Software module
      – Windows DLLs
      – UNIX .so
      – Java classes, packages

Examples of dependencies (1)
• Binaries used by notepad
      – c:\windows\apppatch\acgenral.dll
      – c:\windows\system32\avgrsstx.dll
      – c:\windows\system32\imm32.dll
      – c:\windows\system32\lpk.dll
      – c:\windows\system32\msacm32.dll
      – c:\windows\system32\msctf.dll
      – c:\windows\system32\msctfime.ime
      – c:\windows\system32\shimeng.dll
      – c:\windows\system32\usp10.dll
      – c:\windows\system32\uxtheme.dll
      – c:\windows\system32\winmm.dll
      – c:\windows\system32\winspool.drv
      – c:\windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.2600.5512_x-ww_35d4ce83\comctl32.dll



Examples of dependencies (2)
• Simple boot (only Windows installed)
      –   DLLs: 154
      –   EXEs: 10
      –   Drivers: 1
      –   Ime: 1
• Typical boot (Windows + applications)
      –   DLLs: 274
      –   EXEs: 15
      –   Telephony/Modem: 6
      –   Drivers: 3
      –   ActiveX: 2
      –   Ime: 1
Visualization (1)
• Basic dependency graph
• Graph is too dense




Binary Dependency Visualization
• Two types of nodes: EXE, and DLL (plus other binary types)
• Three types of directed edges
      1.      EXE X launches another EXE Y
      2.      EXE X loads a DLL Y
      3.      A function in binary X calls a function in binary Y
• How are binaries shared among programs?
      –       EXE Dependency Graph
      –       Only Type 1 and 2 edges
      –       Group DLLs by loader
• How do binaries interact?
      –       DLL Dependency Graph
      –       Only Type 2 and 3 edges
      –       Group DLLs manually by functionality or software vendor


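The two graph views above amount to filtering one edge list by edge type. The sketch below is a hedged illustration: the binaries and edges are made up, and reading "group DLLs by loader" as "DLLs loaded by the same set of EXEs form one group" is an assumption.

```python
from collections import defaultdict

LAUNCH, LOAD, CALL = 1, 2, 3   # the three edge types listed above

# (source, target, type) triples; made-up example data.
edges = [
    ("explorer.exe", "notepad.exe", LAUNCH),
    ("notepad.exe", "comctl32.dll", LOAD),
    ("notepad.exe", "uxtheme.dll", LOAD),
    ("explorer.exe", "uxtheme.dll", LOAD),
    ("notepad.exe", "comctl32.dll", CALL),
    ("comctl32.dll", "uxtheme.dll", CALL),
]

def select(edges, types):
    """Keep only the edges whose type is in `types`."""
    return [e for e in edges if e[2] in types]

exe_graph = select(edges, {LAUNCH, LOAD})   # EXE Dependency Graph
dll_graph = select(edges, {LOAD, CALL})     # DLL Dependency Graph

def group_by_loaders(edges):
    """Group DLLs by the set of EXEs that load them (assumed grouping rule)."""
    loaders = defaultdict(set)
    for src, dst, kind in edges:
        if kind == LOAD:
            loaders[dst].add(src)
    groups = defaultdict(list)
    for dll, exes in loaders.items():
        groups[frozenset(exes)].append(dll)
    return {k: sorted(v) for k, v in groups.items()}

groups = group_by_loaders(exe_graph)
print(len(exe_graph), len(dll_graph), groups)
```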
A more usable Visualization: EXE Dependency Graph
• Grouped dependency graph
Comparing Microsoft Word and Open Office Writer




DLL Dependency Graph: actual binary usage
• Some definitions:
      – An EXE-DLL dependency in a DLL Dependency Graph exists
        when there is a control transfer from code in
        executable x to code in DLL y. We say that x has an EXE-DLL
        dependency on y.
      – A DLL-DLL dependency in a DLL Dependency Graph exists
        when there is a control transfer from code in DLL x to
        code in DLL y. We say that x has a DLL-DLL dependency on
        y.




wget: DLL dependency without grouping




wget: DLL dependency grouped by functionality




Examples of grouping
              By functionality (GIMP)




Examples of grouping
              By software vendor (GIMP)




Two Operations
• Diff
      – Compare two graphs.
              • E.g. from same program but different environment/input
              • E.g. from two related programs
      – Diff graph G1 and G2 to get G3.
• Projection
      – Focus on a particular module X
      – Only show modules that call X or are called by X
        (recursive definition)
      – Project graph G1 on module M to get G2
      – Not a simple subgraph problem

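The slides do not spell out how the diff is computed. One simple reading, sketched here as an assumption, treats each graph as a set of edges and classifies every edge as added, removed, or common. The module names (including `flash.ocx`) are hypothetical.

```python
def diff(g1, g2):
    """Classify each edge as only in g1, only in g2, or present in both."""
    e1, e2 = set(g1), set(g2)
    return {"removed": e1 - e2, "added": e2 - e1, "common": e1 & e2}

# Made-up edge sets for the same program in two environments.
without_flash = {("iexplore.exe", "mshtml.dll"), ("mshtml.dll", "gdi32.dll")}
with_flash = without_flash | {("mshtml.dll", "flash.ocx"), ("flash.ocx", "gdi32.dll")}

g3 = diff(without_flash, with_flash)
print(sorted(g3["added"]))
```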
Diff of the DLL dependency graphs of Internet Explorer with and without Flash




Projection of the DLL dependency graph of Internet Explorer on Flash




Firefox using tortoisesvn




Questions?




Visualizing binaries executed
• Call graph is large.
• Group functions to images => DLL dependency
  graph.
• DLL dependency graph is still large.
• Group DLLs by properties:
      – By functionality: graphics, audio, network…
      – By vendor: Microsoft, Adobe…
      – By path: C:\windows\system32\*.dll,
        D:\vmware\*.dll…

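Grouping by a property can be sketched as mapping every node to its group and merging the edges. The example below groups by directory path. The helper names, the sample edges, and the choice to drop edges that fall inside a single group are assumptions made for illustration.

```python
import ntpath  # Windows path handling; works on any platform

def collapse(edges, group_of):
    """Map every node to its group and drop edges that end up inside one group."""
    out = set()
    for src, dst in edges:
        gs, gd = group_of(src), group_of(dst)
        if gs != gd:
            out.add((gs, gd))
    return out

def by_directory(path):
    """Group binaries by their containing directory."""
    return ntpath.dirname(path).lower()

# Made-up DLL dependency edges.
edges = {
    (r"c:\app\wget.exe", r"c:\windows\system32\ws2_32.dll"),
    (r"c:\app\wget.exe", r"c:\windows\system32\kernel32.dll"),
    (r"c:\windows\system32\ws2_32.dll", r"c:\windows\system32\kernel32.dll"),
}
grouped = collapse(edges, by_directory)
print(grouped)
```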
Visualizing binaries executed (1)
• Generate call tree, call graph, DLL dependency graph
• PIN tool to collect the execution trace
      – The trace includes call, return, thread, context, and system call
        events
      – Call and return events record the stack pointer, PC and target
        address.
• Not trivial to maintain the call stack by tracking calls and
  returns
      –   Non-return function (long jump)
      –   Thread, fiber
      –   Context
      –   Kernel callback

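One common way to cope with non-returning calls such as long jumps is to use the recorded stack pointer to discard dead frames, instead of trusting that every call has a matching return. The sketch below illustrates that idea only; it is not confirmed to be the method used in this work. It assumes a single thread, a stack that grows downward, and the event layout described in the docstring. Threads, fibers, context switches, and kernel callbacks are not handled.

```python
def replay(events):
    """Rebuild the call stack from the call/return events of one thread.

    Each event is (kind, sp, target). Assumptions: the stack grows downward,
    `sp` for a call is the stack pointer before the return address is pushed,
    and `sp` for a return is the stack pointer after it is popped.
    Yields (depth, target) for every call.
    """
    stack = []  # list of (sp_at_call, target)
    for kind, sp, target in events:
        # A frame whose call-time stack pointer is not above the current one
        # is dead, whether it ended by a normal return or by a long jump.
        while stack and stack[-1][0] <= sp:
            stack.pop()
        if kind == "call":
            stack.append((sp, target))
            yield len(stack), target

# Made-up trace: f calls g, g calls h, h long-jumps back into f (no returns
# are seen), then f calls k, f returns, and the caller invokes m.
events = [
    ("call", 100, "f"),
    ("call", 90, "g"),
    ("call", 80, "h"),
    ("call", 90, "k"),
    ("ret", 100, None),
    ("call", 100, "m"),
]
depths = list(replay(events))
print(depths)
```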
Projection
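/* Forward declarations so the program compiles as written. */
void A (void); void B (int i); void C (void); void D (void);

int main (void) {
  A();
  B(1);
  return 0;
}
void A (void) {
  B(0);
}
void B (int i) {
  if (i) D();
  else C();
}
void C (void) {}
void D (void) {}

Full Graph: nodes main, A, B, C, D
Project on A: nodes main, A, B, C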
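The example above suggests that projection works on call paths rather than on the merged graph: D is reachable from B in the full graph, yet it disappears when projecting on A because B only calls D when invoked directly from main. The sketch below implements that reading as an assumption: it keeps only the root-to-leaf call paths that pass through the chosen module. The call tree is written out by hand from the C program, and the function names are hypothetical.

```python
def paths(node):
    """Yield every root-to-leaf path of a call tree given as (name, children)."""
    name, children = node
    if not children:
        yield [name]
    for child in children:
        for tail in paths(child):
            yield [name] + tail

def edges_of(all_paths):
    """Collect the caller-callee edges that occur along the given paths."""
    return {(p[i], p[i + 1]) for p in all_paths for i in range(len(p) - 1)}

def project(tree, module):
    """Keep only the call paths that pass through `module`."""
    return edges_of(p for p in paths(tree) if module in p)

# Call tree of the program above: main calls A (which calls B(0), which
# calls C), then main calls B(1), which calls D.
call_tree = ("main", [("A", [("B", [("C", [])])]),
                      ("B", [("D", [])])])

full = edges_of(paths(call_tree))
on_a = project(call_tree, "A")
print(sorted(full))
print(sorted(on_a))
```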

 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

CSTalks-Visualizing Software Behavior-14Sep

  • 1. Visualizing Software Behavior Wu Yongzheng 14/Sep/2011 NUS SoC CSTalks 1
  • 2. Problems
      • Software is complex
          – Large codebase
          – Interaction between components
          – Components from different vendors
          – Closed source, closed API
      • Why understand software?
          – As developer => fewer bugs
          – As administrator => diagnosis
          – Curiosity?
      • Execution trace contains software behavior information, but it’s huge.
  • 3. Software Traces
      • Types of traces
          – Instruction trace: records machine instructions
          – Call trace: records function calls
          – System call trace: records system calls
          – Software logs: important events
      • System trace
          – System call trace from all processes
          – Mainly resource usage, system & process interaction
  • 4. WinResMon
      • WinResMon: our trace recorder.
      • Works in Windows
      • Types of events:
          – File: open, read, write, close, rename, …
          – Registry: open, get value, set value, delete, …
          – Network: connect, listen, send, receive, …
          – Process/thread: create, terminate.
  • 5. Information (fields) in an Event
      • PID/TID: process/thread ID
      • Program name: path of program’s EXE
      • User name/group: process’ owner
      • Start/end time: event timing in CPU ticks
      • Operation type: e.g. file open
      • Parameter: type dependent, e.g.
          – file path, system call flags, registry path
          – IP address
      • Call stack trace: call stack in user process
  • 6. Why visualize System Traces
      • Software is complex
          – Interaction between modules, other software
      • Software can be closed source, but interaction is open
      • Humans are good at detecting
          – Repeated patterns
          – Anomalies
  • 7. What is DotPlot? [Figure: DotPlot of Trace X against Trace Y; event labels as extracted: Trace X E A C B E E E D C A C B C D E, Trace Y B C E]
  • 8. What is DotPlot? [Figure: DotPlot of Trace X against Trace Y; event labels as extracted: Trace X E A C B E E E D C A C B C D E, Trace Y B C E]
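As a rough illustration of the idea on the two slides above: a dot plot puts a mark at position (i, j) whenever event i of one trace matches event j of the other, so a repeated subsequence shows up as a diagonal run. The sketch below is only a minimal text rendering; the function names `dotplot` and `render` and the sample traces are hypothetical and are not the traces from the figure.

```python
def dotplot(trace_x, trace_y, match=lambda a, b: a == b):
    """Return rows of booleans: grid[j][i] is True when trace_x[i] matches trace_y[j]."""
    return [[match(x, y) for x in trace_x] for y in trace_y]


def render(grid):
    """Draw matches as '#' and non-matches as '.', one text row per event of trace Y."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)


# Hypothetical traces: each letter stands for one event type.
trace_x = list("ABCABC")
trace_y = list("ABC")
grid = dotplot(trace_x, trace_y)
print(render(grid))
```

Because trace X repeats the sequence of trace Y twice, the printed grid contains two diagonal runs of `#`.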
  • 9. An Example
      Visualization comparing: MS PowerPoint, MS Word, OO Word, and OO PowerPoint.
  • 10. Elements of VDP
      [Figure: numbered regions of the visualization]
      • 1: Extended DotPlot
      • 2,3: Axis Histogram
      • 4,5: Barcode
  • 11. Extended DotPlot
      • Matching Rule
          – Define whether two events match
          – By fields: e.g. “if PIDs and resource paths are the same”, “if program names are the same”
      • DP Coloring Rule
          – Define color for matched events
          – Traditional DP uses black only
          – Use RGB model on black background, CMY on white background
          – Use regular expression to specify events
          – E.g. “.*file_open.*” → blue, “.*reg_.*” → cyan
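The matching rule and the coloring rule from the slide above could be sketched roughly as follows. This is not WinResMon's or the tool's actual interface: the event dictionaries, the field names (`pid`, `program`, `op`, `path`), the one-line text form that the regular expressions are applied to, and the choice to color a matched cell by the event of trace X are all assumptions made for illustration. Only the two regular expressions and their colors come from the slide.

```python
import re


def match_by_fields(*fields):
    """Build a matching rule: two events match when all listed fields are equal."""
    def rule(a, b):
        return all(a[f] == b[f] for f in fields)
    return rule


# Coloring rules from the slide; the first pattern that matches wins.
COLOR_RULES = [
    (re.compile(r".*file_open.*"), "blue"),
    (re.compile(r".*reg_.*"), "cyan"),
]


def event_text(event):
    """Assumed one-line text form of an event that the patterns are applied to."""
    return " ".join(str(event[k]) for k in ("pid", "program", "op", "path"))


def color_of(event, rules=COLOR_RULES, default="black"):
    text = event_text(event)
    for pattern, color in rules:
        if pattern.fullmatch(text):
            return color
    return default


def colored_dotplot(trace_x, trace_y, match):
    """Cell is a color name where the events match, None elsewhere."""
    return [[color_of(x) if match(x, y) else None for x in trace_x] for y in trace_y]


# Hypothetical events.
e1 = {"pid": 1, "program": "a.exe", "op": "file_open", "path": r"C:\x.txt"}
e2 = {"pid": 1, "program": "a.exe", "op": "reg_get_value", "path": r"HKLM\Software"}
e3 = {"pid": 2, "program": "b.exe", "op": "file_open", "path": r"C:\x.txt"}

same_pid_and_path = match_by_fields("pid", "path")
same_program = match_by_fields("program")
plot = colored_dotplot([e1, e2, e3], [e1, e3], same_pid_and_path)
```

Swapping `same_pid_and_path` for `same_program` changes which cells are filled without touching the coloring rules, which is the separation the slide describes.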
  • 12. Event-ordered and Time-ordered
      • Each event takes different time
      • The meaning/unit of each axis
      [Figure panels: Event-ordered; Time-ordered]
  • 13. Axis Histogram
      – Ticks mark unit time (e.g. 1 second)
      – Histogram
          • Event density (time-ordered)
          • Time spent (event-ordered)
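The two histogram flavors above can be read as two ways of binning the same events. The sketch below assumes each event is a `(start, end)` pair in ticks; the function names and the binning choices are hypothetical, not the tool's.

```python
def event_density(events, bin_width):
    """Time-ordered axis: count the events that start in each time bin."""
    if not events:
        return []
    t0 = min(start for start, _ in events)
    last = max(start for start, _ in events)
    counts = [0] * (int((last - t0) // bin_width) + 1)
    for start, _ in events:
        counts[int((start - t0) // bin_width)] += 1
    return counts


def time_spent(events, events_per_bin):
    """Event-ordered axis: total duration of each run of consecutive events."""
    return [
        sum(end - start for start, end in events[i:i + events_per_bin])
        for i in range(0, len(events), events_per_bin)
    ]


# Hypothetical (start, end) times in ticks.
events = [(0, 2), (1, 4), (10, 11), (12, 20)]
density = event_density(events, bin_width=5)
spent = time_spent(events, events_per_bin=2)
```

On a time-ordered axis a quiet period appears as an empty bin, while on an event-ordered axis the same period is invisible and a slow event appears as a tall bar instead.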
  • 14. Barcode
      • One dimensional
      • Highlight user chosen events
      • E.g. file_open → red
      • One or more (e.g. three below)
      • Barcode coloring rules
  • 15. Example 1: File Copying
      Self-comparison, event-ordered
      xcopy copying 8 files: 1MB, 10KB, 10MB, 100KB, 1MB, 10KB, 10MB and 100KB
      DP match: operation + parameter (pathname)
      DP color: magenta → source; cyan → destination; black → other
      [Figure labels: File Operation; Source/Dst File Operation; Registry Operation]
  • 16. File Size
      File size is visible
      The two 1MB and two 10MB files are shown
      The two 10KB and two 100KB files are visible only when zoomed in
  • 17. Zooming in
      DP color: magenta → source; cyan → destination; black → other
  • 18. A Surprise: Registry Operations
      So many registry operations for a console application
      [Figure label: Registry Operation]
  • 19. Another Surprise: DLLs
      File operations on DLLs, which are neither source nor destination.
      More time is spent on DLLs than on a 1MB file.
      [Figure labels: DLLs; File Operation; Source/Dst File Operation]
  • 20. Example 2: Software Build
      X: succeeded; Y: failed due to missing .c file
      DP match: program + operation + value (pathname)
      DP color: black → any
      Bar1 color: black → nmake.exe
      Bar2 color: cyan → cl.exe; magenta → link.exe
      Bar3 color: cyan → reading .c files; magenta → reading .h files
  • 21. Number of Executions
      X: 4 compiles (cl.exe), 1 link (link.exe)
      Y: 3 compiles, 0 links
      Y: third compile doesn’t read .c or .h.
      Bar2 color: cyan → cl.exe; magenta → link.exe
      Bar3 color: cyan → reading .c files; magenta → reading .h files
  • 22. Similarity & Difference
      The two traces are similar.
      The Y (failed) trace terminates earlier, right before reading the .c file.
  • 23. Different Matching Rule
      [Figure panels: Operation Type; Program Name]
  • 24. Example 3: Two Idle Windows Machines
      • Time-ordered
      • 1 hour each
      • Different time
      • About 750K events each
  • 25. Anomaly & Repeated Pattern
      • Periodic pattern
      • Most events in R1
      • Most time in R2 alike
      • Easily spot anomaly & regular pattern
      [Figure labels: R1; R2]
  • 26. Zoom In
      [Figure labels: R1; R2]
  • 27. R1: Windows Update
      • Similar events (darker area) are by Windows Auto Updater
      • More file operations, fewer registry operations
      magenta → wuauclt.exe (Windows Update)
      [Figure labels: File Operation; Registry Operation]
  • 29. Visualizing Module Dependencies
      • The problem
          – There is a vulnerability in X. Which software uses X?
          – Why does my software use X? I never call it.
          – Is it safe to uninstall X?
      • Software module
          – Windows DLLs
          – UNIX .so
          – Java classes, packages
  • 30. Examples of dependencies (1)
      • Binaries used by notepad
          – c:\windows\apppatch\acgenral.dll
          – c:\windows\system32\avgrsstx.dll
          – c:\windows\system32\imm32.dll
          – c:\windows\system32\lpk.dll
          – c:\windows\system32\msacm32.dll
          – c:\windows\system32\msctf.dll
          – c:\windows\system32\msctfime.ime
          – c:\windows\system32\shimeng.dll
          – c:\windows\system32\usp10.dll
          – c:\windows\system32\uxtheme.dll
          – c:\windows\system32\winmm.dll
          – c:\windows\system32\winspool.drv
          – c:\windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.2600.5512_x-ww_35d4ce83\comctl32.dll
  • 31. Examples of dependencies (2)
      • Simple boot (only Windows installed)
          – DLLs: 154
          – EXEs: 10
          – Drivers: 1
          – Ime: 1
      • Typical boot (Windows + applications)
          – DLLs: 274
          – EXEs: 15
          – Telephony/Modem: 6
          – Drivers: 3
          – ActiveX: 2
          – Ime: 1
  • 32. Visualization (1)
      • Basic dependency graph
      • Graph is too dense
  • 33. Binary Dependency Visualization
      • Two types of nodes: EXE, DLL + etc
      • Three types of directed edges
          1. EXE X launches another EXE Y
          2. EXE X loads a DLL Y
          3. A function in binary X calls a function in binary Y
      • How are binaries shared among programs?
          – EXE Dependency Graph
          – Only Type 1 and 2 edges
          – Group DLLs by loader
      • How do binaries interact?
          – DLL Dependency Graph
          – Only Type 2 and 3 edges
          – Group DLLs manually by functionality or software vendor
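One possible reading of "group DLLs by loader" is that DLLs loaded by exactly the same set of EXEs collapse into a single node, which is what keeps the EXE Dependency Graph small. The sketch below follows that reading; it is an assumption, not the tool's documented behavior, and the edge tuples and binary names are hypothetical.

```python
from collections import defaultdict

# Edge types as numbered on the slide.
LAUNCH, LOAD, CALL = 1, 2, 3


def exe_dependency_graph(edges):
    """Keep only type 1 and 2 edges and group DLLs by the set of EXEs that load them.

    edges: iterable of (source, destination, edge_type).
    Returns (launch_edges, groups) where groups maps a frozenset of loader
    EXEs to the set of DLLs loaded by exactly those EXEs.
    """
    launches = sorted((s, d) for s, d, kind in edges if kind == LAUNCH)
    loaders = defaultdict(set)
    for s, d, kind in edges:
        if kind == LOAD:
            loaders[d].add(s)
    groups = defaultdict(set)
    for dll, exes in loaders.items():
        groups[frozenset(exes)].add(dll)
    return launches, dict(groups)


# Hypothetical edges.
edges = [
    ("shell.exe", "a.exe", LAUNCH),
    ("a.exe", "x.dll", LOAD),
    ("a.exe", "common.dll", LOAD),
    ("b.exe", "common.dll", LOAD),
    ("b.exe", "y.dll", LOAD),
    ("x.dll", "common.dll", CALL),  # type 3, ignored in the EXE graph
]
launches, groups = exe_dependency_graph(edges)
```

The group keyed by both `a.exe` and `b.exe` is the shared part, which answers the slide's question of how binaries are shared among programs.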
  • 34. Visualization (1)
      • Basic dependency graph
      • Graph is too dense
  • 35. A more usable Visualization: EXE Dependency Graph
      • Grouped dependency graph
  • 36. Comparing Microsoft Word and Open Office Writer
  • 37. DLL Dependency Graph: actual binary usage
      • Some definitions:
          – An EXE-DLL dependency in a DLL Dependency Graph exists when there is a control transfer from code in executable x to code in DLL y. We say that x has an EXE-DLL dependency on y.
          – A DLL-DLL dependency in a DLL Dependency Graph exists when there is a control transfer from code in DLL x to code in DLL y. We say that x has a DLL-DLL dependency on y.
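The definitions above suggest a simple derivation: map the source and target address of every recorded control transfer to the binary image containing it, and keep the pairs where the two images differ. The sketch below assumes a sorted table of loaded images; the addresses, image names, and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical loaded images as (base, end, name), sorted by base address.
IMAGES = [
    (0x00400000, 0x00410000, "wget.exe"),
    (0x71AB0000, 0x71AC7000, "ws2_32.dll"),
    (0x77C10000, 0x77C68000, "msvcrt.dll"),
]
BASES = [base for base, _, _ in IMAGES]


def image_of(addr):
    """Return the name of the image containing addr, or None if unmapped."""
    i = bisect_right(BASES, addr) - 1
    if i >= 0:
        _, end, name = IMAGES[i]
        if addr < end:
            return name
    return None


def dll_dependency_edges(calls):
    """calls: iterable of (pc, target) pairs. Returns cross-image edges only."""
    edges = set()
    for pc, target in calls:
        src, dst = image_of(pc), image_of(target)
        if src and dst and src != dst:
            edges.add((src, dst))
    return edges


calls = [
    (0x00401000, 0x71AB1234),  # wget.exe -> ws2_32.dll
    (0x71AB2000, 0x77C11000),  # ws2_32.dll -> msvcrt.dll
    (0x00401500, 0x00402000),  # call inside wget.exe, no edge
    (0x00401000, 0x10000000),  # target outside any known image, skipped
]
dependencies = dll_dependency_edges(calls)
```

An edge whose source is the EXE is an EXE-DLL dependency in the slide's terms, and an edge between two DLLs is a DLL-DLL dependency.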
  • 38. wget: DLL dependency without grouping
  • 39. wget: DLL dependency grouped by functionality
  • 40. Examples of grouping
      By functionality (GIMP)
  • 41. Examples of grouping
      By software vendor (GIMP)
  • 42. Two Operations
      • Diff
          – Compare two graphs.
              • E.g. from the same program but different environment/input
              • E.g. from two related programs
          – Diff graph G1 and G2 to get G3.
      • Projection
          – Focus on a particular module X
          – Only show modules that call X or are called by X (recursive definition)
          – Project graph G1 on module M to get G2
          – Not a simple subgraph problem
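The Diff operation can be pictured as labelling every edge by which input graph it came from. The sketch below treats a graph as a set of `(caller, callee)` edges and returns G3 as a mapping from edge to label; the representation, the label strings, and the module names are assumptions for illustration only.

```python
def diff_graphs(g1, g2):
    """Return G3: every edge of G1 or G2, labelled by where it appears."""
    g3 = {}
    for edge in g1 | g2:
        if edge in g1 and edge in g2:
            g3[edge] = "both"
        elif edge in g1:
            g3[edge] = "g1_only"
        else:
            g3[edge] = "g2_only"
    return g3


# Hypothetical graphs: the same browser traced with and without a plug-in.
with_plugin = {("iexplore.exe", "mshtml.dll"), ("mshtml.dll", "plugin.dll")}
without_plugin = {("iexplore.exe", "mshtml.dll")}
g3 = diff_graphs(with_plugin, without_plugin)
```

A renderer could then draw the `g1_only` and `g2_only` edges in different colors so that the difference stands out against the shared part.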
  • 43. Diff of DLL dependency graph of Internet Explorer with Flash and without
  • 44. Projection of the DLL dependency graph of Internet Explorer on Flash
  • 46. Questions?
  • 47. Visualizing binaries executed
      • Call graph is large.
      • Group functions to images => DLL dependency graph.
      • DLL dependency graph is still large.
      • Group DLLs by properties:
          – By functionality: graphics, audio, network…
          – By vendor: microsoft, adobe…
          – By path: C:\windows\system32\*.dll, D:\vmware\*.dll…
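Grouping by path, as in the last bullet above, amounts to mapping each DLL to the first wildcard pattern it matches and then merging edges between groups. The sketch below uses Python's `fnmatch.fnmatchcase` on lower-cased paths; the two patterns mirror the slide, while the group names, the sample edges, and the decision to drop edges inside a group are assumptions.

```python
from fnmatch import fnmatchcase

# (pattern, group name); patterns follow the slide, group names are made up.
GROUP_RULES = [
    (r"c:\windows\system32\*.dll", "windows"),
    (r"d:\vmware\*.dll", "vmware"),
]


def group_of(path, rules=GROUP_RULES):
    """Return the group of a binary, or the path itself if no rule matches."""
    lowered = path.lower()  # Windows paths are case-insensitive
    for pattern, group in rules:
        # Note: '*' also matches backslashes, so subdirectories are included.
        if fnmatchcase(lowered, pattern):
            return group
    return path


def grouped_graph(edges, rules=GROUP_RULES):
    """Collapse binaries into groups and drop edges inside a single group."""
    result = set()
    for src, dst in edges:
        gs, gd = group_of(src, rules), group_of(dst, rules)
        if gs != gd:
            result.add((gs, gd))
    return result


# Hypothetical DLL dependency edges.
edges = {
    (r"c:\app\wget.exe", r"C:\Windows\System32\ws2_32.dll"),
    (r"C:\Windows\System32\ws2_32.dll", r"C:\Windows\System32\msvcrt.dll"),
    (r"c:\app\wget.exe", r"D:\vmware\hook.dll"),
}
small = grouped_graph(edges)
```

Grouping by functionality or vendor would use the same collapsing step with a different `group_of`, for example a hand-written table from DLL name to category.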
  • 48. Visualizing binaries executed (1)
      • Generate call tree, call graph, DLL dependency graph
      • PIN tool to collect execution trace
          – Trace includes call, return, thread, context, system call events
          – Call and return events record stack pointer, PC and target address.
      • Not trivial to maintain call stack by tracking call and return
          – Non-returning function (long jump)
          – Thread, fiber
          – Context
          – Kernel callback
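To illustrate why the recorded stack pointer helps with non-returning functions, here is a heuristic sketch of replaying call and return events into per-thread shadow stacks. It is not the Pin tool from the talk. It assumes a downward-growing stack (as on x86), an event tuple of `(kind, thread id, stack pointer, target)`, and that the stack pointer of a call event is the value on entry to the callee; all of these are assumptions. Frames skipped by a long jump are discarded the next time an event shows a stack pointer at or above their entry value.

```python
from collections import defaultdict


def replay(events):
    """Rebuild call paths from (kind, tid, sp, target) events.

    Returns a list of (tid, [function names]) recorded after every call.
    """
    stacks = defaultdict(list)  # tid -> list of (function, entry_sp)
    paths = []
    for kind, tid, sp, target in events:
        stack = stacks[tid]
        # A frame whose entry SP is at or below the current SP cannot be
        # live any more: it returned normally or was skipped by a long jump.
        while stack and stack[-1][1] <= sp:
            stack.pop()
        if kind == "call":
            stack.append((target, sp))
            paths.append((tid, [name for name, _ in stack]))
    return paths


# Hypothetical trace: B leaves through a long jump, so A and B never return.
trace = [
    ("call", 1, 1000, "main"),
    ("call", 1, 992, "A"),
    ("call", 1, 984, "B"),
    ("call", 1, 992, "C"),   # first event after the long jump back to main
    ("ret", 1, 992, None),   # C returns normally
    ("call", 1, 992, "D"),
]
paths = replay(trace)
```

The heuristic is incomplete: if the code after a long jump calls with a lower stack pointer than the stale frames, they stay on the shadow stack until a later event exposes them. Fibers, context switches, and kernel callbacks from the slide's list would need separate handling.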
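  • 49. Projection
      Example program:
          void A (void); void B (int i);
          void C (void); void D (void);
          int main (void) {
              A();
              B(1);
              return 0;
          }
          void A (void) {
              B(0);   /* reaches C only */
          }
          void B (int i) {
              if (i) D();
              else C();
          }
          void C (void) {}
          void D (void) {}
      [Figure: Full Graph, with nodes main, A, B, C, D]
      [Figure: Project on A, with nodes main, A, B, C]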
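The example program on the Projection slide shows why projection is not a plain subgraph query: in the full graph B calls both C and D, but when B is reached through A it only calls C, so D drops out of the projection on A. One way to get that result, sketched below, is to work from recorded call paths (stack snapshots) rather than from the merged graph and to keep only the paths that pass through the chosen function. The path list is written out by hand from the example program; the function names and the path-based approach are assumptions, not necessarily how the tool computes it.

```python
def edges_of(path):
    """Caller-to-callee edges along one call path."""
    return {(path[i], path[i + 1]) for i in range(len(path) - 1)}


def full_graph(call_paths):
    graph = set()
    for path in call_paths:
        graph |= edges_of(path)
    return graph


def project(call_paths, module):
    """Keep only edges on call paths that pass through the given module."""
    graph = set()
    for path in call_paths:
        if module in path:
            graph |= edges_of(path)
    return graph


def reachable_from(graph, start):
    """Naive alternative: everything reachable from start in the merged graph."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(dst for src, dst in graph if src == node)
    return seen


# Call paths of the example: main calls A and B(1); A calls B(0).
CALL_PATHS = [
    ["main", "A"],
    ["main", "A", "B"],
    ["main", "A", "B", "C"],
    ["main", "B"],
    ["main", "B", "D"],
]
projected = project(CALL_PATHS, "A")
projected_nodes = {node for edge in projected for node in edge}
naive_nodes = reachable_from(full_graph(CALL_PATHS), "A")
```

Here `projected_nodes` contains main, A, B and C, matching the "Project on A" figure, while the naive reachability query on the merged graph also pulls in D.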