3. SPADEv2 – Provenance Collection
https://github.com/ashish-gehani/SPADE/wiki
• Strace Reporter
– Programs are run under strace; the produced log is parsed to extract provenance.
• LLVMTrace
– Instrumentation is added at function boundaries at compile time (see the sketch below).
• DataTracker
– Dynamic taint analysis: bytes are associated with metadata, which is propagated as the program executes.
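To make the compile-time mechanism concrete, here is a minimal C sketch of function-boundary instrumentation. It uses the compiler's generic -finstrument-functions hooks rather than LLVMTrace's actual LLVM pass, and the emitted records are purely illustrative.

/* Sketch of compile-time function-boundary instrumentation (illustration only,
 * not the LLVMTrace implementation).
 * Hypothetical build line: cc -finstrument-functions demo.c hooks.c -o demo
 */
#include <stdio.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *fn, void *call_site)
{
    /* A real reporter would emit a provenance vertex/edge here. */
    fprintf(stderr, "enter fn=%p from=%p\n", fn, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *fn, void *call_site)
{
    fprintf(stderr, "exit  fn=%p from=%p\n", fn, call_site);
}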
7. Incomplete Picture
• Faster, but how much?
• What is the performance “price” for fewer
false positives?
• Is a compile-time solution worth the effort?
8. How can one get more insight?
Run a benchmark!
9. Which one?
• LMBench, UnixBench, Postmark, BLAST,
SPECint…
• [Traeger 08]: “Most popular benchmarks
are flawed.”
• No matter what you choose, there will be blind spots.
10. Start simple: UnixBench
• Well understood sub-benchmarks.
• Emphasizes the performance of system calls.
• System calls are commonly used for the
extraction of provenance.
• Gives more insight into which collection backend would suit specific applications.
• Provides a performance baseline against which the specific implementations can be improved.
13. Performance vs. Integration Effort
• Capturing provenance from completely
unmodified programs may degrade
performance.
• Modification of either the source
(LLVMTrace) or the platform (LPM, Hi-Fi)
should be considered for a production
deployment.
14. Performance vs. Provenance Granularity
• We couldn’t verify this intuition in the case of the strace reporter compared to LLVMTrace.
– The strace reporter implementation is not optimal.
• Tracking fine-grained provenance may
interfere with existing optimizations.
– E.g., buffered I/O does not benefit DataTracker.
15. Performance vs. False Positives/Analysis Scope
• “Brute-forcing” a low false-positive ratio with the
“track everything” approach of DataTracker is
prohibitively expensive.
• Limiting the analysis scope gives a performance
boost.
• If we exploit known semantics, we can have the
best of both worlds.
– Pre-existing semantic knowledge: LLVMTrace
– Dynamically acquired knowledge: ProTracer [Ma
2016]
19. Takeaway: Taint Analysis
• Prohibitively expensive for computation-intensive programs.
• Likely to remain so, even after optimizations.
• Reserved for provenance analysis of unknown/legacy software.
• Offline approach (Stamatogiannakis, TAPP’15).
20. Generalizing the Results
• Only one implementation was tested for each method.
• Repeating the tests with alternative implementations would increase confidence in the insights gained.
• More confidence when choosing a specific collection method.
[Diagram: different methods vs. different implementations]
21. Implementation Details Matter
• Our results are influenced by the specifics
of the implementation.
• Anecdote: the initial implementation of LLVMTrace was actually slower than the strace reporter.
22. Provenance Quality
• Qualitative features of the
provenance are also very
important.
• How many vertices/edges are
contained in the generated
provenance graph?
• Precision/Recall based on
provenance ground truth.
[Diagram: performance benchmarks vs. qualitative benchmarks]
23. Where to go next?
• UnixBench is a basic benchmark.
• SPEC: Comprehensive in terms of
performance evaluation.
– Hard to get the provenance ground truth needed to assess the quality of captured provenance.
• Better directions:
– Coreutils based micro-benchmarks.
– Macro-benchmarks (e.g. Postmark,
compilation benchmarks).
24. Conclusion
• Automatic provenance capture is an
important part of the ecosystem
• Trade-offs in different capture modes
• Benchmarking – to inform
• Common platforms are essential
Traeger focused on using benchmarks to measure filesystem/storage performance. His observation remains largely valid when benchmarks are used to measure other types of performance.
SPEC includes several sub-benchmarks which may be atypical for provenance analysis. E.g. discrete event simulator or quantum computer simulator.
If we want to also measure precision/recall, it is hard to get the ground truth for the provenance generated by these benchmarks.
1. execl-xput: How fast the current process image can be replaced with a new one, as a result of an execve system call.
2. fcopy-256, fcopy-1024, fcopy-4096: Speed of a file-to-file copy using different buffer sizes.
3. pipe-xput, pipe-cs: Speed of communication over pipes. In the first test, the reads and writes on the pipe happen in a single process. In the second test a second process is spawned, so the communication also includes a context switch between the two.
4. spawn-xput: A simple fork-wait loop to measure how much time is needed to create and then destroy a process.
5. shell-1, shell-8: Execution speed for the processing of a data file. The processing is implemented with common Unix utilities, wrapped in a shell script. The two tests differ in the number of concurrently executing scripts.
6. syscall: System call overhead. The test uses getpid to measure this; that system call is chosen because it requires minimal in-kernel processing, so its main overhead comes from the switch between kernel and user mode (see the sketch below).
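To illustrate what the syscall test measures, below is a minimal sketch (not UnixBench's own code) that times a tight getpid loop; syscall(SYS_getpid) is used so that libc caching cannot elide the actual system call.

/* Minimal sketch of a system-call overhead measurement in the spirit of the
 * UnixBench syscall test (illustration only, not UnixBench code). */
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const long iterations = 10 * 1000 * 1000;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);            /* minimal in-kernel work per call */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f ns per getpid system call\n", elapsed / iterations * 1e9);
    return 0;
}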
Degradation depends on the method used.
LPM [Bates 15] already supports the SPADE DSL reporter.
Reason for being slower: Lack of buffering. A new connection was opened each time we needed to output a piece of provenance.
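A minimal sketch of the difference (hypothetical code; a plain file stream stands in for the actual connection to the SPADE server):

/* Hypothetical illustration of the two output strategies; not SPADE code. */
#include <stdio.h>

/* Initial (slow) approach: open a fresh "connection" for every record. */
void report_per_record(const char *record)
{
    FILE *out = fopen("/tmp/provenance.out", "a");   /* hypothetical sink */
    if (out) {
        fprintf(out, "%s\n", record);
        fclose(out);
    }
}

/* Buffered approach: keep one stream open; stdio flushes in larger chunks. */
static FILE *sink;

void report_buffered(const char *record)
{
    if (!sink)
        sink = fopen("/tmp/provenance.out", "a");
    if (sink)
        fprintf(sink, "%s\n", record);
}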
SPADEv2 provides easy interfacing with other provenance systems via the DSL reporter.
Linux Provenance Modules [Bates 15] already support it.
This makes it a good platform for measuring qualitative features (such as the number of edges/vertices) and for running queries that verify whether the ground truth was captured.
Execl: execve speed
Fcopy-*: file copy with different buffer sizes
Pipe-*: pipe communication
Spawn: fork-wait loop (process creation/destruction speed)
Shell-*: Unix utilities wrapped in a script. Similar to what coreutils testing would yield.
Syscall: system call overhead (uses getpid as the most “lightweight” system call)