Learning on Deep Learning
1. Learning about Deep Learning
Applications for OpenJDK verification
@ShelleyMLambert and Longyu Zhang
AdoptOpenJDK, Eclipse OpenJ9, Eclipse OMR Verification
2. Intro & Motivation
• Early days, thought-starter
• Explore and experiment, determine feasibility
• Revive projects in waiting
• Make test better
"Virtuous Circle for AI"* as applied to verification:
• Create/gather tests
• Run tests
• Gather data from runs
• Make test better
*from Professor Andrew Ng
[Diagram: virtuous circle linking AQA tests, test runs, and the data they produce]
3. What is Deep Learning?
Deep learning is a subset of ML algorithms distinguished by:
• Loosely based on the structure and function of the brain, using artificial neural networks (ANNs)
• Multiple layers of processing units, "neurons"; the output of one layer is the input to another layer
• Modes of learning: supervised (regression, classification) or unsupervised (pattern analysis)
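A minimal sketch (not from the slide) of the "output of one layer is input to the next" idea: a NumPy forward pass through a few sigmoid layers, with made-up layer sizes and random weights.

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation applied element-wise at each layer
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # Each layer is a (weights, bias) pair; the output of one layer
    # becomes the input of the next, the core idea of a deep ANN.
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
# A toy 3-layer network: 4 inputs -> 8 hidden -> 8 hidden -> 1 output
sizes = [4, 8, 8, 1]
layers = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(4)
print(forward(x, layers))   # a single prediction in (0, 1)
```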
4. Remember Your Math
Computationally Heavy
• Non-linear functions applied at each layer (sigmoid, tanh, ReLU, etc.)
• Forward/backward propagation, derivatives, gradient descent
• Weight adjustment (the action to improve)
• Human-level error
• Training set error
• Dev set error
• Next step is known; when in doubt, add more data
(the gap between human-level and training error indicates bias; the gap between training and dev error indicates variance)
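A tiny sketch (not from the slides) of the forward/backward propagation and gradient-descent loop described above, using a single sigmoid unit on made-up data; the learning rate and iteration count are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))            # toy inputs
y = (X.sum(axis=1) > 0).astype(float)        # toy labels
w, b, lr = np.zeros(3), 0.0, 0.1

for _ in range(500):
    # Forward propagation
    y_hat = sigmoid(X @ w + b)
    # Backward propagation: derivative of the cross-entropy loss w.r.t. w and b
    dz = y_hat - y
    dw = X.T @ dz / len(y)
    db = dz.mean()
    # Weight adjustment (the action to improve) via gradient descent
    w -= lr * dw
    b -= lr * db

train_err = np.mean((sigmoid(X @ w + b) > 0.5) != y)
print(f"training set error: {train_err:.2%}")
# Comparing human-level, training-set, and dev-set error tells you whether
# bias (underfitting) or variance (overfitting) should be fixed next.
```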
6. Guidance for Problem Selection
• Anything that a human can do with a second of thought can be automated with AI*
• Where are you data-rich?
• Parked ideas, waiting for the right tool/approach
• Outputs that help drive next actions
*from Professor Andrew Ng
7. Data-hungry DL
What can we feed it?
• Code reviews
• Static analysis
• Pull requests
• Code coverage values
• Test output
  – Verbose console, result status, exceptions, trace info, benchmark results, GC/JIT logs, cores, instrumented data
• GitHub issues / cores
• Job schedules, execution times
• Machine config info / status
(these sources span static, dynamic, and peripheral data)
11. QA is Swimming in Data
Test output:
• Vast amounts of data per day:
  – 6 impls (openj9/15, hotspot/18, ibm/22, sap/1, corretto/3, upstream/3)
  – sum([15, 18, 22, 1, 3, 3]) = 62 impl_spec values
  – 250,000+ unique tests
  – 6 versions (8, 9, 10, 11, 12, 13, 14, 15, 16, Valhalla)
  – ~36 variants (unique inputs / command-line options)
• Impls_specsTotal x numTests x versions
  – 62 x 250,000 x 6 = 93,000,000
  – With variants: 93,000,000 x 36 = 3,348,000,000 tests run
[Diagram: the test matrix spans implementations (OpenJ9, Hotspot, SAP, IBM, Corretto, Upstream, R.H.), versions (8, 11, 14, 15, 16+, Valhalla), platforms (osx, aix, win, xlinux, plinux, zlinux, aarch64, riscv), and test groups (openjdk, functional, perf, system, external)]
Conservative estimate; excludes PR & Docker image testing.
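The volume arithmetic from this slide, restated in Python as a quick sanity check:

```python
# Implementation/spec counts as listed on the slide
impl_specs = {"openj9": 15, "hotspot": 18, "ibm": 22, "sap": 1, "corretto": 3, "upstream": 3}
num_tests = 250_000
versions = 6
variants = 36

total_impl_specs = sum(impl_specs.values())        # 62
base = total_impl_specs * num_tests * versions     # 93,000,000
with_variants = base * variants                    # 3,348,000,000
print(total_impl_specs, base, with_variants)
```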
12. Activities (and Questions) Related to Test
Activities: Plan, Implement, Automate, Execute, Triage, Exclude, Report
Questions: What? How? How often? How easy? How few? How fast? What failed? Why? What next?
Decompose into a set of services by test activity, services to help answer the questions and take next actions.
14. Past Prototypes Revisited
[Architecture diagram: GitHub repos, Jenkins servers, and cores feed a data layer (raw and refined); a services layer offers TestGeneration, BenchEngine, CoreAnalytics, TestSelection, BugPrediction, InputOptions, ResultSummary, ResultCompare, and ResultAnalytics; a UI layer exposes TRSS, custom dashboards, and other clients]
15. Core Analytics Service
• Visualize & analyze data from cores
• Predict crashes based on data mined from core files
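The slides do not show how the prediction is done; below is a hedged sketch of one possible approach, assuming core files have already been mined into tabular features. The field names and classifier choice are illustrative only, not the actual service.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features mined from core files; all field names are made up for illustration.
cores = pd.DataFrame({
    "signal":            ["SIGSEGV", "SIGABRT", "SIGSEGV", "SIGABRT"],
    "failing_component": ["jit", "gc", "jit", "vm"],
    "stack_depth":       [42, 7, 39, 12],
    "heap_used_mb":      [512, 3900, 480, 2048],
    "crash_category":    ["jit_crash", "oom", "jit_crash", "assertion"],
})

# One-hot encode categorical fields, then fit a simple classifier
# that predicts the crash category of future cores.
X = pd.get_dummies(cores.drop(columns="crash_category"))
y = cores["crash_category"]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict(X.iloc[[0]]))
```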
16. Bug Prediction Service
• Scores per file based on "recent" changes due to defects (GitHub PRs/issues); predict based on change & defect history; other features?
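One possible shape for such a score, in the spirit of the BugCache paper listed in the references: weight each file's defect-fix history with an exponential decay so recent fixes count more. The input format and half-life are assumptions, not the service's actual implementation.

```python
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical input: (file path, merge date) pairs for PRs that fixed defects,
# e.g. mined from GitHub issues/PRs labelled as bugs. Paths and dates are made up.
defect_fixes = [
    ("runtime/gc/Scavenger.cpp", datetime(2020, 9, 1)),
    ("runtime/gc/Scavenger.cpp", datetime(2020, 10, 15)),
    ("jit/optimizer/Inliner.cpp", datetime(2020, 4, 2)),
]

def bug_scores(fixes, now, half_life_days=90.0):
    """Score each file by its defect-fix history, weighting recent fixes higher."""
    scores = defaultdict(float)
    for path, fixed_on in fixes:
        age_days = (now - fixed_on).days
        scores[path] += math.exp(-age_days * math.log(2) / half_life_days)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

print(bug_scores(defect_fixes, now=datetime(2020, 11, 1)))
```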
17. Input Options Service
• Grabs input options defined in tests at the start of a build, names and stores them (unique sentences of options) for sharing with other builds/tests, and can answer whether inputs are valid for particular platforms
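A toy sketch of the behaviour described above; the class, the stored option "sentences", and the platform rule are hypothetical illustrations, not the real service.

```python
class InputOptionsService:
    """Stores unique 'sentences' of test options and answers validity queries."""

    def __init__(self):
        self._sentences = {}                      # name -> option string
        # Hypothetical rule: some options are invalid on some platforms.
        self._platform_blocklist = {"-Xgcpolicy:balanced": {"zos"}}

    def register(self, name, options):
        # Called at the start of a build for every option sentence found in tests.
        self._sentences.setdefault(name, options)

    def is_valid(self, name, platform):
        options = self._sentences.get(name, "")
        return all(platform not in self._platform_blocklist.get(opt, set())
                   for opt in options.split())

svc = InputOptionsService()
svc.register("gc_stress", "-Xgcpolicy:balanced -Xmx1g")
print(svc.is_valid("gc_stress", "xlinux"))   # True
print(svc.is_valid("gc_stress", "zos"))      # False
```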
18. Deep Learning Service
[Same architecture diagram as slide 14, with a DL service added alongside the existing services]
19. Areas of Interest
• Test generation (write tests for me)
• Find or predict defects
  – Fuzz testing to verify compilers, to find security vulnerabilities
  – Bug prediction
• Triage failures
  – Categorization: which component is the root cause of a failure?
• Next action post-failure
  – Binary classifier: is it a "real" defect or not?
• Analyze performance
  – Predict whether changes will improve performance, and by how much
• Optimize machine usage, lab reqs
  – Optimize automation, scheduling
  – Predict test execution time, predict if a test run will fail
• Replace myself with automation
20. Model Building
Things we know (input layer): version, variants used, failure expression, platform, impl, machine "age", failure age, PR list
Things we want to know (output layer): find/predict bugs, bug prediction scores, triage failures, predict perf, optimize usage, write tests, next best action
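A hedged sketch of wiring "things we know" into an input layer and predicting one "thing we want to know" (a bug score); the one-hot encoding, toy data, and network shape are illustrative assumptions, not the model actually built.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder

# Things we know (input layer), one row per past test run; values are made up.
raw = np.array([
    # version, impl,      platform, variant
    ["11",     "openj9",  "xlinux", "-Xjit"],
    ["8",      "hotspot", "win",    "-Xint"],
    ["11",     "openj9",  "aix",    "-Xjit"],
    ["8",      "openj9",  "xlinux", "-Xint"],
])
y = np.array([1, 0, 1, 0])   # one "thing we want to know": did the run expose a bug?

# Encode the categorical features as a numeric input layer.
X = OneHotEncoder().fit_transform(raw).toarray()

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # bug-prediction score
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=20, verbose=0)
print(model.predict(X, verbose=0))                    # scores for each run
```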
22. DeepSmith (CAS project*)
• Automatically generating test programs with deep learning technology to verify compilers & find security vulnerabilities.
*with Professor Hugh Leather from University of Edinburgh
Pipeline:
• Scrape 400 GB of Java programs from GitHub
• Train a DL model with LSTM (Long Short-Term Memory) to automatically generate more Java tests
• A/B testing on Jenkins with different JDKs/JVMs (JDK 8 with OpenJ9, JDK 11 with OpenJ9, JDK 11 with Hotspot, JDK 11 with Corretto) and with various JIT settings (JIT enabled, JIT disabled)
• Compare outputs to verify compilers & find vulnerabilities
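The LSTM generator itself is not reproduced here; below is a hedged sketch of just the "compare outputs" step, running one generated program under several JDK/JIT configurations and flagging any disagreement. The install paths, configuration names, and class names are placeholders.

```python
import subprocess

# Hypothetical JDK installs and JIT settings to A/B test; paths are placeholders.
configs = {
    "jdk8-openj9":        (["/opt/jdk8-openj9/bin/java"], []),
    "jdk11-openj9":       (["/opt/jdk11-openj9/bin/java"], []),
    "jdk11-hotspot":      (["/opt/jdk11-hotspot/bin/java"], []),
    "jdk11-openj9-nojit": (["/opt/jdk11-openj9/bin/java"], ["-Xint"]),
}

def run_all(main_class, classpath):
    # Run the same generated program under every configuration and record
    # its exit code and stdout.
    results = {}
    for name, (java, jit_flags) in configs.items():
        proc = subprocess.run(java + jit_flags + ["-cp", classpath, main_class],
                              capture_output=True, text=True, timeout=60)
        results[name] = (proc.returncode, proc.stdout)
    return results

results = run_all("GeneratedTest0001", "generated/")
if len(set(results.values())) > 1:
    # Any divergence in exit code or output is a candidate compiler/JVM defect.
    print("DIVERGENCE:", results)
```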
23. Test Output Analysis*
• Analyzing test outputs with deep learning to classify test result types: success or failure (compiler crash, build timeout, build failure, program crash, wrong output).
*consulting with IBM Machine Learning Hub
Pipeline:
• Archive test outputs and results from Jenkins
• Store test data in a database
• Pre-process data (use tf-idf to generate a vocabulary)
• Train a DL model to classify test results (weighted model, dropout layer, early stopping)
• Evaluate the DL model with metrics (precision, recall)
• Apply the developed DL model to analyze test outputs
• Continuous improvement with more data and models
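A compressed sketch of that pipeline (tf-idf vocabulary, class-weighted model with a dropout layer, early stopping, precision/recall report); the toy console logs and label set stand in for the archived Jenkins data.

```python
import numpy as np
import tensorflow as tf
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for archived console outputs and their triaged result types.
texts = ["BUILD SUCCESSFUL all tests passed",
         "java.lang.OutOfMemoryError during build timeout",
         "SIGSEGV in compiled code compiler crash",
         "expected 42 but was 41 wrong output"] * 50
labels = ["success", "build timeout", "compiler crash", "wrong output"] * 50

# Pre-process: tf-idf turns each console log into a fixed-length vector.
X = TfidfVectorizer(max_features=5000).fit_transform(texts).toarray()
y = LabelEncoder().fit_transform(labels)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),                       # dropout layer
    tf.keras.layers.Dense(len(set(y)), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Class weights counter imbalanced result types; early stopping halts training
# once the validation loss stops improving.
weights = {int(c): len(y) / float(np.sum(y == c)) for c in set(y)}
model.fit(X, y, epochs=50, validation_split=0.2, class_weight=weights,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)], verbose=0)

pred = model.predict(X, verbose=0).argmax(axis=1)
print(classification_report(y, pred))                   # precision / recall per class
```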
24. Recommend Related GitHub Issues
• Utilize a deep learning model to recommend possible GitHub issues related to test failures.
Pipeline:
• Collect issues from GitHub repos
• Pre-process issue contents
• Train a DL model to classify multiple issues
• Evaluate the DL model with TRSS/Jenkins output
• Deploy the DL model in TRSS to recommend related issues
• Continuous improvement with more data and models
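The slides describe training a DL model over issue contents; as a much simpler hedged stand-in, this sketch recommends issues by tf-idf cosine similarity between a failure log and collected issue texts. The issue data and numbers here are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical issue titles/bodies collected from GitHub repos.
issues = {
    101: "JIT crash SIGSEGV in Inliner on xlinux",
    202: "OutOfMemoryError in gc tests on aix with balanced policy",
    303: "Wrong output from String.format on windows",
}

def recommend(failure_log, issues, top_k=2):
    """Return the issue numbers whose text is most similar to the failure log."""
    ids, texts = zip(*issues.items())
    vec = TfidfVectorizer().fit(texts + (failure_log,))
    sims = cosine_similarity(vec.transform([failure_log]), vec.transform(texts))[0]
    ranked = sorted(zip(ids, sims), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

print(recommend("SIGSEGV received in JIT compiled method Inliner", issues))
```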
27. References
• Papers from Hugh Leather
  – Compiler Fuzzing through Deep Learning
  – End-to-End Deep Learning of Optimization Heuristics
  – Synthesizing Benchmarks for Predictive Modeling
• Videos & course work from Prof. Andrew Ng
  – Artificial Intelligence is the New Electricity
  – Coursera: Deep Learning Specialization courses
• Bug prediction paper: BugCache for Inspections: Hit or Miss?