Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

Raffi Khatchadourian
Raffi KhatchadourianAssociate Professor of Computer Science à City University of New York (CUNY) Hunter College
Introduction Motivation Optimization Approach Conclusion
Towards Safe Automated Refactoring of
Imperative Deep Learning Programs to Graph
Execution
Raffi Khatchadourian1,2
Tatiana Castro Vélez2
Mehdi
Bagherzadeh3
Nan Jia2
Anita Raja1,2
1
City University of New York (CUNY) Hunter College, USA
2
City University of New York (CUNY) Graduate Center, USA
3
Oakland University, USA
International Conference on Automated Software Engineering
September 14, 2023, Kirchberg, Luxembourg
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 1 / 12
Introduction Motivation Optimization Approach Conclusion
Deep Learning Systems & Run-time Performance
Machine Learning (ML), including Deep Learning (DL), systems are
pervasive.
As datasets grow, efficiency becomes essential to support
responsiveness [Zhou et al., 2020].
For efficiency, DL frameworks have traditionally embraced a deferred
execution-style supporting graph-based (DNN) computation.
Scalable, but development is . . .
Error-prone.
Cumbersome.
Produces programs that are difficult to debug.
Because graph computation executes statements in a non-imperative
order, traditional SE tools cannot help troubleshoot bugs [Arpteg
et al., 2018].
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 2 / 12
TensorFlow Deferred Execution-style Code
1 # Build a graph.
2 a = tf.constant(5.0)
3 b = tf.constant(6.0)
4 c = a * b
5
6 # Launch graph in a session.
7 sess = tf.Session()
8
9 # Evaluate the tensor `c`.
10 print(sess.run(c)) # prints 30.0
Lines 2–4 build a computation graph.
Line 4 does not execute until the Session is run on line 10.
No native support common imperative program constructs, e.g.,
iteration.
TensorFlow Deferred Execution-style Code
1 # Build a graph.
2 a = tf.constant(5.0)
3 b = tf.constant(6.0)
4 c = a * b
5
6 # Launch graph in a session.
7 sess = tf.Session()
8
9 # Evaluate the tensor `c`.
10 print(sess.run(c)) # prints 30.0
Lines 2–4 build a computation graph.
Line 4 does not execute until the Session is run on line 10.
No native support common imperative program constructs, e.g.,
iteration.
TensorFlow Deferred Execution-style Code
1 # Build a graph.
2 a = tf.constant(5.0)
3 b = tf.constant(6.0)
4 c = a * b
5
6 # Launch graph in a session.
7 sess = tf.Session()
8
9 # Evaluate the tensor `c`.
10 print(sess.run(c)) # prints 30.0
Lines 2–4 build a computation graph.
Line 4 does not execute until the Session is run on line 10.
No native support common imperative program constructs, e.g.,
iteration.
Introduction Motivation Optimization Approach Conclusion
Imperative DL Programming, Eager Execution, &
Hybridization
Imperative DL frameworks (e.g., TensorFlow Eager, Keras, PyTorch)
encouraging eager execution are more natural, less error-prone, and
easier to debug.
Sacrifices run-time performance.
Thus, hybrid approaches (e.g., Hybridize, TorchScript, AutoGraph)
have surfaced that:
Execute imperative DL programs as static graphs at run-time.
Are integrated into mainstream DL frameworks (e.g.,
TensorFlow, MXNet, PyTorch).
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 4 / 12
Eager TensorFlow Imperative (OO) DL Model Code
1 class SequentialModel(tf.keras.Model):
2 def __init__(self, **kwargs):
3 super(SequentialModel, self).__init__(...)
4 self.flatten = layers.Flatten(input_shape=(28, 28))
5 num_layers = 100 # Add many small layers.
6 self.layers = [layers.Dense(64, activation = "relu") for n in
range(num_layers)]
,
→
7 self.dropout = tf.keras.layers.Dropout(0.2)
8 self.dense_2 = tf.keras.layers.Dense(10)
9
10
11 def __call__(self, x):
12 x = self.flatten(x)
13 for layer in self.layers:
14 x = layer(x)
15 x = self.dropout(x)
16 x = self.dense_2(x)
17 return x
Hybridized TensorFlow Imperative (OO) DL Model Code
1 class SequentialModel(tf.keras.Model):
2 def __init__(self, **kwargs):
3 super(SequentialModel, self).__init__(...)
4 self.flatten = layers.Flatten(input_shape=(28, 28))
5 num_layers = 100 # Add many small layers.
6 self.layers = [layers.Dense(64, activation = "relu") for n in
range(num_layers)]
,
→
7 self.dropout = tf.keras.layers.Dropout(0.2)
8 self.dense_2 = tf.keras.layers.Dense(10)
9
10 @tf.function(...) # Executes model as graph (optional args).
11 def __call__(self, x):
12 x = self.flatten(x)
13 for layer in self.layers:
14 x = layer(x)
15 x = self.dropout(x)
16 x = self.dense_2(x)
17 return x
On line 10, AutoGraph used to potentially enhance performance.
Decorates model’s call() method with @tf.function, possibly
providing optional yet influential decorator arguments.
At run-time, call()’s execution will be “traced” (∼9.22 speedup).
Hybridized TensorFlow Imperative (OO) DL Model Code
1 class SequentialModel(tf.keras.Model):
2 def __init__(self, **kwargs):
3 super(SequentialModel, self).__init__(...)
4 self.flatten = layers.Flatten(input_shape=(28, 28))
5 num_layers = 100 # Add many small layers.
6 self.layers = [layers.Dense(64, activation = "relu") for n in
range(num_layers)]
,
→
7 self.dropout = tf.keras.layers.Dropout(0.2)
8 self.dense_2 = tf.keras.layers.Dense(10)
9
10 @tf.function(...) # Executes model as graph (optional args).
11 def __call__(self, x):
12 x = self.flatten(x)
13 for layer in self.layers:
14 x = layer(x)
15 x = self.dropout(x)
16 x = self.dense_2(x)
17 return x
On line 10, AutoGraph used to potentially enhance performance.
Decorates model’s call() method with @tf.function, possibly
providing optional yet influential decorator arguments.
At run-time, call()’s execution will be “traced” (∼9.22 speedup).
Hybridized TensorFlow Imperative (OO) DL Model Code
1 class SequentialModel(tf.keras.Model):
2 def __init__(self, **kwargs):
3 super(SequentialModel, self).__init__(...)
4 self.flatten = layers.Flatten(input_shape=(28, 28))
5 num_layers = 100 # Add many small layers.
6 self.layers = [layers.Dense(64, activation = "relu") for n in
range(num_layers)]
,
→
7 self.dropout = tf.keras.layers.Dropout(0.2)
8 self.dense_2 = tf.keras.layers.Dense(10)
9
10 @tf.function(...) # Executes model as graph (optional args).
11 def __call__(self, x):
12 x = self.flatten(x)
13 for layer in self.layers:
14 x = layer(x)
15 x = self.dropout(x)
16 x = self.dense_2(x)
17 return x
On line 10, AutoGraph used to potentially enhance performance.
Decorates model’s call() method with @tf.function, possibly
providing optional yet influential decorator arguments.
At run-time, call()’s execution will be “traced” (∼9.22 speedup).
Introduction Motivation Optimization Approach Conclusion Drawbacks
Hybridization Drawbacks
Needs non-trivial, specialized metadata [Jeong et al., 2019].
Exhibit limitations and known issues with native program constructs.
Subtle considerations required to:
Specify (decorate) the functions to be migrated.
Make code amenable to safe, accurate, and efficient graph execution.
Avoid performance bottlenecks and semantically inequivalent
results [Cao et al., 2021, Castro Vélez et al., 2022].
Manual analysis and refactoring (semantics-preserving,
source-to-source transformation) for optimal results can be error-
and omission-prone [Dig et al., 2009].
Further complicated by:
Increasing Object-Orientation (OO) in DL model code [Chollet,
2020].
Dynamically-typed languages (e.g., Python).
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 7 / 12
Introduction Motivation Optimization Approach Conclusion Problem Insight Goals Approach
Key Insight
Although imperative DL code is sequentially executed, hybridizing code
resembles parallelizing sequential code.
Example
To void unexpected behavior, like concurrent programs, hybrid functions
should avoid side-effects.
Idea
Adapt concepts from automated refactorings that parallelize sequential
code, e.g., Streaming APIs [Khatchadourian et al., 2019].
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 8 / 12
Work In Progress
Two new, fully-automated refactorings are in-progress:
Convert Eager Function to Hybrid Transforms otherwise
eagerly-executed imperative (Python) DL code for
enhanced run-time performance.
Automatically specifies (decorates) whether and how
code could be reliably and efficiently executed as
graphs at run-time.
Avoids hybridizing code under certain conditions
(e.g., side-effecting code) to preserve semantics.
Optimize Hybrid Function Transforms code already running as
graphs for optimal run-time performance.
Modifies existing decorator parameters (e.g., tensor
shape specs).
Potentially restructures code to be more amenable to
graph transformation.
Possibly dehybridize code when eager execution could
be faster (e.g., graph “retracing”).
Approach Highlights
Novel tensor analysis for imperative DL code.
Current analyzers work on only procedural (TF 1) code.
Modernization of WALA Ariadne [Dolby et al., 2018] for imperative
(TF 2) code underway.
Implemented as a PyDev Eclipse IDE plug-in [Zadrozny, 2023].
Integrates Ariadne for tensor type inference and shape (static)
analysis.
Introduction Motivation Optimization Approach Conclusion Problem Insight Goals Approach
Approach Challenges
Lack of static type information.
Needed to determine candidate functions (at least one Tensor
parameter).
Unlike, e.g., Java, Python has no restrictions on decorator
(annotation) arguments.
tf.function may be called as a function instead of a decorator.
Example
hyb_call = tf.function(call)
hyb_call()
Determine tensor shapes.
Existing analyses only for procedural (TF 1) code.
Working towards statically resolving imperative (TF 2) code.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 11 / 12
Introduction Motivation Optimization Approach Conclusion
Conclusion
Imperative Deep Learning code is easier to debug, write, and
maintain than traditional DL code that runs in a deferred execution.
However, it comes at the expense of (run-time) performance.
Hybrid approaches bridge the gap between eager and graph
execution.
Using hybrid techniques to achieve optimal performance and
semantics preservation is non-trivial.
Future Work
Automated client-side analyses and transformations to use
hybridization APIs correctly and optimally is in-progress.
Evaluation using dataset from our MSR ’22 [Castro Vélez et al.,
2022] empirical study.
More details in paper! http://bit.ly/tf2-ase23
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
Introduction Motivation Optimization Approach Conclusion
For Further Reading I
Abadi, Martı́n et al. (2016). “TensorFlow: A System for Large-Scale Machine Learning”. In: Symposium on Operating Systems
Design and Implementation.
Agrawal, Akshay et al. (2019). TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning. arXiv:
1903.01855 [cs.PL].
Apache (Apr. 8, 2021). Hybridize. Apache MXNet documentation. url:
https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/packages/gluon/blocks/hybridize.html (visited
on 04/08/2021).
Arpteg, A., B. Brinne, L. Crnkovic-Friis, and J. Bosch (2018). “Software Engineering Challenges of Deep Learning”. In: Euromicro
Conference on Software Engineering and Advanced Applications. IEEE, pp. 50–59. doi: 10.1109/SEAA.2018.00018.
Cao, Junming, Bihuan Chen, Chao Sun, Longjie Hu, and Xin Peng (Dec. 3, 2021). Characterizing Performance Bugs in Deep
Learning Systems. arXiv: 2112.01771 [cs.SE].
Castro Vélez, Tatiana, Raffi Khatchadourian, Mehdi Bagherzadeh, and Anita Raja (May 2022). “Challenges in Migrating
Imperative Deep Learning Programs to Graph Execution: An Empirical Study”. In: International Conference on Mining Software
Repositories. MSR ’22. ACM/IEEE. ACM. doi: 10.1145/3524842.3528455. arXiv: 2201.09953 [cs.SE].
Chen, Tianqi, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang
(2015). “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems”. In: Workshop on
Machine Learning Systems at NIPS. arXiv: 1512.01274 [cs.DC].
Chollet, François (2020). Deep Learning with Python. 2nd ed. Manning.
Dig, Danny, John Marrero, and Michael D. Ernst (2009). “Refactoring sequential Java code for concurrency via concurrent
libraries”. In: International Conference on Software Engineering. IEEE, pp. 397–407. doi: 10.1109/ICSE.2009.5070539.
Dilhara, Malinda, Ameya Ketkar, Nikhith Sannidhi, and Danny Dig (2022). “Discovering Repetitive Code Changes in Python ML
Systems”. In: International Conference on Software Engineering. ICSE ’22. To appear.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
Introduction Motivation Optimization Approach Conclusion
For Further Reading II
Dolby, Julian, Avraham Shinnar, Allison Allain, and Jenna Reinen (2018). “Ariadne: Analysis for Machine Learning Programs”. In:
International Workshop on Machine Learning and Programming Languages. MAPL 2018. ACM SIGPLAN. Philadelphia, PA, USA:
Association for Computing Machinery, pp. 1–10. isbn: 9781450358347. doi: 10.1145/3211346.3211349.
Facebook Inc. (2019). PyTorch Documentation. TorchScript. en. url: https://pytorch.org/docs/stable/jit.html (visited on
02/19/2021).
Jeong, Eunji, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Taebum Kim, and Byung-Gon Chun (July 2019).
“Speculative Symbolic Graph Execution of Imperative Deep Learning Programs”. In: SIGOPS Oper. Syst. Rev. 53.1, pp. 26–33.
issn: 0163-5980. doi: 10.1145/3352020.3352025.
Khatchadourian, Raffi, Yiming Tang, Mehdi Bagherzadeh, and Syed Ahmed (2019). “Safe Automated Refactoring for Intelligent
Parallelization of Java 8 Streams”. In: International Conference on Software Engineering. ICSE ’19. IEEE Press, pp. 619–630. doi:
10.1109/ICSE.2019.00072.
Kim, Miryung, Thomas Zimmermann, and Nachiappan Nagappan (Nov. 2012). “A Field Study of Refactoring Challenges and
Benefits”. In: Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software
Engineering. FSE ’12. Cary, North Carolina: ACM. isbn: 9781450316149. doi: 10.1145/2393596.2393655.
Moldovan, Dan, James M. Decker, Fei Wang, Andrew A. Johnson, Brian K. Lee, Zachary Nado, D. Sculley, Tiark Rompf, and
Alexander B. Wiltschko (2019). AutoGraph: Imperative-style Coding with Graph-based Performance. arXiv: 1810.08061 [cs.PL].
Negara, Stas, Nicholas Chen, Mohsen Vakilian, Ralph E. Johnson, and Danny Dig (2013). “A Comparative Study of Manual and
Automated Refactorings”. In: European Conference on Object-Oriented Programming. Ed. by Giuseppe Castagna. Berlin,
Heidelberg: Springer Berlin Heidelberg, pp. 552–576. isbn: 978-3-642-39038-8.
OpenAI, Inc. (Aug. 18, 2023). ChatGPT. url: https://chat.openai.com (visited on 08/18/2023).
Paszke, Adam et al. (Dec. 3, 2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv: 1912.01703
[cs.LG].
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
Introduction Motivation Optimization Approach Conclusion
For Further Reading III
Zadrozny, Fabio (Apr. 15, 2023). PyDev. url: https://www.pydev.org (visited on 05/31/2023).
Zhou, Weijie, Yue Zhao, Guoqiang Zhang, and Xipeng Shen (2020). “HARP: Holistic Analysis for Refactoring Python-Based
Analytics Programs”. In: International Conference on Software Engineering, pp. 506–517. doi: 10.1145/3377811.3380434.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
Appendix Static Analysis Refactoring LLMs Notebooks
Appendix
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 1 / 6
Appendix Static Analysis Refactoring LLMs Notebooks
Why Static Analysis?
Refactorings must operate on (at least some) static information.
Must eventually transform the source code.
May eventually integrate hybrid analyses to resolve difficult static
cases.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 2 / 6
Appendix Static Analysis Refactoring LLMs Notebooks
Why Automated Refactoring?
In general, such problems may also be handled by compilers or
runtimes; however, refactoring has several benefits:
Gives developers more control over where the optimizations take
place and making graph execution explicit.
Can be issued multiple times, e.g., prior to major releases.
Unlike static checkers, they transform source code, a task that can
be otherwise error-prone and involve subtle nuances.
Refactorings can act like recommendation systems, which is
important for analyzing and transforming programs written in
dynamic languages where static assumptions may be easily violated!
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 3 / 6
Appendix Static Analysis Refactoring LLMs Notebooks
Refactoring Developer Adoption
Developers generally underuse automated refactorings [Kim et al.,
2012, Negara et al., 2013].
Data scientists and engineers may be more open to using automated
(refactoring) tools.
Our approach will be fully automated with minimal barrier to entry.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 4 / 6
Appendix Static Analysis Refactoring LLMs Notebooks
LLMs & Big Data Refactoring
LLMs [OpenAI, Inc., 2023] can also perform refactorings.
Other Big Data-driven refactorings [Dilhara et al., 2022] are exciting
and promising.
Obtaining a (correct) dataset large enough to automatically extract
the proposed refactorings is challenging as developers struggle with
(manually) migrating DL code to graph execution [Castro Vélez
et al., 2022].
LLM inference capabilities are currently limited.
LLMs have a token limitation.
Hybridization requires interprocedural analysis.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 5 / 6
Appendix Static Analysis Refactoring LLMs Notebooks
Notebook Support
We plan to investigate notebook support in the future.
We envision the approach to be used on (larger) DL systems,
consisting of multiple files.
Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 6 / 6
1 sur 25

Recommandé

Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:... par
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Raffi Khatchadourian
302 vues58 diapositives
Deep Learning, Scala, and Spark par
Deep Learning, Scala, and SparkDeep Learning, Scala, and Spark
Deep Learning, Scala, and SparkOswald Campesato
313 vues59 diapositives
Deep Learning and TensorFlow par
Deep Learning and TensorFlowDeep Learning and TensorFlow
Deep Learning and TensorFlowOswald Campesato
114 vues63 diapositives
Automatic Task-based Code Generation for High Performance DSEL par
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
1.4K vues54 diapositives
Natural language processing open seminar For Tensorflow usage par
Natural language processing open seminar For Tensorflow usageNatural language processing open seminar For Tensorflow usage
Natural language processing open seminar For Tensorflow usagehyunyoung Lee
188 vues48 diapositives
Keras and TensorFlow par
Keras and TensorFlowKeras and TensorFlow
Keras and TensorFlowNopphawanTamkuan
86 vues11 diapositives

Contenu connexe

Similaire à Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

Go faster with_native_compilation Part-2 par
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Rajeev Rastogi (KRR)
546 vues44 diapositives
Go Faster With Native Compilation par
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native CompilationPGConf APAC
1K vues44 diapositives
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B... par
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
2.5K vues39 diapositives
H2 o berkeleydltf par
H2 o berkeleydltfH2 o berkeleydltf
H2 o berkeleydltfOswald Campesato
246 vues67 diapositives
Deep Learning in Your Browser par
Deep Learning in Your BrowserDeep Learning in Your Browser
Deep Learning in Your BrowserOswald Campesato
165 vues66 diapositives
Overview of TensorFlow For Natural Language Processing par
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processingananth
6K vues11 diapositives

Similaire à Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution(20)

Go Faster With Native Compilation par PGConf APAC
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
PGConf APAC1K vues
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B... par Databricks
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
Databricks2.5K vues
Overview of TensorFlow For Natural Language Processing par ananth
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
ananth6K vues
190111 tf2 preview_jwkang_pub par Jaewook. Kang
190111 tf2 preview_jwkang_pub190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub
Jaewook. Kang510 vues
Introduction to Deep Learning and Tensorflow par Oswald Campesato
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and Tensorflow
Oswald Campesato203 vues
Standardizing on a single N-dimensional array API for Python par Ralf Gommers
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
Ralf Gommers119 vues
Introduction to Deep Learning, Keras, and TensorFlow par Sri Ambati
Introduction to Deep Learning, Keras, and TensorFlowIntroduction to Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning, Keras, and TensorFlow
Sri Ambati2.1K vues
Introduction to Deep Learning, Keras, and Tensorflow par Oswald Campesato
Introduction to Deep Learning, Keras, and TensorflowIntroduction to Deep Learning, Keras, and Tensorflow
Introduction to Deep Learning, Keras, and Tensorflow
Oswald Campesato397 vues
Pydiomatic par rik0
PydiomaticPydiomatic
Pydiomatic
rik0924 vues
Bring your neural networks to the browser with TF.js - Simone Scardapane par MeetupDataScienceRoma
Bring your neural networks to the browser with TF.js - Simone ScardapaneBring your neural networks to the browser with TF.js - Simone Scardapane
Bring your neural networks to the browser with TF.js - Simone Scardapane
Language translation with Deep Learning (RNN) with TensorFlow par S N
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlow
S N2.2K vues

Plus de Raffi Khatchadourian

Automated Evolution of Feature Logging Statement Levels Using Git Histories a... par
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Raffi Khatchadourian
58 vues21 diapositives
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o... par
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o...A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o...
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o...Raffi Khatchadourian
433 vues16 diapositives
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U... par
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...Raffi Khatchadourian
362 vues28 diapositives
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys... par
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...Raffi Khatchadourian
1.3K vues26 diapositives
Automated Evolution of Feature Logging Statement Levels Using Git Histories a... par
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Raffi Khatchadourian
120 vues49 diapositives
An Empirical Study on the Use and Misuse of Java 8 Streams par
An Empirical Study on the Use and Misuse of Java 8 StreamsAn Empirical Study on the Use and Misuse of Java 8 Streams
An Empirical Study on the Use and Misuse of Java 8 StreamsRaffi Khatchadourian
439 vues73 diapositives

Plus de Raffi Khatchadourian(20)

Automated Evolution of Feature Logging Statement Levels Using Git Histories a... par Raffi Khatchadourian
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o... par Raffi Khatchadourian
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o...A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o...
A Tool for Rejuvenating Feature Logging Levels via Git Histories and Degree o...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U... par Raffi Khatchadourian
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
Actor Concurrency Bugs: A Comprehensive Study on Symptoms, Root Causes, API U...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys... par Raffi Khatchadourian
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a... par Raffi Khatchadourian
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
Automated Evolution of Feature Logging Statement Levels Using Git Histories a...
An Empirical Study on the Use and Misuse of Java 8 Streams par Raffi Khatchadourian
An Empirical Study on the Use and Misuse of Java 8 StreamsAn Empirical Study on the Use and Misuse of Java 8 Streams
An Empirical Study on the Use and Misuse of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams par Raffi Khatchadourian
Safe Automated Refactoring for Intelligent Parallelization of Java 8 StreamsSafe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams par Raffi Khatchadourian
Safe Automated Refactoring for Intelligent Parallelization of Java 8 StreamsSafe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams ... par Raffi Khatchadourian
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams ...Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams ...
Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams ...
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring par Raffi Khatchadourian
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
Porting the NetBeans Java 8 Enhanced For Loop Lambda Expression Refactoring t... par Raffi Khatchadourian
Porting the NetBeans Java 8 Enhanced For Loop Lambda Expression Refactoring t...Porting the NetBeans Java 8 Enhanced For Loop Lambda Expression Refactoring t...
Porting the NetBeans Java 8 Enhanced For Loop Lambda Expression Refactoring t...
Towards Safe Refactoring for Intelligent Parallelization of Java 8 Streams par Raffi Khatchadourian
Towards Safe Refactoring for Intelligent Parallelization of Java 8 StreamsTowards Safe Refactoring for Intelligent Parallelization of Java 8 Streams
Towards Safe Refactoring for Intelligent Parallelization of Java 8 Streams
Proactive Empirical Assessment of New Language Feature Adoption via Automated... par Raffi Khatchadourian
Proactive Empirical Assessment of New Language Feature Adoption via Automated...Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Defaultification Refactoring: A Tool for Automatically Converting Java Method... par Raffi Khatchadourian
Defaultification Refactoring: A Tool for Automatically Converting Java Method...Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method... par Raffi Khatchadourian
Defaultification Refactoring: A Tool for Automatically Converting Java Method...Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Defaultification Refactoring: A Tool for Automatically Converting Java Method...
Automated Refactoring of Legacy Java Software to Default Methods Talk at ICSE... par Raffi Khatchadourian
Automated Refactoring of Legacy Java Software to Default Methods Talk at ICSE...Automated Refactoring of Legacy Java Software to Default Methods Talk at ICSE...
Automated Refactoring of Legacy Java Software to Default Methods Talk at ICSE...
Poster on Automated Refactoring of Legacy Java Software to Default Methods par Raffi Khatchadourian
Poster on Automated Refactoring of Legacy Java Software to Default MethodsPoster on Automated Refactoring of Legacy Java Software to Default Methods
Poster on Automated Refactoring of Legacy Java Software to Default Methods
Automated Refactoring of Legacy Java Software to Default Methods Talk at GMU par Raffi Khatchadourian
Automated Refactoring of Legacy Java Software to Default Methods Talk at GMUAutomated Refactoring of Legacy Java Software to Default Methods Talk at GMU
Automated Refactoring of Legacy Java Software to Default Methods Talk at GMU
Towards Improving Interface Modularity in Legacy Java Software Through Automa... par Raffi Khatchadourian
Towards Improving Interface Modularity in Legacy Java Software Through Automa...Towards Improving Interface Modularity in Legacy Java Software Through Automa...
Towards Improving Interface Modularity in Legacy Java Software Through Automa...

Dernier

SAP FOR CONTRACT MANUFACTURING.pdf par
SAP FOR CONTRACT MANUFACTURING.pdfSAP FOR CONTRACT MANUFACTURING.pdf
SAP FOR CONTRACT MANUFACTURING.pdfVirendra Rai, PMP
13 vues2 diapositives
Agile 101 par
Agile 101Agile 101
Agile 101John Valentino
9 vues20 diapositives
ShortStory_qlora.pptx par
ShortStory_qlora.pptxShortStory_qlora.pptx
ShortStory_qlora.pptxpranathikrishna22
5 vues10 diapositives
The Era of Large Language Models.pptx par
The Era of Large Language Models.pptxThe Era of Large Language Models.pptx
The Era of Large Language Models.pptxAbdulVahedShaik
7 vues9 diapositives
AI and Ml presentation .pptx par
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptxFayazAli87
13 vues15 diapositives
Software evolution understanding: Automatic extraction of software identifier... par
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...Ra'Fat Al-Msie'deen
10 vues33 diapositives

Dernier(20)

AI and Ml presentation .pptx par FayazAli87
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptx
FayazAli8713 vues
Software evolution understanding: Automatic extraction of software identifier... par Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
Ports-and-Adapters Architecture for Embedded HMI par Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Dapr Unleashed: Accelerating Microservice Development par Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
360 graden fabriek par info33492
360 graden fabriek360 graden fabriek
360 graden fabriek
info33492143 vues
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated... par TomHalpin9
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
TomHalpin96 vues
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx par animuscrm
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
2023-November-Schneider Electric-Meetup-BCN Admin Group.pptx
animuscrm15 vues
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action par Márton Kodok
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Márton Kodok15 vues

Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

  • 1. Introduction Motivation Optimization Approach Conclusion Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution Raffi Khatchadourian1,2 Tatiana Castro Vélez2 Mehdi Bagherzadeh3 Nan Jia2 Anita Raja1,2 1 City University of New York (CUNY) Hunter College, USA 2 City University of New York (CUNY) Graduate Center, USA 3 Oakland University, USA International Conference on Automated Software Engineering September 14, 2023, Kirchberg, Luxembourg Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 1 / 12
  • 2. Introduction Motivation Optimization Approach Conclusion Deep Learning Systems & Run-time Performance Machine Learning (ML), including Deep Learning (DL), systems are pervasive. As datasets grow, efficiency becomes essential to support responsiveness [Zhou et al., 2020]. For efficiency, DL frameworks have traditionally embraced a deferred execution-style supporting graph-based (DNN) computation. Scalable, but development is . . . Error-prone. Cumbersome. Produces programs that are difficult to debug. Because graph computation executes statements in a non-imperative order, traditional SE tools cannot help troubleshoot bugs [Arpteg et al., 2018]. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 2 / 12
  • 3. TensorFlow Deferred Execution-style Code 1 # Build a graph. 2 a = tf.constant(5.0) 3 b = tf.constant(6.0) 4 c = a * b 5 6 # Launch graph in a session. 7 sess = tf.Session() 8 9 # Evaluate the tensor `c`. 10 print(sess.run(c)) # prints 30.0 Lines 2–4 build a computation graph. Line 4 does not execute until the Session is run on line 10. No native support common imperative program constructs, e.g., iteration.
  • 4. TensorFlow Deferred Execution-style Code 1 # Build a graph. 2 a = tf.constant(5.0) 3 b = tf.constant(6.0) 4 c = a * b 5 6 # Launch graph in a session. 7 sess = tf.Session() 8 9 # Evaluate the tensor `c`. 10 print(sess.run(c)) # prints 30.0 Lines 2–4 build a computation graph. Line 4 does not execute until the Session is run on line 10. No native support common imperative program constructs, e.g., iteration.
  • 5. TensorFlow Deferred Execution-style Code 1 # Build a graph. 2 a = tf.constant(5.0) 3 b = tf.constant(6.0) 4 c = a * b 5 6 # Launch graph in a session. 7 sess = tf.Session() 8 9 # Evaluate the tensor `c`. 10 print(sess.run(c)) # prints 30.0 Lines 2–4 build a computation graph. Line 4 does not execute until the Session is run on line 10. No native support common imperative program constructs, e.g., iteration.
  • 6. Introduction Motivation Optimization Approach Conclusion Imperative DL Programming, Eager Execution, & Hybridization Imperative DL frameworks (e.g., TensorFlow Eager, Keras, PyTorch) encouraging eager execution are more natural, less error-prone, and easier to debug. Sacrifices run-time performance. Thus, hybrid approaches (e.g., Hybridize, TorchScript, AutoGraph) have surfaced that: Execute imperative DL programs as static graphs at run-time. Are integrated into mainstream DL frameworks (e.g., TensorFlow, MXNet, PyTorch). Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 4 / 12
  • 7. Eager TensorFlow Imperative (OO) DL Model Code 1 class SequentialModel(tf.keras.Model): 2 def __init__(self, **kwargs): 3 super(SequentialModel, self).__init__(...) 4 self.flatten = layers.Flatten(input_shape=(28, 28)) 5 num_layers = 100 # Add many small layers. 6 self.layers = [layers.Dense(64, activation = "relu") for n in range(num_layers)] , → 7 self.dropout = tf.keras.layers.Dropout(0.2) 8 self.dense_2 = tf.keras.layers.Dense(10) 9 10 11 def __call__(self, x): 12 x = self.flatten(x) 13 for layer in self.layers: 14 x = layer(x) 15 x = self.dropout(x) 16 x = self.dense_2(x) 17 return x
  • 8. Hybridized TensorFlow Imperative (OO) DL Model Code 1 class SequentialModel(tf.keras.Model): 2 def __init__(self, **kwargs): 3 super(SequentialModel, self).__init__(...) 4 self.flatten = layers.Flatten(input_shape=(28, 28)) 5 num_layers = 100 # Add many small layers. 6 self.layers = [layers.Dense(64, activation = "relu") for n in range(num_layers)] , → 7 self.dropout = tf.keras.layers.Dropout(0.2) 8 self.dense_2 = tf.keras.layers.Dense(10) 9 10 @tf.function(...) # Executes model as graph (optional args). 11 def __call__(self, x): 12 x = self.flatten(x) 13 for layer in self.layers: 14 x = layer(x) 15 x = self.dropout(x) 16 x = self.dense_2(x) 17 return x On line 10, AutoGraph used to potentially enhance performance. Decorates model’s call() method with @tf.function, possibly providing optional yet influential decorator arguments. At run-time, call()’s execution will be “traced” (∼9.22 speedup).
  • 9. Hybridized TensorFlow Imperative (OO) DL Model Code 1 class SequentialModel(tf.keras.Model): 2 def __init__(self, **kwargs): 3 super(SequentialModel, self).__init__(...) 4 self.flatten = layers.Flatten(input_shape=(28, 28)) 5 num_layers = 100 # Add many small layers. 6 self.layers = [layers.Dense(64, activation = "relu") for n in range(num_layers)] , → 7 self.dropout = tf.keras.layers.Dropout(0.2) 8 self.dense_2 = tf.keras.layers.Dense(10) 9 10 @tf.function(...) # Executes model as graph (optional args). 11 def __call__(self, x): 12 x = self.flatten(x) 13 for layer in self.layers: 14 x = layer(x) 15 x = self.dropout(x) 16 x = self.dense_2(x) 17 return x On line 10, AutoGraph used to potentially enhance performance. Decorates model’s call() method with @tf.function, possibly providing optional yet influential decorator arguments. At run-time, call()’s execution will be “traced” (∼9.22 speedup).
  • 10. Hybridized TensorFlow Imperative (OO) DL Model Code 1 class SequentialModel(tf.keras.Model): 2 def __init__(self, **kwargs): 3 super(SequentialModel, self).__init__(...) 4 self.flatten = layers.Flatten(input_shape=(28, 28)) 5 num_layers = 100 # Add many small layers. 6 self.layers = [layers.Dense(64, activation = "relu") for n in range(num_layers)] , → 7 self.dropout = tf.keras.layers.Dropout(0.2) 8 self.dense_2 = tf.keras.layers.Dense(10) 9 10 @tf.function(...) # Executes model as graph (optional args). 11 def __call__(self, x): 12 x = self.flatten(x) 13 for layer in self.layers: 14 x = layer(x) 15 x = self.dropout(x) 16 x = self.dense_2(x) 17 return x On line 10, AutoGraph used to potentially enhance performance. Decorates model’s call() method with @tf.function, possibly providing optional yet influential decorator arguments. At run-time, call()’s execution will be “traced” (∼9.22 speedup).
  • 11. Introduction Motivation Optimization Approach Conclusion Drawbacks Hybridization Drawbacks Needs non-trivial, specialized metadata [Jeong et al., 2019]. Exhibit limitations and known issues with native program constructs. Subtle considerations required to: Specify (decorate) the functions to be migrated. Make code amenable to safe, accurate, and efficient graph execution. Avoid performance bottlenecks and semantically inequivalent results [Cao et al., 2021, Castro Vélez et al., 2022]. Manual analysis and refactoring (semantics-preserving, source-to-source transformation) for optimal results can be error- and omission-prone [Dig et al., 2009]. Further complicated by: Increasing Object-Orientation (OO) in DL model code [Chollet, 2020]. Dynamically-typed languages (e.g., Python). Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 7 / 12
  • 12. Introduction Motivation Optimization Approach Conclusion Problem Insight Goals Approach Key Insight Although imperative DL code is sequentially executed, hybridizing code resembles parallelizing sequential code. Example To void unexpected behavior, like concurrent programs, hybrid functions should avoid side-effects. Idea Adapt concepts from automated refactorings that parallelize sequential code, e.g., Streaming APIs [Khatchadourian et al., 2019]. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 8 / 12
  • 13. Work In Progress Two new, fully-automated refactorings are in-progress: Convert Eager Function to Hybrid Transforms otherwise eagerly-executed imperative (Python) DL code for enhanced run-time performance. Automatically specifies (decorates) whether and how code could be reliably and efficiently executed as graphs at run-time. Avoids hybridizing code under certain conditions (e.g., side-effecting code) to preserve semantics. Optimize Hybrid Function Transforms code already running as graphs for optimal run-time performance. Modifies existing decorator parameters (e.g., tensor shape specs). Potentially restructures code to be more amenable to graph transformation. Possibly dehybridize code when eager execution could be faster (e.g., graph “retracing”).
  • 14. Approach Highlights Novel tensor analysis for imperative DL code. Current analyzers work on only procedural (TF 1) code. Modernization of WALA Ariadne [Dolby et al., 2018] for imperative (TF 2) code underway. Implemented as a PyDev Eclipse IDE plug-in [Zadrozny, 2023]. Integrates Ariadne for tensor type inference and shape (static) analysis.
  • 15. Introduction Motivation Optimization Approach Conclusion Problem Insight Goals Approach Approach Challenges Lack of static type information. Needed to determine candidate functions (at least one Tensor parameter). Unlike, e.g., Java, Python has no restrictions on decorator (annotation) arguments. tf.function may be called as a function instead of a decorator. Example hyb_call = tf.function(call) hyb_call() Determine tensor shapes. Existing analyses only for procedural (TF 1) code. Working towards statically resolving imperative (TF 2) code. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 11 / 12
  • 16. Introduction Motivation Optimization Approach Conclusion Conclusion Imperative Deep Learning code is easier to debug, write, and maintain than traditional DL code that runs in a deferred execution. However, it comes at the expense of (run-time) performance. Hybrid approaches bridge the gap between eager and graph execution. Using hybrid techniques to achieve optimal performance and semantics preservation is non-trivial. Future Work Automated client-side analyses and transformations to use hybridization APIs correctly and optimally is in-progress. Evaluation using dataset from our MSR ’22 [Castro Vélez et al., 2022] empirical study. More details in paper! http://bit.ly/tf2-ase23 Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
  • 17. Introduction Motivation Optimization Approach Conclusion For Further Reading I Abadi, Martı́n et al. (2016). “TensorFlow: A System for Large-Scale Machine Learning”. In: Symposium on Operating Systems Design and Implementation. Agrawal, Akshay et al. (2019). TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning. arXiv: 1903.01855 [cs.PL]. Apache (Apr. 8, 2021). Hybridize. Apache MXNet documentation. url: https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/packages/gluon/blocks/hybridize.html (visited on 04/08/2021). Arpteg, A., B. Brinne, L. Crnkovic-Friis, and J. Bosch (2018). “Software Engineering Challenges of Deep Learning”. In: Euromicro Conference on Software Engineering and Advanced Applications. IEEE, pp. 50–59. doi: 10.1109/SEAA.2018.00018. Cao, Junming, Bihuan Chen, Chao Sun, Longjie Hu, and Xin Peng (Dec. 3, 2021). Characterizing Performance Bugs in Deep Learning Systems. arXiv: 2112.01771 [cs.SE]. Castro Vélez, Tatiana, Raffi Khatchadourian, Mehdi Bagherzadeh, and Anita Raja (May 2022). “Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study”. In: International Conference on Mining Software Repositories. MSR ’22. ACM/IEEE. ACM. doi: 10.1145/3524842.3528455. arXiv: 2201.09953 [cs.SE]. Chen, Tianqi, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang (2015). “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems”. In: Workshop on Machine Learning Systems at NIPS. arXiv: 1512.01274 [cs.DC]. Chollet, François (2020). Deep Learning with Python. 2nd ed. Manning. Dig, Danny, John Marrero, and Michael D. Ernst (2009). “Refactoring sequential Java code for concurrency via concurrent libraries”. In: International Conference on Software Engineering. IEEE, pp. 397–407. doi: 10.1109/ICSE.2009.5070539. Dilhara, Malinda, Ameya Ketkar, Nikhith Sannidhi, and Danny Dig (2022). “Discovering Repetitive Code Changes in Python ML Systems”. In: International Conference on Software Engineering. ICSE ’22. To appear. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
  • 18. Introduction Motivation Optimization Approach Conclusion For Further Reading II Dolby, Julian, Avraham Shinnar, Allison Allain, and Jenna Reinen (2018). “Ariadne: Analysis for Machine Learning Programs”. In: International Workshop on Machine Learning and Programming Languages. MAPL 2018. ACM SIGPLAN. Philadelphia, PA, USA: Association for Computing Machinery, pp. 1–10. isbn: 9781450358347. doi: 10.1145/3211346.3211349. Facebook Inc. (2019). PyTorch Documentation. TorchScript. en. url: https://pytorch.org/docs/stable/jit.html (visited on 02/19/2021). Jeong, Eunji, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Taebum Kim, and Byung-Gon Chun (July 2019). “Speculative Symbolic Graph Execution of Imperative Deep Learning Programs”. In: SIGOPS Oper. Syst. Rev. 53.1, pp. 26–33. issn: 0163-5980. doi: 10.1145/3352020.3352025. Khatchadourian, Raffi, Yiming Tang, Mehdi Bagherzadeh, and Syed Ahmed (2019). “Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams”. In: International Conference on Software Engineering. ICSE ’19. IEEE Press, pp. 619–630. doi: 10.1109/ICSE.2019.00072. Kim, Miryung, Thomas Zimmermann, and Nachiappan Nagappan (Nov. 2012). “A Field Study of Refactoring Challenges and Benefits”. In: Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. FSE ’12. Cary, North Carolina: ACM. isbn: 9781450316149. doi: 10.1145/2393596.2393655. Moldovan, Dan, James M. Decker, Fei Wang, Andrew A. Johnson, Brian K. Lee, Zachary Nado, D. Sculley, Tiark Rompf, and Alexander B. Wiltschko (2019). AutoGraph: Imperative-style Coding with Graph-based Performance. arXiv: 1810.08061 [cs.PL]. Negara, Stas, Nicholas Chen, Mohsen Vakilian, Ralph E. Johnson, and Danny Dig (2013). “A Comparative Study of Manual and Automated Refactorings”. In: European Conference on Object-Oriented Programming. Ed. by Giuseppe Castagna. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 552–576. isbn: 978-3-642-39038-8. OpenAI, Inc. (Aug. 18, 2023). ChatGPT. url: https://chat.openai.com (visited on 08/18/2023). Paszke, Adam et al. (Dec. 3, 2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv: 1912.01703 [cs.LG]. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
  • 19. Introduction Motivation Optimization Approach Conclusion For Further Reading III Zadrozny, Fabio (Apr. 15, 2023). PyDev. url: https://www.pydev.org (visited on 05/31/2023). Zhou, Weijie, Yue Zhao, Guoqiang Zhang, and Xipeng Shen (2020). “HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs”. In: International Conference on Software Engineering, pp. 506–517. doi: 10.1145/3377811.3380434. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 12 / 12
  • 20. Appendix Static Analysis Refactoring LLMs Notebooks Appendix Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 1 / 6
  • 21. Appendix Static Analysis Refactoring LLMs Notebooks Why Static Analysis? Refactorings must operate on (at least some) static information. Must eventually transform the source code. May eventually integrate hybrid analyses to resolve difficult static cases. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 2 / 6
  • 22. Appendix Static Analysis Refactoring LLMs Notebooks Why Automated Refactoring? In general, such problems may also be handled by compilers or runtimes; however, refactoring has several benefits: Gives developers more control over where the optimizations take place and making graph execution explicit. Can be issued multiple times, e.g., prior to major releases. Unlike static checkers, they transform source code, a task that can be otherwise error-prone and involve subtle nuances. Refactorings can act like recommendation systems, which is important for analyzing and transforming programs written in dynamic languages where static assumptions may be easily violated! Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 3 / 6
  • 23. Appendix Static Analysis Refactoring LLMs Notebooks Refactoring Developer Adoption Developers generally underuse automated refactorings [Kim et al., 2012, Negara et al., 2013]. Data scientists and engineers may be more open to using automated (refactoring) tools. Our approach will be fully automated with minimal barrier to entry. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 4 / 6
  • 24. Appendix Static Analysis Refactoring LLMs Notebooks LLMs & Big Data Refactoring LLMs [OpenAI, Inc., 2023] can also perform refactorings. Other Big Data-driven refactorings [Dilhara et al., 2022] are exciting and promising. Obtaining a (correct) dataset large enough to automatically extract the proposed refactorings is challenging as developers struggle with (manually) migrating DL code to graph execution [Castro Vélez et al., 2022]. LLM inference capabilities are currently limited. LLMs have a token limitation. Hybridization requires interprocedural analysis. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 5 / 6
  • 25. Appendix Static Analysis Refactoring LLMs Notebooks Notebook Support We plan to investigate notebook support in the future. We envision the approach to be used on (larger) DL systems, consisting of multiple files. Khatchadourian, Vélez, Bagherzadeh, Jia, Raja Towards Refactoring of Imperative DL Programs to Graphs 6 / 6