Hire a Machine to Code - Michael Arthur Bucko & Aurélien Nicolas

Hire a machine to code
Michael Arthur Bucko, Aurélien Nicolas

We are learning the relation between human
communication and source code to take
communication to the next level.

Agenda
● What is Deckard? Our vision and products
● Software team’s and developer’s perspectives
● Problems and solutions in coding
● Understanding source code
● Our work
● Demo!

Our vision
Step 1: Machines joining regular software teams to help them
create better code faster
Step 2: First large-scale code transplants
Step 3: First machines writing their own code without humans

Our product
● Deckard is building a framework for making code-based
interactions between humans and intelligent machines more relevant
○ We approach the problem from (at least) two angles:
■ Enriching the context of human software developers and teams
■ Learning novel code representations to enable more
effective communication between machines and humans

Diagram (labels: Team, Engineer, Brain, IDEs, Teams and developers)
● Helping a single developer by enriching individual contexts and communication
● Helping a single developer and enriching their context
● Independent of IDE
● Not only finding all relevant information on time, but also enabling a completely new interaction with software
● Ensemble-based decisions using novel representations of problems, users and source code data

Problems in communication using code
● Connecting humans with code by creating innovative code exploration
● Enriching human-human interaction (real-time)
● Learning better code representations
● Researching code transplantation
● Code context: understanding developer’s preferences
● Code understanding: understanding current code in real time
● Code navigation: understanding where to go next, what to do
● Knowledge sharing: sharing code intelligence

Software team’s perspective
● Small teams define and build products that people love
● Teams include more than engineers, and even engineers have diverse skill sets
● Team members share knowledge using a variety of channels
● Engineers learn from many sources of data

Software developer’s perspective
● Developers are overwhelmed by data in their current contexts -- they need
assistants who do part of their job
● Developers lack the right data -- they should know better ways of solving
their problems to avoid tweaking and patching
● Assistants should be able to provide highly relevant data in real time

Step 1
Step 1: Machines joining regular software teams to help them
create better code faster
Step 2: First large-scale code transplants
Step 3: First machines writing their own code without humans

Plan for Step 1
PROBLEM: Ineffective interaction between human members of software teams
SOLUTION: Profiling developers and making information more relevant
PROBLEM: Ineffective interaction between humans and code
SOLUTION: Requires understanding code better
● Augmenting “working memory” (navigation)
● Better knowledge sharing (dd protocol)
● Relevant information on time

Plan for Step 1
PROBLEM: Coding faster
SOLUTION:
● Better real-time navigation (using summarization)
● Sharing code knowledge more effectively (dd protocol)
● Making code more re-usable (transplantation)
● Understanding software development better (learning paths, code exploration modes, diversity of technology and skills)

Ensemble
- Understanding source code requires more than regular text
summarization
- Regular: sentence reduction, sentence combination, syntactic transformation,
paraphrasing, generalisation, etc. [1]
- Source-code-related concepts: code folding, code execution flow, code re-usability, etc.
- Source code data: variable names, method names, logic, comments, git commits, types, etc.
- NLG: generating project metadata [2]
- We create an ensemble with source-code-related features (a novel
representation of code)
1. A Neural Attention Model for Abstractive Sentence Summarization, Alexander M. Rush, Sumit Chopra, Jason Weston
2. Automatic Documentation Generation via Source Code Summarization of Method Context, Paul W. McBurney and Collin McMillan

Understanding source code requires novel representations
Data
- Who you are in the team
- What you do
- What you know about the codebase
- What is known about your problem on the web
- Who might be able to help you
Representations
Creating novel representations of source code:
- Diverse programming languages with different syntaxes
- We not only want to understand the current code, but also create better programming languages

Features- Introduction
- We experiment with SWUM (Software Word Usage Model) and NLG
- We model source code using call graphs
- We use both abstractive and extractive summarization for
understanding source code
- Focus on abstractive methods -- we experiment with building source representations
- For user profiling: we have access to the programmer’s interactions with code,
but also to their needs, settings, code styles and search results
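To make the call-graph idea concrete, here is a minimal sketch using only Python's standard `ast` module. It is not Deckard's parser: the function name `build_call_graph` and the sample source are assumptions made for illustration, and the sketch only records direct calls by plain name inside top-level functions.

```python
import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function name to the plain names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Register the function even if it calls nothing.
            callees = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Only direct calls by name, e.g. helper(); obj.method() is skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
    return graph


# Hypothetical sample code, used only to exercise the sketch.
SAMPLE = '''
def load(path):
    return open(path).read()

def summarise(path):
    text = load(path)
    return len(text)
'''

call_graph = build_call_graph(SAMPLE)
for caller, callees in sorted(call_graph.items()):
    print(caller, "->", sorted(callees))
```

A static pass like this misses method calls and dynamic dispatch, which is one reason to combine it with other features rather than rely on it alone.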
1. Autofolding for Source Code Summarization, Jaroslav Fowkes, Razvan Ranca, Miltiadis Allamanis, Mirella Lapata and Charles Sutton
2. Automatic Documentation Generation via Source Code Summarization of Method Context, Paul W. McBurney and Collin McMillan

Features- 1/6
- We use the tree-based TASSAL (which uses a scoped topic model) for creating
some of the source code summarization features
- We use NAMAS (attention-based summarization) for creating some of the
code summarization features
- We test code execution tools like code2flow or pycallgraph for creating
code flow features

Features- 2/6
- We use NAMAS (attention-based summarization) for creating some of
the code summarization features
- We test code execution tools like code2flow or pycallgraph for creating
code flow features

Features- 3/6
- We use NAMAS (attention-based summarization) for creating some of the
code summarization features
- We experiment with code execution tools like code2flow or pycallgraph
for creating code flow features

Features- 4/6
- We use our proprietary, language-independent, file-tree-based parser to
create:
- Call graph feature
- Code flow-related features
- Code meaning features
- Complexity-related features
- We use multi-class classification for learning about specific files
- We use RAKE (rapid automatic keyword extraction)
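As an illustration of the RAKE step, the sketch below implements the core of rapid automatic keyword extraction on a code comment: split the text into candidate phrases at stop words and punctuation, then score each phrase by the sum of degree(word) / frequency(word). The stop-word list, the function names and the sample comment are assumptions made for this example; the slides do not say how Deckard configures RAKE.

```python
import re
from collections import defaultdict

# A tiny stop-word list for the sketch; real RAKE setups use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "is", "on", "with", "by", "from"}


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into runs of content words, breaking at stop words and punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str) -> list[tuple[str, float]]:
    """Score phrases by the sum of degree(word) / frequency(word), highest first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical code comment used as input.
COMMENT = "Detect collision of the player sprite with the wall, and update the score."
keywords = rake(COMMENT)
for phrase, score in keywords:
    print(f"{score:.1f}  {phrase}")
```

Multi-word phrases such as "detect collision" score higher than single words, which is what makes the extracted keywords usable as features for a file or method.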

Features- 5/6
- We also use our proprietary file tree-based parser independent of language
to create:

Features- 6/6
- We also use our proprietary file tree-based parser independent of language
to create:

We are also researching novel approaches to
dealing with source code

Summarization leads to
transplantation
● Summarization is going to make everything clear, and clarity is going to make
more code re-usable
○ Re-usability can lead to successful code transplantation attempts
● Making code transplantation easier is going to boost software development
○ We are researching how to transplant source code to increase the capabilities of virtual
assistants

Summarization needs navigation
● When we show new (and more relevant) data to developers, they will be
solving different problems (in different ways)
○ We need to give them new ways of traversing the code and sharing code information
● You can see the current navigation in the demo!

Understanding code requires
learning paths
● All problems have follow-up problems
○ Example: searching for more specific terms like “collision detection” often indicates that
you will be trying to create a computer game or simulation
● Deckard learns not only about the current code context, but also about the
bigger picture related to the problem
○ We come up with numerous metrics measuring source code’s performance from novel
perspectives

Understanding code requires
assistance
- Why is programming machines (currently) “difficult” for humans?
- Making machines do what we imagine is tough, because we speak different languages
- Things are started but not finished, and then no one can use them
- There is lots of code and no one knows all of it: make code simpler, document it
- Many capabilities of programming languages are unknown; patching != solving
- There are many problems in software engineering that machines can solve
- Machines are already among us, but now they will be more proactive and have more
serious responsibilities

Our work
- We want machines to work in software teams together with people, so we
create proactive assistants
- We also want to transform coders into supercoders, so we re-invent source
code navigation
- Finally, we want to make source code re-usable, so we work on
summarization and code transplantation tools

Figure: autocomplete popup with loosely related suggestions (unrelated, random (), otherone, something, weeksago, duh) and their types (String, boolean, int)
● Google search: time consuming, provides pages
● Autocomplete: limited, too little documentation
● IDE code search: keyword based, no relevancy
● Search tickets/commits: messy search, few code references
● Ask someone: efficient... but high cost

TEXT EDITOR / IDE and DECKARD
● Click on or highlight any part of your code and get contextual insights dynamically and in real time
● Click on any link and navigate through the code in both directions
● Ask task-related questions and get code recommendations (from your own or open code)

API
- deckardSummarise: deckard summarises your source code.
- deckardClarity: deckard recognises typical reusable code vs unique logic.
- deckardGraph: deckard turns your source code into a knowledge graph.
- We are working on our API!

DCODE for sharing source code information
Diagram: within a team, one member shares code as a dcode:// code link; a teammate reads the code and sees it in their own IDE.
Use cases:
- Chats
- Tickets
- In-code hyperlinks
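The idea of a code link can be sketched as a small build-and-parse round trip. The slides do not give the DCODE link format, so the layout below (`dcode://<repo>/<path>?l=<line>&c=<column>`), the function names and the example repository are purely hypothetical and should not be read as the real scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_code_link(repo: str, path: str, line: int, column: int = 1) -> str:
    """Build a link in the hypothetical layout dcode://<repo>/<path>?l=<line>&c=<column>."""
    query = urlencode({"l": line, "c": column})
    return f"dcode://{repo}/{path}?{query}"


def parse_code_link(link: str) -> dict:
    """Recover the location from a link so an editor plugin could open it."""
    parts = urlsplit(link)
    if parts.scheme != "dcode":
        raise ValueError(f"not a dcode link: {link!r}")
    query = parse_qs(parts.query)
    return {
        "repo": parts.netloc,  # assumes a single-segment repository name
        "path": parts.path.lstrip("/"),
        "line": int(query["l"][0]),
        "column": int(query.get("c", ["1"])[0]),
    }


# Hypothetical example: share a location in chat, then open it elsewhere.
link = make_code_link("example-repo", "src/game/physics.py", line=42, column=7)
location = parse_code_link(link)
print(link)
print(location)
```

Because the link carries a repository-relative location instead of a local file path, the receiver's own IDE can resolve it against their checkout, which matches the "sees code in own IDE" step of the diagram.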

github.com/deckardai
- DCODE: a URL scheme for sharing source code information
- CodeSearch: a get-started tool for discovering code using graph representations
- PuppyParachute: a semi-automated testing helper for Python
- YaP: a modern shell language derived from Python

Thank you!
Let’s revolutionise development
