Google Cloud Platform (GCP) is one of the leaders among cloud APIs. It has gained notable expansion due to its suite of public cloud services that it based on a huge, solid infrastructure. GCP allows developers to use these services by accessing GCP RESTful API that is described through HTML pages on its website. However, the documentation of GCP API is written in natural language (English prose) and therefore shows several drawbacks, such as Informal Heterogeneous Documentation, Imprecise Types, Implicit Attribute Metadata, Hidden Links, Redundancy and Lack of Visual Support. To avoid confusion and misunderstandings, the cloud developers obviously need a precise specification of the knowledge and activities in GCP. Therefore, this paper introduces GCP MODEL, an inferred formal model-driven specification of GCP which describes without ambiguity the resources offered by GCP.
DSPy a system for AI to Write Prompts and Do Fine Tuning
Automated Reverse-Engineering of a Cloud API
1. Automated Reverse-Engineering
of a Cloud API
Stéphanie Challita, PhD
Associate Professor @University of Rennes 1, ESIR & IRISA/Inria, DiverSE team
30/09/2020 Stéphanie Challita @ENS seminar
2. Brief bio
▪ Degree of Systems and Networks Engineering, 2010 - 2015
▪ Research Master’s degree in Computer Science, 2014 - 2015
▪ PhD in Computer Science, 2015 - 2018
▪ Spirals team (Lille, CRIStAL, Inria)
▪ Postdoctoral Researcher, 2019 - 2020
▪ Kairos team (Sophia Antipolis, I3S, Inria)
▪ Associate Professor, since September 2020
▪ DiverSE team (Rennes, IRISA, Inria)
30/09/2020 Stéphanie Challita @ENS seminar 2/41
3. Research project
Towards the automatic construction of reliable and co-evolving software systems
30/09/2020 Stéphanie Challita @ENS seminar 3/41
Infer
Verify
Models for Cloud, IoT…
Tests
Co-evolution
…
Scientific challenges
o Inference of Domain-Specific Modeling Languages (DSML) from APIs
• Precision & Genericity
• Learning
o Verification
• Generation of instances of inferred DSMLs
• Generation of oracles
o Co-evolution of APIs and languages
• Identify the impact of API changes on DSML
• Co-evolve the impacted components
4. Research team
30/09/2020 Stéphanie Challita @ENS seminar 4/41
▪ Modeling & Language
Engineering
▪ Advanced testing
▪ DevOps for distributed
and heterogeneous
systems
▪ Variability Engineeringhttps://www.diverse-team.fr/
5. Cloud computing
30/09/2020 Stéphanie Challita @ENS seminar
Created by Sam Johnston, downloaded from https://en.wikipedia.org/wiki/Cloud_computing
5/41
13. Research question
RQ: Is it possible to automatically extract precise models from cloud APIs and to
synchronize them with the cloud evolution?
- How to provide an accurate description for a cloud API?
- How to correct the existing drawbacks in a cloud API documentation?
- How to analyze a cloud API documentation?
30/09/2020 Stéphanie Challita @ENS seminar
Research topics: API mining, reverse-engineering, NLP
13/41
15. Cloud API Documentation
30/09/2020 Stéphanie Challita @ENS seminar
Cloud
developer/architect
Cloud
provider
An agreement with the developer on exactly how the system will operate
conformsto
Cloud
documentation
Cloud documentations are written in natural language
→ human errors and/or semantic confusions
15/41
16. Vision
▪Inferring models from cloud APIs
▪Work of API mining, reverse-engineering
▪ HTML Model
▪Model refinement (NLP techniques, graphical output…)
30/09/2020 Stéphanie Challita @ENS seminar
Text
Analysis
Engine
Generic
parser
Model
Generator
Model
Validator
Cloud API
Cloud
DSML
16/41
17. Google Cloud Platform (GCP) use case
30/09/2020 Stéphanie Challita @ENS seminar
Is partner withIs adopted by
17/41
18. List of GCP documentation drawbacks
▪Informal heterogeneous documentation
▪Imprecise types
▪Implicit attribute metadata
▪Hidden links
▪Redundancy
▪Lack of visual support
30/09/2020 Stéphanie Challita @ENS seminar 18/41
20. Implicit attribute metadata
30/09/2020 Stéphanie Challita @ENS seminar
Available at
https://cloud.google.com/compute/docs/reference/latest/networks
Available at
https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets
20/41
21. GCP snapshot
30/09/2020 Stéphanie Challita @ENS seminar
▪ GCP engineers could
update/correct GCP
documentation
▪ Continuously following up with
GCP documentation is costly
▪ Snapshot of GCP API
A
Snapshot
GCP
HTML pages
GCP
documentation
21/41
22. GCP crawler & GCP model
30/09/2020 Stéphanie Challita @ENS seminar
GCP
Crawler
A B
Snapshot
GCP
HTML pages
GCP
documentation
▪ GCP Crawler to extract all GCP resources, their
attributes and actions
▪ GCP Model for a better description of the GCP
resources
GCP
Model
C
22/41
32. GCP crawler & GCP model
30/09/2020 Stéphanie Challita @ENS seminar
GCP
Crawler
A B
Snapshot
GCP
HTML pages
GCP
documentation
▪ GCP Crawler to extract all GCP resources, their
attributes and actions
▪ GCP Model for a better description of the GCP
resources
GCP
Model
C
OCCIware
Metamodel
GCP
configuration
conforms to
represented by
Ecore
Metamodel
conforms to
M0
M1
M2
M3
GCP
model
GCP
doc
conforms to
32/41
33. GCP crawler & GCP model
30/09/2020 Stéphanie Challita @ENS seminar
GCP
Crawler
A B
Snapshot
GCP
HTML pages
GCP
documentation
GCP
Model
C
Implicit Attribute
Metadata Detection
Link Identification
Redundancy Removal
Model
Transformations
Type Refinement
Model Visualization
33/41
34. Implicit attribute metadata detection
▪ To explicitly store information into additional attributes defined in the ATTRIBUTE
concept of our GCP MODEL
▪ We use Natural Language Processing (NLP) techniques
▪ Word Tagging/Part-of- Speech (PoS)
▪ We declare pre-defined tags for some GCP specific attribute properties:
▪ mu tab le = tru e if [In p u t -O n ly ]
▪ mu tab le = false if [O u tp u t -on ly ]/ read on ly
▪ req u ired = tru e if [Req u ired ]
▪ req u ired = false if [O ption al]
▪ default = X if The default value is X
30/09/2020 Stéphanie Challita @ENS seminar 34/41
39. Perspectives
▪ More generic crawler to support any API
▪ Do not conform to OCCI anymore, other than HTML pages
▪ Implement GCP studio, a model-based framework that relies on this approach
to design and deploy GCP applications
▪ Automated approach that would automatically handle the evolution of GCP
▪ Incrementally detect streaming modifications, by calculating and modifying only the
differences between the initially processed version and the newly modified one
30/09/2020 Stéphanie Challita @ENS seminar 39/41
40. Research internship for M2 students
30/09/2020 Stéphanie Challita @ENS seminar
Research
needs you!
40/41
41. Thank you!
Stéphanie Challita, Faiez Zalila, Christophe Gourdin, Philippe Merle. “A Precise Model for Google Cloud Platform" .
IEEE International Conference on Cloud Engineering (IC2E). 2018.
https://github.com/occiware/GCP-Model
stephanie.challita@inria.fr
https://stephaniechallita.github.io/
30/09/2020 Stéphanie Challita @ENS seminar