A webinar on the Generation and Transformation of Virtualized Assets (GeToVA) Specific Enabler (SE), developed in the FITMAN project. The SE supports Virtual Factories (VF) in the semi-automatic generation and clustering of Virtualized intangible Assets (VAaaS) from real-world semi-structured enterprise and network resources. GeToVA also enables multi-format ontology transformation between various representations of Virtualized in-/tangible Assets. Presented by Ioan Toma from STI Innsbruck.
FITMAN webinar 2015-09-21: Generation and Transformation of Virtualized Assets (GeToVA)
1. GeToVA: Generation and Transformation of Virtualized Assets
Ioan Toma, Benjamin Hiltpolt
STI Innsbruck, University of Innsbruck
Ioan.toma@sti2.at
benjamin.hiltpolt@sti2.at
FITMAN GeToVA - UIBK
2. Introduction
• GeToVA supports Virtual Factories (VF) in the semi-automatic generation and clustering of Virtualized intangible Assets (VAaaS) from real-world semi-structured enterprise and network resources.
• GeToVA enables multi-format ontology transformation between various representations of Virtualized in-/tangible Assets.
• The GeToVA specific enabler allows:
– extraction, creation, transformation, searching and clustering of asset data, reducing the manual effort
9/22/2015
8. Transformation (CV)
• CVs (in any supported format) are added to the DB (using a RESTful API) and receive an ID
• These IDs are used globally by the different components to identify a given CV
• If a requested format for a CV does not exist yet, it is created using either the EuropassFormatHandler or the Converter.
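The lookup-or-create behaviour described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeToVA implementation: `RepresentationStore`, `ensure_format` and the `producers` mapping are hypothetical names, and the toy producer stands in for the EuropassFormatHandler or the Converter.

```python
import itertools


class RepresentationStore:
    """In-memory stand-in for the CV database (hypothetical, for illustration)."""

    def __init__(self, producers):
        # producers maps a target format to (source_format, producing_function)
        self.producers = producers
        self.cvs = {}                      # cv_id -> {format: content}
        self._ids = itertools.count(1)

    def add(self, fmt, content):
        """Store a CV in one format and return its new ID."""
        cv_id = next(self._ids)
        self.cvs[cv_id] = {fmt: content}
        return cv_id

    def ensure_format(self, cv_id, fmt):
        """Return the CV in `fmt`, creating and caching it if it is missing."""
        formats = self.cvs[cv_id]
        if fmt not in formats:
            source_fmt, produce = self.producers[fmt]
            source = self.ensure_format(cv_id, source_fmt)
            formats[fmt] = produce(source)
        return formats[fmt]


# Toy producer: a real system would call the format handler or the converter.
store = RepresentationStore({
    "json-ld": ("json", lambda doc: {"@context": {}, **doc}),
})
cv_id = store.add("json", {"FirstName": "Betty"})
print(store.ensure_format(cv_id, "json-ld"))
```

Caching the created representation under the same ID means a second request for the same format is a plain lookup.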
9/22/2015 FITMAN ASSET-KIT - UIBK 8
10. Europass Format:
• Objectives:
– help citizens communicate their skills and qualifications effectively when looking for a job or training
– help employers understand the skills and qualifications of the workforce
– help education and training authorities define and communicate the content of curricula.
• Consists of:
– Curriculum vitae
– Language Passport
– Europass Mobility
– Certificate supplement
– Diploma supplement
• REST-API to convert between formats and languages (XML, ODT, PDF, DOC, JSON): http://interop.europass.cedefop.europa.eu/web-services/rest-api-reference/
11. Europass Format Handler
• Transforms between different Europass formats using the Europass web services (to be implemented)
• Transforms the Europass JSON representation to JSON-LD so that it can be converted directly to RDF (called Base RDF)
• Checks whether JSON files are valid Europass JSON files
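The JSON to JSON-LD step and the validity check can be sketched as below. Both parts are assumptions made for illustration: the slides do not show how GeToVA builds its context, so a single `@vocab` entry pointing at the `http://fitman.sti2.at/base/` namespace is used here, and `looks_like_europass` is only a rough structural test standing in for validation against the real Europass JSON schema.

```python
import json

BASE_VOCAB = "http://fitman.sti2.at/base/"  # namespace used in this deck's Base RDF triples


def looks_like_europass(doc):
    """Very rough structural check; a real check would use the Europass JSON schema."""
    passport = doc.get("SkillsPassport") if isinstance(doc, dict) else None
    return isinstance(passport, dict) and isinstance(passport.get("LearnerInfo"), dict)


def to_json_ld(doc):
    """Attach a JSON-LD context whose @vocab maps every key to the base namespace."""
    if not looks_like_europass(doc):
        raise ValueError("not a Europass JSON document")
    return {"@context": {"@vocab": BASE_VOCAB}, **doc}


europass = {
    "SkillsPassport": {
        "LearnerInfo": {
            "Identification": {
                "PersonName": {"FirstName": "Betty", "Surname": "Smith"}
            }
        }
    }
}
print(json.dumps(to_json_ld(europass), indent=2))
```

With `@vocab` in place, a JSON-LD processor expands each plain key such as `FirstName` to an IRI in the base namespace without any per-key mapping.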
13. Europass Format Handler
_:b0 <http://fitman.sti2.at/base/SkillsPassport> _:b1 .
_:b1 <http://fitman.sti2.at/base/LearnerInfo> _:b2 .
_:b2 <http://fitman.sti2.at/base/Identification> _:b26 .
_:b26 <http://fitman.sti2.at/base/PersonName> _:b41 .
_:b41 <http://fitman.sti2.at/base/FirstName> "Betty" .
_:b41 <http://fitman.sti2.at/base/Surname> "Smith" .
The JSON-LD can then be transformed into Base RDF and stored in the database
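How nested JSON turns into triples like the ones above can be illustrated with a small standard-library sketch. It is not the ruby-rdf based code GeToVA uses: `to_ntriples` is a hypothetical helper that creates one blank node per JSON object, and it ignores arrays and typed literals. Blank node labels are arbitrary, so they differ from the `_:b26` and `_:b41` labels on the slide.

```python
import itertools
import json

BASE = "http://fitman.sti2.at/base/"


def to_ntriples(doc):
    """Flatten nested dicts into N-Triples lines, one blank node per JSON object."""
    counter = itertools.count()
    lines = []

    def visit(node, subject):
        for key, value in node.items():
            predicate = f"<{BASE}{key}>"
            if isinstance(value, dict):
                # Nested object: link to a fresh blank node, then describe it.
                child = f"_:b{next(counter)}"
                lines.append(f"{subject} {predicate} {child} .")
                visit(value, child)
            else:
                # Scalar: emit a plain literal; json.dumps takes care of quoting.
                lines.append(f"{subject} {predicate} {json.dumps(str(value))} .")

    visit(doc, f"_:b{next(counter)}")
    return lines


sample = {
    "SkillsPassport": {
        "LearnerInfo": {
            "Identification": {
                "PersonName": {"FirstName": "Betty", "Surname": "Smith"}
            }
        }
    }
}
triples = to_ntriples(sample)
print("\n".join(triples))
```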
14. Converter
• The RDF created by the FormatHandler has little semantic structure
• To enable meaningful reasoning over CVs, ontology knowledge is added
• The Converter can transform between different ontologies using SPARQL CONSTRUCT queries
• Currently, the RDF created by the FormatHandler can be transformed to the Resume Ontology
15. Resume Ontology
• Developed to express the information contained in a personal resume or curriculum vitae (CV) on the Semantic Web, including work and academic experience, skills, etc. (http://rdfs.org/resume-rdf/)
• Suitable for our needs because it has several similarities to Europass
• Drawback: it is not fully compatible with Europass, so some data cannot be transformed
16. SPARQL Construct
• As shown, transforming JSON to JSON-LD to RDF is straightforward
• An easy way to work with RDF is to use SPARQL
• SPARQL CONSTRUCT queries offer a neat way to transform between different RDF representations
• (The same could be done with XSLT working on the Europass XML)
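To make the idea concrete, the sketch below shows a CONSTRUCT query as a string and a plain-Python stand-in that produces the same result on a tiny graph. Several things are assumptions: the query is not taken from GeToVA, the target terms are FOAF properties chosen only for illustration (the actual mapping to the Resume Ontology is not shown in the slides), and the query itself is not executed here; a SPARQL engine would be needed for that.

```python
BASE = "http://fitman.sti2.at/base/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Illustrative CONSTRUCT query (shown for reference, not executed in this sketch).
CONSTRUCT_QUERY = """
PREFIX base: <http://fitman.sti2.at/base/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?name foaf:firstName ?first ; foaf:familyName ?last . }
WHERE     { ?name base:FirstName ?first ; base:Surname ?last . }
"""


def construct_names(triples):
    """Plain-Python equivalent of the query above over (subject, predicate, object) tuples."""
    result = set()
    for s1, p1, first in triples:
        if p1 != BASE + "FirstName":
            continue
        for s2, p2, last in triples:
            # Join on the shared subject, as the ?name variable does in the WHERE clause.
            if s2 == s1 and p2 == BASE + "Surname":
                result.add((s1, FOAF + "firstName", first))
                result.add((s1, FOAF + "familyName", last))
    return result


base_rdf = [
    ("_:b26", BASE + "PersonName", "_:b41"),
    ("_:b41", BASE + "FirstName", "Betty"),
    ("_:b41", BASE + "Surname", "Smith"),
]
for triple in sorted(construct_names(base_rdf)):
    print(triple)
```

Triples that match no pattern, such as the `PersonName` link, are simply dropped, which is also how data without a counterpart in the target ontology gets lost.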
20. Apache Mahout is used to cluster companies based on word (term) frequency
Results are visualized and available as JSON
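Clustering on term frequency can be illustrated with a small pure-Python sketch. It does not use Apache Mahout and makes no claim about how GeToVA configures it: the k-means variant, the cosine similarity measure, the fixed seed indices and the company descriptions are all made up for the example.

```python
import json
import math
from collections import Counter


def term_frequencies(text):
    """Lower-cased bag of words with relative term frequencies."""
    words = text.lower().split()
    if not words:
        return {}
    return {word: n / len(words) for word, n in Counter(words).items()}


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds the values per term
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans(vectors, seeds, rounds=10):
    """k-means with cosine similarity; `seeds` are indices of the initial centroids."""
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(rounds):
        labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


# Fictitious company descriptions, for illustration only.
companies = {
    "LumaTech": "led lighting led modules outdoor lighting",
    "BrightWorks": "led lamps indoor lighting design",
    "CodeForge": "software development cloud software consulting",
    "DataNest": "cloud hosting software services",
}
vectors = [term_frequencies(text) for text in companies.values()]
labels = kmeans(vectors, seeds=[0, 2])
print(json.dumps(dict(zip(companies, labels))))
```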
22. REST-API
• The GeToVA SE is intended to be used via its REST-API.
• For example, in our TANET use case the extracted tenders and LinkedIn profiles are fetched via our REST-API
• Most of the functionality GeToVA provides is accessible via its rich REST-API
• The web front end that GeToVA provides uses the REST-API as well
23. REST-API
• GET POST PATCH PUT DELETE /tanet_linkedins
– Manages extracted LinkedIn profile resources
• GET /scrape_linkedin
– Scrapes a LinkedIn profile; requires sending JSON with the following structure: {'url' => "https://at.linkedin.com/in/ioantoma"}
• GET POST PATCH PUT DELETE /tenders
– Manages tender resources extracted from sell2wales
• GET POST PATCH PUT DELETE /complus
– Manages company resources extracted from led-info
• POST /fetch_led_company
– Fetches data from downloaded HTML from led-info
• GET POST PATCH PUT DELETE /tanets
– Manages data required for the TANET use case
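A client for these endpoints might build its requests as in the sketch below. The base URL, the `build_request` helper and the headers are assumptions; the slides only give the paths, the HTTP methods and the JSON body for `/scrape_linkedin`. The requests are constructed but not sent, so the example runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: a locally running GeToVA instance


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON request against the REST-API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


list_tenders = build_request("GET", "/tenders")
scrape = build_request("GET", "/scrape_linkedin",
                       {"url": "https://at.linkedin.com/in/ioantoma"})
print(scrape.get_method(), scrape.full_url, scrape.data)

# Sending is a single call once a server is reachable, for example:
# with urllib.request.urlopen(list_tenders) as response:
#     tenders = json.load(response)
```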
24. REST-API
• GET /companies/run_clustering
– Starts the clustering and gives feedback of whether the clustering is finished. Once the clustering is finished,
detailed results of the clustering are returned.
• GET /companies/clustering_visual
– Returns a compact version of the cluster results. This service is used by the visualization as well
• GET /companies/search/:search
– Runs a search for the keyword specified in :search and returns a list of matching companies
• GET POST PATCH PUT DELETE /companies
– Manages the company resources
• GET POST PATCH PUT DELETE /individuals
– Manages all the CV resources
• GET POST PATCH PUT DELETE /representations
– Manages the CV representation resources (i.e. concrete formats of a given CV, such as the CV represented as JSON)
• GET POST PATCH PUT DELETE /individual_formats
– Manages all the formats supported by the platform for transformation of people profiles
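Because `/companies/run_clustering` reports whether the clustering has finished, a client typically polls it. The sketch below shows one way to do that, together with a helper for the `:search` path segment. The response format is not shown in the slides, so the `"finished"` and `"clusters"` keys are assumptions, as are the function names; `fetch` is any callable that performs the GET and returns decoded JSON, replaced here by a fake.

```python
import time
from urllib.parse import quote


def search_path(keyword):
    """Build the path for GET /companies/search/:search, escaping the keyword."""
    return "/companies/search/" + quote(keyword, safe="")


def wait_for_clustering(fetch, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll GET /companies/run_clustering until it reports completion.

    The "finished" key is an assumption about the response format.
    """
    for _ in range(max_polls):
        result = fetch("/companies/run_clustering")
        if result.get("finished"):
            return result
        sleep(interval)
    raise TimeoutError("clustering did not finish in time")


# Fake server responses so the sketch runs without a GeToVA instance.
responses = iter([{"finished": False},
                  {"finished": True, "clusters": [[1, 2], [3]]}])
result = wait_for_clustering(lambda path: next(responses), sleep=lambda seconds: None)
print(search_path("LED lighting"), result["clusters"])
```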
25. Technical Details
• Ruby on Rails (ruby 2.1.1p76, Rails 4.1.1)
• Using https://github.com/ruby-rdf to handle Semantics
• SQLite 3 Database
• Clustering: https://mahout.apache.org/
• Visualization: http://d3js.org/
• Knowledge extraction: https://gate.ac.uk/
• Searching: Elasticsearch server https://www.elastic.co/
• Docker / Docker compose https://www.docker.com/
• Apache with Passenger in production