The "Babelfish" system is written in Scala and runs on the Java Virtual Machine. Graphs are persisted in a Neo4j database with a Lucene index. A generic importer module reads data from various sources and persists it in a version-aware way, using the domain model as a schema. Our domain-specific language (DSL) uses the same schema to verify queries statically. Query results take the form of either graphs or tables; table results go through an additional step in which an in-memory SQL database processes them further. DSL queries can be submitted via a REST interface, and the server serializes the results with json4s. Both the REST interface and the deployable WAR file are produced with the Scalatra web framework.
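The split between graph results and table results described above can be sketched as a small algebraic data type. This is a minimal illustration under our own assumptions: the names `QueryResult`, `GraphResult`, `TableResult` and `postprocess` are hypothetical, and the in-memory SQL database is replaced by an arbitrary function on tables.

```scala
// Hypothetical types sketching the two result forms; not Babelfish's actual API.
object ResultSketch {
  final case class Node(id: Long, props: Map[String, Any])
  final case class Edge(from: Long, to: Long, label: String)

  sealed trait QueryResult
  // A graph result: the matched subgraph.
  final case class GraphResult(nodes: Set[Node], edges: Set[Edge]) extends QueryResult
  // A table result: named columns and rows of values.
  final case class TableResult(columns: Vector[String], rows: Vector[Vector[Any]]) extends QueryResult

  // Only table results pass through the extra postprocessing step
  // (an in-memory SQL database in Babelfish, a plain function in this sketch).
  def postprocess(result: QueryResult, step: TableResult => TableResult): QueryResult =
    result match {
      case t: TableResult => step(t)
      case g: GraphResult => g
    }
}
```

A sealed trait lets the compiler check that every result form is handled wherever results are consumed, for example in the REST layer before serialization.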
Project "Babelfish" - A data warehouse to attack complexity
1. Project "Babelfish": A data warehouse to attack complexity
Prof. Dr. Christoph Denzler & Daniel Kröni
{christoph.denzler, daniel.kroeni}@Inw.ch
2. Starting Position
• Finnova is a software house developing a bankware solution for universal banks.
• About 300 employees, 200 of them in development, engineering, application management and customer care
• Banking system
  – more than 7 million lines of code
  – controlled by 15'000 parameters
  – around 2000 UI screens
3. Inception
• Software grew over the past 15 years
  – approx. 13 person-years of development per month
• Architectural challenges
  – new business models
  – new regulations
  – international customers
  – bigger customers
  – new technologies
→ How can we keep track of
  – architecture
  – code
  – tests
  – customer parametrization
  – bug reports
  – change requests
  – developer output?
5. Concrete Problems
• The business logic is changed. In which GUIs will this be visible?
• A customer reports a bug on screen XY. Which parts of the code handle this screen and its data? Which developer is responsible for this code?
• Does a new function break architectural guidelines? E.g., does it introduce dependency loops?
• Which modules of the software do not have to be taken offline during a system upgrade?
• Which tests need to be rerun after a change in code?
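The dependency-loop question above can be answered with a depth-first search over the module dependency graph. The following is a minimal sketch, not taken from Babelfish, assuming dependencies are given as a hypothetical `Map` from each module to the set of modules it depends on.

```scala
import scala.collection.mutable

object DependencyLoops {
  // deps(m) = modules that m depends on (hypothetical input format).
  // Returns one dependency loop as a list of modules (first == last), if any.
  def findCycle(deps: Map[String, Set[String]]): Option[List[String]] = {
    val all = (deps.keySet ++ deps.values.flatten).toList.sorted
    val onStack = mutable.LinkedHashSet.empty[String] // current DFS path, in order
    val done = mutable.Set.empty[String]              // fully explored modules

    def visit(m: String): Option[List[String]] =
      if (onStack.contains(m)) {
        // m is already on the current path: the loop runs from m back to m.
        Some(onStack.toList.dropWhile(_ != m) :+ m)
      } else if (done.contains(m)) None
      else {
        onStack += m
        val found = deps.getOrElse(m, Set.empty[String]).toList.sorted.iterator
          .map(visit).collectFirst { case Some(c) => c }
        onStack -= m
        done += m
        found
      }

    all.iterator.map(visit).collectFirst { case Some(c) => c }
  }
}
```

For `Map("a" -> Set("b"), "b" -> Set("c"), "c" -> Set("a"))` the search reports the loop `a → b → c → a`; an acyclic map yields `None`.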
6. Expectations
• Improve quality of the bankware solution by
  – earlier detection of architecture violations
• Improve issue handling by
  – faster localization of bugs
• Improve testing by
  – testing only what has changed
• Improve stability by
  – reliable dependency information during deployment and production
10. Core System
• Version-aware API
• Access the graph as of a specific version
• Query what changed
  – when, most often, together, ...
• Mapping of versioned nodes to DB nodes
[Figure: core-system layers Versioning / Schema / DSL; a logical node (name: "Credit", LOC: 832) is stored as a versioned node whose LOC value is 750 from version 1 to 12 and 832 from version 13 onward (to: _)]
Logical quantities: #Nodes 2'046'128, #Edges 4'292'867
Storage quantities: #Nodes ~6'300'000, #Edges >15'000'000
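A minimal sketch of the versioning idea shown in the figure, under our own assumptions: each stored value carries a validity interval `[from, to]` of version numbers, and the open upper bound (`to: _` in the figure) is modeled as `None`. The names `Entry`, `asOf`, `changedAt` and `update` are hypothetical and do not describe the actual mapping onto Neo4j nodes.

```scala
object Versioned {
  // One stored value with its validity interval [from, to]; to = None means "still valid".
  final case class Entry[A](value: A, from: Int, to: Option[Int])

  // LOC of package "Credit" as in the figure: 750 in versions 1-12, 832 from version 13 on.
  val creditLoc: Seq[Entry[Long]] = Seq(Entry(750L, 1, Some(12)), Entry(832L, 13, None))

  // Value as of version v, if some entry's interval covers v.
  def asOf[A](entries: Seq[Entry[A]], v: Int): Option[A] =
    entries.find(e => e.from <= v && e.to.forall(v <= _)).map(_.value)

  // Versions in which a new value took effect (every entry after the first).
  def changedAt[A](entries: Seq[Entry[A]]): Seq[Int] =
    entries.map(_.from).sorted.drop(1)

  // Record a new value from version v on: close the open entry, append a new open one.
  def update[A](entries: Seq[Entry[A]], v: Int, value: A): Seq[Entry[A]] =
    entries.map(e => if (e.to.isEmpty) e.copy(to = Some(v - 1)) else e) :+ Entry(value, v, None)
}
```

For the figure's data, `asOf(creditLoc, 5)` gives `Some(750L)`, `asOf(creditLoc, 13)` gives `Some(832L)`, and `changedAt(creditLoc)` gives `Seq(13)`.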
11. Core System
• Domain model
• Common vocabulary with the partner
• Index
• Query language
[Figure: core-system layers Versioning / Schema / DSL; example schema with node types Package (name: String, LOC: Long) and Release (id: Long, name: String) and edge types Calls and Contains]
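One way a Scala DSL can check queries against such a schema at compile time is to encode each property together with its value type. The sketch below is our own assumption, not Babelfish's actual encoding: `Prop`, `EdgeType`, `StoredNode`, `get` and `where` are hypothetical names, `where` here is a plain node predicate rather than a traversal step, and `Package.Name` merely mirrors the identifier used in the query examples.

```scala
object SchemaSketch {
  // A property of a node type, tagged with its Scala value type T.
  final case class Prop[T](nodeLabel: String, name: String)
  final case class EdgeType(name: String)

  // The schema from the slide, written as Scala values.
  object Package {
    val Name: Prop[String] = Prop("Package", "name")
    val LOC: Prop[Long] = Prop("Package", "LOC")
  }
  object Release {
    val Id: Prop[Long] = Prop("Release", "id")
    val Name: Prop[String] = Prop("Release", "name")
  }
  val Calls = EdgeType("Calls")
  val Contains = EdgeType("Contains")

  // Untyped node, as a graph store might hand it back.
  final case class StoredNode(label: String, props: Map[String, Any])

  // Typed read of a schema property; None if the node has another type or lacks the value.
  def get[T](n: StoredNode, p: Prop[T]): Option[T] =
    if (n.label == p.nodeLabel) n.props.get(p.name).map(_.asInstanceOf[T]) else None

  // The value type is fixed by the property, so where(Package.Name)(42) does not compile.
  def where[T](p: Prop[T])(value: T): StoredNode => Boolean =
    n => get(n, p).contains(value)
}
```

Because `Prop[T]` is invariant in `T`, a query comparing a `String` property with a number is rejected by the compiler rather than failing at run time.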
12. Core System
• Custom query language
• Schema-aware
• Version-aware
• Fast graph traversal
• Describing the structure of paths the way a formal grammar does
• Collecting properties along the way
• SQL postprocessing
• Implemented as an internal Scala DSL
• Easy to extend
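To illustrate how grammar-style combinators describe paths, the sketch below builds patterns over edge labels as an internal Scala DSL with `~`, `|`, `?`, `*` and `+`, and checks whether a given sequence of labels matches. It is a simplified, hypothetical model: the names `Pat`, `Lbl`, `rest` and `matches` are ours, and it matches a finished list of labels rather than traversing a graph.

```scala
object PathGrammar {
  // A path pattern over edge labels, composed like a regular grammar.
  sealed trait Pat {
    def ~(that: Pat): Pat = Seq2(this, that)     // sequence
    def |(that: Pat): Pat = Alt(this, that)      // alternative
    def ? : Pat = Alt(this, Eps)                 // optional
    def * : Pat = Star(this)                     // zero or more
    def + : Pat = Seq2(this, Star(this))         // one or more
  }
  case object Eps extends Pat
  final case class Lbl(name: String) extends Pat
  final case class Seq2(a: Pat, b: Pat) extends Pat
  final case class Alt(a: Pat, b: Pat) extends Pat
  final case class Star(p: Pat) extends Pat

  // All suffixes of `path` that remain after matching some prefix with `p`.
  def rest(p: Pat, path: List[String]): Set[List[String]] = p match {
    case Eps => Set(path)
    case Lbl(n) => path match {
      case `n` :: tail => Set(tail)
      case _           => Set.empty
    }
    case Seq2(a, b) => rest(a, path).flatMap(rest(b, _))
    case Alt(a, b)  => rest(a, path) ++ rest(b, path)
    case Star(q) =>
      // Zero repetitions, or one repetition that consumes input followed by more.
      Set(path) ++ rest(q, path).filter(_.length < path.length).flatMap(rest(Star(q), _))
  }

  // A path matches if some way of matching consumes it completely.
  def matches(p: Pat, path: List[String]): Boolean = rest(p, path).contains(Nil)
}
```

With `val calls = Lbl("Calls")`, the pattern `calls.+` accepts `List("Calls", "Calls")` but not the empty path. Defining the combinators as methods on a trait is what makes such a DSL easy to extend: a new combinator is one more method and one more case.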
13. Query Language: Basics
• Schema-aware
  – refer to nodes / edges / properties
• Graph navigation primitives
  – V, E, inE, outV, outE, inV
• Grammar-style combinators
  – ~, |, ?, *, +
[Figure: the navigation primitives outE, inV, inE and outV, with the shortcuts out and in]
V(Package) ~ where(Package.Name)("Log") ~ in(_Calls_).+
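The example query selects the package named "Log" and then follows incoming `Calls` edges one or more times, which (assuming a `Calls` edge points from caller to callee) yields every package that calls Log directly or transitively. The sketch below evaluates a query of the same shape over a tiny in-memory graph; `Step`, `whereName` and the string-based `V` and `in` are simplified, hypothetical stand-ins for the schema-aware DSL.

```scala
object TraversalSketch {
  final case class Node(id: Int, label: String, name: String)
  final case class Edge(from: Int, to: Int, label: String)
  final case class Graph(nodes: Seq[Node], edges: Seq[Edge])

  // A step maps the set of current node ids to the next set.
  final case class Step(run: (Graph, Set[Int]) => Set[Int]) {
    // Sequence: apply this step, then `next`.
    def ~(next: Step): Step = Step((g, s) => next.run(g, run(g, s)))
    // One or more repetitions: all nodes reachable in at least one step.
    def + : Step = Step { (g, s) =>
      var frontier = run(g, s)
      var acc = frontier
      while (frontier.nonEmpty) {
        frontier = run(g, frontier) -- acc // only nodes not seen before
        acc ++= frontier
      }
      acc
    }
  }

  // Start at all nodes with the given label.
  def V(label: String): Step =
    Step((g, _) => g.nodes.filter(_.label == label).map(_.id).toSet)
  // Keep current nodes whose name equals `name`.
  def whereName(name: String): Step =
    Step((g, s) => g.nodes.filter(n => s(n.id) && n.name == name).map(_.id).toSet)
  // Follow incoming edges with the given label back to their source (inE, then outV).
  def in(label: String): Step =
    Step((g, s) => g.edges.filter(e => e.label == label && s(e.to)).map(_.from).toSet)
}
```

In a graph where Credit calls Log and Ui calls Credit, `V("Package") ~ whereName("Log") ~ in("Calls").+` returns the ids of Credit and Ui.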
18. Query Language: Extensions
• Labeling
  – name values for later processing
• Extraction
  – select what you want in your table
• SQL postprocessing
  – SQL is nice for aggregation
{
V(Package)
~
in(_Calls_).+
~
get(Package.Name).as("n")
}
extract
{
"n"
}
sql
{
"SELECT
n
FROM
t1
ORDER
BY
n
DESC"
}
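The labeling, extraction and SQL steps of this query can be mimicked with plain Scala collections. In Babelfish the last step runs in an in-memory SQL database; to stay self-contained, the sketch below swaps it for an equivalent descending sort. `Row`, `extract` and `orderByDesc` are hypothetical names.

```scala
object ExtractSketch {
  // One traversal result: label -> collected value (strings only, for simplicity).
  type Row = Map[String, String]

  // extract { "n" }: keep only the requested labels, in the given order.
  // Throws NoSuchElementException if a row lacks a requested label.
  def extract(rows: Seq[Row], labels: String*): Vector[Vector[String]] =
    rows.map(r => labels.toVector.map(r)).toVector

  // Stand-in for: SELECT n FROM t1 ORDER BY n DESC
  def orderByDesc(table: Vector[Vector[String]], col: Int): Vector[Vector[String]] =
    table.sortBy(_(col))(Ordering[String].reverse)
}
```

For labeled rows with `n` bound to "Credit", "Ui" and "Batch", extracting `n` and sorting descending gives Ui, Credit, Batch. Aggregations such as `GROUP BY` are where handing the table to a real SQL engine pays off over hand-written collection code.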