Creating a Data Distribution Knowledge Base using Neo4j, UBS

Public
11th May 2017
Syed Haniff
Using graph technologies to map and manage data flows within the Bank

Overview
• Reference data at UBS
• Building an integrated data distribution platform
• Creating a Knowledge Base using Neo4j

About UBS
• Founded 1854
• Headquarters: Zurich, Switzerland
• Operates in 50+ countries
• Around 60,000 employees
• 6 Businesses
  – Wealth Management
  – Wealth Management Americas
  – Personal & Corporate Banking
  – Asset Management
  – Investment Bank
  – Corporate Centre

About Group Data Services
Group Data Services (GDS) manages the mastering and distribution of reference data to consumers within the Bank.

Reference Data at UBS
• Externally and internally sourced non-transactional data:
  – Account
  – Book
  – Calendar
  – Client
  – Confirms
  – Financial Instrument
  – Legal Entity
  – Group Dictionary
  – Prices
  – Product
  – Trading Agreement
  – Settlement Instruction

Group Data Services in Numbers
• 12 Data Domains
• 18 Datasets
• 7 Distribution Channels
• 400+ Integrations
• Thousands of Attributes

Reference Data Distribution
Providing timely, accurate, and complete reference data to users, systems, and processes through a number of channels.

Data Distribution – Previously
Overview: [diagram]
Features
• Masters send normalized, canonical datasets.
• Consumers land and join datasets themselves.
• Good for producers (master data sources) … not so good for consumers.

Example – Consumer joins
Consumers store multiple messages from multiple domains and resolve joins themselves.

Business Drivers for Change
Driver         | Situation                                                    | Impact
Simplification | Multiple components doing the same or similar tasks          | Cost+, Complexity+
Risk Reduction | Consumers have to store and join reference data              | Data Staleness+, Potential for errors+
Efficiency     | Consumers have to receive updates they are not interested in | Storage volumes+, Processing volumes+

Distribution Platform – Blueprint
Overview: [diagram]
Features
• Single platform consuming data from masters
• Platform integrates datasets
• Custom or normalized datasets sent via standardized channels

Example – Platform joins
Data joined at source and available for multiple consumers – simplifies consumption.
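The idea of joining once at the platform and letting each consumer subscribe to the attributes it needs can be sketched roughly as follows. This is a minimal illustration, not the Bank's implementation: the attribute names, the sample rows, and the helpers `platform_join` and `subscribe` are all hypothetical.

```python
# Hypothetical messages from two master domains (Financial Instrument, Legal Entity).
instruments = [
    {"instrument_id": "I1", "name": "Bond A", "issuer_id": "LE1"},
    {"instrument_id": "I2", "name": "Equity B", "issuer_id": "LE2"},
]
legal_entities = [
    {"issuer_id": "LE1", "issuer_name": "Issuer One", "country": "CH"},
    {"issuer_id": "LE2", "issuer_name": "Issuer Two", "country": "GB"},
]


def platform_join(left, right, key):
    """Join two datasets once, at the platform, on a shared key (inner join)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def subscribe(joined, attributes):
    """Project the pre-joined dataset onto the attributes a consumer asked for."""
    return [{name: row[name] for name in attributes} for row in joined]


# The join happens once; each (hypothetical) consumer only picks attributes.
joined = platform_join(instruments, legal_entities, "issuer_id")
risk_view = subscribe(joined, ["instrument_id", "country"])
sales_view = subscribe(joined, ["name", "issuer_name"])
```

In this sketch neither consumer stores or joins the source datasets; both read from the same pre-joined result.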

Platform Benefits
• Single Platform
• Pre-joined datasets
• Flexible subscription to attributes
• More consumer-oriented …
But there are still things we'd like to know …

Data Distribution – Questions
• What datasets and attributes do we provide?
• How are the different datasets related?
• How are users receiving our data?
• Which consumers are using which attributes?
Knowledge Base

What is the Knowledge Base?
A system component that lets us describe the journey of the datasets and attributes from master systems to consumers.

Building the Knowledge Base – Example Model

Physical Model – 1.0
• Initially driven by platform (not human) requirements
• XLS + custom DSL (Domain Specific Language)
• E.g. composite INSTRUMENT dataset
  – BOND_BONDRATING, EQUITY_EQUITYRATING
  – "," is a union between two datasets
  – "_" is a join between two datasets
• Pro: innovative, and it allowed us to build the platform
• Con: limited, complex, inflexible
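To make the DSL notation concrete, here is a minimal sketch of how an expression such as the INSTRUMENT example could be read. It assumes that "_" (join) binds more tightly than "," (union), which the slide does not state. The function names `parse_composite` and `describe` are hypothetical and are not taken from the platform.

```python
from typing import List


def parse_composite(expression: str) -> List[List[str]]:
    """Parse a composite dataset expression into a union of joins.

    Assumption: "," separates union branches and "_" separates the
    datasets joined within one branch.
    """
    branches = []
    for branch in expression.split(","):
        names = [name.strip() for name in branch.split("_")]
        if not all(names):
            raise ValueError(f"empty dataset name in {expression!r}")
        branches.append(names)
    return branches


def describe(branches: List[List[str]]) -> str:
    """Render the parsed structure in a readable form."""
    return " UNION ".join("(" + " JOIN ".join(b) + ")" for b in branches)


instrument = parse_composite("BOND_BONDRATING, EQUITY_EQUITYRATING")
print(describe(instrument))
```

Under these assumptions a dataset name cannot itself contain an underscore or a comma, which hints at why a notation like this might be called limited and inflexible.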

Can it answer our questions …?

… Not really or easily anyway
• It is challenging to make a relational model that answers all the (diverse) questions
• Lots of different entities …
• Lots of different relationships …
• Not all data flows are the same …
• Tough to get the performance needed with a generic relational model

The "Eureka!" moment …
Looks like a graph … maybe we should store it as a graph(!)

Physical Model – 2.0
• Store the metamodel in a graph database
• Neo4j
  – Used in the Bank
  – Mature
  – Comprehensive resources online
  – Drivers / Adapters matching language choices

Answers to the questions …
What datasets and attributes do we provide?
Cypher query:
    MATCH (d:Dataset)-[:OWNS]->(a:PhysicalAttribute)
    RETURN d, a;

How are the different datasets related?
Cypher query:
    MATCH (d1:Dataset)<-[:JOINS]-(j:JoinRelation),
          (d2:Dataset)<-[:JOINS]-(j)
    RETURN d1, j, d2;

How are users receiving our data?
Cypher query:
    MATCH (c:Consumer)-[:RECEIVES_VIA|INTERESTED_IN]->(v)
    RETURN c, v;

Which consumers are using which attributes?
Cypher query:
    MATCH (c:Consumer)-[:INTERESTED_IN]->(view:Dataset),
          (view)-[:SELECTS]->(output:Dataset),
          (output)<-[:TARGET_OF]-(aggregation:Transformer)-[:SOURCE_OF]->(aggregate:Dataset),
          (aggregate)-[:OWNS]->(parts:Dataset),
          (parts)-[:OWNS]->(a:PhysicalAttribute)
    RETURN c, view, output, aggregation, aggregate, parts, a;
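The consumer-to-attribute query follows a fixed chain of relationships. The sketch below walks the same chain over a small in-memory edge list, to show what the traversal does step by step. The relationship types come from the query; the node names and the helpers `step` and `attributes_used_by` are hypothetical, and node labels are ignored for brevity.

```python
from collections import defaultdict

# Hypothetical edges: (start node, relationship type, end node).
EDGES = [
    ("RiskSystem", "INTERESTED_IN", "RiskView"),
    ("RiskView", "SELECTS", "InstrumentOutput"),
    ("Aggregator", "TARGET_OF", "InstrumentOutput"),
    ("Aggregator", "SOURCE_OF", "InstrumentAggregate"),
    ("InstrumentAggregate", "OWNS", "Bond"),
    ("InstrumentAggregate", "OWNS", "BondRating"),
    ("Bond", "OWNS", "isin"),
    ("BondRating", "OWNS", "rating"),
]

# Index edges in both directions so a step can follow or reverse an arrow.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for start, rel, end in EDGES:
    outgoing[(start, rel)].append(end)
    incoming[(end, rel)].append(start)


def step(nodes, rel, reverse=False):
    """Follow one relationship type from every node in `nodes`."""
    index = incoming if reverse else outgoing
    return [nxt for node in nodes for nxt in index.get((node, rel), [])]


def attributes_used_by(consumer):
    """Walk consumer -> view -> output -> transformer -> aggregate -> parts -> attributes."""
    views = step([consumer], "INTERESTED_IN")
    outputs = step(views, "SELECTS")
    transformers = step(outputs, "TARGET_OF", reverse=True)  # (output)<-[:TARGET_OF]-
    aggregates = step(transformers, "SOURCE_OF")
    parts = step(aggregates, "OWNS")
    return sorted(set(step(parts, "OWNS")))


print(attributes_used_by("RiskSystem"))
```

A graph database performs this kind of hop-by-hop traversal natively, which is why the Cypher version reads as a direct description of the path.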

Knowledge Base – Benefits
• Single source of truth
• Governance and lineage easier
• New insights for consumers
• New insights for producers!

Knowledge Base – Challenges
• Coverage – not all datasets entered yet
• Lots of data – we store source, interim, target datasets
• Concept can be a bit intangible at times

How did a graph database help?
• Data Distribution is a natural "flow" from one processing node to another
• Ad-hoc relationship traversal is difficult in relational databases
• Flexibility essential
  – New sources, datasets, consumers, rules, …
• Everything is an instance
  – The model grows organically by focusing on the relationships between processing nodes rather than on structure

Neo4j – Benefits
• Answers our questions … and more
• Flexible schema → Can model different flows
• Easy(-ish) Query Language → Cypher
• Easy to create platform service layer
• Good performance
• Good support from vendor

Neo4j – Challenges
• Loading data required manual work
• No out-of-the-box tools to manage the data
• Skills rare … but easy to grow

Next steps
• Focus on human interactions
  – Better search
  – Better visualisation
• Widen coverage of datasets
• Offer to other parts of Bank
• Impact Analysis tools
• Self-service data integration
