Van Oord, a 150-year-old family-owned business, builds offshore wind parks, lays cables on the seabed, and carries out dredging and infrastructure projects (dikes, etc.). It operates worldwide, often using its own specialized vessels. A well-known, prestigious project is the creation of the palm island off the coast of Dubai.
Data management at Van Oord is still in its infancy. Current operations rely on bilateral data exchange, without an Enterprise Service Bus or major data warehouse infrastructure. In 2020 Van Oord started a PoC with Confluent Kafka, covering a wide range of use cases and requirements, followed by a formal program to implement a sustainable data platform.
Data owners publish information products. An information product is a set of Kafka topics that communicate change (à la CDC) together with topics that share the state of a data source (Kafka tables). The information product owner is responsible for granting access and for assuring data quality, data lineage, and governance. The set of all information products forms the enterprise data model.
This talk outlines why Van Oord requires data governance and enterprise architecture models integrated with Confluent Kafka, and demonstrates how an open-source-based data governance tool is integrated with Confluent Kafka to fulfil these requirements.
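The information product described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not the platform's implementation: the class `InformationProduct`, the helper `fold_changes`, and all topic, owner, and consumer names are hypothetical. It shows how a change topic (CDC-style events) relates to a state topic (the latest value per key), and that only the owner grants access.

```python
from dataclasses import dataclass, field


@dataclass
class InformationProduct:
    """Hypothetical model of an information product: topics plus an owner."""
    name: str
    owner: str
    change_topics: list[str]
    state_topics: list[str]
    readers: set[str] = field(default_factory=set)

    def grant_access(self, granted_by: str, consumer: str) -> None:
        # Only the information product owner may grant access.
        if granted_by != self.owner:
            raise PermissionError(f"{granted_by} does not own {self.name}")
        self.readers.add(consumer)


def fold_changes(events: list[dict]) -> dict:
    """Fold CDC-style change events into a key -> latest value table."""
    state: dict = {}
    for event in events:
        if event["op"] == "delete":
            state.pop(event["key"], None)
        else:  # "insert" and "update" both overwrite the current value
            state[event["key"]] = event["value"]
    return state


# All names below are made up for illustration.
product = InformationProduct(
    name="vessel-positions",
    owner="fleet-logistics",
    change_topics=["vessel.position.changes"],
    state_topics=["vessel.position.table"],
)
product.grant_access("fleet-logistics", "analytics-team")

changes = [
    {"op": "insert", "key": "vessel-1", "value": {"lat": 51.9, "lon": 4.5}},
    {"op": "update", "key": "vessel-1", "value": {"lat": 52.0, "lon": 4.4}},
    {"op": "insert", "key": "vessel-2", "value": {"lat": 25.1, "lon": 55.1}},
    {"op": "delete", "key": "vessel-2", "value": None},
]
table = fold_changes(changes)
```

After the fold, `table` holds only the latest position of `vessel-1`, which is the kind of content a state topic would share, while the list of `changes` corresponds to what a change topic would carry.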
What does an event mean? Manage the meaning of your data! | Andreas Wombacher, Aurelius Enterprise B.V and Van Oord, Marlon Hiralal
1.
What Does an Event Mean?
Manage the Meaning of Your Data
KAFKA SUMMIT AMERICAS
15 September 2021
2.
Introduction Video:
Look around - Van Oord has changed the world around you
4.
Marlon Hiralal
Enterprise Architect
20+ years of experience
Role in the Project: Overall Architect
• Extensive experience with (real-time) data management
• Led several Industrial Digital Transformation initiatives, such as Smart Factory/End2End Digital Supply Chain, IT-OT Architecture and Industrial Internet of Things/Connected Machines Things
• Worked on several streaming platforms as product manager and system/enterprise architect
5.
Andreas Wombacher
CTO and Co-Founder at Aurelius Enterprise B.V.
20+ years of experience
Role in the Project: Data Architect
• Expertise in workflow and data management, ranging from data integration, sensor data fusion, data mining, event-based systems, and data analysis
• Worked with data on different scales and abstraction levels from time series sensor data to information system or human event data
7.
We have over 250 applications of which:
8 core business IT applications
+ Critical OT (Operational Technology) applications
• No Data Integration
• No Application Integration
• No Process Integration
• No OT/IT Integration
IT Applications
- HR
- Finance
- Fleet Logistics
- Procurement
- …
OT Applications
- Vessel Data Logger
- Bathymetric Survey
- GIS
- AIS
- …
8.
Insights on Operation, Production and Cost are needed daily!
The challenges are:
• Ownership of data
• Quality
• Availability
• Combining batch and streaming data
• Different data formats
10.
Data Management Platform – Goals
• Process Integration
• Application Integration
• Data Integration
• Data Governance
• Data in Motion
11.
Functional Solution
• Data Producers (five shown in the diagram)
• Stream Processing
• Data Storage & Search
• Analysis & Visualization
• Data Governance Tool
• Enterprise Architecture Platform
• Data Consumer
13.
Digital Enterprise Architecture (Models4Insight)
§ Architecture model is like a spider web
• All concepts are related with each other
• You are interested in one concept and all its dependencies
§ Architecture model is built partly by:
• Team of architects in a collaboration
• Scripts in an automated way to reduce maintenance effort and increase speed
14.
Model extraction reduces the maintenance effort
• Conceptual data definitions and ownership
• Deployment of actual infrastructure
• Data governance
• IaC
• Micro service
• Data architecture
• Sync
15.
Change is constant
From Proof of Concept to Enterprise Solution:
- Complexity
- Number of changes
Producer, Consumer
• Which data is available on the platform?
• Where did this data come from?
• What shape does this data have?
• What does this data mean?
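The four questions above map naturally onto one metadata record per topic. The following is a minimal sketch, assuming a plain in-memory catalog; `TopicMetadata`, `describe`, and the topic and field names are hypothetical and do not reflect the data model of any specific governance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicMetadata:
    topic: str           # which data is available on the platform
    source_system: str   # where the data came from
    schema: dict         # what shape the data has (field -> type)
    meaning: dict        # what the data means (field -> description)


def describe(catalog: dict, topic: str) -> str:
    """Render one catalog entry as readable text."""
    entry = catalog[topic]
    lines = [f"{entry.topic} (from {entry.source_system})"]
    for name, type_name in entry.schema.items():
        text = entry.meaning.get(name, "no description")
        lines.append(f"  {name}: {type_name} - {text}")
    return "\n".join(lines)


# Hypothetical catalog content for illustration.
catalog = {
    "hr.employee.table": TopicMetadata(
        topic="hr.employee.table",
        source_system="HR",
        schema={"employee_id": "string", "fte": "double"},
        meaning={"fte": "Full-time equivalent of the employee"},
    )
}
summary = describe(catalog, "hr.employee.table")
```

A field without an entry in `meaning` is reported as "no description", which makes gaps in the conceptual documentation visible to consumers.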
16.
Digital Data Governance (a.k.a. DG 4.0) for data in motion
Timeline: Nineties, Early 2000s, 2020, Today
DG 1.0: No metadata ownership
Characteristics:
• Manual
• Ad hoc
DG 2.0: IT driven: metadata management for IT
Characteristics:
• IT focused
• Manual
• Data at rest
• Corrective
DG 3.0: Process driven: metadata dictionaries for data stewards
Characteristics:
• Process focused
• Manual
• Data at rest
• Descriptive
DG 4.0: Value driven: contextualized metadata for the diverse data users
Characteristics:
• Data as strategic asset
• Data at rest vs data in motion
• Empower data users
• Trustable data & analytics
• Lineage
• Proactive
18.
Scenario 1 GDPR & PII – Data steward
§ A conceptual attribute is marked as PII, and therefore all technical implementations of this attribute are also PII-relevant information
§ A classification of the attribute enables the propagation of the classification to all related fields
• Manually adding conceptual information (Excel file)
• Publish governance and classify PII attributes (micro service)
• PII classification is propagated to technical concepts (Apache Atlas)
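The propagation in Scenario 1 can be illustrated with a small in-memory sketch. It does not call Apache Atlas; the mapping `implemented_by`, the function `propagate`, and all attribute and field names are hypothetical and only show the idea of pushing a classification from a conceptual attribute to the technical fields that implement it.

```python
from collections import defaultdict

# Conceptual attribute -> technical fields implementing it (hypothetical names).
implemented_by = {
    "Employee.name": [
        "hr.employee.table:full_name",
        "finance.payroll.changes:emp_name",
    ],
    "Employee.fte": ["hr.employee.table:fte"],
}

# Classifications assigned by the data steward at the conceptual level.
conceptual_classifications = {"Employee.name": {"PII"}}


def propagate(implemented_by: dict, conceptual_classifications: dict) -> dict:
    """Copy each conceptual classification to all implementing fields."""
    field_classifications = defaultdict(set)
    for attribute, labels in conceptual_classifications.items():
        for technical_field in implemented_by.get(attribute, []):
            field_classifications[technical_field] |= labels
    return dict(field_classifications)


field_classifications = propagate(implemented_by, conceptual_classifications)
```

Here both fields implementing `Employee.name` end up classified as PII, while `hr.employee.table:fte` stays unclassified because its conceptual attribute carries no classification.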
19.
Scenario 2 Data Quality – Data steward
§ Data quality rules are specified for a technical field
§ Data quality is assessed automatically on a regular basis
§ A dataset is marked as having bad data quality
§ This classification is propagated along the data lineage/data flow
§ Data quality results are summarized in a data quality dashboard
• Data quality rules are specified per technical field (Apache Atlas)
• Data quality is assessed and documented (micro service)
• Bad quality classification propagates along data lineage (Apache Atlas)
• Bad quality results are visualized in a dashboard
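Scenario 2 combines two steps: checking rules per field, and marking everything downstream of a failing dataset. The sketch below is a simplified in-memory version under stated assumptions. The rule that an FTE value lies between 0 and 1, the functions `assess` and `downstream`, and the dataset names are all hypothetical; the real setup uses Apache Atlas and a micro service.

```python
from collections import deque


def assess(records: list, rules: dict) -> set:
    """Return the names of fields that violate their rule in any record."""
    failing = set()
    for record in records:
        for field_name, rule in rules.items():
            if not rule(record.get(field_name)):
                failing.add(field_name)
    return failing


def downstream(dataset: str, lineage: dict) -> set:
    """Breadth-first walk: the dataset plus everything reachable from it."""
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for target in lineage.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical rule: an FTE value must lie between 0 and 1.
rules = {"fte": lambda v: v is not None and 0.0 <= v <= 1.0}
records = [{"fte": 1.0}, {"fte": 1.7}]

# Hypothetical lineage: dataset -> datasets derived from it.
lineage = {
    "hr.employee.table": ["hr.headcount.table"],
    "hr.headcount.table": ["cost.dashboard"],
    "fleet.logistics.table": ["cost.dashboard"],
}

bad_quality = set()
if assess(records, rules):
    bad_quality = downstream("hr.employee.table", lineage)
```

The second record violates the rule, so the source dataset and both datasets derived from it are marked, while `fleet.logistics.table` is unaffected because it is upstream of the dashboard, not downstream of the failing dataset.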
20.
Scenario 3 Search – Data scientist
§ Find information about values saying something about a "full-time equivalent"
§ It is found in the description of the FTE attribute, where the actual data can be found in the FTE field
• Search for a key phrase (Apache Atlas)
• Find the related conceptual attribute (Apache Atlas)
• Find the related technical fields (Apache Atlas)
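The three search steps of Scenario 3 can be sketched as a lookup from key phrase to conceptual attribute to technical fields. This is an in-memory illustration only, not the Apache Atlas search API; the dictionary layout, the function `search`, and the attribute and field names are hypothetical.

```python
# Conceptual attributes with descriptions and implementing fields (made up).
conceptual_attributes = {
    "FTE": {
        "description": "Full-time equivalent of an employee's contract",
        "technical_fields": ["hr.employee.table:fte"],
    },
    "Vessel name": {
        "description": "Registered name of a vessel",
        "technical_fields": ["fleet.vessel.table:name"],
    },
}


def search(phrase: str, attributes: dict) -> dict:
    """Match the phrase against names and descriptions, case-insensitively,
    and return the technical fields of every matching conceptual attribute."""
    needle = phrase.lower()
    return {
        name: entry["technical_fields"]
        for name, entry in attributes.items()
        if needle in name.lower() or needle in entry["description"].lower()
    }


hits = search("full-time equivalent", conceptual_attributes)
```

The phrase never appears in the field name `fte` itself; it is found through the description of the conceptual attribute, which then points to the field holding the actual data.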
22.
Wrap up
• DMP captures meaning of data in motion
• Data Governance provides conceptual meaning
• Digital Enterprise Architecture provides contextual meaning
• HR: Van Oord is seen as an interesting employer for technical talent within the Netherlands