Growing together for a better world with Rabobank's DataWorks Summit

Growing a better world together
Rabobank Group
DataWorks Summit, Berlin, 2018

Introduction
Jeroen Wolffensperger
Solution Architect Data
jeroen.wolffensperger@rabobank.nl
[Career timeline: 1996, 2007]

Introduction
Martijn Groen MSc PMP
Rabobank Netherlands HQ
Delivery manager Data Lake & Delivery
Distribution (Client Data)
martijn.groen@rabobank.nl
[Career timeline: 1995, 1996, 2007, 2010, 2015]

Source: https://www.iea.org/newsroom/

What is needed to create all these new business models?
Lots of:
• Data
• Formats
• Speed

Data Architecture
Data product types & Development styles
[Diagram: Damhof Quadrant]
• Horizontal axis: Push/Supply/Source driven versus Pull/Demand/Product driven, separated by the Push-Pull point
• Vertical axis (Development Style): Systematic versus Opportunistic
• Quadrants: Raw & Defined data, Information Product, Ad-Hoc Data, R&D / Analytics
• Other labels: Operationalize, Process, Dataflow, Data scientist, Analysts, End-users
Source: Damhof Quadrant

Data Architecture
Data product types & Development styles
[Diagram: the quadrant of data product types (Raw & Defined data, Information Product, Ad-Hoc Data, R&D / Analytics) against development style (Systematic, Opportunistic), with Data Lab, Data Factory and Data Lake mapped onto it]

Data Architecture
[Diagram: architecture overview]
• Sources: Data Domains, External Data
• Data Lake
• Data Factory: Definition Factory, Information Factory
• Data Lab
• Provisioning: Batch services, Real-time services
• Business-value: Business Intelligence, Analytics, Marketing, On-line Services, Real-time relevance
• Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
• Monitoring (Infrastructure, Data usage)

Data Architecture building blocks
• Based on manufacturing production process
• Each building block is replaceable.
[Diagram: building blocks]
• Building blocks: Data Logistics, Data Storage, Meta Data Storage, Data Refinery, Data Provisioning
• Other labels: Transport, Compute, Catalog, Provide, Secure, Store, Resource management, Monitor, import, export
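
The "each building block is replaceable" principle can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the presenters' implementation: the `DataStorage` contract, the `InMemoryStorage` stand-in and the `DataLogistics` class are hypothetical names. The point is only that a transport block written against a small contract does not need to change when the storage backend is swapped.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class DataStorage(Protocol):
    """Hypothetical contract for the 'Data Storage' building block."""

    def store(self, key: str, record: dict) -> None: ...

    def load(self, key: str) -> dict: ...


class InMemoryStorage:
    """Stand-in backend; a real deployment might wrap HDFS or another store."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def store(self, key: str, record: dict) -> None:
        self._records[key] = record

    def load(self, key: str) -> dict:
        return self._records[key]


class DataLogistics:
    """Hypothetical transport block that depends only on the storage contract."""

    def __init__(self, storage: DataStorage) -> None:
        self._storage = storage

    def ingest(self, records: Iterable[tuple[str, dict]]) -> int:
        """Write every (key, record) pair to storage and return the count."""
        count = 0
        for key, record in records:
            self._storage.store(key, record)
            count += 1
        return count
```

Replacing `InMemoryStorage` with another class that offers the same two methods would leave `DataLogistics` untouched, which is the property the slide asks of every block.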

Data Architecture: technology
[Diagram: architecture overview repeated, highlighting Data Logistics]
Data Logistics: Big Data Management
Kafka: https://www.datanami.com/2017/08/15/kafka-helped-rabobank-modernize-alerting-system/

Why did we choose HDF – NiFi?
January 2017
• We compared: NiFi, Informatica Intelligent Streaming, StreamSets Data Collector
• NiFi has an open architecture, making it easy to create your own connectors.
• NiFi has the most functionality and is easy to use
• NiFi has the biggest user base and a very active community.
• Works well in combination with Cloudera.
• No data lineage or support for template deployment yet, but both are on the roadmap
(release 3.2).
• Informatica’s first release of Intelligent Streaming* was December 2016. The product was
not yet mature enough.
• StreamSets is 100% in-memory, whereas NiFi writes to disk. In our opinion it is less mature
than NiFi.
* Renamed to Big Data Streaming since January 2018

[Diagram: architecture overview repeated, highlighting Data Storage]
Data Storage: HDFS
[Diagram: architecture overview repeated, highlighting Data Refinery]
Data Refinery: Big Data Management
[Diagram: architecture overview repeated, highlighting Data Provisioning]

[Diagram: architecture overview repeated, highlighting Data Governance]
Data Governance: Enterprise Data Catalog, Navigator, Big Data Management

Business case: Bedrijfskompas
(Company Compass)
• We deliver insight into your financial position and a benchmark of the
performance of other companies within your own industry or sector.
• We will do this via:
• An online dashboard with a graphical presentation of your liquidity.
• Displaying the performance of your company compared to aggregated
benchmark data of peers from the sector.
• We implemented the liquidity dashboard first; it is currently available
as a stand-alone visual in our internet banking environment.

Liquidity dashboard
[Timeline diagram]
• Phases: Concept, Growth Hack, Prototype, Initial F&F Release, Final F&F Release, Pilot Bank Release, Full scale release
• Milestones: Data Lab, Start Data Lake, Connection real-time transaction data, Security, First API endpoint live, Full API live, Performance tuning, First API specification, OpenShift, Big Data Cluster, Start Front-End team

Some figures
• Business case implemented in 8 months, including the initial set-up of
infrastructure and security.
• Hortonworks DataFlow (3.0.2):
• Able to process 100,000 events per second; 0.6 GB per second.
• Initial load: 25 billion payment transactions; 7 years of history loaded in 7
hours.
• Near-real-time (NRT) load: average of 15 million transactions per day
• Current average response time per API call: < 100 ms
• Initial set-up costs are earned back via other business cases making use
of the infrastructure.
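
To put the figures above in perspective, the short calculation below derives a few rates from them. It is a back-of-the-envelope sketch: the variable names are made up for illustration, 1 GB is assumed to be 10**9 bytes, and the slide does not say how "events" map to payment transactions, so the event rate and the transaction rates are not directly comparable.

```python
# Figures quoted on the slide.
events_per_sec = 100_000
bytes_per_sec = 0.6e9              # 0.6 GB per second, assuming 1 GB = 10**9 bytes
initial_load_tx = 25_000_000_000   # 25 billion payment transactions
initial_load_hours = 7
nrt_tx_per_day = 15_000_000

# Average payload per event implied by the throughput figures, in kilobytes.
avg_event_size_kb = bytes_per_sec / events_per_sec / 1_000

# Average transaction rate during the initial load.
initial_load_tx_per_sec = initial_load_tx / (initial_load_hours * 3600)

# Average transaction rate of the near-real-time feed.
nrt_tx_per_sec = nrt_tx_per_day / 86_400

print(f"average event size:  {avg_event_size_kb:.1f} KB")
print(f"initial load rate:   {initial_load_tx_per_sec:,.0f} transactions/s")
print(f"NRT average rate:    {nrt_tx_per_sec:,.0f} transactions/s")
```

Under these assumptions the implied event size is about 6 KB, the initial load averaged roughly 992,000 transactions per second, and the near-real-time feed averages about 174 transactions per second.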

Key takeaways
• Fail fast: an experimental approach gives quick insight into the possible fit within the
overall data architecture.
• Every technology component must be replaceable, because earlier choices may
prove to be not as good as expected and Hadoop technologies change fast.
• Hire (professional services) expertise for securing your cluster. Kerberos is a
headache but necessary. We thought we had secured everything: NOT.
• Stay in control of the data provisioned via APIs.
• Data Governance is key to keep an overview of your Data Lake and also to
comply with all regulations like GDPR and BCBS 239. A good Data Catalog is a
must.

Thank you for your attention!
