Rabobank is a worldwide food- and agri-bank from the Netherlands. Rabobank wants to make a substantial contribution to welfare and prosperity in the Netherlands and to feeding the world sustainably. Rabobank Group operates through Rabobank and its subsidiaries in 40 countries.
Rabobank is active in both retail and wholesale banking. For our wholesale clients we provide real-time business insight using Cloudera and Hortonworks technology. An example is our recently launched service that uses benchmark information to give insight into the market performance of Rabobank customers, starting with the dairy farmers market segment. Our current technology stack contains Hortonworks Data Flow (HDF) and Cloudera Hadoop (CDH). Our real-time data stream is implemented with Kafka and NiFi from HDF. Cloudera is used to store the data needed for the business insight, mainly in HDFS and HBase.
During our presentation we will provide insight into the project approach, the architecture and the actual implementation.
Speakers
Jeroen Wolffensperger, Solution Architect Data, Rabobank
Martijn Groen, Delivery Manager Data, Rabobank Netherlands
13. What is needed to create all these new business models?
Lots of:
• Data
• Formats
• Speed
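14. Data Architecture
Data product types & Development styles
[Figure: Damhof Quadrant. Horizontal axis: Push/Supply/Source driven to Pull/Demand/Product driven, divided at the push-pull point. Vertical axis: development style, Systematic to Opportunistic.]
• Systematic, push: Raw & Defined data
• Systematic, pull: Information Product
• Opportunistic, push: Ad-Hoc Data
• Opportunistic, pull: R&D / Analytics
Other labels in the figure: Operationalize, Process, Dataflow; Data scientist, Analysts, End-users
Source: Damhof Quadrant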
15. Data Architecture
Data product types & Development styles
[Figure: the Damhof Quadrant (Raw & Defined data, Information Product, Ad-Hoc Data, R&D / Analytics; Systematic vs. Opportunistic development style; Push/Supply/Source driven vs. Pull/Demand/Product driven) with the Data Lab, Data Factory and Data Lake positioned on it.]
16. Data Architecture
[Figure: data architecture overview. Labels: Sources (Data Domains, External Data); Data Lake; Data Factory (Definition Factory, Information Factory); Data Lab; Provisioning (Batch services, Real-time services); Business value (Business Intelligence, Analytics, Marketing, On-line Services, Real-time relevance).]
Spanning the whole architecture:
• Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
• Monitoring (Infrastructure, Data usage)
17. Data Architecture building blocks
• Based on a manufacturing production process
• Each building block is replaceable.
[Figure: building blocks: Data Logistics, Data Storage, Meta Data Storage, Data Refinery, Data Provisioning. Functions shown: Transport, Compute, Catalog, Provide, Secure, Store, Resource management, Monitor, import, export.]
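One way to read "each building block is replaceable" in code is to let the other blocks depend on a small contract rather than on a concrete product. The sketch below is only an illustration under that assumption: the names DataStorage, InMemoryStorage, AuditedStorage and provision are hypothetical and do not come from the talk.

```python
from typing import Optional, Protocol


class DataStorage(Protocol):
    """Hypothetical contract for the Data Storage building block."""

    def store(self, key: str, record: dict) -> None: ...

    def fetch(self, key: str) -> Optional[dict]: ...


class InMemoryStorage:
    """Stand-in implementation; a real one might wrap HDFS or HBase."""

    def __init__(self) -> None:
        self._rows = {}

    def store(self, key: str, record: dict) -> None:
        self._rows[key] = record

    def fetch(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class AuditedStorage:
    """Alternative implementation that also records which keys were written."""

    def __init__(self) -> None:
        self._rows = {}
        self.log = []

    def store(self, key: str, record: dict) -> None:
        self.log.append(key)
        self._rows[key] = record

    def fetch(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


def provision(storage: DataStorage, key: str) -> dict:
    """Provisioning code sees only the contract, so the storage can be swapped."""
    record = storage.fetch(key)
    return record if record is not None else {}
```

Because provision accepts anything that satisfies the contract, replacing InMemoryStorage with AuditedStorage requires no change to the calling code.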
18. Data Architecture: technology
[Figure: the data architecture overview from slide 16, annotated with the Data Logistics building block. Text label: Big Data Management.]
Kafka: https://www.datanami.com/2017/08/15/kafka-helped-rabobank-modernize-alerting-system/
19. Why did we choose HDF – NiFi?
January 2017
• We compared NiFi, Informatica Intelligent Streaming and StreamSets Data Collector.
• NiFi has an open architecture, making it easy to create your own connectors.
• NiFi has the most functionality and is easy to use.
• NiFi has the biggest user base and a very active community.
• NiFi works well in combination with Cloudera.
• NiFi has no data lineage or support for template deployment yet, but both are on the roadmap (release 3.2).
• Informatica’s first release of Intelligent Streaming* was in December 2016. The product was not yet mature enough.
• StreamSets is 100% in-memory, whereas NiFi writes to disk. In our opinion it is less mature than NiFi.
* Renamed to Big Data Streaming since January 2018
20. Data Architecture: technology
[Figure: the data architecture overview from slide 16, annotated with the Data Storage building block. Text label: HDFS.]
21. Data Architecture: technology
[Figure: the data architecture overview from slide 16, annotated with the Data Refinery building block. Text label: Big Data Management.]
22. Data Architecture: technology
[Figure: the data architecture overview from slide 16, annotated with the Data Provisioning building block.]
23. Data Architecture: technology
[Figure: the data architecture overview from slide 16, annotated with Data Governance. Text labels: Enterprise Data Catalog, Navigator, Big Data Management.]
24. Business case: Bedrijfskompas (Company Compass)
• We deliver insight into your financial position and a benchmark against the performance of other companies within your own branch or sector.
• We will do this via:
• An online dashboard with a graphical presentation of your liquidity.
• A display of your company’s performance compared to aggregated benchmark data of peers from the sector.
• We implemented the liquidity dashboard first; it is currently being made accessible as a stand-alone visual via our internet banking environment.
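The peer comparison described above could be sketched roughly as follows. This is a minimal illustration, not the Bedrijfskompas implementation: the function name benchmark, the choice of the median as the aggregate, and the sample values are all assumptions made for the example.

```python
from statistics import median
from typing import Sequence


def benchmark(own_value: float, peer_values: Sequence[float]) -> dict:
    """Compare one company's figure with aggregated data of its sector peers."""
    if not peer_values:
        raise ValueError("at least one peer value is required")
    peer_median = median(peer_values)
    # Count the peers that score strictly lower than this company.
    below = sum(1 for value in peer_values if value < own_value)
    return {
        "peer_median": peer_median,
        "difference": own_value - peer_median,
        "share_of_peers_below": below / len(peer_values),
    }


# Hypothetical figures for one company and four peers in the same sector.
result = benchmark(120.0, [80.0, 100.0, 110.0, 150.0])
print(result)
```

Only aggregates (a median and a share) leave the function, which matches the slide's point that the customer sees aggregated benchmark data rather than the figures of individual peers.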
25. Liquidity dashboard
[Figure: project timeline]
Releases: Concept, Growth Hack Prototype, Initial F&F Release, Final F&F Release, Pilot Bank Release, Full scale release
Milestones shown: Data Lab; Start Data Lake; Connection real-time transaction data; Security; First API specification; First API endpoint live; Full API live; Performance tuning; OpenShift; Big Data Cluster; Start Front-End team
27. Some figures
• Business case implemented in 8 months, including the initial set-up of infrastructure and security.
• Hortonworks Data Flow (3.0.2):
• Able to process 100,000 events per second; 0.6 GB per second.
• Initial load: 25 billion payment transactions; 7 years of history loaded in 7 hours.
• Near-real-time (NRT) load: an average of 15 million transactions per day.
• Current average response time per API call: < 100 ms
• Initial set-up costs are earned back via other business cases that make use of the infrastructure.
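The figures above can be turned into per-second rates with some simple arithmetic. The sketch below assumes decimal gigabytes and an even spread of the daily load over 24 hours; neither assumption is stated on the slide.

```python
# Figures quoted on the slide.
EVENTS_PER_SEC = 100_000
BYTES_PER_SEC = 0.6e9              # 0.6 GB/s, assuming decimal gigabytes
INITIAL_TRANSACTIONS = 25e9        # 25 billion payment transactions
INITIAL_LOAD_HOURS = 7
NRT_TRANSACTIONS_PER_DAY = 15e6    # 15 million transactions per day

# Average payload per event implied by the throughput figures.
avg_event_bytes = BYTES_PER_SEC / EVENTS_PER_SEC

# Sustained rate needed to load 7 years of history in 7 hours.
initial_load_rate = INITIAL_TRANSACTIONS / (INITIAL_LOAD_HOURS * 3600)

# Average near-real-time rate, assuming an even spread over the day.
nrt_rate = NRT_TRANSACTIONS_PER_DAY / (24 * 3600)

print(f"average event size: {avg_event_bytes:,.0f} bytes")
print(f"initial load rate:  {initial_load_rate:,.0f} transactions/s")
print(f"average NRT rate:   {nrt_rate:,.0f} transactions/s")
```

The implied initial-load rate (close to a million transactions per second) is well above the quoted 100,000 events per second, so the two figures presumably describe different measurements; the slide does not say how they relate. The average NRT rate of under 200 transactions per second leaves a large margin below the quoted streaming capacity.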
28. Key takeaways
• Fail fast: an experimental approach gives quick insight into a possible fit within the overall data architecture.
• Every technology component must be replaceable, because earlier choices may prove to be not as good as expected and Hadoop technologies change fast.
• Hire (professional services) expertise for securing your cluster. Kerberos is a headache but necessary. We thought we had secured everything: NOT.
• Stay in control of the data provisioned via APIs.
• Data Governance is key to keeping an overview of your Data Lake and to complying with regulations like GDPR and BCBS 239. A good Data Catalog is a must.