TechEvent DWH Modernization
1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG
KOPENHAGEN LAUSANNE MANNHEIM MÜNCHEN STUTTGART WIEN ZÜRICH
DWH Modernization
Do I need a data lake? If yes, why?
Jan Ott
@jan_ott_ch https://janottblog.com
2. Jan Ott
Working at Trivadis for 20 years
Principal Consultant BI
Speaker at Conferences
Consultant, Trainer, Software Architect for BI: DWH & Big Data
More than 20 years of software development experience
Contact: jan.ott@trivadis.com
TechEvent September 2018, 26.09.2018
3. Agenda
1. Initial situation at the customer
2. DWH - Big Data Architecture
3. Licenses & Knowledge
4. Summary - Do I need a data lake?
5. Current and desired status
Current:
• 1 x load per week full load
• 1 x load per day CRM delta load
• Loading window getting too small
• ...
Desired:
• 1 x per day delta load
• Streaming of some data
• Platform for analytics team
• Methods to add publicly available data
• ...
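The difference between the weekly full load and the desired daily delta load can be sketched as follows. This is a minimal illustration, not the customer's actual implementation: the row layout (`id`, `modified_at`), the function names and the watermark approach are assumptions made for this example, and deleted source rows are deliberately not handled.

```python
from datetime import datetime


def full_load(source_rows):
    """Full load: rebuild the target from scratch, copying every source row."""
    return {row["id"]: row for row in source_rows}


def delta_load(source_rows, target, watermark):
    """Delta load: apply only rows changed after `watermark` (hypothetical column
    `modified_at`). Returns the new watermark and the number of rows applied."""
    changed = [row for row in source_rows if row["modified_at"] > watermark]
    for row in changed:
        target[row["id"]] = row  # insert new rows, overwrite changed ones
    new_watermark = max((row["modified_at"] for row in changed), default=watermark)
    return new_watermark, len(changed)


if __name__ == "__main__":
    monday = [
        {"id": 1, "name": "Alice", "modified_at": datetime(2018, 9, 24)},
        {"id": 2, "name": "Bob", "modified_at": datetime(2018, 9, 25)},
    ]
    target = full_load(monday)
    watermark = datetime(2018, 9, 25)

    tuesday = [
        {"id": 1, "name": "Alice M.", "modified_at": datetime(2018, 9, 26, 8)},
        monday[1],
        {"id": 3, "name": "Carol", "modified_at": datetime(2018, 9, 26, 7)},
    ]
    watermark, applied = delta_load(tuesday, target, watermark)
    print(applied, "rows applied, new watermark:", watermark)
```

The delta load touches only the changed rows, which is why it fits into a shrinking loading window where a full load no longer does.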
6. Data Warehouse Architecture
[Diagram: Sources, ETL, Data Warehouse (Staging Area, Cleansing Area, Core, Data Marts), Meta Data, BI Platform]
7. The „Big“ Shift in Analytical Data Management
Traditional BI/DWH Requirements
• Stable and Consolidated Data
• DWH as Single Point of Truth
• Business Driven Analytical Schema
• Assured Data Quality and Data History Preservation
• Governed and Secure Data to meet Compliance
Emerging Analytical Requirements
• Agile to support New Business Demands
• Support of Self-Service features
• Right-time (near real-time, not batch)
• Scales to support More Data, New Sources and Broader Use Cases
• Simplified from modelling, quality and development
Velocity, Volume, Variety
It's much more an enrichment than a substitution of the requirements!
8. Data Lake = DWH + Possibilities - Complexity?
[Diagram: data sources Internet, CRM, Event, ERP, …]
10. Reference Architecture
[Diagram: Analytical Platform Automation; Meta Data, Generator, Template; Generate Artefact; Data Lineage; Generate Tracing info]
11. Reference Architecture
[Diagram: same Analytical Platform Automation reference architecture, annotated with numbered steps 0 to 5]
12. How to do Big Data?
13. Big Data Ecosystem – many choices ….
14. Reference Architecture
[Diagram: same Analytical Platform Automation reference architecture, annotated with numbered steps 0 to 5 per component and a CONNECT component]
* DB is a logical standby
15. Key Success Factors for a Big Data Project
1. Support from Business Sponsor
2. Start with Outcome Answer First
3. Involve Real Users and Create Effective Use Cases
4. Define Quick-Win and Phasing
5. Sufficient Data Sources
6. Choose the Open Technology Platform
7. Identify SLA for Service Operation
8. Project Review
16. Big Data is still “work in progress”
Choosing the right architecture is key for any (big data) project
Big Data is still quite a young field, so there are no standard architectures available that have been used for years
In the past few years, a few architectures have evolved and have been discussed online
Know the use cases before choosing your architecture
Having one or a few reference architectures can help in choosing the right components
17. StreamSets Data Collector
Founded by ex-Cloudera and Informatica employees
Continuous open source, intent-driven, big data ingest
Visible, record-oriented approach fixes combinatorial explosion
Batch or stream processing
• Standalone, Spark cluster, MapReduce cluster
IDE for pipeline development by ‘civilians’
Relatively new - first public release September 2015
So far, vast majority of commits are from StreamSets staff
18. Apache Avro
• Row-based data serialization system
• Uses JSON-based schemas
• Uses RPC calls to send data
• Schemas are sent during data exchange
• Integrated with many languages
• Fast binary data format, or encode with JSON
{
"namespace": "trimazon.schema.customer",
"type": "record",
"name": "customer",
"fields": [
{"name": "firstName", "type":"string"},
{"name": "lastName", "type":"string"},
{"name": "age", "type":"int"},
{"name": "email", "type":"string"}
]
}
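To show what "fast binary data format" means for the customer schema above, here is a hand-written sketch of Avro's binary encoding for a record that contains only `string` and `int` fields. It is not the Avro library: the helper names (`zigzag_varint`, `encode_record`) and the sample customer values are made up for this illustration, and a real project would use an official Avro implementation instead.

```python
import json

# The customer schema from the slide, unchanged.
SCHEMA = json.loads("""
{
  "namespace": "trimazon.schema.customer",
  "type": "record",
  "name": "customer",
  "fields": [
    {"name": "firstName", "type": "string"},
    {"name": "lastName", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": "string"}
  ]
}
""")


def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer the way Avro encodes int/long values:
    zigzag mapping first, then a variable-length base-128 encoding."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_record(schema: dict, record: dict) -> bytes:
    """Encode the fields in schema order; field names are NOT written,
    which is why the reader needs the schema to decode the bytes."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            data = value.encode("utf-8")
            out += zigzag_varint(len(data)) + data  # length prefix, then bytes
        elif field["type"] == "int":
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{field['name']} must be an int")
            out += zigzag_varint(value)
        else:
            raise NotImplementedError(f"type {field['type']!r} not covered by this sketch")
    return bytes(out)


if __name__ == "__main__":
    customer = {"firstName": "Ada", "lastName": "Li", "age": 42, "email": "a@b.c"}
    encoded = encode_record(SCHEMA, customer)
    print(len(encoded), "bytes binary vs", len(json.dumps(customer)), "bytes JSON")
```

The sample record takes 14 bytes in this encoding because only the values are stored; the field names live in the schema.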
20. DWH Challenges & Key Issues – Data Warehouse Automation
Drive development performance, ensure standardization
• Automation of development tasks & generator-based standardization
Close the gap in Requirements-Development-Governance
• Closed-loop design & development process in one application
Manage the change: Lifecycle Management
• Extensive version management for documentation and impact analysis
Agility - agile data warehousing
• Automation enables short release cycles and sandboxing approaches
Achieve flexibility - support for individual architecture options
• Configurable generator is able to support real-world DWH architectures
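The generator idea behind these points can be sketched in a few lines: metadata describing a table is combined with a template to produce one artefact per DWH layer. This is only an illustration of the principle, not the tool presented in the talk; the metadata layout, the layer prefixes (`stg`, `cls`) and the function name `generate_ddl` are assumptions for this example.

```python
from string import Template

# Hypothetical template: one CREATE TABLE statement per layer.
DDL_TEMPLATE = Template("CREATE TABLE ${layer}_${table} (\n${columns}\n);")

# Hypothetical metadata for a single source table.
METADATA = {
    "table": "customer",
    "columns": [("first_name", "VARCHAR(100)"), ("age", "INTEGER")],
}


def generate_ddl(meta, layers=("stg", "cls")):
    """Generate one DDL artefact per layer from the same metadata,
    so every layer follows the same standard by construction."""
    columns = ",\n".join(f"    {name} {sql_type}" for name, sql_type in meta["columns"])
    return [
        DDL_TEMPLATE.substitute(layer=layer, table=meta["table"], columns=columns)
        for layer in layers
    ]


if __name__ == "__main__":
    for statement in generate_ddl(METADATA):
        print(statement)
        print()
```

Adding a column then means changing the metadata once and regenerating, rather than editing each layer by hand.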
21. Drive development performance, ensure standardization
Automation of development tasks
• Huge amount of recurring and monotonous development tasks
Standardization
• Standards / best practices
Reduced testing effort
• Substantial time and cost savings
[Diagram: Generator, Model Meta-definition; Data Base Objects, Mappings, Data Flow; Source, Source, Source; Staging, Cleansing, DWH-Core, Data Mart]
26. Summary
Pro:
Streaming
Platform for data analysis
Flexibility
• Different data formats
• Add new data quickly
Basis to build on
Ready for the future
More Data available
• More years
• Higher granularity
Contra:
• Cost
• Complexity
• New Knowledge required