SlideShare une entreprise Scribd logo
1  sur  15
Bridging the Gap
John “Colonel” Kuchmek
Adam “Chief” Michalsky
WHO WE ARE
We serve a broad national footprint and a strong
local presence.
We provide services to approximately 15 million
people in 46 states and Ontario, Canada.
We employ 6,900 dedicated and active employees
and support ongoing community support and
corporate responsibility.
We treat and deliver more than one billion gallons
of water daily.
We are the largest and most geographically
diverse publicly traded water and wastewater
service provider in the Unites States.
Problem Statement
Achieve fast change data capture from SAP while providing de-
normalized data sets to end consumers without impacting the
source transactional systems.
Hana table replication maintains source system normalization
which can be a problem for business logic design in application
use
No Hana change data capture existed using denormalized table
structures
Environment
4 Management Nodes:
(32 Cores x 78 GB)
8 Compute Nodes
(32 Cores x 128 GB)
2 Management Nodes:
(6 Cores x 16 GB)
5 NiFi Nodes
(16 Cores x 64 GB)
Data Ingestion - Architecture
Runtime
SLT
SOURCE INGEST STORAGE ANALYTICS UI/UX
Adding Timestamp to SLT
Dataset Denormalization
CDC Process – High Level
Staging Delta Base
Data Ingestion
Metrics (Average Merge Time)
maintenancenotificati
onacts
meterataglance crmlongtext interactionrecords ecclongtext
maintenanceordersta
tus
contractaccountdoch
eaderfb
records 2764 15248 19958 18970 20235 8183 175433
base table 36220184 13589752 18753324 74356523 143224450 166172977 398561392
seconds 141 121 178 99 139 152 449
1
10
100
1000
1
10
100
1000
10000
100000
1000000
10000000
100000000
1E+09
#OFRECORDSLOGBASE10
Average Concurrent Merges per Table on LLAP
Metrics (Max, Min & Mean)
min(simultaneous) average(simultaneous) max(simultaneous)
records 18970 37255.85714 175433
base table 74356523 121554086 398561392
seconds 99 182.7142857 449
1
10
100
1000
10000
100000
1000000
10000000
100000000
1E+09
#OFRECORDSLOGBASE10
Min,Avg, Max Time for 7 Concurrent Merges on LLAP
Source System Load
DURATION (Seconds)
0
50,000
100,000
150,000
200,000
250,000
300,000
5/18/1813:41
5/18/1813:40
5/18/1813:40
5/18/1813:36
5/18/1813:36
5/18/1813:35
5/18/1813:35
5/18/1813:35
5/18/1813:35
5/18/1813:35
5/18/1813:02
5/18/1813:00
5/18/1812:58
5/18/1812:56
5/18/1812:56
5/18/1812:55
5/18/1812:54
5/18/1812:54
5/18/1812:54
5/18/1810:10
5/18/1810:08
5/18/1810:07
5/18/1810:04
5/18/1810:04
5/18/1810:03
5/18/1810:03
5/18/1810:03
5/18/1810:03
5/18/1810:03
#OFRECORDS
TIME
Source System Load (snapshot)
250,000-300,000
200,000-250,000
150,000-200,000
100,000-150,000
50,000-100,000
0-50,000
Average CPU Utilization
0
10
20
30
40
50
60
%UTILIZATION
TIME
Average CPU Usage Accross 8 Node Cluster
Average of Minimum CPU Load
Average of Average CPU Load
Average of Peak CPU Load
Average Memory Used (hourly)
0
10
20
30
40
50
60
70
80
90
100
MEMORYINGB
TIME
Average Memory Used Across 8 Node Cluster
Average of Minimum Memory Used
Average of Average Memory Used
Average of Peak Memory Used
THANK YOU

Contenu connexe

Tendances (20)

Big Data
Big DataBig Data
Big Data
 
Appunti di big data
Appunti di big dataAppunti di big data
Appunti di big data
 
07 big data sgbd
07 big data sgbd07 big data sgbd
07 big data sgbd
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
MLOps - Getting Machine Learning Into Production
MLOps - Getting Machine Learning Into ProductionMLOps - Getting Machine Learning Into Production
MLOps - Getting Machine Learning Into Production
 
Big Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
Big Data - Yesterday, Today and Tomorrow by John Mashey, TechviserBig Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
Big Data - Yesterday, Today and Tomorrow by John Mashey, Techviser
 
01 intro
01 intro01 intro
01 intro
 
Big data issues and challenges
Big data issues and challengesBig data issues and challenges
Big data issues and challenges
 
Big data
Big dataBig data
Big data
 
memory management of windows vs linux
memory management of windows vs linuxmemory management of windows vs linux
memory management of windows vs linux
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
 
Big data
Big dataBig data
Big data
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
OLAP v/s OLTP
OLAP v/s OLTPOLAP v/s OLTP
OLAP v/s OLTP
 
OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013
 
Data Science: Past, Present, and Future
Data Science: Past, Present, and FutureData Science: Past, Present, and Future
Data Science: Past, Present, and Future
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 

Similaire à Bridging the gap: achieving fast data synchronization from SAP HANA by leveraging Hadoop

SAP CDC and NiFi Flow as a Service
SAP CDC and NiFi Flow as a ServiceSAP CDC and NiFi Flow as a Service
SAP CDC and NiFi Flow as a ServiceJohn Kuchmek
 
IoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsIoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsDataWorks Summit
 
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jWebinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jNeo4j
 
SAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing Platform
SAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing PlatformSAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing Platform
SAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing PlatformAmazon Web Services
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL ServerStephen Rose
 
Vargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtVargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtGenoveva Vargas-Solar
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1blewington
 
DOWNSAMPLING DATA
DOWNSAMPLING DATADOWNSAMPLING DATA
DOWNSAMPLING DATAInfluxData
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 editionBob Ward
 
SAP ASE 16 SP02 Performance Features
SAP ASE 16 SP02 Performance FeaturesSAP ASE 16 SP02 Performance Features
SAP ASE 16 SP02 Performance FeaturesSAP Technology
 
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsKinetica
 
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...Matt Stubbs
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
 
Storage Efficiency Customer Success Stories Sept 2010 power point
Storage Efficiency Customer Success Stories Sept 2010 power pointStorage Efficiency Customer Success Stories Sept 2010 power point
Storage Efficiency Customer Success Stories Sept 2010 power pointMichael Hudak
 
Realtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingRealtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingSantosh Sahoo
 

Similaire à Bridging the gap: achieving fast data synchronization from SAP HANA by leveraging Hadoop (20)

SAP CDC and NiFi Flow as a Service
SAP CDC and NiFi Flow as a ServiceSAP CDC and NiFi Flow as a Service
SAP CDC and NiFi Flow as a Service
 
IoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsIoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management Things
 
Teradata a z
Teradata a zTeradata a z
Teradata a z
 
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4jWebinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
Webinar: Large Scale Graph Processing with IBM Power Systems & Neo4j
 
SAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing Platform
SAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing PlatformSAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing Platform
SAP HANA - The Foundation of Real Time, Now on the AWS Cloud Computing Platform
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
Vargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtVargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbt
 
Netapp Storage
Netapp StorageNetapp Storage
Netapp Storage
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
DOWNSAMPLING DATA
DOWNSAMPLING DATADOWNSAMPLING DATA
DOWNSAMPLING DATA
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
SAP ASE 16 SP02 Performance Features
SAP ASE 16 SP02 Performance FeaturesSAP ASE 16 SP02 Performance Features
SAP ASE 16 SP02 Performance Features
 
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
 
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infra...
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
 
Storage Efficiency Customer Success Stories Sept 2010 power point
Storage Efficiency Customer Success Stories Sept 2010 power pointStorage Efficiency Customer Success Stories Sept 2010 power point
Storage Efficiency Customer Success Stories Sept 2010 power point
 
Realtime Reporting using Spark Streaming
Realtime Reporting using Spark StreamingRealtime Reporting using Spark Streaming
Realtime Reporting using Spark Streaming
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Dernier (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Bridging the gap: achieving fast data synchronization from SAP HANA by leveraging Hadoop

  • 1. Bridging the Gap John “Colonel” Kuchmek Adam “Chief” Michalsky
  • 2. WHO WE ARE We serve a broad national footprint and a strong local presence. We provide services to approximately 15 million people in 46 states and Ontario, Canada. We employ 6,900 dedicated and active employees and support ongoing community support and corporate responsibility. We treat and deliver more than one billion gallons of water daily. We are the largest and most geographically diverse publicly traded water and wastewater service provider in the Unites States.
  • 3. Problem Statement Achieve fast change data capture from SAP while providing de- normalized data sets to end consumers without impacting the source transactional systems. Hana table replication maintains source system normalization which can be a problem for business logic design in application use No Hana change data capture existed using denormalized table structures
  • 4. Environment 4 Management Nodes: (32 Cores x 78 GB) 8 Compute Nodes (32 Cores x 128 GB) 2 Management Nodes: (6 Cores x 16 GB) 5 NiFi Nodes (16 Cores x 64 GB)
  • 5. Data Ingestion - Architecture Runtime SLT SOURCE INGEST STORAGE ANALYTICS UI/UX
  • 8. CDC Process – High Level Staging Delta Base
  • 10. Metrics (Average Merge Time) maintenancenotificati onacts meterataglance crmlongtext interactionrecords ecclongtext maintenanceordersta tus contractaccountdoch eaderfb records 2764 15248 19958 18970 20235 8183 175433 base table 36220184 13589752 18753324 74356523 143224450 166172977 398561392 seconds 141 121 178 99 139 152 449 1 10 100 1000 1 10 100 1000 10000 100000 1000000 10000000 100000000 1E+09 #OFRECORDSLOGBASE10 Average Concurrent Merges per Table on LLAP
  • 11. Metrics (Max, Min & Mean) min(simultaneous) average(simultaneous) max(simultaneous) records 18970 37255.85714 175433 base table 74356523 121554086 398561392 seconds 99 182.7142857 449 1 10 100 1000 10000 100000 1000000 10000000 100000000 1E+09 #OFRECORDSLOGBASE10 Min,Avg, Max Time for 7 Concurrent Merges on LLAP
  • 12. Source System Load DURATION (Seconds) 0 50,000 100,000 150,000 200,000 250,000 300,000 5/18/1813:41 5/18/1813:40 5/18/1813:40 5/18/1813:36 5/18/1813:36 5/18/1813:35 5/18/1813:35 5/18/1813:35 5/18/1813:35 5/18/1813:35 5/18/1813:02 5/18/1813:00 5/18/1812:58 5/18/1812:56 5/18/1812:56 5/18/1812:55 5/18/1812:54 5/18/1812:54 5/18/1812:54 5/18/1810:10 5/18/1810:08 5/18/1810:07 5/18/1810:04 5/18/1810:04 5/18/1810:03 5/18/1810:03 5/18/1810:03 5/18/1810:03 5/18/1810:03 #OFRECORDS TIME Source System Load (snapshot) 250,000-300,000 200,000-250,000 150,000-200,000 100,000-150,000 50,000-100,000 0-50,000
  • 13. Average CPU Utilization 0 10 20 30 40 50 60 %UTILIZATION TIME Average CPU Usage Accross 8 Node Cluster Average of Minimum CPU Load Average of Average CPU Load Average of Peak CPU Load
  • 14. Average Memory Used (hourly) 0 10 20 30 40 50 60 70 80 90 100 MEMORYINGB TIME Average Memory Used Across 8 Node Cluster Average of Minimum Memory Used Average of Average Memory Used Average of Peak Memory Used

Notes de l'éditeur

  1. In HANA studio we can de-normalize the datasets.
  2. The end result in HANA will look like this. UPDATE_TS is our timestamp field. Special Notes: A timestamp will only be updated once a change occurs. After initial replication timestamps will be null or 0. If you want to add a timestamp on a table that already exists on SLT then it needs to be re-replicated.
  3. In HANA studio we can de-normalize the datasets.