These are some of the industries and types of use cases where we’ve enabled digital transformation
While Customer 360 isn’t an industry, it’s a use case that goes across all industries
Industry-standard Kappa/Lambda architecture for on-premises or cloud-based analytics
Many customers don’t implement this entire architecture – only components that fit their use cases.
E.g., only a data warehouse
Only a data lake
A combination: a data lake that feeds a data warehouse
Typical challenges when implementing components of this architecture:
How do we get data ingested quickly?
How do we conform data so it's ready for analytics and data scientists?
How do we become agile in our data warehouse and data integration architecture?
How can we automate these end to end processes?
Attunity’s solutions provide easy-to-use, standardized methods for creating automated data pipelines for any aspect of this architecture, ensuring you can meet your business needs while retaining the flexibility to evolve your architecture over time.
While our solutions don’t typically integrate or interact directly with the data consumer or data scientist community, we do impact those consumers and their ability to leverage the right-time information we automate and curate for them.
Discuss the Attunity components and where they fit.
Let’s look briefly at the architecture. Attunity Replicate is hosted on an intermediate Windows or Linux server that sits between one or more sources and one or more targets.
We support one-to-one (one-way or two-way), one-to-many/many-to-one (hub and spoke), and logically independent bi-directional replication topologies. Data transfer is executed in memory. Attunity Replicate is primarily focused on extracting and loading data, but it does perform light filtering and transformations; complex transformations are handled by Attunity Compose. We support a range of endpoints both on premises and in the cloud. In almost all cases we require no software to be installed on either source or target, which simplifies administration and minimizes impact on production applications. More on that to come.
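To make the topology idea concrete, here is a purely illustrative Python sketch; it is not Replicate's actual configuration format, and every endpoint name and field below is a hypothetical stand-in for what the product captures through its console.

# Hypothetical illustration only: one source fanning out to several targets
# (one-to-many / hub and spoke), with the batch load plus ongoing CDC
# coordinated on the intermediate server.
replication_task = {
    "name": "orders_to_analytics",
    "source": {"type": "oracle", "host": "prod-db.example.com"},   # no agent installed on the source
    "targets": [
        {"type": "kafka", "brokers": ["broker1:9092"]},            # streaming consumer
        {"type": "s3", "bucket": "analytics-landing"},             # data lake landing zone
    ],
    "options": {"full_load": True, "apply_changes": True},         # batch load, then CDC
}
print(replication_task["name"], "->", [t["type"] for t in replication_task["targets"]])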
Attunity Replicate automatically generates target databases based on metadata definitions in the source schema.
You can use a graphical task map to configure database schema mappings between heterogeneous sources and targets.
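As a rough sketch of the metadata-driven idea (Python, with an assumed type map and table; this is not Replicate's internal logic), target DDL can be derived directly from source column metadata:

# Illustrative only: derive a target CREATE TABLE from source column metadata.
# The type mappings and the sample table are assumptions for this example.
TYPE_MAP = {"NUMBER": "BIGINT", "VARCHAR2": "VARCHAR(255)", "DATE": "TIMESTAMP"}

def create_table_ddl(table, source_columns):
    cols = ", ".join(f"{name} {TYPE_MAP.get(src_type, 'VARCHAR(255)')}"
                     for name, src_type in source_columns)
    return f"CREATE TABLE {table} ({cols})"

print(create_table_ddl("orders", [("order_id", "NUMBER"), ("created_at", "DATE")]))
# -> CREATE TABLE orders (order_id BIGINT, created_at TIMESTAMP)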
CDC can run concurrently with a batch load, then continue upon batch completion to ensure targets remain up to date.
Any DDL changes made to source schema, such as table/column additions or changes to data types, can be replicated dynamically to the target.
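A conceptual sketch, not Replicate's engine: changes captured while the batch load runs are cached and applied once the load completes, and a DDL event (such as an added column) is translated into an ALTER on the target. All names below are hypothetical.

# Conceptual sketch only; function and field names are hypothetical.
cached_changes = []

def run_sql(sql):
    print("target <-", sql)                   # stand-in for executing SQL on the target

def apply_to_target(event):
    if event["kind"] == "ddl":                # e.g. a column added on the source
        run_sql(f"ALTER TABLE {event['table']} ADD {event['column']} {event['type']}")
    else:
        run_sql(event["dml"])                 # ordinary insert/update/delete

def on_change(event, load_in_progress):
    if load_in_progress:
        cached_changes.append(event)          # CDC runs concurrently with the batch load
    else:
        apply_to_target(event)                # steady-state CDC after the load

def on_load_complete():
    for event in cached_changes:              # drain the cache so the target catches up
        apply_to_target(event)
    cached_changes.clear()

on_change({"kind": "dml", "dml": "INSERT INTO orders VALUES (1, 19.9)"}, load_in_progress=True)
on_load_complete()
on_change({"kind": "ddl", "table": "orders", "column": "region", "type": "VARCHAR(32)"},
          load_in_progress=False)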
You can define which data to replicate, filtering by column, value range, or data type.
Users can also perform transformations such as adding, deleting, or renaming target columns, or changing data types.
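A minimal illustration (Python, with hypothetical rules and field names; not Replicate's rule syntax) of the kind of filtering and transformation described above, applied per row before it reaches the target:

# Illustrative only: column filter, value filter, rename, and type change.
RULES = {
    "columns": ["order_id", "amount", "region"],           # keep only these columns
    "where": lambda row: row.get("amount") is not None,    # value filter
    "rename": {"region": "sales_region"},                  # rename a target column
    "cast": {"amount": float},                             # change a data type
}

def transform(row):
    if not RULES["where"](row):
        return None                                        # filtered out
    return {RULES["rename"].get(c, c): RULES["cast"].get(c, lambda v: v)(row[c])
            for c in RULES["columns"]}

print(transform({"order_id": 1, "amount": "19.90", "region": "EMEA", "internal_flag": "x"}))
# -> {'order_id': 1, 'amount': 19.9, 'sales_region': 'EMEA'}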
To understand why we have invested so much in our Data Warehouse Automation technology, you have to understand the issues with the traditional method of deploying a data warehouse.
Traditional data warehouse processing doesn’t meet today’s business needs.
Data is often consumed in batch, with a large impact on source systems, and provides only end-of-day analytics.
Modeling is a manual process, which often leads to a complex ETL design and build.
DW architects have to build custom frameworks to support DevOps, data quality, and data validation.
All of this results in delayed time to market, with long, often manual coding efforts and long testing cycles.
By the time the business sees the output, it's often not what they truly wanted, not what they need, or not timely enough for them operationally.
This leads to requirement changes and a feedback loop that in turn impacts the end-to-end DW process.
When we looked at what delivering analytics- and consumer-ready data sets means, we started with our customers' needs.
Ingest the data with low-impact capture mechanisms and deliver it in real time to the lake.
This requires a write-optimized format to keep up with data changes.
Customers also insist that as data is delivered, even to data lakes, there is consistency. We handle this via our built-in partitioning mechanism (sketched below).
This is all handled by our best-of-breed CDC solution, Replicate.
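A hedged sketch of the partitioned, write-optimized landing idea (hypothetical paths and record shape; not Replicate's implementation): each change record is appended into a time-keyed partition so downstream consumers can read consistent slices.

# Illustrative only: append change records into hourly partitions in the lake.
import json, os
from datetime import datetime, timezone

def write_change(base_dir, table, record):
    partition = datetime.now(timezone.utc).strftime("%Y%m%d%H")      # hourly change partition
    path = os.path.join(base_dir, table, f"partition={partition}")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "changes.jsonl"), "a") as f:
        f.write(json.dumps(record) + "\n")                           # append-only, write-optimized

write_change("/tmp/landing", "orders", {"op": "UPDATE", "order_id": 42, "amount": 19.9})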
Customers want a standardized set of historical data that they can leverage to provision other data sets. This is our storage or assembly zone.
It provides a standardized historical view of data delivered by Replicate
But in a READ-OPTIMIZED Parquet format.
We need to deliver this at scale, and we leverage Spark to do so, which is an increasingly common customer requirement.
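As an analogy only (PySpark, with assumed paths and columns; Compose generates its own flows), the storage/assembly step amounts to reading the write-optimized change files and compacting them into read-optimized, partitioned Parquet:

# Illustrative PySpark sketch: landing-zone changes -> read-optimized Parquet history.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("assemble_storage_zone").getOrCreate()

changes = spark.read.json("/tmp/landing/orders")       # write-optimized change records
(changes
    .write.mode("append")
    .partitionBy("partition")                          # partition column inferred from the landing paths
    .parquet("/tmp/storage/orders"))                   # columnar, read-optimized history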
Customers also want to provision data sets and provide subsets or enriched data to consumers.
This means being able to treat the data lake like a database and provide a current view, a type 2 historical view with effective and end dates, or point-in-time snapshots.
For analytics consumers this again means read-optimized, columnar formats like Parquet or ORC.
Automated at scale.
This is handled by Compose. It understands the data delivered consistently by Replicate and automates the generation of Spark flows to assemble and provision data, fulfilling customers' analytics and read-optimized requirements.
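A hedged PySpark sketch of the provisioning patterns named above (the table, column names, and date are assumptions; Compose automates the real flows): a current view from open type 2 records, and a point-in-time snapshot from the effective/end dates.

# Illustrative only: current view and point-in-time snapshot over a type 2 history.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("provision_views").getOrCreate()
spark.read.parquet("/tmp/storage/customers_history") \
     .createOrReplaceTempView("customers_history")

current = spark.sql("""
    SELECT * FROM customers_history
    WHERE end_date IS NULL                                   -- open records = current view
""")

as_of = spark.sql("""
    SELECT * FROM customers_history
    WHERE effective_date <= DATE'2019-06-30'
      AND (end_date IS NULL OR end_date > DATE'2019-06-30')  -- point-in-time snapshot
""")

current.write.mode("overwrite").parquet("/tmp/provision/customers_current")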