These are some of the industries and types of use cases where we’ve enabled digital transformation
While Customer 360 isn’t an industry, it’s a use case that goes across all industries
Industry-standard Kappa/Lambda architecture for on-premises or cloud-based analytics
Many customers don’t implement this entire architecture – only components that fit their use cases.
E.g., only a data warehouse
Only a data lake
A combination: a data lake that feeds a data warehouse
Typical challenges when implementing components of this architecture:
How do we get data ingested quickly?
How do we conform data so it's ready for analytics and data scientists?
How do we become agile in our data warehouse and data integration architecture?
How can we automate these end to end processes?
Attunity’s solutions provide easy-to-use, standardized methods for creating automated data pipelines for any aspect of this architecture, ensuring you can meet your business needs while retaining the flexibility to evolve your architecture over time.
While our solutions don’t typically integrate or interact directly with the data consumer or data scientist community, we do impact those consumers and their ability to leverage the right-time information we automate and curate for them.
Discuss the Attunity components and where they fit.
Let’s look briefly at the architecture. Attunity Replicate is hosted on an intermediate Windows or Linux server that sits between one or more sources and one or more targets.
We support one-to-one (one-way or two-way), one-to-many/many-to-one (hub and spoke), and logically independent bi-directional replication topologies. Data transfer is executed in memory. Attunity Replicate is primarily focused on extracting and loading data, but it does perform light filtering and transformations; complex transformations are handled by Attunity Compose. We support a range of endpoints both on premises and in the cloud. In almost all cases we require no software to be installed on either source or target, which simplifies administration and minimizes impact on production applications. More on that to come.
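To make the topology idea concrete, here is a purely illustrative Python sketch; it is not Replicate's actual configuration format, and every endpoint name and field below is a hypothetical stand-in for what the product captures through its console.

# Hypothetical illustration only: one source fanning out to several targets
# (one-to-many / hub and spoke), with the batch load plus ongoing CDC
# coordinated on the intermediate server.
replication_task = {
    "name": "orders_to_analytics",
    "source": {"type": "oracle", "host": "prod-db.example.com"},   # no agent installed on the source
    "targets": [
        {"type": "kafka", "brokers": ["broker1:9092"]},            # streaming consumer
        {"type": "s3", "bucket": "analytics-landing"},             # data lake landing zone
    ],
    "options": {"full_load": True, "apply_changes": True},         # batch load, then CDC
}
print(replication_task["name"], "->", [t["type"] for t in replication_task["targets"]])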
Attunity Replicate automatically generates target databases based on metadata definitions in the source schema.
You can use a graphical task map to configure database schema mappings between heterogeneous sources and targets.
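As a rough sketch of the metadata-driven idea (Python, with an assumed type map and table; this is not Replicate's internal logic), target DDL can be derived directly from source column metadata:

# Illustrative only: derive a target CREATE TABLE from source column metadata.
# The type mappings and the sample table are assumptions for this example.
TYPE_MAP = {"NUMBER": "BIGINT", "VARCHAR2": "VARCHAR(255)", "DATE": "TIMESTAMP"}

def create_table_ddl(table, source_columns):
    cols = ", ".join(f"{name} {TYPE_MAP.get(src_type, 'VARCHAR(255)')}"
                     for name, src_type in source_columns)
    return f"CREATE TABLE {table} ({cols})"

print(create_table_ddl("orders", [("order_id", "NUMBER"), ("created_at", "DATE")]))
# -> CREATE TABLE orders (order_id BIGINT, created_at TIMESTAMP)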
CDC can run concurrently with a batch load, then continue upon batch completion to ensure targets remain up to date.
Any DDL changes made to source schema, such as table/column additions or changes to data types, can be replicated dynamically to the target.
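A conceptual sketch, not Replicate's engine: changes captured while the batch load runs are cached and applied once the load completes, and a DDL event (such as an added column) is translated into an ALTER on the target. All names below are hypothetical.

# Conceptual sketch only; function and field names are hypothetical.
cached_changes = []

def run_sql(sql):
    print("target <-", sql)                   # stand-in for executing SQL on the target

def apply_to_target(event):
    if event["kind"] == "ddl":                # e.g. a column added on the source
        run_sql(f"ALTER TABLE {event['table']} ADD {event['column']} {event['type']}")
    else:
        run_sql(event["dml"])                 # ordinary insert/update/delete

def on_change(event, load_in_progress):
    if load_in_progress:
        cached_changes.append(event)          # CDC runs concurrently with the batch load
    else:
        apply_to_target(event)                # steady-state CDC after the load

def on_load_complete():
    for event in cached_changes:              # drain the cache so the target catches up
        apply_to_target(event)
    cached_changes.clear()

on_change({"kind": "dml", "dml": "INSERT INTO orders VALUES (1, 19.9)"}, load_in_progress=True)
on_load_complete()
on_change({"kind": "ddl", "table": "orders", "column": "region", "type": "VARCHAR(32)"},
          load_in_progress=False)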
You can define which data to replicate, filtering by column, value range, or data type.
Users can also perform transformations such as adding, deleting, or renaming target columns, or changing data types.
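A minimal illustration (Python, with hypothetical rules and field names; not Replicate's rule syntax) of the kind of filtering and transformation described above, applied per row before it reaches the target:

# Illustrative only: column filter, value filter, rename, and type change.
RULES = {
    "columns": ["order_id", "amount", "region"],           # keep only these columns
    "where": lambda row: row.get("amount") is not None,    # value filter
    "rename": {"region": "sales_region"},                  # rename a target column
    "cast": {"amount": float},                             # change a data type
}

def transform(row):
    if not RULES["where"](row):
        return None                                        # filtered out
    return {RULES["rename"].get(c, c): RULES["cast"].get(c, lambda v: v)(row[c])
            for c in RULES["columns"]}

print(transform({"order_id": 1, "amount": "19.90", "region": "EMEA", "internal_flag": "x"}))
# -> {'order_id': 1, 'amount': 19.9, 'sales_region': 'EMEA'}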
To understand why we have invested so much in our Data Warehouse Automation technology, you have to understand the issues with the traditional method of deploying a data warehouse.
Traditional data warehouse processing doesn’t meet today’s business needs.
Data is often consumed in batch, with a large impact on source systems, and provides only end-of-day analytics.
Modeling is a manual process, which often leads to a complex ETL design and build.
DW architects have to build custom frameworks to support DevOps, data quality, and data validation.
All of this results in delayed time to market, with long, often manual coding efforts and long testing cycles.
By the time the business sees the output, it's often not what they truly wanted, not what they need, or not timely enough for them operationally.
This leads to requirement changes and a feedback loop that in turn impacts the end-to-end DW process.
When we looked at what delivering analytics- and consumer-ready data sets means, we started with our customers' needs.
Ingest the data with low-impact capture mechanisms and deliver it in real time to the lake.
This requires a write-optimized format to keep up with data changes.
Customers also insist that as data is delivered, even to data lakes, there is consistency. We handle this via our built-in partitioning mechanism (sketched below).
This is all handled by our best-of-breed CDC solution, Replicate.
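A hedged sketch of the partitioned, write-optimized landing idea (hypothetical paths and record shape; not Replicate's implementation): each change record is appended into a time-keyed partition so downstream consumers can read consistent slices.

# Illustrative only: append change records into hourly partitions in the lake.
import json, os
from datetime import datetime, timezone

def write_change(base_dir, table, record):
    partition = datetime.now(timezone.utc).strftime("%Y%m%d%H")      # hourly change partition
    path = os.path.join(base_dir, table, f"partition={partition}")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "changes.jsonl"), "a") as f:
        f.write(json.dumps(record) + "\n")                           # append-only, write-optimized

write_change("/tmp/landing", "orders", {"op": "UPDATE", "order_id": 42, "amount": 19.9})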
Customers want a standardized set of historical data that they can leverage to provision other data sets. This is our storage or assembly zone.
It provides a standardized historical view of data delivered by Replicate
But in a READ-OPTIMIZED Parquet format.
We need to deliver this at scale, and we leverage Spark to do so, which is an increasingly common customer requirement.
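As an analogy only (PySpark, with assumed paths and columns; Compose generates its own flows), the storage/assembly step amounts to reading the write-optimized change files and compacting them into read-optimized, partitioned Parquet:

# Illustrative PySpark sketch: landing-zone changes -> read-optimized Parquet history.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("assemble_storage_zone").getOrCreate()

changes = spark.read.json("/tmp/landing/orders")       # write-optimized change records
(changes
    .write.mode("append")
    .partitionBy("partition")                          # partition column inferred from the landing paths
    .parquet("/tmp/storage/orders"))                   # columnar, read-optimized history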
Customers also want to provision data sets and provide subsets or enriched data to consumers.
This means being able to treat the data lake like a database and provide a current view, a type 2 historical view with effective and end dates, or point-in-time snapshots.
For analytics consumers this again means read-optimized, columnar formats like Parquet or ORC.
Automated at scale.
This is handled by Compose. It understands the data delivered consistently by Replicate and automates the generation of Spark flows to assemble and provision data, fulfilling customers' analytics and read-optimized requirements.
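A hedged PySpark sketch of the provisioning patterns named above (the table, column names, and date are assumptions; Compose automates the real flows): a current view from open type 2 records, and a point-in-time snapshot from the effective/end dates.

# Illustrative only: current view and point-in-time snapshot over a type 2 history.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("provision_views").getOrCreate()
spark.read.parquet("/tmp/storage/customers_history") \
     .createOrReplaceTempView("customers_history")

current = spark.sql("""
    SELECT * FROM customers_history
    WHERE end_date IS NULL                                   -- open records = current view
""")

as_of = spark.sql("""
    SELECT * FROM customers_history
    WHERE effective_date <= DATE'2019-06-30'
      AND (end_date IS NULL OR end_date > DATE'2019-06-30')  -- point-in-time snapshot
""")

current.write.mode("overwrite").parquet("/tmp/provision/customers_current")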