
Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture

These notes describe a generalised data integration architecture framework and set of capabilities.

With many organisations, data integration tends to have evolved over time with many solution-specific tactical approaches implemented. The consequence of this is that there is frequently a mixed, inconsistent data integration topography. Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance.

Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability.

Data integration has multiple meanings and multiple ways of being used such as:

- Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies

- Integration in terms of migrating data from a source to a target system and/or loading data into a target system

- Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics

- Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target

- Integration in terms of service orientation and API management to provide access to raw data or the results of processing

There are two aspects to data integration:

1. Operational Integration – allow data to move from one operational system and its data store to another

2. Analytic Integration – move data from operational systems and their data stores into a common structure for analysis



  1. 1. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture Alan McSweeney http://ie.linkedin.com/in/alanmcsweeney https://www.amazon.com/dp/1797567616
  2. 2. Data Integration, Access, Flow, Exchange, Transfer, Load, Share And Extract
     • Set of data movements between data entities (data sources and data targets) across the organisation’s data landscape
     • Data integration is more than just extracting data from operational systems to populate data warehouses and long-term data stores
     • The movement, creation, transfer and exchange of data breathes life into the set of organisation solutions
     • Data integration is the combination of all the data flows, transfers, exchanges, loads and extracts that occur across the data landscape, together with the tools, methods and approaches for facilitating and achieving them
     • Data integration is an enterprise-level capability that should be available to all applications and solutions
     • The organisation’s data fabric should include infrastructural components and tools that deliver these data integration facilities
     • Individual solutions and applications and their implementation projects should not have to create (additional) point-to-point custom integrations
     • Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability
     March 22, 2021
  3. 3. Evolution Of Data Integration
     • With many organisations, data integration tends to have evolved over time with many solution-specific tactical approaches implemented
     • The consequence is that there is frequently a mixed, inconsistent data integration topography
     • Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance
  4. 4. Current State Of Data Integration
     [Figure not reproduced]
  5. 5. Data Integration
     • Data integration has multiple meanings and multiple ways of being used such as:
       − Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies
       − Integration in terms of migrating data from a source to a target system and/or loading data into a target system
       − Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics
       − Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target
       − Integration in terms of service orientation and API management to provide access to raw data or the results of processing
  6. 6. Two Aspects Of Data Integration
     • Operational Integration – allow data to move from one operational system and its data store to another
     • Analytic Integration – move data from operational systems and their data stores into a common structure for retrieval, reporting and analysis
     • Overall data integration architecture needs to handle both types
     [Figure labels: Operational System; Analytic Data Store; Data Retrieval]
  7. 7. Data Integration And Organisation Data Plumbing
     [Figure labels: Organisation Technology Solutions Landscape; Data Plumbing Required to Support Solutions Landscape and Solution Interoperability]
  8. 8. Data Fabric, Data Landscape And Data Entities
     • The data landscape is an integrated view of all data entities within (core) and outside (extended) the organisation from which, with which and to which the organisation obtains, shares and provides data
     • The data fabric is the aggregation of the data entities and their data flows across the core and extended organisation
     • Data entities are data assets that are involved in the provisioning, storage, processing and transfer of organisation data
       − Data entities perform data-related activities across the spectrum of data actions and events
       − A data entity is a hardware or software technology component involved in any form of data processing
  9. 9. Importance Of Data Integration In IT Architecture
     • Enterprise Architecture – defines overall IT architecture for the organisation
     • Data Architecture – defines the data architecture for the organisation, of which data integration and interoperability is one element
     • Solution Architecture – designs solutions in the context of overall enterprise and data architectures and the need for solutions to access, integrate, exchange, transfer and extract data
       − Effective data integration is key to solution interoperability
     • Data Integration Architecture – defines a common approach to and set of enabling and implementing technologies in the areas of data integration, access, flow, exchange, transfer, load and extract that can be used by all IT solutions
     [Figure labels: Enterprise Architecture; Data Architecture; Data Integration Architecture; Solution Architecture]
  10. 10. Business And Information Technology Architecture
      [Figure labels: Business Strategy; Business Architecture; Business Governance; Information Technology Governance; Information Technology Strategy; Information Technology Architecture; Data Architecture; Information Technology Security Architecture; Application, Solution, Infrastructure and Service Architecture]
  11. 11. Overall Data Architecture And Capabilities
      [Figure labels: Data Infrastructure and Storage; Data Security, Protection, Access Control, Authentication, Authorisation; Data Management, Governance, Architecture, Operations, Supporting Processes; Data Reporting and Analytics, Visualisation Tools and Facilities; Data Design, Modelling, Operational Data Stores; Master and Reference Data Management; Metadata Data Management; Data Integration, Access, Flow, Exchange, Transfer, Transformation, Load And Extract; Data Warehouse, Data Marts, Data Lakes; Unstructured Data and Document Management; External Data Sources and Interacting Parties]
  12. 12. Data Integration Architecture
      [Figure labels: Data Sources; Data Channels; Data Integration Security, Authentication, Authorisation; Data Integration Operations Management, Administration; Data Integration Development, Testing and Deployment; External Data Sources and Targets; Data Integration Technologies; Data Integration Scheduler and Rules Engine; Internal Data Sources and Targets]
  13. 13. Data Integration As Part Of Overall Information Technology Architecture
      [Figure labels: Overall Business and IT Architecture Context; Data Architecture Components; Data Integration Architecture Components]
  14. 14. Organisation Data Zones
      • Data zones are containers for data entities with similar access and location characteristics
      [Figure labels: Central Data Entities and Infrastructure Zone; Business Unit/Location Entities and Infrastructure Zone(s); Organisation Data Zone; Secure External Organisation Access Zone; Secure External Organisation Participation and Collaboration Zone; Insecure External Organisation Presentation And Access Zone]
  15. 15. Sample Organisation Data Zones
      • Central Data Infrastructure – this contains the central data applications and their associated data
      • Business Unit/Location Data Infrastructure – this is an individual organisation business unit or location and the data entities it contains
      • Organisation – this data zone represents the entire organisation and it contains all the locations and business units or functions within the organisation
      • Secure External Organisation Access – this zone contains data entities that enable secure access from outside the organisation
      • Secure External Organisation Participation and Collaboration – this is a location outside the physical organisation boundary where data entities that are provided by or to trusted external parties reside, including cloud platforms
      • Insecure External Organisation Presentation And Access – this represents a location where publicly accessible data entities reside. These entities are regarded as insecure and/or untrusted
      • Integration can occur within and between data zones
  16. 16. Internal And External Data
      • Data can be defined as internal or external
        − Internal data is (logically) held within a source data entity
        − External data is data brought into or sent out of a source data entity to a target data entity
      [Figure labels: Source Data Entity; Target Data Entity; Internal Data; Data Entity; Data Load, Data Processing, New Data Generation; External Data]
  17. 17. Internal And External Data
      • At its core, data integration is concerned with enabling the transition of data from internal to external states
      • The internal and external state of data is separate from the internal or external location of the source or target data entity
        − Internal – within the organisation data zones
        − External – outside the organisation data zones
  18. 18. Data Integration Issues And Trends
      • The data landscape has been broadened and there are more data entities that form part of the extended organisation data landscape as more applications are moved to the cloud and as cloud platforms are used for providing additional facilities not currently present in organisations such as data analytics and machine learning
      • Initiatives and projects that are part of digital transformation programmes involve integrating data between internal and external parties
      • Need to reduce the latency of data integration as response time requirements are reduced
      • Performance, resilience and availability integration requirements are increasing
      • Need to deploy operational integrations more quickly to respond to business needs
      • There is a wider range of data entities as the data landscape increases in complexity
      • Process automation initiatives require an operational data integration platform
      • Greater volume and complexity of data integrations represent a potential data loss risk unless actively monitored and managed
      • There are more data demands within the organisation especially in the areas of analytics and the associated data integrations from operational data sources
  19. 19. Data Trends Affecting Data Integration
      • Greater volumes of operational data from increasing numbers of different sources and providers
      • Greater volumes of derived data
      • More data sources both internal and external to the organisation
      • Data in larger numbers of different formats
      • Data with wider range of contents
      • Data being generated at different rates
      • Data being generated at different times
      • Data being generated with varying degrees of accuracy, reliability and greater fuzziness
      • Data that changes constantly
      • Data that is of different utility and value
  20. 20. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes
      [Figure labels: Application; Data Source; Data Store; Location; Data Load; Data Transfer; Data Exchange; Data Access; Data Extraction; Data Flow; Data Migration; Data Replication; Data Publication; Data Presentation; Data Retrieval]
  21. 21. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes
      [Figure: the same labels as the previous slide, with the additional label Data Integration]
  22. 22. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes
      • Within any organisation, there will be many different data movements being performed in different ways using different technologies and approaches:
        − API/Web Service
        − SOAP
        − RPC
        − SOA/ESB
        − FTP
        − ETL/ELT
        − EDI
        − AS1/2/3
        − SMTP
        − Database replication
        − Change data capture
        − iPaaS
        − Stream processing
        − Message queueing (MQSeries, MQTT, AMQP, ActiveMQ, JMS, Azure Queues, …)
        − DB link
        − Batch
        − DDS
        − OPC-UA/IEC 62541
        − IEC 60870
        − Proprietary technologies (such as SWIFT)
        − … And many others
      • Proliferation of integration technologies and approaches indicates the long-standing and pervasive nature of data integration within information technology
  23. 23. Wider Data Integration Concerns
      [Figure labels: Cloud Data Store (Lake, Warehouse); SaaS Application and Data Store; On Premises Data Application and Data Store; On Premises Data Warehouse; Cloud Reporting and Analysis Application; On Premises Reporting and Analysis Application; IaaS Hosted Application and Data Store; External Collaborating Party; External DMZ]
  24. 24. Wider Data Integration Scenarios And Concerns
      • The data integration landscape is becoming more heterogeneous, leading to data integration across data zones
        − Between on-premises entities
        − Between on-premises and external collaborating parties
        − Between external collaborating parties and cloud-based entities
        − Between on-premises and cloud SaaS solutions
        − Between on-premises and cloud infrastructure IaaS solutions
        − Within the same cloud provider
        − Between different cloud providers
      • The approach to data integration and the technologies to use has changed from a purely internal-use solution to one encompassing a range of inter-zonal data movements
  25. 25. Data Integration Scenarios
      [Figure: the landscape from slide 23, annotated with the scenarios Between On-premises Entities and Between On-premises Entities and External Collaborating Parties]
  26. 26. Data Integration Logical Components
      • On Premises Data Integration
        − Performs integration within and between on-premises data entities
      • Data Integration Gateway
        − Enables data integration between internal and external data entities
      • External Data Integration
        − Enables data integration between internal and external data entities
        − This includes between on-premises and cloud
  27. 27. Data Integration Components
      [Figure: the landscape from slide 23, overlaid with the components On Premises Data Integration, Data Integration Gateway and External Data Integration]
  28. 28. Data Integration Platform
      [Figure labels: Data Integration Logically Extends Across The Entire Data Span; Data Integration Plugboard]
  29. 29. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture – Options
      • Options
        − Implement full data integration architecture
        − Implement a logical meta integration architecture combining multiple tools and technologies
        − Implement multiple separate (technology or application specific) integration platforms, with or without overall management
      • Irrespective of the approach, creating and maintaining an inventory of data integrations is an essential activity
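The inventory mentioned on this slide can start as one structured record per integration. The following is a minimal sketch in Python, not a prescribed schema: the class names and the fields `name`, `source`, `target`, `technology` and `owner` are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class IntegrationRecord:
    """One entry in a data integration inventory (field names are illustrative)."""
    name: str
    source: str
    target: str
    technology: str          # for example "ETL", "API" or "FTP"
    owner: str = "unknown"   # placeholder used when no owner has been recorded


@dataclass
class IntegrationInventory:
    records: List[IntegrationRecord] = field(default_factory=list)

    def add(self, record: IntegrationRecord) -> None:
        self.records.append(record)

    def by_technology(self) -> Dict[str, List[str]]:
        """Group integration names by the technology they use."""
        grouped: Dict[str, List[str]] = {}
        for rec in self.records:
            grouped.setdefault(rec.technology, []).append(rec.name)
        return grouped

    def without_owner(self) -> List[str]:
        """Names of integrations that have no recorded owner."""
        return [rec.name for rec in self.records if rec.owner == "unknown"]


inventory = IntegrationInventory()
inventory.add(IntegrationRecord("crm-to-dw", "CRM", "Data Warehouse", "ETL", "data-team"))
inventory.add(IntegrationRecord("erp-to-bank", "ERP", "Bank", "FTP"))
inventory.add(IntegrationRecord("hr-to-dw", "HR", "Data Warehouse", "ETL"))
```

Even a flat list like this makes it possible to ask which technologies are in use and which integrations nobody owns, whichever of the three options is chosen.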
  30. 30. Data Integration Mediation/Wrapper/Meta Tool
      • Rather than seek to have one big data integration solution, consider the option of using multiple tools that are (logically) integrated into a common integration architecture
      [Figure labels: Individual Data Integration Tools/Applications; Meta Data Integration Platform]
  31. 31. Tool Or Meta Tool
      • Meta data integration tool approach can increase complexity without increasing flexibility or reducing cost
      • Overhead of managing multiple individual integration tools and integrating these with meta tool can be complex
  32. 32. Core And Extended Dimensions Of Data Integration
      [Figure: diagram of the core, platform and service dimensions itemised on the next slide]
  33. 33. Dimensions Of Data Integration
      • Three dimensions of data integration
        − Core – operational components – the core functionality of the data integration platform
          • Data Sources and Data Ingestion, Data Ingestion Rules
          • Data Targets and Data Mapping/Transfer, Data Integration Rules
          • Data Transport Technologies
          • Interim Data Storage/Data Staging
          • Data Structures, Formats and Types
          • Data Transformations and Data Processing Rules
        − Platform – management aspects – the operational elements of the data integration platform
          • Speed, Volume, Throughput, Capacity, Scalability
          • Security and Access Control
          • Development, Validation, Deployment and Maintenance
          • Monitoring, Administration and Management
          • Scheduling and Triggering
          • Logging, Analysis, Reporting, Event and Alert Management
        − Service – key supporting processes and enabling components – that need to be part of any usable data integration platform
          • Service Level Management
          • Capacity Management
          • Availability and Continuity Management
          • Platform Architecture Management
          • Governance and Knowledge Management, Data Semantics
          • Operations Management
  34. 34. Data Integration Core Operational Characteristics
      • Data Sources and Data Ingestion, Data Ingestion Rules – the sources of data for data integration and the rules and technologies for processing
      • Data Targets and Data Mapping/Transfer, Data Integration Rules – the targets of data for data integration and the rules and technologies for processing
      • Data Transport Technologies – support for the range of data integration technologies
      • Interim Data Storage/Data Staging – provision of a data staging area for asynchronous data retrieval
      • Data Structures, Formats and Types – support for a range of input and output data formats and types and the ability to convert from one to another
      • Data Transformations and Data Processing Rules – facility for transforming source data
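As a small illustration of the "convert from one format to another" characteristic above, the sketch below turns CSV text into JSON using only the Python standard library. The function name `csv_to_json` and the sample data are assumptions for illustration. A real platform would also handle typing, encodings and schema validation.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


sample = "id,name\n1,Alice\n2,Bob\n"
converted = csv_to_json(sample)
```

Note that `csv.DictReader` yields every value as a string, so any numeric typing would be a separate transformation step.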
  35. 35. Data Integration Platform Management Characteristics
      • Speed, Volume, Throughput, Capacity, Scalability – ability of the platform to handle the volume of data integration activity within agreed times
      • Security and Access Control – provision of facilities to authenticate and authorise data access requests and to interact with data source security layer
      • Development, Validation, Deployment and Maintenance – capability to develop, test, deploy and manage new data integrations and changes to existing data integrations
      • Monitoring, Administration and Management – facilities to monitor the operation of the data integration platform and manage and administer it
      • Scheduling and Triggering – capacity to manage data integration schedules and events that trigger integrations
      • Logging, Analysis, Reporting, Event and Alert Management – provision of event and activity logging, the ability to define and receive alerts and the ability to report on and analyse event data
  36. 36. Data Integration Platform Service Characteristics
      • Service Level Management – ensuring that the platform complies with agreed data integration performance and throughput service levels
      • Capacity Management – monitoring the resources used by the integration platform and ensuring that the platform has sufficient resources
      • Availability and Continuity Management – guaranteeing that the platform meets availability needs and ensuring its continuity of operations
      • Platform Architecture Management – managing the overall platform architecture, its upgrades, the addition of new facilities and the support for new integration technologies
      • Governance and Knowledge Management, Data Semantics – managing knowledge about data integration and providing information about data read from sources and transferred to targets
      • Operations Management – managing the provision of operational support services for all aspects of the data integration platform
  37. 37. Logical Unified Data Integration Architecture
      [Figure: architecture diagram showing the components described on the following two slides]
  38. 38. Logical Unified Data Integration Architecture – Components – 1/2
      • Core Integration Platform – this orchestrates and manages the operation of data integrations
      • Deployed Integration Operation – these are specific data integrations that have been developed, tested and are deployed to the Core Integration Platform
      • Scheduler, Rules Engine – this component manages the definition and operation of integration schedules and the actioning of integrations based on triggering events
      • Operational Data Integrations – these are data integrations that are deployed to operation
      • Data Integration Execution – this is the component of the Core Integration Platform that executes data integrations
      • Data Integration Gateway – gateway components provide communications channels to external data sources and targets
      • External Access Layer/Connectors – this allows external data sources and targets to connect to the Core Integration Platform
      • Internal Access Layer/Connectors – this allows internal data sources and targets to connect to the Core Integration Platform
      • Security – this provides support for source and target authorisation and authentication and integration with their security layers
      • Internal Data Sources and Targets – these are the data sources and targets that are local to the platform
      • External Data Sources and Targets – these are the data sources and targets that are remote from the platform
      • External to Internal Translation – this is intended to represent a facility that translates external requests to internal addresses to provide an additional level of security
  39. 39. Logical Unified Data Integration Architecture – Components – 2/2
      • Data Knowledge Store – this stores information about the data being integrated to enable its retrieval by subject and content
      • Interim Data Store – this is a staging area for data being stored between transfer from source to target
      • Operational Process Usage Log – this contains a log of integration usage and activities
      • Alerting/Event Management – this allows for the definition, maintenance and handling of events and alerts
      • Dashboard/Analytics/Reporting – this provides facilities to report on platform activity and usage
      • Management and Administration Interface – this allows the platform to be managed and administered
      • Deployed Data Integrations – this represents the set of active deployed integrations
      • Integration Design and Development, Version Management and Control – this enables data integrations to be developed, tested, deployed to production and subsequently updated
      • Integration Templates and Template Library – this contains a library of data integration templates that can be used and reused during development
      • Integration Component/Product/Tool Library – this represents a library of integration technology tools that can be incorporated into and used in integration run times
      • Integration Publication/Deployment – this supports the process for deploying data integrations into production
  40. 40. Generalised Data Integration Approach
      • Every data integration consists of a minimum of two (logical) components
        1. A source extract/provision half
        2. A target delivery half
      • The source must make the data available in some form and either allow (enable PULL) or initiate (PUSH) the data movement to the target
      • The target then receives (PUSH) or retrieves (PULL) the data
      • Direct source to target data integration involves individual point-to-point connections, bypassing any data integration hub
      • There may be an interim transformation stage where the format and content of the provided data is changed to suit the needs of the target
      • Some Source/Target PUSH/PULL combinations imply the need for a staging area where extracted/provided data from the source resides before being passed to the target
        − Asynchronous data integration
      • Classification can be extended by allowing for multiple sources and targets
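The two halves, the optional transformation stage and the staging area can be sketched as a toy in-memory hub. This is only an illustration under stated assumptions: the class `IntegrationHub` and its method names are hypothetical, and sources and targets are modelled as plain Python callables rather than real systems.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class IntegrationHub:
    """Toy hub: an incoming (source) half, a staging area, an outgoing (target) half."""

    def __init__(self, transform: Optional[Callable[[dict], dict]] = None) -> None:
        self.staging: Deque[dict] = deque()          # interim data store
        self.transform = transform or (lambda r: r)  # optional interim transformation

    # Incoming half
    def receive_push(self, record: dict) -> None:
        """Source PUSH: the source initiates the movement."""
        self.staging.append(self.transform(record))

    def pull_from(self, source: Callable[[], List[dict]]) -> None:
        """Source PULL: the hub initiates and the source allows the read."""
        for record in source():
            self.staging.append(self.transform(record))

    # Outgoing half
    def push_to(self, target: Callable[[dict], None]) -> int:
        """Target PUSH: the hub delivers everything currently staged."""
        delivered = 0
        while self.staging:
            target(self.staging.popleft())
            delivered += 1
        return delivered

    def serve_pull(self) -> Optional[dict]:
        """Target PULL: the target asks; None means nothing is staged yet."""
        return self.staging.popleft() if self.staging else None


hub = IntegrationHub(transform=lambda r: {**r, "name": r["name"].upper()})
hub.receive_push({"id": 1, "name": "alice"})        # source PUSH
hub.pull_from(lambda: [{"id": 2, "name": "bob"}])   # source PULL

received: List[dict] = []
first = hub.serve_pull()                            # target PULL takes the oldest record
count = hub.push_to(received.append)                # target PUSH drains the rest
```

The staging queue is what lets the incoming and outgoing halves run at different times, which is the asynchronous case the slide refers to.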
  41. 41. Logical Data Integration Scenarios
      [Figure: data sources and data targets connected through a Data Integration Hub, with an incoming half (source PUSH or PULL) and an outgoing half (target PUSH or PULL)]
  42. 42. Integration Combinations
      • There are many different integration modes/patterns depending on factors such as:
        − Number of sources for a single integration
        − Number of targets for a single integration
        − Push or pull by source and target
        − Initiator of the integration – source, target or hub
      • Single Source, Single Target
        − Source Push Target Push
        − Source Push Target Pull
        − Source Pull Target Push
        − Source Pull Target Pull
      • Multiple Source, Single Target
        − Source Push Target Push
        − Source Push Target Pull
        − Source Pull Target Push
        − Source Pull Target Pull
      • Single Source, Multiple Target
        − Source Push Target Push
        − Source Push Target Pull
        − Source Pull Target Push
        − Source Pull Target Pull
      • Multiple Source, Multiple Target
        − Source Push Target Push
        − Source Push Target Pull
        − Source Pull Target Push
        − Source Pull Target Pull
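The sixteen combinations listed on this slide are the Cartesian product of two cardinalities (single or multiple) and two modes (push or pull) for each side. A short sketch that enumerates them in the same order as the slide follows. The variable names are illustrative only.

```python
from itertools import product

CARDINALITIES = ("Single", "Multiple")
MODES = ("Push", "Pull")

# Target cardinality is the outermost loop so the order matches the slide:
# Single/Single, Multiple/Single, Single/Multiple, Multiple/Multiple.
combinations = [
    f"{src_n} Source {src_mode}, {tgt_n} Target {tgt_mode}"
    for tgt_n, src_n, src_mode, tgt_mode in product(CARDINALITIES, CARDINALITIES, MODES, MODES)
]
```

Adding a further factor, such as the initiator of the integration (source, target or hub), would only require one more argument to `product`.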
  43. 43. Single Source PUSH Single Target PUSH
      • Single data source pushes data to integration hub
      • Hub pushes data to target
  44. 44. Single Source PUSH Single Target PULL
      • Single data source pushes data to integration hub
      • Hub allows the target to pull data
  45. 45. Single Source PULL Single Target PUSH
      • Data pulled from single data source
      • Hub pushes data to target
  46. 46. Single Source PULL Single Target PULL
      • Data pulled from single data source
      • Hub allows the target to pull data
  47. 47. Multiple Source PUSH Single Target PUSH
      • Multiple data sources push data to integration hub where it is aggregated
      • Hub pushes data to target
  48. 48. Multiple Source PUSH Single Target PULL
      • Data pushed from multiple data sources and aggregated
      • Hub allows the target to pull data
  49. 49. Multiple Source PULL Single Target PUSH
      • Data pulled from multiple data sources and aggregated
      • Hub pushes data to target
  50. 50. Multiple Source PULL Single Target PULL
      • Data pulled from multiple data sources and aggregated
      • Hub allows the target to pull data
  51. 51. Single Source PUSH Multiple Target PUSH
      • Single data source pushes data to integration hub
      • Hub pushes data to multiple targets
  52. 52. Single Source PUSH Multiple Target PULL
      • Single data source pushes data to integration hub
      • Hub allows multiple targets to pull data
  53. 53. Single Source PULL Multiple Target PUSH
      • Data pulled from single data source
      • Hub pushes data to multiple targets
  54. 54. Single Source PULL Multiple Target PULL
      • Data pulled from single data source
      • Hub allows multiple targets to pull data
  55. 55. Multiple Source PUSH Multiple Target PUSH
      • Multiple data sources push data to integration hub, where it is aggregated
      • Hub pushes aggregated data to multiple targets
  56. 56. Multiple Source PUSH Multiple Target PULL
      • Multiple data sources push data to integration hub, where it is aggregated
      • Hub allows multiple targets to pull aggregated data
  57. 57. Multiple Source PULL Multiple Target PUSH • Data pulled from multiple data sources and aggregated • Hub pushes aggregated data to multiple targets
  58. 58. Multiple Source PULL Multiple Target PULL • Data pulled from multiple data sources and aggregated • Hub allows multiple targets to pull aggregated data
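The hub patterns above vary along the same four choices: one or many sources, source PUSH or PULL, one or many targets, target PUSH or PULL. The following is a minimal Python sketch that generates the two descriptive bullets for any combination. The `describe` helper and the `patterns` dictionary are hypothetical names introduced here for illustration, and the wording of the generated bullets is only an approximation of the slide text.

```python
from itertools import product


def describe(source_many: bool, source_dir: str, target_many: bool, target_dir: str) -> list:
    """Return [source-side bullet, target-side bullet] for one hub pattern."""
    src = "multiple data sources" if source_many else "a single data source"
    tgt = "multiple targets" if target_many else "the target"
    # Aggregation only applies when more than one source feeds the hub.
    agg = " and aggregated" if source_many else ""

    if source_dir == "PUSH":
        source_side = f"Data pushed to the hub by {src}{agg}"
    else:
        source_side = f"Data pulled by the hub from {src}{agg}"

    if target_dir == "PUSH":
        target_side = f"Hub pushes data to {tgt}"
    else:
        target_side = f"Hub allows {tgt} to pull data"

    return [source_side, target_side]


# All 16 combinations of (many sources?, source direction, many targets?, target direction).
patterns = {
    combo: describe(*combo)
    for combo in product([False, True], ["PUSH", "PULL"], [False, True], ["PUSH", "PULL"])
}
```

Generating the descriptions from the four choices, rather than writing each one by hand, keeps the target-side bullet consistent with the pattern name.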
  59. 59. Data Integration Initiation And Notification
• For source PULL/target PUSH integrations, the integration hub is always in direct control and can synchronise the two halves of the integration – it can initiate the data PULL and then PUSH the resulting data
• For other combinations, the hub has less control of synchronisation
− Source PUSH/Target PUSH – integration hub can PUSH the data to the target after it has been PUSHed by the source
− Source PULL/Target PULL – integration hub can PULL the data from the source when the target requests it
− Source PUSH/Target PULL – integration hub must wait for source to PUSH data before it can respond to PULL request from target
• Diagram: matrix of source PUSH/PULL against target PUSH/PULL, with each combination marked as Fully Synchronised, Partially Synchronised or Unsynchronised
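The degree of hub control described above can be expressed as a small lookup. This is a sketch only: the `Control` enum and `hub_control` function are hypothetical names, and the assignment of "partially synchronised" to PUSH/PUSH and PULL/PULL and "unsynchronised" to source PUSH/target PULL is an interpretation of the bullets, since the extracted text does not state which matrix cell carries which label.

```python
from enum import Enum


class Control(Enum):
    FULL = "fully synchronised"
    PARTIAL = "partially synchronised"
    NONE = "unsynchronised"


def hub_control(source: str, target: str) -> Control:
    """Classify how much control the hub has over the timing of an integration."""
    source, target = source.upper(), target.upper()
    for direction in (source, target):
        if direction not in ("PUSH", "PULL"):
            raise ValueError(f"unknown direction: {direction}")

    if source == "PULL" and target == "PUSH":
        # Hub initiates both halves: it pulls, then pushes.
        return Control.FULL
    if source == "PUSH" and target == "PULL":
        # Hub initiates neither half: it waits for the source and for the target.
        return Control.NONE
    # PUSH/PUSH and PULL/PULL: hub initiates one half in response to the other.
    return Control.PARTIAL
```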
  60. 60. Synchronous And Asynchronous Data Integration
• Synchronous integration occurs where the hub initiates both the PULLing of source data and the PUSHing of transmitted data (source PULL/target PUSH)
• Asynchronous integration is where the source supply and the target provision of data do not occur in sequence, or where the triggering of the source supply or target provision events is not controlled
• This includes subscription-type integration where the data is retained by the hub and retrieved by subscribers
  61. 61. Data Integration Hub Data Retention
• How long should the integration hub retain data?
• The integration hub should not become one more organisation data store where data is retained forever
• Target PULL integrations are the potential source of accumulated retained undelivered data
• The integration hub needs to include a facility to purge unretrieved data and/or the data retention interval needs to be specified as a data integration attribute
• Where a target makes a PULL request for data no longer available, the integration hub needs to handle this
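The retention points above can be sketched as a small in-memory store with a retention interval, a purge operation and explicit handling of a PULL for data that has already been purged. This is an illustrative Python sketch, not a description of any particular product: `RetentionStore`, `DataExpired` and the `now` parameter (used so the behaviour can be exercised without waiting) are all assumptions.

```python
import time


class DataExpired(Exception):
    """Raised when a target pulls data that the hub has already purged."""


class RetentionStore:
    """Holds undelivered data for target PULL integrations (illustrative only)."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._items = {}      # key -> (payload, time received)
        self._purged = set()  # purged keys, remembered so later pulls can be explained

    def put(self, key, payload, now=None):
        now = time.time() if now is None else now
        self._items[key] = (payload, now)
        self._purged.discard(key)

    def purge(self, now=None) -> int:
        """Delete items older than the retention interval; return how many were removed."""
        now = time.time() if now is None else now
        expired = [k for k, (_, received) in self._items.items()
                   if now - received > self.retention_seconds]
        for key in expired:
            del self._items[key]
            self._purged.add(key)
        return len(expired)

    def pull(self, key, now=None):
        """Return and remove the payload; raise if it expired or never arrived."""
        self.purge(now)
        if key in self._purged:
            raise DataExpired(key)
        if key not in self._items:
            raise KeyError(key)
        payload, _ = self._items.pop(key)
        return payload
```

Distinguishing "expired" from "never arrived" lets the hub give the target a meaningful response. A real hub would also need to bound the record of purged keys, which this sketch keeps indefinitely.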
  62. 62. Data Integration Initiation – Source PULL/Target PUSH • Hub requests data from the source and sends it to the target
  63. 63. Data Integration Initiation – Source PUSH/Target PUSH • Hub receives data from the source • Hub pushes data to the target
  64. 64. Data Integration Initiation – Source PULL/Target PULL • Target requests data • Hub pulls data from the source • Hub responds to the pull request from the target
  65. 65. Data Integration Initiation – Source PUSH/Target PULL • Target requests data • Hub responds that data is not available • Source pushes data to the hub • Hub receives data from the source • Hub notifies the target that data is available • Target requests data • Hub responds to the pull request from the target
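The source PUSH/target PULL sequence is the least intuitive of the four, so a short sketch may help. In the Python below, `Hub`, `target_pull`, `source_push` and the `notify` callback are hypothetical names; the callback stands in for whatever notification mechanism a real hub would use.

```python
class Hub:
    """Minimal hub for a source PUSH / target PULL integration (illustrative only)."""

    def __init__(self):
        self._data = {}     # integration name -> payload waiting to be pulled
        self._waiting = {}  # integration name -> callbacks of targets that asked too early

    def target_pull(self, name, notify):
        """Return the payload if present; otherwise register for notification and return None."""
        if name in self._data:
            return self._data.pop(name)
        self._waiting.setdefault(name, []).append(notify)
        return None  # "data is not available"

    def source_push(self, name, payload):
        """Store the pushed payload and notify any targets that asked earlier."""
        self._data[name] = payload
        for notify in self._waiting.pop(name, []):
            notify(name)


events = []
hub = Hub()
first = hub.target_pull("orders", events.append)   # too early: nothing pushed yet
hub.source_push("orders", {"id": 1})               # source pushes, hub notifies target
second = hub.target_pull("orders", events.append)  # target pulls again and gets the data
```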
  66. 66. Data Integration Security
• Data integration security arises in four areas
− Source PUSH – source may need to authenticate with the integration hub
− Source PULL – integration hub may need to authenticate with data source
− Target PUSH – integration hub may need to authenticate with data target
− Target PULL – target may need to authenticate with the integration hub
• Integration hub needs to support a range of authentication and authorisation protocols
• Integration hub also needs to support security operations and administration
  67. 67. Data Integration Security – Source PUSH • Source authenticates with hub, identifying integration name • Hub authenticates source and transmits authorisation and access details • Source PUSHes data
  68. 68. Data Integration Security – Source PULL • Hub authenticates with source, identifying integration name • Source authenticates hub and transmits authorisation and access details • Hub PULLs data
  69. 69. Data Integration Security – Target PUSH • Hub authenticates with target, identifying integration name • Target authenticates hub and transmits authorisation and access details • Hub PUSHes data
  70. 70. Data Integration Security – Target PULL • Target authenticates with hub, identifying integration name • Hub authenticates target and transmits authorisation and access details • Target PULLs data
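The four security cases follow one rule: whichever party initiates the transfer authenticates with the party that receives the request. A small Python sketch of that rule follows; the function name `authentication_parties` and the string labels are hypothetical and introduced only for illustration.

```python
def authentication_parties(side: str, direction: str) -> tuple:
    """Return (requester, verifier): who presents credentials and who checks them.

    side is "source" or "target"; direction is "PUSH" or "PULL".
    """
    side, direction = side.lower(), direction.upper()
    if side not in ("source", "target") or direction not in ("PUSH", "PULL"):
        raise ValueError(f"unsupported combination: {side}/{direction}")

    # The hub initiates when it PULLs from a source or PUSHes to a target.
    hub_initiates = (side == "source") == (direction == "PULL")
    return ("hub", side) if hub_initiates else (side, "hub")
```

For example, `authentication_parties("source", "PUSH")` returns `("source", "hub")`: the source authenticates with the hub before pushing data.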
  71. 71. Data Integration Metadata
• Data that provides information about the data integration that enables the integration to be defined, implemented, operated, managed and monitored
• Classifications of metadata types (Types of Integration Metadata)
− Descriptive: Information about the data integration
− Business: What the data is, its sources, targets, meaning and relationships with other data
− Structural: How the data integration is organised and operated, and how versions are maintained
− Administrative/Process: How the data integration should be managed and administered through its lifecycle stages and who can perform what operations on the metadata
− Statistical: Information on actual data integration options, usage and other volumetrics
− Reference: Sets of values for structured metadata fields
  72. 72. Attributes Of A Data Integration
• Each data integration has a number of attributes or sets of metadata that define its operation and use in detail
• This information is needed to define and operate the integration
• The information must be collected, stored, made available and maintained in a metadata store
Attribute: Description
− Identifier: Defines a unique integration identifier
− Related Integrations: Lists related integrations and identifies the nature of the relationships, including any dependencies
− Source(s): Defines the source systems or locations from which the source data will be obtained
− Target(s): Defines the target systems or locations to which the data will be delivered or made available
− Push/Pull from Source: Identifies if the data is pulled or pushed from the source
− Push/Pull from Target: Identifies if the data is pushed to or pulled by the target
− Source Data Format: Defines the format of the source data
− Target Data Format: Defines the format of the target data
− Source Protocol: Defines the interface protocol used to obtain the source data and any protocol-specific information
− Target Protocol: Defines the interface protocol used to deliver the target data and any protocol-specific information
− Validation: Lists any validations to be performed on the source data, defining whether they are blocking or non-blocking and any exception processing to be performed
− Transformation: Defines any transformation to be performed on the source data including transformation steps and any splits or aggregations performed
− Data Size: Contains an estimate of the size of the source and (transformed) target data
− Trigger: Defines the event(s) that trigger the integration, if relevant
− Frequency: Defines the expected frequency of the data integration, if relevant
− Data Retention: Defines how long the data should be retained between source and target
− Monitoring and Alerting: Lists how the integration will be monitored and how alerts will be generated based on events
− Source Access Security: Defines any security associated with accessing the data source
− Target Access Security: Defines any security associated with accessing the data target
− Audit Log: Identifies where audit information relating to the operation and use of the integration is stored
− Restart After Failure: Lists detail on how the integration should be recovered and restarted after failure
− Data Sensitivity: Lists the sensitivity of the data being handled by the integration
− Ownership: Identifies the business and technical owners of the integration
− Priority: Defines any priority assigned to the integration
− Supporting Documentation: Identifies where documentation relating to the integration is available
− User Interface to View/Maintain Transferred Data: Identifies the user interface that is available to view and maintain the transferred data
− Version: Details on the current integration version and any previous versions
− Active/Inactive Flag: Indicates if the integration is active or inactive
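A metadata store entry for these attributes might be modelled as a typed record. The Python sketch below covers only a subset of the attributes listed above; the class name `IntegrationAttributes`, the field names, the choice of hours as the retention unit and the validation rules are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntegrationAttributes:
    """A subset of the integration attributes, as one metadata-store record."""

    identifier: str
    sources: List[str]
    targets: List[str]
    source_direction: str                    # "PUSH" or "PULL"
    target_direction: str                    # "PUSH" or "PULL"
    source_format: str = ""
    target_format: str = ""
    related_integrations: List[str] = field(default_factory=list)
    retention_hours: Optional[float] = None  # None means "not specified"
    data_sensitivity: str = ""
    business_owner: str = ""
    technical_owner: str = ""
    version: str = "1"
    active: bool = True

    def __post_init__(self):
        # Basic checks so that an incomplete definition is rejected early.
        if not self.identifier:
            raise ValueError("identifier is required")
        if not self.sources or not self.targets:
            raise ValueError("at least one source and one target are required")
        for direction in (self.source_direction, self.target_direction):
            if direction not in ("PUSH", "PULL"):
                raise ValueError(f"direction must be PUSH or PULL, got {direction!r}")


example = IntegrationAttributes(
    identifier="INT-001",
    sources=["crm"],
    targets=["warehouse"],
    source_direction="PULL",
    target_direction="PUSH",
    retention_hours=24,
)
```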
  73. 73. Data Integration Specification
• Data integration can be logically specified as follows
{Integration {Name, Attributes}
  Sources {
    {Source1, TechnologyType, Direction, Attributes}
    {Source2, TechnologyType, Direction, Attributes}
    {…}
  }
  Transformation {Name, Attributes}
  Steps {
    {Step1, <Processing>}
    {Step2, <Processing>}
    […]
  }
  Targets {
    {Target1, TechnologyType, Direction, Attributes}
    {Target2, TechnologyType, Direction, Attributes}
    {…}
  }
}
• Integration: overall integration identifier and attributes
• Sources: set of data sources, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
• Transformation: the transformation performed on the source data to create the data sent to or made available to the target
• Targets: set of data targets, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
  74. 74. Data Integration Specification
• Attributes can be defined at the overall data integration level or at the individual data source and target definition level
• Technology type could be one of:
− FT – transfer a file using a file transfer protocol
− API – information is requested using an API made available by the application
− MSG – information is exchanged using a message queueing protocol
− ETL – data is exchanged using an ETL process
− HTTP – data is exchanged using HTTP GET/PUT
• This describes a common approach to defining data integrations
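The logical specification and its technology types could be held as a small set of typed structures. This Python sketch is one possible reading: the class names `Endpoint` and `Integration` and the method `effective_attributes` are hypothetical, and the rule that endpoint-level attributes override integration-level ones is an assumption, since the slides only say that attributes can be defined at either level.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TechnologyType(Enum):
    FT = "file transfer protocol"
    API = "application API"
    MSG = "message queueing protocol"
    ETL = "ETL process"
    HTTP = "HTTP GET/PUT"


class Direction(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


@dataclass
class Endpoint:
    """One source or target: {Name, TechnologyType, Direction, Attributes}."""
    name: str
    technology: TechnologyType
    direction: Direction
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Integration:
    name: str
    sources: List[Endpoint]
    steps: List[str]  # names of transformation steps, in order
    targets: List[Endpoint]
    attributes: Dict[str, str] = field(default_factory=dict)

    def effective_attributes(self, endpoint: Endpoint) -> Dict[str, str]:
        """Integration-level attributes, overridden by endpoint-level ones (assumed rule)."""
        return {**self.attributes, **endpoint.attributes}


crm = Endpoint("crm", TechnologyType.API, Direction.PULL, {"retention": "1h"})
warehouse = Endpoint("warehouse", TechnologyType.FT, Direction.PUSH)
spec = Integration(
    name="customer-sync",
    sources=[crm],
    steps=["validate", "convert"],
    targets=[warehouse],
    attributes={"retention": "24h", "owner": "data-team"},
)
```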
  75. 75. Data Integration Transformation Specification
• A transformation is a set of data processing activities, requiring one or more inputs and performed in a structured order that is contingent on interim outcomes, to generate one or more outputs and cause one or more outcomes
• Transformation is the self-contained unit that completes a given task
• Transformation can consist of sub-processes and/or activities
• Transformation and its constituent activities, stages and steps can be decomposed into a number of levels of detail, down to the individual atomic level
• Transformation is primarily concerned with its outcomes and outputs
  76. 76. Data Integration Transformation • Transformation can be represented at different levels of detail • Diagram: a transformation with its trigger(s), required input(s), output(s) and outcome(s)
  77. 77. Data Integration Transformation • Activities within transformation can be linked by routers that direct flow and maintain order based on the values of output(s) and the status of outcome(s) • Diagram: data processing activities, each with trigger(s), required input(s), output(s) and outcome(s), connected through a router
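The router idea can be sketched as a loop that runs one activity, inspects its outcome and asks a router which activity comes next. The Python below is a minimal illustration under assumed names: `run_transformation`, the example activities (`validate`, `convert`, `reject`) and the `routes` table are hypothetical and not taken from the slides.

```python
def run_transformation(activities, router, start, data):
    """Run activities in the order chosen by the router until it returns None.

    Each activity takes the current data and returns (output, outcome).
    The router takes (activity name, outcome, output) and returns the next name.
    """
    trace = []
    current = start
    while current is not None:
        data, outcome = activities[current](data)
        trace.append((current, outcome))
        current = router(current, outcome, data)
    return data, trace


def validate(record):
    outcome = "ok" if "amount" in record else "invalid"
    return record, outcome


def convert(record):
    # Convert a decimal amount to whole cents.
    return {**record, "amount_cents": round(record["amount"] * 100)}, "done"


def reject(record):
    return {**record, "rejected": True}, "done"


activities = {"validate": validate, "convert": convert, "reject": reject}

# Routing table: (activity, outcome) -> next activity; a missing entry ends the flow.
routes = {("validate", "ok"): "convert", ("validate", "invalid"): "reject"}


def router(current, outcome, output):
    return routes.get((current, outcome))


good, good_trace = run_transformation(activities, router, "validate", {"amount": 12.5})
bad, bad_trace = run_transformation(activities, router, "validate", {"name": "x"})
```

Keeping the routing table separate from the activities means the order of processing can change without modifying the activities themselves.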
  78. 78. Standardised Deployed Operational Data Integrations
• Diagram components:
− Dashboard/Analytics/Reporting
− Deployed Data Integrations
− Operational Process Usage Log
− Scheduler, Rules Engine
− Operational Data Integrations
− Integration Design and Development, Version Management and Control
− Integration Templates and Template Library
− Integration Publication/Deployment
− External Data Sources and Targets
− Internal Data Sources and Targets
− Integration Component/Product/Tool Library
− Deployed Integration Operation
− Alerting/Event Management
− Management and Administration Interface
− Internal Access Layer
− External Access Layer
− Data Knowledge Store
− Security
− Interim Data Store
− External to Internal Translation
− Data Integration Execution
− Core Integration Platform
− Data Integration Gateway
  79. 79. Next Steps
• Understand the Scope of the Current Data Integration State
− Create an inventory of data integration technologies
− Create an inventory of existing data integrations
• Create a Future State Data Integration Architecture
− Create a data integration reference architecture
− Translate reference architecture into an implementation design
− Map implementation design to integration technologies and products
− Map existing integrations to implementation design
  80. 80. More Information • Alan McSweeney • http://ie.linkedin.com/in/alanmcsweeney • https://www.amazon.com/dp/1797567616 • 22 March 2021
