Bhawani prasad data integration-ppt
1. 1
Data Integration Strategy
April 10, 2013 – BHAWANI NANDAN PRASAD
– BI Practice Head SMP
– IIM Calcutta, MBA – Stratford University USA, B.E. (IT)
2. 2
IIF Moment
Integration Scope and Frame
Integration Strategy Agenda
Business Requirements
Appendix – Lessons Learned
Best Practices and Industry Research
Recommendation -
Integration Strategy and Architecture
Integration Strategy Decisions
Integration Technology Comparison
4. 4
IN THE FRAME
• Define IT enablement needs for the key Business Process Areas that are not
met by Advanced Technical Applications. e.g., L&D Support Systems,
Portals to facilitate workflow, Management Reporting System(s).
• Define standards for Data Definition, Information flow. Design architecture
data and repositories to enable Reporting and data sharing across
applications, including the “Given” applications.
• Define standards for easy integration of new applications (design for
growth).
• Support Advanced Technical Apps team to meet ESAS and other standards
and to ensure data integration.
• Validate the infrastructure needs of applications against the PCIS
environment. Initiate change if necessary.
• Phase 4 planning – Proof of Concepts e.g. implement database structures,
BI tools (internal pilots).
• Application, and Information Governance design.
R00#1 Frame and Data Integration Charter
“Robust”
Messaging,
Data transformation,
Federation
architecture
Data warehousing,
archive
Enterprise data layer
Business
integration/analytics
Bi-directional
transfer of data
Shows the R00#1 Match-
play selection
6. 6
Conceptual Architecture –
Integration Focus area
Presentation
Refining Optimization
Center
Business
Analytics
(Dimensional
Models)
Integration
Services
Physical Integration
Project DB
Extract, Transform &
Load
Virtual Integration
Information Integration
Services
Process &
Workflow
Business
Services Scheduling
Simulations
Planning
Shipping Maintenance
Blending Historian
Production
Operations
Information
Sources
Data Integration
Application
Data
Document
Repositories
Data
Sources
Data Governance
Master Data
Top 10
• Material
• Production
• Wharf
• Tank
DCS &
instruments
OE/Reliability
Test Results
Field Reliability Center
Key Focus Area Related Area
7. 7
Solution Integration Architecture and Information Architecture
• Solution Integration Architecture focus areas:
– Process Integration: Workflow, Orchestration, ESB
– Business Performance Mgmt: Business Activity Monitoring
– Collaboration: Communications
• Information Architecture focus areas:
– Data Integration: EAI, ETL, EII, CDC, Replication
– Data Management: Repository for Core Data standardization; Data Governance;
Master Data Management; Taxonomy
– Data Standardization: Chevron/Industry standard model Definition, Semantic model
– Business Intelligence (BI): Reporting; Dashboard
– Unstructured Data Management: Semantic Web; Ontology
9. 9
Console Operator’s Data View
Console
Operator
HMI
360 view
ROC
Oil Movement
Systems
Optimization
(V-mesa, GDOT,
DMCplus)
Early Event
Detection (EED)
Mass Balance
Alarm System
(CAMMS)
Procedure Mgmt
System (ExaPilot)
Root Cause Analysis
Plant Resource
Mgmt System (PRM)
Business Event Log
Global Notification
Communication
Alert Mgmt System
Document Mgmt
Knowledge Mgmt
Learning Mgmt
Console Scheduling
Instruction
Enterprise Asset
Management
(Maximo)
Erroneous Data Log
IT Asset Mgmt (ITSM)
KPI Tracking &
Dashboard
Lab System
(STARLIMS)
Optimization Log
Scheduling Systems
Lynx
SIMTO/MBO/RVT
Simulator
Shift Turnover
Structure Round
Safety Instrumented
System (SIS)
Legend
IT scope
AT scope
Existing
LP System
Flying Petro
Loss Prevention &
HES Systems
10. 10
System Data Flow
(Partial example)
Legend
IT scope
AT scope
Existing
Based on BPR Business Requirements and IA Assessment details, every system and its data inputs/outputs are depicted in
diagrams as below and vetted with SMEs to ensure Business Requirements are captured correctly in relation to data. This
then serves as the basis for our RICEF (Report, Interface, Conversion, Extension & Forms) detail list.
11. 11
RICEF List Summary
Reports (6)
• Shift Turnover Report
• Console Scheduling Instructions
• Training Level Assessment
• Equipment Maximo List
• Lab Stream Reports
• Others
Web Reports / Dashboard (>100)
• General KPIs
• Blending KPIs
• EED & Alarms KPIs
• Reliability KPIs
• SIS KPIs
• Energy KPIs
• Optimization KPIs
• Process Control KPIs (?)
Conversions (10)
• Document Management System (DMS)
• Knowledge Management System (KMS)
• Turnover documents into Shift Turnover System
• Plant Procedures into ExaPilot
• Console Scheduling Instructions (CSI)
• Learning Management System (LMS)
• Plant economics information to Area of Optimization Petro
• P&IDS, drawings
• ITSM conversion
• Facilities Phone list to Refinery Phone directory
Extensions (Forms) (3)
• Maximo
• Lab
• Knowledge Management System
RICEF is an acronym for Reports, Interfaces, Conversions, Extensions, and
Forms, all of which form the basis for Data Integration.
12. RICEF List Summary
Example Total of 36 “system to system” data interfaces in IT Scope
25 interfaces with bulk data on scheduled basis
7 interfaces on demand with low volume of data
4 interfaces with real-time streaming data
6+ of the 36 have process & workflow requirements
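The breakdown above can be reproduced from an interface inventory. The following is a minimal sketch, assuming a hypothetical `Interface` record and pattern labels (`scheduled_bulk`, `on_demand`, `real_time`) that are not part of the original deck. The sample rows are a handful of interfaces named on the following slides, not the full list of 36.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """One system-to-system interface (hypothetical record layout)."""
    source: str
    target: str
    pattern: str          # "scheduled_bulk", "on_demand", or "real_time"
    workflow: bool = False


# A few interfaces taken from the RICEF lists; a real inventory would hold all 36.
SAMPLE = [
    Interface("Maximo", "Shift Turnover", "scheduled_bulk"),
    Interface("Lab", "DMS", "scheduled_bulk"),
    Interface("Maximo", "Maintenance Request", "on_demand", workflow=True),
    Interface("Lab", "Alert System", "real_time"),
    Interface("Alert System", "Global Notification System", "real_time"),
]


def summarize(interfaces):
    """Count interfaces per delivery pattern, plus those that need workflow."""
    counts = Counter(i.pattern for i in interfaces)
    counts["workflow"] = sum(1 for i in interfaces if i.workflow)
    return dict(counts)


print(summarize(SAMPLE))
```

Keeping the pattern as an explicit field makes it straightforward to check that the per-pattern counts add up to the total before choosing a technology for each group.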
13. RICEF List Interfaces in IT Scope
Bulk Data (large volumes) on a Schedule Basis (25)
Scheduling (SIMTO/MBO) to Console Scheduling
Instructions (CSI) *
Scheduling (SIMTO/MBO) to RVT
Erroneous Data Log (EDL) to RVT (Lynx)*
Structure Rounds to Shift Turnover
Event Log to Shift Turnover
Maximo to Shift Turnover
Shift Turnover to Knowledge Management System (KMS)
**
Optimization Log to KMS **
Document Management System (DMS) to KMS **
PI – Health Environmental and Safety (HES)
PI - Accounting systems
PI-Lab
Lab to DMS
Learning Management System (LMS) to DMS
LMS to People Data
IT Asset Management System (ITSM) to Simulator
Scheduler
CSI/SIMTO - RVT
SIMTO to Area of Optimization
Oil Movement System to Wharf/Pipeline Scheduling (2)
Maximo / Lab / Reliability / PI to KPI (4+)
* Indicates possible workflow
** Indicates unstructured data
14. RICEF List Interfaces in IT Scope
On Demand & Lower Volume Data (7)
Maximo to Maintenance Schedule
Maximo to Maintenance Request (approval/denial) *
Learning Management System (LMS) to Simulation Systems
LMS to Simulator Scheduling System
Document Management System (DMS) to Simulator Scheduling
System
DMS to IT Asset Management System (ITSM)
LMS to ITSM
* Indicates possible workflow
Real-time and streaming data (4)
Oil Movement Systems (OMS) to Alert System
Lab to Alert System
Task System to Alert System
Alert System to Global Notification System
Process Interfaces – Workflow (6)
Shift Turnover to Knowledge Management System (KMS) (approval) *
Maximo to Maintenance Request (approval/denial) *
Document Management System (DMS) to KMS (approval) *
Optimization Log to KMS (approval) *
Scheduling (SIMTO/MBO) to Console Scheduling Instructions (CSI) *
Erroneous Data Log (EDL) to RVT (Lynx) *
15. RICEF List Interfaces in PCIS Scope
PCIS Scope (11)
PI to Mass Balance
PRM to Alarms System
DCS to Alarm System
PRM to Maximo
Lab to Mass Balance
DCS to APC
DCS to Exapilot
DCS-OMS
DCS-PI
PI-APC
Scheduling (SIMTO/MBO) - OMS
16. 16
Integration Landscape
Oil Movement System
Early Event
Detection
Mass Balance
Alarm System
Procedure Mgmt
System (Exa-Pilot)
Plant Resource
Mgmt System
Business Event Log
Alert Mgmt
System
Knowledge Mgmt
System
Learning Mgmt
System
Console Scheduling
Instruction
Enterprise Asset
Management
Erroneous Data Log
IT Asset Mgmt (ITSM)
KPI Tracking &
Dashboard
Lab Systems
(Starlims)
Optimization Log
Shift Turnover
Structure Round
PI (ODS with History)
DCS
Simulator
Scheduling
Outlook
People
Data
Safety Instrumented
System
Real Time
Scheduled
On Request
Scheduling RVT
Legend
IT scope
AT scope
Existing
Level 3.5
PCIS Scope
IT Scope
Scheduling Systems
SIMTO/MBO
Simulation
Communication
Global
Notification
Document Mgmt
System
Flying Petro
Optimization
18. 18
The Integration Stack & Products
Low
High
High
Integration Stack Vendor Capability
Business Capability of Adopting Integration
File Transfer
ETL
Data
Transfer File Transfer
ETL
Ad-Hoc Interfaces
Point-to-point
Interfaces ETL
Adapters
EAI
WS*-Communication
ESB
Orchestration Engine
BAM
Service Registry
BPM & SOA
WS*-Communication
ESB
Orchestration Engine
BAM
Service Registry
Web 2.0
Composite
Solutions
WS*-Communication
ESB
Orchestration Engine
BAM
Service Registry
Web 2.0
EDA
Integration 2.0
• Innovative integration techniques
• Complex & flexible Technology
• Mature integration techniques
• Proven & Robust Technology
Integration technology today
moving data...
…to synchronize and
rationalize data for systems...
…leveraging functionality
in applications…
…to create new processes and services
to support business needs
…predicting future business
while sharing business
capabilities with partners...
…anywhere, anytime, and
through any standard means
More Less
Centralized
Integration
19. 19
Key Integration Components
Custom code remains a popular option
Key Data Integration Methods
EII (Enterprise Information Integration)
CDC (Change Data Capture)
EAI (Enterprise Application Integration)
Data Replication
ETL (Extract, Transform, Load)
SOA (Service Oriented Architecture) Framework
Key Process Integration Methods
Workflow
EAI Orchestration + ESB
ESB (Enterprise Service Bus)
Web Services
21. 21
Enterprise Information Integration (EII)
Data Virtualization
As a project-oriented DI middleware, data virtualization is often referred to as virtual data federation,
high-performance query or EII. As enterprise architecture, it is frequently described as a virtualized
data layer, an information grid, an information fabric or as data services in SOA environments.
EII is a middle-tier query server:
• Contains a metadata layer with consolidated business definitions.
• Communicates through web services, database connections, or XML.
• A listener waits for a request, then sends whatever queries are needed across whatever data sources are required to return data to the requestor.
• Metadata robustness is the differentiator.
• Federated data stores provide access to enterprise data without forcing central control.
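The federated-query idea can be sketched in a few lines. This is a minimal illustration, not a description of any EII product: the two in-memory SQLite databases stand in for source systems, and the table names, column names, `METADATA` mapping, and `federated_query` function are all hypothetical.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical sources that name the same tank differently.
lab = make_source("CREATE TABLE samples (tank TEXT, sulfur_ppm REAL)",
                  "INSERT INTO samples VALUES (?, ?)",
                  [("T-101", 12.5), ("T-102", 8.0)])
oms = make_source("CREATE TABLE tanks (tank_id TEXT, level_pct REAL)",
                  "INSERT INTO tanks VALUES (?, ?)",
                  [("T-101", 64.0), ("T-102", 31.5)])

# Metadata layer: one business definition mapped to a query per source.
METADATA = {
    "tank_status": [
        (lab, "SELECT tank, sulfur_ppm FROM samples WHERE tank = ?", "sulfur_ppm"),
        (oms, "SELECT tank_id, level_pct FROM tanks WHERE tank_id = ?", "level_pct"),
    ],
}


def federated_query(view, key):
    """Answer a request by querying each source at request time; nothing is copied."""
    result = {"tank": key}
    for conn, sql, field in METADATA[view]:
        row = conn.execute(sql, (key,)).fetchone()
        result[field] = row[1] if row else None
    return result


print(federated_query("tank_status", "T-101"))
```

All of the integration knowledge lives in the metadata mapping, which is why metadata robustness is the differentiator: a wrong or stale mapping silently returns wrong answers.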
22. 22
Technology Overview:
EII (Enterprise Information Integration)
EII tools
Create virtual views of distributed enterprise data through queries executed in real time
without physically moving or copying data
* Also known as data federation; virtual data warehousing; data virtualization
Benefits
Latency: Through federated queries, information can be accessed within milliseconds
Storage: Data is not moved or copied from source systems, so additional storage is not
required
Drawbacks
Volume: Should only be used for small targeted data sets
Quality: Minimal transformation capabilities — efforts to include will negatively impact
latency
Although it never made a material impact as a pure-play market, data
federation is an important part of the data integration platform. Watch out,
however, for the high level of integration and maintenance effort it requires.
23. 23
Enterprise Application Integration (EAI)
• EAI focuses on moving data between enterprise applications with business logic applied. It picks up a
transaction in one application and initiates a transaction in another system. For example, a CRM system picks up a new
order and enters it into the financial application.
Driven by business events
Connectivity between applications
Information consistency a key requirement
Bus/hub with application adaptors
Wire-level messaging protocols
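The CRM-to-finance example above can be sketched as a small event hub with one application adapter. This is an illustrative sketch only: the `IntegrationHub` class, the `order.created` event name, and the record fields are assumptions, and a real EAI product adds transport, guaranteed delivery, and error handling.

```python
from collections import defaultdict


class IntegrationHub:
    """Minimal hub: routes business events to subscribed application adapters."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, adapter):
        self._subscribers[event_type].append(adapter)

    def publish(self, event_type, payload):
        # Deliver the event to every adapter registered for this event type.
        return [adapter(payload) for adapter in self._subscribers[event_type]]


ledger = []  # stands in for the financial application's transaction store


def finance_adapter(order):
    """Translate a CRM order into a (hypothetical) financial entry and post it."""
    entry = {"ref": order["order_id"], "amount": order["qty"] * order["unit_price"]}
    ledger.append(entry)
    return entry


hub = IntegrationHub()
hub.subscribe("order.created", finance_adapter)

# The business event drives the integration: a new CRM order triggers the posting.
hub.publish("order.created", {"order_id": "SO-1", "qty": 3, "unit_price": 20.0})
print(ledger)
```

The adapter is the only place that knows both formats, so the CRM and the financial application stay unaware of each other.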
24. 24
Technology Overview:
EAI (Enterprise Application Integration)
EAI Tools
These products, which started out as rudimentary software that supported basic messaging,
routing, and data transformation needs, have grown into more sophisticated tools that now
also provide full support for SOA as well as electronic data interchange (EDI).
Benefits
Latency: Through message based orchestration, information can be transferred within
seconds to service real-time data integration
Event based: Data transfer can be triggered by event
Drawbacks
Proprietary: Traditional EAI vendors used proprietary protocols
Quality: Data validation can be performed; however, matching and merging data from
multiple source systems is not a strength of this toolset
An EAI toolset can be brought in as a middleware framework to support SOA;
however, given insufficient in-house experience, a POC is recommended.
25. 25
Extract Transform Load (ETL)
As a data integration hub, ETL products connect to a broader array of databases, systems, and
applications as well as other integration hubs. ETL batch architecture is generally split into 4 major
components: Extract, Clean, Transform and Load.
Provide expanded functionality, especially in
the areas of data quality, transformation, and
administration
Coordinate and exchange metadata among
heterogeneous systems to deliver a highly
integrated environment that is easy to use and
adapts well to change
Capture and process data in batch or near-
real time using a standardized information
delivery architecture
Provide greater performance, throughput, and
scalability to process larger volumes of data at
higher speeds
Load data more quickly and reliably, by adding
change data capture techniques, continuous
processing, and improved runtime operations.
26. 26
Technology Overview:
ETL (Extract, Transform and Load)
•ETL tools
Batch or incremental extraction of high volumes of data from one or more sources
Able to run complex transformations on the data which can include cleansing, reformatting,
standardization, aggregation, or the application of any number of business rules
Loads the resulting data set into specified target systems
Benefits
Volume: Manages extremely high volumes of data movement
Quality: Allows for complex data transformations, enabling much higher quality, hence more
usable information
Re-Use: Routines to extract/transform/load can be re-used by many applications
Drawbacks
Latency: Optimized when scheduled as batch data movement as opposed to real-time or
on-demand. By reducing the volume throughput (with CDC) ETL can be used to meet
operational near real-time requirements.
Performance: Extracts can cause performance impacts to source production systems, so
low-impact batch extraction “windows” need to be identified, or use CDC.
ETL has spread beyond data warehousing and can support near real-time
data integration for both operational and BI applications
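The four batch components named earlier (Extract, Clean, Transform, Load) can be sketched as one function each. This is a minimal illustration under stated assumptions: the source rows, the `unit_totals` target table, and the cleansing rule are invented for the example, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3


def extract(rows):
    """Extract: in practice a bulk read from a source system; here an in-memory list."""
    return list(rows)


def clean(rows):
    """Clean: drop records with a missing measurement."""
    return [r for r in rows if r["value"] is not None]


def transform(rows):
    """Transform: standardize unit names and aggregate the values per unit."""
    totals = {}
    for r in rows:
        unit = r["unit"].strip().upper()
        totals[unit] = totals.get(unit, 0.0) + r["value"]
    return sorted(totals.items())


def load(conn, records):
    """Load: write the resulting data set into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS unit_totals (unit TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO unit_totals VALUES (?, ?)", records)
    conn.commit()


source = [
    {"unit": " fcc ", "value": 10.0},
    {"unit": "FCC", "value": 5.0},
    {"unit": "crude", "value": None},
    {"unit": "Crude", "value": 7.5},
]
target = sqlite3.connect(":memory:")
load(target, transform(clean(extract(source))))
loaded = target.execute("SELECT unit, total FROM unit_totals ORDER BY unit").fetchall()
print(loaded)
```

Because each stage is a separate routine, the same extract or transform step can be reused by other loads, which is the re-use benefit listed above.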
27. 27
Change Data Capture (CDC)
•CDC Integration Suite provides secure, high-volume, real-time, and bi-directional data integration and
transformation between applications. This product supports a wide range of databases, including those
that run on legacy, back-office, and other operational systems on different platforms.
Journal Log
Redo/Archive
Logs
Publisher
Engine
And Metadata
Subscriber
Engine
And Metadata
TCP/IP
GUI
Unified Admin
Point
With Monitoring
Database
Audit
Database
Message
Queue
Web
Services
Business
Process
Publisher Subscriber
• Provides pseudo to actual real-time
update capabilities
• Heterogeneous system and platform
support
• Real-time selective data capture and
delivery
• Limited data transformation
• High performance even with very high
volumes;
• Guaranteed integrity of data
transactions (two-phase commit)
28. 28
Technology Overview:
CDC Change Data Capture
CDC tools
The optimal approach is to capture “deltas” or changes in the source data created or
updated in operational systems as they are written to the DBMS log files and make them
immediately available in real-time to downstream applications
Benefits
Volume: Captures only changes or “deltas” since last pull from source databases,
reducing amount of data that needs to be moved
Performance: With the option to access database log files versus production database —
no performance impact to source operational systems
Latency: Can enable continuous updates throughout the day
Drawbacks
Latency: No latency issue in normal operation. However, because changes are read from
source logs, source system downtime can create extra administrative work to
resynchronize and monitor data
Performance: Data transformation capability is more limited than in ETL tools
CDC can be combined with ETL to enable near real-time delivery and higher
throughput, with less impact on source systems.
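The delta-capture idea can be sketched as reading a change log from a checkpoint. This is an illustrative sketch only: the log layout, sequence numbers, and function names are assumptions, and real CDC tools read the DBMS journal or redo logs rather than a Python list.

```python
# Hypothetical change log: (sequence number, operation, key, value).
change_log = [
    (1, "INSERT", "T-101", 64.0),
    (2, "INSERT", "T-102", 31.5),
    (3, "UPDATE", "T-101", 66.0),
    (4, "DELETE", "T-102", None),
]


def capture_changes(log, last_seq):
    """Return only the entries recorded after the last processed sequence number."""
    return [entry for entry in log if entry[0] > last_seq]


def apply_changes(target, changes, last_seq):
    """Apply the deltas to the target copy and return the new checkpoint."""
    for seq, op, key, value in changes:
        if op == "DELETE":
            target.pop(key, None)
        else:  # INSERT and UPDATE both set the current value
            target[key] = value
        last_seq = seq
    return last_seq


replica = {}
# First poll sees only the first two log entries.
checkpoint = apply_changes(replica, capture_changes(change_log[:2], 0), 0)
# Second poll picks up just the entries written since the checkpoint.
checkpoint = apply_changes(replica, capture_changes(change_log, checkpoint), checkpoint)
print(replica, checkpoint)
```

The checkpoint is what has to be reconciled after source downtime, which is the administrative overhead noted in the drawbacks.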
30. 30
Technology Overview:
ESB (Enterprise Service Bus)
•ESB tools
These technologies typically incorporate adapter technology to connect to a variety of
application and database types, ability to route transactions according to business rules and
transport transactions from source to target with low latency.
Benefits
Open: More open than EAI tools. Universal support for distributed processing
External Entities: ESB is easier to configure and implement; hence often chosen to
support B2B applications
Drawbacks
Vendor: ESB-only vendors are smaller than EAI vendors
Experience: Lack of Chevron internal working and support knowledge
ESB is best used for establishing business processes (BPM) and
orchestration infrastructure that will leverage a business services layer to
support SOA across the entire enterprise. ESB federation can also be
implemented to mitigate drawbacks.
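Routing transactions according to business rules is often implemented as a content-based router. The following is a minimal sketch, assuming hypothetical message fields (`type`, `priority`) and destination names; it shows the rule evaluation only, not adapters or transport.

```python
def route(message, rules, default="dead_letter"):
    """Return the first destination whose rule matches the message."""
    for predicate, destination in rules:
        if predicate(message):
            return destination
    return default


# Hypothetical business rules, evaluated in order (most specific first).
RULES = [
    (lambda m: m.get("type") == "alarm" and m.get("priority", 0) >= 3,
     "global_notification"),
    (lambda m: m.get("type") == "alarm", "alert_system"),
    (lambda m: m.get("type") == "lab_result", "kpi_dashboard"),
]

print(route({"type": "alarm", "priority": 4}, RULES))
```

Keeping the rules as data rather than code paths is what lets routing change without touching the connected applications.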
31. 31
• Focus on the Differences…
Differences Between EAI and ETL
Aspect            EAI                                      ETL
Focus             Application Integration; Process, B2B    Data Integration; Analytic, KPI
Timing            Real-Time                                Batch, Near-real time
Data              Transactional                            Historical
Transformation    Minimal                                  Complex
Interfaces        Predictable                              Evolutionary
Volume            Single Message or Transaction            Bulk (Hour, Day, Week, etc.)
33. 33
Approach for Integration Strategy Decision
1. Translate BPR business process and functional requirements into system data
flow diagrams, vetted and confirmed with business SMEs and AT teams.
Categorize data integration requirements based on the system data flow diagrams
and create the RICEF list
2. Study standards and seek to understand environment
3. Leverage other Information Management initiatives and EA direction. Capture
lessons learned from other projects.
4. Conduct technology scanning and gather industry information from Gartner,
Forrester, Open O&M and vendors
5. Develop Integration strategy focused on Data Integration, that supports
Requirements, Process Integration and Data Management
6. Vet with Architects - IT AA, IA and SIA teams to get feedback
7. Present Integration strategy recommendation to technical review team
8. Include stakeholder and technical review board feedback and update
recommendation
34. 34
Objective and Criteria for
Data Integration Strategy Decision
Decision Objective
Must support the business’ need in delivering timely & well integrated data with consistent naming,
content and meaning; providing Console Operators a complete and concise view of trusted data.
Requirements: How well does the DI decision satisfy the requirements?
Standards: How well does the DI decision align with Chevron standards?
Reliability: Is the DI decision proven with robust technologies?
Interoperability: How well do the DI components interoperate?
Supportability: Does the DI decision match organization capability?
Total Cost of Ownership: Does this decision offer the optimum TCO?
Sustainability: Can the DI decision easily adapt to business changes?
Data Management: Does the DI decision support information management disciplines?
Decision Criteria
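One common way to apply criteria like these is a weighted scoring matrix. The sketch below is an assumption about how that could be done, not the method used in this deck: the equal weights, the 1-5 scores, and the option names are placeholders invented for illustration.

```python
CRITERIA = [
    "Requirements", "Standards", "Reliability", "Interoperability",
    "Supportability", "Total Cost of Ownership", "Sustainability",
    "Data Management",
]


def weighted_score(scores, weights):
    """Weighted average of per-criterion scores; a missing criterion raises KeyError."""
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


# Placeholder inputs: equal weights and invented 1-5 scores, for illustration only.
weights = {c: 1.0 for c in CRITERIA}
option_a = {c: 4 for c in CRITERIA}
option_b = {**{c: 3 for c in CRITERIA}, "Requirements": 5}

ranked = sorted(
    {"Option A": option_a, "Option B": option_b}.items(),
    key=lambda kv: weighted_score(kv[1], weights),
    reverse=True,
)
results = [(name, weighted_score(scores, weights)) for name, scores in ranked]
print(results)
```

Making the weights explicit forces the review team to agree on which criteria matter most before the alternatives are scored.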
35. 35
Key Integration Alternatives
Custom code remains a popular option
Key Data Integration Methods
EII (Enterprise Information Integration)
CDC (Change Data Capture)
EAI (Enterprise Application Integration)
Data Replication
ETL (Extract, Transform, Load)
SOA (Service Oriented Architecture) Framework
Key Process Integration Methods
Workflow
EAI Orchestration + ESB
ESB (Enterprise Service Bus)
Web Services
36. 36
Step 1 Business Requirement
36 “system to system” data interfaces in R00#1 IT Scope
25 bulk data on scheduled basis
7 on demand with low volume of data
4 real-time streaming data
37. 37
37
Step 2 Standard and Usage
Component: Data Integration (ETL tools)
Technology Standard, Managed Choice:
1. IBM InfoSphere DataStage
2. MS SSIS
Component: Process Integration (EAI tools)
Technology Standard, Managed Choice:
1. SAP XI (version 3)
2. BizTalk
Component: Integration Middleware
Technology Standard, Managed Choice:
1. Integration Brokers (EAI tools)
2. Web Services
3. Batch file transfer
4. Direct access
5. Intermediate databases
6. Custom built
38. 38
Step 3 Lessons learned from other projects
• Selected SOA architecture to facilitate multiple data integration points with
real time BI integration
• Realized the value of Master Data Management with their SOA
implementation
• Selected a hub and spoke architecture to facilitate multiple data integration
points with complex data translations. Selected ETL platform for data
movement for all planning and scheduling data.
• Realized the value of web services to facilitate work flow for data validation
processes within the Refineries.
• Selected a hybrid architecture to facilitate multiple data integration points
with complex data translations. Most data was required in real time to
capture trade deals.
• Selected an ETL platform for application integration with robust
transformation.
• Selected orchestration to facilitate work flow and data integration with
external parties and systems
39. 39
Step 4 Integration Capability Comparisons
Data Integration Technologies compared: ETL, EAI (Orchestration), Workflow, EII, ESB
Capabilities rated for each technology:
• Bulk Data Transfer
• Real Time Messaging Routing
• On Demand Data Integration
• Metadata Data Management
• Data Transformation
• Process Orchestration
• Distributed Processing
• Data Standardization
• Human Interfaces
• SOA and Web Services Integration
Legend: full support, partial support, no support
40. Step 5 Integration Components Chosen
Custom code remains a popular option
Key Data Integration Methods
EII (Enterprise Information Integration)
CDC (Change Data Capture)
EAI (Enterprise Application Integration)
Data Replication
ETL (Extract, Transform, Load)
SOA (Service Oriented Architecture) Framework
Key Process Integration Methods
Workflow
EAI Orchestration + ESB
ESB (Enterprise Service Bus)
Web Services
Selection
41. 41
Integration Strategy Decision
• Include both Process and Data Integration as a hybrid architecture
• Process Integration includes EAI Orchestration and Workflow
• Application Integration includes EAI for real-time data integration
• Data Integration includes ETL for non real-time bulk data integration
• ETL platform can be used to add on Data Management toolset
• Exclude technologies that do not meet requirements or criteria
• Custom coding does not meet supportability & TCO criteria
• EII does not meet Info Mgmt - data standardization criteria
• CDC for near real-time data can be handled by EAI
• Replication/CDC is already used by PI (ODS), but is not extensible
• ESB is not yet in CVX standard, EAI has some ESB features
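The hybrid decision above can be read as a simple routing rule from interface characteristics to components. The sketch below is one possible reading, not a stated rule from the deck: the function name and the latency labels are hypothetical, and sending on-demand interfaces to EAI is an assumption based on EAI being event-triggered.

```python
def choose_components(latency, needs_workflow=False):
    """Map an interface's delivery pattern to integration component(s)."""
    if latency == "scheduled":
        # Non real-time bulk data integration goes to the ETL hub.
        components = ["ETL"]
    elif latency in ("real_time", "on_demand"):
        # Assumption: on-demand, low-volume delivery is treated as event-triggered.
        components = ["EAI"]
    else:
        raise ValueError(f"unknown latency class: {latency!r}")
    if needs_workflow:
        components.append("EAI Orchestration + Workflow")
    return components


print(choose_components("scheduled"))
print(choose_components("on_demand", needs_workflow=True))
```

Raising on an unknown latency class keeps new interface types from being silently assigned to a default technology.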
42. 42
Integration Conceptual Architecture Hybrid Technology
Data Mart
Human
Workflow
Process Integration
Orchestration (EAI)
Filter
Route
Other
Requesting
applications
Receiving
applications
XML
messages
Transform
Filter
Route
Service Calls Service wrapper
Guarantee
Data Integration
Hub (ETL)
Extract
Transform
Load
Profile
Quality
Metadata
Mgmt
MDM
Staging
Area
Data
Warehouse
ODS
(PI)
Source
Systems
OLAP
cube
HMI
Operational BI
Analytical BI
Target
Systems
43. 43
Data Integration Strategy
Decision Rationale
Requirements
• The combination of ETL, EAI & Workflow components satisfy the business
requirements of bulk data transfer, real-time data integration and workflow.
• ETL platform is necessary to facilitate analytical BI environment. Mature ETL
platform incorporates information management - data standardization toolset.
• EAI is required to facilitate the real-time application integration and automated
work processes defined by the BPR teams.
Standards
• Technology standards and best practices include Data Integration (ETL) and
Application Integration (EAI) Toolsets.
• Select a Data Management standard for Master Data Management, Metadata
Management and Data Quality.
Reliability
• Using ETL toolset to provide bulk and scheduled data interfaces as baseline. This
technology has been proven and used by many projects.
• EAI technology is mature (EAI toolset: BizTalk).
44. 44
Data Integration Strategy
Decision Rationale - continued
Interoperability
• Partner orchestration of BizTalk and SharePoint portal.
• Standard ETL and BI toolsets with web service to prove interoperability across the platforms.
Supportability
• Continue to use PI as data transfer hub between Plant Information Network (PIN) and
Process Control Network (PCN) - Leverage what exists increases supportability. Apply PCIS
standards to utilize OPC for PI integration.
• GDST and Refinery have experience with implementing and supporting ETL. ITC provides
services for ETL support, database support and BI support.
Total Cost of Ownership
• Using EAI and ETL standard toolset to facilitate refinery centrally managed process and data
flow would bring cost benefit in leveraging enterprise support and license costs.
• Infrastructure and resources for ETL toolset may be shared with Lynx within refineries which
can greatly reduce license and support cost.
45. 45
Data Integration Architecture
Decision Rationale - continued
Sustainability
• EAI and workflow provide the foundation for an SOA framework, and adoption of newer
technology (composite software and Web 2.0) is feasible
• Once we gain more experience with the EAI and workflow toolset, it can be expanded to
handle more integrations to accelerate SOA.
• We will maintain ETL as a foundation to add on real-time and on-demand
components.
• SOA is still maturing. Work with ITC to ensure we remain consistent with the company
direction of SOA
Data Management
• Info Mgmt disciplines provide Data Governance, Master Data Management,
Metadata Management and Data Quality improvement.
• Using data integration hub to provide standardized data layer provides a good
foundation for information management.
47. 47
Integration Strategy Recommendation
Business Requirements ETL EAI Workflow
Near real time and scheduled Bulk Data
Conversion and Interfaces X
Integrate data from Operational BI to
Analytical BI (load ODS and staging data
into DW/Data Mart/OLAP)
X
Real-time Integration (application to
application or Integration Hub to HMI) X
On Demand low volume of data (event
triggered data delivery) X
Human-centric Workflow with
orchestration X X
Provide services to HMI in connecting all
portals, application data, workflow data,
integration hub data and collaboration
data.
X X
Data Transformation, Meta Data
Management, Data Cleansing
X
49. 49
Next Steps
– Review Proof of Technology findings of EAI Tools
– Gather and review feedback to update the DI Strategy
recommendations:
• Internal – IT EA and AA teams
AT team, if feasible
• External - Information Architects
– Recommend Data Integration Toolsets
51. 51
Lessons learned 1
– SOA architecture to facilitate multiple data integration points with real
time BI integration instead of using an integration middleware
• However, this alternative carries a very large architectural
footprint, higher costs, and demands for technology expertise.
– Master Data Management with their SOA implementation
• A reference data model was added to the SOA implementation
when data quality issues were surfaced due to disparate data
sources.
52. 52
Upstream Foundation Services vs Data Integration
SO
– Small messages on demand
– Transformations tend to be simple
BI
– Infrequent exchanges of (large)
amounts of data
– Transformations complex
– Increasing drive for Real Time DW
SOBI
– Leverages the strengths at the
extremes
– Exploits the middle ground
Messages vs. Data
SO BI
Fine Grain
Services / Real-
time events
Medium Grain
Services
Coarse Grain
Import / Export /
ETL
53. 53
SOBI Summary
Service Orientation (SO) Business Intelligence (BI)
• Provides application-to-
application integration
• Well suited to events and real-
time data – high frequency
• Allows agile change in business
processes
• Supports reuse of enterprise
components
• Encapsulates and abstracts
functionality
• Tightly defined data formats
and structures
• Well suited for data-to-data
integration
• Can handle large data volumes
• Provides foundation for business
decisions
• Provides a combined model of the
enterprise data
• Good tools and mechanisms for
transforming data
• Ability to question the data and to
answer key business questions
54. 54
Solution Architecture Services Integration pattern
Business Message Standards (schemas & semantics)
Presentation
Presentation Services
(Analysis & Reporting)
Business Analytics
& Analysis Services
(Dimensional Models)
Integration
Services
Physical Integration
Project
DB
Extract, Transform &
Load
Virtual Integration
Data Integration Message Standards (schemas & semantics)
Services
Atomic & Composite
Entity Services
Process &
Workflow
Business
Services
Production
Drilling
HES
Maintenance Financial
Well
Reservoir
Surveillance
Analytics
Notification
Data
• Enterprise
• OPCO
• SBU
• Asset
Message Standards (schemas & semantics)
Application Data
Facade
Document
Repositories
Facades
Data
Sources
Facades Systems
of Record
(SoR)
Hierarchy & Cross Reference Services
Master,
Reference
& Hierarchy
SUPER 7
• Well
• Reservoir
• Equipment
• Field
• Property
• Location
• Facility
55. 55
Lessons learned 2
– Select a hub and spoke architecture to facilitate multiple data
integration points with complex data translations. Most data required
for 1-7 day plans.
– Use ETL platform for data movement for all planning and scheduling
data.
• Several ODS tables and data warehouse structures were built in
the central hub (San Ramon) with supporting individual hubs
within each refinery
• A robust cross reference model was used for the numerous codes
and data sources to provide a consistent name and definition of
master data across the supply chain.
– Use the value of web services to facilitate work flow for data validation
processes.
• A web services front end was added to the Validation Tool that
provides updates and corrections for data to be used in the
scheduling tool (SIMTO)
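The cross reference model mentioned above can be pictured as a lookup from system-specific codes to one master record. This is a minimal sketch under stated assumptions: the codes, master IDs, and product names are invented for illustration and do not come from the project.

```python
# Hypothetical cross-reference table: (source system, local code) -> master ID.
XREF = {
    ("SAP", "MAT-0042"): "M-001",
    ("SIMTO", "ULSD"): "M-001",
    ("LAB", "DSL-ULS"): "M-001",
    ("SAP", "MAT-0077"): "M-002",
}

# Master data: one consistent name and definition per master ID.
MASTER = {
    "M-001": "Ultra Low Sulfur Diesel",
    "M-002": "Regular Gasoline",
}


def to_master(system, code):
    """Resolve a system-specific code to (master ID, consistent name)."""
    master_id = XREF.get((system, code))
    if master_id is None:
        raise KeyError(f"no cross-reference for {system}:{code}")
    return master_id, MASTER[master_id]


print(to_master("SIMTO", "ULSD"))
```

Unmapped codes fail loudly here on purpose, since a silent pass-through is how inconsistent names reach the data warehouse.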
56. 56
Conceptual Architecture
Hub and Spoke Pattern
External
Source
Systems
SRA
(Crude)
ICTS
SAP
PS DF RSPF RBS&OP
TI
(SRA)
ETL
“Full visibility” with
limited event
notification
capabilities
“Integration”
P to P
Interfaces
(Driven by SubTeams) –
(Stored Procs, ETL
or Connect Direct)
Web access Dashboard/KPI
Lynx Reporting/Analytics Architecture
ADHOC Reporting/Queries Drill Down, OLAP
Metadata
Reporting/Analytics Tool
Lynx Data
Warehouses
(regional & global)
Operational Data
Stores/Staging
ETL
Common Data Model
Master Data
Management
Common Business Transformations
SQL Server database
ETL
“Availability”
57. 57
Lessons learned 3
– Hybrid architecture
– To facilitate multiple data integration points with complex data
translations. Most data was required in real time to capture trade
deals.
– ETL platform for application integration with robust
transformation.
– Orchestration tool (BizTalk) facilitates work flow and data
integration with external parties and systems.
58. 58
Logical Architecture - Hybrid integration pattern
Services
Service Providers
Transport Providers
Pipeline
3rd Party
Leases
Extex
(Royalty Payments)
Market Data
Providers
Deal Confirm
Exchanges
Banks
Counterparty
Inspection
Terminal
Ship
Ports
Rail
Trucks
4GEN
Tax
“SOG”
“Corporate Credit”
Cashflow
Netback
“Master Data”
- EA Master Data
- EA Facilities
- SAP
SAP
SAP
Rolfe & Nolan
NAVARIK
Trading 1 Trading 2 Trading 3
Price
Credit
MDM +Xref
O
R
C
H
E
S
T
R
A
TI
O
N
O
R
C
H
E
S
T
R
A
TI
O
N
RTR
Intraday PositionSnapshot DB
ETL
BI
Document
Management
MRA
SAPXI
MPA Price
Noms
Confirms
Ship Status
Lifting
Schedules
Ratings
Deals
Statements
Royalty
Vols
Deals
Corporate
Credit
Services
SOG
Services
SAPXI
TAX
Clients
Port Activity
Credit Limit
Master Data
Credit Services
Risk
Algo
License
Mgm
R&N
Services
Risk Services
Price Services Master &
Xref Services
Unstructured
Market/CP data
Master Contracts
Lease Vols
Rates
Payments
Actual
Volumes
Inspection
Reports
Consolidated
Position
Viewer
Brokers
Rating Agency
RailTrac
CVMS/Shipnet
Clients
Confirms
Tickets
Schedules
Refineries
Exchange
Allocation
CP
Services
Banking
Services
Exchange
Cuts
News/Data
AR/AP/GL
Invoices
Credit
Exposure
Plans
Rail Car
Ship info
Ship
Schedules
Movement
Tools
Enterprise
Facilities
Credit
Engine
Valuation
Libraries
60. 60
Best Practices for Data Integration
1. Don’t lose sight of the DI architecture vision, but include tactical data
integration solutions for specific business requirements. (phased
approach)
2. Categorize data by business value and usage. (prioritize)
3. Prioritize the sequence of implementing data integration. (sequence)
4. Document data migration and infrastructure deployment roadmap.
5. Establish new standards for naming, data types and metadata.
(governance)
6. Publish metadata definitions and glossaries of business terms.
7. Establish a coexistence strategy with legacy systems. Always have a
migration plan.
8. Establish physical reference architecture and tools.
9. Implement environments for the foundation components ahead of time.
10. Begin data migration into the integrated environment.
61. 61
Planning for Data Migration
• Data migration (conversion) from legacy systems to the newly integrated
environment needs to be planned carefully, weighing highest value
against highest usage.
– Foundation data migration – Implementing the main lookup
data, or master data, for the enterprise
– Core transactional data migration – Detailed transactions for the
basic enterprise events
– Application data migration – Supports specific company
functions
• This strategy first builds the foundational master data that end users
query most often, then adds core transactional data, so that business
value grows incrementally as the data becomes richer in content.
63. 63
Integrating Data Content and Meaning
• Another aspect of data integration is standardizing the usage of data content
and meaning. This type of data content integration improves business efficiency
and data quality.
– Integration of content standardizes data values, e.g. lookup codes, across
different databases. (For example, if a PI Tag or P&ID needs to be uniquely
identified at the global level across all refineries, a newly defined unique
ID can be created and tied to the existing ID.) Depending on whether the
request is for local operations or global data analysis, the two sets of IDs
can be translated and delivered to satisfy it.
– Besides the physical data movement and storage of integrated data
bases, the common integration of data meaning needs to be
standardized. Metadata provides definitions of subject areas, tables, and
columns in a data repository.
– When all users refer to the data repository, the meaning of each data
element is standardized to a common definition.
– Additional metadata can be provided that displays calculations for
derived data elements, glossaries of business terms, and lineage of the
source of data.
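As a rough illustration of the two ideas on this slide, the sketch below ties local tags to newly defined global IDs and records metadata (definition, calculation, lineage) for one data element. All site names, tags, IDs and field values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementMetadata:
    """Metadata for one column in the data repository."""
    subject_area: str
    table: str
    column: str
    definition: str
    calculation: Optional[str] = None  # only for derived elements
    lineage: tuple = ()                # ordered list of source hops


# Hypothetical IDs: the same local tag name exists in two refineries,
# so each gets its own global ID tied to the existing local one.
LOCAL_TO_GLOBAL = {
    ("REFINERY_A", "FI-101"): "G-000001",
    ("REFINERY_B", "FI-101"): "G-000002",
}
GLOBAL_TO_LOCAL = {g: local for local, g in LOCAL_TO_GLOBAL.items()}


def to_global(site: str, local_tag: str) -> str:
    """Translate a local tag for global analysis."""
    return LOCAL_TO_GLOBAL[(site, local_tag)]


def to_local(global_id: str) -> tuple:
    """Translate a global ID back to (site, local tag) for local operations."""
    return GLOBAL_TO_LOCAL[global_id]


throughput = ElementMetadata(
    subject_area="Operations",
    table="unit_daily",
    column="throughput_bbl",
    definition="Barrels processed by a unit in one calendar day",
    calculation="sum(hourly_flow_bbl)",
    lineage=("plant historian", "ODS", "data warehouse"),
)
```

Keeping the translation in both directions lets either ID set be delivered, depending on whether the request is local or global.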
64. 64
Data Integration Architecture considerations
Commonality, consistency and interoperability of DI components:
Minimal number of products or product suites supporting all data
deliveries
Single metadata repository and/or the ability to share metadata
across all components
Common design environment to support all deliverables
Interoperability with other integration tools and applications
Efficient support for all data deliveries regardless of runtime
architecture (centralized vs. distributed )
65. 65
Decision Making methodology
Top-down
•Integrate Use Case with Pattern Matching
• With integration-pattern matching, teams compare their specific use cases
with “typically deployed” data integration patterns and look for a match.
Examples:
• To improve Global Manufacturing-wide reliability reporting, the
appropriate integration pattern would be an enterprise data
warehouse that physically consolidates and summarizes OE data
from across all refineries.
• To provide operational DCS information to business level
applications and for operational BI, a replicated operational data
store that stores up-to-the-second transactional data would be the
best fit.
• To support upstream or downstream product movement analysis
and establish a performance, a data mart or an OLAP cube sourced
from the ODS or Data Warehouse would be the best pattern.
66. 66
Decision Making methodology
Bottom-up
Assessing integration factors
This is often valuable where the DI decision is complex and/or where a clear
integration pattern match is not obvious. For example, to choose between
virtual, physical or a hybrid combination:
• If data extracted from many source systems could be used by many other
systems, then a physical data store is good for data reuse and future
expansion.
• If significant data cleansing and complex transformation are required,
then physical data consolidation is typically the most practical choice.
• If harmonized data needs to be aggregated and summarized for an
analytical dashboard, then a physical data store is needed to load the Data
Warehouse/Data Mart and/or OLAP cubes.
• If source systems are mostly available as systems of record and data can be
passed between systems without significant matching, merging or
harmonizing, then virtual integration makes sense.
• A hybrid combination may be a good choice if a project has both real-time
business process integration and a large number of data interfaces.
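The factors above can be read as a small rule set. The sketch below is one possible encoding, not a definitive method: the field names are made up for illustration, and the order in which the rules are checked (hybrid first, then physical, then virtual) is an assumption the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class IntegrationFactors:
    reused_by_many_systems: bool = False
    heavy_cleansing_or_transformation: bool = False
    needs_aggregation_for_analytics: bool = False
    sources_are_systems_of_record: bool = False
    real_time_process_integration: bool = False
    many_data_interfaces: bool = False


def recommend(f: IntegrationFactors) -> str:
    """Suggest 'physical', 'virtual', 'hybrid' or 'undetermined'."""
    # Hybrid: real-time process integration plus many data interfaces.
    if f.real_time_process_integration and f.many_data_interfaces:
        return "hybrid"
    # Physical: reuse, heavy transformation, or aggregation for analytics.
    if (f.reused_by_many_systems
            or f.heavy_cleansing_or_transformation
            or f.needs_aggregation_for_analytics):
        return "physical"
    # Virtual: sources can be used as-is, without harmonization.
    if f.sources_are_systems_of_record:
        return "virtual"
    return "undetermined"


dashboard_case = IntegrationFactors(needs_aggregation_for_analytics=True)
passthrough_case = IntegrationFactors(sources_are_systems_of_record=True)
trading_case = IntegrationFactors(real_time_process_integration=True,
                                  many_data_interfaces=True)
```

In practice the factors would be weighed rather than checked in a fixed order; the function only makes the slide's reasoning explicit.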
71. 71
Traditional EAI vs. ESB
Traditional EAI | ESB
Hub-and-spoke architecture | Service-oriented architectures
Functionality oriented | Process oriented
Designed to last | Designed to change
Application block | Services orchestration
Tightly coupled, with use of proprietary adapters | Loosely coupled, with coarse-grained business services
Object and message oriented | Message oriented
Known implementation | Abstraction
Lack of support for new business logic | Flexible and adaptive business logic
Complex, proprietary, centralized and costly integration | Lightweight, distributed, standards-based and inexpensive
72. 72
Use Cases of EAI, ETL, EAI + ETL
•EAI Software
An example - During the Internet boom, companies flocked to EAI to connect e-commerce
with back-end inventory and shipping systems to reflect product availability and delivery
times.
• ETL toolset in an ‘always awake’ mode – near real time
To deliver near-real-time capabilities, the ETL tools typically use application-
level interfaces to detect new transactions or events as soon as they are
generated by the source application. They then deliver these events to any
application that needs them either immediately (near real time), at predefined
intervals (scheduled), or when the target application asks for them (publish and
subscribe).
• EAI plus ETL
EAI tools capture data and application events in real time and pass them to
the ETL tools, which transform the data and load it into the BI environment.
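The EAI plus ETL hand-off can be sketched as a capture step that queues events and an ETL step that drains the queue, transforms each event and loads it. This is a minimal in-memory sketch: the queue stands in for an EAI message channel, the list stands in for the BI environment, and the event fields are hypothetical.

```python
from collections import deque

events = deque()  # stands in for the EAI message channel


def capture(event: dict) -> None:
    """EAI side: capture an application event as soon as it occurs."""
    events.append(event)


def transform(event: dict) -> dict:
    """ETL side: convert a raw event into the warehouse row layout."""
    return {
        "deal_id": event["id"],
        "volume_bbl": float(event["vol"]),
        "source": event["src"].upper(),
    }


def run_etl(warehouse: list) -> int:
    """Drain all pending events, transform them and load them."""
    loaded = 0
    while events:
        warehouse.append(transform(events.popleft()))
        loaded += 1
    return loaded


warehouse = []
capture({"id": "D-1", "vol": "1000", "src": "trading1"})
capture({"id": "D-2", "vol": "250.5", "src": "trading2"})
loaded_count = run_etl(warehouse)
```

Calling `run_etl` on every event gives the immediate mode, while calling it on a timer gives the scheduled mode described above.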
73. 73
What do vendors say about ESB?
– Some stress the role of the ESB in eBusiness, that is, its inter-organizational
rather than intra-organizational role
– Almost all believe that the ESB is more than the bus it runs on. Essentially, they are
describing a service-oriented architecture from another viewpoint
– Some see orchestration as part of the ESB architecture, others do not
– Some package MOM and EAI in their ESB products
– Some identify event monitoring as the major differentiator from MOM
– Some consider services management as part of the ESB solution
– Some see an ESB as strictly related to Web services and describe it as a Web Services
Network.
All vendors are “flexible” in defining ESB. Their definitions always manage to show that their
current solutions already use it.
74. 74
ESB, When to Consider
– When deploying SOA across the enterprise
– When establishing business processes (BPM) and orchestration
infrastructure that will leverage a business services layer
– When moving from a complex point-to-point or ‘spaghetti’ architecture
to a more manageable and flexible IT infrastructure
– When integrating to multiple and heterogeneous data sources and
applications
– When there is heavy business logic and security through the service bus
to multiple end points
– When further separation from composite applications is required (away
from underlying implementations)
– When flexible coupling is required
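The point-to-point versus bus argument can be made concrete with a simple count. The sketch assumes the worst case, in which every system must exchange data with every other system, and counts one connection per pair of systems or per system attached to the bus.

```python
def point_to_point_links(systems: int) -> int:
    """Connections needed when every pair of systems is wired directly."""
    return systems * (systems - 1) // 2


def bus_links(systems: int) -> int:
    """Connections needed when each system attaches once to a shared bus."""
    return systems


for n in (5, 10, 20):
    print(n, point_to_point_links(n), bus_links(n))
```

The direct wiring grows roughly with the square of the number of systems, while the bus grows linearly, which is why the ‘spaghetti’ becomes hard to manage as systems are added.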
75. 75
Information (Data) Services in SOA
•For data to be a first-class citizen in the SOA world, a clear separation must exist between data
consumers and data providers. This separation mirrors the principle that service consumers and
providers must be distinct and separate in an SOA. Furthermore, this separation must be delineated by
an interface, or contract, that both providers and consumers share
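The separation between data consumers and data providers can be sketched with a shared contract. In the example below the contract is a Python `Protocol`; the service name, the provider class and the well record are hypothetical and serve only to show that the consumer depends on the contract, not on the provider.

```python
from typing import Protocol


class WellDataService(Protocol):
    """Contract shared by data providers and data consumers."""

    def get_well(self, well_id: str) -> dict:
        ...


class InMemoryWellProvider:
    """One possible provider; a database-backed one could replace it."""

    def __init__(self, rows: dict) -> None:
        self._rows = rows

    def get_well(self, well_id: str) -> dict:
        # Return a copy so consumers cannot mutate provider state.
        return dict(self._rows[well_id])


def well_summary(service: WellDataService, well_id: str) -> str:
    """Consumer: uses only what the contract exposes."""
    well = service.get_well(well_id)
    return f"{well['name']} ({well['status']})"


provider = InMemoryWellProvider({"W-1": {"name": "Test Well 1", "status": "drilling"}})
summary = well_summary(provider, "W-1")
```

Because `well_summary` only knows the contract, the provider can be swapped without changing the consumer.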
76. 76
Gartner on SOA and Data Services
Gartner suggested that success in loosely-coupled service-oriented business applications
(SOBAs) becomes more difficult since each design point has to verify its own semantics,
context and data structures.
Key Findings
Under a loosely-coupled architecture, data stewardship and governance best practices can be
supported by data services within an SOA instead of embedding such practices within application
logic. Where people and processes were formerly embedded in application design, they now fall
under the domains of business process platforms and EIM - Enterprise Information Management.
Predictions
Based on lessons learned through data warehouse, data mart and operational data store
implementation practices, 60% of failed information-as-a-service initiatives through 2009 will list a
lack of an effective data governance strategy as one root cause of failure.
Recommendations
Organizations should begin their selection of data profiling, quality, mining and master data
management tools with the end goal of deploying all the logic and processing within these tools as
services that can interoperate and execute actions on behalf of and against data used by SOBAs, and as a
callable service by business context services.
77. 77
Composite solutions
• Some of the approaches promoted by the Web 2.0 movement (mash-ups, RIA - Rich Internet
Applications) are moving the Integration challenges up to the presentation layer
[Figure: composite application screen. Recoverable panel titles: SAP Work Management & Purchasing, Personal Management, Drilling Information, Collaboration, Documents, Knowledge Management, Planning, Process Guides.
The figure also contains three process flow charts for Business Process 3.0 Set-up New Well, covering pre-drill activities:
– "As Is", sub-process 3.3 Set-up Well Ownership (Company: APC, Version 1.0, Version Date 2/28/01)
– "As Is", sub-process 3.1 Set-up Drilling AFE (Company: UPR, Version 1.1, Version Date 3/5/01)
– "To Be" for 2001, sub-process 3.3 Set-up Well Ownership (Version 1.5, Version Date 7/18/01)
The individual flow-chart steps are omitted because their order and connections did not survive extraction.]
• Presentation Integration Servers
enable the creation of Composite
Applications by introducing a level
of orchestration between the
presentation layer of “legacy”
and composed applications
• Business processes are packaged
and reused by BPM tools
introducing business process
layer composition
• Solutions are built by combining
capabilities at every level of
the software stack: data,
process and presentation
78. Web 2.0 (1)
One’s view of Web 2.0 is highly dependent on one’s background and interest, and can best be
described by these three anchor points:
Technology and architecture – consists of the infrastructure of the Web and the concept of
Web platforms. Examples of specific technologies include Ajax, Representational State
Transfer (REST) and Really Simple Syndication (RSS). Technologists tend to gravitate toward
this view.
Community and Social – looks at the dynamics around social networks, communities and
other personal content publish/share models, WIKIs, and other collaborative-content
models. Most people tend to gravitate toward this view, hence, there is a lot of Web 2.0
focus on “the architecture of participation.”
Business and process – Web services-enabled business models and mashup/remix
applications. (A mashup is a Web site or Web application that combines content from more
than one source.) Examples include long-tail economics, and advertising and subscription
models such as software as a service (SaaS). Of course, business people tend to zero in on this angle.
79. 79
Web 2.0 (2)
• What's Old Is New Again
• Most of what people call Web 2.0 is not entirely new. Many of the concepts and
technologies have existed for some time:
• For example, RSS is essentially the same as the Resource Description Framework (RDF)-based
format popularized by Netscape during Web 1.0 and the hype around push technology.
• Ajax is essentially JavaScript, dynamic HTML and asynchronous XML, all of which
have existed for more than five years and have become well-known with the advent
of high-profile implementations such as Google Maps.
• Certainly, collaboration and advertising are not new.
• Mashups bear a striking similarity to the SOA-derived term "composite
applications." What is new is how some of these are used, and in what
combinations.