Pragmatics Driven Issues in
Data and Process Integrity in Enterprises
                        Keynote/Invited Talk

                IFIP TC-11 First Working Conference on
     Integrity and Internal Control in Information Systems
           Zurich, Switzerland December 4-5, 1997



                  Amit Sheth
                  Large Scale Distributed Information System Lab
                  University of Georgia
                  http://LSDIS.cs.uga.edu/
Three Real Challenges to Data Integrity

Three realities of the IS environment:
• Dirty data
• Interdependent data
• Process coordination / workflow management

But traditional data integrity and database
transaction solutions come up short ...
Overview
Poor Quality      Inconsistent              Process
of Data           Related Data              Coordination


Data Cleanup/     Correct                  Workflow
Purification      Inconsistencies          Specifications

           Achieve
           Data                  Data   Process
           Integrity


 Transaction      Interdependent            Manage
 Management       Data Management           Data Integrity
Dirty Data



Users cite their biggest data warehouse challenges:

  Managing data quality        46%
  Business data modeling       31%
  End-user expectations        29%
  Legacy data                  25%
  Transformation               22%
  Business rule analysis       17%
  Management expectations      16%
  Database performance

Source: DCI/Meta Group, Inc.
Dirty Data


         Stories I have heard/seen

• 30% fall-outs (“requests for manual assist”)
  due to mismatch between address in
  customer service request and loop inventory
  database in a Telco
• PUC insisted that a Regional Bell Company
  do something to reduce the 400 persons
  it employed ($40 million+) to keep data
  consistent
Dirty Data



         Dirty Data: Real World Stories

• Insurance company regional data: 80% of
  claims had “broken leg” as diagnosis*
• At a 4% error rate, a $2 billion company forfeits
  $80 million in revenue*




* Emily Kay, Dirty Data Challenges Warehouses, DW/Software Magazine, Oct. 97
Dirty Data



         Data Quality Dimensions

• invalid or impaired data
• incomplete or missing data
• inconsistent data

How to continue business operations
• by discounting the effect of poor-quality data
• without worsening data quality
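
The three dimensions listed on this slide can be made concrete with a small check. The sketch below is illustrative only: the field names, the 5-digit ZIP rule, and the function name `quality_defects` are assumptions made for the example and do not come from the talk.

```python
import re
from typing import Optional

# Hypothetical customer record fields; not taken from the talk.
REQUIRED_FIELDS = ("name", "street", "zip_code")
ZIP_PATTERN = re.compile(r"^\d{5}$")  # assumed US 5-digit ZIP format


def quality_defects(record: dict, reference: Optional[dict] = None) -> list:
    """Return the data quality dimensions that a record violates."""
    defects = []
    # Incomplete or missing data: a required field is absent or empty.
    if any(not record.get(f) for f in REQUIRED_FIELDS):
        defects.append("incomplete")
    # Invalid or impaired data: a value is present but malformed.
    zip_code = record.get("zip_code")
    if zip_code and not ZIP_PATTERN.match(zip_code):
        defects.append("invalid")
    # Inconsistent data: the record disagrees with a copy held elsewhere.
    if reference is not None:
        shared = set(record) & set(reference)
        if any(record[k] != reference[k] for k in shared):
            defects.append("inconsistent")
    return defects
```

Flagging a record rather than rejecting it is one way to keep operations running on imperfect data: a downstream task can decide how much weight to give a flagged value.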
Dirty Data



         Improving Data Quality

• Rule discovery, audit,
  scrubbing/cleansing/purifying, defect
  prevention
• Commercial offerings give partial solutions
  to some aspects of identifying data quality
  problems and some aspects of cleanup
  (scrubbing)
Dirty Data


            NASD Data Quality Toolset

 Client-access tool                Cognos, SAS, Applix
 Conversion tool                   ETI*Extract
 Metadata tool                     Platinum Tech’s Repository
 Auditing tool                     Prism Solution’s QDB Solutions QDB/Connect

Problem: No integrated solution!
From L. Wilson, “NASD: Securing Data Quality,” DW/Software Magazine, Oct. 97
Dirty Data



     More on Commercial Solutions

• Commercial solution providers: Information
  Builders, Platinum Technologies, SAS
  Institute, Group 1 Software, Vality
  Technology, First Logic
• Hundreds of thousands of dollars: Why?
Dirty Data



       Issues reasonably addressed

• Conceptual framework -- MIT’s work gives
  very good start
• Most existing solutions apply to single data
  repository or database -- possible to use
  remote data access solutions for one
  database at a time
Dirty Data


         Challenges to be addressed

• Most solutions deal with structured/relational
  data only -- increasingly data is in different
  media
• Most solutions deal with creation of data
  warehouse; OK for decision support, but what
  about operational use?
Dirty Data



         Data Quality Challenges

How to continue business operations
• by discounting the effect of poor-quality data
• without worsening data quality


   “A Mediator for Approximate Consistency:
 Supporting ‘Good Enough’ Materialized Views”
              Seligman and Kerschberg
Dirty Data

       A Research Project: Q-Data
[Figure: Q-Data architecture, top to bottom]
  User functions: Define Rules; Invoke Validation & Cleanup;
                  Display Results or Consult
  GUI
  Declarative Rules and Procedural Programs in LDL++ (LDL/Prolog/C++)
    Rules & Programs: Ref. Integrity, Approx. Match, Consistency
  Database Access Interface    ->  Databases
  Legacy System Interface      ->  Legacy Information Systems
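
An "Approx. Match" rule of the kind named in the Q-Data figure might look roughly like the sketch below. Q-Data itself is described as using LDL++; this is a Python stand-in, not the project's code. The abbreviation table, the 0.85 threshold, and the function names are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed sample of street-type abbreviations; a real rule base would be larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def approx_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two addresses as the same if their normalized forms are similar."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

A rule like this targets the address mismatches between a service request and an inventory database described in the telco story earlier: `approx_match("123 Main Street", "123 MAIN ST.")` holds after normalization, whereas an exact string comparison would force a manual assist.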
Dirty Data



     Interested in More Information?

• Industry/Practice:
  – www.sentrytech.com
  – “Data Quality Maze”, DW/Software Magazine,
    Oct. 1997
• MIS: Total Data Quality Research:
  www.mit.edu/tqdm/www
• Computer Science Research:
  Sheth-Wood-Kashyap, Ami Motro, ...
Interdependent Data

          Interdependent Data and
         Multidatabase Consistency
Function-oriented application systems were created
  independently to automate different parts of
  operations.
Hence independently developed databases, where:
• information about a subject is distributed in
  multiple systems
• a new application manages existing data
  independently
Interdependent Data

                   Interdependent Data and
                  Multidatabase Consistency
                    Order        Billing   Planning &
                    Processing   System    Engineering
                    System                 System


Customer Data

Inventory Data

Assignment Data

Reference Data
Interdependent Data



                War Stories

• Data analysis: One data element was in 43
  separate legacy system files, maintained by
  43 separate programs.
• Telco: Customer information is probably in
  over 100 information systems. Some
  information may be overlapping, and in
  different representational forms.
Interdependent Data



       Real Example:
Provisioning Residential Line
Interdependent Data




Lack of understanding and maintenance of data
  interdependency leads to data inconsistency and requires
• manual intervention to complete failed operations
• work-around/patches
• manual reconciliation

and result in
• incorrect and wasted operations, poor quality of work
• difficulty in interoperability, high costs
• lost business opportunities
Interdependent Data

           A Framework for Specifying
               Interdependent Data




data dependency descriptor
  • dependency: structural, control
  • consistency: data state, temporal
  • restoration: coupled/decoupled, vital/non-vital

                     Sheth and Rusinkiewicz 1990
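
One way to picture the three parts of a data dependency descriptor is as a small record type. The sketch below is an assumption about how such a descriptor could be represented in code, not the representation used by the authors. All class and field names are hypothetical, and the predicates are kept as plain strings for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Coupling(Enum):
    COUPLED = "coupled"
    DECOUPLED = "decoupled"


@dataclass(frozen=True)
class Restoration:
    procedure: str       # name of the procedure that restores consistency
    coupling: Coupling   # coupled or decoupled with the triggering update
    vital: bool          # vital or non-vital


@dataclass(frozen=True)
class DataDependencyDescriptor:
    sources: Tuple[str, ...]   # data objects the target depends on
    target: str                # the dependent data object
    dependency: str            # structural/control relationship
    consistency: str           # data state and/or temporal criteria
    restoration: Restoration


# Hypothetical example: a replicated customer address.
d3 = DataDependencyDescriptor(
    sources=("orders.customer_address",),
    target="billing.customer_address",
    dependency="target is a copy of source",
    consistency="refresh within 24 hours of a source update",
    restoration=Restoration("copy_address", Coupling.DECOUPLED, vital=False),
)
```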
Interdependent Data



                  A Case Study at Bellcore

                                     Planning Apps.




Inventory/
                Planning
  Source
                                                      Reference
                              Engineering Design        Data
     Karabatis and Sheth 92
Interdependent Data



       An Example of Interdependent Data




        YEAR (…,demand, …)         DMD_CAP(…,assigned,…)
      ENTITY_JOB (…,capacity,…)


• Dependency: join and aggregation/sum over YEAR and ENTITY_JOB
• Consistency requirement: C1: demand/capacity > 0.9 or
                         C2: (capacity - demand) < 5000
• Restoration procedure:
    • when C1 then regular_planning_update as non-coupled
    • when C2 then emergency_planning_update as coupled & vital
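
The conditions C1 and C2 and their restoration procedures can be sketched as a small function. This is a minimal illustration under assumptions: `demand` and `capacity` are taken to be already aggregated from the underlying relations, the function name is hypothetical, and vitality is recorded as `None` for C1 because the slide does not state it.

```python
from typing import List, Optional, Tuple

Action = Tuple[str, str, Optional[bool]]  # (procedure, coupling, vital)


def triggered_restorations(demand: float, capacity: float) -> List[Action]:
    """Return the restoration procedures whose conditions currently hold."""
    actions: List[Action] = []
    # C1: demand / capacity > 0.9 (guarded against a zero capacity)
    if capacity > 0 and demand / capacity > 0.9:
        actions.append(("regular_planning_update", "non-coupled", None))
    # C2: (capacity - demand) < 5000
    if capacity - demand < 5000:
        actions.append(("emergency_planning_update", "coupled", True))
    return actions
```

With `demand=95000` and `capacity=100000`, only C1 holds (the ratio is 0.95 and the margin is exactly 5000), so only the regular update is selected. With `demand=9500` and `capacity=10000`, both conditions hold.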
Interdependent Data



   Types of Dependency Specification

• Redundant data
  – replication data, primary-secondary copies
  – vertical/horizontal partitions
• Semantic integrity constraints
  – value existential constraints
• Derived data
Interdependent Data



  Types of Consistency Requirements

• Immediate consistency
• Eventual consistency
• Lagging consistency

  – Temporal criteria
     • at or before some time, within an interval, periodically
  – Data state criteria
     • number of operations or data items change, value of change,
       before or after an operation
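
A relaxed consistency requirement that combines a temporal criterion with a data state criterion could be checked as sketched below. The class name `LagPolicy`, its fields, and the thresholds are assumptions for illustration; the talk does not prescribe a concrete form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LagPolicy:
    max_age_seconds: float     # temporal criterion: restore within an interval
    max_pending_updates: int   # data state criterion: number of operations


def needs_restoration(policy: LagPolicy,
                      seconds_since_sync: float,
                      pending_updates: int) -> bool:
    """True once either criterion of the policy has been reached."""
    return (seconds_since_sync >= policy.max_age_seconds
            or pending_updates >= policy.max_pending_updates)


# Hypothetical policy: refresh hourly or after 100 source updates.
hourly_or_100 = LagPolicy(max_age_seconds=3600, max_pending_updates=100)
```

Immediate consistency corresponds to the degenerate policy in which a single pending update already triggers restoration.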
Interdependent Data



        Some Relevant Work: Criteria

• replica control: primary-secondary copies,
  one-copy serializability
• epsilon-serializability [Pu & Leff], N-ignorance
  [Krishnakumar & Bernstein], k-completeness [Sarin et al]
• eventual and lagging consistency [Sheth et al]
Interdependent Data



      Some Relevant Work: Modeling

• Identity Connections [Wiederhold & Qian]
• Demarcation Protocol [Barbara and Garcia-Molina]
• Data Dependency Descriptors
  [Rusinkiewicz/Sheth/Karabatis]
• Existence/Value Dependency [Ceri & Widom],
  Interdependencies (existence, structural,
  behavioral, value) [Li and McLeod]
• Computational Invariants, PATH structure [Etzion]
• ECA Rules [Dayal]
Interdependent Data


             Enforcement Strategies

• Application code
• Middleware: Transaction Monitors,
  Replication Server [Notes]
• Quasi-copies [Barbara et al]
• Production Rules and Persistent Queues [Ceri
  and Widom]
• Extended Distributed Transaction
  Management
  – Polytransactions [Sheth et al], Quasi-transactions
    [Arizio et al]
Interdependent Data



                              Polytransactions

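[Figure: polytransaction example. Left: a root transaction (t1) and
related transactions t2a, t2b, and t3 run against the IDS at three
sites, each with an Interdependent Data Manager on top of a Local
DBMS. Right: the transaction tree rooted at t1 with related
transactions t2a, t2b, and t3; edges are labeled coupled-non-vital,
coupled-vital, and non-coupled.]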



                How are related transactions determined? => S,U,P
                When is a related transaction created? => C, Policy
                   What does a related transaction do? => A
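
The execution of a transaction tree like the one in the figure can be sketched as follows. This is a simplified reading, not the authors' algorithm. It assumes that a coupled child runs before its parent proceeds, that failure of a vital child fails the parent, and that a non-coupled child is only scheduled for later. Rollback is not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Txn:
    name: str
    action: Callable[[], bool]      # returns True on success
    coupled: bool = True
    vital: bool = False
    children: List["Txn"] = field(default_factory=list)


def run(txn: Txn, deferred: List[Txn]) -> bool:
    """Run txn and its coupled descendants; collect non-coupled ones."""
    if not txn.action():
        return False
    for child in txn.children:
        if not child.coupled:
            deferred.append(child)   # parent does not wait for this child
            continue
        succeeded = run(child, deferred)
        if not succeeded and child.vital:
            return False             # a vital child failed: parent fails
    return True


def build_tree(t2b_succeeds: bool) -> Txn:
    """Assumed labeling: t2a coupled/non-vital, t2b coupled/vital, t3 non-coupled."""
    t2a = Txn("t2a", lambda: False, coupled=True, vital=False)
    t2b = Txn("t2b", lambda: t2b_succeeds, coupled=True, vital=True)
    t3 = Txn("t3", lambda: True, coupled=False)
    return Txn("t1", lambda: True, children=[t2a, t2b, t3])
```

In this sketch the failure of the non-vital `t2a` does not affect `t1`, while a failure of the vital `t2b` does.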
Interdependent Data



          Enforcement Policy




current                 consistent                    inconsistent

          eager restoration           partial restoration


               late restoration or lazy restoration
Workflow



             Workflow Management

• Workflow Management (WFM) is the
  automated coordination, control, and
  communication of work, both of people and
  computers, in the context of organizational
  processes, through the execution of software in
  a network of computers whose order of execution
  is controlled by a computerized representation of
  the business processes.
Workflow


       What is workflow about ?

• Effective coordination, control and
  communications of work among human
  participants and system/information resources
  to orchestrate organizational processes
• Need to improve human/organization
  productivity, efficiency, quality of work
• New paradigm for “Programming in the large”
METEOR Workflow Model (very high level)

[Figure: a workflow graph that runs from start to end through several
tasks and a filter. Each task is attached to an interface and a
proc. entity; one aux. sys is also shown.]
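METEOR2 Task Models

[Figure: state-transition diagrams for three task models]
• Non-Transactional: Initial --start--> Executing;
  Executing --fail--> Failed; Executing --done--> Done
• Transactional: Initial --start--> Executing;
  Executing --abort--> Aborted; Executing --commit--> Committed
• Open 2PC transactional: Initial --start--> Executing;
  Executing --done--> Done; Done --prepared--> Prepared;
  Prepared --abort--> Aborted; Prepared --commit--> Committed;
  an additional abort edge leads to Aborted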
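
The non-transactional and transactional task models can be written down as transition tables. The sketch below is an illustration only and is not METEOR2 code. The function and table names are hypothetical, and the open 2PC model is omitted because its transitions are harder to read from the figure.

```python
from typing import Dict, Iterable, Tuple

Transitions = Dict[Tuple[str, str], str]  # (state, event) -> next state

TASK_MODELS: Dict[str, Transitions] = {
    "non_transactional": {
        ("Initial", "start"): "Executing",
        ("Executing", "fail"): "Failed",
        ("Executing", "done"): "Done",
    },
    "transactional": {
        ("Initial", "start"): "Executing",
        ("Executing", "abort"): "Aborted",
        ("Executing", "commit"): "Committed",
    },
}


def run_task(model: str, events: Iterable[str]) -> str:
    """Replay events against a task model and return the final state."""
    transitions = TASK_MODELS[model]
    state = "Initial"
    for event in events:
        key = (state, event)
        if key not in transitions:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = transitions[key]
    return state
```

Keeping the models as data lets a scheduler treat every task uniformly while still rejecting events, such as a commit before start, that a given model does not permit.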
A Complex Real-world Example

[Figure: a healthcare application made up of a CLINICAL SUBSYSTEM and
a TRACKING SUBSYSTEM. Callouts in the figure:]
• Generates alerts to identify patient’s needs and
  contraindications to caution providers
• Reminders to parents
• Health providers can obtain up-to-date clinical and
  eligibility information
• Hospitals and clinics update central databases after encounters
• Reports to state
• Health agencies can use reports generated to track
  population’s needs
• SDOH and CHREF maintain databases, support EDI transactions
• State and HMOs can update patient’s eligibility data
• Hospitals and case workers can reach out to the population
• HMOs can keep track of performance
Implementation Testbed

[Figure: testbed architecture. Recoverable labels:]
• Users: Administrator, Case Worker, etc.; Admit Clerk, Triage Nurse,
  Doctor/NP, Maternity Ward
• Sites: CHREF, CHREF/SDOH, Hospital, Clinic
• Hosts: Om (SunSparc 20 / Solaris), Optimus (SunSparc 2 / Solaris),
  Ra (SunSparc 20 / Solaris), Iris (Pentium / Windows NT)
• DBMSs: Illustra DBMS, Oracle7 DBMS
• Data: MPI, MEI, Immunization Db, Insurance Eligibility Db,
  Detailed Encounter Db, Detailed Encounter Data, POMS, Db, Files
• Connectivity: CORBA (ORBeline)*, Web Servers, Network File System,
  EDI, Internet
Workflow



         Data Integrity Challenges

• Workflows express application level
  integrity needs
  – e.g., customer available to task 1 should be
    consistent with the related information
    available to task 2 even if both execute quite
    independently
• In wake of -- inter-workflow requirements
• Integrity of specification for adaptive
  workflows
Workflow

                Weaknesses of
            State-of-the-art WFMS

• Lack of clear theoretical basis
• Undefined correctness criteria
• Limited support for:
  –   Concurrency Control
  –   Interoperability between workflow systems
  –   Scalability
  –   Availability
  –   Recovery (no human assisted recovery)
Workflow


         Transactions to the rescue?

• DB transactions and DP transactions
  address the correctness, consistency, and
  recovery issues to different degrees, and
  have a strong theoretical foundation
• BUT can they apply to Workflow
  Management? Applications and
  environments differ significantly!
Workflow



          Transactions in WFMS

• Task specific:
  – transactional tasks (e.g., database related)
  – distributed transaction processing
• Domain specific:
  – EDI, HL7
  – business contracts
Workflow



          Transactions in WFMS

• Business-process specific:
   – workflow correctness and reliability from a
     business process point of view
   – roles, worklists, error handling
• Infrastructure specific (each with its own
  notions):
   – CTM, DOM (CORBA), WWW, TP-monitors,
     Lotus Notes
Workflow


          An intuitive argument -
      why extended transactions don’t apply

• ATMs (advanced transaction models) were often
  motivated by a particular domain or a set of
  applications ... too narrow a scope in many cases
• Workflow is more horizontal in nature,
  many ATMs have been vertical in nature
  (Transaction concepts scale relatively well
  with hierarchical decompositions)
• Significant human involvement, long
  running, autonomous systems,...
Workflow

        Characteristics of Large-Scale
      Real-World Workflow Applications
• HAD (heterogeneous, autonomous, distributed)
  computing environments
• Multiple communication paradigms
• Humans, legacy applications, and other
  non-transactional tasks
• Organizational requirements (roles,
  authentication, security, etc.)
• Heterogeneous multimedia data
• Dynamic and virtual enterprises
• Electronic commerce
Workflow



                    Our view
• In the context of workflows:
   – basis for modeling transactional tasks … YES
   – basis for modeling a group of tasks as a
     transaction … MAYBE or YES
   – basis for ensuring reliable communication
     between workflow components … MAYBE or
     YES
   – basis for modeling workflows ?? … NO!
• Transactions --yes, ATMs --probably not
Workflow


                      Our view
• Notion of transactions in WFMS is more
  generalized than in TP-systems and DBMSs
• Workflow systems should provide support for all
  forms of transactions
• Strict transactional semantics not practical in
  workflow systems
• Role of transactions in workflow systems:
  – for tasks within the workflow process
  – for implementing solutions to support fault-tolerance,
    concurrency control, correctness, recovery
Conclusions

• Neither the systems environment nor data
  integrity requirements are as “simplistic”,
  “clean”, or “well defined” as in research
• Research has taken a “black and white”
  approach -- we need to deal with “shades of
  gray”: how do you deal with the imperfect
  world?
• We have to address issues that span
  multiple heterogeneous systems
  – numerous, more challenging, more complex
• Both data and application/process level issues
For more information:
          http://lsdis.cs.uga.edu

For publications: check corresponding areas at
     http://lsdis.cs.uga.edu/publications

 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 
BigData Testing by Shreya Pal
BigData Testing by Shreya PalBigData Testing by Shreya Pal
BigData Testing by Shreya Pal
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 

Pragmatics Driven Issues in Data and Process Integrity in Enterprises

  • 1. Pragmatics Driven Issues in Data and Process Integrity in Enterprises. Keynote/Invited Talk, IFIP TC-11 First Working Conference on Integrity and Internal Control in Information Systems, Zurich, Switzerland, December 4-5, 1997. Amit Sheth, Large Scale Distributed Information System Lab, University of Georgia, http://LSDIS.cs.uga.edu/
  • 2. Three Real Challenges to Data Integrity. Three realities of the IS environment: • Dirty data • Interdependent data • Process coordination/workflow management. But traditional data integrity and database transaction solutions come up short.
  • 3. Overview (diagram, three columns read top to bottom). Column 1: Poor Quality of Data; Data Cleanup/Purification; Transaction Management. Column 2: Inconsistent Related Data; Correct Inconsistencies; Interdependent Data Management. Column 3: Process Coordination; Workflow Specifications; Manage Data Integrity. Center: Achieve Data Integrity (Data, Process)
  • 4. Dirty Data. Users cite their biggest data warehouse challenges (bar chart, labels and values as extracted): Managing Data Quality 46%; Business Data Modeling 31%; End-user Expectations 29%; Legacy Data 25%; Transformation 22%; Business Rule Analysis 17%; Management Expectations 16%; Database Performance (value not shown). Source: DCI/Meta Group, Inc.
  • 5. Dirty Data Stories I have heard/seen • 30% fall-outs (“requests for manual assist”) due to mismatch between address in customer service request and loop inventory database in a Telco • PUC insisted that a Regional Bell Company do something about reducing 400 persons employed ($40 million+) to keep data consistent
  • 6. Dirty Data: Real World Stories • Insurance company regional data: 80% of claims had “broken leg” as diagnosis* • At a 4% error rate, a $2 billion company forfeits $80 million in revenue* * Emily Kay, Dirty Data Challenges Warehouses, DW/Software Magazine, Oct. 97
  • 7. Dirty Data Data Quality Dimensions • invalid or impaired data • incomplete or missing data • inconsistent data How to continue business operations • by discounting the effect of poor-quality data • without worsening data quality
  • 8. Dirty Data Improving Data Quality • Rule discovery, audit, scrubbing/cleansing/purifying, defect prevention • Commercial offerings give partial solutions to some aspects of identifying data quality problems and some aspects of cleanup (scrubbing)
  • 9. Dirty Data NASD Data Quality Toolset • Client-access tool: Cognos, SAS, Applix • Conversion tool: ETI*Extract • Metadata tool: Platinum Tech’s Repository • Auditing tool: Prism Solution’s QDB Solutions QDB/Connect Problem: No integrated solution! From L. Wilson, “NASD: Securing Data Quality”, DW/Software Magazine, Oct. 97
  • 10. Dirty Data More on Commercial Solutions • Commercial solution providers: Information Builders, Platinum Technologies, SAS Institute, Group 1 Software, Vality Technology, First Logic • Hundreds of thousands of dollars: Why?
  • 11. Dirty Data Issues reasonably addressed • Conceptual framework -- MIT’s work gives very good start • Most existing solutions apply to single data repository or database -- possible to use remote data access solutions for one database at a time
  • 12. Dirty Data Challenges to be addressed • Most solutions deal with structured/relational data only -- increasingly data is in different media • Most solutions deal with creation of data warehouse; OK for decision support, but what about operational use?
  • 13. Dirty Data Data Quality Challenges How to continue business operations • by discounting the effect of poor-quality data • without worsening data quality. See “A Mediator for Approximate Consistency: Supporting ‘Good Enough’ Materialized Views”, Seligman and Kerschberg
  • 14. Dirty Data A Research Project: Q-Data (architecture diagram). GUI: Define Rules & Programs; Invoke Validation & Cleanup or Consult; Display Results. Declarative Rules and Procedural Programs in LDL++ (LDL/Prolog/C++): Ref. Integrity, Approx. Match, Consistency. Database Access Interface to Databases; Legacy System Interface to Legacy Information Systems
  • 15. Dirty Data Interested in More Information? • Industry/Practice: – www.sentrytech.com – “Data Quality Maze”, DW/Software Magazine, Oct. 1997 • MIS: Total Data Quality Research: www.mit.edu/tqdm/www • Computer Science Research: Sheth-Wood-Kashyap, Ami Motro,...
  • 16. Interdependent Data Interdependent Data and Multidatabase Consistency Function-oriented application systems were created independently to automate different parts of the operation. Hence independently developed databases, where: • information about a subject is distributed across multiple systems • a new application manages existing data independently
  • 17. Interdependent Data Interdependent Data and Multidatabase Consistency (diagram). Systems: Order Processing System, Billing System, Planning & Engineering System. Data: Customer Data, Inventory Data, Assignment Data, Reference Data
  • 18. Interdependent Data War Stories • Data analysis: One data element was in 43 separate legacy system files, maintained by 43 separate programs. • Telco: Customer information is probably in over 100 information systems. Some information may be overlapping, and in different representational forms.
  • 19. Interdependent Data Real Example: Provisioning Residential Line
  • 20. Interdependent Data Lack of understanding and maintenance of data interdependency leads to data inconsistency and requires • manual intervention to complete failed operations • work-arounds/patches • manual reconciliation and results in • incorrect and wasted operations, poor quality of work • difficulty in interoperability, high costs • lost business opportunities
  • 21. Interdependent Data A Framework for Specifying Interdependent Data (diagram): a data dependency descriptor with three parts: dependency (structural, control), consistency (data state, temporal), and restoration (coupled/decoupled, vital/non-vital). Sheth and Rusinkiewicz 1990
  • 22. Interdependent Data A Case Study at Bellcore (diagram; labels as extracted: Planning Apps., Inventory/, Planning, Source, Reference, Engineering, Design, Data). Karabatis and Sheth 92
  • 23. Interdependent Data An Example of Interdependent Data. Relations: YEAR(…, demand, …), DMD_CAP(…, assigned, …), ENTITY_JOB(…, capacity, …) • Dependency: join and aggregation/sum over YEAR and ENTITY_JOB • Consistency requirement: C1: demand/capacity > 0.9, or C2: (capacity - demand) < 5000 • Restoration procedure: when C1 then regular_planning_update as non-coupled; when C2 then emergency_planning_update as coupled & vital
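The consistency conditions and restoration procedures on this slide can be sketched as a small check. This is only an illustration under stated assumptions: the `Restoration` record and the `triggered_restorations` function are hypothetical names, `demand` and `capacity` are taken to be the already aggregated values, the guard against a non-positive capacity is an addition, and `vital` is left as `None` for the non-coupled case because the slide does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Restoration:
    """Hypothetical record for one restoration procedure and its coupling mode."""
    procedure: str
    coupled: bool
    vital: Optional[bool]  # None where the slide does not say


def triggered_restorations(demand: float, capacity: float) -> List[Restoration]:
    """Return the restoration procedures whose condition (C1 or C2) holds."""
    actions: List[Restoration] = []
    # C1: demand/capacity > 0.9 (the capacity > 0 guard is an assumption)
    if capacity > 0 and demand / capacity > 0.9:
        actions.append(Restoration("regular_planning_update", coupled=False, vital=None))
    # C2: (capacity - demand) < 5000
    if capacity - demand < 5000:
        actions.append(Restoration("emergency_planning_update", coupled=True, vital=True))
    return actions
```

With a demand of 95000 against a capacity of 100000, only C1 holds, so only regular_planning_update is returned; the coupled and vital emergency update is added once the remaining headroom drops below 5000.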
  • 24. Interdependent Data Types of Dependency Specification • Redundant data – replicated data, primary-secondary copies – vertical/horizontal partitions • Semantic integrity constraints – value and existential constraints • Derived data
  • 25. Interdependent Data Types of Consistency Requirements • Immediate consistency • Eventual consistency • Lagging consistency – Temporal criteria: at or before some time, within an interval, periodically – Data state criteria: number of operations or data items changed, value of change, before or after an operation
  • 26. Interdependent Data Some Relevant Work: Criteria • replica control: primary-secondary copies, one-copy serializability • epsilon-serializability [Pu & Leff], N-ignorance [Krishnakumar & Bernstein], k-completeness [Sarin et al.] • eventual and lagging consistency [Sheth et al.]
  • 27. Interdependent Data Some Relevant Work: Modeling • Identity Connections [Wiederhold & Qian] • Demarcation Protocol [Barbara and Garcia-Molina] • Data Dependency Descriptors [Rusinkiewicz/Sheth/Karabatis] • Existence/Value Dependency [Ceri & Widom], Interdependencies (existence, structural, behavioral, value) [Li and McLeod] • Computational Invariants, PATH structure [Etzion] • ECA Rules [Dayal]
  • 28. Interdependent Data Enforcement Strategies • Application code • Middleware: Transaction Monitors, Replication Server [Notes] • Quasi-copies [Barbara et al] • Production Rules and Persistent Queues [Ceri and Widom] • Extended Distributed Transaction Management – Polytransactions [Sheth et al], Quasi-transactions [Arizio et al]
  • 29. Interdependent Data Polytransactions (diagram): a root transaction t1 with related transactions t2a, t2b, and t3; coupling labels coupled-vital, coupled-non-vital, and non-coupled; three Interdependent Data Managers, each over a Local DBMS; IDS. How are related transactions determined? => S, U, P. When is a related transaction created? => C, Policy. What does a related transaction do? => A
  • 30. Interdependent Data Enforcement Policy (diagram; labels: current, consistent, inconsistent; eager restoration, partial restoration, late restoration or lazy restoration)
  • 31. Workflow Workflow Management • Workflow Management (WFM) is the automated coordination, control, and communication of work, both of people and computers, in the context of organizational processes, through the execution of software in a network of computers whose order of execution is controlled by a computerized representation of the business processes.
  • 32. Workflow What is workflow about? • Effective coordination, control and communication of work among human participants and system/information resources to orchestrate organizational processes • Need to improve human/organization productivity, efficiency, quality of work • New paradigm for “Programming in the large”
  • 33. METEOR Workflow Model (very high level) (diagram; labels as extracted: start task, task, end task, filter task, three interfaces, aux. sys, three proc. entities)
  • 34. METEOR2 Task Models (three state diagrams). Non-Transactional: states Initial, Executing, Done, Failed; transitions start, done, fail. Transactional: states Initial, Executing, Committed, Aborted; transitions start, commit, abort. Open 2PC transactional: states Initial, Executing, Prepared, Committed, Aborted; transitions start, done/prepared, commit, abort
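The three task models named on this slide (non-transactional, transactional, open 2PC transactional) can be read as small state machines. The sketch below is a minimal illustration, not the METEOR2 implementation: the exact edges are inferred from the state and transition labels in the diagram, and the names `TASK_MODELS` and `run_task` are hypothetical.

```python
# Transition tables keyed by (current state, event). The edges are an
# assumption reconstructed from the slide's labels, not a published spec.
TASK_MODELS = {
    "non_transactional": {
        ("Initial", "start"): "Executing",
        ("Executing", "done"): "Done",
        ("Executing", "fail"): "Failed",
    },
    "transactional": {
        ("Initial", "start"): "Executing",
        ("Executing", "commit"): "Committed",
        ("Executing", "abort"): "Aborted",
    },
    "open_2pc": {
        ("Initial", "start"): "Executing",
        ("Executing", "done"): "Prepared",
        ("Executing", "abort"): "Aborted",
        ("Prepared", "commit"): "Committed",
        ("Prepared", "abort"): "Aborted",
    },
}


def run_task(model, events, state="Initial"):
    """Apply events in order and return the final state; reject illegal moves."""
    table = TASK_MODELS[model]
    for event in events:
        key = (state, event)
        if key not in table:
            raise ValueError(f"{model}: event {event!r} not allowed in state {state!r}")
        state = table[key]
    return state
```

For example, run_task("open_2pc", ["start", "done", "commit"]) walks Initial, Executing, Prepared, Committed, while a "done" event in the plain transactional model is rejected because that model only commits or aborts.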
  • 35. A Complex Real-world Example (diagram of a CLINICAL SUBSYSTEM and a TRACKING SUBSYSTEM; a “CT” label also appears). Callouts: • Generates alerts to identify patient’s needs and contraindications to caution providers • Reminders to parents • Health providers can obtain up-to-date clinical and eligibility information • Reports to state • Hospitals and clinics update central databases after encounters • Health agencies can use reports generated to track population’s needs • SDOH and CHREF maintain databases, support EDI transactions • Hospitals and case workers can update patient’s eligibility data • State and HMOs can reach out to the population • HMOs can keep track of performance
  • 36. Implementation Testbed (diagram; recoverable labels). User roles: Admit Clerk, Triage Nurse, Doctor/NP, Maternity Ward, Administrator, Case Worker, etc. Middleware: CORBA (ORBeline)*. Hosts: Iris (Pentium/Windows NT), Om (SunSparc 20/Solaris), Optimus (SunSparc 2/Solaris), Ra (SunSparc 20/Solaris). Software: Illustra DBMS, Oracle7 DBMS, Web Servers, POMS, Network File System, Db Files. Data stores: MPI, MEI, Immunization Db, Detailed Encounter Db, Insurance Eligibility Db, Detailed Encounter Data. Sites: CHREF, Hospital, Clinic, CHREF/SDOH, ED, Internet
  • 37. Workflow Data Integrity Challenges • Workflows express application-level integrity needs – e.g., customer information available to task 1 should be consistent with the related information available to task 2 even if both execute quite independently • In wake of -- inter-workflow requirements • Integrity of specification for adaptive workflows
  • 38. Workflow Weaknesses of State-of-the-art WFMS • Lack of clear theoretical basis • Undefined correctness criteria • Limited support for: – Concurrency Control – Interoperability between workflow systems – Scalability – Availability – Recovery (no human assisted recovery)
  • 39. Workflow Transactions to the rescue? • DB transactions and DP transactions address the correctness, consistency, and recovery issues to different degrees, and have a strong theoretical foundation • BUT can they apply to Workflow Management? Applications and environments differ significantly!
  • 40. Workflow Transactions in WFMS • Task specific: – transactional tasks (e.g., database related) – distributed transaction processing • Domain specific: – EDI, HL7 – business contracts
  • 41. Workflow Transactions in WFMS • Business-process specific: – workflow correctness and reliability from a business process point of view – roles, worklists, error handling • Infrastructure specific: (each with their own notions) – CTM, DOM (CORBA), WWW, TP-monitors, Lotus Notes
  • 42. Workflow An intuitive argument: why extended transactions don’t apply • Advanced transaction models (ATMs) were often motivated by a particular domain or a set of applications ... too narrow a scope in many cases • Workflow is more horizontal in nature; many ATMs have been vertical in nature (transaction concepts scale relatively well with hierarchical decompositions) • Significant human involvement, long running, autonomous systems,...
  • 43. Workflow Characteristics of Large-Scale Real-World Workflow Applications • Heterogeneous, autonomous, distributed (HAD) computing environments • Multiple communication paradigms • Humans, legacy applications, and other non-transactional tasks • Organizational requirements (roles, authentication, security, etc.) • Heterogeneous multimedia data • Dynamic and virtual enterprises • Electronic commerce
  • 44. Workflow Our view • In the context of workflows: – basis for modeling transactional tasks … YES – basis for modeling a group of tasks as a transaction … MAYBE or YES – basis for ensuring reliable communication between workflow components … MAYBE or YES – basis for modeling workflows … NO! • Transactions: yes; ATMs: probably not
  • 45. Workflow Our view • Notion of transactions in WFMS is more generalized than in TP-systems and DBMSs • Workflow systems should provide support for all forms of transactions • Strict transactional semantics not practical in workflow systems • Role of transactions in workflow systems: – for tasks within the workflow process – for implementing solutions to support fault-tolerance, concurrency control, correctness, recovery
  • 46. Conclusions • Neither the systems environment nor data integrity requirements are as “simplistic”, “clean”, or “well defined” as in research • Research has taken a “black and white” approach; we need to deal with “shades of gray”. How do you deal with the imperfect world?
  • 47. • We have to address issues that span multiple heterogeneous systems – numerous, more challenging, more complex • Both data-level and application/process-level issues
  • 48. For more information: http://lsdis.cs.uga.edu For publications: check corresponding areas at http://lsdis.cs.uga.edu/publications