Data Loading Best Practice
PPDM Association 2012 Data Symposium

www.etlsolutions.com
Agenda

  • Data loading challenges
  • Best practices in tools and methodology

We’ll be taking a look at loading data into PPDM, but much of this applies to generic data loading too.
Agenda

  • Data loading challenges
  • Best practices in tools and methodology
We’ve been listening to the Data Manager’s perspective

  • PPDM Conference, Houston
  • PNEC, Houston
  • Data Managers’ challenges:
     • Education
     • Certification
     • Preserving knowledge
     • Process
Data management is difficult and important
Different data movement scenarios

  • Migration
  • Integration
  • Loading

But all require mapping rules for best practice.
The business view of data migration can be an issue

  • Often started at the end of a programme
  • Seen as a business issue (moving the filing cabinet), not a technical one
  • However, the documents in the filing cabinet need to be read, understood and translated to the new system; obsolete files need to be discarded
Different data migration methodologies are available

PDM (Practical Data Migration)
  • Johny Morris
  • Training course
  • PDM certification
  • Abstract
  • V2 due soon

Providers
  • Most companies providing data migration services/products have a methodology
  • Ours is PDM-like, but more concrete and less abstract
Agenda

  • Data loading challenges
  • Best practices in tools and methodology

Section: Methodology
As an example, our methodology

  • Project scoping
  • Configuration
  • Landscape analysis: requirements analysis, data discovery, data modelling
  • Data assurance: data review, data cleansing
  • Core migration: migration design and development, testing design and development
  • Execution
  • Review
  • Legacy decommissioning
Firstly, review the legacy landscape

[Diagram: the legacy system at the centre, surrounded by its satellites – archive, SAP, reports, applications, Excel, Access databases, VBA]
Eradicate failure points

Beware the virtual waterfall process:

  Requirements → Agile Development → Signoff → Migrate
Agenda

  • Data loading challenges
  • Best practices in tools and methodology

Section: Rules
Rules are required

  • In data migration, integration or loading, one area of commonality is the link between source and target
  • This requires design, definition, testing, implementation and documentation
  • The aim is automated loading of external data into a common store (PPDM)
  • This requires best practice
Best practice: A single version of truth

  • For each of these data loaders we want a single version of truth
  • Whatever artifacts are required, we want to remove duplication, because duplication means errors, inconsistency and additional work
  • We want to remove boilerplate components that are only indirectly related to the business rules by which data is loaded
  • Let’s look at what goes into a data loader and where the duplication and unnecessary work come from...
The PPDM physical model

  • PPDM comes to us as a physical projection rather than a logical model – it maps directly to a relational database
  • Access is therefore via SQL and PL/SQL; low-level detail is important, i.e. how relationships are implemented (e.g. well header to borehole)
  • Access considerations: primary keys, foreign keys, data types and conversions, maximum lengths; load order required by FKs (the PPDM “Load of the Rings”); relationships and cardinality, etc.
  • SQL errors are only known at runtime, so turnaround can be slow
  • All of this metadata is available in machine-readable format, so we should use it (see the sketch below)
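As a minimal sketch of putting that machine-readable metadata to work, assuming an Oracle-hosted PPDM instance: foreign-key pairs are read from the catalog and topologically sorted so that parent tables load before their children. The query, the schema bind variable and the sample table pairs are illustrative assumptions, not PPDM-supplied code.

```python
# Sketch: derive a load order from FK metadata instead of hand-coding it.
from graphlib import TopologicalSorter

# Query we would run against the Oracle catalog to find, for each FK,
# the child table and the parent table it references (illustrative):
FK_QUERY = """
SELECT c.table_name AS child_table,
       p.table_name AS parent_table
FROM   all_constraints c
JOIN   all_constraints p
       ON c.r_constraint_name = p.constraint_name
      AND c.r_owner = p.owner
WHERE  c.constraint_type = 'R'
  AND  c.owner = :ppdm_schema
"""

def load_order(fk_pairs):
    """fk_pairs: iterable of (child_table, parent_table) tuples.
    Returns table names ordered so parents load before children."""
    graph = {}
    for child, parent in fk_pairs:
        graph.setdefault(child, set()).add(parent)  # parent precedes child
        graph.setdefault(parent, set())
    return list(TopologicalSorter(graph).static_order())

# Hypothetical fragment of PPDM-style FK relationships:
pairs = [("WELL", "BUSINESS_ASSOCIATE"),
         ("WELLBORE", "WELL"),            # borehole belongs to a well header
         ("WELL_LOG", "WELLBORE")]
print(load_order(pairs))
# ['BUSINESS_ASSOCIATE', 'WELL', 'WELLBORE', 'WELL_LOG']
```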
External data sources

  • Looking at the external files, we need a variety of skills: text manipulation, XML processing, Excel, database
  • The data model is unlikely to be as rich as PPDM, but there is some definition of the content, e.g. Excel workbooks have a tabular layout with column titles, and worksheets are named
  • It can be hard to find people with the relevant skills – you sometimes see ad hoc, non-standard implementations because the developer used whatever skills he or she had: Perl, Python, XSLT, SQL
  • So the next clue is that we should use the model information – what elements, attributes and relationships are defined – rather than the details of how we access it
  • Abstract out the data access layer; don’t mix data access with the business rules required to move the data into PPDM (see the sketch below)
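To make the last point concrete, here is a minimal sketch (not from the deck) of a data access layer that hides file mechanics behind a common record interface; the class names and the WELL_NAME field are invented for illustration.

```python
# Sketch: readers expose named records/attributes; the mapping rules
# never see file handles, parsers or SQL.
import csv
import xml.etree.ElementTree as ET
from typing import Iterator, Protocol

class Source(Protocol):
    def records(self) -> Iterator[dict]:
        """Yield one flat dict per source entity (row/element)."""
        ...

class CsvWellSource:
    def __init__(self, path: str):
        self.path = path
    def records(self) -> Iterator[dict]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)   # column titles become keys

class XmlWellSource:
    def __init__(self, path: str, element: str):
        self.path, self.element = path, element
    def records(self) -> Iterator[dict]:
        for _, node in ET.iterparse(self.path):
            if node.tag == self.element:
                yield {child.tag: child.text for child in node}
                node.clear()               # keep memory flat on big files

def load_wells(source: Source):
    # Business rules work on model attributes only; swapping CSV for XML
    # (or LAS, Excel, ...) requires no change here.
    for rec in source.records():
        print(rec.get("WELL_NAME"), "->", "WELL.WELL_NAME")
```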
Challenges with domain expert mapping rules

  • A common step for defining how a data source is to be loaded is for a domain expert to write it up in Excel
  • These rules are not concerned with data access, but some details will creep in, e.g. specifying an XPath
  • When lookups, merging/splitting of values, string manipulation or conditional logic appear, the description can become ambiguous
  • Also note the duplication: the model metadata is being written into the spreadsheet; if the model changes, the spreadsheet needs to be manually updated
Challenges with developer mapping rules

  • The example shown here probably wouldn’t pass a code inspection, but it does illustrate the type of issues that can arise (a hedged reconstruction follows this list)
  • Firstly, duplication: the code reiterates the Excel rules – the two need to match up, but while a domain expert might follow the simple example shown previously, low-level code can be tricky to discuss
  • Secondly, metadata is again duplicated: the names of the tables and columns appear in the SQL statements, and the maximum length of the name column is checked
  • Thirdly, boilerplate code: select/update/insert conditional logic
  • Fourthly, data access code appears in the rules
  • Finally, the code becomes hard to maintain as the developer moves on to other roles
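The slide’s code is not reproduced in this export, so the snippet below is a reconstruction of the anti-pattern it describes; every name (tables, columns, the 60-character limit) is an assumption made for illustration.

```python
# Reconstruction of the anti-pattern the slide lists: the Excel rule
# re-stated in code, hard-wired table/column metadata, update-or-insert
# boilerplate, and raw data access tangled into the business rule.
def load_well(cursor, row):
    name = row["Well Name"].strip()       # duplicates the Excel rule
    if len(name) > 60:                    # duplicates the column max length
        name = name[:60]                  # from the PPDM data dictionary
    cursor.execute(                       # metadata embedded in SQL text
        "SELECT uwi FROM well WHERE uwi = :1", [row["UWI"]])
    if cursor.fetchone():                 # boilerplate update-or-insert
        cursor.execute(
            "UPDATE well SET well_name = :1 WHERE uwi = :2",
            [name, row["UWI"]])
    else:
        cursor.execute(
            "INSERT INTO well (uwi, well_name) VALUES (:1, :2)",
            [row["UWI"], name])
```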
Documentation of mapping rules

  • Word document for sign-off
  • Data management record:
     • How data was loaded
     • Stored in your MDM data store, where it can be queried
     • PPDM mapping tables
Test artifacts

  • Here is where you do require some duplication
  • Tests are stories:
     • They define what the system should do
     • If it does, and the tests are complete, the system is good enough
  • If we use a single version of truth to generate the tests, the tests will duplicate errors, not find them (see the sketch below)
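A minimal sketch of that point, with an invented helper and invented values: the expected result is transcribed by hand from the source file, independently of the mapping definition, so a faulty rule cannot silently validate itself.

```python
# Hypothetical test: `target_db.query_one`, the UWI and the name are
# all invented for illustration. The expected value is hand-derived
# from the source file, NOT generated from the mapping definition.
def test_well_name_mapping(target_db):
    loaded = target_db.query_one(
        "SELECT well_name FROM well WHERE uwi = '100123456789'")
    assert loaded == "DISCOVERY 1-2-3"   # hand-derived from the source row
```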
Agenda

  • Data loading challenges
  • Best practices in tools and methodology

Section: Tools
Use tools

  • Use the available metadata
  • Abstract out the data access layer
  • Use a higher-level DSL for the mapping rules:
     • Increase team communication between developer and business
     • Reduce boilerplate code
  • Keep one definition (sketched below):
     • Replace Excel and code
     • Generate documentation
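As a sketch only – this is not Transformation Manager’s actual syntax, nor any particular product’s – a single declarative mapping definition might look like the following, with every name invented; the loader, the sign-off document and the PPDM mapping records would all be generated from it.

```python
# One definitive mapping definition (all names invented), replacing the
# parallel Excel sheet and the hand-written loader code.
WELL_MAPPING = {
    "source": "LasHeader",                # element in the source model
    "target": "WELL",                     # table in the target PPDM model
    "key":    {"uwi": "UWI"},             # identification via primary key
    "rules": {                            # target column <- source field
        "WELL_NAME": "name",                          # straight copy
        "SPUD_DATE": ("parse_date", "spud"),          # named conversion
        "OPERATOR":  ("lookup", "operator",
                      "BUSINESS_ASSOCIATE"),          # reference-data lookup
    },
}
```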
An example of a graphical tool: Altova MapForce

  • Tools such as Talend, Mule DataMapper and Altova MapForce take a predominantly graphical approach
  • The metadata is loaded on the left and right (source/target), with connecting lines between the two
  • In addition to the logic gates for more complex processing, code snippets can be added to implement most business logic
  • Issues:
     • Is it really very easy to read? The example here is a simple mapping; imagine PPDM well log curves, reference data tables, etc.
     • It isn’t easy to see what really happens: a+b versus an “adder” – e.g. follow the equal() to Customers – what does that actually do?
     • But: documentation and an executable can be generated from that single definitive mapping definition
     • Typing errors etc. are mostly eliminated
ETL Solutions’ Transformation Manager

  • An alternative is to use a textual DSL; again, the metadata has been loaded
  • No data access code
  • Metadata is used extensively: for example, warnings, primary keys for identification, relationships
  • Typing errors are checked at design time, and model or element changes affecting the code are quickly detected, e.g. moving from PPDM 3.8 to 3.9
  • Relationships are used to link transforms: a more logical view with no need to understand the underlying constraints; the complexity of the model doesn’t matter, as the project becomes structured naturally
  • FK constraints are used to determine load order
  • Metadata is pulled in directly from the source, e.g. PPDM, making use of all the hard work put in by the PPDM Association
Generated documentation
Keeping the PPDM data manager happy

One of the many questions a data manager has about the data he or she manages:

Data lineage: how did this data get here?
PPDM provides tables to record data lineage
Transformation Manager can generate documentation for the PPDM metadata module
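The speaker notes name the tables involved (PPDM_MAP_RULE and PPDM_MAP_RULE_DETAIL); below is a hedged sketch of a lineage query against them, where the column names are assumptions to be checked against your PPDM 3.8 data dictionary.

```python
# Sketch: answer "how did this data get here?" from the PPDM mapping
# tables. Table names come from the speaker notes; column names are
# assumed for illustration only.
LINEAGE_QUERY = """
SELECT r.map_rule_id,
       d.source_table, d.source_column,
       d.target_table, d.target_column
FROM   ppdm_map_rule r
JOIN   ppdm_map_rule_detail d ON d.map_rule_id = r.map_rule_id
WHERE  d.target_table  = 'WELL'
  AND  d.target_column = 'WELL_NAME'
"""
```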
Agenda

  • Data loading challenges
  • Best practices in tools and methodology

Section: Project management
Key points

  • Be aware
     • Look at data migration methodologies
     • Select appropriate components
  • Look for and remove large, risky steps
  • Start early
     • Ensure the correct resources will be available
     • No nasty budget surprises
  • Use tools
  • Build a happy virtual team
Questions

  • Did you know about these tables?
  • Who uses them?
  • How do you use them?
  • What features would be truly useful in a data loader tool?
Contact us for more information:

  Karl Glenn, Business Development Director
  kg@etlsolutions.com
  +44 (0) 1912 894040

Read more on our website:
http://www.etlsolutions.com/what-we-do/oil-and-gas/

Raising data management standards
www.etlsolutions.com

Images from Free Digital Photos (freedigitalphotos.net)

Editor’s Notes

  1. To keep things simple when I’m talking, we’ll discuss loading data into PPDM, but a lot of this applies to generic data loading – moving data out of PPDM, or not involving PPDM at all. Data transformation is mundane from a business perspective, but very important to get right: the less time and trouble it causes, the more time you can spend doing more interesting things that directly benefit your business. Badly loaded data by definition affects the quality of the data in your MDM store.
  2. (Same note as slide 1.)
  3. How I came to be here giving this talk: a data migration expert with ETL Solutions for 10+ years. My first PPDM conference was last October in Houston. We’re from a data loading and integration background, and it was a bit of an eye-opener to listen to the data manager’s view of things and of PPDM; it seemed to me that companies such as ourselves are part of the problem. One aspect of the conference that caught my attention was the education, certification and, for want of a better word, professionalisation of petroleum data management – moving away from compartmentalised expertise within companies. I had the impression that one of the industry’s concerns is that a lot of this expertise is held in people’s heads, with the experienced people looking forward more to retirement than to their careers. At PNEC it was clear that some companies are very well advanced in this respect; for standards bodies like PPDM there is a lot of work to be done. There are similar rumblings within the data migration and integration industry, so today’s talk will be an introduction to data migration best practices, then a quick look at how these link up with data management best practice. There was also a focus on understanding the data – lineage, quality – which led me to think about how we as data loaders don’t really think about ongoing data management [cause of / solution to]. Hence a talk on data migration good practices and process.
  4. We’ll first look at 3 scenarios for moving data about
  5. And yet, because it happens at the end of the programme, it is often started late and treated as a technical issue – moving the filing cabinet. Data migration happens just before the old system turns off, so it runs late and is usually started late into the programme. Possibly an exaggeration, but the business view is of a forklift truck picking up the filing cabinet and dropping it into the new system – maybe a new filing cabinet, making sure all the papers are in there before moving. In reality, the documents need to be read, understood and translated to the new system, and obsolete papers discarded. Only the business knows what’s important.
  6. Data migration methodologies, certifications and training are sprouting up the same way as in data management. What’s available? PDM – v2 coming; it’s a bit abstract. Also: Project Management for Data Conversions and Data Management, Charles Scott.
  7. (Same note as slide 1.)
  8. And workflow parts. A key to success is the old system being turned off – this gets people interested. Concentrate on legacy decommissioning and see what light it sheds on the process – it turns out to be a lot. Sometimes a process can seem like a sea of required steps, and it’s hard to get people to buy into it. Focus on something everyone can agree with – the legacy system will be turned off. Surprisingly often this is not considered in much detail, but it gives you leverage and grabs people’s attention if you can get them to believe you.
  9. WHAT: you need to know what you’re turning off. Programme: single version of truth; data quality. Model the systems and relationships you have – as you go out and start talking to the people who use these systems, and how, you’ll discover more connections and systems, including little empires and satellite systems used for operations that you want brought into the new system. You’ll also discover bits of the legacy system nobody uses – no need to migrate those. Landscape analysis: typically data discovery is done here, not data profiling.
  10. Data migration is unusual in many ways. There are several teams of people involved: users, new system providers, old system experts, migration experts, project management, “the business”. You write code that can correctly interpret the relevant information in the old system and move and check it into the new system – it’s complicated. At the end of the programme you run it once; it’s tested to ensure the new system will let business continue confidently – you don’t want them, or you, sleepless with worry. Then you throw it away. There’s a lot of nostalgia around at the moment – you can buy your childhood memories on eBay – but if you’re looking for a 70s-style waterfall methodology you’d be hard pushed to find services or product companies that don’t have the word “agile” in their process. Don’t despair: with data migration you can create your own waterfall method despite each stage being demonstrably agile. Describing the diagram: requirements are light – just move the data. Then some truly agile development, maybe asking some requirements questions, unit testing and so on – let’s pretend that all the different bits are developed together and there are no nasty surprises when you try to join them up in your week’s worth of integration testing. Testers know how to write tests, right? So alongside the agile development they’re writing tests, starting with the big pebbles and filling up the jar; let’s assume they test incrementally. Then sign-off – a big step: security, data stakeholders, huge documents. Then migrate – the users have hardly noticed you up till now; told to test, they think of lots of fiendish little corner cases, e.g. postcodes. Sign-off here might fail – a big jump back at the end of the project. The users might reject the migration – a bigger jump.
  11. (Same note as slide 1.)
  12. In the three scenarios we looked at, one area of commonality is the innocuous arrows linking the source and target. A great deal of work is involved in realising them – design, requirements, testing, implementation, documentation. What we really want is the implementation: the automated loading of external data into PPDM. Let’s look at best practice for implementing these.
  13. So for each of these data loaders – to relate back to a mantra we’ve heard in previous data management conferences – we ideally want a single version of truth. Whatever artifacts are required, we want to remove duplication, because duplication means errors, inconsistency and additional work. We want to remove boilerplate components that are only indirectly related to the business rules by which data is loaded. Let’s look at what goes into a data loader and where the duplication and unnecessary work come from. (The rest of this note repeats note 23.)
  14. PPDM comes to us as a physical projection rather than a logical model – it maps directly to a relational database. Access is therefore via SQL and PL/SQL; low-level detail is important, i.e. how relationships are implemented, e.g. well header to borehole. Access considerations: primary keys, foreign keys, data types and conversions, maximum lengths; load order required by FKs – the PPDM “Load of the Rings” – relationships, cardinality, etc. SQL errors are only known at runtime, so turnaround can be slow. All of this metadata is available in machine-readable format, so we should use it.
  15. Looking at the external files, we need a variety of skills: text manipulation, XML processing, Excel, database, etc. The data model is unlikely to be as rich as PPDM, but there is some definition of the content: the LAS 2.0 specification; Excel workbooks have a layout, e.g. tabular with column titles, named worksheets, etc. It can be hard to find people with the relevant skills, and you can end up with some ad hoc, non-standardised implementations because the developer used whatever skills he had: Perl, Python, XSLT, SQL. So the next clue is that we should use the model information – what elements, attributes and relationships are defined – rather than the details of how we access it. Abstract out the data access layer; don’t mix data access with the business rules required to move the data into PPDM.
  16. A common step for defining how a data source is to be loaded is for a domain expert to write it up in Excel. This is not concerned with data access, but some details will creep in, e.g. specifying an XPath. When lookups, merging/splitting of values, string manipulation, conditional logic etc. come in, the description can become ambiguous. Also note the duplication: the model metadata is being written into the spreadsheet; if the model changes, the spreadsheet needs to be manually updated.
  17. A developer implements those rules in code. The pseudo-code above shows typical things that are undesirable. First, duplication: it reiterates the Excel rules – they need to match up, but while a domain expert might follow the simple example above, low-level code can be tricky to discuss. Second, metadata is again duplicated: the names of the tables and columns appear in the SQL statements, the max length of the name column is checked, and there is an explicit looping construct. Third, boilerplate code: select/update/insert conditional logic. Fourth, data access code appears in the rules. I’ve made it explicit here, and the code above probably wouldn’t pass a code inspection, but it does illustrate the type of duplication that can arise. In particular, the developer reads the specifications, the knowledge is stored in the developer’s head and regurgitated as code; the developer becomes valuable, and the code becomes hard to maintain as the talented developer moves on.
  18. Tools are a recognised best practice; they’re better than trying to do things by hand, e.g. hand-coding migration scripts, workflow and profiling. But you do need skills in the toolsets.
  19. (Same note as slide 1.)
  20. Graphical tools such as Talend, Mule DataMapper and, here, Altova MapForce take a predominantly graphical approach. You can see the metadata loaded in on the left and right (source/target), and the lines connecting them. In addition to the logic gates for more complex processing, in the background you can add code snippets to implement most business logic. Issues: is it really very easy to read? (The above is a simple mapping – imagine PPDM well log curves, reference data tables, etc.) It isn’t easy to see what really happens: a+b versus an “adder” – e.g. follow the equal() to Customers: what does that actually do? But you can generate documentation and an executable from that single definitive mapping definition, and typing errors etc. are mostly eliminated.
  21. An alternative is to use a textual DSL. Again you can see the metadata has been loaded [switch to TM live to enable interaction]. No data access code. Metadata is used extensively – for example warnings, primary keys for identification, relationships with cardinality, no explicit iteration. Typing errors are checked at design time. Model and element changes that affect the code are quickly detected – e.g. imagine PPDM 3.8 to 3.9. Relationships are used to link transforms – a more logical view, no need to understand the underlying constraints; the complexity of the model doesn’t matter, and the project becomes structured naturally. FK constraints are used to determine load order. Metadata is pulled in directly from the source metadata, e.g. PPDM – show comments and customisation. So, making use of all the hard work put in by PPDM.
  22. From the same source you can generate code to execute the project
  23. As a data manager, looking at your single version of truth (logical – it may physically be many databases), you want to be able to ask questions about it: confidence in the data; legal questions – should we be looking at archived data, where is it, how did that data map to our current data view?; is there a link between instances of common errors, e.g. a problem with data loaded from a particular source? Looking back at data load, migration and integration, they have something in common: the arrows, or the rules by which data is loaded. On the diagram they look like nothing, but a lot of work goes into them. Extending the single-version-of-truth analogy, you want a single version of truth for each of these arrows; look into them and they are generally very ad hoc and poorly documented. Business rules in Excel are not well version-controlled and the language is woolly and vague; they are then implemented in code or using a graphical tool – duplication, and difficult to communicate between developer and domain expert. From a data management perspective, where are these rules? PPDM have done a great job in this respect by providing tables specifically for that.
  24. PPDM provides tables to allow us to record this: PPDM_SCHEMA, PPDM_TABLE and PPDM_COLUMN to describe the schemas, and PPDM_MAP_RULE and PPDM_MAP_RULE_DETAIL to record the mappings. PPDM_SCHEMA doesn’t just enable you to store details about your PPDM schema – it’s pretty much the same as the Oracle catalog tables, with a few additions, for example for recording units of measure. So you can record schema information there about the legacy system and about XML schemas, for example WITSML or PRODML. The PPDM_MAP tables let you record the mapping rules – how a particular element or attribute goes from the source to the target schema – stored in PPDM. This makes you, the data manager, happy, because you can query the database using your finely honed SQL skills to create reports for the business users who need this information – hopefully not the legal department. PPDM_MAP_RULE is used to contain lower-level code, e.g. PL/SQL, Python rules, etc. It’s hard to populate these tables, though, and their use is not standardised.
  25. So: how about a tool which can already generate documentation generating the same for the PPDM metadata module? The above is a bit simplistic – hopefully PPDM_SYSTEM, PPDM_TABLE etc. are already populated for the actual PPDM instance, and we only want to publish mappings when they are actually used. [Switch to demonstration to show the TM prototype of how they can be populated.]
  26. You can store code, but is it readable? The unit of code is usually a block, perhaps saying how to migrate an entire table.
  27. So how about the tools used in data loading populating these tables automatically? Most, though not all, have some metadata representation. With hand coding, the metadata is often encapsulated in the code itself – the developer read the documentation and created the queries and updates which would run against the data stores – but apart from that, you expect a tool to show you what you are moving data between. You’d also hope for a higher-level representation of the mapping rules – lines, boxes, a domain-specific language – and possibly a reporting capability. So what we did was take our reporting capabilities and look at how we can “report” to PPDM – export the metadata and mapping rules into the relevant module. I want to emphasise that we did an investigation: we don’t have production code and we may have got some of the details wrong. But I’m going to fire up our tool, show you some simple mapping rules as developed, then show you what we populated in the PPDM schema and mapping tables at the push of a button.
  28. A typical tool – using ours because I get a discount from our sales guy. [Better picture required.]
  29. (Same note as slide 1.)
  30. At the end of these initial phases, you might realise that the big-bang approach is not going to work and you need to change your approach significantly. It’s much better to realise this early on. E.g. you might migrate bit by bit, maintaining both systems and using a data highway to move data between the old and new systems. [DVLA]
  31. You need people to agree to the old system being turned off; identify these people and bring them with you. Decide what documents they will sign off – if you require design documents, these will need to be understood: are user stories better than low-level business rules? And when? Two weeks before you run the final project you have all the documents ready for sign-off – what do you do, hand them all over then for the stakeholders to sign off? These documents will be large and hard to understand, and you …
  32. WHEN: Need to ensure business continuity
  33. (Same note as slide 18.)