DataEd Slides: Approaching Data Management Technologies

Our architecturally solid stool requires three legs: people, process, and technologies. This webinar looks at the most misunderstood of these three components: technology. While most organizations begin with technologies, it turns out that technologies are the last component that should be considered. This webinar will survey a range of Data Management technologies that can be used to increase the productivity of Data Management efforts.

Published in: Data & Analytics

  1. 1. Peter Aiken, Ph.D.: Approaching Data Management Technologies. Unlocking Business Value. Copyright 2019 by Data Blueprint
      Peter Aiken, Ph.D.
      • I've been doing this a long time
      • My work is recognized as useful
      • Associate Professor of IS (vcu.edu)
      • Founder, Data Blueprint (datablueprint.com)
      • DAMA International (dama.org)
      • DAMA International President 2009-2013 / 2018
      • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
      • DAMA International Community Award 2005
      • 10 books and dozens of articles
      • Experienced w/ 500+ data management practices worldwide
      • Multi-year immersions
        – US DoD (DISA/Army/Marines/DLA)
        – Nokia
        – Deutsche Bank
        – Wells Fargo
        – Walmart
        – …
      Book: Peter Aiken with Juanita Billings, foreword by John Bottega, "Monetizing Data Management: Unlocking the Value in Your Organization's Most Important Asset"
  2. 2. Unified Data Orchestration for the Cloud Dipti Borkar |Vice President, Product | Alluxio dipti@alluxio.com | @dborkar
  3. 3. 4 big trends driving the need for a new architecture
      • Separation of Compute & Storage
      • Hybrid – Multi cloud environments
      • Self-service data across the enterprise
      • Rise of the object store
  4. 4. Data Ecosystem - Beta; Data Ecosystem 1.0 (figure labels: COMPUTE, STORAGE)
  5. 5. Data Orchestration for the Cloud
      • Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
      • Drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
  6. 6. Use Cases Data Orchestration Enables
      • Run big data workloads in hybrid cloud environments
      • Accelerate big data frameworks on the public cloud
      • Enable big data on object stores across single or multiple clouds
      Figure labels: Hive, Spark, Presto, Alluxio; On premise; Any Cloud / Multi Cloud; Same instance / container; Same data center / region; Standalone
  7. 7. Key Innovations of the Virtual Unified File System
      • Unified Namespace: bring all files into a single interface
      • API Translation: interact with data using any API
      • Intelligent Multi-tiering: accelerate & tier data transparently
  8. 8. Incredible Open Source Momentum with growing community
      • 900+ contributors & growing
      • 3760+ Git Stars
      • Apache 2.0 Licensed
      • Hundreds of thousands of downloads
      • Join the conversation on Slack: alluxio.org/slack
  9. 9. Infogix: "Industries that thrive on data" (Infogix Confidential, Copyright 2019)
      • Innovating data solutions since 1982
      • Headquartered in Chicago
      • Large and mid-size customers worldwide
      • Organizations rely on Infogix so they can trust their data
      • Average customer tenure > 18 years
  10. 10. Approaching Data Management Technologies
      By the end of this session, you should have a better understanding of data management technologies in terms of:
      • Technology Considerations
      • Data Technology Architecture
      • CASE Tools
      • Repositories
      • Profiling/Discovery Tools
      • Data Quality Engineering Tools
      • Data Life Cycle
      • Other Technologies:
        – Servers, EII Technologies, Portals, Conversion Tools
      Making Cloud Successful
      4 problems with forklifting:
      1. no basis for decisions made
      2. no inclusion of architecture/engineering concepts
      3. no idea that these concepts are missing from the process
      4. 80% of organizational data is ROT (redundant, obsolete, or trivial)
      Transform: less, cleaner, more shareable ... data
  11. 11. Gartner Strategic Planning Assumptions
      • By 2021
        – Strategy using data hubs, lakes and warehouses will support 30% more use cases (capabilities) than competitors.
      • By 2022
        – 50% of cloud decisions will be based on the data assets provided rather than on the product capabilities.
        – Active metadata will reduce time to data delivery by 30%.
      • By 2023
        – AI-enabled automation will reduce the need for IT specialists by 20%.
        – 75% of all databases will be cloud, reducing the DBMS vendor landscape and increasing complexity for data governance and integration.
      Gartner Cloud Vendor Offerings
      CSP-specific data assets may be of interest when combined with easy access, becoming a key differentiator. For example:
      • Google:
        – Google Search data
        – YouTube data
        – Google Ads data
        – Retailers.
      • Azure
        – LinkedIn
        – Office 365 data
        – Sales and customer-relationship-focused analytics
      Source: https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
  12. 12. Core Data Management Capabilities (figure; source: https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6)
      What is data management?
      Diagram labels: Sources; Data Engineering; Data Storage; Data Delivery; Uses; Reuses; Specialized Team Skills; Data Governance
      Understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting business activities
      [Aiken, P., Allen, M. D., Parker, B., Mattia, A., "Measuring Data Management's Maturity: A Community's Self-Assessment", IEEE Computer (research feature, April 2007)]
      Data management practices connect data sources and uses in an organized and efficient manner:
      • Engineering
      • Storage
      • Delivery
      • Governance
      When executed, engineering, storage, and delivery implement governance.
      Note: the diagram does not depict data reuse well.
  13. 13. Making a Better Data Sandwich (figure layers: Data literacy, Standard data, Data supply)
  14. 14. Making a Better Data Sandwich (figure layers: Standard data, Data supply, Data literacy)
      • This cannot happen without engineering and architecture!
      • Quality engineering/architecture work products do not happen accidentally!
      Technologies, by themselves, are a one-legged stool
  15. 15. Success Requires a 3-Legged Stool: People, Process, Technology
  16. 16. Definitions
      • Master data management (MDM), as opposed to mobile device management
      • MDM is a discipline or strategy
        – "… where the business and the IT organization work together to ensure the uniformity, accuracy, semantic persistence, stewardship and accountability of the enterprise's official, shared master data."
      • Sold as a technology-based solution
  17. 17. Master Data Architecture (figure from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
      Tools and Methods Are Required!
      • Technology-first approaches often de-emphasize the people and process components
      • Successful MDM also requires
        – Governance/quality
        – Process architecture
  18. 18. Growth of Data vs. Growth of Data Analysts: supply/demand for data talent
      • Stored data is accumulating at a 28% annual growth rate
      • Data analysts in the workforce are growing at a 5.7% annual rate
      Source: https://www.logianalytics.com/bi-trends/3-keys-understanding-data/
      R. Buckminster Fuller (figure)
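To make the gap between the two growth rates concrete, here is a minimal sketch that compounds both rates from the slide (28% for stored data, 5.7% for analysts). The starting index of 1.0 and the five-year horizon are assumptions for illustration, not figures from the deck.

```python
def data_per_analyst(years, data_growth=0.28, analyst_growth=0.057):
    """Relative volume of stored data per analyst after `years`.

    Both quantities start from an index of 1.0 (an assumption), so the
    result is a ratio, not an absolute volume.
    """
    return ((1 + data_growth) / (1 + analyst_growth)) ** years


for year in range(6):
    print(f"year {year}: {data_per_analyst(year):.2f}x data per analyst")
```

Under these assumptions, the amount of data each analyst has to cover grows by roughly a fifth every year.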
  19. 19. Moore's law (figure: https://en.wikipedia.org/wiki/Moore%27s_law#/media/File:Moore%27s_Law_Transistor_Count_1971-2016.png)
      • Postpone technology investments as long as possible
      • The hardest part of requirements is not doing design
      Vendor Hype
      • CIOs/CDOs feel pressure
      • Vendor/project promise auditing
      • No understanding of hype curve
  20. 20. Who wrote this … ?
      "In considering any new subject, there is frequently a tendency first to overrate what we find to be already interesting or remarkable, and secondly - by a sort of natural reaction - to undervalue the true state of the case."
      – Lady Augusta Ada King (1815 – 1852), Countess of Lovelace
      – (aka) Ada Lovelace, daughter of Lord Byron
      – Publisher of the first computing program
  21. 21. Gartner Five-phase Hype Cycle (http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp)
      1. Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven.
      2. Peak of Inflated Expectations: Early publicity produces a number of success stories, often accompanied by scores of failures. Some companies take action; many do not.
      3. Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters.
      4. Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third-generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious.
      5. Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology's broad market applicability and relevance are clearly paying off.
      Hype Cycle for Data Management (figure)
  22. 22. Hype Cycle for Information Governance and Master Data Management (figure); Hype Cycle for Analytics and Business Intelligence (figure)
  23. 23. Data Management Technologies
      • Managing data technology should follow the same principles and standards as managing any technology
      • The leading reference model for technology management is the Information Technology Infrastructure Library (ITIL): http://www.itil-officialsite.com/home/home.asp
      from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  24. 24. Understanding Data Technology Requirements
      Need to understand:
      • How the technology works
      • How it provides value in the context of a particular business
      • Requirements of a data technology before determining what technical solution to choose for a particular situation
      Suggested questions:
      • What problem does this data technology mean to solve?
      • What sets this data technology apart from others?
      • Are there specific hardware/software/operating systems/storage/network/connectivity requirements?
      • Does this technology include data security functionality?
      Defining Data Technology Architecture
      • Data technology is part of the overall technology architecture
      • It is also often considered part of the enterprise's data architecture
      • Data technology architecture addresses 3 questions:
        1. What technologies are standard/required/preferred/acceptable?
        2. Which technologies apply to which purposes and circumstances?
        3. In a distributed environment, which technologies exist where, and how does data move from one node to another?
      from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  25. 25. CASE Tools: Computer Aided Software/Systems Engineering Tools
      • Computer-aided software engineering (CASE) is the scientific application of a set of tools and methods to a software system which is meant to result in high-quality, defect-free, and maintainable software products
      • It also refers to methods for the development of information systems together with automated tools that can be used in the software development process
      • CASE functions include analysis, design, and programming
      Source: http://en.wikipedia.org/wiki/
  26. 26. CASE-based Support (screenshots; http://www.visible.com)
  27. 27. CASE-based Support (screenshot; http://www.visible.com)
      CASE Tool Evolution
      • Microsoft
        – Excel
        – PowerPoint
        – Visio
      • ERwin, ER/Studio
      • Rational Rose
      • Open source
        – It is never free. Even open-sourced technology requires care and feeding
      • List of CASE Tools
        – http://www.unl.csi.cuny.edu/faqs/software-enginering/tools.html
  28. 28. Figure 18.2: Sample budget for implementing a $2500/seat CASE technology can be $2.5 million over a 5-year period [adapted from Huff, "Elements of a Realistic CASE Tool Adoption Budget" © 1992 Communications of the ACM]
      Initial investment:
        $187K = $2500/seat × 75 seats
        $360K = training
        $500K = workstations
        $150K = assessment costs
        $910K = total initial investment
      Annual additional investment:
        $150K = in-house support
        $ 55K = hardware and software maintenance
        $ 60K = ongoing training and misc.
        $265K = annual additional investment
        × 5 years = $1325K investment over 5 years
      CASE Tool: "Taxonomy"
      This includes:
      • Senders – flows from the CASE effort that can inform the re-architecting effort.
      • Receivers – flows from the project that can inform the CASE effort.
      • Senders and receivers – some elements, such as restructuring and reengineering, are both senders and receivers.
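The budget arithmetic above can be checked with a short sketch. All dollar figures are copied from the slide; the variable names are illustrative. One caveat: as transcribed, the four initial line items sum to $1,197.5K rather than the stated $910K subtotal, and it is the line-item sum that lands on the $2.5 million headline, so the sketch reports both rather than guessing which figure was garbled.

```python
SEATS = 75
PRICE_PER_SEAT = 2_500
YEARS = 5

# Initial line items as transcribed from the slide.
licenses = SEATS * PRICE_PER_SEAT  # shown as $187K on the slide
initial_items = {
    "licenses": licenses,
    "training": 360_000,
    "workstations": 500_000,
    "assessment costs": 150_000,
}
initial_from_items = sum(initial_items.values())
initial_as_stated = 910_000  # subtotal printed on the slide

# Recurring line items as transcribed from the slide.
annual_items = {
    "in-house support": 150_000,
    "hardware and software maintenance": 55_000,
    "ongoing training and misc.": 60_000,
}
annual_total = sum(annual_items.values())
recurring_total = annual_total * YEARS

five_year_total = initial_from_items + recurring_total

print(f"initial (sum of items): ${initial_from_items:,}")
print(f"initial (as stated):    ${initial_as_stated:,}")
print(f"annual:                 ${annual_total:,}")
print(f"recurring over {YEARS} years: ${recurring_total:,}")
print(f"five-year total:        ${five_year_total:,}")
```

The takeaway matches the slide's point: the seat licenses are well under a tenth of the five-year cost.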
  29. 29. Changing Model of CASE Tool Usage
      From:
      • Everything must "fit" into one CASE technology
      • CASE tool-specific methods and technologies
      • Limited access from outside the CASE technology environment
      • Limited additional metadata use
      To:
      • A variety of CASE-based methods and technologies can access and update the metadata
      • Metadata integration
      • Additional metadata uses accessible via: web; portal; XML; RDBMS
  30. 30. The Biggest Challenges to Data Management Practice (figure)
      One Eighth of the Data Management Spend
      • Metadata management is still a nascent discipline that represents only 12% of the time spent in data management (metadata 12%, everything else 88%)
  31. 31. Repositories have been difficult to "sell"
      [From gartner.com] 21 September 1999, Michael Blechar, Lisa Wallace. Management Summary: Most executive and IS managers view an IT metadata repository as an esoteric technology that is not directly related to the business. However, as will be seen, an IT metadata repository can substantially help IS organizations support the applications, which in turn support the business. An IT metadata repository is a pre-built system and reference database where the IS organizations can track and manage the information about the applications and databases they build and maintain; think of it as the inventory and change impact reporting system for IS. These repositories track metadata such as the descriptions of jobs, programs, modules, screens, data and databases, and the interrelationships between them. Metadata differs from the actual data being described. Metadata is information about data. For example, the metadata descriptions in the repository tell one that the field "customer number" appears in Databases A, B and F ...
      Repository Technologies in Use: What tools do you use? (Number responding = 181)
        None                      45%
        HomeGrown                 23%
        Other                     13%
        CA Platinum                9%
        Rochade                    7%
        Universal Repository       2%
        DesignBank                 1%
        DWGuide                    1%
        InfoManager                1%
        Interface Metadata Tool    1%
      • Almost one in two organizations (45%) doesn't use any repository tool
      • Almost one in four organizations (23%) is building its own
      • The "traditional" players are 16%
  32. 32. Metadata Repositories 2004
      "However, due to cost (these tools start at about $150,000, but frequently exceed $1 million) and being slow to market in terms of support for new service-oriented architectures (SOAs), CA and ASG have opened the door to smaller competitors"
      Magic Quadrant for Metadata Management Solutions (figure; source: https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6)
  33. 33. IBM's AD/Cycle Information Model (figure); https://wiscorp.com/kwf_diagram.html (figure)
  34. 34. Implementing Metadata Repository Functionality
      • "The repository" does not have to be an integrated solution
        – it must be an easily integrable solution
      • Repository functionality does not equal a repository
        – metadata must easily evolve to a repository solution
      • Multiple repositories are not necessarily bad
        – as interim solutions, Excel has been working quite well
      • Minimal functionality includes the ability to create, read, update, delete, and evolve metadata items
      • Remember the 1st law of data management
        – In order to manage metadata, you need metadata repository functions
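The minimal functionality listed above (create, read, update, delete, evolve) can be sketched as a small in-memory store. This is an illustration under assumptions, not a description of any repository product: the class names, the `attributes` dictionary used for "evolve", and the `where_used` helper are all hypothetical. The "customer number" item reuses the example from the Gartner summary on slide 31; its description text is made up.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataItem:
    name: str
    description: str
    appears_in: set = field(default_factory=set)     # databases holding the field
    attributes: dict = field(default_factory=dict)   # open-ended, for evolution


class MetadataRepository:
    """Hypothetical in-memory store offering repository functions only."""

    def __init__(self):
        self._items = {}

    def create(self, item):
        if item.name in self._items:
            raise KeyError(f"{item.name!r} already exists")
        self._items[item.name] = item

    def read(self, name):
        return self._items[name]

    def update(self, name, **changes):
        item = self._items[name]
        for key, value in changes.items():
            if not hasattr(item, key):
                raise AttributeError(f"unknown field {key!r}")
            setattr(item, key, value)

    def delete(self, name):
        del self._items[name]

    def evolve(self, name, key, value):
        """Attach an attribute nobody planned for when the item was created."""
        self._items[name].attributes[key] = value

    def where_used(self, database):
        """Change-impact style query: which items live in this database?"""
        return sorted(n for n, i in self._items.items() if database in i.appears_in)


repo = MetadataRepository()
repo.create(MetadataItem("customer number", "Identifier for a customer", {"A", "B", "F"}))
repo.update("customer number", description="Unique identifier for a customer")
repo.evolve("customer number", "steward", "unassigned")
print(repo.where_used("B"))
```

A spreadsheet can play the same role as this dictionary in the interim, which is the slide's point about Excel: what matters is that the five functions exist, not which product provides them.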
  35. 35. Time Spent by Data Management Teams Across Disciplines (figure; source: https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6)
      Data Discovery Technologies (profiling, discovery, analysis)
      • Data analysis software technologies deliver up to 10X productivity over manual approaches
      • Based on a powerful computing technology that allows data engineers to quickly form candidate hypotheses with respect to the existing data structures
      • Hypotheses are then presented to the SMEs (both business and technical) who confirm, refine, or deny them
      • Allows existing data structures to be inferred at a rate that is an order of magnitude more effective than previous manual approaches
      • Pioneers include Evoke->CSI, Metagenix->Ascential->IBM, Sypherlink
  36. 36. How has this been done in the past?
      Old:
      • Manually
      • Brute force
      • Repository dependent
      • Quality indifferent
      • Not repeatable
      New:
      • Semi-automated
      • Engineered
      • Repository independent
      • Integrated quality
      • Repeatable
      • Currency
      • Accuracy
      Profiling tool screenshot: select an attribute to get a list of values; double-click a value to see rows with that value
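The drill-down in the screenshot (pick an attribute, list its values, then open the rows behind one value) can be sketched in a few lines. The sample rows and the function names are hypothetical and only illustrate the idea; real profiling tools work against database tables rather than in-memory lists.

```python
from collections import Counter

# Hypothetical sample rows; note the inconsistent and missing state codes.
rows = [
    {"id": 1, "state": "VA"},
    {"id": 2, "state": "Va."},
    {"id": 3, "state": "VA"},
    {"id": 4, "state": None},
]


def value_counts(rows, attribute):
    """List every distinct value of an attribute with its frequency."""
    return Counter(row.get(attribute) for row in rows)


def rows_with_value(rows, attribute, value):
    """Drill down from one value to the rows that carry it."""
    return [row for row in rows if row.get(attribute) == value]


for value, count in value_counts(rows, "state").most_common():
    print(repr(value), count)

print(rows_with_value(rows, "state", "Va."))
```

Even on four rows the value list surfaces a candidate hypothesis ("Va." and "VA" mean the same thing) that a subject matter expert can then confirm, refine, or deny.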
  37. 37. Comparing Weekly Progress (figure labels: Reactive, Proactive)
      Schedule 1:
        Day        Morning                               Afternoon
        Monday     Model preparation                     Model refinement/validation session
        Tuesday    Model refinement/validation session   Model refinement/validation session
        Wednesday  Model preparation                     Model refinement/validation session
        Thursday   Model refinement/validation session   Model refinement/validation session
        Friday     Model preparation                     Model refinement/validation session
      Schedule 2:
        Day        Morning                               Afternoon
        Monday     Model preparation                     Model preparation
        Tuesday    Model preparation                     Model refinement/validation session
        Wednesday  Model preparation                     Model preparation
        Thursday   Model preparation                     Model refinement/validation session
        Friday     Model preparation                     Model preparation
      Preliminary System Survey (PSS)
      "The purpose of the Preliminary System Survey is to determine how long and how many resources will be required to reverse engineer the selected system components."
      Historical organizational reverse engineering performance data, combined with the project characteristics, gives the project estimate. Project characteristics (bracketed in the figure):
      • Baseline: relative condition & amount of evidence
      • Confounding characteristics: data handling, operating environment & language factor (factor ≥ 1)
      • Beneficial characteristics: key end user participation & net automation impact (impact ≤ 1)
  38. 38. Data Quality Engineering Tools
      • 4 categories of activities:
        1. Analysis
        2. Cleansing
        3. Enhancement
        4. Monitoring
      • Principal tools:
        1. Data Profiling
        2. Parsing and Standardization
        3. Data Transformation
        4. Identity Resolution and Matching
        5. Enhancement
        6. Reporting
      from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  39. 39. DQ Tools 1. Data Profiling
      – Need to be able to distinguish between good and bad data before making any improvements
      – Data profiling is a set of algorithms for 2 purposes:
        • Statistical analysis and assessment of the data quality values within a data set
        • Exploring relationships that exist between value collections within and across data sets
      DQ Tools 2. Parsing & Standardization
      – Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
      – Actions are triggered upon matching a specific pattern
      – When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
      from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
      https://www.youtube.com/watch?v=r9UhJxFT5rk
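The parsing and standardization steps above can be sketched with regular expressions: one pattern defines a valid value, and each recognized invalid pattern triggers a transformation into the expected form. The phone-number format, the rule list, and the function name are assumptions chosen for illustration; commercial tools ship far larger pattern libraries.

```python
import re

# The target representation (assumed for this sketch): NNN-NNN-NNNN.
CANONICAL = re.compile(r"^\d{3}-\d{3}-\d{4}$")

# Recognized invalid patterns, each paired with the rewrite it triggers.
KNOWN_INVALID = [
    (re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"), r"\1-\2-\3"),  # (804) 555-0100
    (re.compile(r"^(\d{3})\.(\d{3})\.(\d{4})$"), r"\1-\2-\3"),      # 804.555.0100
    (re.compile(r"^(\d{3})(\d{3})(\d{4})$"), r"\1-\2-\3"),          # 8045550100
]


def standardize_phone(value):
    """Return (status, value); status is 'valid', 'transformed' or 'rejected'."""
    value = value.strip()
    if CANONICAL.match(value):
        return "valid", value
    for pattern, replacement in KNOWN_INVALID:
        if pattern.match(value):
            return "transformed", pattern.sub(replacement, value)
    return "rejected", value


for raw in ["804-555-0100", "(804) 555-0100", "804.555.0100", "call me"]:
    print(raw, "->", standardize_phone(raw))
```

Values that match no pattern are rejected rather than guessed at, which leaves them for manual review.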
  40. 40. DQ Tools 3. Data Transformation
      – Upon identification of data errors, trigger data rules to transform the flawed data
      – Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
      – Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
      DQ Tools 4. Identity Resolution & Matching
      Basic approaches to matching:
      – Deterministic
        • Relies on defined patterns and rules for assigning weights and scores to determine similarity
        • Predictable
        • Only as good as the anticipations of the rules developers
      – Probabilistic
        • Uses statistical techniques to assess probabilities that pairs of records represent the same entity
        • Not reliant on rules
        • Refined based on experience -> matchers can improve precision as more data is analyzed
      from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
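The two matching approaches can be contrasted in a short sketch. The deterministic scorer sums hand-assigned weights for fields that agree exactly. For the probabilistic side, the slide only says "statistical techniques"; the sketch swaps in a Fellegi-Sunter style log-likelihood ratio as one common choice. The records, weights, and m/u probabilities are all made-up illustration values.

```python
import math


def deterministic_score(a, b, weights):
    """Sum the rule weights of the fields on which both records agree exactly."""
    return sum(w for f, w in weights.items() if a.get(f) and a.get(f) == b.get(f))


def probabilistic_score(a, b, m, u):
    """Fellegi-Sunter style match weight (log2 likelihood ratio).

    m[f]: probability that field f agrees when the records are the same entity.
    u[f]: probability that field f agrees when they are different entities.
    """
    total = 0.0
    for f in m:
        if a.get(f) == b.get(f):
            total += math.log2(m[f] / u[f])
        else:
            total += math.log2((1 - m[f]) / (1 - u[f]))
    return total


# Hypothetical records that differ only in ZIP code.
rec_a = {"last_name": "Smith", "zip": "23219", "birth_year": 1980}
rec_b = {"last_name": "Smith", "zip": "23220", "birth_year": 1980}

weights = {"last_name": 3, "zip": 2, "birth_year": 4}      # written by a rules developer
m = {"last_name": 0.95, "zip": 0.90, "birth_year": 0.98}   # assumed, normally estimated
u = {"last_name": 0.01, "zip": 0.05, "birth_year": 0.02}   # assumed, normally estimated

print(deterministic_score(rec_a, rec_b, weights))
print(round(probabilistic_score(rec_a, rec_b, m, u), 2))
```

The deterministic weights stay fixed until someone edits the rules, while the m and u values can be re-estimated as more record pairs are reviewed, which is how a probabilistic matcher improves with experience.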
  41. 41. DQ Tools 5. Enhancement
      – A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view
      Examples:
      – Time/date stamps
      – Auditing information
      – Contextual information
      – Geographic information
      – Demographic information
      – Psychographic information
      DQ Tools 6. Reporting
      • Good reporting supports:
        – Inspection and monitoring of conformance to data quality expectations
        – Monitoring performance of data stewards conforming to data quality SLAs
        – Workflow processing for data quality incidents
        – Manual oversight of data cleansing and correction
      • Associate report results w/:
        – Data quality measurement
        – Metrics
        – Activity
      from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  42. 42. Traditional Data Life Cycle (figure): data acquisition activities, data storage, data usage activities
  43. 43. Data Life Cycle Model (figure phases): Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Manipulation, Data Utilization, Data Assessment, Data Refinement
      Data dimension by focus/phase (metadata phases: Creation, Refinement, Structuring; data phases: Creation, Manipulation, Refinement, Utilization, Assessment):
      • Data Architecture Quality: the focus of metadata creation & refinement efforts
      • Data Model Quality: the focus of metadata refinement & structuring efforts
      • Data Value Quality: the focus of the data creation, manipulation, and refinement phases
      • Data Representation Quality: the focus of the data utilization phase
      Dimensions Related to Phases
      • Data architecture quality is the focus of metadata creation and refinement efforts.
      • Data model quality is the focus of metadata structuring efforts.
      • Data value quality is the focus of the data creation, manipulation, and refinement phases.
      • Data architecture and model quality are the focus of metadata refinement efforts.
      • Data representation quality is the focus of the data utilization and assessment phases.
  44. 44. Other Technologies
      Data Integration Definition:
      • Pulling together and reconciling, for analytic purposes, dispersed data that organizations have maintained in multiple, heterogeneous systems. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, standardized and transformed.
      • Other tools include:
        – Servers
        – EII technologies
        – Portals
        – Conversion tools
      Source: http://www.information-management.com
  45. 45. Portal Options [Adapted from Terry Lanham, "Designing Innovative Enterprise Portals and Implementing Them Into Your Content Strategies: Lockheed Martin's Compelling Case Study", Web Content II: Leveraging Best-of-Breed Content Strategies, San Francisco, CA, 23 January 2001]
      Legacy Systems Transformed Into Web-services Accessed Through a Portal
      Figure: a sample organizational portal (organizational news, organizational IT, email, organizational essentials, search, stock quotes, and regional/state reporting) fed by Web Services 1.1 through 5.3, which are numbered after Legacy Applications 1 through 5.
  46. 46. Top Tier Demo (figure); Portals as a Data Quality Tool (figure)
  47. 47. Defining Spaces
 • ETL (Extract, Transform, Load): delivers aggregated data to a new database
 • EAI (Enterprise Application Integration): connects applications to other applications in a predictable manner using pre-established connections
 • EII (Enterprise Information Integration): sits between ETL and EAI; delivers tailored views of information to users at the time that it is required
 Meta-Matrix Integration Example
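 To make the ETL and EII definitions above concrete, here is a minimal sketch contrasting the two. It is an illustration under stated assumptions, not a description of any product shown in the deck: the order data, region names, and manager lookup are hypothetical, and an in-memory SQLite database stands in for the "new database." EAI (application-to-application messaging) is not shown.

```python
import sqlite3

# Two hypothetical source systems (assumed for illustration).
ORDERS = [("north", 100.0), ("north", 50.0), ("south", 75.0)]
REGION_MANAGERS = {"north": "Lee", "south": "Cruz"}


def etl_load(orders):
    """ETL: extract orders, aggregate per region, load the result into a new database."""
    totals = {}
    for region, amount in orders:
        totals[region] = totals.get(region, 0.0) + amount
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE region_sales (region TEXT PRIMARY KEY, total REAL)"
    )
    warehouse.executemany(
        "INSERT INTO region_sales VALUES (?, ?)", sorted(totals.items())
    )
    warehouse.commit()
    return warehouse


def eii_view(region, orders, managers):
    """EII: assemble a tailored view from the sources at request time; no copy is kept."""
    return {
        "region": region,
        "manager": managers[region],
        "total": sum(amount for r, amount in orders if r == region),
    }
```

 The design difference shows up when a source changes: the ETL copy stays as it was until the next load, while the EII view reflects whatever the sources hold at the moment it is requested.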
  48. 48. Approaching Data Management Technologies
 By the end of this session, you should have a better understanding of data management technologies and their use as part of a people, process, and technology three-legged stool, in terms of:
 • Technology Considerations
 • Data Technology Architecture
 • CASE Tools
 • Repositories
 • Profiling/Discovery Tools
 • Data Quality Engineering Tools
 • Data Life Cycle
 • Other Technologies: – Servers, EII Technologies, Portals, Conversion Tools
 Gartner Key Findings
 • Data assets continue to drive strategic cloud service providers’ offerings
 • Machine learning is increasingly popular; key uses: – Data integration tools – Database management systems – Data quality tools – Metadata management solutions
 • Increasing use of cloud for production applications, requiring that the database be in the cloud
 • Organizations applying a combination of data warehouses, data lakes and data hubs can achieve greater flexibility to support a range of use cases compared to those applying only one.
 https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
  49. 49. Questions?
 It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions now!
 Perceived State of Data [Figure labels: IT, Business, Data]
  50. 50. Desired To Be State of Data [Figure labels: Data, IT, Business]
 The Real State of Data [Figure labels: Data, IT, Business]
  51. 51. It isn't possible to go digital [Graphic text: "Digital"]
 By just spelling 'data' [Graphic text: "Dat", "a"]
  52. 52. It requires more work [Graphic text: "Data", "a"]
 Lady Ada Augusta King Rule
 https://people.well.com/user/adatoole/bio.htm
  53. 53. Recent Technology Realization: Garbage In ➜ Garbage Out!
 Recent GI➜GO!
 [Figure: Perfect Model, Garbage Data, Garbage Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Data Governance, Analytics, Technology]
  54. 54. GI➜GO!
 [Figure: Perfect Model, Garbage Data, Garbage Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Analytics, Technology, Data Governance]
 Quality In ➜ Quality Out!
 [Figure: Perfect Model, Quality Data, Good Results. Labels: Data Warehouse, Machine Learning, Business Intelligence, Block Chain, AI, MDM, Analytics, Technology, Data Governance]
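 The "Garbage In, Garbage Out" versus "Quality In, Quality Out" contrast above can be shown with a toy calculation. This is only a sketch under assumptions: the order records and the two quality rules (unique order ids, positive quantities) are hypothetical and are not taken from the slides. The model itself is exact; only the data changes.

```python
def revenue_model(orders):
    """A 'perfect' model: revenue is the sum of quantity times unit price."""
    return sum(qty * price for _order_id, qty, price in orders)


# Hypothetical order records: (order_id, quantity, unit_price).
GARBAGE_DATA = [
    (1, 2, 10.0),
    (1, 2, 10.0),   # duplicate of order 1
    (2, -3, 4.0),   # impossible negative quantity
    (3, 1, 30.0),
]


def quality_filter(orders):
    """Minimal quality rules (assumed): keep unique order ids with positive quantities."""
    seen, clean = set(), []
    for order_id, qty, price in orders:
        if order_id in seen or qty <= 0:
            continue
        seen.add(order_id)
        clean.append((order_id, qty, price))
    return clean


garbage_result = revenue_model(GARBAGE_DATA)
quality_result = revenue_model(quality_filter(GARBAGE_DATA))
```

 The same model returns different figures for the raw and the filtered records, which is the point of the slide: no downstream technology corrects for poor input data on its own.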
  55. 55. More Data Management Tools (two slides), from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
  56. 56. Gartner Recommendations
 Leaders seeking to advance their strategies and deliver effective solutions must:
 • Validate both product capabilities and data availability
 • Leverage automation to free up scarce specialist resources
 • Update policies and governance standards to address cloud and database platform as a service (dbPaaS) before purchasing
 • Favor providers with a clear roadmap for ML enabling better business outcomes or service levels
 • Don’t expect a single piece of infrastructure to meet all of your needs. Plan the core by assessing use case type, processing flexibility and semantic requirements
 https://www.gartner.com/document/3894971?ref=solrAll&refval=219836558&qid=de595a5685b6f86db0ec6
 Upcoming Events
 • May Webinar: Data Management Maturity: Achieving Best Practices using DMM, May 14, 2019 @ 2:00 PM ET
 • June Webinar: Data Governance: Achieving Best Practices using DMM, June 11, 2019 @ 2:00 PM ET
 Sign up for webinars at: www.datablueprint.com/webinar-schedule
 Brought to you by:
  57. 57. 10124 W. Broad Street, Suite C, Glen Allen, Virginia 23060, 804.521.4056. Copyright 2019 by Data Blueprint
