Webinar: Data Quality, Data Engineering, and Data Science

This webinar explores the organizational constructs and processes for enabling business to build better insights through Data Quality, Data Engineering, and Data Science. In particular, it examines the needs for:

A Data Lab to foster an open, questioning, and collaborative environment to develop the right data principles, patterns, and standards.
A Data Factory to implement those standards developed in the Data Lab.
Different Data Quality requirements in the Lab and the Factory, and Data Engineering that aims to meet both.
Data Engineering, in advance of the sexier Data Science, to create the right environments in both the lab and the factory and to actually examine the data.
All of the above to provide the data needed to create more efficient processes for the Data Scientists to be more effective in their roles.

Join this webinar to hear Tom “The Data Doc” Redman discuss with Dr. Prashanth Southekal, recent author of Data for Business Performance, the details of achieving better insights with examples of a case study from an Oil and Gas company.

Webinar: Data Quality, Data Engineering, and Data Science

  1. Data Quality, Data Engineering and Data Science for Better Insights – Series of Questions. Tom Redman, Prashanth Southekal
  2. Our plan • Look holistically at the union of data quality, engineering, and science. • An open-ended discussion
  3. • Use The Leader’s Data Manifesto to start or continue important conversations about managing data assets • Use these conversations to initiate action within your organization to better manage data assets • Support this movement and show you are committed to change by signing the manifesto online at www.dataleaders.org © 2017 dataleaders.org
  4. Bad Data is a hidden killer. Baseline: • Best estimate is that 45% of newly created data records have a critical error. • Best estimate is that CoPDQ (the cost of poor data quality) is ~20% of revenue. In Data Science: • Impact is different: • Errors may “cancel out.” • Bad data → Bad decision/prediction → Impacts thousands (e.g., financial crisis) • Bad data → Bad algorithm → Damage potentially unlimited • etc. • Aligning data sources is a far bigger challenge. • Aligning decision makers is a far bigger challenge. • etc. ©DQS, 2000-2017
  5. Data for Business Performance. My book, Data for Business Performance, is in line with what you have just said. Specifically, the book has three key elements that make it special: 1. The book is holistic. 2. The book is for practitioners. 3. The book is technology agnostic.
  6. Relationships between Different Data Types. Figure labels: Reference Data, Master Data, Transactional Data, Metadata; Business Data, Technical Data. © 2017 DBP-Institute
  7. Example of Integrated Business Data. Figure labels: Reference Data, Master Data, Transactional Data. © 2017 DBP-Institute
  8. Structured vs. Unstructured data. Depending on the manner in which data is initially created or recorded, data can be categorized into two main forms. • Structured Data. Data that resides in a fixed field within a record or file is structured data. • Unstructured Data. Unstructured data is data in its native state, i.e., data that has no predefined structure when it is created. Figure labels: Customer Identifier (10-digit numeric code); Description with 25 characters; Structured Data; Unstructured Data. © 2017 DBP-Institute
  9. Taxonomy holds the key in Unstructured Data. © 2017 DBP-Institute
  10. Data Science in the Big Picture. “Data revolution” or not, we need to get better at practically everything: • Day-in, day-out work: Largely quality. • Management. Planning. • Put data to work in new and exciting ways. My current list: – Making better decisions – Innovation – Informationalization – Providing content – Infomediation – Creating and leveraging asymmetries • Seeking out, leveraging, and protecting proprietary data. ©DQS, 2000-2017
  11. Putting Data to work: End-to-end process. The D4 Process: • Data: Acquire and understand “potentially interesting” data • Discovery (data science): Find something “truly interesting” in that data • Delivery: Deliver the discovery in the form of a product/service/report • “Dollars”: “Monetize” the discovery ©DQS, 2000-2017
  12. Dominance in the Data Lifecycle (DLC). Stages: Origination, Capture, Validation, Processing, Distribution, Aggregation, Interpretation, Consumption, Data Storage, Data Security. Figure labels: Data Engineering, Data Science. 8 of the 10 stages in the DLC pertain to Data Engineering. © 2017 DBP-Institute
  13. Data Engineering vs. Data Cleansing. Data Engineering: Origination, Capture, Validation, Processing, Distribution, Aggregation. Data Cleansing: 60% of the effort in deriving insights. © 2017 DBP-Institute
  14. Where do the data scientists sit? Figure labels: Analytical “sophistication” (Basic Process Improvements; New, sophisticated algorithms; Fundamental New Discovery); “Home” (In the line, and everyone is involved; In a “lab”). ©DQS, 2000-2017
  15. Data Lab and Data Factory • Lab for discovery, new products: Different management mind-set, people, goals • Factory for scale, control, profit • Connect the two! ©DQS, 2000-2017
  16. Building blocks of Data Factory. What’s required in the factory? 1. Manage core business processes in the SoR 2. Manage Reference and Master data with Standards 3. Enable Data Integration using Standards 4. Position Data Governance as a Business Function © 2017 DBP-Institute
  17. Most important takeaways: Prashanth: • Business Performance can be achieved by aligning data to the business goals, key questions, and KPIs. • There is no data management endeavour without a customer. • Data Quality is a journey and NOT a destination. Tom: • The “data space” is advancing too slowly and it is time for data practitioners to push far harder. • The “easiest” place to begin is with quality. And the benefits stun. • Data practitioners must also take on the tough organizational issues.
  18. Our Profiles. Dr. Prashanth H Southekal is the Managing Principal of DBP-Institute. He brings over 20 years of Information Management experience from companies such as SAP AG, Accenture, Deloitte, P&G, and General Electric. Dr. Southekal has published three books on Information Management, including “Data for Business Performance”. Dr. Thomas C Redman, “the Data Doc”, is an Advisor at DBP-Institute. Dr. Redman is a world-renowned thought leader who has helped blue-chip companies such as Chevron, Shell, JP Morgan, and AT&T make big improvements in Information Management. He has written dozens of papers and five books, including the most popular, “Data Driven: Profiting from Your Most Important Business Asset”.
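
Slide 8 describes structured data as data that sits in a fixed field, and slide 4 speaks of the share of records that carry an error. As a rough illustration only, the sketch below treats the two field descriptions from slide 8 (a 10-digit numeric customer identifier and a description of 25 characters) as validation rules and measures the share of records that break at least one of them. The record type, the function names, the reading of "25 characters" as a maximum length, and the sample records are all assumptions made for this example; none of them come from the webinar.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical field rules modeled on slide 8: a 10-digit numeric customer
# identifier and a description of at most 25 characters (assumed maximum).
CUSTOMER_ID_PATTERN = re.compile(r"[0-9]{10}")
MAX_DESCRIPTION_LENGTH = 25


@dataclass
class CustomerRecord:
    customer_id: str
    description: str


def find_errors(record: CustomerRecord) -> List[str]:
    """Return the rule violations for one record (empty list if it is clean)."""
    errors = []
    if not CUSTOMER_ID_PATTERN.fullmatch(record.customer_id):
        errors.append("customer_id is not a 10-digit numeric code")
    if len(record.description) > MAX_DESCRIPTION_LENGTH:
        errors.append("description exceeds 25 characters")
    return errors


def error_rate(records: List[CustomerRecord]) -> float:
    """Share of records with at least one rule violation (0.0 for no records)."""
    if not records:
        return 0.0
    flawed = sum(1 for record in records if find_errors(record))
    return flawed / len(records)


# Made-up sample records, for illustration only.
sample = [
    CustomerRecord("0012345678", "Retail account"),
    CustomerRecord("12345", "Short identifier"),
    CustomerRecord("0098765432", "A description that is far too long to fit"),
    CustomerRecord("0011122233", "Wholesale account"),
]

print(f"{error_rate(sample):.0%} of sample records have an error")
```

Rules of this kind only catch format errors. A record can pass both checks and still be wrong, for example when a well-formed identifier points to the wrong customer, so a measured rate like this is at best a lower bound on the error share.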