Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling – How Do They Fit Together?

Self-service data analysis holds the promise of more rapid time-to-value for both business and IT users, as advanced tooling & visualization help make sense of raw and source data sets. Does this mean that the ‘design-then-build’ paradigm typical of data modeling is no longer relevant? Or is it more relevant than ever, as more eyes on the data mean more questions about core business definitions?

Join Donna Burbank for this webinar to discuss the realities of where data modeling fits in this new paradigm.

Published in: Technology

  1. Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling - How Do They Fit Together?
     Donna Burbank, Global Data Strategy Ltd.
     Lessons in Data Modeling DATAVERSITY Series, June 22nd, 2017
  2. Donna Burbank (Global Data Strategy, Ltd. 2017)
     Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership.
     She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market.
     As an active contributor to the data management community, she is a long-time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and Analytics software in the market.
     She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications. She can be reached at donna.burbank@globaldatastrategy.com. Donna is based in Boulder, Colorado, USA.
     Follow on Twitter @donnaburbank. Today’s hashtag: #LessonsDM
  3. Lessons in Data Modeling Series: This Year’s Line Up
     • January 26th: How Data Modeling Fits Into an Overall Enterprise Architecture
     • February 23rd: Data Modeling and Business Intelligence
     • March: Conceptual Data Modeling – How to Get the Attention of Business Users
     • April: The Evolving Role of the Data Architect – What does it mean for your Career?
     • May: Data Modeling & Metadata Management
     • June: Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling
     • July: Data Modeling & Metadata for Graph Databases
     • August: Data Modeling & Data Integration
     • September: Data Modeling & MDM
     • October: Agile & Data Modeling – How Can They Work Together?
     • December: Data Modeling, Data Quality & Data Governance
     Related topic: Self Service BI
  4. Agenda: What we’ll cover today
     • What is Self Service Data Prep, “Data Munging” and “Data Wrangling”?
     • The Good, the Bad, and the Ugly
     • Integrating the Data Warehouse & Data Lake
     • Data Governance & Organizational Considerations
  5. What is Data Wrangling, Munging & Self-Service Data Prep?
     • “Data wrangling (sometimes referred to as Data munging) is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.” - Wikipedia, June 2017
     • “Data munging … is sometimes used for vague data transformation steps that are not yet clear to the speaker.” - Wikipedia, June 2017
     • “As their name implies, the key ingredient of data preparation platforms is their ability to provide self-service capabilities that allow knowledgeable users (but who are not IT experts) to combine, transform and cleanse relevant data prior to analysis: to "prepare" it. Most tools in this category are targeted at business analysts but there are products aimed more at data scientists.” - Philip Howard, Bloor Research
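As a rough illustration of the "combine, transform and cleanse" steps quoted above, here is a minimal sketch in plain Python. The sample rows, field names, and the `wrangle` helper are all hypothetical and are not taken from the slides or from any particular tool.

```python
# Hypothetical raw records, e.g. exported from a spreadsheet.
raw_rows = [
    {"customer": "  Alice Smith ", "country": "us", "amount": "1,200.50"},
    {"customer": "Bob Jones", "country": "GB", "amount": "300"},
    {"customer": "", "country": "US", "amount": "n/a"},
]


def wrangle(rows):
    """Cleanse and transform raw rows into an analysis-ready form."""
    prepared = []
    for row in rows:
        name = row["customer"].strip()
        if not name:
            continue  # cleanse: drop rows with no customer name
        try:
            amount = float(row["amount"].replace(",", ""))
        except ValueError:
            continue  # cleanse: drop rows whose amount is not numeric
        prepared.append({
            "customer": name,                  # transform: trim whitespace
            "country": row["country"].upper(), # transform: normalize case
            "amount": amount,                  # transform: text to number
        })
    return prepared


prepared = wrangle(raw_rows)
```

Self-service data prep tools typically wrap steps like these in a visual interface; the sketch only shows the kind of work being done.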
  6. Aimed at Business Stakeholders & Data Scientists
     • According to a recent DATAVERSITY survey on Emerging Trends in Data Architecture, new and disparate roles are often involved in developing a data architecture.
     • Below is a “sneak peek” of the results (due to be published in October).
     Survey question: What role(s) are typically responsible for creating a Data Architecture? [Select all that apply]

     | Answer                       | Response Percent |
     |------------------------------|------------------|
     | Data Architect               | 90.0%            |
     | Data Modeler                 | 65.3%            |
     | Enterprise Architect         | 66.5%            |
     | Business Architect           | 51.2%            |
     | Systems Developer            | 17.1%            |
     | Programmer                   | 16.5%            |
     | Database Administrator (DBA) | 37.6%            |
     | Data Scientist               | 27.6%            |
     | ETL or Database Developer    | 36.5%            |
     | Business Stakeholder(s)      | 32.9%            |
     | Program Manager              | 12.9%            |
     | Data Quality Administrator   | 30.0%            |
     | Data Governance Officer      | 50.0%            |
     | Don't know                   | 2.4%             |
     | Other (please specify)       | 8.2%             |

     While Data Architects & related roles are still responsible for the bulk of data architecture decisions, often with traditional ETL techniques, Business Stakeholders and Data Scientists also play a significant role, often with self-service data prep tools.
  7. Sample Tools in the Self Service Data Prep Category
     • The following products and vendors are commonly considered part of the Self Service Data Preparation category.
     • This list is not exhaustive and is not an endorsement of any product, but is meant to indicate the type of product we’re talking about today.
     • Pure Play Vendors
       • Alation
       • Alteryx
       • Paxata
       • Tamr
       • Trifacta
     • Traditional data integration vendors
       • Informatica
       • Syncsort (Unify)
       • Etc.
     • BI vendors
       • Pentaho
       • Tableau
       • Qlik
       • Etc.
  8. Good Wrangling and Bad Wrangling
     Bad Wrangling
     • Performed because a solid data architecture is lacking, i.e. workarounds & cleanup.
     • Done to avoid data governance restrictions.
     • Increases Confusion & Slows Time to Insight
     Good Wrangling
     • Part of data exploration & analysis
     • Done within data governance restrictions.
     • Leverages defined standards (e.g. Reference Data)
     • Produces Faster Time to Insight
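One way to picture "leverages defined standards (e.g. Reference Data)" is a wrangling step that checks values against a governed code list rather than inventing its own. The sketch below assumes a hypothetical set of approved country codes and a hypothetical `split_by_reference` helper; neither comes from the slides.

```python
# Hypothetical governed reference data: the approved country codes.
COUNTRY_CODES = {"US", "GB", "DE"}


def split_by_reference(rows, valid_codes):
    """Normalize each row's country code and separate rows that match
    the reference data from rows that need follow-up."""
    valid, rejected = [], []
    for row in rows:
        code = row["country"].strip().upper()
        if code in valid_codes:
            valid.append({**row, "country": code})
        else:
            rejected.append(row)  # keep the original for investigation
    return valid, rejected


rows = [
    {"id": 1, "country": "us"},
    {"id": 2, "country": "UK"},   # not in the hypothetical code list
    {"id": 3, "country": " de "},
]
valid, rejected = split_by_reference(rows, COUNTRY_CODES)
```

Rejected rows are returned rather than silently dropped, so the mismatch can be raised with whoever owns the reference data.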
  9. The Reluctant Wrangler
     • “Raw data used in Self-Service Analytics and BI environments is often so poor that many data scientists and BI professionals spend an estimated 50 – 90% of their time cleaning and reformatting data to make it fit for purpose.” Source: DataCenterJournal.com
     • “Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day.” Source: Forbes 2016
  10. Data Wrangling? … or Herding Cats?
  11. Reporting is Only as Good as the Underlying Architecture & Definitions
      • Modern tools make it easy to create visual reports & graphs from data.
      • But without business context, or “metadata”, these reports are of little value.
        • What does ‘F2’ refer to?
        • Are there standard code sets?
        • Does this number represent a date?
        • “Computing report…elapsed time 10 hours, 27 seconds…” Why does it take so long for the report to run?
      • A robust data architecture provides data sets that have:
        • Business context & definition
        • Common structure & formatting
        • Fast & easily-reportable data sets
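The "What does ‘F2’ refer to?" question can be made concrete with a small data dictionary that attaches a business name and a type to each cryptic column. This is only a sketch: the column meanings, the `DATA_DICTIONARY` contents, and the `apply_metadata` helper are assumptions made up for illustration.

```python
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Interpret a number-like string such as '20170622' as a date."""
    return datetime.strptime(value, "%Y%m%d").date()


# Hypothetical data dictionary: business name and parser for each column.
DATA_DICTIONARY = {
    "F1": {"name": "customer_id", "parse": int},
    "F2": {"name": "signup_date", "parse": parse_yyyymmdd},
}


def apply_metadata(row, dictionary):
    """Rename and type columns that have a business definition;
    leave undefined columns untouched so the gap stays visible."""
    described = {}
    for column, value in row.items():
        meta = dictionary.get(column)
        if meta is None:
            described[column] = value  # no business definition yet
        else:
            described[meta["name"]] = meta["parse"](value)
    return described


record = apply_metadata({"F1": "42", "F2": "20170622", "F3": "x"}, DATA_DICTIONARY)
```

Here `F3` stays as-is, which is the situation the slide warns about: a report can display it, but nobody can say what it means.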
  12. Today’s Reporting Data Sets are Complex
      • Reporting today goes beyond traditional relational databases, which adds to the complexity of preparing data to create effective and intuitive reports and analytics.
      [Diagram labels: COBOL, Legacy Systems, JCL, Spreadsheets, Media, Social Media, IoT, Open Data, Databases, Data Models, Documents, Data In Motion]
  13. Disparate Data Sources
      • The 2016 DATAVERSITY Emerging Trends in Metadata survey revealed some interesting findings about what types of data & metadata organizations will be managing now and in the future.
      • Not all are easily managed in traditional data modeling tools (although many are…)
      [Chart: types of data & metadata managed, Now vs. Future; a marker indicates types supported by most data modeling tools]
  14. In other words…Herding Cats
  15. Paradigm Shift in the Way We Look at “Reporting”
      Traditional
      • Top-Down, Hierarchical
      • Design, then Implement
      • “Passive”, Push technology
      • “Manageable” volumes of information
      • “Stable” rate of change
      • Business Intelligence
      “Big Data” / Exploration
      • Distributed, Democratic
      • Discover and Analyze
      • Collaborative, Interactive
      • Massive volumes of information
      • Rapid and Exponential rate of growth
      • Data Science
      [Diagram labels: Design, Implement; Discover, Analyze]
  16. “Traditional” Way of Looking at the World: Hierarchies
      • Carolus Linnaeus in 1735 established a hierarchy/taxonomy for organizing and identifying biological systems.
      [Diagram: Kingdom, Phylum, Class, Order, Family, Genus, Species]
  17. “New” Way of Looking at the World - Emergence
      “In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions.” - Wikipedia
      Sample messages:
      • I love my new Levis jeans.
      • Is Levi coming to my party?
      • Sale #LEVIS 20% at Macys.
      • LOL. TTYL. Leving soon.
  18. Data Warehouse vs. Data Lake
      • Data Warehouse: A Data Warehouse is a storage repository that holds current and historical data used for creating analytical reports. Data structures & requirements are pre-defined, and data is organized & stored according to these definitions.
      • Data Lake: A Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure & requirements are not defined until the data is needed.
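The contrast between structures that are pre-defined (warehouse) and structures that are "not defined until the data is needed" (lake) is often described as schema-on-write versus schema-on-read; those terms are not used on the slide. The sketch below uses plain Python lists as stand-ins for both stores, and every name in it (`WAREHOUSE_SCHEMA`, `load_to_warehouse`, `load_to_lake`, `read_from_lake`) is hypothetical.

```python
import json

# Hypothetical warehouse schema, defined before any data is loaded.
WAREHOUSE_SCHEMA = {"customer_id": int, "amount": float}


def load_to_warehouse(table, record):
    """Structure first: reject records that do not match the schema,
    then store values converted to the pre-defined types."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    table.append({k: WAREHOUSE_SCHEMA[k](v) for k, v in record.items()})


def load_to_lake(lake, raw_text):
    """Store raw data in its native format; no structure is imposed."""
    lake.append(raw_text)


def read_from_lake(lake, field):
    """Structure later: interpret raw entries only when a question is asked."""
    values = []
    for raw_text in lake:
        try:
            doc = json.loads(raw_text)
        except json.JSONDecodeError:
            continue  # unstructured entry; not usable for this question
        if isinstance(doc, dict) and field in doc:
            values.append(doc[field])
    return values


warehouse, lake = [], []
load_to_warehouse(warehouse, {"customer_id": "7", "amount": "19.99"})
load_to_lake(lake, '{"customer_id": 7, "tweet": "I love my new jeans"}')
load_to_lake(lake, "free-text log line with no structure")
tweets = read_from_lake(lake, "tweet")
```

The warehouse pays the modeling cost at load time and gets predictable reports; the lake accepts anything and defers that cost to each reader.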
  19. Integrating the Data Lake & Traditional Data Sources
      • The Data Lake has a different architecture & purpose than traditional data sources such as data warehouses.
      • But the two environments can co-exist to share relevant information.
      [Diagram labels: Data Analysis & Discovery – Data Lake; Enterprise Systems of Record; Data Governance & Collaboration; Master & Reference Data; Data Warehouse; Data Marts; Operational Data; Security & Privacy; Sandbox; Lightly Modeled Data; Data Exploration; Reporting & Analytics; Advanced Analytics; Self-Service BI; Standard BI Reports]
  20. Combining DW & Big Data Can Provide Valuable Information
      • There are numerous ways to gain value from data
      • Relational Database and Data Warehouse systems are one key source of value
        • Customer information
        • Product information
      • Big Data can offer new insights from data
        • From new data sources (e.g. social media, IoT)
        • By correlating multiple new and existing data sources (e.g. network patterns & customer data)
      • Integrating DW and Big Data can provide valuable new insights.
      • Examples include:
        • Customer Experience Optimization
        • Churn Management
        • Products & Services Innovation
      [Diagram labels: Data Warehouse, New Insights]
  21. Organizational Siloes
      Data Lake & Data Scientist
      • Exploratory projects
      • Quick wins
      • Often little documentation & governance
      Data Warehouse & Data Architects
      • Enterprise reporting
      • Long-term projects
      • Data Standards
      • Metadata & Governance
      Too often, there are organizational & cultural silos that limit the sharing between the Data Lake and Data Warehouse.
  22. Organizational Siloes
      Self-Service Data Prep & BI Reporting
      • Exploratory projects
      • Quick wins
      • Little documentation & governance
      Data Warehouse & Traditional BI Reporting
      • Enterprise reporting
      • Long-term projects
      • Data Standards
      • Metadata & Governance
      Unfortunately, these siloes often also exist between business users and traditional data warehouse & BI architects. Report requirements are thrown ‘over the wall’… and business users wait…
      [Diagram labels: Data Warehouse, Departmental Database]
  23. Reducing Time to Insight is a Key Driver for Self Service Data Prep
      • According to TDWI’s Best Practices Report on “Improving Data Preparation for Business Analytics” from Q3 2016, the following are key drivers for Self-Service Data Preparation:
        • 81% Shorten time to business insight
        • 76% Increase data-driven decision making
        • 53% Improve reaction time to business conditions
        • 49% Operational efficiency for frontline workers
        • 43% Gain a single, complete view of relevant data
      • The most popular sources include traditional ones:
        • 87% Relational databases
        • 83% Data warehouse
        • 79% Spreadsheet or desktop database
      [Diagram label: Departmental Database]
  24. Finding Balance – Model What Matters
      • It’s important to find a balance between
        • Managing & modeling “trusted data sets”
        • Giving users the flexibility to explore.
      • Most users will find these trusted data sets a welcome asset, but don’t want to be restricted from doing data exploration when appropriate.
      [Diagram labels: IoT, Log Files, Data Warehouse, Master Data, Reference Data; axis from Structure to Flexibility & Exploration]
  25. Find a Balance in Implementing Data Architecture
      • Find the Right Balance
        • Data Architecture projects can have the reputation for being overly “academic”, long, expensive, etc.
        • No architecture at all can cause chaos.
        • When done correctly, Data Architecture helps improve efficiency and better align with business priorities
      [Diagram: Focus on Business Value. One extreme: Too Academic, nothing gets done. Other extreme: Too “Wild West”, nothing gets done - chaos]
  26. Implement Fit-for-Purpose Data Modeling & Governance
      • The data modeling & governance rigor depends on the usage and purpose of data
      • As a general rule, the more the data is shared across & beyond the organization, the more formal governance needs to be
      Core Enterprise Data
      • Common data elements used by multiple stakeholders across BUs, LOBs, functional areas, applications, etc.
      • Highly governed
      • Highly published & shared
      • Examples
        • Common Financial Metrics: for Financial & Regulatory Reporting
        • Common Attributes: Core attributes reused across multiple areas (e.g. Customer name, Account ID, Address)
      Functional & Operational Data
      • Lightly modeled & prepared data for limited sharing & reuse
      • Collaboration-based governance
      • May be future candidates for core data
      • Examples
        • Operational Reporting
        • Non-productionized analytical model data
        • Ad hoc reporting & discovery
      Exploratory Data
      • Raw or lightly prepped data for exploratory analysis
      • Mainly ad hoc, one-off analysis
      • Light touch governance
      • Examples
        • Raw data sets for exploratory analytics
        • External & Open data sources
      Master & Reference Data
      • Common data elements used by multiple stakeholders across functional areas, applications, etc.
      • Highly governed
      • Highly published & shared
      • Examples
        • Reference Data: Procedure codes, Country Codes, etc.
        • Master Data: Location, Customer, Product
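The tiers above could be recorded as a small lookup so that each data set carries an explicit governance level. The sketch below is one possible encoding, not a method from the slides: the `GovernanceTier` class, the `TIERS` keys, and especially the `classify` rule of thumb are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceTier:
    name: str
    governance: str
    sharing: str


# Tier descriptions paraphrased from the slide; the keys are hypothetical.
TIERS = {
    "core": GovernanceTier(
        "Core Enterprise Data", "highly governed", "highly published & shared"),
    "functional": GovernanceTier(
        "Functional & Operational Data", "collaboration-based", "limited sharing & reuse"),
    "exploratory": GovernanceTier(
        "Exploratory Data", "light touch", "ad hoc, one-off analysis"),
}


def classify(shared_across_org: bool, reused: bool) -> GovernanceTier:
    """Hypothetical rule of thumb: the more widely data is shared,
    the more formal the governance tier it is assigned."""
    if shared_across_org:
        return TIERS["core"]
    if reused:
        return TIERS["functional"]
    return TIERS["exploratory"]


tier = classify(shared_across_org=False, reused=True)
```

A real framework would weigh more factors (regulation, data sensitivity, ownership); two booleans are used here only to keep the example short.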
  27. Summary
      • As more business stakeholders see the value of data, Self Service Data Preparation is on the rise
        • Common users include data scientists and business stakeholders
        • While the use cases for these two stakeholder categories are different, both are driven by the need for:
          • Time to Value
          • Freedom to Explore
      • Create a Data Governance Framework that provides “just enough” governance
        • Allowing flexibility where appropriate
        • Applying rigor and structure where necessary
        • Providing trusted data sets for all
      • Data Modeling used correctly will:
        • Speed up time to insight
        • Increase collaboration
        • Increase business value
      • Happy Wrangling!
  28. About Global Data Strategy, Ltd
      • Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.
      • Our passion is data, and helping organizations enrich their business opportunities through data and information.
      • Our core values center around providing solutions that are:
        • Business-Driven: We put the needs of your business first, before we look at any technology solution.
        • Clear & Relevant: We provide clear explanations using real-world examples.
        • Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.
        • High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry.
      Data-Driven Business Transformation: Business Strategy Aligned With Data Strategy
      Visit www.globaldatastrategy.com for more information
  29. Contact Info
      • Email: donna.burbank@globaldatastrategy.com
      • Twitter: @donnaburbank @GlobalDataStrat
      • Website: www.globaldatastrategy.com
  30. White Paper: Emerging Trends in Metadata Management
      Free Download
      • Download from www.dataversity.net
      • Also available on www.globaldatastrategy.com
  31. Lessons in Data Modeling Series: This Year’s Line Up
      • January 26th: How Data Modeling Fits Into an Overall Enterprise Architecture
      • February 23rd: Data Modeling and Business Intelligence
      • March: Conceptual Data Modeling – How to Get the Attention of Business Users
      • April: The Evolving Role of the Data Architect – What does it mean for your Career?
      • May: Data Modeling & Metadata Management
      • June: Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling
      • July: Data Modeling & Metadata for Graph Databases
      • August: Data Modeling & Data Integration
      • September: Data Modeling & MDM
      • October: Agile & Data Modeling – How Can They Work Together?
      • December: Data Modeling, Data Quality & Data Governance
  32. Questions? Thoughts? Ideas?
