Keeping the Pulse of Your Data: Why You Need Data Observability to Improve Data Quality

  1. Keeping the Pulse of Your Data: Why You Need Data Observability to Improve Data Quality
  2. Housekeeping. Webinar audio: Today’s webcast audio is streamed through your computer speakers. If you need technical assistance with the web interface or audio, please reach out to us using the Q&A box. Questions welcome: Submit your questions at any time during the presentation using the Q&A box. If we don't get to your question, we will follow up via email. Recording and slides: This webinar is being recorded. You will receive an email following the webinar with a link to the recording and slides.
  3. Speakers: Julie Skeen, Sr. Product Marketing Manager; Shalaish Koul, Principal Sales Engineer
  4. Agenda • Introduction to data observability • How data observability works • Use case examples & demonstration • Q&A
  5. Data integrity is a business imperative: • 47% of newly created data records have at least one critical error • 68% of organizations say disparate data negatively impacts their organization • 84% of CEOs say that they are concerned about the integrity of the data they are making decisions on
  6. Introduction to Data Observability • Data downtime disrupts critical data pipelines and processes that power downstream analytics and operations • Lack of visibility around the health of data reduces confidence in business decisions • Traditional manual methods do not scale, are error-prone, and are resource intensive
  7. What is Data Observability? • W. Edwards Deming, “the father of quality management,” started the observability concept 100 years ago • Observability is a key foundational concept of statistical process control (SPC), Lean, Six Sigma, and any process dependent on building quality into repetitive tasks • Using statistical methods to control complex processes to ensure quality data products over time. Sources: IDC; Phil Goodwin and Stewart Bond, “IDC Market Glance: DataOps, 2Q21” (June 2021). Gartner, Hype Cycle for Data Management, 2022, Melody Chien, Ankush Jain, Robert Thanaraj, June 30, 2022.
  8. Why Now? • Businesses are more data-driven than ever • Problematic events are infrequent but can be catastrophic • Users’ data expertise has evolved, along with expectations to do more with it • Data proliferation and technology diversification • AI has evolved to support the complexity of the problem
  9. Data Observability is proactive, not reactive
  10. Typical Data Products and Pipelines: Traditionally, the quality of a data product or pipeline is ensured during the development process and not throughout the operational lifecycle. • QA is done at the time of development • Random issues are surfaced • Users find and report defects. [Diagram: Data Sources #1 to #4, each marked “?”, feed a process that runs Create and/or Source the Data → Transform Data → Enrich / Blend / Merge Data → Publish and Expose Data, ending in the Data Product(s), marked “X”.]
  11. Data Pipelines with Data Observability: Data Observability tools monitor the performance of data products and processes in order to detect significant variations before they result in the creation of erroneous work product in reports, analytics, insights, and outcomes. [Diagram: Data Sources #1 to #4 feed the same process (Create and/or Source the Data → Transform Data → Enrich / Blend / Merge Data → Publish and Expose Data) under an “Observe” label; Data Source #3 is flagged “!”, and issues are identified and resolved prior to the final Data Product(s).]
  12. How Data Observability Works
  13. Intelligent Analysis Identifies Anomalies: AI identifies trends that traditional methods cannot easily find
  14. Data Observability and Data Quality (diagram labels: Rules, Metadata) • Alerts and dashboards for overall data health trending and threshold analysis • Anomaly detection based on volume, freshness, distribution, and schema metadata • Predictive analysis simulating human intelligence to identify potential adverse data integrity events. “Observability is the missing piece today to give our data stewards access to data discovery insights without having to go to IT for queries or reports” - Jean-Paul Otte, CDO, Degroof Petercam
  15. The Importance of an Integrated Data Catalog • Provides a single, searchable inventory of data assets • Allows technical users to easily search, explore, understand, and collaborate on critical data assets • Visualizes relationships, lineage, and business impact of your data • Supports the sharing of knowledge, comments, and surveys • Enables data stewards to monitor, audit, certify, and track data across its lifecycle through integrated data governance
  16. Demonstration • Alerts and Alerts Management: Volume, Data drifts, Schema drifts, etc. • Integrated Data Catalog • How to create and configure Observers • Self-served Data Discovery using Profiling
  17. Demonstration Recap • Alerts and Alerts Management: Volume, Data drifts, Schema drifts, etc. • Integrated Data Catalog • How to create and configure Observers • Self-served Data Discovery using Profiling
  18. Data Observability benefits: (1) Understand the health of your data with continuous measuring and monitoring (2) Reduce the risks associated with erroneous analytics that impact business decisions (3) Receive alerts when outliers and anomalies are identified (4) Reduce the time to solve operational issues and the cost of adverse events (5) Remediate quickly when issues occur by understanding the cause
  19. Proactively uncover data anomalies and take action before they become costly downstream issues
  20. The modular, interoperable Precisely Data Integrity Suite contains everything you need to deliver accurate, consistent, contextual data to your business, wherever and whenever it’s needed.
  21. Seven strong modules deliver exceptional value: • Data Integration: Break down data silos by quickly building modern data pipelines that drive innovation • Data Observability: Proactively uncover data anomalies and take action before they become costly downstream issues • Data Governance: Manage data policy and processes with greater insight into your data’s meaning, lineage, and impact • Data Quality: Deliver data that’s accurate, consistent, and fit for purpose across operational and analytical systems • Geo Addressing: Verify, standardize, cleanse, and geocode addresses to unlock valuable context for more informed decision making • Spatial Analytics: Derive and visualize spatial relationships hidden in your data to reveal critical context for better decisions • Data Enrichment: Enrich your business data with expertly curated datasets containing thousands of attributes for faster, confident decisions
  22. Questions?
  23. Thank you
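Slide 14 lists anomaly detection based on volume, freshness, distribution, and schema metadata. As a rough illustration only, and not Precisely’s implementation, the sketch below shows what two of those metadata checks (freshness and schema drift) could look like for the daily online-orders table used in the finance example in the notes. The `TableSnapshot` type, the function names, and the 24-hour freshness threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSnapshot:
    """Hypothetical metadata collected for one table at one point in time (not a Precisely API)."""
    last_loaded: datetime      # when the table was last refreshed
    columns: dict[str, str]    # column name -> declared data type


def freshness_alert(snap: TableSnapshot, max_age: timedelta, now: datetime) -> bool:
    """Flag the table if it has not been refreshed within max_age."""
    return now - snap.last_loaded > max_age


def schema_drift(previous: TableSnapshot, current: TableSnapshot) -> dict[str, list[str]]:
    """Report columns added, removed, or retyped between two snapshots."""
    prev, curr = previous.columns, current.columns
    shared = set(prev) & set(curr)
    return {
        "added": sorted(set(curr) - set(prev)),
        "removed": sorted(set(prev) - set(curr)),
        "type_changed": sorted(c for c in shared if prev[c] != curr[c]),
    }


# Example: two metadata snapshots of a daily online-orders table.
now = datetime(2023, 1, 2, 9, 0, tzinfo=timezone.utc)
yesterday = TableSnapshot(
    last_loaded=datetime(2022, 12, 31, 6, 0, tzinfo=timezone.utc),
    columns={"order_id": "int", "amount": "decimal", "region": "text"},
)
today = TableSnapshot(
    last_loaded=datetime(2023, 1, 1, 6, 0, tzinfo=timezone.utc),  # 27 hours before `now`
    columns={"order_id": "int", "amount": "text", "channel": "text"},
)
print(freshness_alert(today, max_age=timedelta(hours=24), now=now))  # True
print(schema_drift(yesterday, today))
# {'added': ['channel'], 'removed': ['region'], 'type_changed': ['amount']}
```

Both checks read only metadata (load timestamps and column types), so a sketch like this can run without scanning every row of the table.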

Editor's Notes

  1. Welcome to our session today. I want to thank you all for joining us and let you know how excited we are to be with you to talk about data observability and how it can help improve your data quality.
  2. Just a bit of housekeeping before we get started. If you have any questions today, please put them in the Q&A box. This session is being recorded, and you will receive an email following the webinar with a link to the recording and slides.
  3. I’d like to introduce your speakers for today’s session. My name is Julie Skeen. I am a Sr. Product Marketing Manager at Precisely responsible for Data Quality and Observability. And with me I have my colleague Shalaish. …
  4. The plan for our session today is to share some introductory information about data observability. We will then discuss how data observability works and show you some use case examples in action. We will allow time at the end to answer questions. Let’s jump in…
  5. Here you see a few stats from Forbes, the Harvard Business Review, and Precisely’s own Data Trends Survey. Looking at these, when two-thirds of organizations say siloed data negatively impacts their data initiatives and almost half of newly created data records have at least one critical error, it is no wonder that 84% of CEOs doubt the integrity of the data on which they make decisions! So, let’s learn how data observability can help.
  6. There are a number of business challenges that a data observability solution can help address. See if any of these sound familiar. Something goes wrong within the data pipeline that impacts downstream operations or analytics; you might experience this as an email from IT saying that your BI tool is unavailable. Do you ever lack confidence in decisions based on the data in your BI tool or advanced analytics processes? Does your team find that the scripts or other manual methods used in the past to look for operational data issues no longer scale as data volumes increase? If any of these challenges resonate, your organization can benefit from a data observability solution.
  7. So what is data observability? Observability itself is not a new concept. It started over 100 years ago, is a key concept in many process methodologies, and is used in industries such as manufacturing as well as software development. What is newer is applying these concepts to data. Data Observability ensures the reliability of your processes and analytics by alerting you to potential data integrity events. It answers the question, “Is my data ready to be used?” And by “used” we mean anywhere a business depends on the data being accurate. This means a lot of things to a lot of different people. If you depend on a BI report, you may ask whether the data feeding your reports is correct. If you are a data engineer moving data through pipelines, you may want to know whether the data is being transferred correctly. If you are a data scientist building advanced models, you want to know whether the models reflect recent data changes or need to be retrained. Take, for example, a simple process where the finance department makes business decisions based on a daily report of online orders. If for some reason the data feeding the report from the source systems is incorrect, the insights and outcomes based on the report will be flawed, with obvious negative downstream impacts. If this sounds similar to data quality, that’s because it is another way to ensure the quality of your data. We will talk more about how data observability relates to traditional data quality in a few minutes.
  8. First, we want to talk about why data observability is more important now than ever. We all know businesses are using data for more purposes and, ultimately, becoming more dependent on it. Think about driving a car while looking at your phone. Your primary goal is to get from point A to B safely, but if you are constantly inundated with distractions and trying to look at your phone while driving, you can drive off the road or collide with another vehicle. The same applies to data in your business. Your objective is to run your business, but if you are constantly distracted worrying about data issues that MIGHT happen, you’ll drive off the road. As with the gauges in your car, you want only relevant alerts to help you drive your business successfully. These data issues are more relevant today for a variety of reasons, but they can be grouped into two main categories: data proliferation and technology diversification. By data proliferation I simply mean there is more data, a lot more data. A Forbes study estimates we’re creating 2.5 quintillion bytes (2.5 exabytes) of data every day, and this data spans a variety of locations such as cloud, on-prem, and hybrid cloud, not to mention the movement of data across all these locations. By technology diversification I mean pivotal business transformation initiatives empowered by amazing next-generation tech that represents a rethinking of established legacy systems. These efforts almost always span a diversity of vendors, applications, and technologies such as streaming, IoT, and AI/ML. Data consumers, users, and producers cannot take their hands off the wheel to validate that the data is ready for use, as they need to stay focused on steering the business. Data observability enables business value not only when fast insights allow for quick decisions, but also when the data being used for insights is trusted.
  9. It’s one thing to identify data issues, but more importantly, data issues need to be corrected before the data is used in making decisions. Data issues will happen. No system or process is perfect. Proactively addressing issues prevents them from impacting the business. As you see in this picture, Data Observability shines a light on your potential data issues with passive user interaction. This captures the essence of Data Observability and just how simple it is to shine a light on a potential problem and change course versus having to salvage the wreckage. If there is one takeaway from this overview, please remember: Data Observability is proactive and intended to improve data reliability and reduce data downtime. Using a variety of techniques, Data Observability surfaces issues in source systems before they become significant. We’re going to show you a few examples of those techniques today based on volume and data drift detection methods that answer questions such as: Is my data ready to use? Do I have all my data? Do I have the right data? And with data proliferation and technology diversification at play, reactive methods simply do not scale, are error-prone, and are resource intensive. As the adage goes, an ounce of prevention is worth a pound of cure.
  10. The process of managing and monitoring the data lifecycle and data journey across an enterprise has become incredibly sophisticated and complex. It is not out of the ordinary to see thousands of pipelines and transformations spanning hundreds of data sources. Data quality is often validated only at the final delivered stage. In a traditional manufacturing process, this would be the equivalent of ensuring the quality of the finished product through a post-manufacturing inspection. As you can imagine, this process is incredibly costly from both a time and a risk perspective. The same concept applies to your typical data products, such as analytics, reports, applications, and any pipelines or processes driving an outcome. For a data pipeline, it means a stakeholder finds the issue and reports it to the creator of the analytics. Again, it means having to go back to an earlier stage after production is thought to be complete.
  11. Contrast that process with what it looks like when you add data observability. Data observability enables the user to visualize the data process and see deviations from typical patterns. In this example, you see a typical data process that spans multiple data sources and transformations. This is a simple view; in reality, there could be hundreds of transformations spanning many different data sources, and as the data moves through the pipeline, it is observed at each stage, ensuring the entire process is stable. Data source #3 is applying enrichment as well as blending and merging of data. You can see that there’s some sort of anomaly that has the potential to jeopardize the final data product. Catching it early at this stage allows the appropriate resource to be notified and the issue to be assessed and resolved before the data is made available for consumption. Many studies have been published validating the cost savings of finding issues earlier in the lifecycle, and these savings can be significant. Early resolution eliminates wasted time and resources in later stages of the pipeline, not to mention the risk of negative business outcomes. It’s critical that data issues are discovered and remediated before business decisions are made based on inaccurate analytics.
  12. How does data observability work? It breaks down into three main sets of capabilities. The first is discovering the data you want to observe and collecting information about those assets through a variety of techniques and tools. The second performs the analysis to identify any adverse data integrity events. The analysis can get quite sophisticated; it often uses modern AI and ML methods to crunch massive amounts of metadata and related information. Finally, there is action: bringing those alerts and insights to the forefront for both manual and automated resolution. This is essentially the step where you do something about the data issues that have been found.
  13. Let’s look a little deeper at the key capability of data observability analysis: anomaly detection. It might not be obvious when we talk about the analysis components of data observability that this capability set is underpinned by extensive intelligence. Outlier detection for identifying anomalies has proven to be an effective technique in many use cases and is an integral part of data observability. Here you can see a few typical patterns of anomaly detection used in data observability. This is just scratching the surface of the AI and ML methods used to determine outliers. If you are familiar with these types of methods, you will see this includes a variety of patterns such as random noise, step changes, trends, and others, many of which are very complex. If you aren’t familiar with those specifics, the main takeaway is that extensive artificial intelligence and machine learning support the anomaly detection, so while you can build specific rules in data observability, you don’t have to: the system will learn what to expect from your data and will alert you when anything appears outside the norm.
  14. Another key capability set in data observability is the action step. This is where you can visually see alerts that have occurred on your data pipelines, and those alerts can be proactively pushed out via notifications. Here you can see an example of a volume alert and the related assets that are impacted by this alert.
  15. When we look at Data Observability and traditional Data Quality, we consider them complementary capabilities, with some overlap. Data Observability may fall under the same umbrella as data quality, or it might sit with DataOps. Both focus on the use of metadata AND the traditional Data Quality dimensions of accuracy, completeness, and conformity. Both benefit from integrations with the data catalog and are critical for any data governance initiative. The biggest difference is how each capability set performs this evaluation. Data Observability emphasizes the identification of anomalies and outliers in data based on patterns over time. Think of this as similar to human inference, how you or I would look at a data trend line and draw a conclusion, versus the static, predefined rules you find in most data quality tools. While we at Precisely offer Data Observability and Data Quality as distinct capability sets, we also make sure the two complement each other so customers get the most value from the solutions.
  16. The other capability that is critical to data observability is an integrated data catalog. There are a number of benefits that we see from having a data catalog integrated with data observability. The catalog provides a single, searchable inventory of data assets and allows technical users to easily search, explore, and understand the data. It also allows users to visualize relationships, lineage, and business impact of the data, and it enables collaboration through a variety of mechanisms. It also enables data stewards to monitor, audit, certify, and track data across its lifecycle. Now we want to show you some examples of situations that happen when things don’t go as planned with data pipelines and give you a view into how data observability helps to address each of these scenarios.
  17. Shalaish
  18. OK. And with that, I'm going to hand over to my colleague Shalaish who is going to show this in action.
  19. Shalaish
  20. Shalaish
  21. Shalaish
  22. Shalaish
  23. Thanks, Shalaish. Now that you’ve heard about what data observability is and seen how it can apply to specific use cases, I want to review the benefits of data observability. Many of these may be apparent to you from what you’ve seen. First is understanding data health: as the system continuously measures and monitors what is happening, you can use dashboards to understand the health across your data landscape. Second, data observability can reduce the risks associated with erroneous business intelligence and advanced analytics that have the potential to impact a variety of business decisions. Third, proactive alerts are provided when the intelligence determines there is an anomaly or outlier, and that notification is both shown visually and pushed to the appropriate users. Fourth, Data Observability enables you to reduce the time to solve operational issues and reduce the cost of potentially adverse events. And finally, it allows you to quickly remediate issues, and integrated data quality capabilities further expedite this process.
  24. Before I close out, I want to mention that the product you saw today is Precisely’s Data Observability solution, which is part of Precisely’s Data Integrity Suite.
  25. The Precisely Data Integrity Suite is modular and interoperable, and it contains everything you need to deliver accurate, consistent, contextual data to your business. The Precisely Data Integrity Suite is a set of seven interoperable modules that enable your business to build trust in your data.
  26. The suite has been built so you can start wherever you are in your data integrity journey. This means that the modules are designed to be implemented either together or stand alone, with best-in-class capabilities. For example, you can start with data observability and layer in other modules over time. Here you can see a brief view of the modules of the Precisely Data Integrity Suite.
  27. Now we will address a few questions before we finish up for today. Shalaish, the first question is: How does Data Observability work with other applications? The next question is: What type of user would use Data Observability? Those are all the questions we have time for today. If we did not get to your question, we will follow up with you via email.
  28. Thank you for joining us today. If you would like to learn more about data observability, we have provided a resource for you. Thank you for your time and attention.
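The notes for slides 13 and 14 describe a system that learns what to expect from your data and raises a volume alert when something falls outside the norm. The models behind the product are not described here, so the sketch below swaps in a much simpler, classic technique: a rolling Shewhart-style control chart, the statistical process control idea mentioned on slide 7. It treats the previous seven days of row counts as the learned baseline and flags any day more than three standard deviations from that baseline’s mean. The function name `volume_alerts`, the window size, and the 3σ threshold are illustrative assumptions.

```python
from statistics import mean, stdev


def volume_alerts(daily_counts: list[int], window: int = 7, k: float = 3.0) -> list[int]:
    """Return indices of days whose row count falls outside mean +/- k * stdev
    of the preceding `window` days (a rolling Shewhart-style control chart)."""
    alerts = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]   # recent history acts as the learned "normal"
        mu, sigma = mean(baseline), stdev(baseline)
        # If the baseline is perfectly constant (sigma == 0), any change at all is flagged.
        if abs(daily_counts[i] - mu) > k * sigma:
            alerts.append(i)
    return alerts


# Example: daily row counts for an orders feed; index 8 drops to 430 rows.
counts = [1000, 1020, 990, 1010, 1005, 995, 1015, 1008, 430, 1002]
print(volume_alerts(counts))  # [8]
```

With these counts, only index 8 (430 rows after a week of roughly 1,000) is flagged. One simplification worth noting: the flagged day then becomes part of the baseline for the following days and widens the limits, so a more careful sketch would exclude flagged points from the baseline.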