Quicker Insights and Sustainable Business Agility Powered By Data Virtualization (A/NZ)

  1. How Data Virtualization impacts AI/ML projects. Chris Day, Director, APAC Sales Engineering, cday@denodo.com, 14 April 2021
  2. Advanced Analytics & Machine Learning Projects Need Data
     • Improving patient outcomes: data includes patient demographics, family history, patient vitals, lab test results, claims data, etc.
     • Predictive maintenance: maintenance data logs and data coming in from sensors, including temperature, running time, power level duration, etc.
     • Predicting late payment: data includes company or individual demographics, payment history, customer support logs, etc.
     • Preventing fraud: data includes the location where the claim originated, time of day, claimant history and any recent adverse events.
     • Reducing customer churn: data includes customer demographics, products purchased, products used, past transactions, company size, history, revenue, etc.
  3. 87% of data science projects never make it into production (VentureBeat AI, July 2019)
  4. The Scale of the Problem
  5. What is Data Virtualization?
  6. What is Data Virtualization? 1. Connect to disparate data sources; 2. Combine related data into views; 3. Consume in business applications.
  7. What is Data Virtualization?
  8. Data Virtualization: Unified Data Integration and Delivery
     • Data Abstraction: decoupling applications from data sources
     • Data Integration without replication or relocation of physical data
     • Easy Access to Any Data, high performance and real-time/right-time delivery
     • Unified metadata, security & governance across all data assets
     • Dynamic Data Catalog for self-service data services and easy discovery
     • Data Delivery in any format with intelligent query optimization
  9. Tackling the Data Problem
  10. Typical Data Science Workflow. A typical workflow for a data scientist is:
      1. Gather the requirements for the business problem
      2. Identify useful data
         ▪ Ingest data
      3. Cleanse data into a useful format
      4. Analyze data
      5. Prepare input for your algorithms
      6. Execute data science algorithms (ML, AI, etc.)
         ▪ Iterate steps 2 to 6 until valuable insights are produced
      7. Visualize and share
      Source: http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
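
As a rough illustration of steps 2 to 6, the pandas sketch below walks through ingest, cleanse, analyze and prepare; the file name, columns and cleansing rules are invented placeholders, not taken from the deck.

    # Rough pandas sketch of steps 2-6; file and column names are invented.
    import pandas as pd

    raw = pd.read_csv("claims_extract.csv")                        # 2. identify / ingest candidate data
    clean = raw.dropna(subset=["claim_amount"]).drop_duplicates()  # 3. cleanse into a useful format
    print(clean.groupby("region")["claim_amount"].describe())      # 4. analyze / explore
    features = pd.get_dummies(clean[["region", "claim_amount"]],
                              columns=["region"])                  # 5. prepare input for the algorithms
    # 6. hand `features` to the chosen ML/AI algorithm and iterate steps 2-6 as needed
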
  11. Where Does Your Time Go?
      • 80% of time – finding and preparing the data
      • 10% of time – analysis
      • 10% of time – visualizing data
      Source: http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
  12. Where Does Your Time Go? A large amount of time and effort goes into tasks not intrinsically related to data science:
      • Finding where the right data may be
      • Getting access to the data
         ▪ Bureaucracy
         ▪ Understanding access methods and technology (NoSQL, REST APIs, etc.)
      • Transforming data into a format that is easy to work with
      • Combining data originally available in different sources and formats
      • Profiling and cleansing data to eliminate incomplete or inconsistent data points
  13. Data Scientist Workflow: Identify useful data → Modify data into a useful format → Analyze data → Prepare for ML algorithm → Execute data science algorithms (ML, AI, etc.)
  14. Identify Useful Data. If the company has a virtual layer with good coverage of its data sources, this task is greatly simplified:
      ▪ A data virtualization tool like Denodo can offer unified access to all data available in the company.
      ▪ It abstracts the technologies underneath, offering a standard SQL interface to query and manipulate the data.
      To further simplify the challenge, Denodo offers a Data Catalog to search, find and explore your data assets.
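
As an illustration of that standard SQL interface, querying the virtual layer from Python could look like the snippet below; the DSN, credentials, view and column names are hypothetical assumptions, not taken from the deck.

    # Hypothetical example: querying a Denodo virtual view over ODBC.
    # DSN, credentials and view/column names are illustrative assumptions.
    import pyodbc

    conn = pyodbc.connect("DSN=denodo_vdp;UID=data_scientist;PWD=secret")
    cursor = conn.cursor()
    cursor.execute(
        "SELECT patient_id, age, lab_result "
        "FROM dv.patient_vitals "
        "WHERE lab_result IS NOT NULL"
    )
    for row in cursor.fetchmany(10):   # peek at a few rows while exploring
        print(row)
    conn.close()
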
  15. Data Scientist Workflow: Identify useful data → Modify data into a useful format → Analyze data → Prepare for ML algorithm → Execute data science algorithms (ML, AI, etc.)
  16. Ingestion and Data Manipulation Tasks. Data Virtualization offers the unique opportunity of using standard SQL (joins, aggregations, transformations, etc.) to access, manipulate and analyze any data. Cleansing and transformation steps can be easily accomplished in SQL. Its modeling capabilities enable the definition of views that embed this logic to foster reusability.
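
For instance, a cleansing join and aggregation could be expressed once in SQL against the virtual views and pulled straight into a DataFrame; the view and column names below are made up for illustration.

    # Illustrative only: combine two hypothetical virtual views with standard SQL
    # (join + aggregation) and load the result into pandas for analysis.
    import pandas as pd
    import pyodbc

    conn = pyodbc.connect("DSN=denodo_vdp;UID=data_scientist;PWD=secret")
    query = """
        SELECT c.customer_id,
               c.segment,
               COUNT(t.transaction_id) AS txn_count,
               AVG(t.amount)           AS avg_amount
        FROM dv.customers c
        JOIN dv.transactions t ON t.customer_id = c.customer_id
        WHERE t.amount IS NOT NULL
        GROUP BY c.customer_id, c.segment
    """
    df = pd.read_sql(query, conn)   # cleansed, combined data, ready for modeling
    conn.close()
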
  17. McCormick Uses Denodo to Provide Data to Its AI Project. Background:
      ▪ McCormick's AI and machine learning based project required data that was stored in internal systems spread across 4 different continents and in spreadsheets.
      ▪ Portions of the data in the internal systems and spreadsheets that were shared with McCormick's research partner firms needed to be masked, and at the same time unmasked when shared internally.
      ▪ McCormick wanted to create a data service that could simplify the process of data access and data sharing across the organisation and be used by the analytics teams for their machine learning projects.
  18. • Data Quality • Multiple Brands • Which Data to Use?
  19. McCormick – Multi-purpose Platform. Solution Highlights:
      ▪ Agile Data Delivery
      ▪ High Level of Reuse
      ▪ Single Discovery & Consumption Platform
  20. Data Virtualization Benefits for McCormick
      ▪ Machine learning and applications were able to access refreshed, validated and indexed data in real time, without replication, from the Denodo enterprise data service.
      ▪ The Denodo enterprise data service gave business users the capability to compare data in multiple systems.
      ▪ Spreadsheets are now the exception.
      ▪ The quality of proposed data and services is ensured.
  21. Data Virtualization Benefits for AI and Machine Learning Projects
      ✓ Denodo can play a key role in the data science ecosystem to reduce data exploration and analysis timeframes.
      ✓ Extends and integrates with the capabilities of notebooks, Python, R, etc. to improve the toolset of the data scientist.
      ✓ Provides a modern "SQL-on-Anything" engine.
      ✓ Can leverage big data technologies like Spark (as a data source, an ingestion tool and for external processing) to efficiently work with large data volumes.
      ✓ New and expanded tools for data scientists and citizen analysts: the "Apache Zeppelin for Denodo" notebook.
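
To make the first two points concrete, here is a minimal, self-contained scikit-learn sketch of the kind of model a data scientist might train on data served by the virtual layer; the tiny synthetic DataFrame stands in for a result set like the one queried above, and the feature and target names are assumptions.

    # Minimal sketch: training a churn classifier on data that would normally
    # come from the virtual layer; the synthetic frame keeps the example runnable.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    df = pd.DataFrame({
        "txn_count":  [3, 14, 1, 22, 7, 0, 18, 5],
        "avg_amount": [20.5, 310.0, 15.0, 450.2, 88.1, 0.0, 275.4, 60.3],
        "churned":    [1, 0, 1, 0, 0, 1, 0, 1],
    })

    X, y = df[["txn_count", "avg_amount"]], df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y
    )
    model = LogisticRegression().fit(X_train, y_train)
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
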
  22. https://bit.ly/2Qb5tYB
  23. https://bit.ly/3dVe1Ll
  24. Thanks! www.denodo.com info@denodo.com © Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.