The Data Lake and Getting Businesses the Big Data Insights They Need

  1. Jose Hernandez · Director of Analytics · Dunn Solutions · 2017
  2. Today’s Agenda Introduction to Dunn Solutions Group What is a Data Lake? You Need a Data Lake Q&A
  3. Dunn Solutions Delivers Velocity to Businesses Dunn Solutions is a digital commerce and business transformation consultancy focused on delivering velocity to our clients. Velocity is achieved by the combination of both speed and direction. Dunn Solutions helps our clients achieve speed by automating business processes and direction using advanced analytics. Our teams align with organizations to optimize their unique processes and help them discover the most profitable routes to business success.
  4. Dunn Solutions is a full-service IT consulting firm founded in 1988. Offices: Raleigh, NC (Delivery, Training) • Bangalore, India (Delivery) • Minneapolis (Delivery, Training) • Chicago (Delivery)
  5. Practice Areas. Application Development: Portals • e-Commerce & Content Managed Websites • Mobile App Development • Custom App Development • Search Engine Optimization. Training: Certified SAP, Liferay, Microsoft • Classroom, On-site, Computer Based & Virtual • Mentoring & Custom Training. Frameworks: Accountable Care Orgs (ACOs) • Corporate Legal • Higher Education • Optical Shop Solutions. Analytics: Data Lakes • IoT • Predictive Analytics • Machine Learning • e-Commerce • Analytics • Cloud - BI Platforms • DW & Data Integration
  6. Selected Clients
  7. Partnerships
  8. Analytics Practice: Business Intelligence, Business Analytics, Big Data, Data Repositories, Data Integration • KPIs and Metrics • Dashboards • Exploration and Visualization • Ad Hoc Analysis & Reporting • Data Mining • Predictive Analytics • Prescriptive Analytics • R, AzureML • Hadoop, Hive, Sqoop, Spark • NoSQL • MapReduce • Data Lakes • Columnar • In-memory • EIM (Data Integration & Data Quality) • Dimensional Modeling
  9. Analytics Services in the Cloud. Analytics Services: Develop Forecasting Models • Productionizing Predictive Models • Retail Analytics • Machine Learning. Big Data Services: Data Lakes • Big Data • Integration with Data Warehouses. Migration Services: Migrate your Data Warehouse to the Cloud with Azure and AWS • Migrate SAP BusinessObjects deployments. Data Warehousing Services: Full Lifecycle Data Warehouse Development • Extend Data Warehouse to the Cloud • Massive Data Warehouses in the Cloud • Snowflake
  10. Azure HDInsight Azure Event Hubs Azure SQL Data Warehouse Azure Stream Analytics Azure Machine Learning Azure Data Lake Azure Training Partner Microsoft Azure Consulting Services
  11. Amazon EMR Amazon Machine Learning Amazon Kinesis Firehose AWS Lambda Amazon DynamoDB Amazon Redshift AWS IoT Amazon Web Services Consulting
  12. Dunn Solutions Global Delivery Model (People, Process, Technology) • U.S.-based management of teams and client communications • All resources interviewed and approved by DSG leadership • Right Model/Right Project: U.S. only, U.S.-India, India only (EMEA clients) • Mature and proven • Phased approach • Project sensitive • Software Engineering methodology • Certified Quality Processes • Current technology awareness • Risk awareness
  13. Today’s Agenda Introduction to Dunn Solutions Group What is a Data Lake? You Need a Data Lake Q&A
  14. Jose Hernandez, Director of Analytics
  15. Warning! Today’s data consumer is very demanding, and rightly so! 80% of consumers need KPIs and operational data – The data warehouse is ideal for them. 10-15% of consumers do more analysis; they use the data warehouse as a source, but dive back into source systems to get more data. The rest of the consumers do very deep data analysis – this includes data scientists. They are voracious data consumers and data creators! (IT can’t keep up with them)
  16. Savvy Data Consumers' Needs. Access to information: • What information? Any, all, even data not thought of. • When? Anytime; now would be great. • How much? All of it, as much as there is. Analyze the data: • What tools? Whatever tool is needed (lots of great tools are available, both commercial and open source). • What kind of data? All kinds.
  17. Traditional Data Storage and Management Challenges • Business users rely heavily on IT • IT controls access to the data • Accessing data across sources is very challenging • Schema on write* What about the enterprise data warehouse? • Does not provide just-in-time data • Requires lots of lead time • Limited to the “required” data The demand for data has never been greater! *Sorry, I could not avoid this terminology; more in a bit….
  18. The Data Lake Provides Relief
  19. A Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. It also provides compute power to work the data. What is a Data Lake? The Data Lake is about democratization of information It provides your organization a cost effective way to store information for later processing It lets your information consumers and researchers focus on finding the next big thing, not wasting time finding the data For the techies in the crowd….
  20. Origin of the Data Lake. James Dixon of Pentaho is credited with coining the phrase “Data Lake”. Dixon’s analogy: think of the data mart as bottled water, cleansed, packaged and delivered for your consumption. The data lake is a man-made reservoir of water in its natural state, with no processing.
  21. Purpose of the Data Lake • Feed the data-starved users • Make it easy to consume and combine • Deliver the data just-in-time • Store all kinds of data (whether you have a specific need today or not), and lots of it • Worry about how it’s going to be used later (schema on read) • Provide boundless playgrounds to store data and to process data
  22. Warning! The Data Lake does not replace the Enterprise Data Warehouse!
  23. Comparing the Data Lake to the Data Warehouse Data Warehouse • Data focuses on Business Processes • Highly processed & massaged • Tabular & structured • Lots of effort on design & build • Optimized for data retrieval • Highly governed Data Lake • Stores everything • Unprocessed / RAW • Unstructured, semi-structured, structured • Democratization of data • Shared data stewardship • Provides compute power
  24. It’s Not Just About Data Storage Storing and accessing data is only part of the Data Lake’s purpose. The Data Lake must also provide the ability to: • Massively process data (usually in place) • Process and combine structured, semi-structured and unstructured data • Grow and shrink in both storage and compute power as needed • Onboard data very fast • Perform advanced analytics (massively process data)
  25. Supporting Top-Down and Bottom-Up Data Warehouses use the top-down approach: from generalized principles (known to be true) to a specific conclusion (descriptive). Data Lakes use the bottom-up approach: from specific instances to a generalized conclusion (predictive).
  26. What Does a Data Lake Look Like?
  27. Filling the Data Lake Types of data • Structured Data • Semi-structured Data • Unstructured Data No schema is applied at load time Data loads very fast The Data Lake is infinitely deep and can hold all data
  28. Supports many uses • Data Exploration • Staging for the Data Warehouse • Data enrichment • Predictive analytics • Mixing disparate data • Apply schema on demand (on read) • Processing massive amounts of data • Sandboxes for experimentation Consuming from the Data Lake
  29. Warning! Don’t let your Data Lake turn into a data swamp! It’s not the Wild, Wild West. Governance is still needed. Data consumers must also be citizen data stewards. Include metadata (data about your data). Don’t contaminate the Data Lake with bad data (get it from trusted sources). Data Lakes hold all data; however, set and enforce boundaries. Have a vision for your data lake; know what it will be used for.
  30. Security and Governance Access and Security • It’s a data playground, but even playgrounds have rules • Not all the data should be available to all users (confidential information must be protected) • Is the data sensitive in nature? Are there laws governing the data that require encryption? Data Quality • Don’t put poor-quality data in your data lake • Trust the source
  31. Today’s Agenda Introduction to Dunn Solutions Group What is a Data Lake? You Need a Data Lake Q&A
  32. Voracious Data Consumers Must Be Served! Getting back to the 10% of users that need all the data: the Data Scientists. Your organization’s success and survival depend on • Innovation • Efficiency • Finding the next big thing • Getting (and keeping) an edge The data scientists and data analysts give you the ability to do this. The data lake supports: • Predictive Analytics • Prescriptive Analytics • Machine Learning • Experimentation (A/B Testing) • Qualitative data analysis to help steer strategic decisions
  33. How Does a Data Lake Complement the EDW? Your enterprise data warehouse is home to historical data and metrics that feed your KPIs and PIs based on your business processes. It does this by extracting, transforming and loading the data required to support your “known” KPIs and metrics. What if you determined that some data element was needed to provide a KPI you should have been tracking? You would add that to your data warehouse and start populating it from that point forward. Too bad; I wish I had thought of this sooner. There are some historical trends that I would have been able to identify.
  34. Give Super Powers to your Data Warehouse! Imagine you could go back in time! In the previous scenario you did not have the historical data because: a. It was not being captured because it wasn’t considered b. The EDW staging area is transient and typically only goes back for a short period of time The Data Lake would have given your data warehouse the ability to go back in time! The data lake can serve as a great staging area for your EDW. It can store transactional data from the beginning of time: a. Letting you go back in time and reconstruct your EDW to incorporate the information you did not consider b. Allowing you to rebuild your EDW from day one in the event of a catastrophic failure
  35. Warning! Deploying a data lake is very expensive and challenging. So don’t!
  36. Do It in the Cloud! Reliable & Trusted Pay for what you use Supporting Tools Easily Scales
  37. Delight Your Data Consumers! You’re wondering whether the Data Lake can help you with your data-starved consumers. The simple answer is yes. You don’t have to start huge (that’s the beauty of cloud-based data lakes). We can get you started immediately. Your data consumers will be very happy. Contact us: info@dunnsolutions.com
  38. Questions & Answers Jose Hernandez · Director of Analytics · Dunn Solutions Watch for more webinars featuring how Data Scientists “do their thing” with Data Lakes in the cloud!
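
The schema-on-read idea from slides 21, 27 and 28, together with the "go back in time" scenario from slides 33 and 34, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the presenter's implementation: the sample records, the field names, and the helpers `load_raw` and `read_with_schema` are all hypothetical, and a real data lake would use distributed storage and compute rather than an in-memory list.

```python
import json

# Hypothetical raw events, landed as-is. Note the records do not share
# one fixed shape: some lack "channel", one carries an extra "coupon".
RAW_EVENTS = """\
{"order_id": 1, "amount": "19.99", "channel": "web", "ts": "2017-01-05"}
{"order_id": 2, "amount": "5.00", "ts": "2017-01-06", "coupon": "NEW10"}
{"order_id": 3, "amount": "42.50", "channel": "store", "ts": "2017-02-01"}
"""


def load_raw(text):
    """Load step: keep every record in its native form, no schema applied."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_with_schema(records, schema):
    """Read step: project raw records onto the fields a consumer asks for.

    `schema` maps a field name to a (converter, default) pair. Fields that
    a record does not contain fall back to the default value.
    """
    rows = []
    for rec in records:
        row = {}
        for field, (convert, default) in schema.items():
            row[field] = convert(rec[field]) if field in rec else default
        rows.append(row)
    return rows


lake = load_raw(RAW_EVENTS)

# The "known" KPI defined on day one: total revenue.
revenue_schema = {"amount": (float, 0.0)}
total = sum(row["amount"] for row in read_with_schema(lake, revenue_schema))

# A KPI nobody thought of at load time: revenue by channel. Because the raw
# records were kept, the new schema can be applied to the full history.
channel_schema = {"amount": (float, 0.0), "channel": (str, "unknown")}
by_channel = {}
for row in read_with_schema(lake, channel_schema):
    by_channel[row["channel"]] = by_channel.get(row["channel"], 0.0) + row["amount"]

print(round(total, 2))
print({channel: round(value, 2) for channel, value in by_channel.items()})
```

The design point is that the load step never rejects or reshapes a record, so a question raised later (here, revenue by channel) can still be answered for data that arrived before anyone asked it. A schema-on-write pipeline would have dropped the `channel` and `coupon` fields unless they had been modeled up front.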