Designing the Next Generation Data Lake

  1. 1. 1 Copyright  2018 All rights reserved. George Trujillo Designing the Next Generation Data Lake George Trujillo Jr. www.linkedin.com/in/georgetrujillo @georgetrujillo
  2. 2. 2 Copyright  2018 All rights reserved. George Trujillo, Jr. Director of Global Enablement NE Tier One Data Specialist, COE Master Principal Big Data Specialist Vice President of Big Data Managing Director of Big Data Chief Executive Officer
  3. 3. 3 Copyright  2018 All rights reserved. George Trujillo, Jr.  20+ years Oracle: RAC, Data Warehousing, Data Guard, Oracle Middle-Tier, …  Recognized Oracle Double ACE  Independent Oracle Users Group (IOUG) Board of Directors  Served on Oracle Fusion Council & Oracle Beta Leadership Council  Recognized as one of the “Oracles of Oracle” by IOUG  Sun Microsystem's Ambassador for Appl. Middleware Platform  Recognized VMware vExpert  VMware Certified Instructor (VCI)  MySQL Certified DBA
  4. 4. 4 Copyright  2018 All rights reserved. Agenda  Vision and Direction  Analytic Platforms Have to Change  What is Causing Change  How are Hadoop, Big Data and Data Lakes Changing  Impacts of Cloud Technologies  Self Driving Data Platforms  Evolving Big Data Architectures  Impact to You
  5. 5. 5 Copyright  2018 All rights reserved. Imagining the Speed of Trains What can be more palpably absurd than the prospect held out of locomotives traveling twice as fast as stagecoaches? The Quarterly Review, March, 1825.
  6. 6. 6 Copyright  2018 All rights reserved. The Speed of Trains Today, Tomorrow? 270 mph? 4000 mph?
  7. 7. 7 Copyright  2018 All rights reserved. The Future of Movies "Who in Hades wants to hear actors talk?" --H.M. Warner, Warner Brothers, 1927
  8. 8. 8 Copyright  2018 All rights reserved. What do we need, or want? Would a silent movie “customer” panel in 1927 have come up with green screens, computer animation and 3-D?
  9. 9. 9 Copyright  2018 All rights reserved. Are you pointed at the Right Target? Can you innovate with linear thought? How can you improve your organizations ability to deliver insight faster avoiding linear thought?
  10. 10. 10 Copyright  2018 All rights reserved. What do we need, or want? How do you help keep your company from being at a competitive disadvantage?
  11. 11. 11 Copyright  2018 All rights reserved. What Do All These Have in Common? “Space Travel is Impossible”, Lee De Forest, inventor of the vacuum tube, 1957 Telephones and the Internet are just toys  1890: “Telephones were considered for the fancy of the rich, it’s ridiculous to consider the cost required to lay telephone wires across a city let alone the country or the world.”  1980s: “The Internet is ridiculous because: it’s ridiculous to consider the cost required to lay cables across a city let alone the country or the world.” "Remote shopping, while entirely feasible, will flop.” — Time Magazine, 1966 “The more important fundamental laws and facts of physical science have all been discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote.” – Albert A. Michelson, physicist, 1894. “We’ll never put our data in the cloud”, 2016 “An invention has to make sense in the world it finishes in, not in the world it started.“
  12. 12. 12 Copyright  2018 All rights reserved. So Where Are Analytical Platforms Headed?  Analytical platforms are not keeping up with business demands today  Most data lakes have been built one use case at a time Culture eats strategy for breakfast Data Marshall YardData Refinery Data Lake Enterprise Data Hub Data Reservoir Data Warehouses
  13. 13. 13 Copyright  2018 All rights reserved. Are We Ready For the Future, Predictions by 2025  80% production apps will be in the cloud  Two SaaS Suite providers will have 80% market share  Number of corporate-owned data centers will decrease by 80%.  80% of IT budgets will be spent on cloud services.  80% of IT budgets will be spent on business innovation, and only 20% on system maintenance.  All enterprise data will be stored in the cloud  100% of application development and testing will be done in the cloud  Enterprise clouds will be the most secure place for IT processing
  14. 14. 14 Copyright  2018 All rights reserved. How to Compete, When Everything is Getting Faster
  15. 15. 15 Copyright  2018 All rights reserved. Challenges Today
  16. 16. 16 Copyright  2018 All rights reserved. Starting New Projects Compute (CPUs) Data Warehouse Networking Proof of Concept Storage Data Mart
  17. 17. 17 Copyright  2018 All rights reserved. Resources for Projects
  18. 18. 18 Copyright  2018 All rights reserved. Resources for Projects
  19. 19. 19 Copyright  2018 All rights reserved. How Do We Improve Our Analytical Platforms?
  20. 20. 20 Copyright  2018 All rights reserved. Cloud Technologies are Changing Data Lake Strategies  Cloud technologies are adding significant new capabilities and flexibility to data lakes  A characteristic of a data lake is a storage repository  Object storage has significant strategies over HDFS  Replication to data centers  Detach compute from storage  Lower cost storage  Dynamic scaling reduces the need for YARN
  21. 21. 21 Copyright  2018 All rights reserved. Data Architecture DLM (Batch, Microbatch) Web HDFS Storm (Streaming) Kafka (Messaging) Source Data CRM Social Connection Ratings/Revi ews Jive Article Comments Ask/Answer Social Data LinkedIn Facebook Twitter ED W File JMS REST Streamin g Data Ingestion Transactional (PI, WI, FI) FBSI, FPRS, FILI Tools (Talend, Trifecta, …) PIG HIVE Raw Layer Serving Layer Access Layer Data Lake - Ingest, Storage, Compute, Analytics Grid HCatalog (Schema metadata repository) Scheduling (Control-M ?, Oozie, Talend, etc.) Speed LayerSqoop Flume
  22. 22. 22 Copyright  2018 All rights reserved. Data Architecture Raw Layer (Oracle Object Store, S3, HDFS, …) Serving Layer (Oracle Object Store, S3, HDFS) Access Layer Data Lake - Ingest, Storage, Compute, Analytics Grid Speed Layer (Spark, NoSQL, Alluxio, LLAP, …)
  23. 23. 23 Copyright  2018 All rights reserved. Compute (Yarn) Storage (HDFS) Service Discovery (Zookeeper) Libraries, Notebooks inside Cluster Tightly coupled storage and compute HDFS as the Data Lake Artifacts stored inside cluster 2 3 Big Data 1.0 – Monolithic Architecture
  24. 24. 24 Copyright  2018 All rights reserved. Compute (Yarn) Storage (Cloud Storage) Service Discovery (Zookeeper) Libraries, Notebooks etc Outside Cluster Independent Elastic Compute and Storage Cloud Storage as the Data Lake Artifacts stored outside cluster Big Data 2.0 – A Micro Services Based Architecture
  25. 25. 25 Copyright  2018 All rights reserved. Directionally Correct Yesterday Today Tomorrow Sun OS HP-UX AIX Windows … Hortonworks Cloudera MapR Oracle Distribution of Hadoop … Oracle Cloud Amazon Microsoft …
  26. 26. 26 Copyright  2018 All rights reserved. “Status Quo is Latin for “the mess we’re in” – Ronald Reagan  "It’s easier to let disillusionment with data inspire inertia than work to tame the data beast”
  27. 27. 27 Copyright  2018 All rights reserved. Critical Factors for Success For Enterprise Data Platforms Data Architecture Data Governance Data Security
  28. 28. 28 Copyright  2018 All rights reserved. More Management Tasks Than People to Do the Work Less time on Administration Less time on Infrastructure Less time on Patching, Upgrades Less time on Ensuring Availability Less time on Tuning Less time on Troubleshooting More time on Innovation More time on Design More time on New Applications More time on Analytics More time on Securing data More time on Delivering
  29. 29. 29 Copyright  2018 All rights reserved. Empowering Users Streaming Engine Data Lake Enterprise Data & Reporting Discovery Lab Input Events Execution Innovation Discovery Output Data Structured Enterprise Data Notebooks/Analytic Services Object Store Hadoop/HDFS Actionable Events Actionable Metrics Actionable Data Sets
  30. 30. 30 Copyright  2018 All rights reserved. The Power of SQL – Unified Query with Big Data SQL Hive DN DN DN DN ORACLE SQL Engine Storage Table Table Big Data-enabled Oracle Tables Python GraphRnode.js JavaREST SQL Data Local Processing Big Data SQL Cells Leverage Metadata Oracle Big Data SQL Oracle Data Visualization
  31. 31. 31 Copyright  2018 All rights reserved. The First Self-Driving Database – OOW October 2017  The Autonomous Data Warehouse Cloud  Easy  Automated management  Automated tuning: Simply load data and run  Fast  Based on Oracle’s unique data warehouse technology  Elastic  Instant scaling of compute or storage with no downtime
  32. 32. 32 Copyright  2018 All rights reserved. Determine your Target Big Data Strategy Hadoop Data Lakes Analytics Strategy Requirements, Capabilities Centralized Data Architecture Don’t Focus on Technology Focus on Delivering Results
  33. 33. 33 Copyright  2018 All rights reserved. Summary How Will:  Impact of Cloud Technologies  Object Storage  Micro Services Architecture  Self Driving Data Platforms  Speed to Insight Impact Future:  Projects  Career goals  Skill Development
  34. 34. 34 Copyright  2018 All rights reserved. Questions Thank you Questions?