Better Architecture for Data: Adaptable, Scalable, and Smart

  1. A Better Architecture for Data: Adaptable, Scalable, and Smart. Paul Boal & Adam Doyle. June 8, 2018, ST LOUIS
  2. Agenda: 1. Modern Data Architecture Myths; 2. Characteristics of Modern Data Architecture (a. Governed, Secure; b. Adaptable, Customer Centric, Collaborative; c. Flexible, Elastic, Simple, Resilient; d. Smart, Automated); 3. Reference Data Architecture; 4. How do I get there?; 5. Recap
  3. Myths
  4. MYTH #1: Just install Hadoop. A modern data architecture is not a single technology or single vendor solution. Modern data architectures combine a portfolio of technologies to create an ecosystem with certain characteristics.
  5. MYTH #2: Modern must mean NoSQL. NoSQL technologies provide an efficient way to manage and access data under certain circumstances, but traditional relational databases and SQL continue to provide the most powerful way to organize and query well-known data.
  6. MYTH #3: Big data is magical pixie dust. We talk a lot about the accelerating growth of data, the decreasing cost of storage and compute power, and the power of data science. It's convenient to believe that throwing all of this into a pot and simmering will produce results while we wait. The truth is that applying data, technology, and analytics still requires planning, analysis, and careful execution.
  7. MYTH #4: More data is always better. Not all data is created equal. Sometimes you might have unreliable or invalid data that will obfuscate results if used inappropriately. Using extraneous data can make analysis more complicated by adding time to filter the data set and select features. Sometimes more just means more work.
  8. MYTH #5: I have to replace everything I have right now. One of the characteristics of a modern data architecture is flexibility, meaning that your modernization should be developed incrementally, implementing new capabilities in a way that integrates with and slowly supplants existing limited technologies.
  9. Characteristics
  10. Governed, Secure
  11. Governed: The architecture and its components have to evolve and adapt in ways that are intentional and informed by enterprise strategy. Make collaboration the default. Communicate and then communicate some more. Treat every component as if another team may want to use it, too. Secure: Accessing information should be easy and should effortlessly ensure that users are knowingly using the right information for the right purpose. Security as an enabler of usage, not a denier of access. Track and log access for audit purposes and for learning.
  12. Governed, Secure. Example: ING. Apache Atlas Open Metadata and Governance: APIs, notification systems, integration of metadata, security, and governance related tools. Source: https://www.slideshare.net/Hadoop_Summit/open-metadata-and-governance-with-apache-atlas?qid=6ea30d4f-15af-46ad-b580-349f78bb7752&v=&b=&from_search=9
  13. Governed, Secure. Frameworks and Tools. Open Source Core: Apache Atlas (Open Metadata Management); Apache NiFi (Data Provenance); Apache Sentry/Ranger (Fine-grained Access Control). Vendor Participants.
  14. Adaptable, Customer Centric, Collaborative. "It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change." ~Charles Darwin
  15. Adaptable: The more you deliver, the more you will learn about what is really needed, so be prepared to change and build solutions that can change easily. Agile data modeling. Agile analytics. Customer Centric: Focus on delivering solutions that make sense to the people who will use them rather than following standards and rules above all else. The DBMS is not your user. Ralph Kimball and Edgar Codd are not your users. The Architecture Review Board is not your user. Collaborative: Solutions that are interactively designed and built by a team with diverse capabilities and backgrounds can produce a result better than what any one individual would have done. Collaboration is more than requirements gathering. Collaboration is something that has to happen every day. Communicate, communicate, communicate. And then communicate.
  16. Adaptable, Customer Centric, Collaborative. Agile Data. Source: http://agiledata.org/
  17. Adaptable, Customer Centric, Collaborative. Tools and Techniques: Model Storming; Rapid experimentation; Data science environments; Wherescape, Snowflake, ThoughtSpot.
  18. Simple, Elastic, Resilient, Flexible. "Notice that the stiffest tree is most easily cracked while the bamboo or willow survives by bending with the wind." -Bruce Lee
  19. Simple: Individual components should only be as complex as necessary. Reduce interdependencies. Use shared components. Elastic: The system can easily handle an increase in data volume, users, or complexity. Distributed computing. Cloud. DevOps. Resilient: Errors in data or processing don't cause large parts of the system to fail. Isolate components. Tolerate, isolate, and report bad data. Flexible: Change to the system is easy to accommodate and doesn't break other components. Microservices. Versioned interfaces. Backward compatibility.
  20. Simple, Elastic, Resilient, Flexible. Example: EarEcstasy. Data staging and Data Lake only contain needed data. Each data pipeline is only as complex as it needs to be to deliver on a narrow scope. Data is only integrated as needed, keeping processes simple. Source: https://www.slideshare.net/AmazonWebServices/aws-summit-singapore-get-to-know-your-customers-modern-data-architecture-93784711
  21. Simple, Elastic, Resilient, Flexible. Tools and Technologies: Cloud-based Infrastructure; Cloud-native Services; DevOps; Containers; Open Source.
  22. Automated, Smart. "I'm afraid I can't make that into a star schema, Dave." "We are going through the process where software will automate software, automation will automate automation." -Mark Cuban
  23. Automated: Automate tasks needed to optimize the function of the system, to detect significant changes, and to alert users when attention is needed. Metadata injection. Schema change detection. Anomaly detection. Alerting. Smart: Schema detection. Self-tuning databases. Jeopardy champion. Data shaping, data quality recommendations. Natural Language Processing. Machine Learning. Recommender systems. Deep Learning.
  24. Automated, Smart. EXAMPLE: 83% reduction in workload matching complex, low quality data with contextual analysis.
  25. Automated, Smart. TOOLS: Integrated Machine Learning; Integrated Search; Intelligent Data Classification; Natural Language Processing.
  26. Reference Architecture
  27. Modern Data Architecture. "Everything should be made as simple as possible, but not simpler." - A. Einstein
  28. Next steps
  29. How do I get there from here? Start with something you understand well from a business perspective. Select specific, valuable, measurable business cases. Add simple machine learning use cases. Identify use cases to move from a batch processing system to a streaming solution.
  30. Recap
  31. The Myths are Just Myths ● You don't "just need Hadoop" - You may not even need Hadoop at all! ● NoSQL has a place, but that isn't the entire solution either. ● There's no magical pixie dust here. This transformation will take real work. ● More data is not necessarily better - no matter how much we data hoarders want it to be. ● By definition, you have to incrementally create your modern data architecture, because it also has to continue to evolve.
  32. Governed, Secure: Maintain data and the data architecture in a way that makes governance and security a natural and easy part of doing work.
  33. Adaptable, Customer Centric, Collaborative: Apply data toward real challenges and opportunities that focus on customers and be willing and able to pivot as needed.
  34. Simple, Elastic, Resilient, Flexible: Build your data architecture, your teams, and your processes in a way that creates a high capacity for change.
  35. Automated, Smart: Create systems that can do more of the work of ingestion, storage, and integration without your intervention.
  36. Thank You!
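
Slide 19's advice to "tolerate, isolate, and report bad data" and slide 23's "schema change detection" can be illustrated with a minimal Python sketch. Nothing below comes from the deck itself: the `EXPECTED_SCHEMA` columns, the `BatchResult` container, and the `find_problem` and `process_batch` helpers are hypothetical names chosen for illustration, and a real pipeline would more likely rely on the validation features of whatever ingestion tooling is already in place.

```python
from dataclasses import dataclass, field

# Hypothetical expected schema (an assumption): column name -> Python type.
EXPECTED_SCHEMA = {"customer_id": int, "net_sales": float}


@dataclass
class BatchResult:
    good: list = field(default_factory=list)         # records safe to load
    quarantined: list = field(default_factory=list)  # (record, reason) pairs
    new_columns: set = field(default_factory=set)    # columns not in the schema


def find_problem(record, schema):
    """Return a description of the first problem found, or None if the record is valid."""
    for column, expected_type in schema.items():
        if column not in record:
            return f"missing column: {column}"
        if not isinstance(record[column], expected_type):
            return f"bad type for {column}: {type(record[column]).__name__}"
    return None


def process_batch(records, schema=EXPECTED_SCHEMA):
    """Split a batch into good and quarantined records without failing the whole batch."""
    result = BatchResult()
    for record in records:
        # Schema change detection: remember columns the schema does not know about.
        result.new_columns |= set(record) - set(schema)
        problem = find_problem(record, schema)
        if problem is None:
            result.good.append(record)
        else:
            # Isolate the bad record and keep the reason so it can be reported.
            result.quarantined.append((record, problem))
    return result


batch = [
    {"customer_id": 1, "net_sales": 19.99},
    {"customer_id": "two", "net_sales": 5.0},                 # wrong type
    {"customer_id": 3, "net_sales": 7.5, "channel": "web"},   # unexpected column
]
result = process_batch(batch)
print(len(result.good), "loaded;", len(result.quarantined), "quarantined")
print("new columns seen:", sorted(result.new_columns))
```

In this sketch one bad record never stops the batch: it is set aside with a reason that can be reported, while an unexpected column is only flagged, so the load continues and someone can decide later whether the schema should evolve.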

Editor's notes

  1. Intro and Myths - Paul; Characteristics A, B - Paul; Characteristics C, D - Adam; Reference Architecture - Adam; How do I Get There - Adam or Paul or Back-and-Forth; Recap - Paul
  2. These characteristics describe the processes by which your data is maintained. Maybe here we want to tell stories about companies that didn’t secure their data (Target, Equifax, Schnucks)
  3. These characteristics describe the processes by which your data is maintained. Maybe here we want to tell stories about companies that didn’t secure their data (Target, Equifax, Schnucks)
  4. These characteristics describe the processes by which your data is maintained.
  5. These characteristics describe the processes by which your data is maintained.
  6. These characteristics describe the way in which you use your data. Built for purpose
  7. These characteristics describe the way in which you use your data. Built for purpose
  8. These characteristics describe the way in which you use your data.
  9. These characteristics describe the way in which you use your data.
  10. These characteristics describe the architecture and its capacity to change.
  11. These characteristics describe the architecture and its capacity to change.
  12. These characteristics describe the architecture and its capacity to change.
  13. These characteristics describe the way in which your data is integrated. Informatica ClAIre
  14. These characteristics describe the way in which your data is integrated. Informatica ClAIre
  15. These characteristics describe the way in which your data is integrated.
  16. These characteristics describe the way in which your data is integrated.
  17. These characteristics describe the architecture and its capacity to change.
  18. Processing data - Mastering, Integration, De-identification, Data Warehouse/Data Mart for reporting with rigor Provisioning - Pie in the Sky - I’d like some “Net Sales”