
Seminario Big Data


Analytics, machine and deep learning, data/event streaming
Big data streaming: enabling the time machine
Real-time event streaming and new conceptual paradigms:
- Distributed transactions
- Eventual consistency
- Materialized projections
Real-time event streaming and new architectural paradigms:
- Enterprise service bus
- Event store
- Projection database
An introduction to Domain Driven Design: a strategic view of modeling one's own business domain in the era of Big Data.


Seminario Big Data

  1. 1. Seminario Big Data. Ing. Roberto Messora. Lecco, 23 November 2017
  2. 2. Agenda
     - Big Data, Analytics, AI, Machine Learning, Deep Learning
     - Executive Overview
     - Big Data & Analytics Reference Architectures: Conceptual View
     - Big Data & Analytics Reference Architectures: Logical View
     - Big Data & Analytics Reference Architectures: Technological View
     - Analytics Overview and Case Studies
     - Event Store
     - Domain Model
     - Cloudera
  4. 4. Big Data, Analytics, AI, Machine Learning, Deep Learning
  5. 5. Big Data, Analytics, AI, Machine Learning, Deep Learning
  6. 6. Big Data, Analytics, AI, Machine Learning, Deep Learning
  8. 8. Executive Overview
     Data is often considered to be the crown jewels of an organization.
     1) Most companies already use analytics in the form of reports and dashboards to help run their business. This is largely based on well-structured data from operational systems that conform to pre-determined relationships (“a single version of the truth”).
     2) Big Data, however, doesn’t follow this structured model. The streams are all different and it is difficult to establish common relationships. But with its diversity and abundance come opportunities to learn and to develop new ideas, ideas that can help change the business (“a single version of the facts”).
     The architectural challenge is to bring the two paradigms together. So, rather than approach Big Data as a new technology silo, an organization should strive to create a unified information architecture, one that enables it to leverage all types of data, as situations demand, to promptly satisfy business needs.
     The objective of this workshop is to describe a reference architecture (and its implementation) that promotes a unified vision for information management and analytics.
  9. 9. Executive Overview
  11. 11. Big Data & Analytics Reference Architectures: Conceptual View
      The architecture is organized into views that highlight three focus areas:
      1. unified information management
      2. real-time analytics
      3. intelligent processes
      They represent architecturally significant capabilities that are important to most organizations today.
  12. 12. Unified Information Management
      Unified Information Management addresses the need to manage information holistically as opposed to maintaining independently governed silos. At a high level this includes:
      - High Volume Data Acquisition: the system must be able to acquire data despite high volumes, velocity, and variety. It may not be necessary to persist all data that is received.
      - Multi-Structured Data Organization and Discovery: the ability to navigate and search across different forms of data can be enhanced by the capability to organize data of different structures into a common schema.
      - Low Latency Data Processing: data processing can occur at many stages of the architecture. In order to support the processing requirements of Big Data, the system must be fast and efficient.
      - Single Version of the Truth: when two people perform the same form of analysis they should get the same result. As obvious as this seems, it isn’t necessarily a small feat, especially if the two people belong to different departments or divisions of a company. A single version of the truth requires architecture consistency and governance.
  13. 13. Real-Time Analytics
      Real-Time Analytics enables the business to leverage information and analysis as events are unfolding. At a high level this includes:
      - Speed of Thought Analysis: analysis is often a journey of discovery, where the results of one query determine the content of the next. The system must support this journey in an expeditious manner. System performance must keep pace with the users’ thought process.
      - Interactive Dashboards: interactive dashboards allow the user to immediately react to information being displayed, providing the ability to drill down and perform root cause analysis of situations at hand.
      - Advanced Analytics: advanced forms of analytics, including data mining, machine learning, and statistical analysis, enable businesses to better understand past activities and spot trends that can carry forward into the future. Applied in real time, advanced analytics can enhance customer interactions and buying decisions, detect fraud and waste, and enable the business to make adjustments according to current conditions.
      - Event Processing: real-time processing of events enables immediate responses to existing problems and opportunities. It filters through large quantities of streaming data, triggering predefined responses to known data patterns.
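
The Event Processing item above (filtering a stream and triggering a predefined response on a known pattern) can be sketched in a few lines. This is only an illustration: the event shape `(timestamp, key)`, the function name `detect_bursts`, and the "three events within 60 seconds" pattern are assumptions, not something taken from the slides.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, key such as a card id); hypothetical shape


def detect_bursts(events: Iterable[Event], threshold: int = 3,
                  window_seconds: float = 60.0) -> List[Event]:
    """Emit (timestamp, key) each time a key reaches `threshold` events inside the sliding window."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Event] = []
    for ts, key in events:  # events are assumed to arrive in timestamp order
        window = recent[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((ts, key))  # the predefined response would be triggered here
            window.clear()            # reset so that one burst yields one alert
    return alerts


stream = [(0.0, "card-1"), (10.0, "card-2"), (20.0, "card-1"),
          (30.0, "card-1"), (200.0, "card-2"), (210.0, "card-2")]
alerts = detect_bursts(stream)
```

In a real deployment the loop body would live inside a stream processor; the point here is only that the pattern is evaluated per event, as it arrives, rather than in a later batch query.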
  14. 14. Intelligent Processes
      A key objective for any Big Data and Analytics program is to execute business processes more effectively and efficiently. This means channeling the intelligence one gains from analysis directly into the processes that the business is performing. At a high level this includes:
      - Application-Embedded Analysis: many workers today can be classified as knowledge workers; they routinely make decisions that affect business performance. Embedding analysis into the applications they use helps them to make more informed decisions.
      - Optimized Rules and Recommendations: with optimized rules and recommendations, insight from analysis is used to influence the decision logic as the process is being executed.
      - Guided User Navigation: whenever possible the system should leverage the information available in order to guide the user along the most appropriate path of investigation.
      - Performance and Strategy Management: analytics can also provide insight to guide and support the performance and strategy management processes of a business. It can help to ensure that strategy is based on sound analysis. Likewise, it can track business performance versus objectives in order to provide insight on strategy achievement.
  16. 16. Big Data & Analytics Reference Architectures: Logical View
      The high-level logical view defines a multi-tier architecture template that can be used to describe many types of technology solutions.
  17. 17. Big Data & Analytics Reference Architectures: Logical View
      This layer includes the hardware and platforms on which the Big Data and Analytics components run. As shared infrastructure, it can be used to support multiple concurrent implementations, in support of, or analogous to, Cloud Computing. This layer includes infrastructure to support traditional databases, specialized Big Data management systems, and infrastructure that has been optimized for analytics.
  18. 18. Big Data & Analytics Reference Architectures: Logical View
      At the bottom are data stores that have been commissioned for specific purposes (e.g. individual operational data stores, CMS, etc.). These data stores represent sources of data that are ingested (upward) into the Logical Data Warehouse (LDW). The LDW represents a collection of data that has been provisioned for historical and analytical purposes. Above the LDW are components that provide processing and event detection for all forms of data. At the top of the layer are components that virtualize all forms of data for universal consumption.
  19. 19. Big Data & Analytics Reference Architectures: Logical View
      The Services Layer includes components that provide or perform commonly used services. Presentation Services and Information Services are types of Services in a Service Oriented Architecture (SOA). They can be defined, cataloged, used, and shared across solutions. Business Activity Monitoring, Business Rules, and Event Handling provide common services for the processing layer(s) above.
  20. 20. Big Data & Analytics Reference Architectures: Logical View
      The Process Layer represents components that perform higher-level processing activities. For the purpose of Big Data and Analytics, this layer calls out several types of applications that support analytical, intelligence gathering, and performance management processes.
      The Interaction Layer comprises the components used to support interaction with end users. Common artifacts for this layer include dashboards, reports, charts, graphs, and spreadsheets. In addition, this layer includes the tools used by analysts to perform analysis and discovery activities.
  21. 21. Big Data & Analytics Reference Architectures: Logical View
      The results of analysis can be delivered via many different channels. The architecture calls out common IP network based channels such as desktops and laptops, common mobile network channels such as mobile phones and tablets, and other channels such as email, SMS, and hardcopy.
      The architecture is supported by a number of components that affect all layers of the architecture. These include information and analysis modeling, monitoring, management, security, and governance.
  23. 23. Big Data & Analytics Reference Architectures: Technological View
  24. 24. Apache Kafka
      Apache Kafka™ is a distributed streaming platform.
      - It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
      - It lets you store streams of records in a fault-tolerant way.
      - It lets you process streams of records as they occur.
      Website: https://kafka.apache.org/
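
To make the publish/subscribe/store idea concrete, here is a minimal in-memory sketch of an append-only log with per-group read offsets. It deliberately does not use any Kafka client library: `TinyLog`, `publish`, `poll` and `seek_to_beginning` are hypothetical names chosen for illustration, and nothing here is fault-tolerant or distributed.

```python
from collections import defaultdict
from typing import Dict, List


class TinyLog:
    """In-memory stand-in for a single-partition topic (not the Kafka API)."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, record: str) -> int:
        """Append a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return unread records for `group` and advance that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek_to_beginning(self, group: str) -> None:
        """Rewind a group so that it can replay the retained stream."""
        self._offsets[group] = 0


log = TinyLog()
for record in ["order-created", "order-paid", "order-shipped"]:
    log.publish(record)

billing = log.poll("billing")                   # each group reads the stream independently
shipping = log.poll("shipping", max_records=2)  # a slower group simply lags behind
log.seek_to_beginning("billing")
replayed = log.poll("billing")                  # stored records can be read again
```

The difference from a classic message queue that this sketch tries to show is that reading does not remove records: consumers only move their own offset, so the same stream can be replayed later.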
  25. 25. Apache Spark
      Apache Spark™ is a fast and general engine for large-scale data processing.
      - Speed: up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
      - Ease of use: API in Java, Scala, Python and R
      - Generality: powerful stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX and Spark Streaming
      - Runs everywhere: Spark runs on Hadoop, Mesos, standalone, or in the cloud
      Website: http://spark.apache.org/
  26. 26. Reference Architectures - Hadoop Classic Batch Architecture
      Characteristics:
      - Batch oriented
      - Massive storage
      - Multiuser jobs
      - Data warehouse replacement
  27. 27. Reference Architectures - Lambda Architecture
      - Batch Layer: manages the master data set, an immutable, append-only set of raw data
      - Speed Layer: ingests streaming data or micro-batches and provides an «active partition» with a limited window of mutability
      - Serving Layer: stores the output of the batch and speed layers (BASE compliant)
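
The three layers can be illustrated with a toy page-view counter. This is a sketch under stated assumptions: the record shape `(page, timestamp)`, the function names and the `cutoff` variable are invented for the example, and both "layers" run in one process here, whereas in practice they are separate systems.

```python
from collections import Counter
from typing import Iterable, Tuple

PageView = Tuple[str, int]  # (page, event timestamp); hypothetical record shape


def batch_view(master: Iterable[PageView], cutoff: int) -> Counter:
    """Batch layer: recompute counts from the immutable master data set up to `cutoff`."""
    return Counter(page for page, ts in master if ts <= cutoff)


def speed_view(recent: Iterable[PageView], cutoff: int) -> Counter:
    """Speed layer: count only the events that arrived after the last batch run."""
    return Counter(page for page, ts in recent if ts > cutoff)


def serve(batch: Counter, speed: Counter, page: str) -> int:
    """Serving layer: answer a query by merging the batch and speed views."""
    return batch[page] + speed[page]


events = [("home", 1), ("pricing", 2), ("home", 3), ("home", 7), ("pricing", 9)]
cutoff = 5  # timestamp of the last completed batch run (assumed)
batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
home_total = serve(batch, speed, "home")
```

Note that the counting logic appears twice, once per layer; keeping such duplicated logic consistent is a commonly cited cost of this architecture.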
  28. 28. Reference Architectures - Lambda Architecture
      Complexity
      - Many moving parts
      - Restatement is difficult
      - Two code bases must be kept in sync
      - Proper failure handling is complex
  29. 29. Reference Architectures - Kappa Architecture
      Jay Kreps, a co-creator of Kafka and one of the first proponents of stream-based architectures, jokingly called his alternative the “Kappa Architecture”.
  30. 30. Reference Architectures - «Fast Data» Architecture
  31. 31. Deployment
      There are more options today for where to deploy a solution than ever before. At a high level the four options for deployment of architecture components are:
      1) Public Cloud: in the public cloud model, a company rents resources from a third party. The most advanced usage of public cloud is where the business functionality is provided by the cloud provider (i.e., software-as-a-service). Public cloud might also be used as the platform upon which the business functionality is built (i.e., platform-as-a-service), or the public cloud may simply provide the infrastructure for the system (i.e., infrastructure-as-a-service).
      2) Private Cloud: private cloud is the same as public cloud, but the cloud is owned by a company instead of being provided by a third party. Private clouds are ideal for hosting and integrating very large data volumes while keeping data secure behind corporate firewalls.
      3) Managed Services: in this model a company owns the components of the system, but outsources some or all aspects of runtime operations.
      4) Traditional IT: in this model a company owns and operates the system.
      These various options for deployment are not mutually exclusive.
  32. 32. Security
      The keys to securing the enterprise Big Data platform are:
      1) Authentication (Kerberos, LDAP, …)
      2) Authorization (ACE, ACL, Sentry, …)
      3) Encryption & Data Masking (Over-the-Wire Encryption, Encryption at Rest, Field-Level Encryption, Format-Preserving Encryption)
      4) Auditing & Data Lineage
      5) Disaster Recovery & Backup
  34. 34. Analytics - Data Science on Hadoop
      Common Limitations
  35. 35. Analytics - Data Science: Notebooks
      Notebooks combine code, output and narrative into a single document.
      - You can conduct analysis by writing down code, results, ideas and thoughts.
      - You have multiple languages and versions in a single multi-tenant environment.
      - Easy to share
      - Easy version control
  36. 36. Analytics - Data Science: Data Products
      Data Science is the science of building data products.
      OVERT DATA PRODUCTS: products where the data is clearly visible as part of the deliverable.
      - Descriptive Analysis
      - Dashboarding
      - Reporting
      COVERT DATA PRODUCTS: deliver results rather than data; data is hidden.
      - Recommendation Engine
      - …
      Website: https://www.oreilly.com/ideas/evolution-of-data-products
  37. 37. BENEFITS
      Analytics makes it possible to better manage the customer base and extract customer value:
      - Analyze customer profiles, behaviors and purchases and obtain a complete and strategic view of the most recurrent customer behaviors
      - Develop a tailored proposition by customer segment to increase customer value along the whole client lifecycle
      - Direct marketing efforts based on customer insights and value
      - Drive consumer segments to exploit the product portfolio at the right time of their customer journey
  38. 38. Analytics will be carried out in order to offer actionable insights on customers and will follow a multi-step approach:
      1. Business Objective & Question
      2. Data Exploration/Understanding: simple exploratory analysis in order to understand the whole set of information available, identify problems in the data, and start observing relationships among variables. Use of data visualization techniques for exploring the set of information.
      3. Data Preparation: data is prepared for data mining and machine learning models. Imputation of missing values, computation of new variables potentially useful for the business question, transformation of variables to make them meaningful for the problem to be solved.
      4. Modeling: models are implemented. Available data is used and synthesized to answer the business question, by identifying relationships among the target variable and input variables. It may be a recursive process based also on sampling data and assessing models and results.
      5. Model Interpretation: model results are interpreted in order to be useful for business strategy and actions.
      6. Business Actions
      [Chart: % of accounts by decile, responders vs. non-responders]
  39. 39. OBJECTIVES … that can be answered through specific statistical models and approaches (ANALYTICAL MODEL)
      [Diagram axes: Customer Value vs. Customer Life Time]
      Objectives:
      - ENGAGE NEW CUSTOMERS: new customer identification and engagement
      - NURTURE & DEVELOP LOYALTY CUSTOMERS: clienteling & caring program
      - RETAIN LEAVING CUSTOMERS: actions to retain leaving customers
      Analytical models:
      - Clickstream & Content Analysis
      - Next Best Offer Analysis
      - Segmentation (deterministic vs. behavioural)
      - Propensity Model
      - Churn Model
  40. 40. Analysis: Propensity Models
      What
      - The model assigns a propensity score to each customer and makes it possible to prioritize initiatives.
      - A propensity model can estimate:
        - Re-purchasing probability of customers
        - Retargeting optimization: predict the likelihood of booking a flight for potential customers
        - Up-selling propensity: reservation upgrade or ancillary services proposal
        - Etc.
      Why
      - Direct marketing investments toward the customers with the highest propensity, in order to:
        - Increase up-selling
        - Increase cross-selling
        - Increase active customers
        - Increase redemption of marketing campaigns
      Algorithms
      - Regressions
      - Decision Trees
      - Random Forests
      - Neural Networks
      - Support Vector Machines
      - Ensemble Models
      - …
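
As a rough illustration of what "assigning a propensity score" means, the sketch below fits a one-feature logistic regression (the first algorithm family listed) by plain gradient descent and ranks customers by score. Everything specific is an assumption made for the example: the feature (number of past purchases), the tiny training set, the customer names and the hyperparameters. A real project would use a proper library and far more data.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: Sequence[float], ys: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical history: past purchases per customer and whether they bought again.
past_purchases = [0, 1, 1, 2, 3, 3, 4, 5]
repurchased = [0, 0, 1, 0, 1, 0, 1, 1]
w, b = fit_logistic(past_purchases, repurchased)

# Score new customers and rank them so that initiatives can be prioritized.
customers: Dict[str, float] = {"anna": 5.0, "bruno": 0.0, "carla": 3.0}
scores = {name: sigmoid(w * x + b) for name, x in customers.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

The ranked list is what a campaign would consume: contact the highest-propensity customers first, up to the available budget.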
  41. 41. Analysis: Behavioural Segmentation
      What
      - Behavioral segmentation follows a statistical clustering algorithm which:
        - Identifies the most significant variables for the analysis
        - Aggregates customers into mutually exclusive groups with similar behavioral patterns, by creating clusters whose members are as similar as possible
      - Customer affiliation to a specific cluster varies over time, based on the customer's behavior
      Why
      - Get strategic insight on the customer base to increase loyalty and value
      - Tailor contact strategy (“the right action for the right customer”)
      - Enhance the website experience
      - Increase the redemption rate for targeted marketing campaigns
      Algorithms
      - Data transformation
      - Factor analysis
      - Unsupervised clustering models
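
A minimal sketch of the clustering step, using plain k-means as one example of an unsupervised clustering model. The slide does not say which algorithm was used, so k-means is an assumption, as are the two features and the sample points. In practice the features would first be transformed or standardized, as the algorithm list suggests.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # e.g. (orders per year, average basket value); hypothetical features


def kmeans(points: Sequence[Point], centroids: List[Point], iterations: int = 10) -> List[int]:
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: an empty cluster keeps its previous centroid.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels


customers: List[Point] = [(1, 20), (2, 25), (1.5, 22), (12, 80), (11, 95), (13, 90)]
initial = [customers[0], customers[3]]  # fixed starting centroids keep the example deterministic
segments = kmeans(customers, initial)
```

Each customer ends up in exactly one segment, which matches the "mutually exclusive groups" requirement; re-running the clustering on fresh data is what lets a customer's segment change over time.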
  42. 42. Analysis: Churn Models
      What
      - Churn analysis is a multivariate data mining technique that assigns a score to customer attrition
      - It estimates the probability that a customer will not buy from a company anymore or for a given period of time
      - Historical data on customers leaving the company will be investigated in order to identify anticipatory signals. Information on flying behavior, enriched data (lifestyle, interests, motivation, SOW, price sensitivity) and customer hyper-profile will be used to compare churn vs. loyal behavior
      Why
      - Optimization of costs and marketing activities in customer retention
      - Identification of high-risk customers sorted by profitability
      - Increase active customers
      Algorithms
      - Regressions
      - Decision Trees
      - Random Forests
      - Neural Networks
      - Support Vector Machines
      - Ensemble Models
      - …
  44. 44. The Database Log: the real database
  45. 45. The Event Store: an entity history
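
One way to read "an entity history" is that the store keeps every event for an entity and the current state is derived by replaying them. The sketch below assumes a bank-account style entity with two invented event kinds; the class and function names are illustrative only and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str    # "deposited" or "withdrawn" (hypothetical event names)
    amount: int


class EventStore:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}

    def append(self, event: Event) -> None:
        self._streams.setdefault(event.entity_id, []).append(event)

    def history(self, entity_id: str, up_to: Optional[int] = None) -> List[Event]:
        """Return the first `up_to` events of an entity, or all of them when `up_to` is None."""
        return list(self._streams.get(entity_id, []))[:up_to]


def balance(events: List[Event]) -> int:
    """Rebuild the current state by folding over the event history."""
    total = 0
    for event in events:
        total += event.amount if event.kind == "deposited" else -event.amount
    return total


store = EventStore()
store.append(Event("acc-1", "deposited", 100))
store.append(Event("acc-1", "withdrawn", 30))
store.append(Event("acc-1", "deposited", 5))

current = balance(store.history("acc-1"))
as_of_second_event = balance(store.history("acc-1", up_to=2))
```

Because nothing is overwritten, the state at any earlier point can be recomputed by replaying a prefix of the history, which is the "time machine" idea mentioned in the description of this deck.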
  46. 46. Event Driven Architecture: segregating Commands & Queries
  47. 47. Event Driven Architecture: Eventual Consistency
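
The slide itself is a diagram, so the following is only a guess at the idea it conveys: the command side appends events, while a separate read model (a materialized projection) applies them later, so a query can briefly return stale data. All names and the event shape are assumptions.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (account id, signed amount); hypothetical event shape


class BalanceProjection:
    """Read model that catches up with the event log at its own pace."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.position = 0  # index of the next event to apply

    def catch_up(self, log: List[Event]) -> None:
        """Apply every event that has not been seen yet."""
        for account, amount in log[self.position:]:
            self.balances[account] = self.balances.get(account, 0) + amount
        self.position = len(log)


log: List[Event] = []
projection = BalanceProjection()

log.append(("acc-1", 100))                    # command side: the write is accepted immediately
stale = projection.balances.get("acc-1", 0)   # query side has not processed it yet
projection.catch_up(log)                      # later, the projection works through its backlog
fresh = projection.balances["acc-1"]
```

The gap between `stale` and `fresh` is the eventual-consistency window; the design accepts it in exchange for independent scaling of writes and reads.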
  48. 48. Event Driven Architecture: the Domain Model
  50. 50. Domain Driven Design: Concepts
      - Place the project's primary focus on the core domain and domain logic
      - Base complex designs on a model of the domain
      - Initiate a creative collaboration between technical and domain experts to iteratively refine a conceptual model that addresses particular domain problems
      Concepts
      - Context: the setting in which a word or statement appears that determines its meaning
      - Domain: a sphere of knowledge (ontology), influence, or activity. The subject area to which the user applies a program is the domain of the software
      - Model: a system of abstractions that describes selected aspects of a domain and can be used to solve problems related to that domain
      - Ubiquitous Language: a language structured around the domain model and used by all team members to connect all the activities of the team with the software
      - Bounded context: explicitly define the context within which a model applies. Explicitly set boundaries in terms of team organization and usage within specific parts of the application
      - Context map: identify each model in play on the project and define its bounded context. This includes the implicit models of non-object-oriented subsystems. Name each bounded context, and make the names part of the ubiquitous language. Describe the points of contact between the models, outlining explicit translation for any communication
  51. 51. Domain Driven Design: Building Blocks
      - Entity: an object that is not defined by its attributes, but rather by a thread of continuity and its identity
      - Value Object: an object that contains attributes but has no conceptual identity. Value objects should be treated as immutable
      - Aggregate: a collection of objects that are bound together by a root entity, otherwise known as an aggregate root. The aggregate root guarantees the consistency of changes being made within the aggregate by forbidding external objects from holding references to its members
      - Domain Event: a domain object that defines an event (something that happens). A domain event is an event that domain experts care about
      - Service: when an operation does not conceptually belong to any object. Following the natural contours of the problem, you can implement these operations in services
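
The building blocks above can be sketched with standard-library dataclasses. The order domain, the class names and the `max_lines` rule are all invented for illustration; they are not part of the original deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Money:
    """Value Object: immutable and compared by its attributes."""
    amount: int      # minor units, e.g. cents
    currency: str


@dataclass(frozen=True)
class OrderLineAdded:
    """Domain Event: something that happened and that domain experts care about."""
    order_id: str
    sku: str


@dataclass(eq=False)
class Order:
    """Entity and aggregate root: identified by `order_id`, guards every change to its lines."""
    order_id: str
    max_lines: int = 3  # hypothetical invariant, for illustration only
    _lines: List[tuple] = field(default_factory=list)
    events: List[OrderLineAdded] = field(default_factory=list)

    def add_line(self, sku: str, price: Money) -> None:
        if len(self._lines) >= self.max_lines:
            raise ValueError("order is full")
        self._lines.append((sku, price))
        self.events.append(OrderLineAdded(self.order_id, sku))

    def lines(self) -> tuple:
        return tuple(self._lines)  # a copy, so callers cannot mutate internal state

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Order) and other.order_id == self.order_id

    def __hash__(self) -> int:
        return hash(self.order_id)


order = Order("o-1")
order.add_line("sku-1", Money(500, "EUR"))
```

Two `Money` values with the same attributes are equal, while two `Order` objects are equal only when they share an identity, regardless of their contents. Callers go through `add_line` rather than touching the lines directly, which is how the root keeps the aggregate consistent.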
  52. 52. Domain Model: Service
  53. 53. Bounded Contexts: coordinate Domain Models
  55. 55. Cloudera Product Mapping View
  56. 56. Cloudera Manager
      Cloudera Manager is an end-to-end application for managing CDH clusters. Cloudera Manager sets the standard for enterprise deployment by delivering granular visibility into and control over every part of the CDH cluster, empowering operators to improve performance, enhance quality of service, increase compliance and reduce administrative costs.
  57. 57. Cloudera Navigator
  58. 58. Cloudera Navigator Optimizer
      How can you assess the risk and true cost of offloading ETL and analytic workloads and understand what it takes to get there?
      - Cloudera Navigator Optimizer gives you the insights and risk assessments you need to build out a comprehensive strategy for Hadoop success. Simply upload your existing SQL workloads to get started, and Navigator Optimizer will identify relative risks and development costs for offloading these to Hadoop based on compatibility and complexity.
      - To efficiently optimize performance for the latest technologies, like Hive and Impala, you need visibility into what users are doing with the data and when the queries themselves are to blame. Cloudera Navigator Optimizer gives you that visibility and lets you focus optimization efforts on critical areas and best practices.
  59. 59. Cloudera Security
  60. 60. Cloudera Data Science Workbench
      What is Cloudera Data Science Workbench?
  61. 61. Cloudera Data Science Workbench
      Data Science on Hadoop
  62. 62. Cloudera Data Science Workbench Architecture
  63. 63. Cloudera Data Science Workbench Architecture
  64. 64. Cloudera Product Mapping View
      Cloudera Enterprise is available on a subscription basis in five editions, each designed for your specific needs.
      - Essentials provides superior support and advanced management for core Apache Hadoop
      - Data Science and Engineering for programmatic preparation and predictive modeling
      - Operational DB for online applications and real-time serving
      - Analytic DB for BI and SQL analytics
      - The Enterprise Data Hub gives you everything you need to become information-driven, with complete use of the platform.
  65. 65. Data Engineering in the Cloud
  66. 66. BI/Analytics in the Cloud
