Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman Mohajerian of Teradata

As integrated web analytics evolves toward a service-oriented, event-based model, there will be a higher emphasis on event-based analytics. Business analytics is moving from pure counts to time-series, relationship, and usage analytics. Examples of web analytics that can take advantage of this architecture are conversion analytics and cross-channel marketing.
The advantage of storing raw event data is that you have maximum flexibility for analysis. For example, you can trace the sequence of pages that one person visited over the course of their session. You can’t do that if you’ve squashed all the events into e.g. counters. That sort of analysis is really important for some offline processing tasks, such as training a recommender system (“people who bought X also bought Y”, that sort of thing). For such use cases, it’s best to simply keep all the raw events, so that you can later feed them all into your shiny new machine learning system.
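As a concrete illustration of that flexibility, here is a minimal sketch (hypothetical event tuples, not any particular product's schema) contrasting raw events with pre-aggregated counters:

```python
from collections import defaultdict

# Hypothetical raw event records: (session_id, timestamp, page).
events = [
    ("s1", 3, "/checkout"),
    ("s1", 1, "/home"),
    ("s2", 1, "/home"),
    ("s1", 2, "/product/42"),
    ("s2", 2, "/search"),
]

def sessions(raw_events):
    """Group raw events by session and order them by timestamp:
    the full page sequence per visitor is recoverable."""
    by_session = defaultdict(list)
    for sid, ts, page in raw_events:
        by_session[sid].append((ts, page))
    return {sid: [p for _, p in sorted(evts)] for sid, evts in by_session.items()}

def page_counts(raw_events):
    """The 'squashed' view: per-page hit counters, with ordering lost."""
    counts = defaultdict(int)
    for _, _, page in raw_events:
        counts[page] += 1
    return dict(counts)
```

With `sessions` you can still trace that session `s1` went home → product → checkout; `page_counts` can never recover that path.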
In this session we elaborate on using Kafka, an event processing framework (e.g., Storm or Spark Streaming), and either Hadoop or an EDW to build an event-driven architecture.

  1. Event Driven Architecture for Web Analytics. Peyman Mohajerian, June 2015, Big Data Day LA
  2. Contents: Why Event Analytics? Understanding the Customer Experience; Building Your Business Around Your Customer; The Customer Thread; Solution Framework; Use Cases; Teradata Listener. © 2014 Teradata
  3. The questions analytics are intended to answer: traditional BI answers What, Who, When, and Where; event analytics answers Why and How.
  4. Turning the Analytic View of the Customer 180°. Traditional BI focuses on how the customer looks to the business: What segment are you in? When did you last visit? What did you buy? Event analytics focuses on how the business looks to the customer: Why did they make me that offer? Why do they keep sending me emails? How do I make that selection? Why does the agent keep asking for the same information?
  5. Applications Deliver the Company's Brand and Customer Experience. The entirety of applications (social media, marketing channels, mobile apps, devices and form factors) combines to deliver the full customer experience. Today they are mostly designed in a siloed manner, and applications are not designed to solicit and extract customer experience data well. At the core of application design should be considerations for obtaining and delivering information about the customer experience.
  6. The Customer Experience Universe. [Timeline diagram, Day 1 through Day 25: paid search, landing page, create account, transactions, attached credit card; an email campaign fragment (sent, opened, link clicked); banner ad impressions; a customer services fragment (help center, dispute entry, virtual agent, IVR dispute workflow, transfer to agent, dispute resolved, survey emailed).] A universe of customer experience data: create threads, build graphs, identify patterns.
  7. Event Analytics Ecosystem. Sources (social media, email marketing, display marketing, website activity, customer accounts, products, transactions, customer care) feed an Event Repository governed by an EAP metadata dictionary and library (core event dictionary, data source adapters, custom business event dictionary). On top sit machine learning (customer experience, best offers, digital marketing applications), high-speed query and reporting APIs, and guided, UI-driven funnel, path, and graph analytics for business analysts.
  8. Event Analytics Ecosystem (architecture detail). Batch ingest (LWI ftp, Buffer Server) and stream ingest (Spark) land sources (mobile apps, website activity, social media, display and search marketing, customer state, eComm, customer care, third-party tracking) into HDFS time partitions. An Event Processing Engine populates the Event Repository (HBase and Hive); the Aster analytic engine provides funnel and path processing functions and a graph engine behind a guided UI; a Dashboard Engine and dashboard APIs serve reporting; event pattern matching and scoring feeds real-time events to decisioning; product, customer, and transaction data reside in the data warehouse.
  9. Funnel Analytics Use Case. Funnel and pathing analytics are a class of analytics used across the company to analyze user behavior, conversion, and product experience. Funnel analytics are complex due to source categorization, visitor identification, pathing, attribution, and conversion. They can be built using a single guided UI without needing to write code or move data, allowing analytics to scale at the speed of business.
  10. Execute Advanced Analytics with Ease. Frequently used Aster SQL-MapReduce functions can be run without knowing SQL. Forms build dynamically to display the necessary parameters based on the analytics being run. Results can be visualized, published, and shared with others for refresh and reuse.
  11. Machine Learning on Event Repository. Machine learning is a collection of algorithms to detect hidden patterns in data, create useful predictions about unseen data, and support decision making under uncertainty. The Event Repository provides the universe of customer events, a trusted set of events. Machine learning algorithms can continuously search the Event Repository for complex patterns of interesting behavior, triggering actions.
  12. Machine Learning Use Cases: product recommendations, market basket analysis, event/activity/behavioral analysis, campaign management and optimization, market and consumer segmentation. [The customer experience timeline from slide 6, annotated with time-series, classification, and clustering techniques.]
  13. Unified Data Architecture: an integrated data warehouse (Teradata Database), an integrated discovery platform (Teradata Aster Database), and a data platform (Teradata Portfolio for Hadoop), with real-time processing, security and workload management, and RESTful APIs (Listening Framework, App Framework) connecting applications.
  14. Listener Framework
  15. Teradata Listener: a common data integration platform for streaming data (Teradata Confidential). It simplifies data integration across the enterprise, provides a platform for (near) real-time applications, is scalable and reliable enough to support the entire enterprise, and is open and API-based to encourage use.
  16. Listener Data Flow: the data flow sequence from sources to target systems. Sources write to the Ingest service, which writes to the Firehose; the Router reads mini-batches and writes to Streams; Writers read mini-batches and write tuples to target systems.
  17. Ingest Services: visualizing the flow through ingest. An incoming message such as {"foo":"bar"} is assigned a UUID and wrapped in a lightweight metadata envelope:
      {
        "uuid": "79f3325f-c75c-4f98-b01e-c4845f69f58c",
        "source": "6fde3548-65ed-4fa5-927c-dfc06f1691c6",
        "data": { "foo": "bar" },
        "time": "2015-01-27T16:17:57Z",
        "hour": "16",
        "minute": "17"
      }
  18. Firehose: regulating pressure. Distributed write-ahead logging allows bursts of data without impacting systems. Apache Kafka is resilient and durable, horizontally scalable, and built for maximum throughput. Asynchronous reads and writes: producers append to the log at their own pace, consumers read at their own pace, and the latest data is always in memory.
  19. Router: routing data. Demuxing data from the Firehose into streams (Stream 1 through Stream n) based on rules. Built on Apache Spark: resilient and durable; horizontally scalable. Rule based: initially based on API key (per registered data source); eventually enabling combined streams.
  20. Writers: writing data to target systems through Spark jobs (streams feed the Teradata, Aster, and HDFS writers). Built on Apache Spark: resilient and durable; horizontally scalable. Writer options: initial support for Teradata, Aster, and HDFS; initially leveraging JDBC batch writes for Teradata; exploring rate-limiting writers and other systems.
  21. © 2014 Teradata

Notes

  • Over the years we’ve gotten good at building platforms to answer the What, Who, When, and Where questions. Most of the work is about making them more cost-efficient, a bit more flexible, and improving data quality. That is important, but not game-changing for the business.

    Event analytics is about focusing on the WHY and HOW questions.

    How are things (events) related or how are customers related by their experiences?

    In my view of the world, the business user should be able to answer the “why” questions as easily as they can answer the “what” questions today.
    Behavioral analytics is a subset of business analytics that focuses on how and why users of eCommerce platforms, online games, and web applications behave. While business analytics focuses more broadly on the who, what, where, and when of business intelligence, behavioral analytics narrows that scope, allowing seemingly unrelated data points to be used to extrapolate, predict, and determine errors and future trends. It takes a more holistic and human view of data, connecting individual data points to tell us not only what is happening, but also how and why it is happening.
    Behavioral analytics utilizes user data captured while the web application, game, or website is in use, via analytics platforms like Google Analytics. Platform traffic data such as navigation paths, clicks, social media interactions, purchasing decisions, and marketing responsiveness is all recorded, along with more specific advertising metrics like click-to-conversion time and comparisons between metrics such as the monetary value of an order and the amount of time spent on the site.[1] These data points are then compiled and analyzed, whether by looking at the timeline progression from when a user first entered the platform until a sale was made, or at what other products a user bought or looked at before a purchase. Behavioral analysis allows future actions and trends to be predicted based on all the data collected.
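A metric like click-to-conversion time is straightforward to derive once raw behavioral events are kept. A hedged sketch with made-up event records (the field names are illustrative):

```python
from datetime import datetime

# Hypothetical behavioral events for one visitor; timestamps are ISO 8601.
events = [
    {"type": "ad_click",  "time": "2015-06-27T10:00:00"},
    {"type": "page_view", "time": "2015-06-27T10:01:30"},
    {"type": "purchase",  "time": "2015-06-27T10:12:00", "order_value": 59.90},
]

def click_to_conversion_seconds(evts):
    """Seconds between the first ad click and the first purchase, or None."""
    def first(kind):
        times = [datetime.fromisoformat(e["time"]) for e in evts if e["type"] == kind]
        return min(times) if times else None
    click, purchase = first("ad_click"), first("purchase")
    if click is None or purchase is None:
        return None
    return (purchase - click).total_seconds()
```

The same pattern extends to any pairwise timing metric between event types.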
  • How the business looks to the customer
    The customer experiences the company across the entirety of the applications that company has developed and deployed; more than anything, applications represent the company’s brand.
    Most applications are not designed to solicit and extract customer experience data well. There are two major ways data is obtained from applications:
    Web-site tagging
    Very detailed logging data produced by engineers for application development and application operational performance
    One is too aggregated and difficult to administer; the other is too engineering-oriented.
    Furthermore, applications are designed within themselves, mostly without thinking about experiences across other applications and channels. Stitching the customer experience together across multiple applications is difficult.
  • The problem is big
    Seven sources per client
    Ability to customize for the consumer
  • Ingestion: depending on the type of source, TD has IP. There are basically two types of sources: streaming and batch. For streaming, Teradata Listener will be the advocated solution; for batch, TD has two pieces of IP for ingestion: Light-Weight Ingestion (LWI) and Buffer Server.
    Light-Weight Ingestion (LWI) is for large third-party files like Omniture. Instead of having to FTP the Omniture feed to a landing server, LWI connects directly to the FTP source, pulls the file, and lands it in HDFS in time partitions.
    Buffer Server is a set of IP designed to ingest large numbers of small files, concatenate them into large files that are more Hadoop-friendly, and land them in HDFS time partitions.
    Event Processing & Repository
    TD has designed (but not yet implemented) two pieces of IP in this area.
    Event Processing: built using MapReduce, it converts the incoming data sources into event objects. The three processing steps are: prepend an event header, prepend an event-type header, and resolve the incoming ID (cookie, GUID, customer, email address, etc.) to a specific customer. It populates event records into HBase. The Event Processing Engine processes both streaming and batch sources.
    Event Repository is an HBase schema that is the central storage for all events.
    Dashboard Engine
    TD has built IP that allows quickly building KPIs from the Event Repository. Using a UI, a developer can quickly aggregate metrics into an HBase schema on top of which tools like Tableau can run optimally.
    Guided, Metadata-driven Discovery Event Analytics
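The three event-processing steps described above can be sketched as a single function (the identity map and field names here are illustrative, not the actual engine’s schema):

```python
import uuid

# Hypothetical identity map: incoming cookies/GUIDs/emails -> customer id.
IDENTITY_MAP = {"cookie-abc": "cust-1001", "ann@example.com": "cust-1001"}

def to_event_object(raw, event_type):
    """Turn a raw record into an event object in three steps:
    1. prepend an event header (id + time),
    2. prepend an event-type header,
    3. resolve the incoming id (cookie, GUID, email, ...) to a customer."""
    return {
        "event_header": {"event_id": str(uuid.uuid4()), "time": raw["time"]},
        "event_type": event_type,
        "customer_id": IDENTITY_MAP.get(raw["visitor_id"], "unknown"),
        "payload": raw,
    }
```

In the real engine these steps would run as MapReduce over both streaming and batch sources, with the resulting records written to HBase.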


  • Teradata has a solution
    It allows users to define the pattern they want to match
    The UI guides the analyst through this

    Steps:
    Give visibility into all the events available in the repository
    Allow the analyst to define the flow
    Allow desired patterns to be run against that flow

    All the data and metadata sit behind this, allowing us to gain insights and a greater view of the customer
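One simple way a guided UI’s funnel definition could be evaluated against an event flow is an in-order subsequence match; a sketch (the actual product’s matching semantics are not specified here):

```python
def matches_pattern(event_flow, pattern):
    """True if the events in `pattern` occur in order (not necessarily
    adjacently) within `event_flow` -- a plain subsequence match, the kind
    of pattern a guided UI could compile an analyst's funnel into."""
    it = iter(event_flow)
    # `step in it` advances the iterator, so later steps must come later.
    return all(step in it for step in pattern)

# Example flow echoing the slide-6 timeline event names.
flow = ["PaidSearch", "LandingPage", "CreateAccount", "EmailSent", "TXN"]
```

For instance, the funnel ["PaidSearch", "CreateAccount", "TXN"] matches this flow, while any pattern requiring "TXN" before "PaidSearch" does not.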
  • As we continue to advance our UDA, you will see us focusing on RESTful APIs both for loading data and for accessing data across the UDA.

    For loading, you will also see us advance the Listening Framework.

    Listening Framework (RESTful API)
    The Listening Framework will provide the ability to open database sessions, submit SQL requests, access responses, and access metadata. Requiring zero client install, the Listening Framework is ideal for web, mobile, and scripting languages. It does not replace traditional ETL; it provides a lightweight framework to subscribe to new data sources in an easy, frictionless model.

    For access, you will see us extend AppCenter into an App Framework that goes beyond just Aster to span the entire UDA.

    App Framework (RESTful API)
    AppCenter makes it easy, fast, and efficient for organizations to benefit from big data analytics and discovery by simplifying the app building and consumption experience. AppCenter is a framework to build, deploy, configure, and consume big data apps. Using AppCenter, customers or Teradata PS can build new big data apps or configure (and customize) an existing app to deliver faster value from big data analytics. Apps include analytical apps like customer churn analytics, product affinity analytics, and fraud analytics, and can be horizontal or industry-focused depending on the use case.

    AppCenter provides the following capabilities:
    For IT, it provides a visual, GUI- and standards-based app building and configuration experience where personas like data scientists, analysts, and developers can easily, quickly, and seamlessly build, configure, deploy, and share big data apps. AppCenter services include a web-based portal to deploy and consume apps, and common services like authentication, data access, user interface, visualization, app-building APIs and SDKs, and RESTful APIs.
    For business, it speeds up the process of pervasively deploying and benefiting from big data analytics. For business users and analysts, it provides an interactive, visual, web-based experience to analyze, view, and share the results of big data apps, helping organizations innovate with powerful big data insights.

    While initially specific to Aster, this approach of RESTful-API-based integration of end-user access tools and applications will drive future capabilities for interacting with all components of the UDA.




  • A lightweight, distributed RESTful endpoint for one message or many;
    Validates the API key for all messages sent;
    All messages sent get a UUID;
    All messages are wrapped in a lightweight metadata envelope and sent to the Firehose;
    A future state can be rate limited.
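The steps in this list can be sketched as a single ingest function (the API-key table and source id mirror the slide-17 envelope but are otherwise illustrative):

```python
import uuid
from datetime import datetime, timezone

# Hypothetical registry: API key -> registered data-source id.
VALID_API_KEYS = {"key-123": "6fde3548-65ed-4fa5-927c-dfc06f1691c6"}

def ingest(api_key, message):
    """Validate the API key, assign a UUID, and wrap the message in a
    lightweight metadata envelope (shape mirrors slide 17)."""
    if api_key not in VALID_API_KEYS:
        raise PermissionError("unknown API key")
    now = datetime.now(timezone.utc)
    return {
        "uuid": str(uuid.uuid4()),
        "source": VALID_API_KEYS[api_key],
        "data": message,
        "time": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "hour": now.strftime("%H"),
        "minute": now.strftime("%M"),
    }
```

The returned envelope is what would be appended to the Firehose; rate limiting could be added in front of this function later.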
  • Handling traffic spikes gracefully is a critical ability of a modern data ingestion platform, and especially of a self-service one. We are leveraging Apache Kafka as a distributed write-ahead log to help satisfy the need to handle bursts of data flowing through the system, as well as to keep operating smoothly through updates, failovers, and other anomalies. Because Kafka supports partitioned topics, we can scale our Firehose and individual Streams horizontally. Adding capacity for any topic is as simple as ensuring an adequate number of running Kafka nodes and making a configuration change to add a new partition.

    Using a write-ahead log as our data bus gives us a great deal of asynchronicity between writers and readers. Kafka holds the history of a topic for a configurable amount of time and/or space. If the applications reading data from a topic fail or are unavailable for any reason, there is no impact on upstream parts of the system. This gives us a way to build up "pressure" in a very recoverable way. We measure the volume of data flowing through the various parts of the system, and with that information we can shift resources to where they are most needed, such as adding another node for the Router application if we notice the Firehose is consistently being written to faster than it is being read from.
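The decoupling described above can be illustrated with a toy in-memory log standing in for a Kafka topic (this is not Kafka’s actual API): producers append at their own pace while each consumer tracks its own offset, so a slow consumer accumulates lag rather than blocking writes.

```python
class MiniLog:
    """A toy append-only log imitating a single Kafka topic partition."""

    def __init__(self):
        self.entries = []
        self.offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        """Producers append without waiting for any consumer."""
        self.entries.append(record)

    def poll(self, consumer, max_records=1):
        """Each consumer reads a mini-batch from its own offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.entries[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch

    def lag(self, consumer):
        """Unread records: the 'pressure' a slow consumer builds up."""
        return len(self.entries) - self.offsets.get(consumer, 0)
```

Monitoring `lag` per consumer is the toy analogue of noticing that the Firehose is being written to faster than it is being read from.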
  • The Router is the application responsible for reading data from the Firehose and sorting it into the relevant Streams. It is built on Apache Spark and is thus horizontally scalable. Which Streams an event lands in depends on a set of rules managed through the Application Services. Initially these rules are static, and events are grouped into Streams based on the API key used to send them to the Ingest Services. In the future we plan to support more advanced options for grouping streams.
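A minimal sketch of the Router’s rule-based grouping, assuming a static API-key-to-stream rule table (illustrative names, not the actual implementation):

```python
def route(firehose_batch, rules):
    """Sort a mini-batch of events into named streams. `rules` maps an
    API key to a stream name -- the initial, static grouping described
    above; richer rules could be plugged in later."""
    streams = {}
    for event in firehose_batch:
        stream = rules.get(event["api_key"], "unrouted")
        streams.setdefault(stream, []).append(event)
    return streams
```

In the real system this grouping would run as a Spark job over mini-batches read from the Firehose, writing each group to its Stream topic.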
  • ×