White Paper:
Causata Big Data Architecture
December 2012




© 2012 Causata Inc. · All Rights Reserved
TABLE OF CONTENTS

        · Introduction
        · Event Storage in HBase
        · Writing Data into Causata
        · Data Principles
        · Customer Identities & Event Timelines
        · Predictive Profiles
        · Model Scores & Behavioral Predictors
        · Reading Data from Causata
        · Summary
        · Contact




Introduction

Causata’s customer experience management applications are built upon parallel big data storage that enables the
efficient analysis of terabytes of diverse, granular, multi-structured customer data.

Causata stitches together unstructured and structured customer interaction data from any digital source or channel,
and then assembles it into concise, structured customer records suitable for ad-hoc analysis, predictive
modeling, and advanced machine learning.

Causata’s data storage layer is customer- and event-oriented: every customer interaction is stored in full
detail, and parallel storage and computation provide low-latency access to each customer’s record set to drive
real-time actions and decisions.




Event Storage in HBase

At its lowest architectural level, Causata utilizes HBase to store a vast set of granular event data. HBase is a highly
scalable data store that forms part of the open-source Hadoop product suite, and provides a robust, inexpensive
way to store every individual customer interaction.

Causata stores detailed customer interaction records from any digital channel, such as a web click, a product
purchase, an email or a tweet. Each data point is recorded as a simple set of key-value pairs called an event. For
example, a product purchase might have a SKU, a brand, a price, a size and color; a web click might have a URL, a
page category, a browser type, a language setting and a time zone. Causata turns this messy, multi-structured event
data into structured data for analysis, sometimes called ‘rectangular data’ because each customer record has the
same set of computed fields.

Because of the way Causata uses HBase, new types of customer interaction data are easy to add. Causata
does not have a traditional fixed or relational data schema. Data from any source can be loaded or
streamed into Causata, and the structure and signal extraction are applied later, when the data is read.
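
As a rough illustration of applying structure at read time, the Python sketch below keeps events as free-form key-value pairs and derives the same fixed set of fields for every customer only when the data is read. The event fields, the definitions in `FIELDS`, and the function `read_customer_record` are hypothetical names chosen for this example; they are not Causata's API.

```python
# Events are schema-free key-value pairs; each event type carries different keys.
events = [
    {"customer": "c1", "type": "purchase", "sku": "A-100", "brand": "Acme", "price": 40.0},
    {"customer": "c1", "type": "click", "url": "/shoes", "browser": "Firefox"},
    {"customer": "c2", "type": "click", "url": "/home", "browser": "Chrome"},
]

# Structure is defined separately from storage and applied only on read.
FIELDS = {
    "purchase_count": lambda evs: sum(1 for e in evs if e["type"] == "purchase"),
    "total_spend": lambda evs: sum(e.get("price", 0.0) for e in evs if e["type"] == "purchase"),
    "click_count": lambda evs: sum(1 for e in evs if e["type"] == "click"),
}

def read_customer_record(customer_id, all_events):
    """Compute the same fields for any customer, giving 'rectangular' data."""
    own = [e for e in all_events if e["customer"] == customer_id]
    record = {"customer": customer_id}
    for name, compute in FIELDS.items():
        record[name] = compute(own)
    return record

records = [read_customer_record(c, events) for c in ("c1", "c2")]
```

Adding a new field in this sketch means adding one entry to `FIELDS`; the stored events do not change.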

In order to enable fast access to individual customer records, data is stored redundantly in Causata across
multiple servers. This protects against data loss and enables high-volume data retrieval and analysis through the
use of parallel processing.


Writing Data into Causata

Causata has a simple HTTP Data Connector, to which an event is written as a JSON object. Because Causata is
schema-free, it is easy to input any digital customer interaction – behavioral, social or transactional.
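
A minimal sketch of writing one event as a JSON object over HTTP might look like the following. The connector URL, the event fields, and the helper `build_event_request` are assumptions made for illustration; this paper does not specify the connector's actual endpoint or payload format.

```python
import json
import urllib.request

# Hypothetical endpoint: the real connector URL is not given in this paper.
CONNECTOR_URL = "http://causata.example.com/events"

def build_event_request(event, url=CONNECTOR_URL):
    """Serialize one event as a JSON object and wrap it in an HTTP POST request."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

purchase = {"type": "purchase", "sku": "A-100", "brand": "Acme", "price": 40.0}
request = build_event_request(purchase)
# Sending is left commented out because the endpoint above is a placeholder:
# urllib.request.urlopen(request)
```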

Causata consumes real-time feeds or streams, log and CSV files, and ODBC connections to databases and data
warehouses, and plugs easily into any ETL tool, including the open-source tools Pentaho and Talend. Data can also
be loaded or streamed into Causata from an existing Hadoop or HBase data store by running a MapReduce job that
generates input events for Causata.

Examples of data sets feeding Causata include web and email analytics, web tags and tag management systems,
mobile apps, social data streams, CRM and ERP data, machine logs, and data management platforms (DMPs).


Data Principles

Causata was designed with three data principles in mind:

Scalability, Flexibility, and Low Latency

Scalability across terabytes of unstructured customer interaction records relies on parallel computing – sharing the
data storage across horizontally scaling servers and performing the analytic processes in parallel, close to the data.

Flexibility is essential to cope with rapid and unpredictable changes in how customer data is generated and
consumed. Causata does not impose a fixed database schema, and allows the definition of customer records for
analysis to be made dynamically at query time.

Low Latency data access is critical both to allow business analysts and marketing scientists to perform
interactive investigation of the data and to drive real-time personalized marketing decisions from the data
analysis. This means retrieval and assembly of customer profiles in 50 milliseconds or less, including their very
latest interactions across multiple channels.


Customer Identities & Event Timelines

A key element of Causata’s big data engine is its Identity Graph. By observing patterns of identifiers that occur
together, Causata builds up a graph connecting identifiers to an individual and ascribes each data fragment to
the correct customer. This picture becomes richer over time as new pieces of linking customer information
are recorded.

For example, if a customer logs into her web account from home and then a week later does the same from her
work computer, both cookies become linked and the two sets of web activity data are merged into a single event
stream, providing a richer profile for that customer.




Data from email, mobile, social, and bricks-and-mortar channels are easily combined in the same way, by
matching identifiers such as credit and loyalty cards, account numbers, email addresses, and telephone numbers.

The Identity Graph adjusts to new connection events, providing as complete a picture as possible of an individual
customer at any point in time.
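
The paper does not describe how the Identity Graph is implemented. One common way to model identifiers that become linked over time is a union-find (disjoint-set) structure, sketched below in Python as an assumption rather than as Causata's method. The class `IdentityGraph` and the identifier strings are hypothetical.

```python
class IdentityGraph:
    """Union-find over identifiers; each connected set stands for one customer."""

    def __init__(self):
        self.parent = {}

    def find(self, identifier):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(identifier, identifier)
        root = identifier
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[identifier] != root:
            next_id = self.parent[identifier]
            self.parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Record that identifiers a and b were observed together."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_customer(self, a, b):
        return self.find(a) == self.find(b)

graph = IdentityGraph()
graph.link("cookie:home", "account:42")  # login from the home computer
graph.link("cookie:work", "account:42")  # login from work a week later
```

After the second login, `graph.same_customer("cookie:home", "cookie:work")` is true, so web activity recorded against either cookie can be attributed to the same customer.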

Causata organizes and stores interaction data by individual customer, forming a single event-based Customer
Timeline. Retaining the detailed event sequence, in chronological time order, allows business analysts to analyze
cause and effect in customer behavior, and to investigate specific scenarios or path analyses. This essential time
ordering is typically lost in other data systems, such as when data is pre-aggregated in a data warehouse.
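
To make the idea of a Customer Timeline concrete, the sketch below groups events by customer and sorts each group chronologically so that a path can be read off in order. The field names `customer`, `time`, and `type`, and the function `build_timelines`, are hypothetical; timestamps are assumed to be ISO 8601 strings in a single format, which sort chronologically as plain text.

```python
from collections import defaultdict

def build_timelines(events):
    """Group events by customer and order each group chronologically."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event["customer"]].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e["time"])
    return dict(timelines)

# Events arrive in no particular order.
events = [
    {"customer": "c1", "time": "2012-11-03T10:00:00", "type": "purchase"},
    {"customer": "c1", "time": "2012-11-01T09:30:00", "type": "email_open"},
    {"customer": "c1", "time": "2012-11-02T18:45:00", "type": "click"},
]
path = [e["type"] for e in build_timelines(events)["c1"]]
```

Here `path` lists the event types in the order they happened, which is the information that pre-aggregation would discard.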


Predictive Profiles

Event streams or Customer Timelines are valuable for path analysis, but are difficult to consume for ad-hoc
analysis or statistical modeling. Causata distills customer event streams and their descriptive attributes into a set
of predictive variables, or aggregates, computed over specific timescales.

For example, total spend in the past month is computed by summing the prices of all of a customer’s purchase
events in that period. Useful industry-specific variables for Financial Services, Communications, and Digital Media
are pre-built within Causata and are also easily set up and managed by business analysts.

Causata leverages its parallel compute power to calculate these variables on demand as customer data is read.
Calculation on demand ensures that customer profiles are always up to date and take into account the customer’s
most recent activity. New predictors or variables can be defined in seconds and are then immediately available
through customer profiles.
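
The total-spend example above can be sketched as an aggregate computed on demand over a trailing time window. This is an illustration only: the function `total_spend`, the event fields, and the choice of 30 days to stand for "the past month" are assumptions.

```python
from datetime import datetime, timedelta

def total_spend(timeline, now, window=timedelta(days=30)):
    """Sum the prices of purchase events that fall inside the trailing window."""
    start = now - window
    return sum(
        e["price"]
        for e in timeline
        if e["type"] == "purchase" and start <= e["time"] <= now
    )

timeline = [
    {"type": "purchase", "time": datetime(2012, 10, 1), "price": 25.0},
    {"type": "purchase", "time": datetime(2012, 11, 20), "price": 40.0},
    {"type": "click", "time": datetime(2012, 11, 25)},
    {"type": "purchase", "time": datetime(2012, 12, 1), "price": 15.0},
]
spend = total_spend(timeline, now=datetime(2012, 12, 5))
```

Because the value is derived from the timeline at read time, a purchase recorded a moment ago is included the next time the variable is requested.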


Model Scores & Behavioral Predictors

Causata provides pre-built regression models to determine the accuracy or predictive power of variables based on
cause and effect. These linear and logistic regression models enable analysts and marketing scientists to quickly
identify the most valuable variables for their customer analyses.

Once an analyst or modeler builds a statistical, predictive model, it can be imported and deployed in seconds to
Causata for real-time, on-demand execution. Each time an individual customer profile is requested or updated, any
applicable model is evaluated for that customer, ensuring that the scores in the customer’s predictive profile are
always up-to-date. Model execution is performed in parallel across the cluster as profiles are assembled, and model
scores are computed just like any other variable.

Since a predictive model score is just like any other variable in a customer’s Predictive Profile, it can be used in
queries, for example, to retrieve event streams, predictive profiles or even just a list of all customers with a high
predicted probability of churn. Scores can also be used in real-time decision-making — for instance, to determine
what content to show on a web page or to guide a call-center agent towards the optimal cross-sell offer for a
customer.
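
As a sketch of how a model score can behave like any other profile variable, the Python below evaluates a logistic regression model on a profile and adds the result under its own name. The coefficients, the variable names, and the functions `logistic_score` and `with_scores` are hypothetical; real coefficients would come from a model fitted by an analyst.

```python
import math

# Hypothetical coefficients for a churn model, for illustration only.
CHURN_MODEL = {
    "intercept": -1.0,
    "weights": {"days_since_last_visit": 0.05, "total_spend": -0.01},
}

def logistic_score(profile, model):
    """Evaluate a logistic regression model on a profile's variables."""
    z = model["intercept"] + sum(
        weight * profile[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def with_scores(profile, models):
    """Return the profile with each model score added as an ordinary variable."""
    scored = dict(profile)
    for name, model in models.items():
        scored[name] = logistic_score(profile, model)
    return scored

profile = {"days_since_last_visit": 40, "total_spend": 100.0}
scored = with_scores(profile, {"churn_probability": CHURN_MODEL})
```

The score is recomputed whenever the profile is assembled, so a downstream filter such as `scored["churn_probability"] > 0.8` always sees a value based on the latest variables.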




Reading Data from Causata

Data is retrieved from Causata at either the customer or event level.

At the customer level, Causata SQL, a familiar SQL-style query language, allows queries to be framed around customer
behavior, enabling the business analyst or data scientist to ask structured questions of unstructured data. These
queries are executed in parallel across all the data stores, returning event streams, predictive profiles or modeled
scores. The queries may include combinations of specific events, profile variables, and predictive scores to select
customer records.

For example, an analyst in a retail bank might select all customers who have used online bill pay from a mobile
device in the last week, and who have downloaded a promotional bank email in the last 90 days. The output is a
structured set of records, one for every customer who satisfies the query, in a predictive record set for analysis.
By allowing the analyst to ask new questions of a massive data set, Causata saves a huge amount of time
traditionally wasted in ‘data-wrangling.’
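
This paper does not show Causata SQL syntax, so the retail bank example is sketched here in plain Python rather than in the query language itself. The event fields (`type`, `device`, `category`, `time`) and the functions `happened` and `select_customers` are hypothetical.

```python
from datetime import datetime, timedelta

def happened(timeline, now, days, **conditions):
    """True if any event in the last `days` days matches all conditions."""
    start = now - timedelta(days=days)
    return any(
        start <= e["time"] <= now
        and all(e.get(k) == v for k, v in conditions.items())
        for e in timeline
    )

def select_customers(timelines, now):
    """Customers with mobile bill pay in the last week and a promotional
    email download in the last 90 days."""
    return sorted(
        cid
        for cid, tl in timelines.items()
        if happened(tl, now, 7, type="bill_pay", device="mobile")
        and happened(tl, now, 90, type="email_download", category="promotional")
    )

now = datetime(2012, 12, 10)
timelines = {
    "c1": [
        {"type": "email_download", "category": "promotional", "time": datetime(2012, 10, 1)},
        {"type": "bill_pay", "device": "mobile", "time": datetime(2012, 12, 8)},
    ],
    "c2": [
        {"type": "bill_pay", "device": "desktop", "time": datetime(2012, 12, 9)},
    ],
}
selected = select_customers(timelines, now)
```

Only customer `c1` meets both conditions in this sample data; `c2` paid a bill from a desktop and has no email download.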

Analysts and marketing scientists can choose to run a complete query for all customers who meet specific criteria
or just retrieve a sample for initial analysis. Causata arranges the customer data to ensure that any sample is
statistically unbiased and can be used for reliable analysis.

Causata SQL enables analysts to leverage data visualization tools such as Tableau, QlikView, and Excel for further
analysis, dashboarding and reporting. Statistical modelers can query and access Causata data directly from their
R environment, and then easily import their R models into Causata for real-time operational scoring.

Causata event data can also be queried using Hadoop tools such as Hive and Cloudera Impala, which respectively
enable batch and interactive querying of Causata’s raw event data. This is valuable for queries not structured
specifically around individual customer behavior, but rather for traditional macro segmentation business
intelligence analyses.


Summary

Causata consumes multi-structured customer data from all digital channels, connects and stores it by customer
event, and assembles it into an optimal format for customer analysis and prediction.

A powerful Causata SQL query language allows the retrieval of customer records in a predictive record set structure
for predictive analysis, and the underlying HBase event storage may be queried using standard Hadoop tools.

Causata scales to millions of customer records and is a highly flexible application, making it easy to add new data
sources and ask new questions of the data. Low latency access to individual predictive profiles enables real-time
actions, tailored to the individual customer.


To learn more about us, visit us at causata.com, follow us on Twitter @Causata, or contact us for a demo.




 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

White Paper: Causata Big Data Architecture

  • 1. White Paper: Causata Big Data Architecture December 2012 © 2012 Causata Inc. · All Rights Reserved
  • 2. TABLE OF CONTENTS · Introduction · Event Storage in HBase · Writing Data into Causata · Data Principles · Customer Identities & Event Timelines · Predictive Profiles · Model Scores & Behavioral Predictors · Reading Data from Causata · Summary · Contact
  • 3. Introduction

Causata’s customer experience management applications are built upon parallel big data storage that enables the efficient analysis of terabytes of diverse, granular, multi-structured customer data.

Causata stitches together unstructured and structured customer interaction data from any digital source or channel, then assembles it into concise, structured customer records suitable for ad-hoc analysis, predictive modeling, and advanced machine learning.

Causata’s data storage layer is customer- and event-oriented, so every customer interaction is stored in full detail. Parallel storage and computation provide low-latency access to each customer’s record set to drive real-time actions and decisions.

Event Storage in HBase

At its lowest architectural level, Causata uses HBase to store a vast set of granular event data. HBase is a highly scalable data store that forms part of the open-source Hadoop product suite, and provides a robust, inexpensive way to store every individual customer interaction.

Causata stores detailed customer interaction records from any digital channel, such as a web click, a product purchase, an email or a tweet. Each data point is recorded as a simple set of key-value pairs called an event. For example, a product purchase might have a SKU, a brand, a price, a size and a color; a web click might have a URL, a page category, a browser type, a language setting and a time zone.

Causata turns this messy, multi-structured event data into structured data for analysis, sometimes called ‘rectangular data’ because each customer record has the same set of computed fields.

Causata’s implementation of HBase makes it easy to add new customer interaction data types. Causata does not have a traditional fixed or relational data schema. Data from any source can be loaded or streamed into Causata, and the structure and signal extraction are applied later, when the data is read.
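
The idea of applying structure only when data is read can be illustrated with a minimal sketch. It assumes events are plain Python dictionaries with hypothetical field names (`customer`, `type`, `price`); the white paper does not specify Causata's actual storage layout or field names, so everything below is illustrative only.

```python
from collections import defaultdict

# Hypothetical events: free-form key-value pairs with no shared schema.
events = [
    {"customer": "c1", "type": "purchase", "sku": "A-100", "brand": "Acme", "price": 40.0},
    {"customer": "c1", "type": "click", "url": "/shoes", "browser": "Firefox"},
    {"customer": "c2", "type": "click", "url": "/home", "language": "fr"},
    {"customer": "c1", "type": "purchase", "sku": "B-200", "brand": "Zed", "price": 15.5},
]


def rectangular(events):
    """Apply structure at read time: one row per customer, same fields in every row."""
    rows = defaultdict(lambda: {"purchases": 0, "total_spend": 0.0, "clicks": 0})
    for event in events:
        row = rows[event["customer"]]
        if event["type"] == "purchase":
            row["purchases"] += 1
            row["total_spend"] += event.get("price", 0.0)
        elif event["type"] == "click":
            row["clicks"] += 1
    return dict(rows)


table = rectangular(events)
```

Every row in `table` has the same three computed fields, even though the underlying events carry different keys. A new event type can be stored without changing anything and only needs handling once a computed field depends on it.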

In order to enable fast access to individual customer records, data is stored redundantly in Causata across multiple servers. This protects against data loss and enables high-volume data retrieval and analysis through the use of parallel processing.

  • 4. Writing Data into Causata

Causata has a simple HTTP Data Connector, to which an event is written as a JSON object. Because Causata is schema-free, it is easy to input any digital customer interaction, whether behavioral, social or transactional. Causata consumes real-time feeds or streams, log and CSV files, and ODBC connections to databases and data warehouses, and plugs easily into any ETL tool, including the open-source tools Pentaho and Talend. Data can be loaded or streamed into Causata from an existing Hadoop or HBase data store by running a MapReduce job to generate input events for Causata.

Examples of data sets feeding Causata include web and email analytics, web tags and tag management systems, mobile apps, social data streams, CRM and ERP data, machine logs, and data management platforms (DMPs).

Data Principles

Causata was designed with three data principles in mind: Scalability, Flexibility, and Low Latency.

Scalability across terabytes of unstructured customer interaction records relies on parallel computing: sharing the data storage across horizontally scaling servers and performing the analytic processes in parallel, close to the data.

Flexibility is essential to cope with rapid and unpredictable changes in how customer data is generated and consumed. Causata does not impose a fixed database schema, and allows the definition of customer records for analysis to be made dynamically at query time.

Low Latency data access is critical both to allow business analysts and marketing scientists to perform interactive investigation of the data, and to drive real-time personalized marketing decisions from the data analysis. This means retrieval and assembly of customer profiles in 50 milliseconds or less, including their very latest interactions across multiple channels.
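
As a rough sketch of the JSON event format mentioned under Writing Data into Causata, the snippet below serializes one event. The white paper does not document the Data Connector's payload or endpoint, so the field names (`type`, `timestamp`, `identifiers`, `attributes`) and the helper `make_event` are assumptions for illustration, and no HTTP call is made.

```python
import json
from datetime import datetime, timezone


def make_event(event_type, identifiers, attributes, when=None):
    """Build one event as a JSON string. All field names are illustrative assumptions."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "type": event_type,
            "timestamp": when.isoformat(),
            "identifiers": identifiers,  # e.g. cookie, email, account number
            "attributes": attributes,    # free-form key-value pairs, no fixed schema
        },
        sort_keys=True,
    )


payload = make_event(
    "purchase",
    {"email": "jane@example.com"},
    {"sku": "A-100", "price": 40.0},
    when=datetime(2012, 12, 1, 9, 30, tzinfo=timezone.utc),
)
```

Because the attributes are an open dictionary, a new kind of interaction needs no schema change on the writing side.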

Customer Identities & Event Timelines

A key element of Causata’s big data engine is its Identity Graph. By observing patterns of identifiers that occur together, Causata builds up a graph connecting identifiers to an individual and ascribes each data fragment to the correct customer. This picture becomes richer over time as new pieces of linking customer information are recorded.

For example, if a customer logs into her web account from home and then a week later does the same from her work computer, both cookies become linked and the two sets of web activity data are merged into a single event stream, providing a richer profile for that customer.
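
The cookie-linking example above can be sketched with a union-find (disjoint-set) structure, in which identifiers seen together are merged into one connected component per customer. This is a generic technique chosen for illustration, not necessarily how Causata implements its Identity Graph; the class name and the identifier strings are hypothetical.

```python
class IdentityGraph:
    """Union-find over identifiers; each connected component is one customer."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Return the representative identifier of the component containing ident."""
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path straight at the root.
        while self.parent[ident] != root:
            self.parent[ident], ident = root, self.parent[ident]
        return root

    def link(self, a, b):
        """Record that identifiers a and b were observed together."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_customer(self, a, b):
        return self.find(a) == self.find(b)


graph = IdentityGraph()
graph.link("cookie:home", "account:42")  # login from the home computer
graph.link("cookie:work", "account:42")  # login from work a week later
```

After the second login, `graph.same_customer("cookie:home", "cookie:work")` holds, so events recorded under either cookie can be merged into one timeline. An identifier that has never been linked stays in its own component.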

  • 5. Data from email, mobile, social, and bricks-and-mortar channels are easily combined in the same way, by matching identifiers such as credit and loyalty cards, account numbers, email addresses, and telephone numbers. The Identity Graph adjusts to new connection events, providing as complete a picture as possible of an individual customer at any point in time.

Causata organizes and stores interaction data by individual customer, forming a single event-based Customer Timeline. Retaining the detailed event sequence in chronological order allows business analysts to analyze cause and effect in customer behavior, and to investigate specific scenarios or path analyses. This essential time ordering is typically lost in other data systems, such as when data is pre-aggregated in a data warehouse.

Predictive Profiles

Event streams, or Customer Timelines, are valuable for path analysis, but are difficult to consume for ad-hoc analysis or statistical modeling. Causata distills customer event streams and their descriptive attributes into a set of predictive variables, or aggregates, computed over specific timescales. For example, total spend in the past month is computed by summing the prices of all of a customer’s purchase events in that period. Useful industry-specific variables for Financial Services, Communications, and Digital Media are pre-built within Causata and can also be easily set up and managed by business analysts.

Causata leverages its parallel compute power to calculate these variables on demand as customer data is read. Calculation on demand ensures that customer profiles are always up to date and takes into account the customer’s most recent activity. New predictors or variables can be defined in seconds and are then immediately available through customer profiles.

Model Scores & Behavioral Predictors

Causata provides pre-built regression models to determine the accuracy or predictive power of variables based on cause and effect.
These linear and logistic regression models enable analysts and marketing scientists to quickly identify the most valuable variables for their customer analyses.

Once an analyst or modeler builds a statistical, predictive model, it can be imported and deployed in seconds to Causata for real-time, on-demand execution. Each time an individual customer profile is requested or updated, any applicable model is evaluated for that customer, ensuring that the scores in the customer’s predictive profile are always up to date. Model execution is performed in parallel across the cluster as profiles are assembled, and model scores are computed just like any other variable.

Since a predictive model score is just like any other variable in a customer’s Predictive Profile, it can be used in queries, for example to retrieve event streams, predictive profiles or even just a list of all customers with a high predicted probability of churn. Scores can also be used in real-time decision-making, for instance to determine what content to show on a web page or to guide a call-center agent towards the optimal cross-sell offer for a customer.
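
As a sketch of how a profile variable and a model score might both be computed on demand, the snippet below derives "total spend in the past month" from a timeline and feeds it into a logistic regression score. The event fields, the 30-day window, the variable name `spend_30d`, and the model weights are all assumptions made for illustration; none come from Causata.

```python
import math
from datetime import datetime, timedelta


def total_spend(timeline, now, days=30):
    """Sum purchase prices in the trailing window, computed when the profile is read."""
    cutoff = now - timedelta(days=days)
    return sum(
        event["price"]
        for event in timeline
        if event["type"] == "purchase" and cutoff <= event["time"] <= now
    )


def logistic_score(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def predictive_profile(timeline, now, weights, bias):
    """Assemble variables first, then treat the model score as one more variable."""
    features = {"spend_30d": total_spend(timeline, now)}
    return {**features, "churn_score": logistic_score(features, weights, bias)}


now = datetime(2012, 12, 15)
timeline = [
    {"type": "purchase", "price": 30.0, "time": now - timedelta(days=40)},
    {"type": "purchase", "price": 20.0, "time": now - timedelta(days=5)},
    {"type": "click", "url": "/offers", "time": now - timedelta(days=1)},
]
# Hypothetical model: higher recent spend lowers the predicted churn probability.
profile = predictive_profile(timeline, now, weights={"spend_30d": -0.05}, bias=1.0)
```

Because nothing is pre-aggregated, a purchase recorded a moment ago changes both `spend_30d` and `churn_score` the next time the profile is assembled.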

  • 6. Reading Data from Causata

Data is retrieved from Causata at either the customer or event level. At the customer level, a familiar Causata SQL query language allows queries to be framed around customer behavior, enabling the business analyst or data scientist to ask structured questions of unstructured data. These queries are executed in parallel across all the data stores, returning event streams, predictive profiles or modeled scores. The queries may include combinations of specific events, profile variables, and predictive scores to select customer records.

A simple example query by an analyst in a retail bank might select all customers who have used online bill pay from a mobile device in the last week, and who have downloaded a promotional bank email in the last 90 days. The output is a structured set of records for every customer who satisfies this query, in a predictive record set for analysis. By allowing the analyst to ask new questions of a massive data set, Causata saves a huge amount of time traditionally wasted in ‘data-wrangling.’

Analysts and marketing scientists can choose to run a complete query for all customers who meet specific criteria or just retrieve a sample for initial analysis. Causata arranges the customer data to ensure that any sample is statistically unbiased and can be used for reliable analysis.

Causata SQL enables analysts to leverage data visualization tools such as Tableau, QlikView, and Excel for further analysis, dashboarding and reporting. Statistical modelers can query and access Causata data directly from their R environment, and then easily import their R models into Causata for real-time operational scoring.

Causata event data can also be queried using Hadoop tools such as Hive and Cloudera Impala, which respectively enable batch and interactive querying of Causata’s raw event data.
This is valuable for queries that are not structured around individual customer behavior, such as traditional macro-segmentation business intelligence analyses.

Summary

Causata consumes multi-structured customer data from all digital channels, connects and stores it by customer event, and assembles it into an optimal format for customer analysis and prediction. A powerful Causata SQL query language allows the retrieval of customer records in a predictive record set structure for predictive analysis, and the underlying HBase event storage may be queried using standard Hadoop tools.

Causata scales to millions of customer records and is a highly flexible application, making it easy to add new data sources and ask new questions of the data. Low-latency access to individual predictive profiles enables real-time actions, tailored to the individual customer.

To learn more about us, visit us at causata.com, follow us on Twitter @Causata, or contact us for a demo.
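
The retail-bank query described under Reading Data from Causata can be approximated in plain Python over per-customer timelines. This is not Causata SQL, whose syntax the white paper does not show. The event fields (`type`, `device`, `campaign`) and the helper names are hypothetical, and the seeded uniform sample is only a stand-in for the unbiased sampling the paper mentions.

```python
import random
from datetime import datetime, timedelta


def has_event(timeline, now, days, **match):
    """True if the timeline holds an event in the last `days` whose fields equal `match`."""
    cutoff = now - timedelta(days=days)
    return any(
        cutoff <= event["time"] <= now
        and all(event.get(key) == value for key, value in match.items())
        for event in timeline
    )


def select_customers(timelines, now):
    """Mobile bill pay in the last week AND a promotional email download in the last 90 days."""
    return sorted(
        customer
        for customer, timeline in timelines.items()
        if has_event(timeline, now, 7, type="bill_pay", device="mobile")
        and has_event(timeline, now, 90, type="email_download", campaign="promo")
    )


def sample_customers(customer_ids, k, seed=0):
    """Uniform random sample without replacement, for a quick first look at the data."""
    ids = list(customer_ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


now = datetime(2012, 12, 15)


def ago(days):
    return now - timedelta(days=days)


timelines = {
    "c1": [
        {"type": "bill_pay", "device": "mobile", "time": ago(2)},
        {"type": "email_download", "campaign": "promo", "time": ago(30)},
    ],
    "c2": [  # paid from a desktop, so the first condition fails
        {"type": "bill_pay", "device": "desktop", "time": ago(1)},
        {"type": "email_download", "campaign": "promo", "time": ago(10)},
    ],
    "c3": [  # the email download is older than 90 days
        {"type": "bill_pay", "device": "mobile", "time": ago(3)},
        {"type": "email_download", "campaign": "promo", "time": ago(120)},
    ],
}
matches = select_customers(timelines, now)
```

Only `c1` satisfies both conditions. In a parallel store, the same per-customer predicate could be evaluated independently on each server, since it only needs a single customer's timeline.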