White Paper:
Causata Big Data Architecture
December 2012




© 2012 Causata Inc. · All Rights Reserved
TABLE OF CONTENTS

        · Introduction
        · Event Storage in HBase
        · Writing Data into Causata
        · Data Principles
        · Customer Identities & Event Timelines
        · Predictive Profiles
        · Model Scores & Behavioral Predictors
        · Reading Data from Causata
        · Summary
        · Contact




Introduction

Causata’s customer experience management applications are built upon parallel big data storage that enables the
efficient analysis of terabytes of diverse, granular, multi-structured customer data.

Causata stitches together unstructured and structured customer interaction data from any digital source or channel
and assembles it into concise, structured customer records suitable for ad-hoc analysis, predictive modeling, and
advanced machine learning.

Causata’s data storage layer is customer- and event-oriented: every customer interaction is stored in full detail,
and parallel storage and computation provide low-latency access to each customer’s records to drive real-time
actions and decisions.




Event Storage in HBase

At its lowest architectural level, Causata uses HBase to store a vast set of granular event data. HBase is a highly
scalable, open-source data store that is part of the Apache Hadoop ecosystem, and it provides a robust, inexpensive
way to store every individual customer interaction.

Causata stores detailed customer interaction records from any digital channel, such as a web click, a product
purchase, an email or a tweet. Each data point is recorded as a simple set of key-value pairs called an event. For
example, a product purchase might have a SKU, a brand, a price, a size and color; a web click might have a URL, a
page category, a browser type, a language setting and a time zone. Causata turns this messy, multi-structured event
data into structured data for analysis, sometimes called ‘rectangular data’ because each customer record has the
same set of computed fields.
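To make the event model concrete, the sketch below represents each event as a plain Python dictionary of key-value pairs and collapses each customer’s events into one ‘rectangular’ row with a fixed set of computed fields. It is a minimal illustration under assumptions: the field names (`customer_id`, `type`, `price`, and so on) and the three computed fields are hypothetical, not Causata’s actual event format.

```python
from collections import defaultdict

# Hypothetical raw events: each is just a set of key-value pairs, and
# different event types carry different keys (multi-structured data).
events = [
    {"customer_id": "c1", "type": "purchase", "sku": "A-100", "brand": "Acme",
     "price": 40.0, "size": "M", "color": "blue"},
    {"customer_id": "c1", "type": "web_click", "url": "/shoes",
     "page_category": "footwear", "browser": "Firefox"},
    {"customer_id": "c2", "type": "web_click", "url": "/home",
     "page_category": "home", "browser": "Chrome"},
]

# Every output row has exactly these columns, whatever events the customer has.
FIELDS = ("customer_id", "purchase_count", "total_spend", "click_count")


def to_rectangular(events):
    """Collapse multi-structured events into one fixed-width row per customer."""
    rows = defaultdict(lambda: {"purchase_count": 0, "total_spend": 0.0,
                                "click_count": 0})
    for event in events:
        row = rows[event["customer_id"]]
        if event["type"] == "purchase":
            row["purchase_count"] += 1
            row["total_spend"] += event.get("price", 0.0)
        elif event["type"] == "web_click":
            row["click_count"] += 1
    return [tuple([cid] + [row[f] for f in FIELDS[1:]])
            for cid, row in sorted(rows.items())]


for record in to_rectangular(events):
    print(record)
# ('c1', 1, 40.0, 1)
# ('c2', 0, 0.0, 1)
```

Customer `c2` has no purchases, yet its row still carries the purchase columns (as zeros); that uniformity is what makes the output ‘rectangular’.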

Causata’s use of HBase makes it easy to add new types of customer interaction data. Causata has no traditional fixed
or relational data schema: data from any source can be loaded or streamed into Causata, and structure and signal
extraction are applied later, when the data is read.

To enable fast access to individual customer records, Causata stores data redundantly across multiple servers. This
protects against data loss and enables high-volume data retrieval and analysis through parallel processing.


Writing Data into Causata

Causata has a simple HTTP Data Connector, to which an event is written as a JSON object. Because Causata is
schema-free, it is easy to input any digital customer interaction – behavioral, social or transactional.
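A minimal sketch of writing one event to an HTTP connector as JSON, using only the Python standard library. The endpoint URL, the field names, and the use of `POST` with an `application/json` content type are assumptions for illustration; the paper does not document the connector’s URL or payload conventions. The request is built but not sent, so the example runs without a server.

```python
import json
import urllib.request

# Hypothetical endpoint: the paper does not document the connector's URL.
CONNECTOR_URL = "http://causata.example.com/events"


def build_event_request(event, url=CONNECTOR_URL):
    """Serialize one event as JSON and wrap it in an HTTP POST request."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = {
    "customer_id": "c1",  # hypothetical identifier field
    "type": "purchase",
    "timestamp": "2012-12-01T10:15:00Z",
    "sku": "A-100",
    "price": 40.0,
}
request = build_event_request(event)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a live server.
print(request.get_method(), request.full_url)
```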

Causata consumes real-time feeds or streams, log and CSV files, and ODBC connections to databases and data
warehouses, and plugs in easily to any ETL tool, including the open-source tools Pentaho and Talend. Data can be
loaded or streamed into Causata from an existing Hadoop or HBase data store by running a MapReduce job that
generates input events for Causata.
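Flat files map naturally onto the same key-value event model. The sketch below assumes a hypothetical CSV export with `customer_id`, `timestamp`, `sku`, and `price` columns and turns each row into an event dictionary; in practice an ETL tool or MapReduce job would play this role.

```python
import csv
import io
import json

# Hypothetical export from a transactional system; column names are examples.
CSV_DATA = """customer_id,timestamp,sku,price
c1,2012-12-01T10:15:00Z,A-100,40.00
c2,2012-12-02T08:30:00Z,B-200,15.50
"""


def csv_to_events(text, event_type):
    """Turn each CSV row into a flat key-value event tagged with a type."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        event = dict(row)
        event["type"] = event_type
        event["price"] = float(event["price"])  # keep numeric fields numeric
        events.append(event)
    return events


for event in csv_to_events(CSV_DATA, "purchase"):
    print(json.dumps(event))
```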

Examples of data sets feeding Causata include web and email analytics, web tags and tag management systems,
mobile apps, social data streams, CRM and ERP data, machine logs, and data management platforms (DMPs).


Data Principles

Causata was designed with three data principles in mind:

Scalability, Flexibility, and Low Latency

Scalability across terabytes of unstructured customer interaction records relies on parallel computing – sharing the
data storage across horizontally scaling servers and performing the analytic processes in parallel, close to the data.
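As a rough illustration of sharing data across horizontally scaling servers and computing close to the data, the sketch below hash-partitions events by customer ID into shards, runs a per-shard computation in parallel, and merges the partial results. Threads stand in for servers, and the shard count and MD5-based partitioning are illustrative assumptions, not how Causata or HBase distribute data (HBase itself splits tables into row-key ranges called regions).

```python
import hashlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # illustrative stand-in for the number of servers


def shard_for(customer_id, num_shards=NUM_SHARDS):
    """Map a customer to a shard with a stable hash (Python's hash() is salted per run)."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(events, num_shards=NUM_SHARDS):
    """Place all of a customer's events on the same shard."""
    shards = [[] for _ in range(num_shards)]
    for event in events:
        shards[shard_for(event["customer_id"], num_shards)].append(event)
    return shards


def count_events_per_customer(shard):
    """Local work that runs next to one shard's data."""
    return Counter(event["customer_id"] for event in shard)


def parallel_counts(events):
    shards = partition(events)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(count_events_per_customer, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)  # shards hold disjoint customers, so this merges them
    return total


events = [{"customer_id": f"c{i % 10}"} for i in range(100)]
print(parallel_counts(events)["c3"])  # 10
```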

Flexibility is essential to cope with rapid and unpredictable changes in how customer data is generated and
consumed. Causata does not impose a fixed database schema, and allows the definition of customer records for
analysis to be made dynamically at query time.

Low-latency data access is critical both to let business analysts and marketing scientists investigate the data
interactively and to drive real-time personalized marketing decisions from the analysis. This means retrieving and
assembling a customer profile, including the customer’s very latest interactions across multiple channels, in 50
milliseconds or less.


Customer Identities & Event Timelines

A key element of Causata’s big data engine is its Identity Graph. By observing patterns of identifiers that occur
together, Causata builds up a graph connecting identifiers to an individual and ascribes each data fragment to
the correct customer. This picture becomes richer over time as new pieces of linking customer information
are recorded.

For example, if a customer logs into her web account from home and then a week later does the same from her
work computer, both cookies become linked and the two sets of web activity data are merged into a single event
stream, providing a richer profile for that customer.




Data from email, mobile, social, and bricks-and-mortar channels are easily combined in the same way, by
matching identifiers such as credit and loyalty cards, account numbers, email addresses, and telephone numbers.

The Identity Graph adjusts to new connection events, providing as complete a picture as possible of an individual
customer at any point in time.
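The paper does not say how the Identity Graph is implemented. One standard way to group identifiers that co-occur is a union-find (disjoint-set) structure, sketched below as a swapped-in technique; the identifier strings are made up and mirror the home and work cookie example above.

```python
class IdentityGraph:
    """Union-find over identifiers: identifiers seen together end up in one group."""

    def __init__(self):
        self.parent = {}

    def _find(self, ident):
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[ident] != root:
            self.parent[ident], ident = root, self.parent[ident]
        return root

    def link(self, *identifiers):
        """Record that these identifiers were observed together in one event."""
        roots = [self._find(i) for i in identifiers]
        for other in roots[1:]:
            self.parent[self._find(other)] = self._find(roots[0])

    def same_customer(self, a, b):
        return self._find(a) == self._find(b)


graph = IdentityGraph()
graph.link("cookie:home-123", "login:alice")  # logs in from home
graph.link("cookie:work-456", "login:alice")  # a week later, from work
graph.link("email:alice@example.com", "loyalty:9876")
print(graph.same_customer("cookie:home-123", "cookie:work-456"))  # True
print(graph.same_customer("cookie:home-123", "loyalty:9876"))     # False
```

A later connection event, such as a login that also carries the email address, would merge the two groups with one more `link` call, which is how the picture becomes richer over time.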

Causata organizes and stores interaction data by individual customer, forming a single event-based Customer
Timeline. Retaining the detailed event sequence, in chronological time order, allows business analysts to analyze
cause and effect in customer behavior, and to investigate specific scenarios or path analyses. This essential time
ordering is typically lost in other data systems, such as when data is pre-aggregated in a data warehouse.
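A minimal sketch of a Customer Timeline, assuming each channel yields events with an ISO 8601 timestamp in a hypothetical `ts` field: events from all channels are merged into one chronologically ordered list, so simple path questions (did one kind of event precede another?) can be answered with a single scan.

```python
from datetime import datetime

# Hypothetical per-channel events for one customer whose identities are already linked.
web = [
    {"ts": "2012-12-03T09:00:00", "type": "web_click", "url": "/offers"},
    {"ts": "2012-12-05T18:30:00", "type": "web_click", "url": "/checkout"},
]
email = [{"ts": "2012-12-04T07:45:00", "type": "email_open", "campaign": "winter"}]
store = [{"ts": "2012-12-05T19:10:00", "type": "purchase", "price": 40.0}]


def build_timeline(*streams):
    """Merge events from several channels into one chronologically ordered list."""
    merged = [event for stream in streams for event in stream]
    merged.sort(key=lambda event: datetime.fromisoformat(event["ts"]))
    return merged


def happened_before(timeline, first_type, then_type):
    """True if some `first_type` event occurs earlier than some `then_type` event."""
    seen_first = False
    for event in timeline:
        if event["type"] == then_type and seen_first:
            return True
        if event["type"] == first_type:
            seen_first = True
    return False


timeline = build_timeline(web, email, store)
print([event["type"] for event in timeline])
# ['web_click', 'email_open', 'web_click', 'purchase']
print(happened_before(timeline, "email_open", "purchase"))  # True
```

A pre-aggregated count such as ‘one email open, one purchase’ could not answer the second question; only the ordered timeline can.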


Predictive Profiles

Event streams or Customer Timelines are valuable for path analysis, but are difficult to consume for ad-hoc
analysis or statistical modeling. Causata distills customer event streams and their descriptive attributes into a set
of predictive variables, or aggregates, computed over specific timescales.

For example, total spend in the past month is computed by summing the prices of all of a customer’s purchase
events in that period. Useful industry-specific variables for Financial Services, Communications, and Digital Media
are pre-built within Causata and are also easily set up and managed by business analysts.
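The total-spend example can be written as a windowed aggregate over a customer’s timeline. This sketch approximates ‘the past month’ as a 30-day window ending at the read time; the window length and the event fields are assumptions.

```python
from datetime import datetime, timedelta


def total_spend(timeline, as_of, window=timedelta(days=30)):
    """Sum purchase prices for events in the window (as_of - window, as_of]."""
    start = as_of - window
    return sum(
        event["price"]
        for event in timeline
        if event["type"] == "purchase"
        and start < datetime.fromisoformat(event["ts"]) <= as_of
    )


timeline = [
    {"ts": "2012-10-20T12:00:00", "type": "purchase", "price": 99.0},  # too old
    {"ts": "2012-11-15T12:00:00", "type": "purchase", "price": 40.0},
    {"ts": "2012-11-20T09:00:00", "type": "web_click", "url": "/shoes"},
    {"ts": "2012-12-01T08:00:00", "type": "purchase", "price": 15.5},
]
print(total_spend(timeline, as_of=datetime(2012, 12, 5)))  # 55.5
```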

Causata leverages its parallel compute power to calculate these variables on demand as customer data is read.
Calculation on demand ensures that customer profiles are always up to date and takes into account the customer’s
most recent activity. New predictors or variables can be defined in seconds and are then immediately available
through customer profiles.
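A sketch of the compute-on-read idea, assuming variables are registered as plain Python functions of a timeline (the `variable` decorator and the registry are hypothetical names). Because nothing is precomputed, both a newly defined variable and the customer’s newest events show up on the next read.

```python
from datetime import datetime, timedelta

VARIABLES = {}  # hypothetical registry: variable name -> function(timeline, as_of)


def variable(name):
    """Register a profile variable definition under `name`."""
    def register(fn):
        VARIABLES[name] = fn
        return fn
    return register


@variable("purchase_count_30d")
def purchase_count_30d(timeline, as_of):
    start = as_of - timedelta(days=30)
    return sum(1 for e in timeline
               if e["type"] == "purchase" and start < e["ts"] <= as_of)


def build_profile(timeline, as_of):
    """Evaluate every registered variable at read time; nothing is precomputed."""
    return {name: fn(timeline, as_of) for name, fn in VARIABLES.items()}


now = datetime(2012, 12, 5)
timeline = [{"ts": datetime(2012, 12, 1), "type": "purchase", "price": 15.5}]
print(build_profile(timeline, now))  # {'purchase_count_30d': 1}


# A new definition is picked up on the very next read.
@variable("days_since_last_event")
def days_since_last_event(timeline, as_of):
    return (as_of - max(e["ts"] for e in timeline)).days


timeline.append({"ts": datetime(2012, 12, 4), "type": "web_click"})  # newest activity
print(build_profile(timeline, now))
# {'purchase_count_30d': 1, 'days_since_last_event': 1}
```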


Model Scores & Behavioral Predictors

Causata provides pre-built linear and logistic regression models that measure how well each variable predicts an
outcome of interest. These models enable analysts and marketing scientists to quickly identify the most valuable
variables for their customer analyses.
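The paper does not say which statistics Causata reports. As a simple stand-in, the sketch below ranks candidate variables by the R² of a one-variable linear fit against a 0/1 outcome (equivalently, the squared Pearson correlation); the profiles and variable names are made up.

```python
def r_squared(xs, ys):
    """R^2 of a one-variable least-squares fit, i.e. the squared Pearson correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable (or outcome) explains nothing
    return sxy * sxy / (sxx * syy)


def rank_variables(profiles, outcome):
    """Score each candidate variable against the outcome, best first."""
    ys = [p[outcome] for p in profiles]
    candidates = [k for k in profiles[0] if k != outcome]
    scores = {k: r_squared([p[k] for p in profiles], ys) for k in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical profiles with a 0/1 churn outcome.
profiles = [
    {"logins_30d": 20, "support_calls_30d": 0, "spend_30d": 120.0, "churned": 0},
    {"logins_30d": 15, "support_calls_30d": 1, "spend_30d": 80.0, "churned": 0},
    {"logins_30d": 2, "support_calls_30d": 4, "spend_30d": 95.0, "churned": 1},
    {"logins_30d": 1, "support_calls_30d": 5, "spend_30d": 60.0, "churned": 1},
    {"logins_30d": 12, "support_calls_30d": 2, "spend_30d": 70.0, "churned": 0},
]
for name, score in rank_variables(profiles, "churned"):
    print(f"{name}: R^2 = {score:.2f}")
```

R² measures association rather than cause and effect; a held-out evaluation would normally follow before relying on a ranking.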

Once an analyst or modeler has built a statistical predictive model, it can be imported into Causata and deployed in
seconds for real-time, on-demand execution. Each time an individual customer profile is requested or updated, every
applicable model is evaluated for that customer, so the scores in the customer’s predictive profile are always up
to date. Model execution runs in parallel across the cluster as profiles are assembled, and model scores are
computed just like any other variable.
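A sketch of a deployed model scored like any other variable, assuming the imported model is a logistic regression stored as an intercept plus one weight per profile variable (a hypothetical format): the score is computed while the profile is assembled and stored alongside the ordinary variables.

```python
import math

# Hypothetical imported model: intercept plus one weight per profile variable.
CHURN_MODEL = {
    "intercept": -1.0,
    "weights": {"logins_30d": -0.3, "support_calls_30d": 0.8},
}


def logistic_score(model, profile):
    """Evaluate a logistic regression model against one customer profile."""
    z = model["intercept"] + sum(
        weight * profile[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def assemble_profile(base_variables, models):
    """Add each model's score to the profile as just another variable."""
    profile = dict(base_variables)
    for score_name, model in models.items():
        profile[score_name] = logistic_score(model, base_variables)
    return profile


profile = assemble_profile({"logins_30d": 1, "support_calls_30d": 5},
                           {"churn_score": CHURN_MODEL})
print(round(profile["churn_score"], 3))  # 0.937
```

Because `churn_score` sits in the same dictionary as `logins_30d`, any filter written for ordinary variables, such as selecting profiles with a score above a threshold, works on it unchanged.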

Since a predictive model score is just like any other variable in a customer’s Predictive Profile, it can be used in
queries, for example, to retrieve event streams, predictive profiles or even just a list of all customers with a high
predicted probability of churn. Scores can also be used in real-time decision-making — for instance, to determine
what content to show on a web page or to guide a call-center agent towards the optimal cross-sell offer for a
customer.




Reading Data from Causata

Data is retrieved from Causata at either the customer or event level.

At the customer level, Causata SQL, a familiar SQL-style query language, allows queries to be framed around customer
behavior, enabling business analysts and data scientists to ask structured questions of unstructured data. These
queries are executed in parallel across all the data stores and return event streams, predictive profiles, or model
scores. Queries may combine specific events, profile variables, and predictive scores to select customer records.

A simple query by an analyst in a retail bank, for example, might select all customers who have used online bill pay
from a mobile device in the last week and who have downloaded a promotional bank email in the last 90 days. The
output is a structured set of records, one for every customer who satisfies the query, in a predictive record set
ready for analysis. By allowing the analyst to ask new questions of a massive data set, Causata saves a huge amount
of time traditionally wasted on ‘data-wrangling.’
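Causata SQL syntax is not shown in this paper, so rather than guess at it, the sketch below expresses the retail-bank selection as a plain Python predicate over customer timelines. The event types (`bill_pay`, `email_download`) and fields (`device`, `campaign_type`) are hypothetical.

```python
from datetime import datetime, timedelta


def matches_bank_query(timeline, as_of):
    """Mobile bill pay in the last 7 days AND a promotional email downloaded in the last 90 days."""
    def within(event, days):
        return as_of - timedelta(days=days) < event["ts"] <= as_of

    paid_on_mobile = any(
        e["type"] == "bill_pay" and e.get("device") == "mobile" and within(e, 7)
        for e in timeline
    )
    got_promo = any(
        e["type"] == "email_download" and e.get("campaign_type") == "promotional"
        and within(e, 90)
        for e in timeline
    )
    return paid_on_mobile and got_promo


timelines = {
    "c1": [
        {"ts": datetime(2012, 12, 3), "type": "bill_pay", "device": "mobile"},
        {"ts": datetime(2012, 10, 1), "type": "email_download",
         "campaign_type": "promotional"},
    ],
    "c2": [  # paid bills on desktop only
        {"ts": datetime(2012, 12, 4), "type": "bill_pay", "device": "desktop"},
        {"ts": datetime(2012, 11, 1), "type": "email_download",
         "campaign_type": "promotional"},
    ],
}
as_of = datetime(2012, 12, 5)
print([cid for cid, tl in timelines.items() if matches_bank_query(tl, as_of)])  # ['c1']
```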

Analysts and marketing scientists can choose to run a complete query for all customers who meet specific criteria
or just retrieve a sample for initial analysis. Causata arranges the customer data to ensure that any sample is
statistically unbiased and can be used for reliable analysis.
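The paper does not describe how Causata keeps samples unbiased. One standard technique, shown here as an assumption, is to sample on a hash of the customer ID: membership does not depend on any behavioral attribute, and a given customer is consistently in or out of the sample at a given rate.

```python
import hashlib


def in_sample(customer_id, rate):
    """Include a customer iff the hash of their ID falls below `rate`."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Top 53 bits of the hash, scaled to a float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


customers = [f"customer-{i}" for i in range(100_000)]
sample = [c for c in customers if in_sample(c, 0.01)]
print(len(sample))  # expected to be near 1,000; the exact count depends on the hash
```

Because a customer’s bucket never changes, raising the rate only adds customers, so a 1% pilot sample is a subset of a later 5% sample.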

Causata SQL enables analysts to leverage data visualization tools such as Tableau, QlikView, and Excel for further
analysis, dashboarding and reporting. Statistical modelers can query and access Causata data directly from their
R environment, and then easily import their R models into Causata for real-time operational scoring.

Causata event data can also be queried using Hadoop tools such as Hive and Cloudera Impala, which enable batch and
interactive querying, respectively, of Causata’s raw event data. This is valuable for queries that are not
structured around individual customer behavior, such as traditional macro-segmentation business intelligence
analyses.


Summary

Causata consumes multi-structured customer data from all digital channels, connects and stores it by customer
event, and assembles it into an optimal format for customer analysis and prediction.

A powerful Causata SQL query language allows the retrieval of customer records in a predictive record set structure
for predictive analysis, and the underlying HBase event storage may be queried using standard Hadoop tools.

Causata scales to millions of customer records and is a highly flexible application, making it easy to add new data
sources and ask new questions of the data. Low latency access to individual predictive profiles enables real-time
actions, tailored to the individual customer.


To learn more about us, visit us at causata.com, follow us on Twitter @Causata, or contact us for a demo.




© 2012 Causata Inc. · All Rights Reserved                                                                         Page 4

Contenu connexe

Tendances

Webinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDBWebinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDBMongoDB
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report Tom Donoghue
 
Micro strategy Reporting Suite
Micro strategy Reporting SuiteMicro strategy Reporting Suite
Micro strategy Reporting SuiteClassic Polo
 
Webinar: How Financial Firms Create a Single Customer View with MongoDB
 Webinar: How Financial Firms Create a Single Customer View with MongoDB Webinar: How Financial Firms Create a Single Customer View with MongoDB
Webinar: How Financial Firms Create a Single Customer View with MongoDBMongoDB
 
Sanjeet Kumar
 Sanjeet Kumar Sanjeet Kumar
Sanjeet Kumaritplant
 
Implementing bi in proof of concept techniques
Implementing bi in proof of concept techniquesImplementing bi in proof of concept techniques
Implementing bi in proof of concept techniquesRanjith Ramanan
 
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAmazon Web Services
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)Syaifuddin Ismail
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Templatebutest
 
Dw hk-white paper
Dw hk-white paperDw hk-white paper
Dw hk-white paperjuly12jana
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRyan Andhavarapu
 
Presentasi 1 - Business Intelligence
Presentasi 1 - Business IntelligencePresentasi 1 - Business Intelligence
Presentasi 1 - Business IntelligenceDEDE IRYAWAN
 

Tendances (20)

Microstrategy
MicrostrategyMicrostrategy
Microstrategy
 
Webinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDBWebinar: Making A Single View of the Customer Real with MongoDB
Webinar: Making A Single View of the Customer Real with MongoDB
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Micro strategy Reporting Suite
Micro strategy Reporting SuiteMicro strategy Reporting Suite
Micro strategy Reporting Suite
 
Webinar: How Financial Firms Create a Single Customer View with MongoDB
 Webinar: How Financial Firms Create a Single Customer View with MongoDB Webinar: How Financial Firms Create a Single Customer View with MongoDB
Webinar: How Financial Firms Create a Single Customer View with MongoDB
 
Sanjeet Kumar
 Sanjeet Kumar Sanjeet Kumar
Sanjeet Kumar
 
Implementing bi in proof of concept techniques
Implementing bi in proof of concept techniquesImplementing bi in proof of concept techniques
Implementing bi in proof of concept techniques
 
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
 
Microsoft Business Intelligence
Microsoft Business IntelligenceMicrosoft Business Intelligence
Microsoft Business Intelligence
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
 
Dw hk-white paper
Dw hk-white paperDw hk-white paper
Dw hk-white paper
 
Data vault
Data vaultData vault
Data vault
 
Enterprise business Inteligence
Enterprise business Inteligence Enterprise business Inteligence
Enterprise business Inteligence
 
Data Mapping eBook
Data Mapping eBookData Mapping eBook
Data Mapping eBook
 
Rev_3 Components of a Data Warehouse
Rev_3 Components of a Data WarehouseRev_3 Components of a Data Warehouse
Rev_3 Components of a Data Warehouse
 
Presentasi 1 - Business Intelligence
Presentasi 1 - Business IntelligencePresentasi 1 - Business Intelligence
Presentasi 1 - Business Intelligence
 
Classification of data
Classification of dataClassification of data
Classification of data
 
JDV Big Data Webinar v2
JDV Big Data Webinar v2JDV Big Data Webinar v2
JDV Big Data Webinar v2
 

Similaire à White Paper: Causata Big Data Architecture

Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricCambridge Semantics
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeCognizant
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureMongoDB
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeThomas Kelly, PMP
 
IRJET- Data Analytics & Visualization using Qlik
IRJET- Data Analytics & Visualization using QlikIRJET- Data Analytics & Visualization using Qlik
IRJET- Data Analytics & Visualization using QlikIRJET Journal
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345AkhilSinghal21
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...
NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...
NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...NexJ Systems Inc.
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
intelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefintelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefLindy-Anne Botha
 
Architecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data AnalyticsArchitecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data AnalyticsRob Winters
 
Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...
Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...
Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...SAP Solution Extensions
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDBMark Kromer
 
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeEvolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeSG Analytics
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Platform_Technical_Overview
Platform_Technical_OverviewPlatform_Technical_Overview
Platform_Technical_OverviewKatia Mar
 

Similaire à White Paper: Causata Big Data Architecture (20)

Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
Data lake
Data lakeData lake
Data lake
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
 
Big Data use cases in telcos
Big Data use cases in telcosBig Data use cases in telcos
Big Data use cases in telcos
 
Big Data use cases in telcos
Big Data use cases in telcosBig Data use cases in telcos
Big Data use cases in telcos
 
IRJET- Data Analytics & Visualization using Qlik
IRJET- Data Analytics & Visualization using QlikIRJET- Data Analytics & Visualization using Qlik
IRJET- Data Analytics & Visualization using Qlik
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...
NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...
NexJ CDM Overview: Better Understand Customers with NexJ Customer Data Manage...
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
intelligent-data-lake_executive-brief
intelligent-data-lake_executive-briefintelligent-data-lake_executive-brief
intelligent-data-lake_executive-brief
 
Architecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data AnalyticsArchitecting for Real-Time Big Data Analytics
Architecting for Real-Time Big Data Analytics
 
Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...
Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...
Capture and Feed Telecom Network Data and More Into SAP HANA - Quicky and Aff...
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDB
 
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeEvolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Platform_Technical_Overview
Platform_Technical_OverviewPlatform_Technical_Overview
Platform_Technical_Overview
 

Dernier

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
White Paper:
Causata Big Data Architecture
December 2012

© 2012 Causata Inc. · All Rights Reserved
TABLE OF CONTENTS

· Introduction
· Event Storage in HBase
· Writing Data into Causata
· Data Principles
· Customer Identities & Event Timelines
· Predictive Profiles
· Model Scores & Behavioral Predictors
· Reading Data from Causata
· Summary
· Contact
Introduction

Causata’s customer experience management applications are built on parallel big data storage that enables efficient analysis of terabytes of diverse, granular, multi-structured customer data.

Causata stitches together unstructured and structured customer interaction data from any digital source or channel. It then assembles that data into concise, structured customer records suitable for ad-hoc analysis, predictive modeling, and advanced machine learning.

Causata’s data storage layer is customer- and event-oriented, so every customer interaction is stored in full detail. Parallel storage and computation provide low-latency access to each customer’s record set to drive real-time actions and decisions.

Event Storage in HBase

At its lowest architectural level, Causata uses HBase to store a vast set of granular event data. HBase is a highly scalable data store that forms part of the open-source Hadoop product suite. It provides a robust, inexpensive way to store every individual customer interaction.

Causata stores detailed customer interaction records from any digital channel, such as a web click, a product purchase, an email or a tweet. Each data point is recorded as a simple set of key-value pairs called an event. For example, a product purchase might have a SKU, a brand, a price, a size and a color. A web click might have a URL, a page category, a browser type, a language setting and a time zone.

Causata turns this messy, multi-structured event data into structured data for analysis. This is sometimes called ‘rectangular data’ because each customer record has the same set of computed fields.

Causata’s use of HBase makes it easy to add new types of customer interaction data. Causata does not have a traditional fixed or relational data schema. Data from any source can be loaded or streamed into Causata, and structure and signal extraction are applied later, when the data is read.
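The schema-on-read idea above can be sketched in Python. This is a minimal illustration that assumes events arrive as plain dictionaries. The event keys, the customer field and the three computed fields (n_events, n_clicks, total_spend) are hypothetical and do not reflect Causata’s internal storage format or API. Raw events with different keys are stored untouched. A fixed set of fields is computed per customer only when the records are read.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]

# Hypothetical raw events: each is a loose bag of key-value pairs, and different
# event types carry different keys (no schema is enforced when data is written).
RAW_EVENTS: List[Event] = [
    {"customer": "c1", "type": "purchase", "sku": "A-100", "brand": "Acme", "price": 40.0},
    {"customer": "c1", "type": "click", "url": "/shoes", "browser": "Firefox"},
    {"customer": "c2", "type": "purchase", "sku": "B-200", "price": 15.5, "color": "red"},
]

# Hypothetical field definitions supplied at read time; adding one needs no migration.
FieldFn = Callable[[List[Event]], Any]
FIELDS: Dict[str, FieldFn] = {
    "n_events": len,
    "n_clicks": lambda evs: sum(1 for e in evs if e["type"] == "click"),
    "total_spend": lambda evs: sum(e.get("price", 0.0) for e in evs if e["type"] == "purchase"),
}


def rectangular_records(events: List[Event], fields: Dict[str, FieldFn]) -> List[Dict[str, Any]]:
    """Group events by customer and compute the same set of fields for each one."""
    by_customer: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_customer[event["customer"]].append(event)
    return [
        {"customer": cid, **{name: fn(evs) for name, fn in fields.items()}}
        for cid, evs in sorted(by_customer.items())
    ]


if __name__ == "__main__":
    for row in rectangular_records(RAW_EVENTS, FIELDS):
        print(row)
```

Adding a new computed field means adding one entry to FIELDS; the stored events are never rewritten.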
To enable fast access to individual customer records, data is stored redundantly across multiple servers. This protects against data loss and enables high-volume data retrieval and analysis through parallel processing.

Writing Data into Causata

Causata provides a simple HTTP Data Connector, to which each event is written as a JSON object. Because Causata is schema-free, it is easy to input any digital customer interaction, whether behavioral, social or transactional.

Causata consumes real-time feeds or streams, log and CSV files, and ODBC connections to databases and data warehouses. It also plugs easily into any ETL tool, including the open-source tools Pentaho and Talend. Data can be loaded or streamed into Causata from an existing Hadoop or HBase data store by running a MapReduce job that generates input events for Causata.

Examples of data sets feeding Causata include web and email analytics, web tags and tag management systems, mobile apps, social data streams, CRM and ERP data, machine logs, and data management platforms (DMPs).

Data Principles

Causata was designed with three data principles in mind: Scalability, Flexibility, and Low Latency.

Scalability across terabytes of unstructured customer interaction records relies on parallel computing. The data storage is shared across horizontally scaling servers, and the analytic processes run in parallel, close to the data.

Flexibility is essential to cope with rapid and unpredictable changes in how customer data is generated and consumed. Causata does not impose a fixed database schema, and it allows customer records for analysis to be defined dynamically at query time.

Low Latency data access is critical for two reasons. It lets business analysts and marketing scientists investigate the data interactively, and it drives real-time personalized marketing decisions from the data analysis. In practice, this means retrieving and assembling a customer profile in 50 milliseconds or less, including the customer’s very latest interactions across multiple channels.
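Writing an event through the HTTP Data Connector, described under Writing Data into Causata, might look roughly like the sketch below. The paper states only that each event is written as a JSON object over HTTP. The endpoint URL, the payload field names and the use of POST are therefore assumptions made for illustration. The request is built with Python’s standard urllib but not sent.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical endpoint: the white paper does not document the connector's URL.
CONNECTOR_URL = "http://causata.example.com/events"


def build_event_request(identifiers: Dict[str, str], event_type: str,
                        attributes: Dict[str, Any],
                        timestamp: Optional[datetime] = None) -> urllib.request.Request:
    """Serialize one event as a JSON object inside an HTTP POST request.

    The payload field names (identifiers, type, timestamp, attributes) are
    illustrative assumptions, not Causata's documented format.
    """
    ts = timestamp or datetime.now(timezone.utc)
    payload = {
        "identifiers": identifiers,  # e.g. cookie, email, account number
        "type": event_type,
        "timestamp": ts.isoformat(),
        "attributes": attributes,    # free-form key-value pairs, no fixed schema
    }
    return urllib.request.Request(
        CONNECTOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_event_request(
        {"cookie": "abc123", "email": "jane@example.com"},
        "purchase",
        {"sku": "A-100", "brand": "Acme", "price": 40.0, "size": "M", "color": "red"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending is left out on purpose; urllib.request.urlopen(req, timeout=5) would send it.
```

Because the attributes dictionary is passed through unchanged, a new kind of interaction needs no change on the sending side.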
Customer Identities & Event Timelines

A key element of Causata’s big data engine is its Identity Graph. By observing patterns of identifiers that occur together, Causata builds up a graph connecting identifiers to an individual and ascribes each data fragment to the correct customer. This picture becomes richer over time as new pieces of linking customer information are recorded. For example, if a customer logs into her web account from home and then a week later does the same from her work computer, both cookies become linked and the two sets of web activity data are merged into a single event stream, providing a richer profile for that customer.
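One plausible way to implement the identifier-linking behavior of the Identity Graph is a union-find (disjoint-set) structure, sketched below. The paper does not say which algorithm Causata uses, so treat this as a swapped-in technique. The identifier strings such as cookie:home and account:42, and the helper merge_timelines, are hypothetical. Each login observation links the identifiers seen together. Events are then grouped into one time-ordered timeline per resolved customer.

```python
from collections import defaultdict
from typing import Any, Dict, List


class IdentityGraph:
    """Union-find over identifiers: identifiers seen together map to one customer."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def customer_key(self, identifier: str) -> str:
        """Return the representative identifier for this identifier's customer."""
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: repoint every identifier on the path directly at the root.
        node = identifier
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def observe(self, *identifiers: str) -> None:
        """Link identifiers that occurred together, e.g. a cookie seen at an account login."""
        if not identifiers:
            return
        first = self.customer_key(identifiers[0])
        for other in identifiers[1:]:
            root = self.customer_key(other)
            if root != first:
                self._parent[root] = first

    def same_customer(self, a: str, b: str) -> bool:
        return self.customer_key(a) == self.customer_key(b)


def merge_timelines(graph: IdentityGraph,
                    events: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group events into one chronologically ordered timeline per resolved customer."""
    timelines: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        timelines[graph.customer_key(event["id"])].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e["ts"])
    return dict(timelines)


if __name__ == "__main__":
    graph = IdentityGraph()
    graph.observe("cookie:home", "account:42")  # logs in from home
    graph.observe("cookie:work", "account:42")  # a week later, logs in from work
    events = [
        {"id": "cookie:work", "ts": 2, "url": "/loans"},
        {"id": "cookie:home", "ts": 1, "url": "/savings"},
        {"id": "cookie:other", "ts": 3, "url": "/cards"},
    ]
    for key, timeline in merge_timelines(graph, events).items():
        print(key, [e["url"] for e in timeline])
```

Union-find keeps each link operation close to constant time, which suits a graph that grows as new connection events arrive.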
Data from email, mobile, social, and bricks-and-mortar channels are combined in the same way, by matching identifiers such as credit and loyalty cards, account numbers, email addresses, and telephone numbers. The Identity Graph adjusts to new connection events, providing as complete a picture as possible of an individual customer at any point in time.

Causata organizes and stores interaction data by individual customer, forming a single event-based Customer Timeline. Retaining the detailed event sequence in chronological order allows business analysts to analyze cause and effect in customer behavior and to investigate specific scenarios or paths. This essential time ordering is typically lost in other data systems, for example when data is pre-aggregated in a data warehouse.

Predictive Profiles

Event streams, or Customer Timelines, are valuable for path analysis but are difficult to consume for ad-hoc analysis or statistical modeling. Causata therefore distills customer event streams and their descriptive attributes into a set of predictive variables, or aggregates, computed over specific timescales. For example, total spend in the past month is computed by summing the prices of all of a customer’s purchase events in that period. Useful industry-specific variables for Financial Services, Communications, and Digital Media are pre-built within Causata, and business analysts can also easily set up and manage variables themselves.

Causata uses its parallel compute power to calculate these variables on demand as customer data is read. Calculating on demand ensures that customer profiles are always up to date and take into account the customer’s most recent activity. New predictors or variables can be defined in seconds and are then immediately available through customer profiles.

Model Scores & Behavioral Predictors

Causata provides pre-built regression models to assess the predictive power of variables.
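The “total spend in the past month” variable and the on-demand calculation described above can be sketched as functions evaluated against a customer’s timeline at read time. This sketch assumes “past month” means a 30-day window ending at read time. The variable names, event fields and helpers (total_spend, event_count, build_profile) are hypothetical and are not Causata’s pre-built variables.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
# A variable is any function of (timeline, read time); nothing is pre-aggregated.
Variable = Callable[[List[Event], datetime], Any]


def total_spend(window: timedelta) -> Variable:
    """Sum the prices of purchase events inside the window ending at read time."""
    def compute(timeline: List[Event], now: datetime) -> float:
        start = now - window
        return sum(e["price"] for e in timeline
                   if e["type"] == "purchase" and start <= e["ts"] <= now)
    return compute


def event_count(event_type: str, window: timedelta) -> Variable:
    """Count events of one type inside the window ending at read time."""
    def compute(timeline: List[Event], now: datetime) -> int:
        start = now - window
        return sum(1 for e in timeline
                   if e["type"] == event_type and start <= e["ts"] <= now)
    return compute


def build_profile(timeline: List[Event], variables: Dict[str, Variable],
                  now: datetime) -> Dict[str, Any]:
    """Evaluate every variable against the full timeline when the profile is read."""
    return {name: fn(timeline, now) for name, fn in variables.items()}


VARIABLES: Dict[str, Variable] = {
    "spend_30d": total_spend(timedelta(days=30)),
    "spend_90d": total_spend(timedelta(days=90)),
    "clicks_7d": event_count("click", timedelta(days=7)),
}

if __name__ == "__main__":
    timeline = [
        {"type": "purchase", "price": 30.0, "ts": datetime(2012, 11, 20)},
        {"type": "purchase", "price": 50.0, "ts": datetime(2012, 10, 1)},
        {"type": "click", "ts": datetime(2012, 11, 28)},
    ]
    print(build_profile(timeline, VARIABLES, datetime(2012, 12, 1)))
```

Because nothing is pre-aggregated, appending a new purchase to the timeline changes spend_30d the next time the profile is built. The trade-off is that each read does the aggregation work, which is why the text stresses parallel compute.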
These linear and logistic regression models enable analysts and marketing scientists to quickly identify the most valuable variables for their customer analyses. Once an analyst or modeler builds a statistical predictive model, it can be imported into Causata and deployed in seconds for real-time, on-demand execution. Each time an individual customer profile is requested or updated, every applicable model is evaluated for that customer. This keeps the scores in the customer’s Predictive Profile always up to date. Model execution is performed in parallel across the cluster as profiles are assembled.

Because a model score is computed just like any other variable in a customer’s Predictive Profile, it can be used in queries. For example, a query can retrieve event streams, predictive profiles, or simply a list of all customers with a high predicted probability of churn. Scores can also drive real-time decisions, for instance determining what content to show on a web page or guiding a call-center agent towards the optimal cross-sell offer for a customer.
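Treating a model score as just another profile variable can be sketched with a logistic regression evaluated when the profile is assembled. The coefficients in CHURN_MODEL, the variable names and the 0.5 threshold are invented for illustration. In practice, as the text describes, the coefficients would come from a model built offline and imported. The helper names (logistic_score, score_profile, select_by_score) are hypothetical.

```python
import math
from typing import Any, Dict, List

# Hypothetical churn model: intercept and weights are made up, standing in for a
# logistic regression fitted offline and imported.
CHURN_MODEL: Dict[str, Any] = {
    "intercept": -2.0,
    "weights": {"spend_30d": -0.02, "complaints_90d": 0.8},
}


def logistic_score(model: Dict[str, Any], profile: Dict[str, Any]) -> float:
    """Evaluate a logistic regression model on one profile's variables."""
    z = model["intercept"] + sum(
        w * profile.get(name, 0.0) for name, w in model["weights"].items()
    )
    # Numerically stable sigmoid: avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score_profile(profile: Dict[str, Any], models: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the profile with each model's score added as a variable."""
    scored = dict(profile)
    for name, model in models.items():
        scored[name] = logistic_score(model, profile)
    return scored


def select_by_score(profiles: List[Dict[str, Any]], score_name: str,
                    threshold: float) -> List[str]:
    """Query a score like any other variable: customers at or above the threshold."""
    return [p["customer"] for p in profiles if p[score_name] >= threshold]


if __name__ == "__main__":
    raw = [
        {"customer": "c1", "spend_30d": 100.0, "complaints_90d": 0},
        {"customer": "c2", "spend_30d": 0.0, "complaints_90d": 4},
    ]
    scored = [score_profile(p, {"churn_score": CHURN_MODEL}) for p in raw]
    print(select_by_score(scored, "churn_score", 0.5))
```

Because the score lives in the same dictionary as spend_30d, the query helper does not need to know it came from a model.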
Reading Data from Causata

Data is retrieved from Causata at either the customer or the event level. At the customer level, the familiar Causata SQL query language allows queries to be framed around customer behavior. This lets the business analyst or data scientist ask structured questions of unstructured data. These queries are executed in parallel across all the data stores and return event streams, predictive profiles or model scores. A query may combine specific events, profile variables, and predictive scores to select customer records.

For example, an analyst at a retail bank might select all customers who have used online bill pay from a mobile device in the last week and who have downloaded a promotional bank email in the last 90 days. The output is a structured set of records, in a predictive record set for analysis, for every customer who satisfies the query. By allowing the analyst to ask new questions of a massive data set, Causata saves a large amount of time traditionally wasted on ‘data-wrangling.’

Analysts and marketing scientists can choose to run a complete query for all customers who meet specific criteria or just retrieve a sample for initial analysis. Causata arranges the customer data to ensure that any sample is statistically unbiased and can be used for reliable analysis.

Causata SQL enables analysts to use data visualization tools such as Tableau, QlikView, and Excel for further analysis, dashboarding and reporting. Statistical modelers can query Causata data directly from their R environment and then import their R models into Causata for real-time operational scoring.

Causata event data can also be queried using Hadoop tools such as Hive and Cloudera Impala, which enable batch and interactive querying, respectively, of Causata’s raw event data.
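Purely as an illustration, the retail-bank query above can be expressed as a filter over per-customer timelines. Causata SQL syntax is not documented in this paper, so the sketch uses plain Python instead. The event types bill_pay and email_download, the device and promotional attributes, and the helpers did, mobile_billpay_promo_customers and sample_customers are all assumptions. The sampling helper shows one common way to draw a repeatable sample whose membership depends only on a hash of the customer id, not on behavior. The paper does not say how Causata itself arranges unbiased samples.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def did(timeline: List[Event], now: datetime, event_type: str,
        window: timedelta, **attrs: Any) -> bool:
    """True if the timeline has an event of this type, with matching attributes, in the window."""
    start = now - window
    return any(
        e["type"] == event_type
        and start <= e["ts"] <= now
        and all(e.get(k) == v for k, v in attrs.items())
        for e in timeline
    )


def mobile_billpay_promo_customers(timelines: Dict[str, List[Event]], now: datetime) -> List[str]:
    """Customers who used online bill pay from a mobile device in the last week
    and downloaded a promotional email in the last 90 days."""
    return sorted(
        cid for cid, tl in timelines.items()
        if did(tl, now, "bill_pay", timedelta(days=7), device="mobile")
        and did(tl, now, "email_download", timedelta(days=90), promotional=True)
    )


def sample_customers(customer_ids: Iterable[str], fraction: float) -> List[str]:
    """Repeatable hash-based sample: selection depends only on a hash of the id."""
    kept = []
    for cid in customer_ids:
        digest = hashlib.sha256(cid.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:6], "big") / 2**48  # uniform in [0, 1)
        if bucket < fraction:
            kept.append(cid)
    return sorted(kept)
```

Hashing the id, rather than taking the first N customers in storage order, avoids tying the sample to however the data happens to be laid out.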
This is valuable for queries that are not structured around individual customer behavior, such as traditional macro-segmentation business intelligence analyses.

Summary

Causata consumes multi-structured customer data from all digital channels, connects and stores it by customer event, and assembles it into an optimal format for customer analysis and prediction. The Causata SQL query language retrieves customer records in a predictive record set structure for predictive analysis, and the underlying HBase event storage can be queried using standard Hadoop tools. Causata scales to millions of customer records and is highly flexible, making it easy to add new data sources and ask new questions of the data. Low-latency access to individual predictive profiles enables real-time actions tailored to the individual customer.

Contact

To learn more, visit causata.com, follow @Causata on Twitter, or contact us for a demo.