The Value of a Modern Data Architecture
with Apache Hadoop and Teradata

© Hortonworks Inc. 2013

Page 1
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A

Page 2
Existing Data Architecture
[Diagram: three layers.
APPLICATIONS: Packaged Analytic App, Custom Analytic App
DATA SYSTEMS: RDBMS, EDW, Discovery Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP)]

Page 3
Big Data Market Trends & Projections
Big Data Explosion
•  1 Zettabyte (ZB) = 1 Billion TBs
•  15x growth rate of machine generated data by 2020
•  20%: amount by which orgs leveraging modern info management systems outperform peers by 2015
•  The US has 1/3 of the world’s data
•  Big Data is 1 of 5 US GDP Game Changers
•  $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020

Page 4
Traditional Data Architecture Pressured
[Diagram: the three layers under pressure from growing data volumes.
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA SYSTEMS: RDBMS, EDW, Discovery Platform
DATA SOURCES: Traditional Sources (OLTP, POS systems; RDBMS, OLTP, OLAP) and New Sources (sentiment, click, geo, sensor, …)]
•  2.8 ZB in 2012
•  85% from New Data Types
•  15x Machine Data by 2020
•  40 ZB by 2020
Source: IDC

Page 5
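
The IDC figures on the preceding slide (2.8 ZB in 2012, 40 ZB by 2020) and the earlier definition of a zettabyte as 1 billion TB can be checked with a little arithmetic. The Python sketch below assumes decimal (SI) units and steady compound growth over the eight years. The function names are illustrative and do not come from the deck.

```python
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes and 1 ZB = 10**21 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21


def zb_to_tb(zettabytes: float) -> float:
    """Convert zettabytes to terabytes."""
    return zettabytes * BYTES_PER_ZB / BYTES_PER_TB


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"1 ZB = {zb_to_tb(1):,.0f} TB")
    rate = implied_annual_growth(2.8, 40, 2020 - 2012)
    print(f"Implied growth, 2012 to 2020: {rate:.1%} per year")
```

Under these assumptions, growing from 2.8 ZB to 40 ZB in eight years works out to roughly 39% per year.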
Modern Data Architecture Enabled
[Diagram: the same layers plus tooling.
APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Manage & Monitor
DATA SYSTEMS: RDBMS, EDW, Discovery Platform
DATA SOURCES: Traditional Sources (OLTP, POS systems; RDBMS, OLTP, OLAP) and New Sources (sentiment, click, geo, sensor, …)]


Page 6
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A

Page 7
What Data is Being Stored in Hadoop?
1.  Social
Understand how your customers feel about your brand and
products – right now

2.  Clickstream
Capture and analyze website visitors’ data trails and
optimize your website

3.  Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines

4.  Geolocation
Analyze location-based data to manage operations where
they occur

5.  Server Logs
Research logs to diagnose process failures and prevent
security breaches

6.  Unstructured (text, video, pictures, etc.)
Understand patterns in text across millions of unstructured
work products: web pages, emails, video, pictures and
documents

Page 8
Modern Data Architecture Applied
Shared Data Lake
[Diagram: Modern Data Architecture with an "Infrastructure - Data Lake" layer.
APPLICATIONS: Packaged Analytic App, Custom Analytic App
DATA SYSTEMS: RDBMS, EDW, Discovery Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP) and New Sources (sentiment, click, geo, sensor, …)]
•  Store all data and build/enable applications on shared “data lake”
•  As orgs mature they move to this as a goal for Hadoop
•  Delivers broad value across the enterprise

Page 9
Drivers for Hadoop Adoption
Modern Data Architecture
Hadoop has a central role in next generation data architectures while integrating with existing data systems
Driving Efficiency

Business Applications
Use Hadoop to extract insights that enable new customer value and competitive edge
Driving Opportunity

Big Data Sets
•  Existing: Traditional, Server log, Clickstream
•  Emerging: Sentiment/Social, Machine/Sensor, Geo-locations

Page 10

Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture
•  Integrated: Interoperable with existing data center investments
•  Key Services: Platform, operational and data services essential for the enterprise
•  Skills: Leverage your existing skills: development, operations, analytics

Page 11
Interoperating With Your Tools
[Diagram: tools that interoperate with the architecture.
APPLICATIONS: Microsoft Applications
DEV & DATA TOOLS
OPERATIONAL TOOLS: Viewpoint
DATA SYSTEMS
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP) and New Sources (sentiment, click, geo, sensor, …)]

Page 12
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A

Page 13
Shift from a Single Platform to an Ecosystem
"Logical" Data Warehouse

“We will abandon the old models
based on the desire to implement for
high-value analytic applications.”

“Big Data requirements are solved by
a range of platforms including
analytical databases, discovery
platforms, and NoSQL solutions
beyond Hadoop.”
Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.

14

2/28/14

Teradata Confidential
UNIFIED DATA ARCHITECTURE
[Diagram with MANAGE, MOVE and ACCESS layers.
SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
PLATFORMS: INTEGRATED DATA WAREHOUSE, DATA PLATFORM, DISCOVERY PLATFORM
ANALYTIC TOOLS: Marketing, Applications, Business Intelligence, Data Mining, Math and Stats, Languages
USERS: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts]
UNIFIED DATA ARCHITECTURE
[Diagram with MANAGE, MOVE and ACCESS layers, annotated with the workloads of each platform.
SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
INTEGRATED DATA WAREHOUSE: Business Intelligence, Predictive Analytics, Operational Intelligence
DATA PLATFORM: Fast Loading, Filtering and Processing, Online Archival
DISCOVERY PLATFORM: Data Discovery; Path, graph, time-series analysis; Pattern Detection
ANALYTIC TOOLS: Marketing, Applications, Business Intelligence, Data Mining, Math and Stats, Languages
USERS: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts]
TERADATA UNIFIED DATA ARCHITECTURE
[Diagram with MANAGE, MOVE and ACCESS layers.
SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
PLATFORMS: INTEGRATED DATA WAREHOUSE, DATA PLATFORM, DISCOVERY PLATFORM
ANALYTIC TOOLS: Marketing, Applications, Business Intelligence, Data Mining, Math and Stats, Languages
USERS: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts]
Teradata Appliance for Hadoop
Value-Added Software Bringing Hadoop to Enterprise
•  Access: SQL-H™, Teradata Studio
•  Management: Viewpoint, TVI
•  Administration: Hadoop Builder, Intelligent start/stop, DataNode swap, deferred drive replace
•  High Availability: NameNode HA, Master Machine Failover
•  Refining, Metadata, Entity Resolution: HCatalog
•  Security & Data Access: Kerberos
18
Modern Data Architecture Details
TVI – Proactive system monitoring tied to Teradata customer support
[Diagram: data flow from source data through Hadoop to the Teradata and Aster platforms.
Viewpoint: Alerts, System Health, Node Health, Space Usage, Capacity Heatmap, Metrics Analysis
SOURCE DATA: Sensor Log Data, Customer/Inventory Data, Clickstream Data, Flat Files, Sentiment Analysis, Streaming Data
Ingest interfaces and tools: DB, File, JMS, REST, HTTP, SQOOP, FLUME, NFS, Web HDFS, ETL, CUSTOM
Stages: LOAD, REFINE, EXTRACT, STRUCTURING, EXPORT, INTERACTIVE
Hadoop components: KNOX, AMBARI, MAPREDUCE, YARN, HDFS, HIVE, PIG, HCATALOG (metadata services), Services
Downstream: Teradata SQL-H; Analytical Platforms (Aster Discovery Platform); LOAD into Teradata IDW via SQOOP / HIVE and TDCH
Query/Visualization/Reporting/Analytical Tools and Apps: JDBC/ODBC Compliant Tool]

Page 19
Teradata Vital Infrastructure (TVI)
PROACTIVE RELIABILITY, AVAILABILITY, AND MANAGEABILITY
1U server virtualizes system and cabinet management software
Server Management VMS
•  Cabinet Management Interface Controller (CMIC)
•  Service Work Station (SWS)
•  Automatically installed on base/first cabinet
VMS allows full rack solutions without additional cabinet for traditional SWS
Eliminates need for expansion racks, reducing customers’ floor space and energy costs
Supports Teradata hardware and Hadoop software
TVI Support for Hadoop
62–70% of Incidents Discovered through TVI
20
Standard SQL Access to Hadoop Data
Give business users on-the-fly access to data in Hadoop
Teradata SQL-H, Aster SQL-H
•  Fast: Queries run on Teradata or Aster, data accessed from Hadoop
•  Standard: 100% ANSI SQL access to Hadoop data
•  Trusted: Use existing tools/skills and enable self-service BI with granular security
•  Efficient: Intelligent data access leveraging the Hadoop HCatalog
[Diagram labels: Hadoop MR, Data, Data Filtering, HCatalog, Hive, Pig, Hadoop Layer: HDFS]
21
Teradata Unified Data Architecture™
Partners Support Many Layers

22

Teradata Aster Discovery Portfolio: Accelerate Time to Insights
Some of the 80+ out-of-the-box analytical apps
•  PATH ANALYSIS: Discover Patterns in Rows of Sequential Data
•  TEXT ANALYSIS: Derive Patterns and Extract Features in Textual Data
•  STATISTICAL ANALYSIS: High-Performance Processing of Common Statistical Calculations
•  SEGMENTATION: Discover Natural Groupings of Data Points
•  MARKETING ANALYTICS: Analyze Customer Interactions to Optimize Marketing Decisions
•  DATA TRANSFORMATION: Transform Data for More Advanced Analysis
•  Graph Analysis: Graph analytics processing and visualization
•  SQL-MapReduce Visualization: Graphing and visualization tools linked to key functions of the MapReduce analytics library
23
More Accurate Customer Churn Prevention
[Diagram: data flow for a Customer Retention Campaign.
Multi-Structured Raw Data: SOCIAL FEEDS, CLICKSTREAM DATA, Call Center Voice Records, Call Data, Check Data, eMail
Capture, Retain and Refine Layer: Hadoop
Flows labeled: Sentiment Scores, Analytic Results, Traditional Data Flow (Data Sources, ETL Tools, Dimensional Data)]
•  Hadoop captures, stores and transforms social, images and call records
•  Aster Discovery Platform: Aster does path and pattern analysis
•  Teradata Integrated DW: Analysis + Marketing Automation (Customer Retention Campaign)
24
MPP RDBMS + Hadoop Customer Successes

25

Key Considerations For EDW and Hadoop
MPP RDBMS                  | Hadoop
Stable Schema              | Evolving Schema
Leverages Structured Data  | Structure Agnostic
ANSI SQL                   | Flexible Programming
Iterative Analysis         | Batch Analysis
Fine Grain Security        | N/A
Cleansed Data              | Raw Data
Seeks                      | Scans
Updates/Deletes            | Ingest
Service Level Agreements   | Flexibility
Core Data                  | All Data
Complex Joins              | Complex Processing
Efficient Use of CPU/IO    | Low Cost of Storage
26
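
One way to read the comparison above is as a checklist: tick the traits a workload needs and see which column collects more ticks. The Python sketch below encodes the slide's pairs as data. The `suggest_platform` helper and its simple majority rule are hypothetical illustrations, not guidance from Teradata or Hortonworks.

```python
# Trait pairs copied from the slide: (MPP RDBMS trait, Hadoop trait).
# The slide lists "N/A" on the Hadoop side of Fine Grain Security, encoded here as None.
CONSIDERATIONS = [
    ("Stable Schema", "Evolving Schema"),
    ("Leverages Structured Data", "Structure Agnostic"),
    ("ANSI SQL", "Flexible Programming"),
    ("Iterative Analysis", "Batch Analysis"),
    ("Fine Grain Security", None),
    ("Cleansed Data", "Raw Data"),
    ("Seeks", "Scans"),
    ("Updates/Deletes", "Ingest"),
    ("Service Level Agreements", "Flexibility"),
    ("Core Data", "All Data"),
    ("Complex Joins", "Complex Processing"),
    ("Efficient Use of CPU/IO", "Low Cost of Storage"),
]


def suggest_platform(workload_traits):
    """Hypothetical rule: pick the column that matches more of the workload's traits."""
    traits = set(workload_traits)
    rdbms_hits = sum(1 for rdbms, _ in CONSIDERATIONS if rdbms in traits)
    hadoop_hits = sum(1 for _, hadoop in CONSIDERATIONS if hadoop in traits)
    if rdbms_hits > hadoop_hits:
        return "MPP RDBMS"
    if hadoop_hits > rdbms_hits:
        return "Hadoop"
    return "either (tie)"


if __name__ == "__main__":
    print(suggest_platform({"ANSI SQL", "Complex Joins", "Raw Data"}))
    print(suggest_platform({"Raw Data", "Scans", "Evolving Schema"}))
```

A real placement decision would weigh the traits rather than count them; the equal weighting here is only for brevity.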
Complete Consulting and Training Services
Areas of Focus
•  Teradata Analytic Architecture Services: Services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
•  Teradata DI Optimization: Assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes
•  Teradata Big Analytics: Assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool
•  Teradata Workshop for Hadoop: Introduction workshop (across all of UDA)
•  Teradata Data Staging for Hadoop: Load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data
•  Teradata Platform for Hadoop: Installation guidance and mentoring for Hadoop platform, D-I-Y after installation
•  Teradata Managed Services for Hadoop: Operations, management, administration, backup, security, process control for Hadoop
•  Teradata Training Courses for Hadoop: Two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
27
Discovering Deep Insights in Retail
Transforming Web Walks into DNA Sequences
Situation
Large retailer with 700M visits/year, 2M customers/day look at 1M products online
Problem
Increase ability of web content owners to self-serve insights
Solution
Treat web walks like DNA sequences of simple patterns.
Impact
•  Data: loaded logs into Hortonworks
•  Loaded 2 months of raw data in 1 hour, vs. 1 day on old system
•  Can load a day’s log data in 60 sec
•  Sessionize: Creates sequence for visit, e.g., boils 20 customer clicks down to 1 line:
•  <Home – Search – Look at Product – Add to Basket – Pay – Exit>
•  Analyze: Business analysts can now do path analysis
•  Act:
•  Segmentations by behavior can increase conversion rates by 5-10%.
•  Web design changes can drive another 10-20% more visitors into the sales funnel
28
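
The Sessionize and Analyze steps of this case study can be sketched in a few lines of Python. This is a minimal illustration, not the retailer's or Aster's implementation. The click records, the 30-minute inactivity timeout and the function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT_S = 30 * 60  # assumption: 30 minutes of inactivity ends a visit

# Hypothetical click log: (customer_id, unix_seconds, page_type).
CLICKS = [
    ("c1", 0, "Home"), ("c1", 40, "Search"), ("c1", 95, "Look at Product"),
    ("c1", 130, "Add to Basket"), ("c1", 200, "Pay"), ("c1", 210, "Exit"),
    ("c2", 10, "Home"), ("c2", 70, "Search"), ("c2", 90, "Exit"),
    ("c1", 7200, "Home"), ("c1", 7260, "Search"), ("c1", 7300, "Exit"),
]


def sessionize(clicks, timeout_s=SESSION_TIMEOUT_S):
    """Split each customer's clicks into visits and render every visit as one line."""
    by_customer = defaultdict(list)
    for customer, ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_customer[customer].append((ts, page))

    paths = []
    for events in by_customer.values():
        current, last_ts = [], None
        for ts, page in events:
            # A gap longer than the timeout closes the current visit.
            if last_ts is not None and ts - last_ts > timeout_s:
                paths.append(" - ".join(current))
                current = []
            current.append(page)
            last_ts = ts
        paths.append(" - ".join(current))
    return paths


def top_paths(paths, n=3):
    """Path analysis in its simplest form: count the most common visit sequences."""
    return Counter(paths).most_common(n)


if __name__ == "__main__":
    for path, count in top_paths(sessionize(CLICKS)):
        print(count, path)
```

Once each visit is a single line, path analysis reduces to counting and comparing sequences, which is what lets business analysts query visits without parsing raw logs.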
Demo

29

More Related Content

What's hot

Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...HostedbyConfluent
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureLorenzo Nicora
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief OverviewHal Kalechofsky
 
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...DATAVERSITY
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentationDavid Rice
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
AWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesAWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesTobyWilman
 

What's hot (20)

Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
Data Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and FutureData Mesh at CMC Markets: Past, Present and Future
Data Mesh at CMC Markets: Past, Present and Future
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi...
 
Big Data
Big DataBig Data
Big Data
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Big Data & The Cloud
Big Data & The CloudBig Data & The Cloud
Big Data & The Cloud
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentation
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Big Data Modeling
Big Data ModelingBig Data Modeling
Big Data Modeling
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
AWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesAWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - Slides
 

Similar to The Value of a Modern Data Architecture with Hadoop and Teradata

The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...Revolution Analytics
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Modern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIModern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIKognitio
 
Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Michael Hiskey
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHortonworks
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsFredReynolds2
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
A modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your businessA modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your businessMarcos Quezada
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Hortonworks
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 

Similar to The Value of a Modern Data Architecture with Hadoop and Teradata (20)

The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
 
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
The Modern Data Architecture for Predictive Analytics with Hortonworks and Re...
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Modern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIModern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BI
 
Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data Processing
 
Hadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - JaspersoftHadoop Reporting and Analysis - Jaspersoft
Hadoop Reporting and Analysis - Jaspersoft
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
A modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your businessA modern data platform meets the needs of each type of data in your business
A modern data platform meets the needs of each type of data in your business
 
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycl...
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 

The Value of a Modern Data Architecture with Hadoop and Teradata

  • 1. The Value of a Modern Data Architecture with Apache Hadoop and Teradata © Hortonworks Inc. 2013 Page 1
  • 2. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • EDW’s role in the MDA • Q&A © Hortonworks Inc. 2013 Page 2
  • 3. Existing Data Architecture. DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP). DATA SYSTEMS: RDBMS, EDW, Discovery Platform. APPLICATIONS: Packaged Analytic App, Custom Analytic App. © Hortonworks Inc. 2013 Page 3
  • 4. Big Data Market Trends & Projections: Big Data Explosion. 1 Zettabyte (ZB) = 1 Billion TBs. 15x growth rate of machine generated data by 2020. The US has 1/3 of the world’s data. Big Data is 1 of 5 US GDP Game Changers: $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020. 20%: % by which orgs leveraging modern info management systems outperform peers by 2015. © Hortonworks Inc. 2013 Page 4
  • 5. Traditional Data Architecture Pressured. APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications. DATA SYSTEMS: RDBMS, EDW, Discovery Platform. DATA SOURCES: Traditional Sources (OLTP, POS SYSTEMS; RDBMS, OLTP, OLAP), New Sources (sentiment, click, geo, sensor, …). 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020. Source: IDC © Hortonworks Inc. 2013 Page 5
  • 6. Modern Data Architecture Enabled. APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications. DEV & DATA TOOLS: BUILD & TEST. OPERATIONAL TOOLS: MANAGE & MONITOR. DATA SYSTEMS: RDBMS, EDW, Discovery Platform. DATA SOURCES: Traditional Sources (OLTP, POS SYSTEMS; RDBMS, OLTP, OLAP), New Sources (sentiment, click, geo, sensor, …). © Hortonworks Inc. 2013 Page 6
  • 7. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • EDW’s role in the MDA • Q&A © Hortonworks Inc. 2013 Page 7
  • 8. What Data is Being Stored in Hadoop? 1.  Social Understand how your customers feel about your brand and products – right now 2.  Clickstream Capture and analyze website visitors’ data trails and optimize your website 3.  Sensor/Machine Discover patterns in data streaming automatically from remote sensors and machines 4.  Geolocation Analyze location-based data to manage operations where they occur Value 5.  Server Logs Research logs to diagnose process failures and prevent security breaches 6.  Unstructured (text, video, pictures, etc..) Understand patterns in text across millions of unstructured work products: web pages, emails, video, pictures and documents © Hortonworks Inc. 2013 Page 8
  • 9. Modern Data Architecture Applied: Shared Data Lake. DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (sentiment, click, geo, sensor, …). DATA SYSTEMS: RDBMS, EDW, Discovery Platform. APPLICATIONS: Packaged Analytic App, Custom Analytic App. Infrastructure - Data Lake; Modern Data Architecture. • Store all data and build/enable applications on shared “data lake” • As orgs mature they move to this as a goal for Hadoop • Delivers broad value across the enterprise © Hortonworks Inc. 2013 Page 9
  • 10. Drivers for Hadoop Adoption Modern Data Architecture Hadoop has a central role in next generation data architectures while integrating with existing data systems Driving Efficiency Business Applications Use Hadoop to extract insights that enable new customer value and competitive edge Driving Opportunity Big Data Sets Existing Traditional Server log Clickstream © Hortonworks Inc. 2013 Emerging Sentiment/Social Machine/Sensor Geo-locations Page 10
  • 11. 3 Requirements for Hadoop Adoption Requirements for Hadoop’s Role in the Modern Data Architecture Integrated Interoperable with existing data center investments Key Services Skills Platform, operational and data services essential for the enterprise Leverage your existing skills: development, operations, analytics © Hortonworks Inc. 2013 Page 11
  • 12. Interoperating With Your Tools. APPLICATIONS; DATA SOURCES; DATA SYSTEMS; Microsoft Applications; DEV & DATA TOOLS; OPERATIONAL TOOLS; Viewpoint; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …) © Hortonworks Inc. 2013 Page 12
  • 13. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • EDW’s role in the MDA • Q&A © Hortonworks Inc. 2013 Page 13
  • 14. Shift from a Single Platform to an Ecosystem "Logical" Data Warehouse “We will abandon the old models based on the desire to implement for high-value analytic applications.” “Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.” Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012. 14 2/28/14 Teradata Confidential
  • 15. UNIFIED DATA ARCHITECTURE ERP MANAGE MOVE ACCESS Marketing Marketing Executives Applications Operational Systems SCM INTEGRATED DATA WAREHOUSE CRM Images DATA PLATFORM Business Intelligence Audio and Video Data Mining Frontline Workers Customers Partners Engineers Machine Logs DISCOVERY PLATFORM Math and Stats Data Scientists Text Languages Business Analysts Web and Social USERS SOURCES ANALYTIC TOOLS
  • 16. UNIFIED DATA ARCHITECTURE ERP MANAGE MOVE ACCESS Marketing Marketing Executives Applications Operational Systems SCM INTEGRATED DATA WAREHOUSE CRM Images DATA PLATFORM Business Intelligence Predictive Analytics Business Intelligence Operational Intelligence Audio and Video Machine Logs Text Fast Loading Data Mining Filtering and Processing Online Archival Customers Partners Engineers DISCOVERY PLATFORM Math and Stats Data Scientists Data Discovery Path, graph, time-series analysis Web and Social Frontline Workers Languages Business Analysts Pattern Detection USERS SOURCES ANALYTIC TOOLS
  • 17. TERADATA UNIFIED DATA ARCHITECTURE ERP MANAGE MOVE ACCESS Marketing Marketing Executives Applications Operational Systems SCM INTEGRATED DATA WAREHOUSE CRM Images DATA PLATFORM Business Intelligence Audio and Video Data Mining Frontline Workers Customers Partners Engineers Machine Logs DISCOVERY PLATFORM Math and Stats Data Scientists Text Languages Business Analysts Web and Social USERS SOURCES ANALYTIC TOOLS
  • 18. Teradata Appliance for Hadoop Value-Added Software Bringing Hadoop to Enterprise Access: SQL-H™, Teradata Studio Management: Viewpoint, TVI Administration: Hadoop Builder, Intelligent start/stop, DataNode swap, deferred drive replace High Availability : NameNode HA, Master Machine Failover Refining, Metadata, Entity Resolution HCatalog 18 2/28/14 Security & Data Access Kerberos Kerberos Teradata Confidential
  • 19. Modern Data Architecture Details TVI – Proactive system monitoring tied to Teradata customer support Alerts Viewpoint SOURCE DATA Sensor Log Data System Health Node Health Space Usage Capacity Heatmap Metrics Analysis Query/Visualization/ Reporting/Analytical Tools and Apps DB JDBC/ODBC Compliant Tool KNOX AMBARI File MAPREDUCE Customer/ Inventory Data JMS YARN Clickstream Data REST Flat Files Services HDFS LOAD SQOOP EXTRACT REFINE HIVE FLUME PIG NFS ETL Web HDFS HCATALOG (metadata services) CUSTOM HTTP Sentiment Analysis Streaming Data STRUCTURING © Hortonworks Inc. 2013 EXPORT INTERACTIVE Teradata SQL-H Analytical Platforms Aster Discovery Platform LOAD Teradata IDW SQOOP / HIVE TDCH Page 19
  • 20. Teradata Vital Infrastructure (TVI) PROACTIVE RELIABILITY, AVAILABILITY, AND MANAGEABILITY 1U server virtualizes system and cabinet management software Server Management VMS •  Cabinet Management Interface Controller (CMIC) •  Service Work Station (SWS) •  Automatically installed on base/first cabinet VMS allows full rack solutions without additional cabinet for traditional SWS Eliminates need for expansion racks, reducing customers’ floor space and energy costs Supports Teradata hardware and Hadoop software TVI Support for Hadoop 62–70% of Incidents Discovered through TVI 20 2/28/14 Teradata Confidential
  • 21. Standard SQL Access to Hadoop Data Give business users on-the-fly access to data in Hadoop Teradata SQL-H Aster SQL-H Hadoop MR •  Fast: Queries run on Teradata or Aster, data accessed from Hadoop Data •  Standard: 100% ANSI SQL access to Hadoop data Data Filtering •  Trusted: Use existing tools/skills and enable self-service BI with granular security HCatalog Hive Pig •  Efficient: Intelligent data access leveraging the Hadoop HCatalog 21 2/28/14 Teradata Confidential Hadoop Layer: HDFS
  • 22. Teradata Unified Data Architecture™ Partners Support Many Layers 22 2/28/14 Teradata Confidential
  • 23. Teradata Aster Discovery Portfolio: Accelerate Time to Insights Some of the 80+ out-of-the-box analytical apps PATH ANALYSIS TEXT ANALYSIS Discover Patterns in Rows of Sequential Data Derive Patterns and Extract Features in Textual Data STATISTICAL ANALYSIS High-Performance Processing of Common Statistical Calculations SEGMENTATION MARKETING ANALYTICS DATA TRANSFORMATION Analyze Customer Interactions to Optimize Marketing Decisions Graph Analysis Graph analytics processing and visualization 23 2/28/14 Teradata Confidential Discover Natural Groupings of Data Points Transform Data for More Advanced Analysis SQL-MapReduce Visualization Graphing and visualization tools linked to key functions of the MapReduce analytics library
  • 24. More Accurate Customer Churn Prevention Hadoop captures, stores and transforms social, images and call records SOCIAL FEEDS Multi-Structured Raw Data Call Center Voice Records CLICKSTREAM DATA Call Data Hadoop Sentiment Scores Aster Discovery Platform Data Sources ETL Tools 24 2/28/14 Teradata Confidential Analytic Results Traditional Data Flow Capture, Retain and Refine Layer Dimensional Data Check Data eMail Aster does path and pattern analysis Teradata Integrated DW Analysis + Marketing Automation (Customer Retention Campaign)
  • 25. MPP RDBMS + Hadoop Customer Successes 25 2/28/14 Teradata Confidential
  • 26. Key Considerations For EDW and Hadoop (MPP RDBMS vs. Hadoop): Stable Schema vs. Evolving Schema; Leverages Structured Data vs. Structure Agnostic; ANSI SQL vs. Flexible Programming; Iterative Analysis vs. Batch Analysis; Fine Grain Security vs. N/A; Cleansed Data vs. Raw Data; Seeks vs. Scans; Updates/Deletes vs. Ingest; Service Level Agreements vs. Flexibility; Core Data vs. All Data; Complex Joins vs. Complex Processing; Efficient Use of CPU/IO vs. Low Cost of Storage. 2/28/14 Teradata Confidential
  • 27. Complete Consulting and Training Services (Areas of Focus). Teradata Analytic Architecture Services: Services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop. Teradata DI Optimization: Assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes. Teradata Big Analytics: Assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool. Teradata Workshop for Hadoop: Introduction workshop (across all of UDA). Teradata Data Staging for Hadoop: Load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data. Teradata Platform for Hadoop: Installation guidance and mentoring for Hadoop platform, D-I-Y after installation. Teradata Managed Services for Hadoop: Operations, management, administration, backup, security, process control for Hadoop. Teradata Training Courses for Hadoop: Two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop. 2/28/14 Teradata Confidential
  • 28. Discovering Deep Insights in Retail: Transforming Web Walks into DNA Sequences. Situation: Large retailer with 700M visits/year; 2M customers/day look at 1M products online. Problem: Increase ability of web content owners to self-serve insights. Solution: Treat web walks like DNA sequences of simple patterns. Impact: • Data: loaded logs into Hortonworks • Loaded 2 months of raw data in 1 hour, vs. 1 day on old system • Can load a day’s log data in 60 sec • Sessionize: Creates sequence for visit, e.g., boils 20 customer clicks down to 1 line: <Home – Search – Look at Product – Add to Basket – Pay – Exit> • Analyze: Business analysts can now do path analysis • Act: Segmentations by behavior can increase conversion rates by 5-10%; web design changes can drive another 10-20% more visitors into the sales funnel. 2/28/14 Teradata Confidential
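
The "Sessionize" step on slide 28 (boiling a visitor's clicks down to one line per visit) could look roughly like the sketch below. This is a minimal illustration, not the retailer's or Teradata's actual implementation: the function name `sessionize`, the `(visitor_id, timestamp_seconds, page)` tuple layout, and the 30-minute inactivity timeout are all assumptions made for the example.

```python
from collections import defaultdict


def sessionize(clicks, timeout_seconds=1800):
    """Collapse raw click events into one path string per visit.

    clicks: iterable of (visitor_id, timestamp_seconds, page) tuples
            (hypothetical layout, chosen for this example).
    timeout_seconds: gap of inactivity that starts a new visit
            (30 minutes here is an assumption, not from the slides).
    """
    # Group events by visitor, preserving first-seen visitor order.
    by_visitor = defaultdict(list)
    for visitor_id, ts, page in clicks:
        by_visitor[visitor_id].append((ts, page))

    sessions = []
    for visitor_id, events in by_visitor.items():
        events.sort(key=lambda event: event[0])  # order clicks by time
        current, last_ts = [], None
        for ts, page in events:
            # A long gap closes the current visit and starts a new one.
            if last_ts is not None and ts - last_ts > timeout_seconds:
                sessions.append((visitor_id, "<" + " - ".join(current) + ">"))
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append((visitor_id, "<" + " - ".join(current) + ">"))
    return sessions


if __name__ == "__main__":
    sample = [
        ("v1", 0, "Home"),
        ("v1", 30, "Search"),
        ("v1", 60, "Look at Product"),
        ("v1", 90, "Add to Basket"),
        ("v1", 120, "Pay"),
        ("v1", 150, "Exit"),
    ]
    for visitor, path in sessionize(sample):
        print(visitor, path)
```

Each resulting line is a sequence of page labels, which is the kind of input a path-analysis step would then scan for common patterns.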

Editor's Notes

  1. IDC study: http://cdn.idc.com/research/Predictions12/Main/downloads/IDCTOP10Predictions2012.pdf
     IDC projects that the digital universe will reach 40 zettabytes (ZB) by 2020, resulting in a 50-fold growth from the beginning of 2010. According to the study, 2.8 ZB of data will have been created and replicated in 2012. Machine-generated data is a key driver in the growth of the world’s data, which is projected to increase 15x by 2020.
     Report | McKinsey Global Institute: http://www.mckinsey.com/insights/americas/us_game_changers
     “Game changers: Five opportunities for US growth and renewal”, July 2013, by Susan Lund, James Manyika, Scott Nyquist, Lenny Mendonca, and Sreenivas Ramaswamy.
     “By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.” – Gartner, Mark Beyer, “Information Management in the 21st Century”
     By 2015, Gartner believes 65 percent of prepackaged analytic applications will have Hadoop already embedded. Gartner also sees a rising trend in “Hadoop-enabled database management systems” to help organizations deploy appliances and apps (virtual or physical) with Big Data capabilities baked in. Source: http://channelnomics.com/2013/01/28/gartner-predicts-big-data-explosion/
     “Global data growth will outperform Moore’s law over the next few years.” – Forrester, http://blogs.forrester.com/holger_kisker/12-08-15-big_data_meets_cloud
  2. Let’s set some context before digging into the Modern Data Architecture. While overly simplistic, this graphic represents the traditional data architecture: - A set of data sources producing data - A set of data systems to capture and store that data: most typically a mix of RDBMS and data warehouses - A set of custom and packaged applications as well as business analytics that leverage the data stored in those data systems. Your environment is undoubtedly more complicated, but conceptually it is likely similar. This architecture is tuned to handle TRANSACTIONS and data that fits into a relational database. [CLICK] Fast-forward to recent years and this traditional architecture has become PRESSURED with New Sources of data that aren’t handled well by existing data systems. So in the world of Big Data, we’ve got classic TRANSACTIONS and New Sources of data that come from what I refer to as INTERACTIONS and OBSERVATIONS. INTERACTIONS come from such things as Web Logs, User Click Streams, Social Interactions & Feeds, and User-Generated Content including video, audio, and images. OBSERVATIONS tend to come from the “Internet of Things”. Sensors for heat, motion, and pressure and RFID and GPS chips within such things as mobile devices, ATM machines, automobiles, and even farm tractors are just some of the “things” that output Observation data.
  3. As the volume of data has exploded, Enterprise Hadoop has emerged as a peer to traditional data systems. The momentum for Hadoop is NOT about revolutionary replacement of traditional databases. Rather it’s about adding a data system uniquely capable of handling big data problems at scale and doing so in a way that integrates easily with existing data systems, tools and approaches. This means it must interoperate with every layer of the stack: - Existing applications and BI tools - Existing databases and data warehouses for loading data to / from the data warehouse - Development tools used for building custom applications - Operational tools for managing and monitoring. Mainstream enterprises want to get the benefits of new technologies in ways that leverage existing skills and integrate with existing systems.
  4. It is for that reason that we focus on HDP interoperability across all of these categories. Data systems: HDP is endorsed and embedded with SQL Server, Teradata and more. BI tools: HDP is certified for use with the packaged applications you already use, from Microsoft to Tableau, MicroStrategy, Business Objects and more. Development tools: For .NET developers: Visual Studio, used to build more than half the custom applications in the world, certifies with HDP to enable Microsoft app developers to build custom apps with Hadoop. For Java developers: Spring for Apache Hadoop enables Java developers to quickly and easily build Hadoop-based applications with HDP. Operational tools: integration with System Center and with Teradata Viewpoint.
  5. Industry research shows the shift from a single system to an ecosystem where different technologies can unify and process data in the most efficient and specialized way to add the most value. Gartner calls this movement the “logical data warehouse”, which is being driven by a “desire for high-value analytics”. EMA and 9sight research shows that, on average, most companies tackle big data with 3 systems including “analytic databases, discovery platforms, and NoSQL solutions” (more details below). When asked how many nodes (nodes refers to a separate system/DB in their architecture) were part of their Big Data initiatives, the EMA/9sight survey respondents indicated that a wide number of Hybrid Data Ecosystem nodes were part of their plans. The most common answer among the 255 respondents was a total of Three Hybrid Data Ecosystem nodes as part of the respondents’ Big Data Initiatives, showing that Big Data strategies are not limited to a single platform or solution. When the Two to Five Hybrid Data Ecosystem nodes indications are aggregated, over two thirds of respondents are included in this segment. This shows Big Data Initiatives are focused on more than just a single platform (e.g. Hadoop) augmentation of the core of operational platforms or the enterprise data warehouse. Rather, Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms and NoSQL solutions beyond Hadoop.
  6. This is an example of meaningful enterprise-level integration which minimizes data replication and increases analyst productivity. It closes gaps in Hadoop that will otherwise take years and years to close. It leverages the scale and cost of Hadoop, but provides a proper SQL-compliant interface, performance, and higher analytic value with pre-built analytic functions that solve specific business problems like marketing attribution.
  7. We see common uses for Hadoop in capturing “dark data” such as email, call center IVR records, documents, and other “no schema” data which does not fit easily into a relational model without pre-processing. Hadoop provides a landing/staging/refining area to munge this data and make it available to join with other data. In some cases, the text can be parsed and “scored” for sentiment as a one-time batch job when interactivity isn’t required.
  8. From http://www.odbms.org/blog/2011/10/analytics-at-ebay-an-interview-with-tom-fastner/ eBay is rapidly changing, and analytics is driving many key initiatives like buyer experience, search optimization, buyer protection or mobile commerce. We are investing heavily in new technologies and approaches to leverage new data sources to drive innovation. We have 3 different platforms for Analytics: A) EDW: Dual systems for transactional (structured) data; Teradata 3.5 PB and 2.5 PB spinning disk; 10+ years experience; very high concurrency; good accessibility; hundreds of applications. B) Singularity: deep Teradata system for semi-structured data; 36 PB spinning disk; lower concurrency than EDW, but can store more data; biggest use case is User Behavior Analysis; largest table is 1.2 PB with ~1.9 Trillion rows. C) Hadoop: for unstructured/complex data; ~40 PB spinning disk; text analytics, machine learning; has the User Behavior data and selected EDW tables; lower concurrency and utilization. When dealing with terabytes to petabytes of data, how do you ensure scalability and performance? Tom Fastner: EDW: We model for the unknown (close to 3rd NF) to provide a solid physical data model suitable for many applications, that limits the number of physical copies needed to satisfy specific application requirements. A lot of scalability and performance is built into the database, but as any shared resource it does require an excellent operations team to fully leverage the capabilities of the platform. Singularity: The platform is identical to EDW, the only exception are limitations in the workload management due to configuration choices. But since we are leveraging the latest database release we are exploring ways to adopt new storage and processing patterns. Some new data sources are stored in a denormalized form significantly simplifying data modeling and ETL. On top we developed functions to support the analysis of the semi-structured data.
It also enables more sophisticated algorithms that would be very hard, inefficient or impossible to implement with pure SQL. One example is the pathing of user sessions. However the size of the data requires us to focus more on best practices (develop on small subsets, use 1% sample; process by day). Hadoop: The emphasis on Hadoop is on optimizing for access. The reusability of data structures (besides “raw” data) is very low. Unstructured data is handled on Hadoop only. The data is copied from the source systems into HDFS for further processing. We do not store any of that on the Singularity (Teradata) system.
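
The "use 1% sample" practice mentioned in the interview can be illustrated with a short Python sketch. The interview does not describe how eBay draws its sample, so this is only one possible approach under stated assumptions: it hashes a hypothetical `user_id` string so that the same users always land in the sample, which keeps every click of a sampled user together for session pathing. The function name `in_sample` and the use of MD5 as the bucketing hash are choices made for this example.

```python
import hashlib


def in_sample(user_id: str, percent: int = 1) -> bool:
    """Return True if this user falls into a deterministic `percent`% sample.

    Hashing the ID (instead of drawing random numbers) means a user is either
    always in the sample or never in it, on every run and every day.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto 100 buckets (0..99).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

A development run would then filter one day's events with `in_sample(user_id)` before building session paths, and switch to the full data only once the logic is settled.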