Senior Product Specialist
Big Data and Hadoop
The Challenge
Data fragmentation becomes the
barrier to business success
Era: TECHNOLOGY; USERS (scale); VALUE
1960s-1970s: Mainframe (OS/360); Few Employees (10^2); Back Office Automation
1980s: Client-Server; Many Employees (10^4); Front Office Productivity
1990s: Web; Customers/Consumers (10^6); E-Commerce
2007: Cloud; Business Ecosystems (10^7); Line-of-Business Self-Service
2011: Social; Communities & Society (10^9); Social Engagement
2014: Internet of Things; Devices & Machines (10^11); Real-Time Optimization
[Figure labels: Technologies, Sources, Business; nine Data Marts; Batch ETL]
Big Data Challenges
Volume, Variety, Velocity, Veracity
Where is the data I need?
Can I trust this data?
Transactions,
OLTP, OLAP
Enterprise Data
Warehouse
Social Media, Web Logs
Machine Device,
Scientific
Documents and Emails
Source Data Analytic Systems
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
80% of the work in big data projects
is data integration and quality
“I spend more than half my time
integrating, cleansing, and
transforming data without doing
any actual analysis.”
“80% of the work in any data
project is in cleaning the data”
“70% of my value is an ability
to pull the data, 20% of my
value is using data-science…”
Why Informatica for Big Data & Hadoop
PowerCenter Big Data Edition
Big Transaction Data Big Interaction Data
Online Transaction
Processing (OLTP)
Oracle
DB2
Ingres
Informix
Sybase
SQL Server
…
Cloud
Salesforce.com
Concur
Google App Engine
Amazon
…
Other Interaction Data
Clickstream
Image/Text
Scientific
Genomic/pharma
Medical
Medical/Device
Sensors/meters
RFID tags
CDR/mobile
…
Social Media & Web Data
Facebook
Twitter
Linkedin
Youtube
…
Big Data Processing
Online Analytical
Processing (OLAP) &
DW Appliances
Teradata
Redbrick
EssBase
Sybase IQ
Netezza
Exadata
HANA
Greenplum
DataAllegro
Asterdata
Vertica
Paraccel …
Web applications
Blogs
Discussion forums
Communities
Partner portals
…
Universal Data Access
High-Speed Data
Ingestion and
Extraction
ETL on Hadoop
Profiling on Hadoop
Complex Data
Parsing on Hadoop
Entity Extraction and
Data Classification on
Hadoop
No-Code Productivity
Business-IT
Collaboration
Unified Administration
the Vibe™ virtual
data machine
9.6
Get Data Into and Out of
Hadoop
PowerExchange for Hadoop
Replication to Hadoop
Streaming to Hadoop
Data Archiving to Hadoop
Data
Warehouse
MDM
Applications
Data Ingestion and Extraction
Moving terabytes of data per hour
Replicate
Streaming
Batch Load
Extract
Archive Extract Low
Cost
Store
Transactions,
OLTP, OLAP
Social Media,
Web Logs
Documents,
Email
Industry
Standards
Machine Device,
Scientific
PowerExchange Connectors
Enterprise
Applications,
Software as a
Service (SaaS)
JDE EnterpriseOne
JDE World
Lotus Notes
Oracle E-Business Suite ✔
PeopleSoft Enterprise
Salesforce (salesforce.com) ✔
SAP NetWeaver ✔
SAP NetWeaver BI ✔
SAS
Siebel
Netsuite
Microsoft Dynamics
Databases and
Data
Warehouses
Adabas for UNIX, Windows
C-ISAM
DB2 for LUW ✔
Essbase
EMC/Greenplum
Informix Dynamic Server
Netezza Performance Server
ODBC
Oracle ✔
SQL Server ✔
Sybase
Teradata
Messaging
Systems
JMS ✔
MSMQ ✔
TIBCO ✔
webMethods Broker ✔
WebSphere MQ ✔
Technology
Standards
Email (POP, IMAP)
HTTP(S) ✔
LDAP ✔
Web Services ✔
XML
Mainframe
Adabas for z/OS ✔
Datacom ✔
DB2 for z/OS, z/Linux ✔
IDMS ✔
IMS DB ✔
Oracle for z/Linux ✔
Teradata
WebSphere MQ for z/Linux ✔
VSAM ✔
Big Data
Asterdata,
Greenplum
Vertica
ParAccel
Microsoft PDW
Kognitio
Social Facebook, Twitter, LinkedIn DataSift, Kapow MongoDB
Hadoop HDFS HIVE HBASE
- Accessible in Real-time and/or via Change Data Capture (CDC)
NoSQL Support for HBase
Read
from HBase as
standard source
Write
to HBase as
standard target
A complete mapping with
an HBase source/target can
execute on Hadoop
Sample HBase column
families
(Stored in JSON/complex
formats)
NoSQL Support for MongoDB
Access, integrate,
transform & ingest
MongoDB data into
other analytic
systems (e.g.
Hadoop, data
warehouse)
Access, integrate,
transform, & ingest
data into MongoDB
Sampling
MongoDB data &
flattening it to
relational format
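The flattening step can be pictured with a small sketch. This is not Informatica's implementation: the sample document, the `flatten` and `explode` helper names, and the dotted column naming are all assumptions made for illustration. It shows a nested, MongoDB-style document being turned into flat rows, one row per array element.

```python
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '.'."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def explode(doc: Dict[str, Any], array_field: str) -> List[Dict[str, Any]]:
    """Emit one flat row per element of `array_field`, repeating the parent fields."""
    parent = {k: v for k, v in doc.items() if k != array_field}
    rows = []
    for item in doc.get(array_field, []):
        row = flatten(parent)
        row.update(flatten({array_field: item}))
        rows.append(row)
    return rows


# Hypothetical sample document, standing in for data sampled from a collection.
order = {
    "_id": 1,
    "customer": {"name": "Ann", "country": "TW"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

rows = explode(order, "items")
for row in rows:
    print(row)
```

Repeating the parent fields on every row mirrors how a one-to-many relationship looks after a relational join; a real tool may instead split the array into a child table.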
IDR for Replicating to Hadoop
Supported
Distributions
•  Apache
•  0.20.203.x
•  0.20.204.x
•  0.20.205.x
•  0.23.x
•  1.0.x
•  1.1.x
•  2.x.x
•  Cloudera
•  CDH3
•  CDH4
EXTRACT APPLY
Source System Intermediate Files
Cycle_1.work directory
HDFS
Table 1 File
Table 2 File
…
Table N File
Schema.ini File
Real-Time Data Collection and Streaming
Ultra Messaging Bus
Publish/Subscribe
Leverage High Performance Messaging Infrastructure:
publish with Ultra Messaging for global distribution without
additional staging or landing.
HDFS, HBase,
Targets
Web Servers,
Operations
Monitors, rsyslog,
SLF4J, etc.
Handhelds, Smart
Meters, etc.
Discrete Data
Messages
Sources
Zookeeper
Management
and Monitoring
Internet of Things,
Sensor Data
Real Time
Analysis, Complex
Event Processing
NoSQL Databases:
Cassandra, Riak,
MongoDB
Node
Node
Node
Node
Node
Node
Informatica Vibe Data Stream for Machine Data
•  High performance/efficient
streaming data collection over
LAN/WAN
•  GUI interface provides ease of
configuration, deployment & use
•  Continuous ingestion of real-time
generated data (sensors; logs;
etc.). Machine generated & other
data sources
•  Enable real-time interactions &
response
•  Real-time delivery directly to
multiple targets (batch/stream
processing)
•  Highly available; efficient;
scalable
•  Available ecosystem of lightweight
agents (sources & targets)
Predictive Maintenance
with Event Processing and Analytics
United Technologies Aerospace Systems (UTAS)
provides engines and aircraft components to
leading commercial and defense manufacturers,
including the new Airbus A380 and Boeing B787.
The challenge:
•  5,000+ aircraft in service plus new design wins exponentially
increase the amount of sensor data being generated
•  “Power by the Hour” leasing model means the maintenance cost and
service outages fall to UTAS
•  No proactive capability to predict when a safety issue might occur
•  Once-per-day sensor readings moving to real-time, over-the-air
Archive to Hadoop
Compression Extends Hadoop Cluster Capacity
Without INFA Optimized Archive compression:
10 TB replicated 3X = 30 TB
With INFA Optimized Archive (95% compression):
10 TB compressed 95% = 500 GB; replicated 3X = 1.5 TB
20X less I/O bandwidth required
20 min vs. 1 min response time
8 hours vs. 24 mins backup window
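The storage figures on this slide follow from simple arithmetic, reproduced below as a quick check. The function name `stored_tb` is made up for this sketch. The response-time and backup-window figures are only shown to be consistent with the same 20X factor; treating them as scaling linearly with data size is an assumption, not something the slide states.

```python
def stored_tb(raw_tb: float, compression: float = 0.0, replicas: int = 3) -> float:
    """Physical storage in TB after compression and 3X replication."""
    return raw_tb * (1.0 - compression) * replicas


raw = 10.0
without = stored_tb(raw)                          # 10 TB x 3 replicas
with_archive = stored_tb(raw, compression=0.95)   # 0.5 TB x 3 replicas
ratio = without / with_archive

# Assumption: time scales linearly with the amount of data moved.
response_minutes = 20 / ratio
backup_minutes = 8 * 60 / ratio

print(f"without compression:  {without:.1f} TB")
print(f"with 95% compression: {with_archive:.1f} TB")
print(f"reduction factor:     {ratio:.0f}x")
print(f"response time:        {response_minutes:.0f} min")
print(f"backup window:        {backup_minutes:.0f} min")
```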
Parse and Prepare Data On
Hadoop
hParser and XMap
1. Developer uses Studio to develop a transformation.
2. Developer deploys the transformation to a local service repository (directory). All files needed for the transformation are moved.
3. To deploy to the server, this service folder is moved to the server via FTP, copy, script, etc.
NOTE: If the server file system is mountable from the developer machine directly, then step 2 would deploy directly to the server.
4. The DT engine can immediately use this service to process data.
The DT Engine is fully embeddable and can be invoked using any of the supported APIs: Java, C++, C, .NET, web services.
For simple integration, a command line interface is available to invoke services.
Internal custom applications can embed transformation services using the various APIs.
PowerCenter leverages DT via the Unstructured Data Transformation (UDT). This is a GUI transformation widget in PowerCenter which wraps around the DT API and engine.
DT can also be embedded in other middleware technologies. For some (WBIMB, WebMethods, BizTalk) INFA provides similar GUI widgets (agents) for the respective design environments. For others the API layer can be used directly.
DT can be invoked in two general ways:
1.  Filenames can be passed to it, and DT will directly open the file(s) for processing. On the output side, DT can also directly write to the filesystem.
2.  The calling application can buffer the data and send buffers to DT for processing. On the output side, DT can also write back to memory buffers which are returned to the calling application.
Though not shown below, the engine fully supports multiple input and output files or buffers as needed by the transformation.
Engine invocation is a shared library. The DT engine runs fully within the process of the calling application. It is not an external engine. This removes any overhead from passing data between processes, across the network, etc. The engine is also dynamically invoked and does not need to be ‘started up’ or maintained externally.
The DT engine is also thread-safe and re-entrant. This allows the calling application to invoke DT in multiple threads to increase throughput. A good example is DT’s support of PowerCenter partitioning to scale up processing.
As shown below, the actual transformation logic is completely independent of any calling application. This means you can develop a transformation once, and leverage it in multiple environments simultaneously resulting in reduced development and maintenance times and lower impact of change.
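The two invocation styles and the multi-threaded use can be sketched generically. Nothing below is the DT API: `transform_buffer` and `transform_file` are hypothetical stand-ins, and the "key=value to key,value" rewrite is an invented placeholder for a deployed transformation. The sketch only illustrates the calling patterns: passing filenames versus passing in-memory buffers, and calling a stateless, in-process function from several threads.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def transform_buffer(data: bytes) -> bytes:
    """Stand-in transformation: each 'key=value' line becomes 'key,value'."""
    lines = data.decode("utf-8").splitlines()
    return "\n".join(line.replace("=", ",", 1) for line in lines).encode("utf-8")


def transform_file(src: Path, dst: Path) -> None:
    """File style: the callee opens the input and writes the output itself."""
    dst.write_bytes(transform_buffer(src.read_bytes()))


# Buffer style: the caller owns the bytes and gets bytes back.
# The function keeps no shared state, so threads can call it concurrently.
buffers = [b"id=1\nname=a", b"id=2\nname=b", b"id=3\nname=c"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(transform_buffer, buffers))

# File style, using a temporary directory as the filesystem.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in.txt", Path(tmp) / "out.txt"
    src.write_bytes(buffers[0])
    transform_file(src, dst)
    file_result = dst.read_bytes()

print(results[0] == file_result)
```

Both styles run the same logic, which is the point the slide makes about keeping the transformation independent of the calling application.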
Parse and Prepare Data on Hadoop
Svc Repository
Flat Files & Documents
Interaction data
Industry Standards
XML
The broadest coverage for Big Data
social
Device/sensor
scientific
Productivity
•  Visual
parsing
environment
•  Predefined
translations
Any DI/BI architecture
PIG EDW
MDM
Example use cases
Call Detail Record (CDR)
•  Why Hadoop?
•  CDR: large data sets; every 7 seconds, every mobile phone in
the region creates a record
•  Desire to analyze behavior and location to personalize and
optimize pricing and marketing
hadoop … dt-hadoop.jar
… My_Parser /input/*/input*.txt
1.  Define parser in HParser
visual studio
2.  Deploy the parser on
Hadoop Distributed File
System (HDFS)
3.  Run HParser to extract
data and produce tabular
format in Hadoop
Parse and Prepare Data on Hadoop
How does it work?
Profiling and Discovering
Data
Informatica Profiling & Data Discovery on
Hadoop
CUSTOMER_ID example COUNTRY CODE example
3. Drilldown Analysis (into Hadoop Data)
2. Value &
Pattern
Analysis of
Hadoop Data
1. Profiling Stats:
Min/Max Values, NULLs,
Inferred Data Types, etc.
Drill down into actual
data values to inspect
results across entire
data set, including
potential duplicates
Value and Pattern
Frequency to isolate
inconsistent/dirty data or
unexpected patterns
Hadoop Data Profiling
results – exposed to
anyone in enterprise
via browser
Stats to identify
outliers and
anomalies in data
Hadoop Data Profiling Results
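The kinds of statistics named above (min/max, NULL counts, inferred types, and value-pattern frequencies) can be sketched on a single column. This is only an illustration on an in-memory list, not how Informatica profiles Hadoop data; the `profile`, `infer_type`, and `pattern` names, the 9/X pattern notation, and the sample COUNTRY CODE values are all assumptions.

```python
import re
from collections import Counter
from typing import Any, Dict, List, Optional


def infer_type(values: List[str]) -> str:
    """Infer a column type from its non-null string values."""
    if values and all(re.fullmatch(r"-?\d+", v) for v in values):
        return "integer"
    if values and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in values):
        return "decimal"
    return "string"


def pattern(value: str) -> str:
    """Map digits to 9 and letters to X; keep other characters as-is."""
    return re.sub(r"[A-Za-z]", "X", re.sub(r"\d", "9", value))


def profile(column: List[Optional[str]]) -> Dict[str, Any]:
    """Basic profiling stats; min/max compare strings lexicographically."""
    present = [v for v in column if v is not None and v != ""]
    return {
        "count": len(column),
        "nulls": len(column) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "type": infer_type(present),
        "patterns": Counter(pattern(v) for v in present).most_common(),
    }


# Hypothetical COUNTRY CODE column with a few inconsistent values.
country_codes = ["US", "TW", "us", None, "U.S.", "TW", "886"]
stats = profile(country_codes)
for key, value in stats.items():
    print(key, value)
```

The rare patterns at the tail of the frequency list are the ones worth drilling into, since they tend to be the inconsistent or unexpected values.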
Hadoop Data Domain Discovery
Finding functional meaning of Data in Hadoop
Leverage INFA rules/mapplets to
identify functional meaning of
Hadoop data
Sensitive data
(e.g. SSN, Credit Card number, etc.)
PHI: Protected Health Information
PII: Personally Identifiable Information
Scalable to look for/discover ANY Domain type
View/share report of data domains/
sensitive data contained in Hadoop.
Ability to drill down to see suspect
data values.
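Rule-based domain discovery can be sketched as a set of per-value rules plus a match threshold per column. The rules below are assumptions made for this illustration (a US SSN format check and a Luhn checksum for card numbers), as are the `RULES`, `discover`, and `luhn_ok` names and the 80% threshold; they do not represent Informatica's rules or mapplets. The card numbers are widely published test numbers, and the SSN-shaped values are made up.

```python
import re
from typing import Callable, Dict, List


def luhn_ok(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


# One rule per data domain; each rule judges a single value.
RULES: Dict[str, Callable[[str], bool]] = {
    "SSN": lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
    "CREDIT_CARD": lambda v: re.fullmatch(r"[\d -]{13,19}", v) is not None and luhn_ok(v),
}


def discover(column: List[str], threshold: float = 0.8) -> List[str]:
    """Return the domains whose rule matches at least `threshold` of the values."""
    values = [v for v in column if v]
    if not values:
        return []
    return [
        name for name, rule in RULES.items()
        if sum(rule(v) for v in values) / len(values) >= threshold
    ]


ssn_like = ["123-45-6789", "987-65-4321", "000-12-3456"]
card_like = ["4111 1111 1111 1111", "5500 0000 0000 0004"]
free_text = ["hello", "world"]

print(discover(ssn_like))
print(discover(card_like))
print(discover(free_text))
```

Adding a new domain means adding one entry to `RULES`, which is one way to read the slide's claim that discovery scales to any domain type.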
Transforming and Cleansing
Data
PowerCenter on Hadoop
Data Quality on Hadoop
No-code visual
development
environment
Preview results at
any point in the
data flow
PowerCenter developers are now Hadoop developers
Reuse and Import PC Metadata for Hadoop
Import existing
PC artifacts into
Hadoop
development
environment
Validate import
logic before the
actual import
process to ensure
compatibility
Natural Language Processing
Entity Extraction & Data Classification
Train NLP to find and
classify entities in
unstructured data
Address Validation & Data Cleansing
Configure Mapping for Hadoop Execution
No need to redesign
mapping logic to
execute on either
Traditional or Hadoop
infrastructure.
Configure where the
integration logic should
run – Hadoop or Native
-- Hive multi-table insert: one pass over T2 feeds both targets
FROM
(
  SELECT
    T1.ORDERKEY1 AS ORDERKEY2, T1.li_count, orders.O_CUSTKEY AS CUSTKEY, customer.C_NAME,
    customer.C_NATIONKEY, nation.N_NAME, nation.N_REGIONKEY
  FROM
  (
    -- custom Informatica transformation invoked from Hive
    SELECT TRANSFORM (L_Orderkey.id) USING CustomInfaTx
    FROM lineitem
    GROUP BY L_ORDERKEY
  ) T1
  JOIN orders ON (T1.ORDERKEY1 = orders.O_ORDERKEY)
  JOIN customer ON (orders.O_CUSTKEY = customer.C_CUSTKEY)
  JOIN nation ON (customer.C_NATIONKEY = nation.N_NATIONKEY)
  WHERE nation.N_NAME = 'UNITED STATES'
) T2
INSERT OVERWRITE TABLE TARGET1 SELECT *
INSERT OVERWRITE TABLE TARGET2 SELECT CUSTKEY,
  count(ORDERKEY2) GROUP BY CUSTKEY;
Data Integration & Quality on Hadoop
Hive-QL
1.  Entire Informatica mapping
translated to Hive Query Language
2.  Optimized HQL converted to
MapReduce & submitted to Hadoop
cluster (job tracker).
3.  Advanced mapping transformations
executed on Hadoop through User
Defined Functions using Vibe
MapReduce
UDF
Example Mapping Execution
Source External
Flat File
Source External
Relational Data
Engine Repository
Source HDFS
File
Cluster of Linux Machines
Mapping logic
translated to HQL
and submitted
to Hadoop Cluster
Relational Data
streamed to
Hadoop for
processing
Target HDFS
File
Local flat file
staged
temporarily
on HDFS
Read HDFS
file data
Final
processed data
loaded into
HDFS file
Temp Staged
Lookup File
Orchestrating and
Monitoring Hadoop
Informatica Workflow &
Monitoring for Hadoop
Metadata Manager for Hadoop
Dynamic Data Masking for Hadoop
Mixed Workflow Orchestration
One workflow running tasks on Hadoop and local environments
Tasks: Cmd_ChooseLoadPath, MT_Load2Hadoop + Parse, Cmd_Load2Hadoop, MT_Parse, Cmd_ProfileData, MT_Cleanse, MT_DataAnalysis, Notification
List of variables:
Name | Type | Default Value | Description
$User.LoadOptionPath | Integer | 2 | Load path for workflow, depending on output of cmd task
$User.DataSourceConnection | String | HiveSourceConnection | Source connection object
$User.ProfileResult | Integer | 100 | Output from “profiling” command task.
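The branching driven by `$User.LoadOptionPath` can be sketched as a plain lookup. The task names are taken from the slide, but which path value selects which branch, and the order of the shared tail, are assumptions for illustration; the slide does not spell them out. `plan_workflow` is a hypothetical helper, not an Informatica API.

```python
from typing import Dict, List

# Assumption: path 1 loads and parses in one mapping task,
# path 2 loads with a command task and parses separately.
BRANCHES: Dict[int, List[str]] = {
    1: ["MT_Load2Hadoop + Parse"],
    2: ["Cmd_Load2Hadoop", "MT_Parse"],
}
SHARED_TAIL = ["Cmd_ProfileData", "MT_Cleanse", "MT_DataAnalysis", "Notification"]


def plan_workflow(load_option_path: int = 2) -> List[str]:
    """Return the ordered task names for the chosen load path."""
    if load_option_path not in BRANCHES:
        raise ValueError(f"unknown load path: {load_option_path}")
    return ["Cmd_ChooseLoadPath"] + BRANCHES[load_option_path] + SHARED_TAIL


for path in (1, 2):
    print(path, " -> ".join(plan_workflow(path)))
```

The default argument of 2 matches the default value listed for `$User.LoadOptionPath`.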
Full traceability from workflow
to MapReduce jobs
View generated
Hive scripts
Unified Administration
Single Place to Manage & Monitor
Data Lineage and Business Glossary
Hadoop Architecture Overview
•  PowerCenter on Hadoop
•  Data Quality on Hadoop
•  DT on Hadoop
•  Entity Extraction on Hadoop
•  Profiling on Hadoop
Execution on Hadoop
[Architecture diagram, components shown:
Sources: Transactions, OLTP, OLAP; Documents, Email; Social Media, Web Logs; Machine Device, Scientific
Informatica: INFA Clients; PowerCenter SE / Enterprise Grid (PowerCenter Services); Mercury Services; MySQL
Connectors: PWX for HDFS, PWX for Hive, PWX for Mercury, PWX for PC
Hadoop cluster: NameNode, Job Tracker, Hive, Hive Client; DataNode1, DataNode2, DataNode3, each with HDFS, Infa-Lib, HParser, MapReduce, RDBMS Clients]
40

Contenu connexe

Tendances

Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightDataWorks Summit/Hadoop Summit
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the BusinessDataWorks Summit
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 
Data infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInData infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInHari Shankar Sreekumar
 
NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010Cloudera, Inc.
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseAge Mooij
 

Tendances (20)

Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Big Data Building Blocks with AWS Cloud
Big Data Building Blocks with AWS CloudBig Data Building Blocks with AWS Cloud
Big Data Building Blocks with AWS Cloud
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Delivering Data Science to the Business
Delivering Data Science to the BusinessDelivering Data Science to the Business
Delivering Data Science to the Business
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 
Data infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInData infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedIn
 
NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 

En vedette

Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Etu Solution
 
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)Amazon Web Services
 
Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享
Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享
Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享Etu Solution
 
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...Etu Solution
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pubChao Zhu
 
Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息
Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息
Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息Etu Solution
 
Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值Etu Solution
 
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Etu Solution
 
Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)
Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)
Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)Kuo-Chun Su
 
Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野
Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野
Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野Etu Solution
 
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)
Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)Joseph Kuo
 
新媒體-迷思解構 New Media - Myths Decoded
新媒體-迷思解構 New Media - Myths Decoded新媒體-迷思解構 New Media - Myths Decoded
新媒體-迷思解構 New Media - Myths DecodedCalvin C. Yu
 
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning IntroductionMark Chang
 
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)Amazon Web Services
 
New UI for Cost Center Planning
New UI for Cost Center PlanningNew UI for Cost Center Planning
New UI for Cost Center Planningtasmc
 
Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們Fred Chiang
 
Principles of Software-defined Elastic Systems for Big Data Analytics
Principles of Software-defined Elastic Systems for Big Data AnalyticsPrinciples of Software-defined Elastic Systems for Big Data Analytics
Principles of Software-defined Elastic Systems for Big Data AnalyticsHong-Linh Truong
 

En vedette (20)

Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
 
Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享
Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享
Big Data Tornado - 2015 台灣 Big Data 企業經典應用案例分享
 
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...
Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge — Splunk 幫你解決 Big Data...
 
唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub唯品会大数据实践 Sacc pub
唯品会大数据实践 Sacc pub
 
Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息
Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息
Big Data Taiwan 2014 Track2-3: QlikView 與 Big Data ─ 從 Big Data 裡獲取重要信息
 
Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值Track C-2 洞見未來 - Tableau 創造大數據新價值
Track C-2 洞見未來 - Tableau 創造大數據新價值
 
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
Track A-1: Cloudera 大數據產品和技術最前沿資訊報告
 
Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)
Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)
Hadoop, the Apple of Our Eyes (這些年,我們一起追的 Hadoop)
 
推動數位革命
推動數位革命推動數位革命
推動數位革命
 
Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野
Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野
Big Data Taiwan 2014 Track2-1: SAP 善用足跡,預測未來 - 全方位的行銷視野
 
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)
Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)Establish The Core of  Cloud Computing Application  by Using Hazelcast (Chinese)
Establish The Core of Cloud Computing Application by Using Hazelcast (Chinese)
 
新媒體-迷思解構 New Media - Myths Decoded
新媒體-迷思解構 New Media - Myths Decoded新媒體-迷思解構 New Media - Myths Decoded
新媒體-迷思解構 New Media - Myths Decoded
 
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning Introduction
 
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
數位媒體雲端儲存案例和技術分享 (AWS Storage Options for Media Industry)
 
New UI for Cost Center Planning
New UI for Cost Center PlanningNew UI for Cost Center Planning
New UI for Cost Center Planning
 
Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們Big Data 現象,以及現象中的我們
Big Data 現象,以及現象中的我們
 
大數據的基本概念(上)
大數據的基本概念(上)大數據的基本概念(上)
大數據的基本概念(上)
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Principles of Software-defined Elastic Systems for Big Data Analytics
Principles of Software-defined Elastic Systems for Big Data AnalyticsPrinciples of Software-defined Elastic Systems for Big Data Analytics
Principles of Software-defined Elastic Systems for Big Data Analytics
 

Similaire à Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution

Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
Use Of Database Management Systems To Meet Business Needs.
Use Of Database Management Systems To Meet Business Needs.Use Of Database Management Systems To Meet Business Needs.
Use Of Database Management Systems To Meet Business Needs.Jill Turner
 
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTDataHadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTData
Hadoop World 2011: Hadoop’s Life in Enterprise Systems - Y Masatani, NTTDataCloudera, Inc.
 
CCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialCCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialRoxycodone Online
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution

  • 2. The Challenge: Data fragmentation becomes the barrier to business success. [Figure: technology eras (Mainframe, Client-Server, Web, Social, Internet of Things, Cloud; OS/360; dates shown: 1960s-1970s, 1980s, 1990s, 2007, 2011, 2014); users (few employees, many employees, customers/consumers, business ecosystems, communities and society, devices and machines; 10^2 to 10^11); business value (back office automation, front office productivity, e-commerce, line-of-business self-service, social engagement, real-time optimization).]
  • 3. Big Data Challenges: Volume, Variety, Velocity, Veracity. "Where is the data I need?" "Can I trust this data?" [Figure: source data (transactions, OLTP, OLAP; social media, web logs; machine, device, scientific; documents and emails) connected by batch ETL to analytic systems (enterprise data warehouse and many data marts).]
  • 5. 80% of the work in big data projects is data integration and quality. “I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.” “80% of the work in any data project is in cleaning the data.” “70% of my value is an ability to pull the data, 20% of my value is using data-science…”
  • 6. Why Informatica for Big Data & Hadoop
  • 7. PowerCenter Big Data Edition, on the Vibe™ virtual data machine (9.6)
    Data categories: Big Transaction Data, Big Interaction Data, Big Data Processing
    Online Transaction Processing (OLTP): Oracle, DB2, Ingres, Informix, Sybase, SQL Server, …
    Online Analytical Processing (OLAP) & DW Appliances: Teradata, Redbrick, Essbase, Sybase IQ, Netezza, Exadata, HANA, Greenplum, DataAllegro, Asterdata, Vertica, Paraccel, …
    Cloud: Salesforce.com, Concur, Google App Engine, Amazon, …
    Social Media & Web Data: Facebook, Twitter, LinkedIn, YouTube, …
    Web applications, blogs, discussion forums, communities, partner portals, …
    Other Interaction Data: clickstream, image/text, scientific, genomic/pharma, medical, medical/device, sensors/meters, RFID tags, CDR/mobile, …
    Capabilities: Universal Data Access; High-Speed Data Ingestion and Extraction; ETL on Hadoop; Profiling on Hadoop; Complex Data Parsing on Hadoop; Entity Extraction and Data Classification on Hadoop; No-Code Productivity; Business-IT Collaboration; Unified Administration
  • 8. Get Data Into and Out of Hadoop: PowerExchange for Hadoop, Replication to Hadoop, Streaming to Hadoop, Data Archiving to Hadoop
  • 9. Data Ingestion and Extraction: moving terabytes of data per hour. [Figure: flows labeled Replicate, Streaming, Batch Load, Archive and Extract connect the sources (transactions, OLTP, OLAP; social media, web logs; documents, email; industry standards; machine, device, scientific) with a low-cost store and with Data Warehouse, MDM and Applications.]
  • 10. PowerExchange Connectors (✔ = accessible in real time and/or via Change Data Capture (CDC))
    Enterprise Applications, Software as a Service (SaaS): JDE EnterpriseOne, JDE World, Lotus Notes, Oracle E-Business Suite ✔, PeopleSoft Enterprise, Salesforce (salesforce.com) ✔, SAP NetWeaver ✔, SAP NetWeaver BI ✔, SAS, Siebel, NetSuite, Microsoft Dynamics
    Databases and Data Warehouses: Adabas for UNIX, Windows; C-ISAM; DB2 for LUW ✔; Essbase; EMC/Greenplum; Informix Dynamic Server; Netezza Performance Server; ODBC; Oracle ✔; SQL Server ✔; Sybase; Teradata
    Messaging Systems: JMS ✔, MSMQ ✔, TIBCO ✔, webMethods Broker ✔, WebSphere MQ ✔
    Technology Standards: Email (POP, IMAP), HTTP(S) ✔, LDAP ✔, Web Services ✔, XML
    Mainframe: Adabas for z/OS ✔; Datacom ✔; DB2 for z/OS, z/Linux ✔; IDMS ✔; IMS DB ✔; Oracle for z/Linux ✔; Teradata; WebSphere MQ for z/Linux ✔; VSAM ✔
    Big Data: Asterdata, Greenplum, Vertica, ParAccel, Microsoft PDW, Kognitio
    Social: Facebook, Twitter, LinkedIn, DataSift, Kapow
    MongoDB
    Hadoop: HDFS, Hive, HBase
  • 11. NoSQL Support for HBase: read from HBase as a standard source; write to HBase as a standard target; a complete mapping with an HBase source/target can execute on Hadoop. [Screenshot: sample HBase column families (stored in JSON/complex formats).]
  • 12. NoSQL Support for MongoDB: access, integrate, transform and ingest MongoDB data into other analytic systems (e.g. Hadoop, data warehouse); access, integrate, transform and ingest data into MongoDB; sample MongoDB data and flatten it to relational format.
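To make "flattening to relational format" concrete, here is a minimal Python sketch. It is not Informatica's implementation and does not use a MongoDB driver: the `flatten` helper and the sample document are hypothetical, and arrays are left as single values rather than being split into child rows.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten a nested document into one row keyed by dotted column paths."""
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents so each leaf becomes its own column.
            row.update(flatten(value, path, sep))
        else:
            row[path] = value
    return row


# Hypothetical sample document, shaped like a MongoDB record.
sample = {
    "_id": 1,
    "name": "Ann",
    "address": {"city": "Taipei", "geo": {"lat": 25.03, "lon": 121.56}},
}
row = flatten(sample)
```

Each leaf value ends up in a column such as `address.geo.lat`, which is the shape a relational target or a Hive table expects.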
  • 13. IDR for Replicating to Hadoop. Supported distributions: Apache 0.20.203.x, 0.20.204.x, 0.20.205.x, 0.23.x, 1.0.x, 1.1.x, 2.x.x; Cloudera CDH3, CDH4. [Figure: EXTRACT reads the source system into intermediate files (Cycle_1.work directory); APPLY writes Table 1 File, Table 2 File, … Table N File and a Schema.ini file to HDFS.]
  • 14. Real-Time Data Collection and Streaming. Leverage high-performance messaging infrastructure: publish with Ultra Messaging for global distribution without additional staging or landing. [Figure: sources (web servers, operations monitors, rsyslog, SLF4J, etc.; handhelds, smart meters, etc.; Internet of Things, sensor data) send discrete data messages over the Ultra Messaging bus (publish/subscribe) across a set of nodes; Zookeeper; management and monitoring; targets (HDFS, HBase; real-time analysis, complex event processing; NoSQL databases: Cassandra, Riak, MongoDB).]
  • 15. Informatica Vibe Data Stream for Machine Data
    • High-performance, efficient streaming data collection over LAN/WAN
    • GUI interface provides ease of configuration, deployment and use
    • Continuous ingestion of data generated in real time (sensors, logs, etc.), machine-generated and other data sources
    • Enables real-time interactions and response
    • Real-time delivery directly to multiple targets (batch/stream processing)
    • Highly available, efficient, scalable
    • Available ecosystem of lightweight agents (sources and targets)
  • 16. Predictive Maintenance with Event Processing and Analytics. United Technologies Aerospace Systems (UTAS) provides engines and aircraft components to leading commercial and defense manufacturers, including the new Airbus A380 and Boeing B787. The challenge:
    • 5,000+ aircraft in service plus new design wins exponentially increase the amount of sensor data being generated
    • The “Power by the Hour” leasing model means the maintenance cost and service outages fall to UTAS
    • No proactive capability to predict when a safety issue might occur
    • Once-per-day sensor readings are moving to real-time, over-the-air readings
  • 17. Archive to Hadoop: Compression Extends Hadoop Cluster Capacity. Without INFA Optimized Archive: 10 TB replicated 3X = 30 TB. With INFA Optimized Archive (95% compression): 10 TB compressed 95% = 500 GB, replicated 3X = 1.5 TB. 20X less I/O bandwidth required; 20 min vs. 1 min response time; 8 hours vs. 24 mins backup window.
  • 18. Parse and Prepare Data on Hadoop: HParser and XMap
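The arithmetic on slide 17 can be checked with a short Python sketch. The function name `stored_gb` is hypothetical, and decimal units (1 TB = 1000 GB) are assumed because the slide equates 3 x 500 GB with 1.5 TB.

```python
GB_PER_TB = 1000  # decimal units, matching the slide's 500 GB x 3 = 1.5 TB


def stored_gb(raw_tb: int, replication: int, compression_pct: int = 0) -> int:
    """Return the cluster storage in GB after compression, then replication."""
    raw_gb = raw_tb * GB_PER_TB
    # Integer arithmetic avoids floating-point noise in the percentage.
    compressed_gb = raw_gb * (100 - compression_pct) // 100
    return compressed_gb * replication


without_archive = stored_gb(10, replication=3)                   # 30 TB
with_archive = stored_gb(10, replication=3, compression_pct=95)  # 1.5 TB
reduction = without_archive // with_archive                      # 20X
```

The same factor of 20 appears in the other figures on the slide: 20 minutes versus 1 minute of response time, and 8 hours (480 minutes) versus 24 minutes of backup window.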
  • 19. Parse and Prepare Data on Hadoop. The broadest coverage for Big Data: flat files and documents, interaction data, industry standards, XML, social, device/sensor, scientific. Productivity: visual parsing environment, predefined translations. Any DI/BI architecture: PIG, EDW, MDM.
    Notes:
    1. The developer uses Studio to develop a transformation.
    2. The developer deploys the transformation to a local service repository (directory). All files needed for the transformation are moved.
    3. To deploy to the server, this service folder is moved to the server via FTP, copy, script, etc. NOTE: If the server file system is mountable from the developer machine directly, then step 2 would deploy directly to the server.
    4. The DT engine can immediately use this service to process data.
    The DT engine is fully embeddable and can be invoked using any of the supported APIs: Java, C++, C, .NET, web services. For simple integration, a command line interface is available to invoke services. Internal custom applications can embed transformation services using the various APIs. PowerCenter leverages DT via the Unstructured Data Transformation (UDT), a GUI transformation widget in PowerCenter that wraps around the DT API and engine. DT can also be embedded in other middleware technologies. For some (WBIMB, WebMethods, BizTalk), INFA provides similar GUI widgets (agents) for the respective design environments. For others, the API layer can be used directly.
    DT can be invoked in two general ways:
    1. Filenames can be passed to it, and DT will directly open the file(s) for processing. On the output side, DT can also write directly to the filesystem.
    2. The calling application can buffer the data and send buffers to DT for processing. On the output side, DT can also write back to memory buffers, which are returned to the calling application.
    Though not shown in the diagram, the engine fully supports multiple input and output files or buffers as needed by the transformation.
    The engine is invoked as a shared library. It runs fully within the process of the calling application; it is not an external engine. This removes any overhead from passing data between processes, across the network, etc. The engine is also dynamically invoked and does not need to be "started up" or maintained externally. The DT engine is also thread-safe and re-entrant, which allows the calling application to invoke DT in multiple threads to increase throughput. A good example is DT's support of PowerCenter partitioning to scale up processing.
    The actual transformation logic is completely independent of any calling application. This means you can develop a transformation once and use it in multiple environments simultaneously, which reduces development and maintenance time and lowers the impact of change.
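The two invocation styles described in the notes (file names in and out, or memory buffers in and out) and the multi-threaded use of an in-process engine can be sketched in Python. This is an illustration only and not the DT API: `transform_buffer` and `transform_file` are hypothetical stand-ins, and the "transformation" is simply upper-casing bytes.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def transform_buffer(data: bytes) -> bytes:
    """Buffer mode: caller passes bytes in and gets bytes back (stand-in logic)."""
    # A pure function with no shared state, so it is safe to call from many threads.
    return data.upper()


def transform_file(in_path: str, out_path: str) -> None:
    """File mode: the engine opens the input itself and writes the output itself."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        dst.write(transform_buffer(src.read()))


# Buffer mode, invoked from several threads inside the caller's own process.
buffers = [b"rec-1", b"rec-2", b"rec-3"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(transform_buffer, buffers))  # map keeps input order

# File mode, using a temporary directory for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "in.txt")
    out_path = os.path.join(tmp, "out.txt")
    with open(in_path, "wb") as f:
        f.write(b"hello")
    transform_file(in_path, out_path)
    with open(out_path, "rb") as f:
        file_result = f.read()
```

Because the same function backs both modes, the transformation logic stays independent of how the caller chooses to hand data over, which is the property the notes emphasize.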
  • 20. Example use case: Call Detail Records (CDR). Why Hadoop? Large data sets: every 7 seconds, every mobile phone in the region creates a record. Desire to analyze behavior and location to personalize and optimize pricing and marketing.
  • 21. Parse and Prepare Data on Hadoop: How does it work? 1. Define the parser in HParser visual studio. 2. Deploy the parser on the Hadoop Distributed File System (HDFS). 3. Run HParser to extract data and produce tabular format in Hadoop. Command shown: hadoop … dt-hadoop.jar … My_Parser /input/*/input*.txt
  • 23. Profiling and Discovering Data: Informatica Profiling & Data Discovery on Hadoop
  • 24. Hadoop Data Profiling Results: exposed to anyone in the enterprise via browser. 1. Profiling stats: min/max values, NULLs, inferred data types, etc.; stats to identify outliers and anomalies in the data. 2. Value and pattern analysis of Hadoop data: value and pattern frequency to isolate inconsistent/dirty data or unexpected patterns. 3. Drilldown analysis (into Hadoop data): drill down into actual data values to inspect results across the entire data set, including potential duplicates. Examples shown: CUSTOMER_ID, COUNTRY CODE.
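Step 3, turning raw records into tabular output, can be pictured with a small map-style function in Python. This is not HParser, whose parsers are designed visually; the `k=v;k=v` record layout and the `FIELDS` schema below are hypothetical.

```python
from typing import Iterable, Iterator, List

FIELDS: List[str] = ["caller", "callee", "cell", "secs"]  # hypothetical CDR schema


def parse_record(line: str) -> List[str]:
    """Turn one 'k=v;k=v' record into a row ordered by FIELDS; missing keys become ''."""
    pairs = dict(part.split("=", 1) for part in line.strip().split(";") if "=" in part)
    return [pairs.get(name, "") for name in FIELDS]


def to_tsv(lines: Iterable[str]) -> Iterator[str]:
    """Map step: each non-empty input line yields one tab-separated output line."""
    for line in lines:
        if line.strip():
            yield "\t".join(parse_record(line))


sample = ["caller=0911;callee=0922;cell=TPE01;secs=42\n", "\n", "caller=0933;secs=7\n"]
rows = list(to_tsv(sample))
```

Each input line is handled independently, which is what lets this kind of parsing be spread across the nodes of a cluster.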
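The three kinds of profiling output named on slide 24 (basic stats, value frequency, pattern frequency) can be sketched for a single column in Python. The `profile` and `pattern` helpers and the sample column are hypothetical, the type inference is deliberately crude, and nothing here reflects how Informatica computes its profiles.

```python
from collections import Counter
from typing import Dict, List, Optional


def pattern(value: str) -> str:
    """Map digits to 9 and letters to A, keeping every other character."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile(values: List[Optional[str]]) -> Dict[str, object]:
    """Compute NULL count, min/max, a crude inferred type, and frequencies."""
    present = [v for v in values if v is not None and v != ""]
    all_digits = bool(present) and all(v.isdigit() for v in present)
    return {
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "inferred_type": "integer" if all_digits else "string",
        "value_freq": Counter(present).most_common(),
        "pattern_freq": Counter(pattern(v) for v in present).most_common(),
    }


# Hypothetical COUNTRY CODE column with one NULL and one dirty value ("U5").
stats = profile(["US", "TW", "US", None, "U5"])
```

In this sample the pattern `A9` occurs once against three occurrences of `AA`, which is how a pattern-frequency view surfaces a dirty value for drilldown.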
  • 25. Hadoop Data Domain Discovery: finding the functional meaning of data in Hadoop. Leverage INFA rules/mapplets to identify the functional meaning of Hadoop data: sensitive data (e.g. SSN, credit card number), PHI (Protected Health Information), PII (Personally Identifiable Information). Scalable to look for and discover ANY domain type. View and share a report of the data domains and sensitive data contained in Hadoop, with the ability to drill down to see suspect data values.
  • 26. Transforming and Cleansing Data: PowerCenter on Hadoop, Data Quality on Hadoop
  • 27. No-code visual development environment. Preview results at any point in the data flow. PowerCenter developers are now Hadoop developers.
  • 28. Reuse and Import PC Metadata for Hadoop. Import existing PC artifacts into the Hadoop development environment. Validate import logic before the actual import process to ensure compatibility.
  • 29. Natural Language Processing: Entity Extraction & Data Classification. Train NLP to find and classify entities in unstructured data.
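A rule-based view of domain discovery can be sketched in Python: each value is tested against simple rules, and a column is labeled when enough of its values match. The rules below (a US SSN layout and a card number checked with the Luhn checksum), the 80% threshold, and all function names are assumptions for illustration, not Informatica's rules or mapplets.

```python
import re
from collections import Counter
from typing import List

SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_RE = re.compile(r"^\d{13,19}$")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, as used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> str:
    """Return the domain a single value appears to belong to."""
    if SSN_RE.match(value):
        return "SSN"
    compact = value.replace(" ", "").replace("-", "")
    if CARD_RE.match(compact) and luhn_ok(compact):
        return "CREDIT_CARD"
    return "UNKNOWN"


def discover_domain(values: List[str], threshold: float = 0.8) -> str:
    """Label a column when at least `threshold` of its values share one domain."""
    counts = Counter(classify(v) for v in values)
    counts.pop("UNKNOWN", None)
    if not counts:
        return "UNKNOWN"
    label, hits = counts.most_common(1)[0]
    return label if hits / len(values) >= threshold else "UNKNOWN"


column = ["123-45-6789", "987-65-4321", "n/a", "111-22-3333", "222-33-4444"]
domain = discover_domain(column)
```

Values that fail the rule in an otherwise matching column (here `n/a`) are the "suspect data values" a reviewer would drill down into.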
  • 30. Address Validation & Data Cleansing
  • 31. Configure Mapping for Hadoop Execution. No need to redesign mapping logic to execute on either traditional or Hadoop infrastructure. Configure where the integration logic should run: Hadoop or Native.
  • 32. Data Integration & Quality on Hadoop: Hive-QL
    1. The entire Informatica mapping is translated to Hive Query Language.
    2. The optimized HQL is converted to MapReduce and submitted to the Hadoop cluster (job tracker).
    3. Advanced mapping transformations are executed on Hadoop through User Defined Functions using the Vibe MapReduce UDF.
    Generated query shown on the slide:
    FROM (
      SELECT T1.ORDERKEY1 AS ORDERKEY2, T1.li_count, orders.O_CUSTKEY AS CUSTKEY,
             customer.C_NAME, customer.C_NATIONKEY, nation.N_NAME, nation.N_REGIONKEY
      FROM (
        SELECT TRANSFORM (L_Orderkey.id) USING CustomInfaTx
        FROM lineitem
        GROUP BY L_ORDERKEY
      ) T1
      JOIN orders ON (customer.C_ORDERKEY = orders.O_ORDERKEY)
      JOIN customer ON (orders.O_CUSTKEY = customer.C_CUSTKEY)
      JOIN nation ON (customer.C_NATIONKEY = nation.N_NATIONKEY)
      WHERE nation.N_NAME = 'UNITED STATES'
    ) T2
    INSERT OVERWRITE TABLE TARGET1 SELECT *
    INSERT OVERWRITE TABLE TARGET2 SELECT CUSTKEY, count(ORDERKEY2) GROUP BY CUSTKEY;
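The idea of translating a mapping into Hive Query Language can be illustrated with a toy generator in Python. The `Mapping` dataclass and `to_hiveql` function are hypothetical and far simpler than a real mapping translator; identifiers are interpolated without escaping, so this is a sketch of the concept rather than production code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mapping:
    """A tiny, hypothetical mapping description: one source, one target."""
    source: str
    target: str
    columns: List[str]
    filter: Optional[str] = None
    group_by: List[str] = field(default_factory=list)


def to_hiveql(m: Mapping) -> str:
    """Render the mapping as a single HiveQL INSERT OVERWRITE statement."""
    parts = [
        f"INSERT OVERWRITE TABLE {m.target}",
        f"SELECT {', '.join(m.columns)}",
        f"FROM {m.source}",
    ]
    if m.filter:
        parts.append(f"WHERE {m.filter}")
    if m.group_by:
        parts.append(f"GROUP BY {', '.join(m.group_by)}")
    return "\n".join(parts) + ";"


hql = to_hiveql(
    Mapping(
        source="orders",
        target="TARGET2",
        columns=["O_CUSTKEY", "count(O_ORDERKEY)"],
        group_by=["O_CUSTKEY"],
    )
)
```

The point of the design is that the mapping stays declarative: the same description could be rendered for a different engine by swapping the rendering function.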
  • 33. Example Mapping Execution. Components: external flat file source, external relational source, HDFS file source, Data Engine, Repository, cluster of Linux machines, temp staged lookup file, HDFS file target. Flow: mapping logic is translated to HQL and submitted to the Hadoop cluster; relational data is streamed to Hadoop for processing; the local flat file is staged temporarily on HDFS; HDFS file data is read; the final processed data is loaded into an HDFS file.
  • 34. Orchestrating and Monitoring Hadoop: Informatica Workflow & Monitoring for Hadoop; Metadata Manager for Hadoop; Dynamic Data Masking for Hadoop.
  • 35. Mixed Workflow Orchestration: one workflow running tasks on Hadoop and local environments. Tasks: Cmd_ChooseLoadPath, MT_Load2Hadoop + Parse, Cmd_Load2Hadoop, MT_Parse, Cmd_ProfileData, MT_Cleanse, MT_DataAnalysis, Notification. List of variables (Name, Type, Default Value, Description): $User.LoadOptionPath, Integer, 2, load path for workflow, depending on output of cmd task; $User.DataSourceConnection, String, HiveSourceConnection, source connection object; $User.ProfileResult, Integer, 100, output from “profiling” command task.
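A workflow variable such as $User.LoadOptionPath can steer which branch of a mixed workflow runs. The Python sketch below only illustrates that idea. The task names follow the slide's labels, but the branch layout, the meaning of the values 1 and 2, and the `plan_workflow` function are assumptions, since the slide does not show how the tasks are linked.

```python
from typing import List


def plan_workflow(load_option_path: int = 2) -> List[str]:
    """Return the ordered task list for an assumed two-branch workflow.

    Assumption: path 1 loads and parses in a single mapping task, while
    path 2 (the slide's default value) loads with a command task and then
    parses with a separate mapping task.
    """
    steps = ["Cmd_ChooseLoadPath"]
    if load_option_path == 1:
        steps.append("MT_Load2Hadoop+Parse")
    elif load_option_path == 2:
        steps += ["Cmd_Load2Hadoop", "MT_Parse"]
    else:
        raise ValueError(f"unknown load path: {load_option_path}")
    # Assumed common tail shared by both branches.
    steps += ["Cmd_ProfileData", "MT_Cleanse", "MT_DataAnalysis", "Notification"]
    return steps
```

With the default value of 2, the plan runs the command-task load followed by the separate parse mapping before profiling and cleansing.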
  • 36. Unified Administration: a single place to manage and monitor. Full traceability from workflow to MapReduce jobs. View generated Hive scripts.
  • 37. Data Lineage and Business Glossary
  • 39. Hadoop Architecture Overview. Execution on Hadoop: PowerCenter on Hadoop, Data Quality on Hadoop, DT on Hadoop, Entity Extraction on Hadoop, Profiling on Hadoop. [Architecture diagram. Data sources: Transactions, OLTP, OLAP; Documents, Email; Social Media, Web Logs; Machine Device, Scientific. Informatica components: INFA Clients, PowerCenter SE, Enterprise Grid, PowerCenter Services, Mercury Services, MySQL, Hive Client, PWX for HDFS, PWX for Hive, PWX for PC, PWX for Mercury. Hadoop cluster: NameNode, Job Tracker, Hive, DataNode1, DataNode2, DataNode3; nodes are shown with HDFS, Infa-Lib, HParser, MapReduce, and RDBMS Clients.]