SlideShare une entreprise Scribd logo
1  sur  26
1
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Turning Petabytes of Data into
Millions in Cost Recapture for the
World’s Biggest Retailers
Case Study with PRGX Global
Jonathon Whitton| Director Data Services
2
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
DRIVE CUSTOMER INSIGHTS
IMPROVE PRODUCT &
SERVICES EFFICIENCY LOWER BUSINESS RISKS
Data is Transforming Business
MODERNIZE ARCHITECTURE
3
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
OPERATIONS
DATA
MANAGEMENT
BATCH REAL-TIME
PROCESS, ANALYZE, SERVE
UNIFIED SERVICES
RESOURCE MANAGEMENT SECURITY
FILESYSTEM RELATIONAL NoSQL
STORE
INTEGRATE
BATCH STREAM SQL SEARCH SDK
Cloudera Enterprise
Making Hadoop Fast, Easy, and Secure for the Modernized Architecture
Hadoop is a new kind
of data platform.
• One place for unlimited data
• Unified data access
Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise
4
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Talend: A History of Delivering Innovation
2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
(Revenue Growth)
Data
Integration
Master Data
Management
Data
Quality
Big Data
Application
Integration
Hadoop 2.0
Spark &
Cloud
• Our mission: enable the
data driven enterprise
• The most advanced Big
Data integration platform
• 10X faster development
• One solution for batch &
streaming
• Deploy machine learning in
minutes
Data
Preparation
1st to market for
Spark
1st on YARN &
Hadoop 2.0
1st to deliver a DI and
AI platform
1st ETL open source
tool
5
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
CLOUDERA ENTERPRISE
Talend and Cloudera
Enterprise Data Warehouse
Unstructured Data
Data Sources
Structured Data
Archive
Load
Applications
Reporting
BI System
Modeling
Model
OPERATIONS
DATAMANAGEMENT
UNIFIED SERVICES
PROCESS,ANALYZE, SERVE
STORE
INTEGRATE
APPLICATION
INTEGRATION
CLOUD
INTEGRATION
DATA
INTEGRATION
BIG DATA
INTEGRATION
MASTER DATA
MANAGEMENT
DATA
PREPARATION
Data Fabric
Connect
Serve
Serve
Ingest
Ingest
6
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Polling Question
• What is your top requirement for modernizing your big data infrastructure?
• Capture and secure all data
• Operational simplicity
• Variety of tools to interact with data
• Integration and portability across multiple platforms
• Timely availability of data
• Other
7
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
About me
 Jonathon Whitton, Director Data Services at PRGX
 Started working at PRGX in 2000
 BA from Duke University
 MBA from Kennesaw State University
 LinkedIn.com/in/whitton
 Twitter: @JonathonWhitton
8
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
 Global leader in accounts payable recovery audit
 Serve 75% of the Top 20 Global Retailers, most for more than 10 years
 Nearly half of recovery audit revenue from outside the U.S., in more than 30 countries
and across 5 continents
 ~1400 employees
9
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
What do we do for a living? …. We are in search of errors!
Aggregate & Manipulate
Client Data
• Receive over 2 million client
files annually
• 2.3 petabytes of data “live”
for auditing on average
• Data includes purchasing,
payment, receiving, deals,
point of sale, and emails
Mine Client Data for
Overpayments
• Utilize proprietary data
mining tools and techniques
• Fully document claim
Recover Overpayments
from Vendors
• Handle majority of vendor
communications
• Verify that deductions taken
or payments received
10
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
We are a data-driven business dealing with huge data sizes…
MONTHLY ANNUALLY200,000 2,400,000
# FILES FOR STRUCTURED DATA
MONTHLY 40TB
ANNUALLY 480TB
MONTHLY 200TB
ANNUALLY 2,375TB
INBOUND FOR STRUCTURED DATA
MONTHLY ANNUALLY12,500,000 150,000,000
# EMAILS UNSTRUCTURED DATA
11
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
… and huge data variety
CLIENT DATA ARRIVES
CHALLENGES METHOD RECEIVED
DELIVER TO AUDIT
TYPES OF FILES % BY FILES
EDI
XML
Flat file csv
Flat file delimited
database backups
spreadsheets
Pdfs
Tiff
Jpeg
Png
Prns
Emails
Microfiche
Proprietary formats
(FOR STRUCTURED DATA)
28% EBCDIC Flat
40% ASCII Flat
25% DB back-ups
7% Proprietary
Daily
Weekly
Bi-weekly
Monthly
Quarterly
Semi-annually
Annually
Unexpected
Under- estimated
New schema
Wrong schema
Tape
Hard drive
sFTP
Email
Other media – flash drives,
DVDs
DAILY WEEKLY BI-WEEKLY MONTHLY
QUARTERLY SEMI ANNUALLY ANNUALLY
12
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Our Legacy solution was designed in the 1990s…
It had all the problems
that legacy systems
have
• High lead times
• Re-runs take a lot of
operations
• Highly labor intensive
operations
Tweaked and upgraded
over the years
but
architecturally
remained the same
Transformation process
Significant
re-engineering
zero downtime
Our solution had to be
flexible
and
lightning fast!
HiPer
High
Performance
Computing
13
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Our evaluation was on potential “architectural stacks”
 Functional Requirements
 Information Security
Requirements and Privacy
Protection
 Performance Requirements
 Technical Requirements
 Vendor Information
Filter 1:
High level assessment focused on vendor’s overall company
strength, ability for the solution to scale, technology
feasibility in PRGX
19 potential vendors Short list of 6 vendors 5 POC vendors
Filter 2:
Based on willingness of vendor to work with us on a POC
and their flexibility
14
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Timeline of HiPer program
Q3 2013 –
Explore
technology that
can do Data
Transformation
much faster
Q4 2013 –
Ready with
problem
statement
Q1 2014 – Level
1 screening of
technology
stack options
Q2 2014 –Proof
of Concept
Q3 2014 –
Selection of
Hadoop
(Cloudera),
Talend stack
Q4 2014 – Pilot
Accounts;
Planning,
prioritization
for 2015
Q1 2015 –
Production
begins for top
15 accounts;
unstructured
analytics Proof
of concept
By Q2 2016 –
Remove
reliance on
AS400; pilot
accounts for
unstructured
analytics
By Q4 2016 –
60% accounts in
Production;
majority of
accounts for
unstructured
analytics
15
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Our solution for structured data processing
Data Acquisition and Load Data Transformation Auditing
Infrastructure management
Data Long Term Storage Hadoop Cluster
Data Set 1
Data Set 2
Data Set 3
Data Set 4
Data Set 5
Audit Tool Set 1
Audit Tool Set 2
Audit Tool Set 3
1. Talend and related
automation
Data Privacy, Security and Data Lineage
1. HDFS – for long term data storage
2. Batch processing – Apache Hive and Apache Spark
3. Investigative queries (QC) – Apache Impala
4. Output to RDBMS – Apache Scoop and Talend
1. Cloudera Manager
1. Cloudera Navigator
2. Kerberos
1. Continue and invest in Legacy
technology (MSSQL)
2. Explore use of Hadoop
oriented BI tools
16
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Our solution for unstructured data processing
Data Acquisition and Load Data Transformation Auditing
Infrastructure management
Data
(pst, nsf, …)
Email Auditing Tool
Data Privacy, Security and Data Lineage
1. Cloudera Manager
1. Kerberos
2. Mutiple Solr indexes / Hbase Tables
EML Files
EML files
EML files
Long term
storage - HBASE
Search Index -
SOLR
1. In-house automation logic in .Net
to convert files to eml files
2. Load to Hbase via Apache Thrift
3. Apache Tika to Read Email Data
(Headers, Body, Attachments).
1. Hbase – email storage and updates
to individual records
2. NGData Keystore indexer (Lily) to
update Solr in near real time
3. SOLR – index to query emails
4. SOLRNET - .Net Api to SOLR querying
5. Apache Thrift - Update to HBASE for
audit findings
1. Refreshed .Net
application to read from
SOLR/update to HBASE
17
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
How does Cloudera Hadoop impact our business?
Load and Prepare Data Audit
Load and
Prepare Data
Audit
X weeks
X/2 weeks
More time for Auditing
Limited time to process and audit data
HiPer’s Impact
Delivery Cycle
Structured data processing
 Improved performance in data delivery for structured
and unstructured data about 9-10 times the original
timelines via Hive
 Starting to use Spark
18
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
How does Cloudera Hadoop impact our business?
Faster analysis in Hadoop
 Point of Sales (POS) data was aggregated
 list of stores
 number of transactions
 sum of quantity
 Would have run for hours in our AS400
environment
 2014 would not have been readily available before
Hadoop’s lower cost of storage made it affordable
for us to keep more data online
 No need to go back to our archive
 No need to request a restore
2015 data
13 months: 20141201 to 20151231
Row count: 2,048,666,122
Returned: 1385 rows returned (1 per store)
Returned in: 90 seconds
2014 data
13 months: 20131201 to 20141231
Row count: 2,159,626,445
Returned: 1365 rows returned (1 per store)
Returned in: 90 seconds
Field count in source POS data was 14
19
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
How does Talend impact our development?
Hive QL hand coding
Talend Big Data 40% faster
Move on to the next item
this much faster
Development Cycle
Development Cycle
 Reduced developing processes in Hadoop by at least
40%.
 With over 200 distinct process to get through this
year, the time savings is very meaningful in achieving
our goals
 No more hand coding Hive QL
 Talend has the logic de-coupled from the code that is
executing on the Hadoop cluster
 “Upgrade” to Spark via dropdown and then
addressing limited component changes that are
clearly identified
 “Upgrade” to the next big thing
20
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Cloudera Hadoop and Talend impact the future of our business
Structured data processing
 Support for “machine-learning” techniques in the email and buyer pulls significantly
enhancing productivity for audits
Expanded focus
 Key enabler to mine large volumes of data from our customers
 Ability to re-run and experiment on large sets of data
 Framework for industry standard Master Data
Infrastructure
 Backup and archival solution for the future
 Begin standardization and reusability of processes; eliminate person dependency
21
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Key success factors
 Create the business case…for multiple years, with quick wins
– no IT programs succeed without business backing
 Plan the program – for multiple years
 Architect it right
• Architect keeping your current process and tools in mind
• Architect it for different uses
• Architect it for time!
 Sell your program…you need champions everywhere
– from the CEO/Board to your business, to marketing to even HR
 Find the right partner; develop the partnership
22
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Polling Question
• Which Hadoop tools do you foresee using the most to help with your big data
challenges?
• Data ingestion tools
• Search
• Impala
• Kudu
• Spark
23
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Q&A
24
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Call to Action
Test Drive Talend Software Define a Big Data ProjectLearn about Hadoop with
Cloudera
25
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Why Cloudera and Talend?
 Scalability
 Performance
 Flexibility
 Stability
 Support
26
© Cloudera, Inc. All rights reserved.
© PRGX Global, Inc. All Rights Reserved
Thank you

Contenu connexe

Tendances

Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Publicis Sapient Engineering
 
151116 Sedania Cloudera BDA Profile
151116 Sedania Cloudera BDA Profile151116 Sedania Cloudera BDA Profile
151116 Sedania Cloudera BDA Profile
Zarul Zaabah
 

Tendances (20)

The Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyThe Five Markers on Your Big Data Journey
The Five Markers on Your Big Data Journey
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data Platform
 
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
Leverage Big Data to Enhance Customer Experience in Telecommunications – with...
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
151116 Sedania Cloudera BDA Profile
151116 Sedania Cloudera BDA Profile151116 Sedania Cloudera BDA Profile
151116 Sedania Cloudera BDA Profile
 
Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...Govern This! Data Discovery and the application of data governance with new s...
Govern This! Data Discovery and the application of data governance with new s...
 
Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data Hub
 
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1
Using Big Data to Transform Your Customer’s Experience - Part 1

Using Big Data to Transform Your Customer’s Experience - Part 1

 
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence
Driving Better Products with Customer Intelligence

Driving Better Products with Customer Intelligence

 
Enterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big DataEnterprise Data Hub: The Next Big Thing in Big Data
Enterprise Data Hub: The Next Big Thing in Big Data
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)The Vortex of Change - Digital Transformation (Presented by Intel)
The Vortex of Change - Digital Transformation (Presented by Intel)
 
Optimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataOptimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big Data
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Hadoop and Financial Services
Hadoop and Financial ServicesHadoop and Financial Services
Hadoop and Financial Services
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets

 
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
Limitless Data, Rapid Discovery, Powerful Insight: How to Connect Cloudera to...
 

Similaire à Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Retailers

The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
Cloudera, Inc.
 
Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic
IntelAPAC
 

Similaire à Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Retailers (20)

The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
MongoDB IoT City Tour LONDON: Hadoop and the future of data management. By, M...
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSyncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data Insight
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 

Plus de Cloudera, Inc.

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
 

Dernier

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Dernier (20)

Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 

Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Retailers

  • 1. 1 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Turning Petabytes of Data into Millions in Cost Recapture for the World’s Biggest Retailers Case Study with PRGX Global Jonathon Whitton| Director Data Services
  • 2. 2 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved DRIVE CUSTOMER INSIGHTS IMPROVE PRODUCT & SERVICES EFFICIENCY LOWER BUSINESS RISKS Data is Transforming Business MODERNIZE ARCHITECTURE
  • 3. 3 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved OPERATIONS DATA MANAGEMENT BATCH REAL-TIME PROCESS, ANALYZE, SERVE UNIFIED SERVICES RESOURCE MANAGEMENT SECURITY FILESYSTEM RELATIONAL NoSQL STORE INTEGRATE BATCH STREAM SQL SEARCH SDK Cloudera Enterprise Making Hadoop Fast, Easy, and Secure for the Modernized Architecture Hadoop is a new kind of data platform. • One place for unlimited data • Unified data access Cloudera makes it: • Fast for business • Easy to manage • Secure without compromise
  • 4. 4 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Talend: A History of Delivering Innovation 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 (Revenue Growth) Data Integration Master Data Management Data Quality Big Data Application Integration Hadoop 2.0 Spark & Cloud • Our mission: enable the data driven enterprise • The most advanced Big Data integration platform • 10X faster development • One solution for batch & streaming • Deploy machine learning in minutes Data Preparation 1st to market for Spark 1st on YARN & Hadoop 2.0 1st to deliver a DI and AI platform 1st ETL open source tool
  • 5. 5 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved CLOUDERA ENTERPRISE Talend and Cloudera Enterprise Data Warehouse Unstructured Data Data Sources Structured Data Archive Load Applications Reporting BI System Modeling Model OPERATIONS DATAMANAGEMENT UNIFIED SERVICES PROCESS,ANALYZE, SERVE STORE INTEGRATE APPLICATION INTEGRATION CLOUD INTEGRATION DATA INTEGRATION BIG DATA INTEGRATION MASTER DATA MANAGEMENT DATA PREPARATION Data Fabric Connect Serve Serve Ingest Ingest
  • 6. 6 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Polling Question • What is your top requirement for modernizing your big data infrastructure? • Capture and secure all data • Operational simplicity • Variety of tools to interact with data • Integration and portability across multiple platforms • Timely availability of data • Other
  • 7. 7 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved About me  Jonathon Whitton, Director Data Services at PRGX  Started working at PRGX in 2000  BA from Duke University  MBA from Kennesaw State University  LinkedIn.com/in/whitton  Twitter: @JonathonWhitton
  • 8. 8 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved  Global leader in accounts payable recovery audit  Serve 75% of the Top 20 Global Retailers, most for more than 10 years  Nearly half of recovery audit revenue from outside the U.S., in more than 30 countries and across 5 continents  ~1400 employees
  • 9. 9 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved What do we do for a living? …. We are in search of errors! Aggregate & Manipulate Client Data • Receive over 2 million client files annually • 2.3 petabytes of data “live” for auditing on average • Data includes purchasing, payment, receiving, deals, point of sale, and emails Mine Client Data for Overpayments • Utilize proprietary data mining tools and techniques • Fully document claim Recover Overpayments from Vendors • Handle majority of vendor communications • Verify that deductions taken or payments received
  • 10. 10 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved We are a data-driven business dealing with huge data sizes… MONTHLY ANNUALLY200,000 2,400,000 # FILES FOR STRUCTURED DATA MONTHLY 40TB ANNUALLY 480TB MONTHLY 200TB ANNUALLY 2,375TB INBOUND FOR STRUCTURED DATA MONTHLY ANNUALLY12,500,000 150,000,000 # EMAILS UNSTRUCTURED DATA
  • 11. 11 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved … and huge data variety CLIENT DATA ARRIVES CHALLENGES METHOD RECEIVED DELIVER TO AUDIT TYPES OF FILES % BY FILES EDI XML Flat file csv Flat file delimited database backups spreadsheets Pdfs Tiff Jpeg Png Prns Emails Microfiche Proprietary formats (FOR STRUCTURED DATA) 28% EBCDIC Flat 40% ASCII Flat 25% DB back-ups 7% Proprietary Daily Weekly Bi-weekly Monthly Quarterly Semi-annually Annually Unexpected Under- estimated New schema Wrong schema Tape Hard drive sFTP Email Other media – flash drives, DVDs DAILY WEEKLY BI-WEEKLY MONTHLY QUARTERLY SEMI ANNUALLY ANNUALLY
  • 12. 12 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Our Legacy solution was designed in the 1990s… It had all the problems that legacy systems have • High lead times • Re-runs take a lot of operations • Highly labor intensive operations Tweaked and upgraded over the years but architecturally remained the same Transformation process Significant re-engineering zero downtime Our solution had to be flexible and lightning fast! HiPer High Performance Computing
  • 13. 13 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Our evaluation was on potential “architectural stacks”  Functional Requirements  Information Security Requirements and Privacy Protection  Performance Requirements  Technical Requirements  Vendor Information Filter 1: High level assessment focused on vendor’s overall company strength, ability for the solution to scale, technology feasibility in PRGX 19 potential vendors Short list of 6 vendors 5 POC vendors Filter 2: Based on willingness of vendor to work with us on a POC and their flexibility
  • 14. 14 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Timeline of HiPer program Q3 2013 – Explore technology that can do Data Transformation much faster Q4 2013 – Ready with problem statement Q1 2014 – Level 1 screening of technology stack options Q2 2014 –Proof of Concept Q3 2014 – Selection of Hadoop (Cloudera), Talend stack Q4 2014 – Pilot Accounts; Planning, prioritization for 2015 Q1 2015 – Production begins for top 15 accounts; unstructured analytics Proof of concept By Q2 2016 – Remove reliance on AS400; pilot accounts for unstructured analytics By Q4 2016 – 60% accounts in Production; majority of accounts for unstructured analytics
  • 15. 15 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Our solution for structured data processing Data Acquisition and Load Data Transformation Auditing Infrastructure management Data Long Term Storage Hadoop Cluster Data Set 1 Data Set 2 Data Set 3 Data Set 4 Data Set 5 Audit Tool Set 1 Audit Tool Set 2 Audit Tool Set 3 1. Talend and related automation Data Privacy, Security and Data Lineage 1. HDFS – for long term data storage 2. Batch processing – Apache Hive and Apache Spark 3. Investigative queries (QC) – Apache Impala 4. Output to RDBMS – Apache Scoop and Talend 1. Cloudera Manager 1. Cloudera Navigator 2. Kerberos 1. Continue and invest in Legacy technology (MSSQL) 2. Explore use of Hadoop oriented BI tools
  • 16. 16 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Our solution for unstructured data processing Data Acquisition and Load Data Transformation Auditing Infrastructure management Data (pst, nsf, …) Email Auditing Tool Data Privacy, Security and Data Lineage 1. Cloudera Manager 1. Kerberos 2. Mutiple Solr indexes / Hbase Tables EML Files EML files EML files Long term storage - HBASE Search Index - SOLR 1. In-house automation logic in .Net to convert files to eml files 2. Load to Hbase via Apache Thrift 3. Apache Tika to Read Email Data (Headers, Body, Attachments). 1. Hbase – email storage and updates to individual records 2. NGData Keystore indexer (Lily) to update Solr in near real time 3. SOLR – index to query emails 4. SOLRNET - .Net Api to SOLR querying 5. Apache Thrift - Update to HBASE for audit findings 1. Refreshed .Net application to read from SOLR/update to HBASE
  • 17. 17 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved How does Cloudera Hadoop impact our business? Load and Prepare Data Audit Load and Prepare Data Audit X weeks X/2 weeks More time for Auditing Limited time to process and audit data HiPer’s Impact Delivery Cycle Structured data processing  Improved performance in data delivery for structured and unstructured data about 9-10 times the original timelines via Hive  Starting to use Spark
  • 18. 18 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved How does Cloudera Hadoop impact our business? Faster analysis in Hadoop  Point of Sales (POS) data was aggregated  list of stores  number of transactions  sum of quantity  Would have run for hours in our AS400 environment  2014 would not have been readily available before Hadoop’s lower cost of storage made it affordable for us to keep more data online  No need to go back to our archive  No need to request a restore 2015 data 13 months: 20141201 to 20151231 Row count: 2,048,666,122 Returned: 1385 rows returned (1 per store) Returned in: 90 seconds 2014 data 13 months: 20131201 to 20141231 Row count: 2,159,626,445 Returned: 1365 rows returned (1 per store) Returned in: 90 seconds Field count in source POS data was 14
  • 19. 19 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved How does Talend impact our development? Hive QL hand coding Talend Big Data 40% faster Move on to the next item this much faster Development Cycle Development Cycle  Reduced developing processes in Hadoop by at least 40%.  With over 200 distinct process to get through this year, the time savings is very meaningful in achieving our goals  No more hand coding Hive QL  Talend has the logic de-coupled from the code that is executing on the Hadoop cluster  “Upgrade” to Spark via dropdown and then addressing limited component changes that are clearly identified  “Upgrade” to the next big thing
  • 20. 20 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Cloudera Hadoop and Talend impact the future of our business Structured data processing  Support for “machine-learning” techniques in the email and buyer pulls significantly enhancing productivity for audits Expanded focus  Key enabler to mine large volumes of data from our customers  Ability to re-run and experiment on large sets of data  Framework for industry standard Master Data Infrastructure  Backup and archival solution for the future  Begin standardization and reusability of processes; eliminate person dependency
  • 21. 21 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Key success factors  Create the business case…for multiple years, with quick wins – no IT programs succeed without business backing  Plan the program – for multiple years  Architect it right • Architect keeping your current process and tools in mind • Architect it for different uses • Architect it for time!  Sell your program…you need champions everywhere – from the CEO/Board to your business, to marketing to even HR  Find the right partner; develop the partnership
  • 22. 22 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Polling Question • Which Hadoop tools do you foresee using the most to help with your big data challenges? • Data ingestion tools • Search • Impala • Kudu • Spark
  • 23. 23 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Q&A
  • 24. 24 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Call to Action Test Drive Talend Software Define a Big Data ProjectLearn about Hadoop with Cloudera
  • 25. 25 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Why Cloudera and Talend?  Scalability  Performance  Flexibility  Stability  Support
  • 26. 26 © Cloudera, Inc. All rights reserved. © PRGX Global, Inc. All Rights Reserved Thank you