Big Data - How to Get Started

Primer On Getting Started With
Big Data Projects
Kurt Lueck
January 2013

Contact Presenter

Kurt Lueck
Managing Director, Business Intelligence & Analytics

Email: Kurt.Lueck@pactera.com
Desk: +1.704.944.3155 x240
6100 Fairview Road, Suite 560, Charlotte, NC 28210
Visit our website: www.pactera.com

© Pactera. Confidential. All Rights Reserved.

Pactera Snapshot
• NASDAQ: Symbol PACT
• Based in Charlotte, NC & Beijing, China
• 35 Offices Globally / 24,000 Employees
• Fortune 500 Clients (Financial Services, High Tech, Retail)
• Focus on Driving Innovation (Big Data, Analytics, Mobility, Cloud Solutions)


Global Footprint and Flexible Delivery Capabilities

Pactera is a global company strategically headquartered in China, enabling
partnership with companies seeking to leverage one of the world’s largest and
fastest-growing technology markets.

Global FTE: 24,000
• North America & EU: 500
• Asia Pacific: 1,000
• Greater China: 22,500

Office locations (from map): London, Seattle, Changchun, San Francisco, Barcelona, Beijing, Dalian, Tokyo, Silicon Valley, Charlotte, Tianjin, Qingdao, Xi’an, Nanjing, Wuxi, Osaka, Atlanta, Wuhan, San Diego, Shanghai, Chengdu, Changsha, Hangzhou, Guangzhou, Taiwan, Dongguan, Hong Kong, Shenzhen, Malaysia, Singapore, Melbourne, Sydney


Primer on Big Data

1. Definitions
2. Drivers
3. Predictions
4. 10 Steps to Starting Your Big Data Project
5. 5 Critical Mistakes
6. 2 Practical Success Stories
7. Next Steps


What Is Big Data?

Volume
Velocity
Variety
Big Data is high-volume, high-velocity, and high-variety information
assets that demand cost-effective, innovative forms of
information processing for enhanced insight
and decision making.


Driver - #1 Growth of Dark Data

Leveraging dark data represents the largest
opportunity to transform the business.


Driver - #2 Increasing Need to Process Data (Efficiently)

Organizations must process growing volumes
and types of data, and make business
decisions in real time.


Driver - #3 Explosion of Variety

The explosion of unstructured data available
for analysis creates opportunities.


Big Data Predictions

Through 2014, 20% of enterprise warehouses will add
distributed processes

By 2015, 20% of Global 1000 organizations will have a
strategic focus on information infrastructure equal to that of
application management

Beginning in 2015, the term ‘big data’ will no longer be a
competitive differentiator for technology providers

By 2015, big data demand will reach 4.4 million jobs globally
but only one third of those jobs will be filled

Source: Gartner

What Exactly Is Hadoop?

Hadoop Distributed File System (HDFS)
File Sharing & Data Protection Across Physical Servers

MapReduce
Distributed Computing Across Physical Servers
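
To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop itself; the function names and sample strings are illustrative assumptions. Each string stands in for a block of a file that HDFS would store on a different server.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: each string stands in for one distributed file block.
chunks = ["big data big insight", "data drives insight", "big decisions"]
counts = word_count(chunks)
print(counts)
```

In a real cluster the map calls would run in parallel on the servers holding each block, and only the grouped pairs would travel over the network.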


Getting Started – 9 Steps

1. Identify Problem
2. Develop Business Case
3. Identify Resource Needs
4. Evaluate / Select Hardware & Software
5. Fund POC
6. Create Small Solution
7. Evaluate Solution
8. Develop Long-Term Roadmap
9. Perform Project


Step 1: What’s Your Problem?


Step 2: Develop Business Case

General Guidelines
1. Follow Traditional Business Case Steps
2. Engage Organization – This Is Not an IT Project
3. Engage Experts (You May Not Have Them Yet)
4. Consider Team Carefully

Diagram: Proposed Business Solution, Business Case, Proposed Technology Solution


Step 3: Identify Resource Needs

Potential Weaknesses:
• Big Data Skills
• Predictive Analytics
• Data Scientist
• Strong Business Analyst
• Agile Methodology
• Project Managers

Diagram: Business Expertise, New Resources?, Technology Expertise


Step 4: Technical Architecture

Mega-Vendors – Big Data – Vertical Industry


Step 4: Technical Architecture

Architectures
• Move computing near to data
• Online analysis & Offline analysis
• Parallel ingestion/exchanges
• SQL and NoSQL
• Computing as well as storing
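
As a rough illustration of "move computing near to data", the sketch below computes a small summary on each partition and combines only those summaries centrally, instead of shipping every raw row to one place. The partitions, values, and function names are hypothetical.

```python
def local_summary(partition):
    """Runs where the partition is stored; returns only (sum, count)."""
    return (sum(partition), len(partition))


def combine(summaries):
    """Runs centrally on the small summaries, never on the raw rows."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count if count else 0.0


# Hypothetical partitions, e.g. order values held on three storage nodes.
partitions = [[10.0, 20.0, 30.0], [40.0], [50.0, 60.0]]
summaries = [local_summary(p) for p in partitions]
average = combine(summaries)
print(summaries, average)
```

Only three small tuples cross the network here, regardless of how many rows each partition holds.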

Business Value
• From statistics to exploration & prediction
• From periodic to near real time
• From commercial to open source
• From big data to big understanding


Critical Mistakes

Lack of Expertise

Big Data Is an IT Project Without a Problem

Lack of technology alignment

Lack of Long-Term Roadmap
Lack of critical evaluation

Story #1 – Travel Cloudera Style
Collecting Data
• Offline explorer, spiders
• Web server log files and Web UI scripts
• Data feeds from tools such as Tealeaf, Omniture, etc.
• Data feeds from external sources such as Facebook, etc.
• Upstream operational database

Analyzing and Exploiting Data
• Methods such as funnel analysis, shopping cart analysis, decision trees, etc.
• Tools such as Omniture, Google Analytics, SSAS, Unica, Weka, etc.
• Search engine analytics, such as SEO and SEM reporting
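
Funnel analysis, one of the methods listed above, can be sketched in a few lines of Python. This is only an illustration: the step names and sample events are hypothetical and not taken from the project. It counts how many sessions reach each step in order and the share that continues to the next step.

```python
FUNNEL = ["search", "view_hotel", "add_to_cart", "purchase"]  # hypothetical steps


def funnel_counts(events, steps=FUNNEL):
    """Count sessions that reach each step, requiring the steps in order."""
    by_session = {}
    for session_id, step in events:  # events are assumed to be in time order
        by_session.setdefault(session_id, []).append(step)
    counts = [0] * len(steps)
    for seen in by_session.values():
        position = 0
        for step in seen:
            if position < len(steps) and step == steps[position]:
                counts[position] += 1
                position += 1
    return dict(zip(steps, counts))


def conversion_rates(counts, steps=FUNNEL):
    """Share of sessions at each step that continue to the next one."""
    rates = {}
    for prev, nxt in zip(steps, steps[1:]):
        rates[f"{prev}->{nxt}"] = counts[nxt] / counts[prev] if counts[prev] else 0.0
    return rates


# Hypothetical (session, step) events.
events = [
    ("s1", "search"), ("s1", "view_hotel"), ("s1", "add_to_cart"), ("s1", "purchase"),
    ("s2", "search"), ("s2", "view_hotel"),
    ("s3", "search"),
    ("s4", "search"), ("s4", "view_hotel"), ("s4", "add_to_cart"),
]
counts = funnel_counts(events)
rates = conversion_rates(counts)
print(counts)
print(rates)
```

The largest drop between two adjacent steps is usually the first place to look for improvement.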

Empower Business with Intelligence
• Mini-batch
• Near real time DW/DB
• A/B and MVT Testing
• Recommendation Engine
• Finance projection

Originally, we implemented a Behavioral Search project intended to capture
customer behavior online. It captures search parameters from
customers using Tealeaf and persists this data in Hadoop. From it, an
analyst can retell the story of what the customer searched for,
what he/she saw, and what he/she did based on the response.

Next, we polished a new customer data mart, including full rollout of
individualization, customer segmentation, customer lifetime value calculation,
and quick lookup of customer purchase details over a longer period.

• High margin comes from lodging
• A high proportion of merchant hotels are sold on the 1st page of search results
• Larger families tend to book passenger vans instead of midsize cars

Story #1 – Lessons Learned
Chart: Hive vs. Impala query time in seconds (Data @ Nov. 2012)
Queries: One Day Query (21GB-24P), One Month Query (650GB-744P), Three Month Query (1.7TB-2047P), Six Month Query (2.9TB-2920P), One Year Query (3.8TB-2391P), Two and a Half Year Query (5.8TB-3500+P)
Bar labels in extracted order (series assignment not recoverable): 1556, 934, 667, 431, 425, 224, 240, 151, 37, 49, 86, 4

• Hadoop Use Cases Moving to Real-Time
• 71% move data from Hadoop to an RDBMS for faster, interactive SQL
• 67% already query Hadoop using Hive
• Impala – real-time SQL query engine for Hadoop, official release in Q1 2013
• Query results 4-30x faster than Hive
• Supports HQL and is 100% open source


Story #2 – Personalization With Big Data


2013 Pactera Focus Area

1. Voice of Customer:
Large clients are still struggling with what to do with the other 85% of their data, which is unstructured. This unstructured data is made up of customer surveys, call center activity, discussions, and most recently social media data. VOC strategies help companies manage and gain value from this data.
Example: Creating Customer Buying Behavior Solutions

2. Predict Your Future:
Nobody can predict their future, but using advanced predictive analytics, financial services organizations can apply science to understanding fraudulent customer buying behavior, and manage risk etc.
Example: Embedding Predictive Analytics into Risk Management solutions

3. Putting Big Data To Work:
Data volumes are growing fast. Customers, partners, and now even sensor-based systems are generating data so quickly that organizations across all industries need new technologies to stay ahead. Organizations must analyze this data to understand and improve their business.
Example: Creating a Big Data Solution to analyze customer relationship and demand data

4. Visual Performance Management Enabled:
Clients who desire to tie individual accountability to business value drivers can utilize BPM services to identify metrics and BI & Analytics technology to enable the BPM Strategy.
Example: Enabling BPM through Visual Analytic Mgmt Dashboards


Conclusions

How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did

Thank you


