Primer On Getting Started With
             Big Data Projects
                      Kurt Lueck
                    January 2013
Contact Presenter


                                                Kurt Lueck
                                                Managing Director, Business Intelligence & Analytics

                                                Email: Kurt.Lueck@pactera.com
                                                Desk: +1.704.944.3155 x240
                                                6100 Fairview Road, Suite 560, Charlotte, NC 28210
                                                Visit our website: www.pactera.com




© Pactera. Confidential. All Rights Reserved.                                                          2
Pactera Snapshot
    NASDAQ: Symbol PACT
    Based in Charlotte NC & Beijing, China
    35 Offices Globally / 24,000 Employees
    Fortune 500 Clients (Financial Services, High Tech, Retail)
    Focus on Driving Innovation (Big Data, Analytics, Mobility, Cloud Solutions)




Global Footprint and Flexible Delivery Capabilities

        Pactera is a global company strategically headquartered in China, enabling
        partnership with companies seeking to leverage one of the world’s largest and
        fastest-growing technology markets.

          Global FTE: 24,000 | North America & EU: 500 | Asia Pacific: 1,000 | Greater China: 22,500

          Offices shown on map: Seattle, San Francisco, Silicon Valley, San Diego, Charlotte, Atlanta,
          London, Barcelona, Beijing, Changchun, Dalian, Tianjin, Qingdao, Xi’an, Nanjing, Wuxi, Wuhan,
          Shanghai, Chengdu, Changsha, Hangzhou, Guangzhou, Dongguan, Hong Kong, Shenzhen, Taiwan,
          Tokyo, Osaka, Malaysia, Singapore, Melbourne, Sydney


Primer on Big Data

    1                Definitions


     2               Drivers


     3               Predictions


     4               9 Steps to Starting Your Big Data Project


     5               5 Critical Mistakes


     6               2 Practical Success Stories


     7               Next Steps

What Is Big Data?


                                                Volume
                                                 Velocity
                                                Variety
         Big Data is high-volume, high-velocity and high-variety information
           assets that demand cost-effective, innovative forms of
                information processing for enhanced insight
                            and decision making


Driver - #1 Growth of Dark Data




Leveraging dark data represents the largest
opportunity to transform the business.




Driver - #2 Increasing Need to Process Data (Efficiently)




Organizations must process growing data volumes
and a growing number of data types, and make
business decisions in real time.




Driver - #3 Explosion of Variety




The explosion of unstructured data available
for analysis creates opportunities.




Big Data Predictions

                     Through 2014, 20% of enterprise warehouses will add
                     distributed processes


                     By 2015, 20% of Global 1000 organizations will have a
                     strategic focus on information infrastructure equal to that of
                     application management


                       Beginning in 2015, the term ‘big data’ will no longer be a
                       competitive differentiator for technology providers



                     By 2015, big data demand will reach 4.4 million jobs globally
                     but only one third of those jobs will be filled

                                                                    Source: Gartner
What Exactly Is Hadoop?




                         Hadoop Distributed File System (HDFS)
                            File Sharing & Data Protection Across Physical Servers

                         MapReduce
                            Distribute Computing Across Physical Servers
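
The MapReduce idea named on this slide can be illustrated with a minimal single-process sketch. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration. In a real cluster the map and reduce steps would run in parallel on the servers that hold the data in HDFS.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical sample input, for illustration only.
result = word_count(["big data big insight", "data at rest"])
print(result)
```

Each line can be mapped independently and each key can be reduced independently, which is what lets a framework spread the work across physical servers.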




Getting Started – 9 Steps

               Identify Problem

               Develop Business Case

               Identify Resource Needs

               Evaluate/Select Hardware & Software

               Fund POC

               Create Small Solution

               Evaluate Solution

               Develop Long-Term Roadmap

               Perform Project

Step 1: What’s Your Problem?




Step 2: Develop Business Case

General Guidelines
1. Follow Traditional Business Case Steps
2. Engage Organization – This is Not an IT project
3. Engage Experts (You May Not Have Them Yet)
4. Consider Team Carefully

Diagram labels: Proposed Business Solution, Proposed Technology Solution, Business Case




Step 3: Identify Resource Needs

Potential Weaknesses:
• Big Data Skills
• Predictive Analytics
• Data Scientist
• Strong Business Analyst
• Agile Methodology
• Project Managers

Diagram labels: Business Expertise, Technology Expertise, New Resources?




Step 4: Technical Architecture

      Mega-Vendors – Big Data – Vertical Industry




Step 4: Technical Architecture

                                                Architectures
                                                •   Move computing near to data
                                                •   Online analysis & Offline analysis
                                                •   Parallel ingestion/exchanges
                                                •   SQL and NoSQL
                                                •   Computing as well as storing



                                                Business Value
                                                •   From statistics to exploration & prediction
                                                •   From periodic to near real time
                                                •   From commercial to open source
                                                •   From big data to big understanding




Critical Mistakes



          Lack of Expertise

           Treating Big Data as an IT project without a problem

          Lack of technology alignment

           Lack of Long-Term Roadmap
           Lack of critical evaluation
Story #1 – Travel Cloudera Style
Collecting Data
•     Offline explorer, spiders
•     Web server log files and Web UI scripts
•     Data feeds from tools such as Tealeaf and Omniture
•     External data feeds, such as the Facebook feed
•     Upstream operational database

Analyzing and Exploiting Data
•     Methods such as funnel analysis, shopping cart analysis, and decision trees
•     Tools such as Omniture, Google Analytics, SSAS, Unica, and Weka
•     Search engine analytics, such as SEO and SEM reporting

Empower Business with Intelligence
•     Mini-batch
•     Near real time DW/DB
•     A/B and MVT Testing
•     Recommendation Engine
•     Finance projection

Originally, we implemented a Behavioral Search project intended to capture
customer behavior online. It captures search parameters from the customers
using Tealeaf and persists this data in Hadoop. From it, an analyst would be
able to re-tell a story of what the customer searched for, what he/she saw,
and what he/she did based on the response.

Next, we polished a new customer data mart, including a full rollout of
individualization, customer segmentation, customer lifetime value calculation,
and quick lookup of customer purchase details over a longer period.

    • High margin comes from lodging;
    • A high share of merchant hotels are sold from the 1st page of search results;
    • Larger families tend to book passenger vans instead of midsize cars
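
The funnel analysis listed under "Analyzing and Exploiting Data" can be sketched in a few lines. This is an illustrative sketch only: the stage names, session IDs, and events below are hypothetical and are not taken from the project described in this story. For simplicity it ignores the order in which events occur within a session.

```python
from collections import defaultdict

# Hypothetical funnel stages, in the order a customer is expected to pass through them.
FUNNEL = ["search", "view_hotel", "add_to_cart", "purchase"]

# Hypothetical (session_id, event) records, e.g. as they might be read from web logs.
events = [
    ("s1", "search"), ("s1", "view_hotel"), ("s1", "add_to_cart"), ("s1", "purchase"),
    ("s2", "search"), ("s2", "view_hotel"),
    ("s3", "search"),
    ("s4", "search"), ("s4", "view_hotel"), ("s4", "add_to_cart"),
]

def funnel_counts(events, stages):
    """Count the sessions that reached each stage and every stage before it."""
    seen = defaultdict(set)
    for session_id, event in events:
        seen[session_id].add(event)
    counts = []
    for i in range(len(stages)):
        required = set(stages[: i + 1])
        counts.append(sum(1 for evs in seen.values() if required <= evs))
    return counts

counts = funnel_counts(events, FUNNEL)
for stage, n in zip(FUNNEL, counts):
    print(f"{stage:12s} {n} sessions ({n / counts[0]:.0%} of searches)")
```

The drop between consecutive stages shows where customers leave the booking path.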

Story #1 – Lessons Learned
Query time in seconds, Hive vs. Impala (data as of Nov. 2012)

 Query                                        Hive    Impala
 One Day Query (21GB-24P)                       37         4
 One Month Query (650GB-744P)                  224        49
 Three Month Query (1.7TB-2047P)               431        86
 Six Month Query (2.9TB-2920P)                 667       151
 One Year Query (3.8TB-2391P)                  934       240
 Two and half Year Query (5.8TB-3500+P)       1556       425



   •       Hadoop Use Cases Moving to Real-Time
   •       71% - Move data from Hadoop to RDBMS for faster and interactive SQL
   •       67% - already query Hadoop using Hive
   •       Impala – real-time SQL query engine for Hadoop, official release in Q1 2013
   •       Query results 4-30x faster than Hive
   •       Supports HQL and is 100% open source
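
The speedup on each query in the chart can be worked out directly from the plotted times. The sketch below assumes the Hive/Impala pairing of values as read from the chart on this slide; the variable names are illustrative.

```python
# Query times in seconds as (Hive, Impala), read from the chart on this slide.
timings = {
    "One day": (37, 4),
    "One month": (224, 49),
    "Three months": (431, 86),
    "Six months": (667, 151),
    "One year": (934, 240),
    "Two and a half years": (1556, 425),
}

# Speedup = Hive time divided by Impala time for the same query.
speedups = {query: hive / impala for query, (hive, impala) in timings.items()}

for query, factor in speedups.items():
    print(f"{query:22s} {factor:.1f}x")
```

On these six data points the ratio ranges from about 3.7x to a little over 9x, with the largest gain on the smallest (one-day) query.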

Story #2 – Personalization With Big Data




2013 Pactera Focus Areas

1  Voice of Customer:
   Large clients are still struggling with what to do with the other 85% of
   their data, which is unstructured. This unstructured data is made up of
   customer surveys, call center discussions, and most recently social media
   data. VOC strategies help companies manage and gain value from this data.
   Example: Creating Customer Buying Behavior Solutions

2  Predict Your Future:
   Nobody can predict their future, but by using advanced predictive
   analytics, financial services organizations can apply science to
   understanding fraudulent activity and customer buying behavior and to
   managing risk.
   Example: Embedding Predictive Analytics into Risk Management solutions

3  Putting Big Data To Work:
   Data volumes are growing fast. Customers, partners, and now even
   sensor-based systems are generating data so quickly that organizations
   across all industries need new technologies to stay ahead. Organizations
   must analyze this data to understand and improve their business.
   Example: Creating a Big Data Solution to analyze customer relationship
   and demand data

4  Visual Performance Management Enabled:
   Clients who desire to tie individual accountability to business value
   drivers can utilize BPM services to identify metrics and BI & Analytics
   technology to enable the BPM Strategy.
   Example: Enabling BPM through Visual Analytic Mgmt Dashboards




Conclusions


             How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did
Thank you




Kurt Lueck
Managing Director, Business Intelligence & Analytics

Email: Kurt.Lueck@pactera.com
Desk: +1.704.944.3155 x240
6100 Fairview Road, Suite 560, Charlotte, NC 28210
Visit our website: www.pactera.com


MINDCTI Revenue Release Quarter One 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Big Data - How to Get Started

  • 1. Primer On Getting Started With Big Data Projects Kurt Lueck January 2013
  • 2. Contact Presenter Kurt Lueck Managing Director, Business Intelligence & Analytics Email: Kurt.Lueck@pactera.com Desk: +1.704.944.3155 x240 6100 Fairview Road, Suite 560, Charlotte, NC 28210 Visit our website: www.pactera.com © Pactera. Confidential. All Rights Reserved. 2
  • 3. Pactera Snapshot • NASDAQ: Symbol PACT • Based in Charlotte NC & Beijing, China • 35 Offices Globally / 24,000 Employees • Fortune 500 Clients (Financial Services, High Tech, Retail) • Focus on Driving Innovation (Big Data, Analytics, Mobility, Cloud Solutions)
  • 4. Global Footprint and Flexible Delivery Capabilities: Pactera is a global company strategically headquartered in China, enabling partnership with companies seeking to leverage one of the world’s largest and fastest-growing technology markets. Global FTE: 24,000; North America & EU: 500; Asia Pacific: 1,000; Greater China: 22,500. Office locations: London, Seattle, Changchun, San Francisco, Barcelona, Beijing, Dalian, Tokyo, Silicon Valley, Charlotte, Tianjin, Qingdao, Xi’an, Nanjing, Wuxi, Osaka, Atlanta, Wuhan, San Diego, Shanghai, Chengdu, Changsha, Hangzhou, Guangzhou, Taiwan, Dongguan, Hong Kong, Shenzhen, Malaysia, Singapore, Melbourne, Sydney.
  • 5. Primer on Big Data: 1. Definitions 2. Drivers 3. Predictions 4. 10 Steps to Starting Your Big Data Project 5. 5 Critical Mistakes 6. 2 Practical Success Stories 7. Next Steps
  • 6. What Is Big Data? Volume, Velocity, Variety. Big Data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
  • 7. Driver #1: Growth of Dark Data. Leveraging dark data represents the largest opportunity to transform business.
  • 8. Driver #2: Increasing Need to Process Data (Efficiently). Organizations must process increasing volumes and types of data, and make real-time business decisions.
  • 9. Driver #3: Explosion of Variety. The explosion of unstructured data to be analyzed creates opportunities.
  • 10. Big Data Predictions • Through 2014, 20% of enterprise warehouses will add distributed processes • By 2015, 20% of Global 1000 organizations will have a strategic focus on information infrastructure equal to that of application management • Beginning in 2015, the term ‘big data’ will no longer be a competitive differentiator for technology providers • By 2015, big data demand will reach 4.4 million jobs globally but only one third of those jobs will be filled. Source: Gartner
  • 11. What Exactly Is Hadoop? Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers. MapReduce: Distributed Computing Across Physical Servers.
  • 12. Getting Started – 9 Steps: 1. Identify Problem 2. Develop Business Case 3. Identify Resource Needs 4. Evaluate/Select Hardware & Software 5. Fund POC 6. Create Small Solution 7. Evaluate Solution 8. Develop Long-Term Roadmap 9. Perform Project
  • 13. Step 1: What’s Your Problem?
  • 14. Step 2: Develop Business Case. General Guidelines: 1. Follow Traditional Business Case Steps 2. Engage Organization – This is Not an IT Project 3. Engage Experts (You May Not Have Them Yet) 4. Consider Team Carefully. Diagram labels: Proposed Business Solution, Business Case, Proposed Technology Solution.
  • 15. Step 3: Identify Resource Needs. Potential Weaknesses: • Big Data Skills • Predictive Analytics • Data Scientist • Strong Business Analyst • Agile Methodology • Project Managers. Diagram labels: Business Expertise, New Resources?, Technology Expertise.
  • 16. Step 4: Technical Architecture. Mega-Vendors – Big Data – Vertical Industry
  • 17. Step 4: Technical Architecture. Architectures: • Move computing near to data • Online analysis & offline analysis • Parallel ingestion/exchanges • SQL and NoSQL • Computing as well as storing. Business Value: • From statistics to exploration & prediction • From periodic to near real time • From commercial to open source • From big data to big understanding
  • 18. Critical Mistakes • Lack of Expertise • Big Data as an IT project without a problem • Lack of technology alignment • Lack of Long-Term Roadmap • Lack of critical evaluation
  • 19. Story #1 – Travel Cloudera Style. Collecting Data: • Offline explorer, spiders • Web server log files and Web UI scripts • Data feeds from tools (Tealeaf, Omniture feed, etc.) • Data feeds from external sources, such as the Facebook feed • Upstream operational database. Analyzing and Exploiting Data: • Methods: funnel analysis, shopping cart analysis, decision tree, etc. • Tools, such as Omniture, Google Analytics, SSAS, Unica, Weka, etc. • Analytics of search engines, such as SEO and SEM reporting. Empower Business with Intelligence: • Mini-batch • Near real time DW/DB • A/B and MVT Testing • Recommendation Engine • Finance projection. Originally, we implemented a Behavioral Search project intended to capture customer behavior online. It captures search parameters from the customers using Tealeaf and persists this data in Hadoop. From it, an analyst would be able to re-tell a story of what the customer searched for, what he/she saw, and what he/she did based on the response. Next, we polished a new customer data mart including full rollout of individualization, customer segmentations, customer lifetime value calc, and quick lookup of customer purchase details. • High margin comes from the lodging • A high degree of merchant hotels are sold on the 1st page of search results • Larger families tend to book passenger vans for longer periods instead of midsize cars
  • 20. Story #1 – Lessons Learned. [Chart: Hive vs. Impala query time in seconds, data as of Nov. 2012, for six queries: One Day (21GB-24P), One Month (650GB-744P), Three Month (1.7TB-2047P), Six Month (2.9TB-2920P), One Year (3.8TB-2391P), Two and a Half Year (5.8TB-3500+P).] • Hadoop use cases moving to real time • 71%: move data from Hadoop to RDBMS for faster and interactive SQL • 67%: already query Hadoop using Hive • Impala: real-time SQL query engine for Hadoop, officially released in Q1 2013 • Query results 4-30x faster than Hive • Supports HQL and is 100% open source
  • 21. Story #2 – Personalization With Big Data
  • 22. 2013 Pactera Focus Area. 1. Putting Big Data To Work: Data volumes are growing fast. Customers, partners, and now even sensor-based systems are generating data so quickly that organizations across all industries need new technologies to stay ahead. Organizations must analyze this data to understand and improve their business. Example: Creating a Big Data Solution to analyze customer relationship and demand data. 2. Visual Performance Management Enabled: Clients who desire to tie individual accountability to business value drivers can utilize BPM services to identify metrics and BI & Analytics technology to enable the BPM Strategy. Example: Enabling BPM through Visual Analytic Mgmt Dashboards. 3. Voice of Customer: Large clients are still struggling with what to do with the other 85% of their data, which is unstructured. This unstructured data is made up of customer surveys, call center discussions, and most recently social media data. VOC strategies help companies manage and gain value from this data. Example: Creating Customer Buying Behavior Solutions. 4. Predict Your Future: Nobody can predict their future, but using advanced predictive analytics, financial services organizations can apply science to understanding fraudulent activity, customer buying behavior, and manage risk etc. Example: Embedding Predictive Analytics into Risk Management solutions.
  • 23. Conclusions: How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did
  • 24. Thank you. Kurt Lueck, Managing Director, Business Intelligence & Analytics. Email: Kurt.Lueck@pactera.com Desk: +1.704.944.3155 x240 6100 Fairview Road, Suite 560, Charlotte, NC 28210 Visit our website: www.pactera.com
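
Slide 11 describes MapReduce as distributing computing across physical servers. As a rough illustration of the programming model only, the sketch below runs the map, shuffle, and reduce phases in a single Python process on made-up search-log lines. The function names and the sample data are assumptions for illustration; they are not from the presentation and are not Hadoop's API. On a real cluster each phase would be spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical search-log lines, invented for this example.
    logs = ["hotel search charlotte", "car search charlotte"]
    print(word_count(logs))
```

The map and reduce functions never share state, which is what lets a framework run many copies of each on different servers.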

Editor's notes

  1. Kurt Lueck has over 20 years of experience in the Business Intelligence and Analytics field. During his consulting career he has worked with over 40 different organizations in multiple industries on a variety of technologies. In his current role, Mr. Lueck manages the BI & Analytics practice for Pactera.
  2. Good afternoon, and good morning on the West Coast. I appreciate everyone’s attendance and sincerely hope that you gather some very valuable information and insights from our presentation today. This is a very exciting topic. As promised, we have our 10 steps, 5 critical mistakes, and 2 success stories, but I also wanted to start with some quick definitions, drivers, and key predictions. This presentation was built as a primer, and we will be having several follow-up presentations in the coming weeks and months that dive deeper into industry (Financial Services and Retail, for example) and particular vendor solutions (for example: What is Oracle’s Big Data Solution?). Let’s get started.
  3. OK, I feel obligated to start with the 3 V’s. The definition of Big Data has been a work in progress over the past few years. The established definition at this point always has the 3 V’s somewhere in the mix. Most recently I have seen another V mentioned, but first the traditional 3 V’s. Volume: This is probably the most mentioned. The sheer volume of data has been the biggest driver. Velocity: As the saying goes, speed kills. Social media put the bullet in most traditional attempts for retail organizations. Other industries, such as our Energy client, are getting overwhelmed by Smart Grid initiatives. Each industry has its own issues from some new technology. Variety: If it was just traditional data then there probably would not be a necessity for any of this discussion. However, the fact is we have all different types of data that are simply not handled well in the traditional Oracle/DB2/SQL databases. Sure, they can store them, but they cannot do anything with them very efficiently. What is the fourth V? Value.
  4. There is a general consensus that there are 3 drivers of Big Data. The first is something called Dark Data. This is the data that we stored because we had to or wanted to, but never used. The thought was that we had better store it and at some point we might get some value out of it. We never did. This data volume has increased and increased.
  5. Driver #2: Enterprise organizations have been slowly adding data to their enterprise data warehouses, but unfortunately at too slow a pace. Business cannot wait. The introduction of new technologies has just piled onto the problem. I was recently at a large bank and the CTO was explaining that the current state simply cannot continue. We cannot throw more people and more servers at this problem. We have to think about this differently. The sheer cost of adding one more Teradata platform was becoming too high. If you looked at the cost on a graph, it was obvious it was out of control.
  6. The last driver is the variety of data. If the data was always a set of nice neat columns of text and numbers we would all be in better shape. XML a few years ago was meant to help take semi-unstructured data and put it into a nice relational format. Unfortunately, the ball is so far off the track now with the variety of data that a NEW solution is required. Social media is a case in point. The information is completely unstructured, yet it actually holds HUGE value. What about audio and video?
  7. I am always doing research and thought these predictions seemed very relevant to our presentation today. These are straight from Gartner. I won’t read all of these predictions, but the bottom line is that Big Data IS in a hype cycle... but it IS here to stay. I was recently at the TDWI Conference on Big Data and the group was reminded that there have been a number of terms that in the beginning were used in front of every product, such as WEB-ENABLED. Today that is simply assumed. Big Data is here to stay for a number of reasons. Enterprise clients MUST engage Big Data as a competitive advantage today and later as an equalizer. The last point that I want to drive home is the number of jobs that will go unfilled in the Big Data arena. If you have any college-age kids, this is where you should push them. However, I believe it takes a very science-oriented mind to really engage this profession.
  8. This presentation was built as a primer, so I want to ensure that everyone gets some basic terms and some advanced ones as well. Let’s talk about the basics of Hadoop. It consists of two basic components: the Hadoop Distributed File System (HDFS) and MapReduce. Hadoop is used generically to discuss the Big Data platform. More specifically, the core of Hadoop is a distributed file system (HDFS). It’s actually not a database but does share some commonality. Hadoop is the starting block for any Big Data solution. See the picture. MapReduce, according to Wikipedia, is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. See the picture.
  9. Most IT departments are simply feeling overwhelmed with the amount of data and the amount of pressure from the business to combine data to provide business insight. This can be an incredibly exciting opportunity for IT and business to work together. If you can gain an understanding of what Big Data solutions look like, then and only then will you be able to determine how Big Data can actually help. The chart on this page shows, in general, which areas are most positively impacted by Big Data. Financial Services, as usual, is right up front on the overall volume and velocity of data. Media Services, however, has a high variety of data. As an interesting side note, Pactera has worked with Microsoft to develop solutions that will read in videos and decipher them into textual, hence searchable, output. This is just one of many examples where EVERYTHING is becoming searchable: pictures, videos, blogs, and the traditional data. Action Item: Look around your enterprise, and identify scenarios where combining and analyzing diverse datasets will generate substantial business value.
  10. OK, so this should be obvious. Reading about Big Data in CIO or CFO magazine does NOT mean that you need a Big Data solution. I was prepped before this call on the type of participants, so I am fairly comfortable with this next statement. You all need to understand the types of Big Data solutions, and more importantly you need to understand the business problems. Big Data business problems have the following characteristics: combining multiple types of data together; combining LARGE volumes of data; a realistic ability to create value from the project. Let me list some examples from the Financial Services industry just to drive the point home about realistic problems. FINANCE: customer service; context-aware sales and service; segmentation; holistic customer view. TRADING: algorithmic trading; surveillance; revenue generation; next best offer. In many cases, despite your significant size, you do not have expertise in Big Data solutions and will need to engage outside firms for at least the first project. WHICH BRINGS US TO THE NEXT SLIDE.
  11. The main item that I am worried about for companies is this role called a data scientist. I believe most organizations simply do not have any, or enough. What are the key roles of a data scientist? To make a big data project or any analytics project succeed, you actually need a lot of skills. I think of it as a combination of functional skills and technical skills... Most people, when they think of data scientists, think of the technical side. And their minds immediately go to analytics, which is important, but it’s not the whole story. To me it’s 2 sides: Analytics & Design. So on the analytics, it’s the things around statistics, operations research, computer science; machine learning in particular is important for data science... But then there’s technology in the sense of being able to understand systems, particularly large systems, because you need to store data all over the place in distributed form, and the ability to program: to write code that acts as a glue to put all these pieces together. The second functional area is around Design: there’s the design side of things, which is basically being able to create an interface to the data so people will find it usable, and there’s the data side, which is data manipulation, data modeling, data cleansing. So if I got the numbers right, there should be kind of two functional skill sets and four technical skill sets. And all of those need to be combined to make a good data science project work. This is a LOT to ask of ONE person. I believe this set of skills comes from teams of individuals who work on projects together and use each other’s strengths.
  12. I love this picture because it really drives home the complexity of the marketplace. OK, let’s put some sanity to an insane marketplace of products. Hopefully this at least looks a little more reasonable visually. My recommendations are two-fold in the build-out of your solution. First, what is the problem you are attempting to solve? If the problem is so complex that it requires a very specific solution, then you may want to purchase a product that addresses that specific industry problem. Second, if possible I would prefer to recommend a big vendor, such as the usual suspects (IBM, Oracle, Cloudera), for the base platform of your Big Data solution. I say that because this market is changing so fast, and the vendors are popping up and in some cases down faster than a Whack-a-Mole game.
  13. Again, I put this presentation together as a primer. When I first began looking at this market, one of my first questions was what a Big Data solution looks like from a technical or data architecture perspective. My other question was: DOES Big Data replace my BI solution? This picture is taken from one of our current projects before we introduced Big Data. We worked with this client for several years to build out a world-class BI solution. Starting from the bottom of the diagram, we have our typical source systems that feed into the data warehouse. The Analytics layer for developing real business answers was usually second. This layer would usually have our predictive analytic models. The third layer was typically a BI Presentation layer of some form or another. Products such as SAP BusinessObjects, Cognos, QlikView, MicroStrategy, and Spotfire would typically play here. So what does Big Data look like? (CLICK) OK. As you saw from the number of vendors that we presented, this configuration could have a whole slew of different products. This is simply an example. For this particular client we have base Hadoop, Hive, and HBase in the Data Service Layer. Hadoop: Remember, Hadoop is the distributed file system that spreads the data among potentially thousands of nodes and in most cases involves thousands of terabytes. HBase: HBase is a sub-project of the Apache Hadoop project and is used to provide real-time read and write access to your big data. The primary objective of Apache HBase is the hosting of very large tables (billions of rows X millions of columns) on top of clusters of commodity hardware. Hive: Apache Hive is a data warehouse system for the open source Apache Hadoop project. Hive features a SQL-like HiveQL language that facilitates data analysis and summarization for large datasets stored in Hadoop-compatible file systems. Hive enables SQL geeks to continue to be productive. In the Analytics Layer we used Mahout. This particular product is used to perform our predictive analytic searches. What does Mahout do? Glad you asked. Currently Mahout supports mainly four use cases. Recommendation mining takes users’ behavior and from that tries to find items users might like. Clustering takes, for example, text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together. Think of this like portions of SAS but in open source form.
  14. This slide discusses the 5 most common mistakes that we are seeing in the marketplace. In no particular order: 1) Lack of Expertise. I am actually not referring to the Hadoop or Java expertise that is required. If that were the case, most projects would never even get started. I am referring more to data scientist type resources. There are projections from various “critical thinking” organizations such as Gartner that project a significant shortfall of data scientists. The truth is you may have to develop this internally. I would suggest you do that now. I also suggest looking at your universities and hiring graduates from analytics programs. 2) Big Data projects without a problem. We are certainly in a hype cycle around Big Data. This is natural with any technology that can be a game changer. More than likely your company does have business problems that can be assisted with Big Data solutions. The alignment between savvy business users and technology-enabled IT departments is still in the works. 3) Lack of technology alignment. By this I am referring to the fact that it is very easy to begin purchasing point Big Data solutions for one specific problem. Watch out. This same problem has been happening for years with our hype cycles. Let’s get a bit smarter on this cycle. 4) This flows directly into my next CAUTION: develop a longer-term roadmap. If you are going to start a Big Data project, that means you will be purchasing software and may be hiring resources. Before you start, it may be time for a short Big Data strategy. Understand what happens after the first project. I am absolutely in favor of starting with a POC and starting small. However, before large investments, think through the 2-year plan. Pactera’s Big Data Strategy is a quick engagement to review each major business group in an organization and look for detailed problems that may be solved by Big Data solutions. It’s a great engagement that has the outcome of a 1-2 year plan for implementing Big Data. 5) Lack of Critical Evaluation. I feel like this has been missing in most IT projects. At the end of the project, did we achieve the expected business goals? If the answer is no, then let’s figure out why and make improvements.
  15. I now want to present two business cases from real-life projects. The first project is for one of the largest online travel organizations in the world. Let’s call them Acme OnLine Travel (AOL). Pactera has had a relationship with AOL for over 6 years. We built the data warehouse. We understand the business very well, and frankly we understand the weaknesses of the BI solution. The volumes of data were so high and the cost to maintain them was growing. The data sources for this client were everything from traditional ERP systems and clickstream data to social media such as Facebook. It’s not hard to see why the volumes were high. Petabytes is the norm. Part of the main driver for this project was to reduce the cost per TB, which was running at ten thousand USD. So a few years ago we suggested to AOL that a Big Data solution was most likely necessary if we wanted to continue to be competitive in this industry. It started with some POCs and then moved into BUILDING ONTO the current BI system at first. We are now beginning to see the natural death of some portions of the traditional BI system. I say natural death because our business users are simply not using some of the old methods. The most interesting and hard-hitting are the predictive analytic functions that are being built on top of the base Hadoop file system. One of the most recent changes is our move to near REAL-TIME with a newer Big Data product called Impala. Our team has been working with Impala for the past year or so, even before it was officially released. This addresses one of the CRITICAL issues with Big Data, which is the lack of real-time capabilities.
  16. Let’s talk about Impala for a moment. This graph shows our own testing at this client with petabytes of data. As you can see, the performance difference between Hive and Impala is quite stark. If you know anything about traditional... What is also interesting, and what I wanted to draw out, is that DESPITE our success with Big Data at this client, a large number of people use Hadoop only to get data so that they can process it in a traditional RDBMS. A lot of this is simply because people are more comfortable, and end-user tools are more user-friendly, on relational/traditional databases. Please keep in mind that when it says FASTER on that 3rd line, it is referring to the much smaller sets of data that we are placing into the RDBMS. In conclusion, the solution provided FASTER, more intelligent insights, and the cost is down to less than 2 thousand USD per TB in Hadoop, from 10 thousand USD.
  17. The final case study that I want to present is around Retail. The picture that you are seeing is the goal of most major retailers. The goal is to drive marketing, and the eventual sale, down to a level so personalized that it feels like they know the customer on a one-on-one basis. Oh, and by the way, without crossing the “Creep Factor” line. That is the line where the customer feels violated. This was the case with our client. Our client had a mix of the following types of data: store POS; web clickstream; social media; financial; A BUNCH of spreadsheets; customer satisfaction data; call center data. Just as in the last case study, the volume of data was growing and the cost to manage it was growing even faster. The project started with a POC and has now reached into several departments. Examples of business problems / projects include: customer buying behaviour; price optimization, as in changing prices on the web based on behaviour; and space planning. All of these projects were accomplished with a theory, a model, and a lot of testing. Eventually, when good models were built and TESTED significantly, the models were embedded into the client’s operational systems. What I have walked away with from these projects and research is how much psychology is required to be successful. This particular client is actually using Big Data solutions combined with SAS and several other traditional BI tools.
  18. Big Data is not the solution. The solution is some type of use of technology that enables business answers. The four bullets on here represent the 4 focus areas of our BI & Analytics practice in 2013. I believe Big Data is the foundation for many of these other solutions.
  19. I love this story because it is so hard-hitting... especially if you have daughters like I do. Most of you have heard the story, so I won’t go into all of the details. The basic gist goes something like this. Target started a predictive analytics project that was so successful and accurate that it actually predicted that a father’s daughter was pregnant before the father knew. Google the story to find the full version if you have not heard it. I wanted to end on this because we all have a corporate responsibility to use our technology without crossing the privacy line with our customers.
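
Note 13 describes frequent itemset mining as finding which individual items usually appear together in item groups such as shopping cart contents. As a minimal sketch of that idea only, the Python below counts how often each pair of items shares a basket and keeps the pairs that meet a support threshold. This is plain pair counting, not Mahout's API or its algorithm, and the function name, threshold, and travel baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical bookings, invented for this example.
    baskets = [
        ["flight", "hotel", "car"],
        ["flight", "hotel"],
        ["hotel", "car"],
        ["flight", "car", "hotel"],
    ]
    for pair, n in sorted(frequent_pairs(baskets, min_support=3).items()):
        print(pair, n)
```

Counting every pair is fine for small baskets, but the number of pairs grows quickly with basket size, which is one reason production tools use more selective algorithms.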