Emerging Technologies
      DIY Analytics
IBM Software for a Smarter Planet


       Emerging Technology - What Do We Do?


       Innovation/collaborations in technologies
       that we hope will garner broad industry
       adoption within a 12-18 month timeframe

       Our technology initiatives are refined based
       on the marketplace & evolution of web
       technologies

       Voice of the Customer – early & direct
       customer engagements (POCs) to iterate
       on both the technology and the business
       value




IBM Confidential                                Chart   2   © 2009 IBM Corporation


       Evolving Emerging Technology Focus Areas


         Big Data Analytics for Business
         Professionals - DIY Analytic Tool &
         middleware - enabling massive amounts
         of data to be analyzed for actionable
         insights

         Web Browser Application Platform -
         pushing the envelope of next
         generation RIA applications & tooling
         delivered with web browser reach &
         economics

         Mobile - next generation Enterprise-
         Consumer applications & architecture






       New Intelligence




                      DIY Analytics
     Making Hadoop accessible
    to business professionals






       New Intelligence - New Class of Application On Horizon

        Hear business users asking for the
        ability to directly manipulate, analyze &
        remix massive data sources & services
        • LOB: “… Google whetted my appetite... I
             want more customizable analytics with
             me in the driver’s seat…”

        Leveraging easy-to-use, rich data
        manipulation metaphors like
        spreadsheets, etc.

        Rich visualizations to quickly identify
        insights

        Rich Spectrum of DIY Analytic Applications Emerging






       IBM Emerging Technology Project: BigSheets

        What is it?
        An insight engine for enabling ad-hoc business insights for
        business users - at web scale


        How does it work?
        Discovery Process
        1. Point BigSheets to data sources of interest
           • Unstructured web data, feeds, XML, etc.
        2. Transform data into a form that can be analyzed
           • Unstructured data becomes semi-structured data
           • Example: name: Rod Smith, employer: IBM, state: GA
           • Apply analytics - enriching the data
        3. “What-if tooling” - browser-based visual front end - spreadsheet
           metaphor to create worksheets for exploring/visualizing the big data
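
        As a rough illustration of step 2, the sketch below turns one line of free text
        into a semi-structured record like the Rod Smith example. The regular expressions,
        the FIELD_PATTERNS table, and the sample sentence are assumptions made up for
        this sketch; they are not how BigSheets itself extracts fields.

```python
import re

# Hypothetical patterns for illustration only; a real pipeline would rely on
# proper text analytics rather than three hand-written regular expressions.
FIELD_PATTERNS = {
    "name": re.compile(r"^(?P<value>[A-Z][a-z]+ [A-Z][a-z]+)"),
    "employer": re.compile(r"works (?:at|for) (?P<value>[A-Z][A-Za-z]+)"),
    "state": re.compile(r",\s*(?P<value>[A-Z]{2})\b"),
}


def to_record(text):
    """Return a dict holding whichever fields could be pulled out of the text."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            record[field] = match.group("value")
    return record


# Made-up input sentence, shaped to match the example record on the slide.
sentence = "Rod Smith works at IBM in Atlanta, GA."
print(to_record(sentence))
```

        Fields that cannot be found are simply left out of the record, which is what
        makes the result semi-structured rather than a fixed-schema row.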



        What’s different?
        • Unlocking insights embedded in unstructured data
        • Analyzing data previously unavailable to analyze




       BigSheets: Framework on Hadoop


      Expanding upon the Hadoop stack
      • Visual tooling builds extensively on Pig

      BigSheets Architecture Characteristics:
      • Extensible via UDFs
      • REST API for customer choice of analytic service/engine
      • REST API for choice of visualization packages
      • Export content as feeds, XML, etc.
      • ...more to come






        BigSheets in action

        Crowd sourcing - Nikon: what are folks on
        Twitter saying about our cameras - by model

        Input
        • Gather daily tweets for May
        • 64 million tweets per day
        • ~210 terabytes a month

        Map
        • Split data across cluster
        • Emit tweets mentioning Nikon cameras (key=Nikon D90, …)

        Reduce
        • Aggregate tweets for each Nikon model
        • D90: 300 tweets
        • D3000: 68 tweets

        Output
        • Perform sentiment analysis
        • “..Wow, Great, Incredible…”
          “..Lousy, sucks, ... “
          “..no RAW support...”
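
        The Map and Reduce stages of this flow can be sketched in a few lines of plain
        Python. This is a single-process toy, not Hadoop or Pig code: the function names,
        the NIKON_MODELS tuple, and the sample tweets are all invented for illustration,
        and the sentiment-analysis output stage is left out.

```python
from collections import defaultdict

# Models named on the slide; a real job would carry a fuller list.
NIKON_MODELS = ("D90", "D3000")


def map_tweet(tweet):
    """Map step: emit (key, 1) for each Nikon model a tweet mentions."""
    words = {w.strip(".,!?:;\"'").upper() for w in tweet.split()}
    for model in NIKON_MODELS:
        if model in words:
            yield ("Nikon " + model, 1)


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_mentions(tweets):
    """Run the map step over every tweet, then reduce the emitted pairs."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return reduce_counts(pairs)


# Made-up tweets, only to exercise the functions.
sample_tweets = [
    "Wow, my new Nikon D90 is incredible!",
    "Just unboxed the D3000, great so far",
    "Great shots with the d90 today",
    "Nothing about cameras here",
]
print(count_mentions(sample_tweets))
```

        On a cluster the map step would run in parallel over splits of the tweet data and
        the framework would group pairs by key before the reduce step; here both happen
        in one process.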






       A Demonstration of BigSheets in action

        Crowd sourcing - What do people want to buy?

        • Gather

        • Created an analysis model, using IBM Content Analytics, looking for ‘buy signals’:
          • Verb phrase indicating the desire to get something
            • “I would really love a...”
          • Buy Target (“I would really love to get myself a cool new phone”)
          • Brand, Company, and opinion statements in the context of this buy statement

        • Deployed the analysis model into BigSheets, where it runs across the Hadoop
          cloud

        ★ In BigSheets each analysis model is considered a macro

        • Visualize the results
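
        To make the idea of a ‘buy signal’ concrete, here is a minimal sketch that looks
        for a desire phrase and captures the buy target that follows it. It is a single
        hand-written regular expression, assumed purely for illustration; it is not the
        IBM Content Analytics model, and the BUY_SIGNAL pattern and find_buy_target
        helper are hypothetical names.

```python
import re

# Hypothetical pattern: a desire phrase ("I would really love ...") followed by
# an article and then the thing the writer wants (the buy target).
BUY_SIGNAL = re.compile(
    r"\bI(?: would|'d) (?:really )?(?:love|like)"
    r"(?: to (?:get|buy)(?: myself)?)? "
    r"(?:a|an|the) (?P<target>[^.!?,]+)",
    re.IGNORECASE,
)


def find_buy_target(text):
    """Return the buy target if the text contains a buy signal, else None."""
    match = BUY_SIGNAL.search(text)
    return match.group("target").strip() if match else None


print(find_buy_target("I would really love to get myself a cool new phone"))
```

        A fixed pattern like this misses most real phrasing, which is why the slide
        describes a trained analysis model rather than a rule of this kind.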



       Marketplace Application Example - British Library

        Web Archive Opportunity
        Libraries & archives are interested in
        collecting & preserving web data
        • British Library has opened the UK Web Archive
          portal for researchers & historians to explore
          preserved web content
        • Parliament nearing vote to give the British Library
          the nod to archive all .uk domain data, spanning 4
          million sites & ~128 TB today
          • Today, web page classification for the 5000 British
            Library web sites is performed by 30 folks

        The Goal
        Can an ET technology project &
        IBM’s Classification Module (ICM)
        electronically classify & tag web
        content & enable/create
        visualizations?

        Web Content To Gather:
        • British Library gathered 1.48 TB of data - 4
          web archive files comprising ~400,000 web
          pages from 300 archived websites
        • 4 machines (dual core), 1 TB HD, 8 GB RAM




       Marketplace Application Example: AmEx or IBM
        Project:
        Improve IP Portfolio Analysis for
        Mergers & Acquisitions
        “...please collect all US Patent
        filings… then let’s do…”

        Business Questions
        • Ongoing tracking of acquisitions and
          associated IP
        • Visualizations, e.g. corporate
          genealogy

        Knowledge of Interest:
        • Corporate genealogies
        • IP ownership roll-up
        • Patents ranked by citation
        • Augment analysis with items affecting IP
          value, inventor affiliation, citation rank by
          time

        Web Content To Gather:
        • SEC filings, e.g. annual and quarterly reports
        • USPTO patents, assignments and trademarks
        • Company press releases
        • Other M&A, inventor information from
          feeds, webpages




       Let’s Talk Customers: AmEx or IBM
        American Express:
        Evaluating IP with large amounts of public and private data

        Gathered 1,400,000 U.S. patents on record from
        2002 - 2009
        • The 1,400,000 cited/referenced another 6,100,000
          U.S. & international patents
        ★ Odd fact: a few patents cited/referenced as many as
          13,870 other patents
        • ~216 are AMEX patents
        ★ 90 of the AMEX patents were cited/referenced, ranging from
          24 cited 1 time through one cited 67 times
        • 3600 cases from Court of Appeals, Federal Circuit,
          1993 - 2007 (Georgetown Law)
        ★ 43 mentions of U.S. patents issued between 2002 -
          2009; relies on exact “Patent No. 9,999,999” match
        • Productivity improvement from weeks to hours
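
        The exact “Patent No. 9,999,999” match mentioned above could look roughly like
        the sketch below. The pattern, the helper name, and the sample text with its
        patent numbers are assumptions made up for illustration; the slide does not
        show the matching rule that was actually used.

```python
import re

# Matches only the literal form "Patent No. 9,999,999": the words "Patent No."
# followed by a seven-digit number written with comma separators.
PATENT_REF = re.compile(r"Patent No\. (\d,\d{3},\d{3})")


def find_patent_numbers(text):
    """Return the patent numbers cited in the exact form, as integers."""
    return [int(num.replace(",", "")) for num in PATENT_REF.findall(text)]


# Made-up court-opinion text; the patent numbers are placeholders.
opinion = (
    "... infringed U.S. Patent No. 7,123,456 and Patent No. 6,000,001; "
    "see also patent 5,555,555 ..."
)
print(find_patent_numbers(opinion))
```

        The third reference is skipped because it lacks the exact “Patent No.” wording,
        which shows why a count based on exact matching is likely a lower bound.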






       Conclusion


                        In God we trust
                   ...all others, bring data




Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Disruptive Applications with Hadoop__HadoopSummit2010

  • 1. Emerging Technologies DIY Analytics
  • 2. IBM Software for a Smarter Planet. Emerging Technology - What Do We Do? Innovation and collaboration in technologies that we hope will garner broad industry adoption in a timeframe of 12-18 months. Our technology initiatives are refined based on the marketplace & the evolution of web technologies. Voice of the Customer: early & direct customer engagements (POCs) to iterate on both the technology and the business value. IBM Confidential, Chart 2, © 2009 IBM Corporation
  • 3. Evolving Emerging Technology Focus Areas. Big Data Analytics for Business Professionals: DIY analytic tool & middleware, enabling massive amounts of data to be analyzed for actionable insights. Web Browser Application Platform: pushing the envelope of next-generation RIA applications & tooling delivered with web browser reach & economics. Mobile: next-generation Enterprise-Consumer applications & architecture.
  • 4. Evolving Emerging Technology Focus Areas. Big Data Analytics for Business Professionals: DIY analytic tool & middleware, enabling massive amounts of data to be analyzed for actionable insights. Web Browser Application Platform: pushing the envelope of next-generation RIA applications & tooling delivered with web browser reach & economics. Mobile: next-generation Enterprise-Consumer applications & architecture.
  • 5. Evolving Emerging Technology Focus Areas. Big Data Analytics for Business Professionals: DIY analytic tool & middleware, enabling massive amounts of data to be analyzed for actionable insights. Web Browser Application Platform: pushing the envelope of next-generation RIA applications & tooling delivered with web browser reach & economics. Mobile: next-generation Enterprise-Consumer applications & architecture.
  • 6. New Intelligence: DIY Analytics. Making Hadoop accessible to business professionals.
  • 7. New Intelligence - New Class of Application on the Horizon. We hear business users asking for the ability to directly manipulate, analyze & remix massive data sources & services • LOB: “…Google whetted my appetite... I want more customizable analytics with me in the driver's seat…” Rich spectrum of DIY analytic applications emerging: • Leveraging easy-to-use, rich data manipulation metaphors like spreadsheets, etc. • Rich visualizations to quickly identify insights
  • 8. IBM Emerging Technology Project: BigSheets. What is it? An insight engine for enabling ad-hoc business insights for business users, at web scale. How does it work? Discovery process: 1. Point BigSheets to data sources of interest (unstructured web data, feeds, XML, etc.). 2. Transform the data into a form that can be analyzed: unstructured data becomes semi-structured data (example: name: Rod Smith, employer: IBM, state: GA), then apply analytics to enrich the data. 3. “What if” tooling: a browser-based visual front end with a spreadsheet metaphor to create worksheets for exploring and visualizing the big data. What's different? • Unlocking insights embedded in unstructured data • Analyzing data previously unavailable to analyze
  • 9. BigSheets: Framework on Hadoop. Expanding upon the Hadoop stack • Visual tooling builds extensively on Pig. BigSheets architecture characteristics: • Extensible via UDFs • REST API for customer choice of analytic service/engine • REST API for choice of visualization packages • Export content as feeds, XML, etc. • ...more to come
  • 10. BigSheets in action. Crowd sourcing - Nikon: what are folks on Twitter saying about our cameras, by model? Input: gather daily tweets for May • 64 million tweets per day • ~210 terabytes a month. Map: • Split data across the cluster • Emit tweets mentioning Nikon cameras (key=Nikon D90, …). Reduce: aggregate tweets for each Nikon model • D90: 300 tweets • D3000: 68 tweets. Output: perform sentiment analysis • “..Wow, Great, Incredible…” • “..Lousy, sucks, ...” • “..no RAW support...”
  • 11. A Demonstration of BigSheets in action. Crowd sourcing - What do people want to buy? • Gather • Created an analysis model, using IBM Content Analytics, looking for 'buy signals': • Verb phrase indicating the desire to get something (“I would really love a...”) • Buy target (“I would really love to get myself a cool new phone”) • Brand, company, and opinion statements in the context of this buy statement • Deployed the analysis model into BigSheets, where it gets deployed across the Hadoop cloud ★ In BigSheets each analysis model is considered a macro • Visualize the results
  • 12. Marketplace Application Example - British Library. Web Archive Opportunity: libraries & archives are interested in collecting & preserving the web. The Goal: can an ET technology project & IBM's Classification Module (ICM) electronically classify & tag web content & enable/create data visualizations? • The British Library has opened the UK Web Archive portal for researchers & historians to explore preserved web content • Parliament is nearing a vote to give the British Library the nod to archive all .uk domain data, spanning 4 million sites & ~128 TB today • Today, web page classification for the 5000 British Library web sites is performed by 30 folks. Web Content To Gather: • The British Library gathered 1.48 TB of data: 4 web archive files comprising ~400,000 web pages from 300 archived websites • 4 machines (dual core), 1 TB HD, 8 GB RAM
  • 13. Marketplace Application Example: AmEx or IBM. Project: Improve IP Portfolio Analysis for Mergers & Acquisitions. “...please collect all US Patent filings… then let's do…” Business Questions: • Ongoing tracking of acquisitions and associated IP • Visualizations, e.g. corporate genealogy. Knowledge of Interest: • Corporate genealogies • IP ownership roll-up • Patents ranked by citation • Augment analysis with items affecting IP value, inventor affiliation, citation rank by time. Web Content To Gather: • SEC filings, e.g. annual and quarterly reports • USPTO patents, assignments and trademarks • Company press releases • Other M&A and inventor information from feeds and webpages
  • 14. Let's Talk Customers: AmEx or IBM. American Express: evaluating IP with large amounts of public and private data. Gathered 1,400,000 U.S. patents on record from 2002-2009 • The 1,400,000 cited/referenced another 6,100,000 U.S. & international patents ★ Odd fact: a few patents cited/referenced as many as 13,870 other patents • ~216 are AMEX patents ★ 90 were cited/referenced; of AMEX cited patents, 24 were cited 1 time, through to one cited 67 times • 3600 cases from the Court of Appeals, Federal Circuit, 1993-2007 (Georgetown Law) ★ 43 mentions of U.S. patents issued between 2002-2009; relies on exact “Patent No. 9,999,999” match • Productivity improvement from weeks to hours
  • 15. Conclusion: In God we trust ... all others, bring data
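
The Input / Map / Reduce flow on slide 10 can be sketched in a few lines of plain Python. This is only an illustration of the map-and-aggregate idea, not BigSheets, Pig, or Hadoop code: the function names, the sample tweets, and the regular expression are assumptions made for the example, and the only model names used are the two the slide mentions (D90 and D3000).

```python
import re
from collections import defaultdict

# Hypothetical model list; the slide names only the D90 and the D3000.
MODELS = ("D90", "D3000")
MODEL_RE = re.compile(r"\bnikon\s+(" + "|".join(MODELS) + r")\b", re.IGNORECASE)


def map_tweet(tweet):
    """Map step: emit one (key, tweet) pair per Nikon model the tweet mentions."""
    mentioned = {match.group(1).upper() for match in MODEL_RE.finditer(tweet)}
    for model in sorted(mentioned):
        yield "Nikon " + model, tweet


def reduce_counts(pairs):
    """Reduce step: aggregate the emitted pairs into a tweet count per key."""
    counts = defaultdict(int)
    for key, _tweet in pairs:
        counts[key] += 1
    return dict(counts)


def count_mentions(tweets):
    """Run the map step over every tweet, then reduce the combined output."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return reduce_counts(pairs)


# Made-up sample input, standing in for the daily tweet feed.
sample = [
    "Wow, my Nikon D90 is incredible",
    "The Nikon D3000 has no RAW support?",
    "Great shots with the nikon d90 today",
    "Lunch was good",
]
print(count_mentions(sample))
```

On a real cluster the map step would run in parallel over splits of the data and the framework would group pairs by key before the reduce step; here both steps run in a single process so the flow is easy to follow.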