Managing Uncertain Data at Scale
     Nikolay Marin




                                   © 2013 IBM Corporation
Managing Uncertain Data at Scale




Trend: Most of the world’s analyzed data will be uncertain
• By 2015, 80% of the world’s data will be uncertain
• Uncertain data management requires new techniques
• These techniques are necessary for real-world Big Data Analytics

Opportunity: Business leadership using Big Data Analytics
• Robust, business-aware uncertain data management
• Use analytics over uncertain web, sensor, and human-generated data
• Enable good business decisions by understanding analysis confidence

Challenge: Taking Big Data Analytics into an uncertain world
• Analysis of text is highly nuanced; sensor-based data is imprecise
• Timely business decisions require efficient large-scale analytics
• It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain




The fourth dimension of Big Data: Veracity – handling data in doubt



• Volume (Data at Rest): terabytes to exabytes of existing data to process
• Velocity (Data in Motion): streaming data, milliseconds to seconds to respond
• Variety (Data in Many Forms): structured, unstructured, text, multimedia
• Veracity* (Data in Doubt): uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, model approximations

* Truthfulness, accuracy or precision, correctness




Uncertainty arises from many sources

Process Uncertainty: Processes contain “randomness”
• Uncertain travel times
• Semiconductor yield

Data Uncertainty: Data input is uncertain
• Text entry: intended spelling vs. actual spelling
• GPS uncertainty
• Testimony
• Ambiguity: {Paris Airport}
• Conflicting data: {John Smith, Dallas} vs. {John Smith, Kansas}
• Rumors
• Contaminated?

Model Uncertainty: All modeling is approximate
• Fitting a curve to data
• Forecasting a hurricane (www.noaa.gov)



  By 2015, 80% of all available data will be uncertain


[Chart: Global Data Volume in Exabytes (left axis, 0 to 9000) and Aggregate Uncertainty % (right axis, 10 to 100), 2005 to 2015. Data categories shown: Enterprise Data; VoIP; Social Media (video, audio and text); Sensors (Internet of Things).]

• By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
• The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
• Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.

Multiple sources: IDC, Cisco



How to reduce uncertainty in processes, models, and data




Constructing context for better understanding
• Extract as much information as feasible from each source
• Combine (condense) data from multiple sources
• More data from more sources is better
   – Gathers more evidence for statistical methods
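
One way to see why combining sources helps is inverse-variance weighting: each source reports a value and its standard deviation, and the fused estimate is less uncertain than any single input. This is a minimal sketch rather than the method used in the talk. It assumes the sources are independent and unbiased, and the function name `combine_estimates` and the sample readings are hypothetical.

```python
import math

def combine_estimates(readings):
    """Fuse independent (value, std_dev) estimates by inverse-variance weighting."""
    if not readings:
        raise ValueError("need at least one reading")
    # A more precise source (smaller std_dev) receives a larger weight.
    weights = [1.0 / (std ** 2) for _, std in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    # The fused variance is the reciprocal of the summed weights.
    return value, math.sqrt(1.0 / total)

# Hypothetical position readings (value, std_dev) from three sources
readings = [(10.0, 2.0), (12.0, 2.0), (11.5, 1.0)]
value, std = combine_estimates(readings)
print(f"fused estimate: {value:.2f} +/- {std:.2f}")
```

The fused standard deviation is always smaller than the smallest input standard deviation, which is the sense in which more sources gather more evidence.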

Using statistical methods scaled for Big Data
• Stochastic techniques efficiently reason about uncertainty
• Monte Carlo techniques explore many possible scenarios in order to gain insight
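
The Monte Carlo idea can be sketched with the uncertain travel times example from the earlier slide: draw many random scenarios, then read the insight off the distribution of outcomes. Everything specific here is an assumption for illustration: the three trip legs, their minute ranges, the triangular distribution, and the 90-minute deadline.

```python
import random

# Hypothetical legs of a trip: (most likely, minimum, maximum) minutes
LEGS = [(20, 15, 40), (35, 30, 70), (10, 8, 25)]

def simulate_trip(rng):
    """One scenario: total minutes when each leg takes a random time."""
    return sum(rng.triangular(low, high, mode) for mode, low, high in LEGS)

def monte_carlo(n_scenarios=10_000, deadline=90.0, seed=42):
    """Explore many scenarios; summarize the mean, 95th percentile, and on-time rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = sorted(simulate_trip(rng) for _ in range(n_scenarios))
    mean = sum(totals) / n_scenarios
    p95 = totals[min(n_scenarios - 1, int(0.95 * n_scenarios))]
    on_time = sum(t <= deadline for t in totals) / n_scenarios
    return mean, p95, on_time

mean, p95, on_time = monte_carlo()
print(f"mean {mean:.1f} min, 95th percentile {p95:.1f} min, "
      f"on time with probability {on_time:.2f}")
```

Reporting a percentile and an on-time probability rather than a single number is what lets a decision maker see the confidence behind the analysis.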


Requires specific business process and industry context


Statistical techniques reduce uncertainty in analytical models

[Diagram: a matrix of trouble tickets by attributes; stochastic search is used to find trouble tickets that are similar, which helps the agent find similar tickets.]

Trouble ticket attributes
• Some attributes such as server type are precise
• Other attributes such as words in trouble tickets may be imprecise indicators of the problem

Model approximation
• Treat N attributes as N dimensions in space
• Model similarity as closeness in the N-dimensional space

Prediction
• Improve predictability by getting agent feedback

• Improve suggestions for similar problems using corroborating data and better mathematical techniques
• Analyze all the data; do not subset
• Use related techniques to automate Level 1 support, finding problem clusters, etc.
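
The model approximation above can be sketched as follows: each ticket becomes a point in N-dimensional space (a one-hot server type plus word counts), and similar tickets are the nearest points. This is an illustration under stated assumptions, not the system described in the talk. The ticket fields, the sample tickets, and the function names are hypothetical, and the slide's stochastic search is swapped for an exhaustive scan to keep the example short.

```python
import math
from collections import Counter

def ticket_vector(ticket, vocabulary, server_types):
    """Map a ticket to N numbers: one-hot server type plus word counts."""
    words = Counter(ticket["text"].lower().split())
    one_hot = [1.0 if ticket["server"] == s else 0.0 for s in server_types]
    return one_hot + [float(words[w]) for w in vocabulary]

def most_similar(query, tickets, k=2):
    """Return the ids of the k tickets closest to the query (Euclidean distance)."""
    everything = tickets + [query]
    server_types = sorted({t["server"] for t in everything})
    vocabulary = sorted({w for t in everything for w in t["text"].lower().split()})
    q = ticket_vector(query, vocabulary, server_types)
    scored = [
        (math.dist(q, ticket_vector(t, vocabulary, server_types)), t["id"])
        for t in tickets
    ]
    return [ticket_id for _, ticket_id in sorted(scored)[:k]]

# Hypothetical tickets: a precise attribute (server) and imprecise words (text)
tickets = [
    {"id": "T1", "server": "db", "text": "disk full on database server"},
    {"id": "T2", "server": "web", "text": "login page timeout"},
    {"id": "T3", "server": "db", "text": "database disk almost full"},
]
query = {"id": "Q", "server": "db", "text": "database disk full"}
print(most_similar(query, tickets))  # → ['T3', 'T1']
```

Agent feedback could then be folded in by reweighting dimensions that turn out to be good or poor indicators, which is one reading of the Prediction column.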



Analytics is broadly defined as the use of data and computation to make smart decisions

Data
• Historical
• Simulated
• Text; video, images; audio

Decision point
• Data instances
• Reports and queries on data aggregates
• Predictive models
• Answers and confidence
• Feedback and learning

Possible outcomes
• Option 1
• Option 2
• Option 3






Future of Analytics




Explosion of unstructured data
• Creates new analytics opportunities
• Addresses new enterprise needs

Consistent, extensible, and consumable analytics platform
• Reduces cost-to-value for enterprises
• Increases analytics solution coverage with limited supply of skills

Optimizing across the stack to deploy analytics at scale
• Analytics becomes a dominant IT workload and drives HW design
• Opportunity to seamlessly scale from terascale to exascale





  Analytics toolkits will be expanded to support ingestion and interpretation of
  unstructured data, and enable adaptation and learning

New Methods
• Adaptive Analysis: Responding to context
• Continual Analysis: Responding to local change/feedback
• Optimization under Uncertainty: Quantifying or mitigating risk

Traditional
• Optimization: Decision complexity, solution speed
• Predictive Modeling: Causality, probabilistic, confidence levels
• Simulation: High fidelity, games, data farming
• Forecasting: Larger data sets, nonlinear regression
• Alerts: Rules/triggers, context sensitive, complex events
• Query/Drill Down: In memory data, fuzzy search, geo spatial
• Ad hoc Reporting: Query by example, user defined reports
• Standard Reporting: Real time, visualizations, user interaction

New Data
• Entity Resolution: People, roles, locations, things
• Relationship, Feature Extraction: Rules, semantic inferencing, matching
• Annotation and Tokenization: Automated, crowd sourced

Stages (top to bottom, alongside the list above)
• Learn: In the context of the decision process
• Decide and Act
• Understand and Predict
• Report
• Collect and Ingest/Interpret: Decide what to count; enable accurate counting

  Extended from: Competing on Analytics, Davenport and Harris, 2007


Finally, what about a longer-term view, say the next 10-50 years?

1. Artificial Intelligence
2. Nano-“everything”
3. Cognitive Computing
4. Deep (Exascale) Computing
5. Atomic & Quantum Computing
6. Human / Computer Interaction
7. Machine to Machine Interaction
8. BioTech / Human Augmentation
9. Robots & Robotics
10. Advanced / Predictive Analytics
11. Security & Privacy
12. 3-D Printing
13. Video-enabled Business Processes
14. Personalized Web/Assistants
15. Ubiquitous Computing
16. Gaming
17. Simulation
18. Virtual Computing (including virtual worlds, tele-presence, etc.)
19. Augmented Reality


IBM Academy of Technology and Global Technology Outlook can help you find some answers

© 2013 3IBM Corporation                                                                  11
Managing Uncertain Data at Scale




© 2013 3IBM Corporation

Contenu connexe

En vedette

Extent 2013 Obninsk LSE - The Focus Beyond Low Latency
Extent 2013 Obninsk  LSE - The Focus Beyond Low LatencyExtent 2013 Obninsk  LSE - The Focus Beyond Low Latency
Extent 2013 Obninsk LSE - The Focus Beyond Low Latencyextentconf Tsoy
 
Liquidity Fragmentation & SOR
Liquidity Fragmentation & SORLiquidity Fragmentation & SOR
Liquidity Fragmentation & SORIosif Itkin
 
Extent 2013 Obninsk High Performance Messaging
Extent 2013 Obninsk High Performance MessagingExtent 2013 Obninsk High Performance Messaging
Extent 2013 Obninsk High Performance Messagingextentconf Tsoy
 
Exactpro Test Tools EXTENT Feb 2011
Exactpro Test Tools EXTENT Feb 2011Exactpro Test Tools EXTENT Feb 2011
Exactpro Test Tools EXTENT Feb 2011Iosif Itkin
 
EXTENT-2016: Trading Technology Trends and Innovation
EXTENT-2016: Trading Technology Trends and InnovationEXTENT-2016: Trading Technology Trends and Innovation
EXTENT-2016: Trading Technology Trends and InnovationIosif Itkin
 
EXTENT-2015: Blockchain New Frontiers
EXTENT-2015: Blockchain New FrontiersEXTENT-2015: Blockchain New Frontiers
EXTENT-2015: Blockchain New FrontiersIosif Itkin
 
EXTENT-2015: LSEG Technology Overview
EXTENT-2015: LSEG Technology Overview EXTENT-2015: LSEG Technology Overview
EXTENT-2015: LSEG Technology Overview Iosif Itkin
 
EXTENT-2015: A Test Harness for Algo Trading Systems
EXTENT-2015: A Test Harness for Algo Trading Systems EXTENT-2015: A Test Harness for Algo Trading Systems
EXTENT-2015: A Test Harness for Algo Trading Systems Iosif Itkin
 
EXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program AnalysisEXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program AnalysisIosif Itkin
 
EXTENT-2015 Tradecope Presentation
EXTENT-2015 Tradecope PresentationEXTENT-2015 Tradecope Presentation
EXTENT-2015 Tradecope PresentationIosif Itkin
 
EXTENT-2015: Big Button 2.0
EXTENT-2015: Big Button 2.0EXTENT-2015: Big Button 2.0
EXTENT-2015: Big Button 2.0Iosif Itkin
 
EXTENT-2015: Hyper-Fast Trading
EXTENT-2015: Hyper-Fast TradingEXTENT-2015: Hyper-Fast Trading
EXTENT-2015: Hyper-Fast TradingIosif Itkin
 
EXTENT-2016: Testing the Architecture
EXTENT-2016: Testing the ArchitectureEXTENT-2016: Testing the Architecture
EXTENT-2016: Testing the ArchitectureIosif Itkin
 
Extent3 prognoz practical_approach_lppl_model_2012
Extent3 prognoz practical_approach_lppl_model_2012Extent3 prognoz practical_approach_lppl_model_2012
Extent3 prognoz practical_approach_lppl_model_2012extentconf Tsoy
 
EXTENT-2015: Millennium Surveillance™ – Achieving Excellence
EXTENT-2015: Millennium Surveillance™ –  Achieving ExcellenceEXTENT-2015: Millennium Surveillance™ –  Achieving Excellence
EXTENT-2015: Millennium Surveillance™ – Achieving ExcellenceIosif Itkin
 
EXTENT-2015: Reconciliation Testing Aspects
EXTENT-2015: Reconciliation Testing AspectsEXTENT-2015: Reconciliation Testing Aspects
EXTENT-2015: Reconciliation Testing AspectsIosif Itkin
 

En vedette (16)

Extent 2013 Obninsk LSE - The Focus Beyond Low Latency
Extent 2013 Obninsk  LSE - The Focus Beyond Low LatencyExtent 2013 Obninsk  LSE - The Focus Beyond Low Latency
Extent 2013 Obninsk LSE - The Focus Beyond Low Latency
 
Liquidity Fragmentation & SOR
Liquidity Fragmentation & SORLiquidity Fragmentation & SOR
Liquidity Fragmentation & SOR
 
Extent 2013 Obninsk High Performance Messaging
Extent 2013 Obninsk High Performance MessagingExtent 2013 Obninsk High Performance Messaging
Extent 2013 Obninsk High Performance Messaging
 
Exactpro Test Tools EXTENT Feb 2011
Exactpro Test Tools EXTENT Feb 2011Exactpro Test Tools EXTENT Feb 2011
Exactpro Test Tools EXTENT Feb 2011
 
EXTENT-2016: Trading Technology Trends and Innovation
EXTENT-2016: Trading Technology Trends and InnovationEXTENT-2016: Trading Technology Trends and Innovation
EXTENT-2016: Trading Technology Trends and Innovation
 
EXTENT-2015: Blockchain New Frontiers
EXTENT-2015: Blockchain New FrontiersEXTENT-2015: Blockchain New Frontiers
EXTENT-2015: Blockchain New Frontiers
 
EXTENT-2015: LSEG Technology Overview
EXTENT-2015: LSEG Technology Overview EXTENT-2015: LSEG Technology Overview
EXTENT-2015: LSEG Technology Overview
 
EXTENT-2015: A Test Harness for Algo Trading Systems
EXTENT-2015: A Test Harness for Algo Trading Systems EXTENT-2015: A Test Harness for Algo Trading Systems
EXTENT-2015: A Test Harness for Algo Trading Systems
 
EXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program AnalysisEXTENT-2016: Industry Practices of Advanced Program Analysis
EXTENT-2016: Industry Practices of Advanced Program Analysis
 
EXTENT-2015 Tradecope Presentation
EXTENT-2015 Tradecope PresentationEXTENT-2015 Tradecope Presentation
EXTENT-2015 Tradecope Presentation
 
EXTENT-2015: Big Button 2.0
EXTENT-2015: Big Button 2.0EXTENT-2015: Big Button 2.0
EXTENT-2015: Big Button 2.0
 
EXTENT-2015: Hyper-Fast Trading
EXTENT-2015: Hyper-Fast TradingEXTENT-2015: Hyper-Fast Trading
EXTENT-2015: Hyper-Fast Trading
 
EXTENT-2016: Testing the Architecture
EXTENT-2016: Testing the ArchitectureEXTENT-2016: Testing the Architecture
EXTENT-2016: Testing the Architecture
 
Extent3 prognoz practical_approach_lppl_model_2012
Extent3 prognoz practical_approach_lppl_model_2012Extent3 prognoz practical_approach_lppl_model_2012
Extent3 prognoz practical_approach_lppl_model_2012
 
EXTENT-2015: Millennium Surveillance™ – Achieving Excellence
EXTENT-2015: Millennium Surveillance™ –  Achieving ExcellenceEXTENT-2015: Millennium Surveillance™ –  Achieving Excellence
EXTENT-2015: Millennium Surveillance™ – Achieving Excellence
 
EXTENT-2015: Reconciliation Testing Aspects
EXTENT-2015: Reconciliation Testing AspectsEXTENT-2015: Reconciliation Testing Aspects
EXTENT-2015: Reconciliation Testing Aspects
 

Similaire à Extent 2013 Obninsk Managing Uncertain Data at Scale

Debs 2012 uncertainty tutorial
Debs 2012 uncertainty tutorialDebs 2012 uncertainty tutorial
Debs 2012 uncertainty tutorialOpher Etzion
 
Icss 20130411 v2
Icss 20130411 v2Icss 20130411 v2
Icss 20130411 v2ISSIP
 
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big DataDr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big DataGlobal Business Events
 
Smarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our worldSmarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our worldKim Escherich
 
Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012
Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012
Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012James Mailley
 
SAS Fraud Framework for Insurance
SAS Fraud Framework for InsuranceSAS Fraud Framework for Insurance
SAS Fraud Framework for Insurancestuartdrose
 
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Mark Heid
 
Big Data and the Cloud a Best Friend Story
Big Data and the Cloud a Best Friend StoryBig Data and the Cloud a Best Friend Story
Big Data and the Cloud a Best Friend StoryAmazon Web Services
 
InfoSphere streams_technical_overview_infospherusergroup
InfoSphere streams_technical_overview_infospherusergroupInfoSphere streams_technical_overview_infospherusergroup
InfoSphere streams_technical_overview_infospherusergroupIBMInfoSphereUGFR
 
Big Data at #WADAY11
Big Data at #WADAY11 Big Data at #WADAY11
Big Data at #WADAY11 Cosimo Accoto
 
Oasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficialOasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficialJamie Clark
 
Progress with confidence into next generation IT
Progress with confidence into next generation ITProgress with confidence into next generation IT
Progress with confidence into next generation ITPaul Muller
 
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...Foviance
 
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietyPresentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietySURFnet
 
Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019
Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019 Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019
Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019 Amazon Web Services
 
Airtel Represented at The Mobile VAS SUMMIT 2009
Airtel Represented at The Mobile VAS SUMMIT 2009Airtel Represented at The Mobile VAS SUMMIT 2009
Airtel Represented at The Mobile VAS SUMMIT 2009Paritosh Sharma
 

Similaire à Extent 2013 Obninsk Managing Uncertain Data at Scale (20)

Debs 2012 uncertainty tutorial
Debs 2012 uncertainty tutorialDebs 2012 uncertainty tutorial
Debs 2012 uncertainty tutorial
 
Icss 20130411 v2
Icss 20130411 v2Icss 20130411 v2
Icss 20130411 v2
 
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big DataDr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
Dr. Shahbaz Ali, CEO at Tarmin - Business Transformation in the Age of Big Data
 
Big Data & The Cloud
Big Data & The CloudBig Data & The Cloud
Big Data & The Cloud
 
16h30 p duff-big-data-final
16h30   p duff-big-data-final16h30   p duff-big-data-final
16h30 p duff-big-data-final
 
Smarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our worldSmarter Planet: How Big Data changes our world
Smarter Planet: How Big Data changes our world
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012
Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012
Big Data and Mobile Recruitment - Irish Recruiters Conf Dec 2012
 
SAS Fraud Framework for Insurance
SAS Fraud Framework for InsuranceSAS Fraud Framework for Insurance
SAS Fraud Framework for Insurance
 
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
Big Data Meets Social Analytics - IBM Connect 2012 (CN-CC13)
 
Big Data and the Cloud a Best Friend Story
Big Data and the Cloud a Best Friend StoryBig Data and the Cloud a Best Friend Story
Big Data and the Cloud a Best Friend Story
 
InfoSphere streams_technical_overview_infospherusergroup
InfoSphere streams_technical_overview_infospherusergroupInfoSphere streams_technical_overview_infospherusergroup
InfoSphere streams_technical_overview_infospherusergroup
 
Big Data at #WADAY11
Big Data at #WADAY11 Big Data at #WADAY11
Big Data at #WADAY11
 
Oasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficialOasis cloud-law-ics-unofficial
Oasis cloud-law-ics-unofficial
 
Progress with confidence into next generation IT
Progress with confidence into next generation ITProgress with confidence into next generation IT
Progress with confidence into next generation IT
 
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
 
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big SocietyPresentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
Presentatie Big Data Forum 22 januari 2013 - Big Data en Big Society
 
Big Data and Cloud Analytics
Big Data and Cloud AnalyticsBig Data and Cloud Analytics
Big Data and Cloud Analytics
 
Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019
Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019 Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019
Guarding the guardian’s guard: IBM Trusteer - SEP326 - AWS re:Inforce 2019
 
Airtel Represented at The Mobile VAS SUMMIT 2009
Airtel Represented at The Mobile VAS SUMMIT 2009Airtel Represented at The Mobile VAS SUMMIT 2009
Airtel Represented at The Mobile VAS SUMMIT 2009
 

Extent 2013 Obninsk Managing Uncertain Data at Scale

  • 1. Managing Uncertain Data at Scale
      Nikolay Marin
      © 2013 IBM Corporation
  • 2. Managing Uncertain Data at Scale
      Trend: Most of the world’s analyzed data will be uncertain
        - By 2015, 80% of the world’s data will be uncertain
        - Uncertain data management requires new techniques
        - These techniques are necessary for real-world Big Data Analytics
      Opportunity: Business leadership using Big Data Analytics
        - Robust, business-aware uncertain data management
        - Use analytics over uncertain web, sensor, and human-generated data
        - Enable good business decisions by understanding analysis confidence
      Challenge: Taking Big Data Analytics into an uncertain world
        - Analysis of text is highly nuanced; sensor-based data is imprecise
        - Timely business decisions require efficient large-scale analytics
        - It is more difficult to obtain insight about an individual than a group, especially if the source data is uncertain
  • 3. The fourth dimension of Big Data: Veracity – handling data in doubt
      Volume (Data at Rest): Terabytes to exabytes of existing data to process
      Velocity (Data in Motion): Streaming data, milliseconds to seconds to respond
      Variety (Data in Many Forms): Structured, unstructured, text, multimedia
      Veracity* (Data in Doubt): Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
      * Truthfulness, accuracy or precision, correctness
  • 4. Uncertainty arises from many sources
      Process Uncertainty: Processes contain “randomness”
        - Uncertain travel times
        - Semiconductor yield
      Data Uncertainty: Data input is uncertain
        - Text entry (intended spelling vs. actual spelling)
        - GPS uncertainty
        - Testimony
        - Ambiguity
        - Contaminated?
        - Rumors
        - Conflicting data
        - Sample records shown on the slide: {Paris Airport}; {John Smith, Dallas}; {John Smith, Kansas}
      Model Uncertainty: All modeling is approximate
        - Fitting a curve to data
        - Forecasting a hurricane (www.noaa.gov)
  • 5. By 2015, 80% of all available data will be uncertain
      - By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
      - The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
      - Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
      [Chart: Global Data Volume in Exabytes (0 to 9000) and Aggregate Uncertainty % (0 to 100), 2005 to 2015. Series: Sensors (Internet of Things), Social Media (video, audio and text), VoIP, Enterprise Data. Multiple sources: IDC, Cisco]
  • 6. How to reduce uncertainty in processes, models, and data
      Constructing context for better understanding
        - Extract as much information as feasible from each source
        - Combine (condense) data from multiple sources
        - More data from more sources is better: it gathers more evidence for statistical methods
      Using statistical methods scaled for Big Data
        - Stochastic techniques efficiently reason about uncertainty
        - Monte Carlo techniques explore many possible scenarios in order to gain insight
      Requires specific business process and industry context
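The Monte Carlo idea on slide 6 can be illustrated with a minimal sketch. It borrows the "uncertain travel times" example from slide 4. Everything specific here is an assumption made for illustration and does not come from the slides: the three trip legs, their (low, most likely, high) minute ranges, the triangular distribution, and the 60-minute deadline.

```python
import random
import statistics


def simulate_trip_minutes(legs, rng):
    """Draw one possible scenario: a total travel time in minutes.

    Each leg is a hypothetical (low, most_likely, high) estimate.
    """
    return sum(rng.triangular(low, high, mode) for low, mode, high in legs)


def monte_carlo_summary(legs, deadline, n_scenarios=10_000, seed=0):
    """Explore many scenarios and summarize what they say about the deadline."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = [simulate_trip_minutes(legs, rng) for _ in range(n_scenarios)]
    on_time = sum(t <= deadline for t in totals) / n_scenarios
    return {
        "mean": statistics.fmean(totals),
        "stdev": statistics.stdev(totals),
        "p_on_time": on_time,
    }


# Hypothetical legs: (low, most likely, high) minutes.
legs = [(10, 15, 30), (20, 25, 45), (5, 8, 20)]
summary = monte_carlo_summary(legs, deadline=60)
print(summary)
```

The useful output is not a single travel time but the estimated probability of meeting the deadline, which is one way of attaching a confidence to an answer.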
  • 7. Statistical techniques reduce uncertainty in analytical models
      Example: Trouble tickets
        - Help agent find similar tickets
        - Use stochastic search to find trouble tickets that are similar
      Trouble ticket attributes
        - Some attributes such as server type are precise
        - Other attributes such as words in trouble tickets may be imprecise indicators of the problem
      Model approximation
        - Treat N attributes as N dimensions in space
        - Model similarity as closeness in the N dimensional space
      Prediction
        - Improve predictability by getting agent feedback
      Takeaways
        - Improve suggestions for similar problems using corroborating data and better mathematical techniques
        - Analyze all the data: do not subset
        - Use related techniques to automate Level 1 support, finding problem clusters, etc.
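The "N attributes as N dimensions" idea on slide 7 can be sketched as follows. This is an illustration under stated assumptions, not the method behind the slide: it uses an exhaustive cosine-similarity ranking rather than the stochastic search the slide mentions, and the ticket fields (`id`, `server_type`, `text`) and sample tickets are hypothetical.

```python
import math
from collections import Counter


def ticket_vector(ticket):
    """Map a ticket to a sparse vector: one dimension per word or attribute value."""
    vec = Counter(ticket["text"].lower().split())  # imprecise indicators: words
    vec["server_type=" + ticket["server_type"]] += 1  # precise attribute
    return vec


def cosine_similarity(a, b):
    """Closeness of two sparse vectors: 1.0 means same direction, 0.0 means no overlap."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(query, tickets, top_k=2):
    """Rank all tickets by similarity to the query (exhaustive, not stochastic)."""
    qv = ticket_vector(query)
    scored = [(cosine_similarity(qv, ticket_vector(t)), t["id"]) for t in tickets]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical tickets, invented for this sketch.
tickets = [
    {"id": "T1", "server_type": "db", "text": "disk full on database server"},
    {"id": "T2", "server_type": "web", "text": "login page returns error"},
    {"id": "T3", "server_type": "db", "text": "database disk usage alert"},
]
query = {"server_type": "db", "text": "database disk full"}

for score, ticket_id in most_similar(query, tickets):
    print(ticket_id, round(score, 3))
```

Scanning every ticket is consistent with "analyze all the data", but it grows linearly with the number of tickets; a stochastic or approximate search would trade some accuracy for speed at scale.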
  • 8. Analytics is broadly defined as the use of data and computation to make smart decisions
      [Diagram: Data (Historical, Simulated, Text, Video, Images, Audio) feeds a Decision point, which leads to Possible outcomes (Option 1, Option 2, Option 3)]
      - Data instances
      - Reports and queries on data aggregates
      - Predictive models
      - Answers and confidence
      - Feedback and learning
  • 9. Future of Analytics
      Explosion of unstructured data
        - Creates new analytics opportunities
        - Addresses new enterprise needs
      Consistent, extensible, and consumable analytics platform
        - Reduces cost-to-value for enterprises
        - Increases analytics solution coverage with limited supply of skills
      Optimizing across the stack to deploy analytics at scale
        - Analytics becomes a dominant IT workload and drives HW design
        - Opportunity to seamlessly scale from terascale to exascale
  • 10. Analytics toolkits will be expanded to support ingestion and interpretation of unstructured data, and enable adaptation and learning
      Learn
        - Adaptive Analysis: Responding to context
        - Continual Analysis: Responding to local change/feedback
      Decide and Act
        - Optimization under Uncertainty: Quantifying or mitigating risk
        - Optimization: Decision complexity, solution speed
      Understand and Predict
        - Predictive Modeling: Causality, probabilistic, confidence levels
        - Simulation: High fidelity, games, data farming
        - Forecasting: Larger data sets, nonlinear regression
      Report
        - Alerts: Rules/triggers, context sensitive, complex events
        - Query/Drill Down: In memory data, fuzzy search, geo spatial
        - Ad hoc Reporting: Query by example, user defined reports
        - Standard Reporting: Real time, visualizations, user interaction
      Collect and Ingest/Interpret
        - Entity Resolution: People, roles, locations, things
        - Relationship, Feature Extraction: Rules, semantic inferencing, matching
        - Annotation and Tokenization: Automated, crowd sourced
      Side labels on the slide: New Methods (in the context of the decision process); Traditional; New Data (decide what to count; enable accurate counting)
      Extended from: Competing on Analytics, Davenport and Harris, 2007
  • 11. Finally... what about a longer-term view, say the next 10-50 years?
      1. Artificial Intelligence
      2. Nano-“everything”
      3. Cognitive Computing
      4. Deep (Exascale) Computing
      5. Atomic & Quantum Computing
      6. Human / Computer Interaction
      7. Machine to Machine Interaction
      8. BioTech / Human Augmentation
      9. Robots & Robotics
      10. Advanced / Predictive Analytics
      11. Security & Privacy
      12. 3-D Printing
      13. Video-enabled Business Processes
      14. Personalized Web/Assistants
      15. Ubiquitous Computing
      16. Gaming
      17. Simulation
      18. Virtual Computing (including virtual worlds, tele-presence, etc.)
      19. Augmented Reality
      IBM Academy of Technology and Global Technology Outlook can help you find some answers
  • 12. Managing Uncertain Data at Scale