Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong Value-Adding Proposition

  1. Australian CIO Summit 2014 28 – 30 July 2014 Bigger and Better: Employing a Holistic Strategy for Big Data toward a Strong Value-Adding Proposition Patrick Hadley Chief Information Officer Australian Bureau of Statistics
  2. Not Another ‘Big Data’ Presentation (‘V’ is not the only letter in the alphabet!)
  3. Or, to put it another way………
  4. The promise Big data is at the foundation of all the megatrends that are happening today, from social to mobile to the cloud to gaming. - Chris Lynch, ex Vertica CEO “Big Data is a tidal wave, which in the next decade will create consumer – and producer – value in almost every major sector of the economy” Philip Evans “….a tremendous wave of innovation, productivity and growth… all driven by big data” McKinsey “Big Data: A Revolution that Will Transform how We Live, Work, and Think” Viktor Mayer-Schönberger and Kenneth Cukier. 2013.
  5. Or, the reality……. “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...” Dan Ariely, 2013 “In God we trust; all others must bring data.” W.E. Deming
  6. Agenda • What is Big Data (3/4/5/6 v’s) • Sources of Data • Data as an asset • Open Data • Opportunities…..applications…..benefits • Data Management • Data Analytics; technologies • Security • Privacy • Skills and capabilities • …… and on
  7. Agenda • What is Big Data (3/4/5/6 v’s) • Sources of Data • Data as an asset • Open Data • Opportunities…..applications…..benefits • Data Management • Data Analytics; technologies • Security • Privacy • Skills and capabilities • …… and on
  8. Today ……… • The use of Big Data in official statistics • ABS initiatives, experiences and capabilities • Learnings: Towards a strong value-adding proposition
  9. Big Data in Official Statistics The vision….. A richer, more dynamic statistical picture of Australia; Opportunity: reduce costs; improve quality
  10. Sources of Data • digital descriptions of the physical environment • sensors and other devices • communications networks • individual behaviour and information • digitisation of commerce and supply chains
  11. High potential data sources • Telecom • Utilities • Retailers • Financial sector • Satellite • Other
  12. Example: Telecom data applications • small area population estimates • service populations • travel patterns • seasonal population movements • event populations • internet use…… How do we: ◦ identify characteristics of handset owners? ◦ turn handset counts into people?
  13. Initiate exploratory R&D Targeted streams of investigation: • Use of satellite imagery to determine land utilisation • Use of integrated demographic data for small area modelling of unemployment • Use of mobile device messaging records for real time estimation of service populations Progress the methodological framework and trial new technology approaches: • Machine learning • Multidimensional data visualisation • Distributed computing • Open linked data
  14. Big Data challenges • Data quality • Data volatility and stability • Data representativeness • Data dimensionality • Statistical modelling and inference
  15. Data quality Big Data sets/streams are generally noisy and often unstructured – they need to undergo a non-trivial filtering and cleaning process before they can be used. Balancing the complexity of the cleaning process against the information value of the results is a significant issue. What methods can be used for noise reduction? How do we deal with missing data?
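The two questions on the data quality slide (noise reduction, missing data) can be illustrated with a small sketch. This is not the ABS method; it assumes a simple numeric series where missing readings are marked as `None`, and it uses two generic techniques chosen only for illustration: linear interpolation for gaps and a rolling median for spikes. The function names `fill_missing` and `rolling_median` are hypothetical.

```python
import statistics


def fill_missing(values):
    """Linearly interpolate None gaps; gaps at the edges take the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed positions on each side of the gap (None if there is none).
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def rolling_median(values, window=3):
    """Replace each point by the median of its neighbourhood to damp isolated spikes."""
    half = window // 2
    return [
        statistics.median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


if __name__ == "__main__":
    raw = [10.0, None, 12.0, 95.0, 13.0, None, 15.0]  # made-up readings
    print(rolling_median(fill_missing(raw)))
```

Even this toy version shows the trade-off the slide raises: a wider `window` removes more noise but also smooths away genuine short-term movement.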
  16. Data volatility and stability Streaming data may fluctuate over short time frames Data sources themselves may change or disappear What becomes of time series in a world where data streams and sources are transient?
  17. Data representativeness How representative are the data from emerging Big Data sources of the phenomena we are trying to measure? How do we determine whether there are hidden biases? What methods can be used to reduce the volume of data while retaining the information value of the data and statistical validity of the analysis?
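One well-known way to reduce the volume of a stream while keeping a statistically valid sample is reservoir sampling, sketched below. It is offered as an illustration, not as the approach the ABS uses. The function name `reservoir_sample` and its parameters are hypothetical. Note the limit of the technique: it gives every record seen an equal chance of selection, so it preserves the properties of the stream, but it cannot correct a bias already present in the source.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive at both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 records from a simulated stream of 1,000,000 records in one pass.
    print(reservoir_sample(range(1_000_000), 5, seed=42))
```

The sample is built in a single pass and needs memory for only `k` records, which is why the method suits streams too large to store.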
  18. Data dimensionality Dimensionality is a significant and challenging aspect of “bigness” Dimension has an impact on: • Storage of data • Processing and analysis of data Existing storage and computational paradigms fail badly
  19. Statistical modelling and inference How can population characteristics be determined? • What is the population? In many cases this is not known (e.g. Twitter) • Can we draw a sample and calculate descriptive statistics? How do we avoid apophenia? • Seeing meaningful patterns and connections where none exist • The number of fake correlations grows with the number of variables “To understand is to perceive patterns.” – Isaiah Berlin
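The claim that fake correlations multiply as variables are added can be checked with a small simulation. The sketch below generates variables that are pure independent noise, so any strong correlation between them is spurious by construction. The sample size of 20 observations and the threshold of 0.5 are arbitrary assumptions, and `count_spurious_pairs` is a hypothetical helper name.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_vars, n_obs=20, threshold=0.5, seed=0):
    """Count variable pairs with |r| > threshold among independent noise variables.

    Returns (number of strongly correlated pairs, total number of pairs).
    """
    rng = random.Random(seed)
    # Every variable is independent Gaussian noise: no real relationship exists.
    variables = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(itertools.combinations(variables, 2))
    strong = sum(1 for x, y in pairs if abs(pearson(x, y)) > threshold)
    return strong, len(pairs)


if __name__ == "__main__":
    for n_vars in (10, 40, 160):
        strong, total = count_spurious_pairs(n_vars)
        print(f"{n_vars} variables: {strong} of {total} pairs exceed |r| > 0.5")
```

The number of pairs grows as n(n-1)/2, so even a small chance of a strong correlation per pair yields many apparent "findings" in a wide data set.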
  20. From ‘V’ (what) to ‘C’ (how) ‘What’ has changed about data? Vs: Volume, Velocity, Variety, Veracity, Volatility ‘How’ will we change? Cs: Creating, Computing, Comprehending, Competing, Collaboration
  21. Big Data ‘C’s and the ABS - CREATING The world is CREATING data like never before and every individual, household and business we interact with will change in data creation: • The Internet of Things (M2M) becomes the ‘Internet of Everything’ • Sometimes called the 4 internets: people, things, information, places are all network addressable, most have data producing/collecting/transmitting capability
  22. Big Data ‘C’s and the ABS - COMPUTING COMPUTING data like never before. Some examples: • emerged from Web-scale problems such as search engines with new solutions such as key-value databases (Hadoop, NoSQL DBs) • advanced computation algorithms and approaches become ‘popularised’ e.g. machine learning approaches, automated visualisation and explanations systems, data mining/discovery, semantic (knowledge) representation and reasoning systems requiring ‘search’ • statistical analysis-as-a-service e.g. auto-coding, confidentiality, time series analysis, etc • distributed/parallel computation for low-cost multi-core, multi-socket, multi-computers, in-memory computation technologies • embedded processors, sensors/RFIDs/GPS/SIM • the ‘logical data warehouse’
  23. Big Data ‘C’s and the ABS - COMPREHENDING COMPREHENDING/CONSUMING data requiring new tools in the ABS kit bag: • tables – static and data consumer dynamically defined (ABS.stat, REEM Table Builder) in standard XML formats like SDMX • visualisation – for internal ABS insight, for our ‘retail’ dissemination, ‘smart’ insight where software suggests the best way to see data: ‘telling the story’ • narrative – table to text production (auto produce media release & part of main features) • voice – text to speech to read narrative & data for accessibility; speech to text for NIRS analysis • semantic data outputs in OWL/RDF • hybrid of above – to add value to information, for ABS data consumers to enhance comprehension • data streams – data-as-a-service for M2M (the ABS public Web services library), could be called ‘the embedded ABS’ and all this with adaptive/responsive design for multiple end-point device types!!!
  24. Big Data ‘C’s and the ABS - COMPETING COMPETING with data, to obtain it and use it for competitive advantage • In some subject-matter areas there is more competition. Who can make a statistical index? Anyone with a spreadsheet; • Who else wants to be influential in and/or monetise statistics? • Everyone else starts to understand INFONOMICS • More ‘agent’ data sources for ABS as we may not have the capability to collect (full) unit record ‘big data’?
  25. Big Data ‘C’s and the ABS - COLLABORATING • In ABS • In Government • In Academia • Across the international statistical community
  26. ABS Capabilities, expertise • collect and process large quantities of data • data ‘cleansing’ • data standards and framework • data integration • methodological techniques • strong analytical capability • sophisticated web based dissemination system • data quality framework
  27. ABS Big Data Challenges • Business Benefit • Validity of Statistical Inference • Privacy and Public Trust • Data Integrity • Data Ownership and Access • Computational Efficacy • Technology Infrastructure (Source: “Big data and the ABS – from ideas to action”, ABS MM paper, Oct 2013)
  28. Value explained?
  29. Summary - considerations • Value : • what’s the proposition • what’s the question • Strategy; plan, investments • Data sources & acquisition • Eyes open – data challenges • Build capabilities: V’s to C’s
  30. Questions?