
Big data 2017 final


COMEX 2017 Smart Talks by Amjid Ali, Muscat, Oman. Covers an introduction to big data, big data definitions, the big data revolution, the big data timeline, Hadoop and MapReduce, the importance of storage, DNA storage, Oceanstore 9000, Microsoft R, and Spark.


  1. WWW.TIC.OM INNOVATIVE & LEADING EDGE IT SOLUTIONS
  2. "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts." Sir Arthur Conan Doyle
  4. Big Data and Storage. Smart Talks. Amjid Ali, Head of Business, TIC
  7. Agenda: Big Data and Storage
      • Introduction
      • Data generations / timeline
      • Why big data? Users vs devices and IoT
      • Practical benefits
      • Big data defined
      • Landscape
      • Storage
      • Next-generation storage
  8. Human Face of Big Data
      • Every object on earth will be generating data.
      • Information is in digital format.
      • Quick search through tons of information.
      • We are exposed to a vast ocean of data.
      • What we buy, where we go, what we say, and what we do are all being recorded forever.
  9. Introduction
      • Buzzword since 2012
      • Data, small data, big data
      • Exceeds the processing capacity of conventional data systems
      • Not all data is being analyzed.
  10. Introduction: Big Data
      • Data is "data"; what is big?
      • Cannot be analyzed using traditional computing techniques
      • Storage
      • Processing
      • Visualization
  11. Introduction: Big Data
  12. Introduction: Big Data
      • Relevant to more and more organizations
      • New fields of application
      • Large volumes, generated automatically and continuously
      • Various data sources
      • Limitations for analyzing
      • Complexity and speed limitations
  13. Big Data Timeline
  14. Big Data Timeline
      • "Information explosion" (a term first used in 1941, according to the Oxford English Dictionary)
      • By 2030, storing all the data generated will require a data center six times the size of Greater London.
  15. Big Data Timeline, 1880 to 1999 (chart)
      • Years marked: 1880, 1890, 1910, 1927, 1944, 1967, 1971, 1990, 1996, 1999
      • Milestone labels: Overload, Census, Punch Cards, Accounting Machine, Library, Rate of Transmission, Storage Capacity, Predict, Big Data, Visualization
  16. Big Data Timeline, 2000 to 2015 (chart)
      • Years marked: 2000, 2002, 2005, 2006, 2009, 2010, 2012, 2013, 2014, 2015
      • Milestone labels: Everyone Produces Data, 3V, Hadoop and MapReduce, Social Media and Web 2.0, Big Data Projects, 5 Exabytes Till Now vs Two Years, Big Data Buzzword, Data Scientists, Genome Decoding, Google Largest Big Data
  17. Big Data Timeline, 2016 to 2017: IoT and Big Data Revolution
      • Year of the big data revolution
      • Big data becomes fast and approachable
      • Artificial intelligence and augmented intelligence: 34% annual growth
      • Big data roles (scientists, engineers, and analysts) are the most in-demand jobs
      • Computers with 100 times better performance
      • GPU and HPC
      • Hadoop, Hive, Presto, Impala, and Spark
      • Hadoop and enterprise standards
      • In-memory computing: in-memory data grids (IMDGs)
      • IoT will grow further
      • Machine learning and operational intelligence
      • Many big data ideas
      • Business intelligence
      • Cloud: big data as a service
      • Spark
      • Convergence of IoT, cloud, and big data creates new opportunities for self-service analytics
      • DNA storage
  18. Why Big Data? Some Facts
  19. Global Internet Population (chart, in billions, 2012 to 2017)
  20. World Internet Usage and Population Statistics (update of March 4, 2017)

      | World Region | Population (2017 est.) | Population % of World | Internet Users (31 Dec 2016) | Penetration Rate (% pop.) | Growth 2000-2017 | Users % of Table |
      |---|---|---|---|---|---|---|
      | Asia | 4,148,177,672 | 55.2 % | 1,856,212,654 | 44.7 % | 1,523.9 % | 50.2 % |
      | Europe | 822,710,362 | 10.9 % | 630,708,269 | 76.7 % | 500.1 % | 17.1 % |
      | Latin America / Caribbean | 647,604,645 | 8.6 % | 384,766,521 | 59.4 % | 2,029.4 % | 10.4 % |
      | Africa | 1,246,504,865 | 16.6 % | 335,453,374 | 26.9 % | 7,330.7 % | 9.1 % |
      | North America | 363,224,006 | 4.8 % | 320,067,193 | 88.1 % | 196.1 % | 8.7 % |
      | Middle East | 250,327,574 | 3.3 % | 141,489,765 | 56.5 % | 4,207.4 % | 3.8 % |
      | Oceania / Australia | 40,479,846 | 0.6 % | 27,540,654 | 68.0 % | 261.4 % | 0.7 % |
      | WORLD TOTAL | 7,519,028,970 | 100.0 % | 3,696,238,430 | 49.2 % | 923.9 % | 100.0 % |
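
The penetration-rate and share-of-users columns in the statistics above can be recomputed from the raw population and user counts. The following is a minimal sketch of that arithmetic; the figures are copied from the slide, while the function and variable names are illustrative and not part of the original deck.

```python
# (population 2017 est., internet users 31 Dec 2016), copied from the slide.
regions = {
    "Asia": (4_148_177_672, 1_856_212_654),
    "Europe": (822_710_362, 630_708_269),
    "Latin America / Caribbean": (647_604_645, 384_766_521),
    "Africa": (1_246_504_865, 335_453_374),
    "North America": (363_224_006, 320_067_193),
    "Middle East": (250_327_574, 141_489_765),
    "Oceania / Australia": (40_479_846, 27_540_654),
}

# World totals are the sums of the regional figures.
world_population = sum(pop for pop, _ in regions.values())
world_users = sum(users for _, users in regions.values())


def penetration(population, users):
    """Internet users as a percentage of population, one decimal place."""
    return round(100 * users / population, 1)


def share_of_users(users):
    """A region's users as a percentage of all users in the table."""
    return round(100 * users / world_users, 1)


for name, (pop, users) in regions.items():
    print(f"{name}: penetration {penetration(pop, users)} %, "
          f"share of users {share_of_users(users)} %")
print(f"World: penetration {penetration(world_population, world_users)} %")
```

Recomputing the derived columns this way is a quick consistency check on a table that was flattened during extraction.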
  21. 2017 Big Data Facts
  22. Current Big Data Facts
  23. Internet Minute
      • 701,389 logins on Facebook
      • 69,444 hours watched on Netflix
      • 150 million emails sent
      • 1,389 Uber rides
      • 527,760 photos shared on Snapchat
      • 51,000 app downloads on Apple’s App Store
      • $203,596 in sales on Amazon.com
      • 120+ new LinkedIn accounts
      • 347,222 tweets on Twitter
      • 28,194 new posts to Instagram
      • 38,052 hours of music listened to on Spotify
      • 1.04 million Vine loops
      • 2.4 million search queries on Google
      • 972,222 Tinder swipes
      • 2.78 million video views on YouTube
      • 20.8 million messages on WhatsApp
  24. Human and Devices
  25. IoT (Sensors and Controls)
  26. What Generates the Data
  27. What Generates the Data
  28. The EarthScope
      • World's largest science project
      • 67 terabytes of data
  29. LHC: CERN’s Large Hadron Collider (LHC) generates 15 PB a year. Photo: Maximilien Brice, © CERN
  30. Why big? A whopping 90% of the data that currently exists was created in just the last two years. 3.7 billion people and 25 billion sensors and devices are connected.
  31. Big Data: the 3 V's of data (Volume, Variety, and Velocity) create challenges.
  33. 5 exabytes of data every 2 days. By 2020, the big data and analytics market will reach $202 billion.
  34. Practical Benefits: Big Data Implementations
  35. Practical Benefits of Big Data
      1. Dialogue with consumers
      2. Re-develop your products
      3. Perform risk analysis
      4. Keep your data safe
      5. Create new revenue streams
      6. Customize your website in real time
      7. Reduce maintenance costs
      8. Offer tailored healthcare
      9. Offer enterprise-wide insights
      10. Make our cities smarter
  36. In the Company: Why, for whom and for what?
      • Production factor: In addition to capital, commodities, and labor, data is the fourth production factor of the digital economy.
      • Data structure: Even the most unstructured data in business can be structured for analysis.
      • Range optimization: In particular, areas such as development, sales, production, organization, and management are candidates for Big Data.
  37. In the Company: Enabler
      • Relevant to more and more organizations
      • New fields of application
      • Large volumes, generated automatically and continuously
      • Various data sources
      • Limitations for analyzing
      • Complexity and speed limitations
  38. In the Company: Economic Factors
      • Transparency: Transparency helps all those involved to access information at the same time. The value chain can thereby be maximized.
      • Forecast: Big Data offers the opportunity for real-time performance monitoring and for running extensive simulations.
      • Customer focus: Services can be tailored through detailed customer segmentation.
      • Analysis: Through real-time analysis, automated decisions are possible. Alternatively, a decision basis for management can be created.
      • Innovation: Big Data makes it possible to run real-time performance monitoring and extensive simulations.
  39. In the Company: Data Sources
      • Team collaboration
      • Mobile data of tablets and smartphones
      • Communication data
      • Cloud applications
      • Automated machines
      • Social media
      • E-commerce
      • Audio/video data
  40. In the Company: Data Analytics
  43. Infographics: Big Data Facts
  44. Infographics: Big Data Facts
  45. In the Company: Data Analyzed (source: Salesforce Research)
  46. Big data benefits various sectors
      • Clickstream analysis, buying patterns
      • Sentiment analysis
      • Fraud detection; forensic analysis
      • Machine-learning-based investment strategies
      • Healthcare research
      • Prediction and prevention of equipment failure
      • Predicting epidemics using searches
      • Finding correlations between different trends
      • Personalization / predictive analytics
      • GPS monitoring and tracking
      • Risk analysis and management
      • Identifying patterns in sensor data to predict issues
      • And many more
  47. Health Care
  48. Health Care vs Big Data: Personal
  49. Big Data
  50. Genome Sequencing Cost
  51. Big Data Defined
  52. Big Data Defined
      • 100s of TB to x PB
      • Uses Hadoop
      • Three Vs
      • Too big for OLTP
      • Uses distributed/parallel processing
  60. Big Data Defined: Large data volumes in the range of many terabytes and more (multiple petabytes is absolutely realistic), and various data types (structured, unstructured, semi-structured, and poly-structured data) from diverse data sources that are often physically distributed. Quite often, data is generated at high velocity and needs to be processed and analysed in real time. Sometimes data expires at the same high velocity as it is generated. From a content perspective, data can even be ambiguous, which makes its interpretation quite challenging.
  61. Big Data Defined: “Big data are high volume, high velocity, and high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Gartner 2012)
  62. 3 Vs of Big Data (image source: GITS)
      Data which is “big” in these 3 dimensions:
      • Volume: Lots of data is being collected; 90% of the data in the world was collected in the last two years.
      • Velocity: Data is being generated quickly and we need to deal with it.
      • Variety: Structured, unstructured,
  63. 3 Vs of Big Data: there is a 4th V of data
  64. 4th V: Veracity
      • The trustworthiness of the data that is captured, in terms of accuracy
      • Uncertain or imprecise data
      • Inherent discrepancies in all the data collected
  65. Other Characteristics
      Many definitions. Often defined in terms of 3, 4, 5, 7, 9, or 10 Vs:
      1. Volume
      2. Velocity
      3. Variety
      4. Veracity
      5. Variability: inconsistencies in data and the inconstant speed at which big data is loaded into the database
      6. Validity: similar to veracity, but how correct the data is for its intended use
      7. Vulnerability: security concerns and hacking attempts
      8. Volatility: how long does the data need to be kept?
      9. Visualization: how challenging it is to visualize, and ways to represent the information
      10. Value: business value from the data
  66. Big Data Redefined: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." Doug Laney, Gartner analyst, Chief Data Officer research & advisory team (Data & Analytics Strategy, Infonomics, Big Data, Info Innovation)
  67. Big Data: Value, Technology and Architecture
  68. Big Data Defined
      • 100s of TB to x PB
      • Uses Hadoop
      • Three Vs
      • Too big for OLTP
      • Uses distributed/parallel processing
  69. Big Data Enabler
      • Commodity hardware compatibility
      • Reduction in storage cost
      • Open source ecosystem
      • The web economy
      • Economics
      • Community
  70. Big Data Architecture
  71. Big Data: Steps Involved (diagram)
      • Labels: Data Sources, Collect Data, Process Data, Store Data, Analyze Data, Serve Data, Result (end-user application), Tools, Storage Solutions
  72. Big Data and Platform Requirements
      • Capture: distributed databases, append-only logs, queues
      • Store: horizontally scalable systems, usage-pattern-based data
      • Search: optimized for searching
      • Process: MapReduce, queues, Spark jobs
      • Analyze: MapReduce, Spark, Hive, Pig
      • Visualize: charts and graphs on Hive
      • Integrate: with existing systems and databases
  73. Architecture (source: Oracle)
  74. Data Lake
  75. Data Lake
  76. Diagram labels: Data Sources, Collect Data, Process Data, Store Data, Analyze Data, Serve Data, Tools, Storage Solutions
  77. Hadoop
  79. Hadoop
      • Open-source Apache project
      • Distributed, fault-tolerant data storage and batch processing
      • Provides linear scalability on commodity hardware
      • Flexible, scalable, and free
  80. Technology and Architecture
  81. HDFS
      • Unix-like file system
      • Splits large files into blocks
      • Distributes and replicates blocks across various nodes
      • One master namenode and many datanodes
      • Namenode: holds the namespace, which maps each block to its locations
      • Datanode: stores blocks on local disk; handles heartbeats, block reports, and replication
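
The HDFS ideas above (splitting a file into blocks, replicating each block across datanodes, and keeping a block-to-location map on the namenode) can be illustrated with a toy model. This is only a sketch under stated assumptions: the 128 MB block size and replication factor of 3 mirror common Hadoop defaults but are chosen here for illustration, the round-robin placement is a simplification (real HDFS placement also takes racks into account), and all function and node names are hypothetical.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes of the blocks a file of file_size bytes is split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Toy stand-in for the namenode's block map: block id -> datanodes.

    Uses simple round-robin placement (an assumption for illustration).
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    block_map = {}
    for block_id in range(num_blocks):
        # Pick `replication` consecutive nodes, wrapping around the node list,
        # so every replica of a block lands on a different datanode.
        block_map[block_id] = [
            datanodes[(block_id + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return block_map


blocks = split_into_blocks(300 * MB)
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])

for block_id, nodes in block_map.items():
    print(f"block {block_id} ({blocks[block_id] // MB} MB) -> {nodes}")
```

Because every block lives on several datanodes, losing one node leaves at least two copies of each block it held, which is the fault-tolerance property the slide refers to.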
  82. MapReduce
      • Map step: split the data and pre-process it
      • Reduce step: aggregate the results
      • Most closely associated with Hadoop, but employed by other systems to varying extents
      • First used by Google
      • Google has since moved away from it and does not plan to continue using it
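
The map and reduce steps described above can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop code: a real framework runs the map and reduce functions in parallel across a cluster and performs the grouping ("shuffle") between them. The function names are illustrative.

```python
from collections import defaultdict


def map_step(line):
    """Map: split one input line and emit a (word, 1) pair per word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_step(line))
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is data"])
print(counts)
```

Since each call to `map_step` depends only on its own line and each call to `reduce_step` only on its own key, both phases can be spread over many machines, which is what makes the model scale.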
  83. Cloudera
      • Commercial Hadoop
      • Enterprise solution
      • Data security
      • No longer uses MapReduce
  84. Spark
      • 2016 was a great year for Spark
      • Apache Spark 2.0 released in 2016
      • Cluster-computing framework
      • Open source
      • Hadoop open source community
      • Apache top-level project
      • Runs on top of the Hadoop file system
      • Not tied to the MapReduce paradigm
      • MapReduce is strictly disk-based
      • Up to 100 times faster than Hadoop MapReduce for in-memory workloads
      • In-memory cluster computing
      • Scala, Java, and Python
      • Doesn't have its own distributed filesystem, but can use HDFS
  85. Databricks and Hive
      Databricks:
      • Commercial tool for production, exploration, and security
      • Spark in the cloud
      Hive:
      • Apache Hive™ data warehouse software
      • Reading, writing, and managing large datasets
      • Distributed storage
      • Facebook
  86. R
  87. Platform: Big Data Storage
  88. Hardware Specs

      | | 2010 | 2017 | Improvement |
      |---|---|---|---|
      | Storage | 100 MB/s | 1000 MB/s (SSD) | 10× |
      | Network | 1 Gbps | 10 Gbps | 10× |
      | CPU | 3 GHz | 3 GHz | - |

      • The removal of virtualization layers
      • Acceleration technologies, such as GPUs and NVMe
      • Optimal placement of storage and compute
      • High-capacity, nonblocking networking
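
To make the 10× figures above concrete, the sketch below estimates how long it takes to read or transfer 1 TB at the 2010 and 2017 rates from the slide. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), a single device or link, and sustained peak throughput; the function names are illustrative.

```python
ONE_TB = 10**12  # bytes, decimal units (assumption)


def scan_seconds(num_bytes, mb_per_s):
    """Seconds to read num_bytes sequentially at mb_per_s megabytes per second."""
    return num_bytes / (mb_per_s * 10**6)


def transfer_seconds(num_bytes, gbit_per_s):
    """Seconds to send num_bytes over a link of gbit_per_s gigabits per second."""
    return num_bytes * 8 / (gbit_per_s * 10**9)


disk_2010 = scan_seconds(ONE_TB, 100)    # 100 MB/s storage
disk_2017 = scan_seconds(ONE_TB, 1000)   # 1000 MB/s SSD
net_2010 = transfer_seconds(ONE_TB, 1)   # 1 Gbps network
net_2017 = transfer_seconds(ONE_TB, 10)  # 10 Gbps network

print(f"Disk scan of 1 TB: {disk_2010:.0f} s (2010) vs {disk_2017:.0f} s (2017)")
print(f"Network transfer of 1 TB: {net_2010:.0f} s (2010) vs {net_2017:.0f} s (2017)")
```

With the CPU clock unchanged at 3 GHz, the gains in this comparison come from storage and network, which is why the slide's bullet points focus on data placement and I/O acceleration.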
  89. Big Data Platforms
      • Infrastructure with all tools
      • Store and query
      • Many hardware vendors
      • Storage in the cloud
      • Fully engineered, enterprise-grade big data solution
      • Modern Data Architecture (MDA)
      • EMC Business Data Lake
  91. EMC
  92. Microsoft R Server
  93. Oceanstore 9000
  94. Big Data: Nature Has a Solution. Biological Computing and Storage
  95. Personal Data Storage + Cloud (chart): 2001, 2017, 2030; × 90,000. What about big data?
  96. Big Data Storage: Modern archiving technology cannot keep up with the growing tsunami of bits. But nature may hold an answer to that problem already.
  97. DNA Storage: All the world’s data can fit on a DNA hard drive the size of a teaspoon.
  98. DNA Storage: A bioengineer and a geneticist at Harvard’s Wyss Institute have successfully stored 5.5 petabits of data (around 700 terabytes) in a single gram of DNA, smashing the previous DNA data density record by a thousand times.
  99. DNA Storage: Hard Drives vs DNA Storage
      • Hard drives: 3 TB × 233 hard drives, 151 kg
      • DNA storage: 1 gram; the world’s data in a teaspoon-size drive
      • Lifetime: 10 years
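
The figures on the two DNA storage slides above can be cross-checked with a few lines of arithmetic. This is a sketch assuming decimal units (1 petabit = 10^15 bits, 1 terabyte = 10^12 bytes); all numbers come from the slides and the variable names are illustrative.

```python
PETABIT = 10**15   # bits, decimal units (assumption)
TERABYTE = 10**12  # bytes, decimal units (assumption)

# 5.5 petabits stored in one gram of DNA, converted to terabytes.
dna_bits_per_gram = 5.5 * PETABIT
dna_tb_per_gram = dna_bits_per_gram / 8 / TERABYTE

# The hard-drive side of the comparison: 233 drives of 3 TB weighing 151 kg.
hdd_count, hdd_capacity_tb, hdd_total_kg = 233, 3, 151
hdd_total_tb = hdd_count * hdd_capacity_tb
kg_per_drive = hdd_total_kg / hdd_count

# Grams of hard drives needed per gram of DNA for roughly the same capacity.
mass_ratio = hdd_total_kg * 1000 / 1

print(f"1 g of DNA: {dna_tb_per_gram} TB")
print(f"{hdd_count} drives x {hdd_capacity_tb} TB: {hdd_total_tb} TB, "
      f"{kg_per_drive:.2f} kg per drive")
print(f"Mass ratio for similar capacity: {mass_ratio:.0f} to 1")
```

The conversion gives 687.5 TB per gram, consistent with the "around 700 terabytes" on the slide and with the 699 TB held by the 233 hard drives.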
  100. Bits and bases: 011001101010001 and ATGCTCGAAGCT
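
The slide above pairs a bit string with a strand of DNA bases. One simple way to picture such an encoding is to map every two bits to one of the four bases. The mapping below (00→A, 01→C, 10→G, 11→T) is an assumption chosen for illustration; it is not the scheme behind the slide's example strings, nor necessarily the one used in the Harvard experiment, and practical schemes add redundancy and avoid error-prone sequences.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a strand of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna(b"DNA")
print(strand)
print(dna_to_bytes(strand))
```

Each byte becomes four bases under this mapping, so the round trip is lossless as long as the strand length is a multiple of four.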
  101. Basic building blocks: DNA, cell nucleus, chromosome, genes
  102. DNA Structure
  103. DNA: Self-referential
  104. Big Data: Data Science
  105. Data Science
      • Then and now
      • Information becomes the driving force
      • Complexity
      • Processes
  106. Skill Set
  107. Lack of Talent
      • Data scientists
      • Sophisticated team of developers
      • Analysts
      • Education resources
      By 2018, the USA alone will face a shortage of 140,000 to 190,000 data scientists as well as 1.5 million data managers.
  108. Big data, big team
  110. Amjid Ali, Head of Business, The Integrated Connection LLC. Headquarters: Office No. Z-215, 2nd Floor, KOM4, Knowledge Oasis Muscat, Sultanate of Oman. amjid@tic.om, @ticllc, @tic_oman, +theintegratedconnection, +968 24166290
