COMEX 2017 Smart Talks by Amjid Ali, Muscat, Oman. Covers an introduction to big data, big data definitions, the big data revolution, the big data timeline, Hadoop and MapReduce, the importance of storage, DNA storage, OceanStor 9000, Microsoft R and Spark.
1.
WWW.TIC.OM
INNOVATIVE & LEADING EDGE
IT SOLUTIONS
It is a capital mistake to theorize before one has data. Insensibly one begins to twist
facts to suit theories, instead of theories to suit facts.
Sir Arthur Conan Doyle
4.
Big Data and Storage
Smart Talks
Amjid Ali
Head of Business - TIC
7.
Agenda
Big Data and Storage
• Introduction
• Data Generations / Timeline
• Why big data? – Users vs Devices and IoTs
• Practical Benefits
• Big data defined.
• Landscape
• Storage
• Next Generation Storage
8.
● Every object on earth will be generating data.
● Information in digital format.
● Quick search through tons of information.
● We are exposed to a vast ocean of data.
● What we buy, where we go, what we say and what we do is all being recorded forever.
HUMAN FACE OF BIG DATA
9.
● Buzzword since 2012
● Data, small data, big data.
● Exceeds the processing capacity of conventional data systems
● Not all data is being analyzed.
INTRODUCTION
10.
• Data is “data”; so what is “big”?
• Cannot be analyzed using traditional computing techniques.
• Storage
• Processing
• Visualization
INTRODUCTION - BIG DATA
12.
• Relevant to more and more organizations.
• New fields of application.
• Large volumes, generated automatically and continuously.
• Various data sources
• Limitations in analysis
• Complexity and speed limitations
INTRODUCTION - BIG DATA
14.
BIG DATA TIMELINE
“Information explosion” (a term first used in 1941, according to the Oxford English Dictionary).
2030 – storing all the data generated will require a data center six times the size of Greater London.
15.
BIG DATA TIMELINE
[Timeline figure. Milestones: Overload; Census; Punch Cards; Accounting Machine; Library; Rate of Transmission; Storage Capacity; Predict Big Data; Visualization. Years marked: 1880, 1890, 1910, 1927, 1944, 1967, 1971, 1990, 1996, 1999.]
16.
BIG DATA TIMELINE
[Timeline figure. Milestones: Everyone Produces Data; 3V; Hadoop and MapReduce; Social Media and Web 2.0; Big Data Projects; 5 Exabytes Till Now vs Two Years; Big Data Buzzword; Data Scientists; Genome Decoding; Google Largest Big Data. Years marked: 2000, 2002, 2005, 2006, 2009, 2010, 2012, 2013, 2014, 2015.]
17.
BIG DATA TIMELINE
[Timeline figure, 2016 and 2017: IoT and Big Data Revolution]
• Year of the big data revolution
• Big data becomes fast and approachable
• Artificial Intelligence and Augmented Intelligence: annual growth of 34%
• Big data roles (scientists, engineers and analysts) are the most in-demand jobs
• Computers with 100 times better performance
• GPU and HPC
• Hadoop, Hive, Presto, Impala and Spark
• Hadoop and enterprise standards
• In-memory computing: in-memory data grids (IMDGs)
• IoT will grow further
• Machine learning and operational intelligence
• Many big data ideas
• Business Intelligence
• Cloud – big data as a service
• Spark
• Convergence of IoT, cloud and big data creates new opportunities for self-service analytics
• DNA storage
23.
Internet Minute
• 701,389 logins on Facebook
• 69,444 hours watched on Netflix
• 150 million emails sent
• 1,389 Uber rides
• 527,760 photos shared on Snapchat
• 51,000 app downloads on Apple’s App Store
• $203,596 in sales on Amazon.com
• 120+ new LinkedIn accounts
• 347,222 tweets on Twitter
• 28,194 new posts to Instagram
• 38,052 hours of music listened to on Spotify
• 1.04 million Vine loops
• 2.4 million search queries on Google
• 972,222 Tinder swipes
• 2.78 million video views on YouTube
• 20.8 million messages on WhatsApp
30.
A whopping 90% of the data that currently exists was created in just the last two years.
Why big?
3.7 billion people, 25 billion sensors and devices connected.
31.
BIG DATA
The 3 Vs of data (Volume, Variety and Velocity) create challenges.
33.
5 exabytes of data every 2 days
2020 – Big data and analytics market will reach $202 billion
34.
PRACTICAL BENEFITS
BIG DATA IMPLEMENTATIONS
35.
PRACTICAL BENEFITS
BIG DATA
1. Dialogue with consumers
2. Re-develop your products
3. Perform risk analysis
4. Keeping your data safe
5. Create new revenue streams
6. Customize your website in real time
7. Reducing maintenance costs
8. Offering tailored healthcare
9. Offering enterprise-wide insights
10. Making our cities smarter
36.
PRODUCTION FACTOR
In addition to capital, commodities and labor, data is the fourth production factor of the digital economy.
DATA STRUCTURE
Even the most unstructured databases in a business can be structured for analysis.
RANGE OPTIMIZATION
Areas such as development, sales, production, organization and management in particular are well suited to Big Data.
IN THE COMPANY
Why, for whom and for what?
37.
• Relevant to more and more organizations.
• New fields of application.
• Large volumes, generated automatically and continuously.
• Various data sources
• Limitations in analysis
• Complexity and speed limitations
IN THE COMPANY
Enabler
38.
TRANSPARENCY
Transparency lets everyone involved access information at the same time. The value chain can thereby be maximized.
FORECAST
Big Data offers the opportunity for real-time performance monitoring and extensive simulations.
CUSTOMER FOCUS
Services can be tailored through detailed customer segmentation.
ANALYSIS
Real-time analysis makes automated decisions possible. Alternatively, it can provide a basis for management decisions.
INNOVATION
Big Data promotes real-time performance monitoring and the running of extensive simulations.
IN THE COMPANY
ECONOMIC FACTORS
39.
• Team collaboration
• Mobile data of tablets and smartphones
• Communication data
• Cloud applications
• Automated machines
• Social media
• E-commerce
• Audio/video data
IN THE COMPANY
DATA SOURCES
45.
Salesforce Research
IN THE COMPANY
DATA ANALYZED
46.
● Clickstream analysis, buying patterns
● Sentiment analysis
● Fraud detection; forensic analysis
● Machine-learning-based investment strategies
● Healthcare research
● Prediction and prevention of equipment failure
● Predicting epidemics using searches
● Finding correlations between different trends
● Personalization/predictive analytics
● GPS monitoring and tracking
● Risk analysis and management
● Identifying patterns in sensor data to predict issues.
● And many more…
Big data benefits various sectors
52.
BIG DATA DEFINED
100s of TB – x PB
Uses Hadoop
Three Vs
Too big for OLTP
Uses distributed/parallel processing
60.
Big Data Defined
Big data refers to large data volumes in the range of many terabytes and more (multiple petabytes is absolutely realistic) and to various data types (structured, unstructured, semi-structured and poly-structured) from versatile data sources, which are often physically distributed. Quite often, data is generated at high velocity and needs to be processed and analysed in real time. Sometimes data expires at the same high velocity as it is generated. From a content perspective, data can even be ambiguous, which makes its interpretation quite challenging.
61.
Big Data Defined
“Big data are high volume, high velocity, and high variety information assets that
require new forms of processing to enable enhanced decision making, insight
discovery and process optimization” (Gartner 2012)
62.
● Data which is “big” in these 3 dimensions
○ Volume: lots of data is being collected; 90% of the data in the world was collected in the last two years.
○ Velocity: data is being generated quickly and we need to deal with it.
○ Variety : Structured, Unstructured,
3 Vs of big Data
Image Source : GITS
63.
3 Vs of Big Data
There is a 4th V of data
64.
4th V: Veracity
● The trustworthiness of the captured data, in terms of accuracy.
● Uncertain or imprecise data
● Inherent discrepancies in all the data collected
65.
Other Characteristics
Many definitions. Often defined in terms of 3, 4, 5, 7, 9 or 10 Vs:
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Variability – inconsistencies in the data and the inconsistent speed at which big data is loaded into the database.
6. Validity – similar to veracity, but concerns how correct the data is for its intended use.
7. Vulnerability – security concerns and hacking attempts.
8. Volatility – how long does the data need to be kept?
9. Visualization – how challenging the data is to visualize, and the ways to represent the information.
10. Value – business value from the data.
66.
Big data redefined
Big data is high volume, high velocity, and/or high variety information assets that require new forms of
processing to enable enhanced decision making, insight discovery and process optimization.
- Doug Laney
Gartner Analyst, Chief Data Officer research & advisory team. Data & Analytics Strategy, Infonomics, Big Data. Info Innovation
67.
Big Data
Big Data - Value
Technology and
Architecture
69.
• Commodity hardware compatibility
• Reduction in storage cost
• Open source ecosystem
• The web economy
• Economics
• Community
BIG DATA ENABLER
71.
BIG DATA STEPS INVOLVED
[Diagram. Steps: Collect Data; Store Data; Process Data; Analyze Data; Serve Data. Other labels: Data Sources; Tools; Storage Solutions; Result (end-user application).]
72.
● Capture – distributed database, append-only logs, queues
● Store – horizontally scalable system, data laid out according to usage patterns
● Search – optimized for searching
● Process – MapReduce, queues, Spark jobs
● Analyze – MapReduce, Spark, Hive, Pig
● Visualize – charts and graphs on Hive
● Integrate – with existing systems and databases
Big Data and Platform requirements
79.
● Open-source Apache project
● Distributed, fault-tolerant data storage and batch processing
● Provides linear scalability on commodity hardware
● Flexible, scalable and free.
Hadoop
81.
● Unix-like file system
● Splits large files into blocks
● Distributes and replicates blocks across various nodes
● One master NameNode and many DataNodes
● NameNode: holds the namespace, which maps each block to its locations.
● DataNode: stores blocks on local disk; handles heartbeats, block reports and replication
HDFS
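The splitting and replication described on this slide can be sketched as a toy model. This is only an illustration in plain Python, not the HDFS API: the function names are hypothetical, the 128 MB block size and replication factor of 3 are assumed values (they match common HDFS defaults but do not appear on the slide), and the round-robin placement is a simplification of the real placement policy.

```python
# Toy model of HDFS-style block splitting and replica placement.
# BLOCK_SIZE_MB and REPLICATION are assumed values, and the round-robin
# placement below is a simplification, not the real HDFS policy.

BLOCK_SIZE_MB = 128
REPLICATION = 3


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes of the blocks a file of the given size is cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Build a NameNode-style map: block id -> DataNodes holding a replica."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [datanodes[(block_id + i) % len(datanodes)]
                   for i in range(replication)]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(300)  # a hypothetical 300 MB file
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(blocks)
print(block_map)
```

The dictionary plays the role of the NameNode's block-to-location map; losing any single DataNode still leaves two replicas of every block.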
82.
MapReduce
• Map step: splits the data and pre-processes it
• Reduce step: aggregates the results
• Most typical of Hadoop, but employed by others to varying extents.
• First used by Google
• Google has since discarded it and has no plans to continue using it.
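As a minimal sketch of the map and reduce steps, here is the classic word count written in plain Python. It runs on a single machine and only mimics the programming model; the function names are illustrative and are not part of Hadoop's API.

```python
from collections import defaultdict


def map_step(document):
    """Map: split one document into (word, 1) pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: aggregate all values for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_step(doc))
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "big storage"])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the data blocks, and the grouped values are sent over the network to the reducers.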
83.
Cloudera
• Commercial Hadoop
• Enterprise solution
• Data security
• No longer uses MapReduce.
84.
Spark
• 2016 was a great year for Spark.
• Apache Spark 2.0 released in 2016
• Cluster-computing framework
• Open source
• Part of the Hadoop open source community
• Apache top-level project.
• Runs on top of the Hadoop file system
• Not tied to the MapReduce paradigm
• MapReduce is strictly disk-based
• Spark can be up to 100 times faster than Hadoop MapReduce
• In-memory cluster computing
• Scala, Java and Python
• Doesn't have its own distributed filesystem, but can use HDFS.
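The disk-based versus in-memory contrast can be illustrated with a toy pipeline. This is a plain-Python stand-in, not Spark or Hadoop code: the function names and the JSON files used for intermediate results are assumptions made for the sketch. The first function writes every intermediate result to disk and reads it back, as a chain of MapReduce jobs would; the second keeps the data in memory between stages.

```python
import json
import os
import tempfile


def disk_based_pipeline(records, stages, workdir):
    """Write every intermediate result to disk and read it back (MapReduce-like)."""
    path = os.path.join(workdir, "stage_0.json")
    with open(path, "w") as f:
        json.dump(records, f)
    for i, stage in enumerate(stages, start=1):
        with open(path) as f:
            data = json.load(f)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump([stage(x) for x in data], f)
    with open(path) as f:
        return json.load(f)


def in_memory_pipeline(records, stages):
    """Keep intermediate results in memory between stages (Spark-like)."""
    data = list(records)
    for stage in stages:
        data = [stage(x) for x in data]
    return data


stages = [lambda x: x + 1, lambda x: x * 2]
with tempfile.TemporaryDirectory() as workdir:
    on_disk = disk_based_pipeline([1, 2, 3], stages, workdir)
in_memory = in_memory_pipeline([1, 2, 3], stages)
print(on_disk, in_memory)
```

Both functions return the same result; the difference is the extra read and write per stage, which is where the speed gap for multi-stage and iterative jobs comes from.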
85.
Databricks
• Commercial Tool of
• Production
• Exploration
• Security
• Spark in the cloud
Hive
• Apache Hive ™ data warehouse software
• Reading, writing and managing large datasets
• Distributed storage.
• Facebook
88.
Hardware
Specs     2010       2017              Improvement
Storage   100 MB/s   1000 MB/s (SSD)   10 X
Network   1 Gbps     10 Gbps           10 X
CPU       3 GHz      3 GHz             -
• The removal of virtualization layers.
• Acceleration technologies, such as GPUs and NVMe
• Optimal placement of storage and compute.
• High-capacity, nonblocking networking.
89.
Infrastructure with all tools
Store and Query
Many hardware vendors
Storage in the cloud
Fully-engineered, enterprise-grade big data solution.
Modern Data Architecture (MDA)
EMC Business Data Lake.
BIG DATA PLATFORMS
94.
Biological Computing and Storage
BIG DATA: Nature Has a Solution
95.
Personal Data Storage
+ Cloud
2001 2017
What about big data?
X 90,000
2030
96.
Modern archiving technology cannot
keep up with the growing tsunami of
bits. But nature may hold an answer
to that problem already.
Big data storage
97.
All the world’s data can fit on a DNA
hard drive the size of a teaspoon
DNA Storage
98.
A bioengineer and geneticist at Harvard’s Wyss Institute have successfully stored 5.5 petabits of data (around 700 terabytes) in a single gram of DNA, smashing the previous DNA data density record by a thousand times.
DNA Storage
99.
DNA Storage
Hard drives vs. DNA storage
• Capacity: 3 TB × 233 hard drives vs. the world’s data in a teaspoon-size drive
• Weight: 151 kg vs. 1 gram
• Lifetime: 10 years
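The figures on the two DNA storage slides can be cross-checked with a little arithmetic. The sketch below assumes decimal (SI) prefixes, so 1 petabit is 10^15 bits and 1 terabyte is 10^12 bytes; the variable names are illustrative.

```python
BITS_PER_BYTE = 8
PETA = 10**15
TERA = 10**12

# 5.5 petabits stored in one gram of DNA, converted to terabytes.
dna_bits = 5.5 * PETA
dna_terabytes = dna_bits / BITS_PER_BYTE / TERA

# Hard-drive side of the comparison: 233 drives of 3 TB each, 151 kg in total.
drive_capacity_tb = 3
drive_count = 233
hdd_terabytes = drive_capacity_tb * drive_count
kg_per_drive = 151 / drive_count

print(dna_terabytes, hdd_terabytes, round(kg_per_drive, 2))
```

So 5.5 petabits is 687.5 TB, which the slide rounds to about 700 TB, and 233 drives of 3 TB hold 699 TB, which is why that stack of drives is set against a single gram of DNA.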
107.
• Data scientists
• Sophisticated team of developers
• Analysts
• Education resources
Lack of Talent
2018 – the USA alone will face a shortage of 140,000 to 190,000 data scientists as well as 1.5 million data managers.
110.
Headquarters
Office No. Z-215, 2nd Floor, KOM4
Knowledge Oasis Muscat
Sultanate of Oman
amjid@tic.om
@ticllc
@tic_oman
+theintegratedconnection
+968 24166290
Amjid Ali
Head of Business
The Integrated Connection LLC