2. What is Big Data?
According to Gartner, information becomes big data when the volume can no longer
be managed with normal database tools.
3. DEFINITION
Big data is high-volume, high-velocity, and high-variety information assets that
demand cost-effective, innovative forms of information processing for enhanced
insight and decision-making.
5. The 5 V’s of Big Data
» Volume: the amount of raw data
» Velocity: the rate of change over time
» Variety: the range of data types
» Veracity: data quality
» Value: information for decision-making
6. DATA IS EVERYWHERE
» The digital universe will grow from 3.2 zettabytes to 40 zettabytes in only six
years.
» Every day, we create 2.5 quintillion bytes of data — so much that 90% of the
data in the world today has been created in the last two years alone.
» This data comes from everywhere: sensors used to gather climate
information, posts to social media sites, digital pictures and videos, purchase
transaction records, and cell phone GPS signals.
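The growth figure in the first bullet can be turned into a yearly rate with a little arithmetic. The sketch below assumes the growth is spread evenly over the six years (a simplifying assumption, not something the slide states); the function name is illustrative only.

```python
# Implied yearly growth for "3.2 zettabytes to 40 zettabytes in only six years".
# Assumption: growth compounds at a constant rate each year.
start_zb = 3.2   # zettabytes at the start of the period (from the slide)
end_zb = 40.0    # zettabytes at the end of the period (from the slide)
years = 6        # length of the period in years (from the slide)


def implied_annual_growth(start, end, n_years):
    """Return the constant yearly factor that turns `start` into `end`."""
    return (end / start) ** (1.0 / n_years)


factor = implied_annual_growth(start_zb, end_zb, years)
print(f"growth factor per year: {factor:.2f}")
print(f"roughly {100 * (factor - 1):.0f}% growth per year")
```

Under that assumption the digital universe grows by roughly half its size every year.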
7. Origins of Big Data Infrastructure
● The value generated by a social network is proportional to the number of
contacts between its users, rather than to the number of users. According to a
variant of Metcalfe’s Law[3], the number of contacts for N users is proportional
to N*logN (the original law puts it at N^2). The growth of contacts, and
therefore of the interactions that generate data, is thus faster than linear in
the number of users. As the world gets more connected, one can expect the
number of interactions to grow, resulting in even more accelerated data growth.
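To make the N*logN claim concrete, the sketch below compares the number of contacts before and after a user base doubles. The function names are hypothetical, and the constant of proportionality is taken as 1; the ratio does not depend on that constant or on the base of the logarithm.

```python
import math


def contacts_nlogn(n_users):
    """Relative number of contacts under the N*logN model (constant taken as 1)."""
    return n_users * math.log(n_users)


def growth_when_users_double(n_users):
    """Factor by which contacts grow when the user base doubles."""
    return contacts_nlogn(2 * n_users) / contacts_nlogn(n_users)


for n in (1_000, 1_000_000, 1_000_000_000):
    ratio = growth_when_users_double(n)
    print(f"{n:>13,} users: doubling users multiplies contacts by {ratio:.2f}")
```

The factor is always above 2, which is what "nonlinear" means here: doubling the users more than doubles the contacts, and with them the data generated.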
8. RECENT STUDY
Google’s search index exploded from 26 million pages in 1998 to more than 1
trillion in less than a decade. This content was “multi-structured”, consisting
of natural language text, images, video, geo-spatial data, and even renderings
of structured data.
To store and process it, Google had to develop the Google File System (GFS) and
the MapReduce programming framework.
The papers Google published on these two systems became the blueprint for Apache
Hadoop, an open source framework that has become a de facto standard for big
data platforms deployed today.
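The MapReduce programming model mentioned above can be illustrated with a word count. This is a minimal in-memory sketch in plain Python, not Google's or Hadoop's API: the function names are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

The appeal of the model is that the programmer writes only the map and reduce functions, while distribution, grouping, and fault handling are left to the framework.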
9. Apache Hadoop
Yahoo adopted Apache Hadoop in January 2006, and made significant
contributions to make it a scalable and stable platform.
Today, Yahoo has the largest Apache Hadoop footprint, running more than 45,000
servers that manage more than 370 petabytes of data.
Hadoop is an open source system, licensed under the liberal Apache License and
governed by the Apache Software Foundation.
The scalability and flexibility of Apache Hadoop prompted growing Internet
companies such as Facebook, Twitter, and LinkedIn to adopt it for their data
infrastructure.
10. Industrial Internet: The Next Frontier
Big Data use cases today focus on analysing customer behaviour: buying patterns,
likes and dislikes as expressed in social media, clickstreams, and location
information from mobile devices. Machine-generated data could be the next
frontier for Big Data systems.
For example, in an automobile, thousands of signals are captured by 70+ sensors
that generate more than 25 gigabytes of data every hour, which is processed by
70 on-board computers.
While most of this data is transient and needs to be acted upon in real time,
recognizing patterns within the data to improve the safety and usability of the
automobile requires aggregating and analysing it offline.
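For a sense of scale, the hourly figure above can be converted into a sustained data rate. The sketch assumes decimal gigabytes and an even spread of data across time and across exactly 70 sensors; both are simplifications, since the slide gives only lower bounds.

```python
GB_PER_HOUR = 25          # "more than 25 gigabytes of data every hour" (from the slide)
SENSORS = 70              # "70+ sensors" (from the slide), taken as exactly 70
BYTES_PER_GB = 10**9      # assumption: decimal gigabytes
SECONDS_PER_HOUR = 3600

# Average rate the on-board systems must absorb continuously.
bytes_per_second = GB_PER_HOUR * BYTES_PER_GB / SECONDS_PER_HOUR
mb_per_second = bytes_per_second / 10**6

# Average share of that rate per sensor, if the load were spread evenly.
kb_per_second_per_sensor = bytes_per_second / SENSORS / 10**3

print(f"about {mb_per_second:.1f} MB per second for the whole vehicle")
print(f"about {kb_per_second_per_sensor:.0f} KB per second per sensor on average")
```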
11. Facts!
» CEO Zuckerberg noted that 1 billion pieces of content are shared via Facebook’s
Open Graph daily.
» Facebook users upload over 10 million photographs every hour, and the ‘like’
button is pushed around 3 billion times every day.
» Google processes more than 24 petabytes of data every day.
» 48 hours of video are uploaded to YouTube every minute, resulting in nearly
8 years of content every day.
» 70% of data is created by individuals, but enterprises are responsible for
storing and managing 80% of it.
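The YouTube figure in the list above ("48 hours of video every minute, resulting in nearly 8 years of content every day") can be checked with a short calculation. The sketch assumes a constant upload rate and a 365-day year.

```python
HOURS_UPLOADED_PER_MINUTE = 48   # from the slide
MINUTES_PER_DAY = 60 * 24
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365              # assumption: non-leap year

# Total hours of video uploaded in one calendar day.
hours_uploaded_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# How long it would take to watch one day's uploads back to back.
days_of_content = hours_uploaded_per_day / HOURS_PER_DAY
years_of_content = days_of_content / DAYS_PER_YEAR

print(f"{hours_uploaded_per_day:,} hours uploaded per day")
print(f"about {years_of_content:.1f} years of continuous viewing")
```

The result is a little under 8 years, consistent with the slide's "nearly 8 years".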
12. Drivers and Opportunities
» Real-time prediction
» Increased operational and supply chain efficiencies
» Deep insights into customer behaviour based on pattern and purchase
analysis
» Information aggregation
» Better and more scientific customer segmentation for targeted marketing and
product offerings
» Improved productivity and innovation
» McKinsey predicts an increase in job opportunities ranging from 140K to
190K
» Uncovering hidden patterns and responding rapidly to changing scenarios
» Multi-channel and multi-dimensional information aggregation
» Data convergence
19. Market Opportunity
Big Data offers bigger opportunities. Here is a snapshot of some of the
predictions made by market research firms:
» IDC predicts the Big Data market will grow to $16.9 billion by 2019.
» Digital Reasoning estimates that the Big Data market will be worth $48.3
billion in 2019.
20. Applications
» Better financial data management
» Investment banking that uses information aggregated from various sources for
financial forecasting, asset pricing, and portfolio management
» More accurate pricing adjustments based on vast amounts of real-time data
» Stock advice based on analysis of huge amounts of stock data and of
unstructured data such as social media content
» Customer segmentation based on previous transactions and profile
information
» Analysis of purchase patterns and tailor-made product offerings
» Analysis of unstructured data from social media and multimedia to understand
customer tastes, preferences, and patterns, and to perform sentiment analysis
» Targeted marketing based on user segmentation