BIG DATA
Table of contents
1. Introduction
2. What is Big Data
3. Origin of the concept
4. Big Data: Basic concept
5. What is big data? 3Vs of Big Data?
6. Big Data versus Small Data
7. Why Big Data is important
8. Big Data at the Edge
9. Big Data: Opportunities and Challenges
10. Big Data Storage
11. Big Data Processing
12. Advantages and disadvantages of Big Data
13. Applications of Big Data
14. Conclusion
15. References
Abstract
Big data is an all-encompassing term for any collection of data sets so large and
complex that it becomes difficult to process them using traditional data processing
applications. Big data usually includes data sets with sizes beyond the ability of
commonly used software tools to capture, curate, manage, and process data within a
tolerable elapsed time. Big data size is a constantly moving target ranging from a few
dozen terabytes to many petabytes of data. Big data is a set of techniques and
technologies that require new forms of integration to uncover large hidden values from
large datasets that are diverse, complex, and of a massive scale. Big data can also be
defined as follows: "Big data is a large volume of unstructured data which cannot be handled by
standard database management systems like DBMS, RDBMS or ORDBMS". This report
focuses on the management and processing of Big Data that will combine business
requirements and utilize the services platform to analyze the dataset.
Introduction:
People, devices and networks are constantly generating data. When users stream
videos, play the latest game with friends, or make in-app purchases, their activity
generates data about their needs and preferences, as well as their quality of experience (QoE). Even when
users put their devices in their pockets, the network is generating location and other
data that keeps services running and ready to use.
As a result, the rate of mobile network data traffic growth is increasing rapidly. It is
estimated that by 2020, the number of smartphone subscriptions will have increased
from today’s 2.7 billion to 6.1 billion, and the total amount of mobile traffic generated by
smartphones will be five times that of today.
The big-data-driven telecom analytics market alone is expected to have a compound
annual growth rate of nearly 50 percent – with annual revenues expected to reach USD
5.4 billion at the end of 2019.
Communication service providers (CSPs) can make use of this big data to drive a wide
range of important decisions and activities. These include: designing more competitive
offers and packages; recommending the most attractive offers to subscribers during the
shopping and ordering process; communicating with subscribers about their usage,
spending and purchase options; configuring the network to deliver more reliable
services; and monitoring QoE to proactively correct any potential problems. All these
activities enable improved user experience, increased customer satisfaction, smarter
networks and extended network functionality to facilitate progress into the Networked
Society.
The profound impact that increased broadband networking will have on society will also
create business opportunities in new areas for CSPs. Improved real-time connectivity
and data management enable the creation of tailored data sets, readily available for
analysis and machine learning. This enables data-driven efficiency improvements in
several business areas – for example, transport, logistics, energy, agriculture and
environmental protection. Furthermore, decision making in business and society will be
facilitated by access to insights based on more accurate and up-to-date data.[1]
What is Big Data?
Big Data is the ocean of information we swim in every day – vast sources of data
flowing from our computers, mobile devices, and machine sensors. Big Data is being
generated by everything around us at all times. Every digital process and social media
exchange produces it, while systems, sensors, and mobile devices transmit it. New
sources of data come from a variety of machines, such as website interactions, search
engine optimizations, and social business sites by using click-stream data. These
changing business requirements demand that the right information be available at the
right time.[3]
Origins of the concept:
A decade ago, data storage scalability was one of the major technical issues data
owners were facing. Since then, a new breed of efficient and scalable technology has
been adopted, and data management and storage are no longer the problem they used
to be. In addition, data is constantly being generated, not only by use of the internet, but also
by companies generating large amounts of information coming from sensors, computers
and automated processes. This phenomenon has recently accelerated further thanks to
the increase of connected devices (which will soon become the largest source of data)
and the worldwide success of the social platforms.
Significant Internet players like Google, Amazon, Facebook and Twitter were the first
to face these increasing data volumes “at the internet scale” and designed ad hoc
solutions to cope with the situation. Those solutions have since partly
migrated into the open source software communities and have been made publicly
available. This was the starting point of the current Big Data trend, as it was a relatively
cheap solution for businesses confronted with similar problems. Meanwhile, two parallel
breakthroughs have further helped accelerate the adoption of solutions for handling Big
Data:
• The availability of cloud-based solutions has dramatically lowered the cost of storage,
amplified by the use of commodity hardware. Virtual file systems, either open source or
vendor specific, helped the transition from a managed infrastructure to a service-based
approach;
• When dealing with large volumes of data, it is necessary to distribute data and
workload over many servers. New designs for databases and efficient ways to support
massively parallel processing have led to a new generation of products like the so-called
NoSQL databases and the Hadoop MapReduce platform.
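The idea of distributing data over many servers can be illustrated with a minimal sketch. It assumes simple hash partitioning; the server names and the `partition` and `distribute` helpers are hypothetical and are not taken from any particular NoSQL product.

```python
import hashlib
from collections import defaultdict

# Hypothetical server names, used only for illustration.
SERVERS = ["server-0", "server-1", "server-2", "server-3"]

def partition(key, servers=SERVERS):
    """Pick a server for a key by hashing the key (simple hash partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

def distribute(records):
    """Group (key, value) records by the server that would store them."""
    placement = defaultdict(list)
    for key, value in records:
        placement[partition(key)].append((key, value))
    return dict(placement)

# 1,000 made-up records keyed by a user id.
records = [(f"user{i}", {"clicks": i}) for i in range(1000)]
placement = distribute(records)
for server in sorted(placement):
    print(server, len(placement[server]))
```

Because the same key always hashes to the same server, a lookup only has to contact one machine. Production systems often refine this idea, for example with consistent hashing, so that adding a server moves only a small share of the keys.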
The table below summarizes the main features and problems connected to handling
different types of large data sets, and explains how Big Data technologies can help
solve them.[2]
Aspect: Volume
Characteristics: The most visible aspect of Big Data, referring to the fact that the amount of generated data has increased tremendously in the past years. However, this is the least challenging aspect in practice.
Challenges and technology responses: The natural expansion of the internet has created an increase in global data production. A response to this situation has been the virtualization of storage in data centres, amplified by a significant decrease in the cost of ownership through the generalization of cloud-based solutions. The NoSQL database approach is a response to the need to store and query huge volumes of heavily distributed data.

Aspect: Velocity
Characteristics: This aspect captures the growing data production rates. More and more data are produced and must be collected in shorter time frames.
Challenges and technology responses: The daily addition of millions of connected devices (smartphones) will increase not only volume but also velocity. Real-time data processing platforms are now considered by global companies as a requirement to get a competitive edge.

Aspect: Variety
Characteristics: With the multiplication of data sources comes the explosion of data formats, ranging from structured information to free text.
Challenges and technology responses: The necessity to collect and analyse non-structured or semi-structured data goes against the traditional relational data model and query languages. This reality has been a strong incentive to create new kinds of data stores able to support flexible data models.

Aspect: Value
Characteristics: This highly subjective aspect refers to the fact that until recently, large volumes of data were recorded (often for archiving or regulatory purposes) but not exploited.
Challenges and technology responses: Big Data technologies are now seen as enablers to create or capture value from otherwise not fully exploited data. In essence, the challenge is to find a way to transform raw data into information that has value, either internally, or for making a business out of it. [2]
Big Data: Basic Concept:
Big Data encompasses everything from click stream data from the web to genomic and
proteomic data from biological research and medicine. Big Data is a heterogeneous
mix of structured data (traditional datasets in rows and columns, like DBMS tables,
CSV and XLS files) and unstructured data like e-mail attachments, manuals, images,
PDF documents, medical records such as x-rays, ECG and MRI images, forms, rich
media like graphics, video and audio, contacts and documents. Businesses are
primarily concerned with managing unstructured data, because over 80 percent of
enterprise data is unstructured and requires significant storage space and effort to
manage. “Big data” refers to datasets whose size is beyond the ability of typical
database software tools to capture, store, manage, and analyse.
Big data analytics is the area where advanced analytic techniques operate on big data
sets.[4]
Fig 1. Big Data
What is big data? 3vs of Big Data?
By now, it’s almost impossible not to have heard the term Big Data: a cursory glance at
Google Trends will show how the term has exploded over the past few years, and
become unavoidably ubiquitous in public consciousness. But what you may have
managed to avoid is gaining a thorough understanding of what Big Data actually
constitutes.
The first go-to answer is that ‘Big Data’ refers to datasets too large to be processed on a
conventional database system. In this way, the term Big Data is nebulous: whilst size is
certainly a part of it, scale alone doesn’t tell the whole story of what makes Big Data
‘big’.
When looking for a slightly more comprehensive overview, many refer to Doug Laney’s
3 V’s:
1. Volume
100 terabytes of data are uploaded daily to Facebook; Akamai analyses 75 million
events a day to target online ads; Walmart handles 1 million customer transactions
every single hour. 90% of all data ever created was generated in the past 2 years.
Scale is certainly a part of what makes Big Data big. The internet-mobile revolution,
bringing with it a torrent of social media updates, sensor data from devices and an
explosion of e-commerce, means that every industry is swamped with data, which can
be incredibly valuable if you know how to use it.
2. Velocity
In 1999, Wal-Mart’s data warehouse stored 1,000 terabytes (1,000,000 gigabytes) of
data. In 2012, it had access to over 2.5 petabytes (2,500,000 gigabytes) of data.
Every minute of every day, we upload 100 hours of video to YouTube, send over 200
million emails and send 300,000 tweets. ‘Velocity’ refers to the increasing speed at
which this data is created, and the increasing speed at which the data can be processed,
stored and analysed by relational databases. The possibility of processing data in
real time is an area of particular interest, as it allows
companies to do things like display personalised ads on the web pages you visit, based
on your recent search, viewing and purchase history.
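To make the per-minute figures quoted above easier to compare, the short sketch below scales them to a full day. The numbers are the ones given in the text; the variable names are illustrative only.

```python
MINUTES_PER_DAY = 60 * 24  # 1,440 minutes in a day

# Per-minute figures as quoted in the text above.
per_minute = {
    "hours of video uploaded": 100,
    "emails sent": 200_000_000,
    "tweets sent": 300_000,
}

# Scale every per-minute rate to a per-day total.
per_day = {name: rate * MINUTES_PER_DAY for name, rate in per_minute.items()}

for name, total in per_day.items():
    print(f"{name}: {total:,} per day")
```

At these rates, one day corresponds to 144,000 hours of video, 288 billion emails and 432 million tweets. Since the email figure is quoted as "over 200 million", its daily total is a lower bound.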
3. Variety
Gone are the days when a company’s data could be neatly slotted into a table and
analysed. 90% of data generated is ‘unstructured’, coming in all shapes and forms, from
geo-spatial data, to tweets which can be analysed for content and sentiment, to visual
data such as photos and videos.
The ‘3 V’s’ certainly give us an insight into the almost unimaginable scale of data, and
the break-neck speeds at which these vast datasets grow and multiply. But only
‘Variety’ really begins to scratch the surface of the depth, and crucially the challenges,
of Big Data. [6]
Fig 2. Characteristics Of Big Data
Big Data Versus Small Data:
Parameter: Goals
Small Data: Usually designed to answer a specific question or serve a particular goal.
Big Data: Usually designed with a goal in mind, but the goal is flexible and the questions posed are protean. An example is a Big Data grant designed “to combine high-quality data from fisheries, Coast Guard, commercial shipping, and coastal management agencies for a growing data collection”.

Parameter: Location
Small Data: Typically, small data is contained within one institution, often on one computer, sometimes in one file.
Big Data: Typically spread throughout electronic space, parceled onto multiple Internet servers located anywhere on earth.

Parameter: Data structure and content
Small Data: Ordinarily contains highly structured data. The data domain is restricted to a single discipline or subdiscipline. The data often comes in the form of uniform records in an ordered spreadsheet.
Big Data: Must be capable of absorbing unstructured data (e.g., free-text documents, images, motion pictures, sound recordings, physical objects). The subject matter of the resource may cross multiple disciplines, and the individual data objects in the resource may link to data contained in other, seemingly unrelated, Big Data resources.

Parameter: Data preparation
Small Data: In many cases, the data user prepares her own data, for her own purposes.
Big Data: The data comes from many diverse sources, and it is prepared by many people. People who use the data are seldom the people who have prepared the data.

Parameter: Measurements
Small Data: Typically, the data is measured using one experimental protocol, and the data can be represented using one set of standard units.
Big Data: Many different types of data are delivered in many different electronic formats. Measurements, when present, may be obtained by many different protocols. Verifying the quality of Big Data is one of the most difficult tasks for data managers.[7]
Why Is Big Data Important?
The importance of big data doesn’t revolve around how much data you have, but what
you do with it. You can take data from any source and analyze it to find answers that
enable 1) cost reductions, 2) time reductions, 3) new product development and
optimized offerings, and 4) smart decision making. When you combine big data with
high-powered analytics, you can accomplish business-related tasks such as:
• Determining root causes of failures, issues and defects in near-real time.
• Generating coupons at the point of sale based on the customer’s buying habits.
• Recalculating entire risk portfolios in minutes.
• Detecting fraudulent behavior before it affects your organization.[8]
Big Data at the Edge:
Much of the current discussion about big data analytics focuses on managing and
analyzing unstructured data from business and social sources such as e-mail, videos,
tweets, Facebook posts, reviews, and Web behavior. While this type of big data
analytics promises to provide significant value to organizations, data generated at the
edge of the network from sensors and other devices represents another huge, untapped
resource with the potential to deliver insights that can transform the operations and
strategic initiatives of public and private sector organizations.
Data from intelligent systems and sensors is some of the largest volume, fastest
streaming, and/or most complex big data. The data sources are distributed across the
network and data is collected by an enormous variety of equipment, such as utility
meters, traffic and security cameras, RFID readers, factory-line sensors, fitness
machines, and medical devices.
Ubiquitous connectivity and the growth of sensors and intelligent systems have opened
up a whole new storehouse of valuable information. Edge data can provide significant
value to both the private and public sector as a source of enormous potential for gaining
deeper, richer insight faster and more cost-effectively than in the past. In many cases,
analysis of edge data can help organizations respond to events and solve problems that
were previously out of reach.[9]
Big Data: Opportunities and Challenges:
In the distributed systems world, “Big Data” started to become a major issue in the late
1990s due to the impact of the World Wide Web and a resulting need to index and
query its rapidly mushrooming content. Database technology (including parallel
databases) was considered for the task, but was found to be neither well-suited nor
cost-effective for those purposes. The turn of the millennium then brought further
challenges as companies began to use information such as the topology of the Web
and users’ search histories in order to provide increasingly useful search results, as well
as more effectively targeted advertising to display alongside and fund those results.
Google’s technical response to the challenges of Web-scale data management and
analysis was simple, by database standards, but kicked off what has become the
modern “Big Data” revolution in the systems world.
To handle the challenge of Web-scale storage, the Google File System (GFS) was
created. GFS provides clients with the familiar OS-level byte-stream abstraction, but it
does so for extremely large files whose content can span hundreds of machines in
shared-nothing clusters created using inexpensive commodity hardware. To handle the
challenge of processing the data in such large files, Google pioneered its MapReduce
programming model and platform.
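The byte-stream abstraction over a file that spans several machines can be sketched as follows. This is a toy illustration under stated assumptions, not how GFS is implemented: the chunk size, the machine names, the round-robin placement and the `store` and `read` helpers are all hypothetical.

```python
CHUNK_SIZE = 8                 # bytes; deliberately tiny for the demo (arbitrary choice)
MACHINES = ["m0", "m1", "m2"]  # hypothetical machine names

def store(data):
    """Split data into fixed-size chunks and place them round-robin on machines."""
    storage = {m: {} for m in MACHINES}   # machine -> {chunk number: bytes}
    index = []                            # chunk number -> machine holding it
    for i in range(0, len(data), CHUNK_SIZE):
        chunk_no = i // CHUNK_SIZE
        machine = MACHINES[chunk_no % len(MACHINES)]
        storage[machine][chunk_no] = data[i:i + CHUNK_SIZE]
        index.append(machine)
    return storage, index

def read(storage, index, offset, length):
    """Read a byte range as if the file were one contiguous stream."""
    if length <= 0:
        return b""
    out = bytearray()
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    # Fetch only the chunks that overlap the requested range.
    for chunk_no in range(first, min(last, len(index) - 1) + 1):
        out += storage[index[chunk_no]][chunk_no]
    start = offset - first * CHUNK_SIZE
    return bytes(out[start:start + length])

data = b"The quick brown fox jumps over the lazy dog"
storage, index = store(data)
print(read(storage, index, 4, 11))
```

The caller of `read` never sees which machine holds which chunk; it only supplies an offset and a length, which is the essence of the byte-stream view described above.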
This model, characterized by some as “parallel programming for dummies”, enabled
Google’s developers to process large collections of data by writing two user-defined
functions, map and reduce, that the MapReduce framework applies to the instances
(map) and sorted groups of instances that share a common key (reduce), similar to the
sort of partitioned parallelism utilized in shared-nothing parallel query processing.
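The two user-defined functions can be illustrated with the classic word-count example. The sketch below runs in a single process; `run_mapreduce` is a hypothetical stand-in for the framework and does none of the distribution or fault handling a real platform provides.

```python
from collections import defaultdict

def map_fn(line):
    """User-defined map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """User-defined reduce: combine all values that share the same key."""
    return word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    # Visit keys in sorted order, mirroring the sorted groups handed to reduce.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]

lines = ["big data is big", "data is everywhere"]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)
```

Only `map_fn` and `reduce_fn` are specific to the problem; swapping them for other functions reuses the same grouping machinery, which is why the model is considered easy to program.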
Driven by very similar requirements, software developers at Yahoo!, Facebook, and
other large Web companies followed suit. Taking Google’s GFS and MapReduce
papers as rough technical specifications, open-source equivalents were developed, and
the Apache Hadoop MapReduce platform and its underlying file system (HDFS, the
Hadoop Distributed File System) were born. The Hadoop system has quickly gained
traction, and it is now widely used for use cases including Web indexing, clickstream
and log analysis, and certain large-scale information extraction and machine learning
tasks. Soon tired of the low-level nature of the MapReduce programming model, the
Hadoop community developed a set of higher-level declarative languages for writing
queries and data analysis pipelines that are compiled into MapReduce jobs and then
executed on the Hadoop MapReduce platform.
Popular languages include Pig from Yahoo!, Jaql from IBM, and Hive from Facebook.
Pig is relational-algebra-like in nature, and is reportedly used for over 60% of
Yahoo!’s MapReduce use cases; Hive is SQL-inspired and reported to be used for over
90% of the Facebook MapReduce use cases. Microsoft’s technologies include a
parallel runtime system called Dryad and two higher-level programming models,
DryadLINQ and the SQL-like SCOPE language, which utilizes Dryad under the covers.
Interestingly, Microsoft has also recently announced that its future “Big Data” strategy
includes support for Hadoop.[4]
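The appeal of a declarative, SQL-inspired language over hand-written map and reduce steps can be shown with a small comparison. The sketch uses Python's built-in sqlite3 module purely as a stand-in for a SQL-like query language; it is not Hive, and the `clicks` data and column names are made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Made-up clickstream records: (user id, page visited).
clicks = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart"), ("alice", "/pay")]

# Declarative version: state WHAT is wanted; the engine decides how to run it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", clicks)
declarative = conn.execute(
    "SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()

# Hand-written version: spell out the map, group and reduce steps explicitly.
groups = defaultdict(list)
for user_id, _page in clicks:       # map: emit (user_id, 1)
    groups[user_id].append(1)       # group values by key
manual = [(user_id, sum(ones)) for user_id, ones in sorted(groups.items())]  # reduce

print(declarative)
print(manual)
```

Both versions produce the same per-user counts. The declarative query is a single statement, and a compiler in the style described above would be responsible for turning it into the explicit map and reduce steps.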
Big Data Storage:
We live in an on-demand, on-command digital universe, with data proliferating from institutions,
individuals and machines at a very high rate. This data is categorized as "Big Data" due
to its sheer Volume, Variety, Velocity and Veracity. Most of this data is unstructured,
quasi-structured or semi-structured, and it is heterogeneous in nature. The volume and
heterogeneity of the data, together with the speed at which it is generated, make it difficult for the present
computing infrastructure to manage Big Data. Traditional data management,
warehousing and analysis systems lack the tools to analyze this data.
Due to its specific nature, Big Data is stored in distributed file system architectures.
Apache Hadoop and its file system HDFS are widely used for storing and managing Big Data.
Analyzing Big Data is a challenging task as it involves large distributed file systems
which should be fault tolerant, flexible and scalable. MapReduce is widely used
for the efficient analysis of Big Data. Traditional DBMS techniques like joins and
indexing, and other techniques like graph search, are used for classification and clustering
of Big Data. These techniques are being adapted for use in MapReduce.
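As an illustration of how a traditional DBMS technique can be adapted to the map and reduce style, the sketch below expresses an inner join as a so-called reduce-side join. This is one common pattern, offered as an assumption rather than as the method used by the cited authors; the datasets and function names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical datasets that share a customer id as the join key.
customers = [(1, "Asha"), (2, "Ben")]
orders = [(1, "book"), (1, "pen"), (2, "lamp"), (3, "orphan order")]

def map_customers(record):
    cust_id, name = record
    yield cust_id, ("customer", name)   # tag each value with its source

def map_orders(record):
    cust_id, item = record
    yield cust_id, ("order", item)

def reduce_join(cust_id, tagged_values):
    """Pair every customer name with every order that has the same key."""
    names = [v for tag, v in tagged_values if tag == "customer"]
    items = [v for tag, v in tagged_values if tag == "order"]
    return [(cust_id, name, item) for name in names for item in items]

# Shuffle step: bring all values with the same key together.
groups = defaultdict(list)
for record in customers:
    for key, value in map_customers(record):
        groups[key].append(value)
for record in orders:
    for key, value in map_orders(record):
        groups[key].append(value)

joined = []
for key in sorted(groups):
    joined.extend(reduce_join(key, groups[key]))
print(joined)
```

The order with customer id 3 has no matching customer, so it is dropped, which is the behaviour of an inner join. Tagging each value with its source is what lets a single reduce function tell the two inputs apart.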
The MapReduce framework runs over the Hadoop Distributed File System (HDFS). MapReduce is
a processing technique which makes use of file indexing with mapping, sorting,
shuffling and finally reducing. MapReduce techniques implemented for Big Data
analysis using HDFS are studied in this report.[11][12]
Big Data Processing:
Big data analytics is the area where advanced analytic techniques operate on big data
sets. It is really about two things, big data and analytics, and how the two have teamed
up to create one of the most profound trends in business intelligence (BI). MapReduce
by itself is capable of analysing large distributed data sets, but the heterogeneity,
velocity and volume of Big Data make it a challenge for traditional data analysis and
management tools. A problem with Big Data is that it is often held in NoSQL stores, which typically have no Data
Description Language (DDL) and offer only limited support for transaction processing. Also, web-scale
data is not universal and it is heterogeneous.
For analysis of Big Data, database integration and cleaning are much harder than in
traditional mining approaches. Parallel processing and distributed computing, which are
nearly non-existent in traditional RDBMS, are becoming standard procedure. MapReduce
has the following characteristics: it supports parallel and distributed processing; it is simple;
its architecture is shared-nothing, running on diverse commodity hardware (a big
cluster); its functions are programmed in a high-level programming language (e.g. Java,
Python); and it is flexible.
Query processing is done through tools such as Hive, which runs on top of HDFS. First, analytics
helps to discover what has changed and the possible solutions. Second, advanced
analytics is the best way to discover more business opportunities and new customer
segments, identify the best suppliers, associate products of affinity, understand sales
seasonality, etc.[5][13]
Benefits of Big Data:
• Understand customer needs better.
• Reduce cost.
• Make processes more efficient.
• Detect risks and check fraud.[14]
Drawbacks of Big Data:
• High maintenance.
• Skill needed to access data.
• Difficult to handle.
• Violates the privacy principle.[14]
Applications of Big Data:
• Government
• International development
• Manufacturing
• Cyber-physical models
• Media
• Technology
• Private sector
• Science and research [15]
Conclusion:
Big Data analysis tools like MapReduce over Hadoop and HDFS promise to help
organizations better understand their customers and the marketplace, hopefully leading
to better business decisions and competitive advantages. The need to process
enormous quantities of data has never been greater. Not only are terabyte- and
petabyte-scale datasets rapidly becoming commonplace, but there is consensus that
great value lies buried in them, waiting to be unlocked by the right computational tools.
In the commercial sphere, business intelligence is driven by the ability to gather data from
a dizzying array of sources. For engineers building information processing tools and
applications, large and heterogeneous datasets that generate a continuous flow of
data lead to more effective algorithms for a wide range of tasks, from machine
translation to spam detection. In the natural and physical sciences, the ability to analyse
massive amounts of data may provide the key to unlocking the secrets of the cosmos or
the mysteries of life. MapReduce can be exploited to solve a variety of problems related
to text processing at scales that would have been unthinkable a few years ago.
We regard Big Data as an emerging trend, and the need for Big Data is arising in all
science and engineering domains. With Big Data technologies, we will hopefully be able
to provide the most relevant and most accurate social sensing feedback to better
understand our society in real time. We can further stimulate the participation of
public audiences in the data production circle for societal and economic events. The
development of big data extends the scope of human activities. It demands proper
attention from academia, industry and government. The world has been cooperating
and integrating on a global scale. People are compelled to shift from the local to
the global in their everyday life and work. Big data redefines the relationship among individuals,
businesses, organizations, governments, and societies through networked thinking, and
helps to improve the human living environment, to enhance the quality of public
services, and to improve performance, efficiency and productivity through intelligent
interactive operation. The technological progress and industrial upgrading of big data
will create new markets, new business models and new industry rules, and more
importantly it demonstrates the collective will of a country that is looking for strategic
advantage. Although there is still a large gap before data intelligence approaches human
wisdom, big data is a promising topic, and it certainly helps us to understand the world
from an entirely new aspect.[16]
REFERENCES:
[1]. Ericsson white paper: Ericsson, Ericsson Mobility Report, February 2015, available at:
http://www.ericsson.com/res/docs/2015/ericsson-mobility-report-feb-2015-interim.pdf
[2]. NESSI White Paper, “Big Data: A New World of Opportunities”.
[3]. Book: "Big Data for Beginners" by Alonzo Williams, Stepanie Foor.
[4]. Puneet Singh Duggal, Sanchita Paul, “Big Data Analysis: Challenges and
Solutions”, International Conference on Cloud, Big Data and Trust 2013, Nov 13-15,
RGPV.
[5]. Prashant Kumar, Khushboo Pandey, “Big Data and Distributed Data Mining: An
Example of Future Networks”, International Journal of Advance Research and
Innovation, Volume 1, Issue 2 (2013) 36-39.
[6]. Book: "Understanding Big Data: A Beginners Guide to Data Science & the Business
Applications" by Eileen McNulty-Holmes.
[7]. Book: "Principles of Big Data: Preparing, Sharing, and Analyzing Complex
Information" by Jules J. Berman.
[8]. http://www.sas.com/en_th/insights/big-data/what-is-big-data.html
[9]. Feng Ye, Zhijian Wang, Fachao Zhou, Yapu Wang, Yuanchao Zhou, “Cloud-based
Big Data Mining & Analyzing Services Platform integrating R”, 2013 International
Conference on Advanced Cloud and Big Data.
[10]. Hanna Yang, Minjeong Park, Minsu Cho, Minseok Song, Seongjoo Kim, “A System
Architecture for Manufacturing Process Analysis based on Big Data and Process Mining
Techniques”, 2014 IEEE International Conference on Big Data.
[11]. Xindong Wu, Fellow, IEEE, Xingquan Zhu, Senior Member, IEEE, Gong-Qing
Wu, and Wei Ding, Senior Member, IEEE, “Data Mining with Big Data”, IEEE
Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, January 2014.
[12]. Sandy Moens, Emin Aksehirli, Bart Goethals, “Frequent Itemset Mining for Big
Data”, 2013 IEEE International Conference on Big Data.
[13]. Carson Kai-Sang Leung, Fan Jiang, “A Data Science Solution for Mining Interesting
Patterns from Uncertain Big Data”, 2014 IEEE Fourth International Conference on Big
Data and Cloud Computing.
[14]. http://www.oii.ox.ac.uk/research/project/?id=98.
[15]. https://www.google.co.in/#q=applications+of+big+data+wikipedia.
[16]. Jason Venner, Pro Hadoop: Build Scalable, Distributed Applications in the
Cloud, ISBN-13 (pbk): 978-1-4302-1942-2, ISBN-13 (electronic): 978-1-4302-1943-9.

Contenu connexe

Tendances

Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Information economics and big data
Information economics and big dataInformation economics and big data
Information economics and big dataMark Albala
 
Everis big data_wilson_v1.4
Everis big data_wilson_v1.4Everis big data_wilson_v1.4
Everis big data_wilson_v1.4wilson_lucas
 
Big DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and ApplicationBig DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and ApplicationUyoyo Edosio
 
BIG DATA(PPT)
BIG DATA(PPT)BIG DATA(PPT)
BIG DATA(PPT)josnapv
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035Neelam Rawat
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaSkillspeed
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Onyebuchi nosiri
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Onyebuchi nosiri
 
BIG Data and Methodology-A review
BIG Data and Methodology-A reviewBIG Data and Methodology-A review
BIG Data and Methodology-A reviewShilpa Soi
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)Sonu Gupta
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social mediaSupriya Radhakrishna
 






Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage processing of big data, applications, adv,dis etc..)

  • 2. Abstract: Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. Big data usually includes data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data size is a constantly moving target, ranging from a few dozen terabytes to many petabytes of data. Big data also refers to a set of techniques and technologies that require new forms of integration to uncover large hidden values from datasets that are diverse, complex, and of a massive scale. Big data can also be defined as "a large volume of unstructured data which cannot be handled by standard database management systems like DBMS, RDBMS or ORDBMS". This report focuses on the management and processing of Big Data that combine business requirements and use the services platform to analyze the dataset.
  • 3. Introduction: People, devices and networks are constantly generating data. When users stream videos, play the latest game with friends, or make in-app purchases, their activity generates data about their needs and preferences, as well as their quality of experience (QoE). Even when users put their devices in their pockets, the network is generating location and other data that keeps services running and ready to use. As a result, mobile network data traffic is growing rapidly. It is estimated that by 2020, the number of smartphone subscriptions will have increased from today's 2.7 billion to 6.1 billion, and the total amount of mobile traffic generated by smartphones will be five times that of today. The big-data-driven telecom analytics market alone is expected to have a compound annual growth rate of nearly 50 percent, with annual revenues expected to reach USD 5.4 billion at the end of 2019. Communication service providers (CSPs) can make use of this big data to drive a wide range of important decisions and activities. These include: designing more competitive offers and packages; recommending the most attractive offers to subscribers during the shopping and ordering process; communicating with subscribers about their usage, spending and purchase options; configuring the network to deliver more reliable services; and monitoring QoE to proactively correct any potential problems. All these activities enable improved user experience, increased customer satisfaction, smarter networks and extended network functionality to facilitate progress into the Networked Society. The profound impact that increased broadband networking will have on society will also create business opportunities in new areas for CSPs. Improved real-time connectivity and data management enable the creation of tailored data sets, readily available for analysis and machine learning.
This enables data-driven efficiency improvements in several business areas, for example transport, logistics, energy, agriculture and environmental protection. Furthermore, decision making in business and society will be facilitated by access to insights based on more accurate and up-to-date data.[1]
  • 4. What is Big Data? Big Data is the ocean of information we swim in every day: vast sources of data flowing from our computers, mobile devices, and machine sensors. Big Data is being generated by everything around us at all times. Every digital process and social media exchange produces it, while systems, sensors, and mobile devices transmit it. New data comes from a variety of sources, such as website interactions, search engine optimization, and social business sites, often in the form of click-stream data. These changing business requirements demand that the right information be available at the right time.[3]
    Origins of the concept: A decade ago, data storage scalability was one of the major technical issues data owners were facing. Since then, a new generation of efficient and scalable technology has been adopted, and data management and storage is no longer the problem it used to be. In addition, data is constantly being generated, not only through use of the internet, but also by companies producing large amounts of information from sensors, computers and automated processes. This phenomenon has recently accelerated further thanks to the increase in connected devices (which will soon become the largest source of data) and the worldwide success of social platforms. Significant Internet players like Google, Amazon, Facebook and Twitter were the first to face these increasing data volumes "at the internet scale" and designed ad hoc solutions to cope with the situation. Those solutions have since partly migrated into the open source software communities and have been made publicly available.
    This was the starting point of the current Big Data trend, as it was a relatively cheap solution for businesses confronted with similar problems. Meanwhile, two parallel breakthroughs have further helped accelerate the adoption of solutions for handling Big Data. First, the availability of cloud-based solutions has dramatically lowered the cost of storage, amplified by the use of commodity hardware; virtual file systems, either open source or vendor specific, helped the transition from a managed infrastructure to a service-based approach. Second, when dealing with large volumes of data, it is necessary to distribute data and workload over many servers; new designs for databases and efficient ways to support massively parallel processing have led to a new generation of products such as the so-called NoSQL databases and the Hadoop MapReduce platform. The table below summarizes the main features and problems connected to handling different types of large data sets, and explains how Big Data technologies can help solve them.[2]
  • 5. Table: Aspect, Characteristics, and Challenges and technology responses [2]
    Volume
      Characteristics: The most visible aspect of Big Data, referring to the fact that the amount of generated data has increased tremendously in the past years. However, this is the least challenging aspect in practice.
      Challenges and technology responses: The natural expansion of the internet has created an increase in global data production. A response to this situation has been the virtualization of storage in data centres, amplified by a significant decrease in the cost of ownership through the generalization of cloud-based solutions. The NoSQL database approach is a response to the need to store and query huge volumes of heavily distributed data.
    Velocity
      Characteristics: This aspect captures the growing data production rates. More and more data are produced and must be collected in shorter time frames.
      Challenges and technology responses: The daily addition of millions of connected devices (smartphones) will increase not only volume but also velocity. Real-time data processing platforms are now considered by global companies as a requirement to get a competitive edge.
    Variety
      Characteristics: With the multiplication of data sources comes the explosion of data formats, ranging from structured information to free text.
      Challenges and technology responses: The necessity to collect and analyse non-structured or semi-structured data goes against the traditional relational data model and query languages. This reality has been a strong incentive to create new kinds of data stores able to support flexible data models.
    Value
      Characteristics: This highly subjective aspect refers to the fact that until recently, large volumes of data were recorded (often for archiving or regulatory purposes) but not exploited.
      Challenges and technology responses: Big Data technologies are now seen as enablers to create or capture value from otherwise not fully exploited data. In essence, the challenge is to find a way to transform raw data into information that has value, either internally or for making a business out of it.
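The need to distribute data and workload over many servers, noted above, is often met by partitioning records by key. The following is a minimal sketch of hash partitioning under stated assumptions: the function names `shard_for` and `partition`, the record layout, and the server count are hypothetical and chosen only for illustration; they are not taken from any particular NoSQL product.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_servers: int) -> int:
    """Map a record key to a server index using a stable hash.

    A cryptographic digest is used only because it is stable across
    runs and machines; Python's built-in hash() is salted per process.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def partition(records, num_servers):
    """Group (key, value) records by the server that would store them."""
    shards = defaultdict(list)
    for key, value in records:
        shards[shard_for(key, num_servers)].append((key, value))
    return shards


# Hypothetical data: 1,000 user records spread over 4 servers.
records = [(f"user{i}", i) for i in range(1000)]
shards = partition(records, 4)
for server, items in sorted(shards.items()):
    print(f"server {server}: {len(items)} records")
```

Every record lands on exactly one server, and the same key always maps to the same server, so a lookup only has to contact one machine. A simple modulo scheme like this reshuffles most keys when the number of servers changes, which is one reason production systems often use more elaborate schemes.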
  • 6. Big Data: Basic Concept: Big Data encompasses everything from click-stream data from the web to genomic and proteomic data from biological research and medicine. Big Data is a heterogeneous mix of structured data (traditional datasets in rows and columns, like DBMS tables and CSV and XLS files) and unstructured data such as e-mail attachments, manuals, images, PDF documents, medical records such as x-rays, ECG and MRI images, forms, rich media like graphics, video and audio, contacts and documents. Businesses are primarily concerned with managing unstructured data, because over 80 percent of enterprise data is unstructured and requires significant storage space and effort to manage. "Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse. Big data analytics is the area where advanced analytic techniques operate on big data sets.[4]
    Fig 1. Big Data
  • 7. What is big data? 3 Vs of Big Data? By now, it is almost impossible not to have heard the term Big Data; a cursory glance at Google Trends will show how the term has exploded over the past few years and become unavoidably ubiquitous in public consciousness. But what you may have managed to avoid is gaining a thorough understanding of what Big Data actually constitutes. The first go-to answer is that 'Big Data' refers to datasets too large to be processed on a conventional database system. In this way, the term Big Data is nebulous: whilst size is certainly a part of it, scale alone doesn't tell the whole story of what makes Big Data 'big'. When looking for a slightly more comprehensive overview, many refer to Doug Laney's 3 V's:
    1. Volume: 100 terabytes of data are uploaded daily to Facebook; Akamai analyses 75 million events a day to target online ads; Walmart handles 1 million customer transactions every single hour. 90% of all data ever created was generated in the past 2 years. Scale is certainly a part of what makes Big Data big. The internet-mobile revolution, bringing with it a torrent of social media updates, sensor data from devices and an explosion of e-commerce, means that every industry is swamped with data, which can be incredibly valuable if you know how to use it.
    2. Velocity: In 1999, Walmart's data warehouse stored 1,000 terabytes (1,000,000 gigabytes) of data. In 2012, it had access to over 2.5 petabytes (2,500,000 gigabytes) of data. Every minute of every day, we upload 100 hours of video to YouTube, send over 200 million emails and send 300,000 tweets. 'Velocity' refers to the increasing speed at which this data is created, and the increasing speed at which the data can be processed, stored and analysed by relational databases. The possibility of processing data in real time is an area of particular interest; it allows companies to do things like display personalised ads on the web pages you visit, based on your recent search, viewing and purchase history.
    3. Variety: Gone are the days when a company's data could be neatly slotted into a table and analysed. 90% of data generated is 'unstructured', coming in all shapes and forms, from geo-spatial data, to tweets which can be analysed for content and sentiment, to visual data such as photos and videos.
  • 8. The '3 V's' certainly give us an insight into the almost unimaginable scale of data, and the break-neck speeds at which these vast datasets grow and multiply. But only 'Variety' really begins to scratch the surface of the depth, and crucially the challenges, of Big Data.[6]
    Fig 2. Characteristics of Big Data
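The 'Variety' point can be made concrete with a small example. The sketch below, offered only as an illustration, stores records of different shapes as one JSON document per line and queries them without a fixed table schema. The record fields, the sample values, and the helper name `field_values` are all hypothetical.

```python
import json

# Hypothetical records with different shapes: none of them fits
# a single fixed set of table columns.
records = [
    {"type": "transaction", "customer_id": 17, "amount": 42.5},
    {"type": "tweet", "user": "alice", "text": "big data at the edge"},
    {"type": "geo", "lat": 48.85, "lon": 2.35},
]

# Serialize each record as one self-describing JSON document per line.
lines = [json.dumps(record) for record in records]


def field_values(lines, field):
    """Yield the value of `field` from every document that has it."""
    for line in lines:
        document = json.loads(line)
        if field in document:
            yield document[field]


types = list(field_values(lines, "type"))
amounts = list(field_values(lines, "amount"))
print(types)
print(amounts)
```

Because each document carries its own field names, a new kind of record can be added without altering existing ones. The cost is that every query must tolerate missing fields, as `field_values` does here.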
  • 9. Big Data Versus Small Data: [7]
    Goals
      Small Data: Usually designed to answer a specific question or serve a particular goal.
      Big Data: Usually designed with a goal in mind, but the goal is flexible and the questions posed are protean. One example is Big Data grants designed "to combine high-quality data from fisheries, Coast Guard, commercial shipping, and coastal management agencies for a growing data collection".
    Location
      Small Data: Typically contained within one institution, often on one computer, sometimes in one file.
      Big Data: Typically spread throughout electronic space, parceled onto multiple Internet servers located anywhere on earth.
    Data structure and content
      Small Data: Ordinarily contains highly structured data. The data domain is restricted to a single discipline or subdiscipline. The data often comes in the form of uniform records in an ordered spreadsheet.
      Big Data: Must be capable of absorbing unstructured data (e.g., free-text documents, images, motion pictures, sound recordings, physical objects). The subject matter of the resource may cross multiple disciplines, and the individual data objects in the resource may link to data contained in other, seemingly unrelated, Big Data resources.
    Data preparation
      Small Data: In many cases, the data user prepares her own data, for her own purposes.
      Big Data: The data comes from many diverse sources, and it is prepared by many people. People who use the data are seldom the people who have prepared the data.
    Measurements
      Small Data: Typically, the data is measured using one experimental protocol, and the data can be represented using one set of standard units.
      Big Data: Many different types of data are delivered in many different electronic formats. Measurements, when present, may be obtained by many different protocols. Verifying the quality of Big Data is one of the most difficult tasks for data managers.
  • 10. Why Is Big Data Important? The importance of big data doesn't revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:
    - Determining root causes of failures, issues and defects in near-real time.
    - Generating coupons at the point of sale based on the customer's buying habits.
    - Recalculating entire risk portfolios in minutes.
    - Detecting fraudulent behavior before it affects your organization.[8]
    Big Data at the Edge: Much of the current discussion about big data analytics focuses on managing and analyzing unstructured data from business and social sources such as e-mail, videos, tweets, Facebook posts, reviews, and Web behavior. While this type of big data analytics promises to provide significant value to organizations, data generated at the edge of the network from sensors and other devices represents another huge, untapped resource with the potential to deliver insights that can transform the operations and strategic initiatives of public and private sector organizations. Data from intelligent systems and sensors is some of the largest-volume, fastest-streaming, and/or most complex big data. The data sources are distributed across the network, and data is collected by an enormous variety of equipment, such as utility meters, traffic and security cameras, RFID readers, factory-line sensors, fitness machines, and medical devices. Ubiquitous connectivity and the growth of sensors and intelligent systems have opened up a whole new storehouse of valuable information.
    Edge data can provide significant value to both the private and public sector as a source of enormous potential for gaining deeper, richer insight faster and more cost-effectively than in the past. In many cases, analysis of edge data can help organizations respond to events and solve problems that were previously out of reach.[9]
  • 11. Big Data: Opportunities and Challenges: In the distributed systems world, “Big Data” started to become a major issue in the late 1990s due to the impact of the World Wide Web and a resulting need to index and query its rapidly mushrooming content. Database technology (including parallel databases) was considered for the task, but was found to be neither well-suited nor cost-effective for those purposes. The turn of the millennium then brought further challenges as companies began to use information such as the topology of the Web and users’ search histories in order to provide increasingly useful search results, as well as more effectively targeted advertising to display alongside and fund those results. Google’s technical response to the challenges of Web-scale data management and analysis was simple, by database standards, but kicked off what has become the modern “Big Data” revolution in the systems world. To handle the challenge of Web-scale storage, the Google File System (GFS) was created. GFS provides clients with the familiar OS-level byte-stream abstraction, but it does so for extremely large files whose content can span hundreds of machines in shared-nothing clusters created using inexpensive commodity hardware. To handle the challenge of processing the data in such large files, Google pioneered its MapReduce programming model and platform. This model, characterized by some as “parallel programming for dummies”, enabled Google’s developers to process large collections of data by writing two user-defined functions, map and reduce, that the MapReduce framework applies to the instances (map) and sorted groups of instances that share a common key (reduce), similar to the sort of partitioned parallelism utilized in shared-nothing parallel query processing. Driven by very similar requirements, software developers at Yahoo!, Facebook, and other large Web companies followed suit.
Taking Google’s GFS and MapReduce papers as rough technical specifications, open-source equivalents were developed, and the Apache Hadoop MapReduce platform and its underlying file system (HDFS, the Hadoop Distributed File System) were born. The Hadoop system quickly gained traction, and it is now widely used for use cases including Web indexing, clickstream and log analysis, and certain large-scale information extraction and machine learning tasks. Soon tired of the low-level nature of the MapReduce programming model, the Hadoop community developed a set of higher-level declarative languages for writing queries and data analysis pipelines that are compiled into MapReduce jobs and then executed on the Hadoop MapReduce platform. Popular languages include Pig from Yahoo!, Jaql from IBM, and Hive from Facebook. Pig is relational-algebra-like in nature, and is reportedly used for over 60% of Yahoo!’s MapReduce use cases; Hive is SQL-inspired and reported to be used for over 90% of the Facebook MapReduce use cases. Microsoft’s technologies include a parallel runtime system called Dryad and two higher-level programming models, DryadLINQ and the SQL-like SCOPE language, which utilizes Dryad under the covers. Interestingly, Microsoft has also recently announced that its future “Big Data” strategy includes support for Hadoop.[4]
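The map and reduce functions described in this slide can be illustrated with a small word-count sketch in plain Python. This is only a single-process illustration of the programming model, not Google's or Hadoop's actual API: the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and the driver stands in for the framework that would normally distribute the work across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_words(document):
    """User-defined map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(word, counts):
    """User-defined reduce: combine all values that share one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy single-process driver standing in for the framework (hypothetical helper)."""
    # Map phase: apply map_fn to every input record independently.
    mapped = chain.from_iterable(map_fn(record) for record in records)

    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: visit the keys in sorted order and reduce each group.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    docs = ["big data at the edge", "big data storage", "the edge"]
    for word, count in run_mapreduce(docs, map_words, reduce_counts):
        print(word, count)
```

Because each call to the map function sees only one record and each call to the reduce function sees only one key group, a real framework is free to run those calls on different machines, which is the partitioned parallelism the text refers to.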
  • 12. Big Data Storage: We live in an on-demand, on-command digital universe in which data is proliferating from institutions, individuals and machines at a very high rate. This data is categorized as “Big Data” due to its sheer Volume, Variety, Velocity and Veracity. Most of this data is unstructured, quasi-structured or semi-structured, and it is heterogeneous in nature. The volume and heterogeneity of the data, together with the speed at which it is generated, make it difficult for present computing infrastructure to manage Big Data. Traditional data management, warehousing and analysis systems lack the tools to analyze this data. Because of its specific nature, Big Data is stored in distributed file system architectures. Apache Hadoop and HDFS are widely used for storing and managing Big Data. Analyzing Big Data is a challenging task, as it involves large distributed file systems that should be fault tolerant, flexible and scalable. MapReduce is widely used for the efficient analysis of Big Data. Traditional DBMS techniques like joins and indexing, and other techniques like graph search, are used for classification and clustering of Big Data. These techniques are being adapted for use in MapReduce, which runs over the Hadoop Distributed File System (HDFS). MapReduce condenses large inputs into smaller results in stages: mapping, sorting, shuffling and finally reducing. This report studies MapReduce techniques implemented for Big Data analysis using HDFS.[11][12]
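As one illustration of how a traditional DBMS technique such as a join might be adapted to MapReduce, the sketch below shows a reduce-side join in plain Python. It is a minimal single-process sketch, not the method of any cited paper and not Hadoop's API: the record layouts, the sample data in the usage note, and the function names (`map_customer`, `map_order`, `reduce_join`, `reduce_side_join`) are all assumptions made for the example.

```python
from collections import defaultdict


def map_customer(row):
    """Map a (customer_id, name) record to (join key, tagged value)."""
    customer_id, name = row
    return customer_id, ("customer", name)


def map_order(row):
    """Map an (order_id, customer_id, amount) record to (join key, tagged value)."""
    order_id, customer_id, amount = row
    return customer_id, ("order", (order_id, amount))


def reduce_join(key, tagged_values):
    """Pair every customer name with every order sharing the key (inner join)."""
    names = [value for tag, value in tagged_values if tag == "customer"]
    orders = [value for tag, value in tagged_values if tag == "order"]
    return [(key, name, order_id, amount)
            for name in names
            for order_id, amount in orders]


def reduce_side_join(customers, orders):
    """Toy driver: map both inputs, shuffle by key, then sort and reduce."""
    # Map phase over both inputs; the tag records which side a value came from.
    mapped = [map_customer(r) for r in customers] + [map_order(r) for r in orders]

    # Shuffle phase: bring all values with the same join key together.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Sort the keys, then reduce each group into joined rows.
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined
```

For example, calling `reduce_side_join([(1, "Asha"), (2, "Ben")], [(100, 1, 25.0), (101, 2, 40.0)])` pairs each order with the matching customer name. The join condition is never stated explicitly: it falls out of the shuffle step, because records from both inputs that share a key arrive at the same reducer.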
  • 13. Big Data Processing: Big Data encompasses everything from clickstream data from the web to genomic and proteomic data from biological research and medicine. Big Data is a heterogeneous mix of structured data (traditional datasets in rows and columns, such as DBMS tables, CSV and XLS files) and unstructured data such as e-mail attachments, manuals, images, PDF documents, medical records such as X-rays, ECG and MRI images, forms, rich media like graphics, video and audio, contacts and documents. Businesses are primarily concerned with managing unstructured data, because over 80 percent of enterprise data is unstructured and requires significant storage space and effort to manage. “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyse. Big data analytics is the area where advanced analytic techniques operate on big data sets. It is really about two things, big data and analytics, and how the two have teamed up to create one of the most profound trends in business intelligence (BI). MapReduce by itself is capable of analysing large distributed data sets, but the heterogeneity, velocity and volume of Big Data make it a challenge for traditional data analysis and management tools. A problem with Big Data is that it is often kept in NoSQL stores, which typically have no Data Description Language (DDL) and offer only limited support for transaction processing. Also, web-scale data is heterogeneous rather than uniform. For Big Data analysis, database integration and cleaning are much harder than in traditional mining approaches. Parallel processing and distributed computing, which are nearly non-existent in traditional RDBMS, are becoming standard procedure. MapReduce has the following characteristics: it supports parallel and distributed processing; it is simple; its architecture is shared-nothing and runs on large clusters of diverse commodity hardware; its functions are programmed in a high-level programming language (e.g. Java, Python); and it is flexible. Query processing is done through tools such as Hive, which runs on top of HDFS. Analytics helps to discover what has changed and the possible solutions. Advanced analytics is also the best way to discover more business opportunities and new customer segments, identify the best suppliers, associate products of affinity, understand sales seasonality, etc.[5][13]
  • 14. Benefits of Big Data:
    - Understand customer needs better.
    - Reduce cost.
    - Make processes more efficient.
    - Detect risks and check fraud.[14]
    Drawbacks of Big Data:
    - High maintenance.
    - Skill needed to access data.
    - Difficult to handle.
    - Violates the privacy principle.[14]
    Applications of Big Data:
    - Government
    - International development
    - Manufacturing
    - Cyber-physical models
    - Media
    - Technology
    - Private sector
    - Science and research.[15]
  • 15. Conclusion: Big Data analysis tools like MapReduce over Hadoop and HDFS promise to help organizations better understand their customers and the marketplace, hopefully leading to better business decisions and competitive advantages. The need to process enormous quantities of data has never been greater. Not only are terabyte- and petabyte-scale datasets rapidly becoming commonplace, but there is consensus that great value lies buried in them, waiting to be unlocked by the right computational tools. In the commercial sphere, business intelligence is driven by the ability to gather data from a dizzying array of sources. For engineers building information processing tools and applications, large and heterogeneous datasets that generate a continuous flow of data lead to more effective algorithms for a wide range of tasks, from machine translation to spam detection. In the natural and physical sciences, the ability to analyse massive amounts of data may provide the key to unlocking the secrets of the cosmos or the mysteries of life. MapReduce can be exploited to solve a variety of problems related to text processing at scales that would have been unthinkable a few years ago. We regard Big Data as an emerging trend, and the need for Big Data is arising in all science and engineering domains. With Big Data technologies, we will hopefully be able to provide the most relevant and most accurate social sensing feedback to better understand our society in real time. We can further stimulate the participation of public audiences in the data production circle for societal and economic events. The development of big data extends the scope of human activities. It demands proper attention from academia, industry and government. The world has been cooperating and integrating on a global scale. People are compelled to shift from a local to a global mode in their everyday life and work.
Big data redefines the relationships among individuals, businesses, organizations, governments, and societies through networked thinking, and it can further improve the human living environment, enhance the quality of public services, and improve performance, efficiency and productivity through intelligent, interactive operation. The technological progress and industrial upgrading of big data will create new markets, new business models and new industry rules, and more importantly it demonstrates the collective will of a country that is looking for strategic advantage. Although there is still a large gap before data intelligence approaches human wisdom, big data is a promising topic, and it certainly helps us to understand the world from an entirely new perspective.[16]
  • 16. REFERENCES:
    [1]. Ericsson, “Ericsson Mobility Report”, white paper, February 2015. Available at: http://www.ericsson.com/res/docs/2015/ericsson-mobility-report-feb-2015-interim.pdf
    [2]. NESSI, “Big Data: A New World of Opportunities”, white paper.
    [3]. Alonzo Williams, Stepanie Foor, “Big Data for Beginners” (book).
    [4]. Puneet Singh Duggal, Sanchita Paul, “Big Data Analysis: Challenges and Solutions”, International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV.
    [5]. Prashant Kumar, Khushboo Pandey, “Big Data and Distributed Data Mining: An Example of Future Networks”, International Journal of Advance Research and Innovation, Volume 1, Issue 2 (2013), 36-39.
    [6]. Eileen McNulty-Holmes, “Understanding Big Data: A Beginners Guide to Data Science & the Business Applications” (book).
    [7]. Jules J. Berman, “Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information” (book).
    [8]. http://www.sas.com/en_th/insights/big-data/what-is-big-data.html
  • 17.
    [9]. Feng Ye, Zhijian Wang, Fachao Zhou, Yapu Wang, Yuanchao Zhou, “Cloud-based Big Data Mining & Analyzing Services Platform integrating R”, 2013 International Conference on Advanced Cloud and Big Data.
    [10]. Hanna Yang, Minjeong Park, Minsu Cho, Minseok Song, Seongjoo Kim, “A System Architecture for Manufacturing Process Analysis based on Big Data and Process Mining Techniques”, 2014 IEEE International Conference on Big Data.
    [11]. Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding, “Data Mining with Big Data”, IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, January 2014.
    [12]. Sandy Moens, Emin Aksehirli, Bart Goethals, “Frequent Itemset Mining for Big Data”, 2013 IEEE International Conference on Big Data.
    [13]. Carson Kai-Sang Leung, Fan Jiang, “A Data Science Solution for Mining Interesting Patterns from Uncertain Big Data”, 2014 IEEE Fourth International Conference on Big Data and Cloud Computing.
    [14]. http://www.oii.ox.ac.uk/research/project/?id=98
    [15]. https://www.google.co.in/#q=applications+of+big+data+wikipedia
    [16]. Jason Venner, “Pro Hadoop: Build Scalable, Distributed Applications in the Cloud”, ISBN-13 (pbk): 978-1-4302-1942-2, ISBN-13 (electronic): 978-1-4302-1943-9.