An Comprehensive Study of Big Data Environment and its Challenges.

Big Data is a data analysis methodology enabled by recent advances in technologies and Architecture. Big data is a massive volume of both structured and unstructured data, which is so large that it's difficult to process with traditional database and software techniques. This paper provides insight to Big data and discusses its nature, definition that include such features as Volume, Velocity, and Variety .This paper also provides insight to source of big data generation, tools available for processing large volume of variety of data, applications of big data and challenges involved in handling big data

ISSN (e): 2250 – 3005 || Volume, 06 || Issue, 03||March – 2016 ||
International Journal of Computational Engineering Research (IJCER)
www.ijceronline.com Open Access Journal Page 62
An Comprehensive Study of Big Data Environment and its
Challenges.
Saravanan.C
Assistant Professor, Department of Master of Computer Applications, R.V.College of Engineering, Bangalore
I. Introduction
Big data is a term that refers to data sets or combinations of data sets whose size, complexity and rate of growth
make them difficult to be captured, managed, and processed by conventional technologies and tools, within the
time necessary to make them useful [1]. Big Data are high-volume, high-velocity, and high-variety information
assets that require new forms of processing to enable enhanced decision making and process optimization [2].
There is no clear definition for ‗Big Data‘. It is defined based on some of its characteristics. There are three
basic characteristics that can be used for defining big data namely volume, variety, and velocity [3] (figure 3).
Volume: It refers to size of the data such as Terabytes (TB), Petabytes (PB), Zettabytes (ZB), etc.
Variety : It refers to different types of data and sources of data. The different mediums that will produce big
data are sensors, devices, social networks, the web, mobile phones, etc.
Velocity: It refers to the frequency of data generation. and how data is generated frequently For example, every
millisecond, second, minute, hour, day, week, month, year. Processing frequency may also differ from the user
requirements.
When big data is effectively and efficiently captured, processed, and analyzed, companies are able to gain more
complete understanding of their business, customers, products, competitors which can lead to efficiency
improvements, increased sales, lower costs, improved customer service, and improved products and services.
II. Big Data Generation.
In Earlier generations, fairly small volume of data generated was available through limited number of channels.
Today an huge amount of data is frequently being produced and flowing from various sources, through different
channels, every minute in today‘s digital age [4].
Traditionally few Companies were generating data and all others were consuming data. Now the scenario has
been changed. All of us are generating data and everyone is consuming data. Big Data is generated by many
channels such as Social Media and Networks like Facebook, twitter, YouTube Flickr ,Scientific instruments
Mobile devices and Sensor Technology and networks
Figure 1. Shows the different channels that produce big data.
ABSTRACT
Big Data is a data analysis methodology enabled by recent advances in technologies and
Architecture. Big data is a massive volume of both structured and unstructured data, which is so large
that it's difficult to process with traditional database and software techniques. This paper provides
insight to Big data and discusses its nature, definition that include such features as Volume, Velocity,
and Variety .This paper also provides insight to source of big data generation, tools available for
processing large volume of variety of data, applications of big data and challenges involved in
handling big data .
Keywords: Big data, Challenges, Evolution, Structured data, unstructured data, Traditional
database.
An Comprehensive Study Of Big Data…
www.ijceronline.com Open Access Journal Page 63
Figure 1. Different Medium generating Big data.
The data generated is off different format namely Structured data ,Semi structure data and Unstructured data.
Data that resides in fixed fields is termed as Structured data. Data that does not exist in fixed fields is
unstructured data. Data that do not adapt to fixed fields but contain tags and other markers to separate data
elements is semi structured data. According to Gartner Research, it is predicted that enterprise data will grow
by 800 percent in five years, with 80 percent of data to be unstructured.[5]. According to recent survey of
Gartner, September ,2015 It presents that more than 75 Percent of Companies are Investing or Planning to Invest
in Big Data in the Next Two Years. Figure 2 shows the percentage of structured, unstructured and semi
structured digital data.
Figure 2. Percentage of Structured, Semi structure and unstructured data
to be generated in next 5 years
Table 1 describes the different data format and their sources from which the data is being generated.
Table 1. Sources of Data format
Sl.No Data Format Sources
1 Structured Data
Access, SQL, Spread sheet ,
Online Transaction processing (OLTP).
2 Semi Structured Data
Blog, XML, Emails, Electronic data Interchange
[EDI]
3 Unstructured Data
Social media data, chat messages, Location stream
data, Streamed Video, Audio/Music files, Images,
Sensor data
Global data volumes—flowing from social Web sites, sensors, smartphones are increasing faster than every two
years[6].The Amount of data stored in a country differs from one country to another. Fig 3 shows amount of
new data stored across the globe[7].
An Comprehensive Study Of Big Data…
www.ijceronline.com Open Access Journal Page 64
Figure 3:Amount of new data stored across globe
SOURCE:IDC storage reports; McKinsey Global Institute analysis
III. Tools Available for Big Data
There are various tools available for handling big data
3.1 Hadoop.: This is open source software platform helpful in storing and managing vast amounts of data
cheaply and efficiently. It has two main parts –
i) Data processing framework
ii) Distributed file system for data storage.
The distributed file system is array of storage clusters i,e the Hadoop component that holds the actual data
The data processing framework is the tool used to work with the data itself. By default, this is the Java-based
system known as Map Reduce. It is a general programming model and an associated implementation for
generating and processing large data sets. [8]
3.2 NOSQL
NoSQL databases are fast becoming an essential tool for managing big data while databases based on the
relational model guarantee certain properties which at first sight might seem more important or even necessary
nowadays it is impossible to handle certain volumes without relaxing some of them. It is precisely out of this
relaxing and out of the need to provide other properties that a new data management paradigm has arisen:
NoSQL. There are four different forms of NOSQL technologies namely Key value , document store, wide
column stores and graph databases.
Key-values Stores : This database uses hash table where there is a unique key and a pointer to a particular item
of data. The Key/value model is the simplest and easiest to implement. Tokyo, Redis, Voldemort, Oracle BDB,
Amazon Simple DB, Riak are few examples of Key value Stores databases.
3.3 Wide Column store
This type of database were created to store, process very large amounts of data distributed over many machines.
There are still keys but they point to multiple columns. Cassandra which is open source Nosql database [9] and
Hbase are examples of wide column store databases.
3.4 Document Databases
These databases are similar to key-value stores. The model is basically versioned documents that are collections
of other key-value collections. The semi-structured documents are stored in formats like JSON. CouchDB[12]
and MongoDb which is open source Nosql database[9] are examples for this type of databases.
3.5 Graph Databases
This type of database are built with nodes, relationships between notes and the properties of nodes. Instead of
tables of rows and columns and the rigid structure of SQL, a flexible graph model is used which can scale across
many machines. Neo4J, InfoGrid, Infinite Graph are examples of Graph databases.
IV .Applications
Big data technology has become part and parcel of our life. It can be applied in every fields namely
Optimization of IT Infrastructure ,Social Network Analysis, Optimization of Traffic flow in a city ,Web app
Optimization, Natural resource exploration, Weather Forecasting ,Health care outcomes, Fraud detection, Life
science research, Advertising analysis, Equipment monitoring, smart meter monitoring and so on.
An Comprehensive Study Of Big Data…
www.ijceronline.com Open Access Journal Page 65
V. Challenges In Handling Big Data
There are various challenges to be addressed when handling big data. Some of the inherited challenges in big
data are capture, storage, search, analysis, and virtualization. The other challenges are discussed below
i. Organizational Challenge
The management of the organization often lacks the understanding of the value in big data as well as how to
solve this value. Many organizations do not have the talent pool or experts to derive insights from the new
technology big data .According to Business Intelligence and Information Management Survey of 541 business
technology professionals, October 2012, 31% people are not sure how big data analytics will create business
opportunities for their business organization[10]
ii. Lack of Analytical Skills
There is scarcity of analytical and managerial skills to make best use of big data. It is found that United States
alone faces a shortage of 1,40,000 to 1,90,000 people with deep analytical skills as well as 1.5 million managers
and analysts to analyze big data and make decisions based on their findings.[11].According to Business
Intelligence and Information Management Survey of 541 business technology professionals, October 2012,
around 38% people find scarcity in big data expertise is scarce and find big data is expensive.[10].
iii. Lack of Analytical tools in big data platform
Big data can be managed by tools like Hadoop and Nosql. Analytical tools are lacking for big data
platforms.Hadoop and nosql technologies also lack management features [10].
iv. Design hurdles
The barrier is in technology i,e new architecture, algorithms, techniques are needed to handle or manage big
data.
v. Data discovery
Large volumes of different data can be combined and explored freely using a visual analytic development
environment. Data discovery provides a new insight to use the enriched data to make better-informed decisions.
The challenge lies in finding out high quality data from the vast collections of data that is available.
vi. Data volume
The capacity to process the high volume of data at an adequate speed so that the information is available to
decision makers when they are in need of it.
vii. Data integration
The capacity to integrate data which is dissimilar in nature,quickly with in short period of time from various
source at reasonable cost is challenging task
VI. Conclusion
In recent times, big data has invited a lot of attention from academia, industry, public as well as
government. The increasing volume and details of information captured by enterprises, the rise of multimedia,
social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.
Improvements in professional data management will result in better science and will benefit the social
community. A proposed direction for the future work could be the study of big data analytics, which has become
an immense tool affecting every part of the economy.
References
[1]. Mukherjee A. Datta, J. , Jorapur, R. ,Singhvi, R., Haloi, S., Akram, W. ―Shared disk big data analytics
with Apache Hadoop‖ High Performance Computing (HiPC), 2012 19th
International Conference.
[2] Douglas and Laney, ―The importance of ‗big data‘: A definition,‖ 2008.
[3] D. Laney, ―Application Delivery Strategies‖, http://blogs.gartner.com/douglaney/files/2012/01/ad949-
3D-Data- Management-Controlling Business Intelligence and Information Management Survey
Data-Volume-Velocity-and-‗ Variety.pdf
[4] King, Gary. ―Ensuring the Data-Rich Future of Social Science.‖ Science Mag 331, Feb 2011, 719-721 .
[5] Win with Advanced Business Analytics.Creating Business Value from Your Data, Wiley and SAS
Business Series,Oct12.
[6]. Jacques Bughin, Michael Chui, and James Manyik,‖Ten IT-enabled business trends for the decade
ahead‖, McKinsey & Company, Tech.rep June 2013.
An Comprehensive Study Of Big Data…
www.ijceronline.com Open Access Journal Page 66
[7]. James Manyika et al., "Big data: The next frontier for innovation, competition, and productivity,"
McKinsey Global
Institute, Tech. rep. May 2011.
[8] JeffreyDean,SanjayGhemawat, ‖ MapReduce: Simplified Data Processing on Large Clusters‖ USENIX
Association OSDI ‘
04: 6th
Symposium On Operating Systems Design and Implementation
[9] Van der Veen, J.S , van der Waaij. B , Meijer, R.J. Sensor Data Storage Performance: SQL or NoSQL,
Physical or Virtual.
In Proceedings of IEEE 5th International Conference on cloud Computing (CLOUD 2012),
Nice,France ,22-27 July 2012;
pp. 431–438
[10]. InformationWeek 2013 Analytics, Business Intelligence and Information Management Survey of 541
business technology
professionals, October 2012
[11] Big data: The next frontier for innovation, ompetition and productivity , McKinsey Global Institute,
June 2011.
[12]. ―Post-relational data management for Ineractive software systems, NoSQL database Technology,
www.Couchbase.com.

Recommandé

A Review on Classification of Data Imbalance using BigData par
A Review on Classification of Data Imbalance using BigDataA Review on Classification of Data Imbalance using BigData
A Review on Classification of Data Imbalance using BigDataIJMIT JOURNAL
49 vues14 diapositives
Data mining & big data presentation 01 par
Data mining & big data presentation 01Data mining & big data presentation 01
Data mining & big data presentation 01Aseem Chakrabarthy
1.4K vues15 diapositives
BIG Data and Methodology-A review par
BIG Data and Methodology-A reviewBIG Data and Methodology-A review
BIG Data and Methodology-A reviewShilpa Soi
4.7K vues5 diapositives
Moving Toward Big Data: Challenges, Trends and Perspectives par
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesIJRESJOURNAL
96 vues6 diapositives
Elementary Concepts of data minig par
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minigDr Anjan Krishnamurthy
944 vues16 diapositives
Intro to big data and applications - day 1 par
Intro to big data and applications - day 1Intro to big data and applications - day 1
Intro to big data and applications - day 1Parviz Vakili
92 vues34 diapositives

Contenu connexe

Tendances

Big data par
Big dataBig data
Big dataHarsh Kishore Mishra
8.6K vues24 diapositives
Research issues in the big data and its Challenges par
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
1.4K vues15 diapositives
Data Mining and Big Data Challenges and Research Opportunities par
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research OpportunitiesKathirvel Ayyaswamy
3.2K vues118 diapositives
Big Data & Data Mining par
Big Data & Data MiningBig Data & Data Mining
Big Data & Data MiningMd Mizanur Rahman
768 vues17 diapositives
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...ijdpsjournal
51 vues13 diapositives
challenges of big data to big data mining with their processing framework par
challenges of big data to big data mining with their processing frameworkchallenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing frameworkKamleshKumar394
343 vues12 diapositives

Tendances(19)

Research issues in the big data and its Challenges par Kathirvel Ayyaswamy
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
Data Mining and Big Data Challenges and Research Opportunities par Kathirvel Ayyaswamy
Data Mining and Big Data Challenges and Research OpportunitiesData Mining and Big Data Challenges and Research Opportunities
Data Mining and Big Data Challenges and Research Opportunities
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par ijdpsjournal
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
ijdpsjournal51 vues
challenges of big data to big data mining with their processing framework par KamleshKumar394
challenges of big data to big data mining with their processing frameworkchallenges of big data to big data mining with their processing framework
challenges of big data to big data mining with their processing framework
KamleshKumar394343 vues
Lecture1 introduction to big data par hktripathy
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy5K vues
Big data issues challenges tools n good practices par Ravi Ganghas
Big data issues challenges tools n good practicesBig data issues challenges tools n good practices
Big data issues challenges tools n good practices
Ravi Ganghas3.1K vues
IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I... par IRJET Journal
IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...
IRJET-Unraveling the Data Structures of Big data, the HDFS Architecture and I...
IRJET Journal23 vues
A REVIEW ON CLASSIFICATION OF DATA IMBALANCE USING BIGDATA par IJMIT JOURNAL
A REVIEW ON CLASSIFICATION OF DATA IMBALANCE USING BIGDATAA REVIEW ON CLASSIFICATION OF DATA IMBALANCE USING BIGDATA
A REVIEW ON CLASSIFICATION OF DATA IMBALANCE USING BIGDATA
IJMIT JOURNAL17 vues
Introduction to-data-mining chapter 1 par Mahmoud Alfarra
Introduction to-data-mining  chapter 1Introduction to-data-mining  chapter 1
Introduction to-data-mining chapter 1
Mahmoud Alfarra1.7K vues
Big Data Paradigm - Analysis, Application and Challenges par Uyoyo Edosio
Big Data Paradigm - Analysis, Application and ChallengesBig Data Paradigm - Analysis, Application and Challenges
Big Data Paradigm - Analysis, Application and Challenges
Uyoyo Edosio4.3K vues

En vedette

Experimental Investigation of Multi Aerofoil Configurations Using Propeller T... par
Experimental Investigation of Multi Aerofoil Configurations Using Propeller T...Experimental Investigation of Multi Aerofoil Configurations Using Propeller T...
Experimental Investigation of Multi Aerofoil Configurations Using Propeller T...ijceronline
38 vues3 diapositives
Automation of Wheelchair Using Iris Movement par
Automation of Wheelchair Using Iris MovementAutomation of Wheelchair Using Iris Movement
Automation of Wheelchair Using Iris Movementijceronline
41 vues4 diapositives
Textile application in bio-potential recording used for medicinal purposes: a... par
Textile application in bio-potential recording used for medicinal purposes: a...Textile application in bio-potential recording used for medicinal purposes: a...
Textile application in bio-potential recording used for medicinal purposes: a...ijceronline
29 vues5 diapositives
On intuitionistic fuzzy 휷 generalized closed sets par
On intuitionistic fuzzy 휷 generalized closed setsOn intuitionistic fuzzy 휷 generalized closed sets
On intuitionistic fuzzy 휷 generalized closed setsijceronline
33 vues6 diapositives
A survey on multiple access technologies beyond fourth generation wireless co... par
A survey on multiple access technologies beyond fourth generation wireless co...A survey on multiple access technologies beyond fourth generation wireless co...
A survey on multiple access technologies beyond fourth generation wireless co...ijceronline
37 vues8 diapositives
Robotic Soldier with EM Gun using Bluetooth Module par
Robotic Soldier with EM Gun using Bluetooth ModuleRobotic Soldier with EM Gun using Bluetooth Module
Robotic Soldier with EM Gun using Bluetooth Moduleijceronline
19 vues3 diapositives

En vedette(20)

Experimental Investigation of Multi Aerofoil Configurations Using Propeller T... par ijceronline
Experimental Investigation of Multi Aerofoil Configurations Using Propeller T...Experimental Investigation of Multi Aerofoil Configurations Using Propeller T...
Experimental Investigation of Multi Aerofoil Configurations Using Propeller T...
ijceronline38 vues
Automation of Wheelchair Using Iris Movement par ijceronline
Automation of Wheelchair Using Iris MovementAutomation of Wheelchair Using Iris Movement
Automation of Wheelchair Using Iris Movement
ijceronline41 vues
Textile application in bio-potential recording used for medicinal purposes: a... par ijceronline
Textile application in bio-potential recording used for medicinal purposes: a...Textile application in bio-potential recording used for medicinal purposes: a...
Textile application in bio-potential recording used for medicinal purposes: a...
ijceronline29 vues
On intuitionistic fuzzy 휷 generalized closed sets par ijceronline
On intuitionistic fuzzy 휷 generalized closed setsOn intuitionistic fuzzy 휷 generalized closed sets
On intuitionistic fuzzy 휷 generalized closed sets
ijceronline33 vues
A survey on multiple access technologies beyond fourth generation wireless co... par ijceronline
A survey on multiple access technologies beyond fourth generation wireless co...A survey on multiple access technologies beyond fourth generation wireless co...
A survey on multiple access technologies beyond fourth generation wireless co...
ijceronline37 vues
Robotic Soldier with EM Gun using Bluetooth Module par ijceronline
Robotic Soldier with EM Gun using Bluetooth ModuleRobotic Soldier with EM Gun using Bluetooth Module
Robotic Soldier with EM Gun using Bluetooth Module
ijceronline19 vues
A NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONS par ijceronline
A NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONSA NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONS
A NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONS
ijceronline18 vues
Maintenance and Performance Analysis of Draglines Used In Mines par ijceronline
Maintenance and Performance Analysis of Draglines Used In MinesMaintenance and Performance Analysis of Draglines Used In Mines
Maintenance and Performance Analysis of Draglines Used In Mines
ijceronline34 vues
The Parallel Architecture Approach, Single Program Multiple Data (Spmd) Imple... par ijceronline
The Parallel Architecture Approach, Single Program Multiple Data (Spmd) Imple...The Parallel Architecture Approach, Single Program Multiple Data (Spmd) Imple...
The Parallel Architecture Approach, Single Program Multiple Data (Spmd) Imple...
ijceronline47 vues
Market Challenges for Pumped Storage Hydropower Plants par ijceronline
Market Challenges for Pumped Storage Hydropower PlantsMarket Challenges for Pumped Storage Hydropower Plants
Market Challenges for Pumped Storage Hydropower Plants
ijceronline35 vues
Double feedback technique for reduction of Noise LNA with gain enhancement par ijceronline
Double feedback technique for reduction of Noise LNA with gain enhancementDouble feedback technique for reduction of Noise LNA with gain enhancement
Double feedback technique for reduction of Noise LNA with gain enhancement
ijceronline23 vues
Implementation of Linear Controller for a DC-DC Forward Converter par ijceronline
Implementation of Linear Controller for a DC-DC Forward ConverterImplementation of Linear Controller for a DC-DC Forward Converter
Implementation of Linear Controller for a DC-DC Forward Converter
ijceronline25 vues
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme... par ijceronline
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
ijceronline38 vues
Secure Spectrum Sensing In Cognitive Radio Sensor Networks: A Survey par ijceronline
Secure Spectrum Sensing In Cognitive Radio Sensor Networks: A SurveySecure Spectrum Sensing In Cognitive Radio Sensor Networks: A Survey
Secure Spectrum Sensing In Cognitive Radio Sensor Networks: A Survey
ijceronline34 vues
OTSU Thresholding Method for Flower Image Segmentation par ijceronline
OTSU Thresholding Method for Flower Image SegmentationOTSU Thresholding Method for Flower Image Segmentation
OTSU Thresholding Method for Flower Image Segmentation
ijceronline48 vues
A NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONS par ijceronline
A NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONSA NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONS
A NOVEL BOOTH WALLACE MULTIPLIER FOR DSP APPLICATIONS
ijceronline19 vues
Experimental Study of Some Agro-Industrial Residues Targeted As Partial Subst... par ijceronline
Experimental Study of Some Agro-Industrial Residues Targeted As Partial Subst...Experimental Study of Some Agro-Industrial Residues Targeted As Partial Subst...
Experimental Study of Some Agro-Industrial Residues Targeted As Partial Subst...
ijceronline15 vues
Reactive Power Compensation in 132kv & 33kv Grid of Narsinghpur Area par ijceronline
Reactive Power Compensation in 132kv & 33kv Grid of Narsinghpur AreaReactive Power Compensation in 132kv & 33kv Grid of Narsinghpur Area
Reactive Power Compensation in 132kv & 33kv Grid of Narsinghpur Area
ijceronline34 vues
Lightning Acquisition and Processing On Sensor Node Using NI cRIO par ijceronline
Lightning Acquisition and Processing On Sensor Node Using NI cRIOLightning Acquisition and Processing On Sensor Node Using NI cRIO
Lightning Acquisition and Processing On Sensor Node Using NI cRIO
ijceronline51 vues

Similaire à An Comprehensive Study of Big Data Environment and its Challenges.

RESEARCH IN BIG DATA – AN OVERVIEW par
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal
381 vues20 diapositives
RESEARCH IN BIG DATA – AN OVERVIEW par
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal
5 vues20 diapositives
RESEARCH IN BIG DATA – AN OVERVIEW par
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEWieijjournal1
52 vues20 diapositives
Research in Big Data - An Overview par
Research in Big Data - An OverviewResearch in Big Data - An Overview
Research in Big Data - An Overviewieijjournal
61 vues20 diapositives
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO par
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIOA REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIORobin Beregovska
4 vues8 diapositives
Big Data A Review par
Big Data A ReviewBig Data A Review
Big Data A Reviewijtsrd
10 vues7 diapositives

Similaire à An Comprehensive Study of Big Data Environment and its Challenges.(20)

RESEARCH IN BIG DATA – AN OVERVIEW par ieijjournal
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
ieijjournal381 vues
RESEARCH IN BIG DATA – AN OVERVIEW par ieijjournal
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
ieijjournal5 vues
RESEARCH IN BIG DATA – AN OVERVIEW par ieijjournal1
RESEARCH IN BIG DATA – AN OVERVIEWRESEARCH IN BIG DATA – AN OVERVIEW
RESEARCH IN BIG DATA – AN OVERVIEW
ieijjournal152 vues
Research in Big Data - An Overview par ieijjournal
Research in Big Data - An OverviewResearch in Big Data - An Overview
Research in Big Data - An Overview
ieijjournal61 vues
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO par Robin Beregovska
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIOA REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
Big Data A Review par ijtsrd
Big Data A ReviewBig Data A Review
Big Data A Review
ijtsrd10 vues
AN ANALYTICAL REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO par Dereck Downing
AN ANALYTICAL REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIOAN ANALYTICAL REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
AN ANALYTICAL REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
Big data is a broad term for data sets so large or complex that tr.docx par hartrobert670
Big data is a broad term for data sets so large or complex that tr.docxBig data is a broad term for data sets so large or complex that tr.docx
Big data is a broad term for data sets so large or complex that tr.docx
hartrobert6704 vues
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU par Love Arora
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Love Arora826 vues
Big data - what, why, where, when and how par bobosenthil
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
bobosenthil6.2K vues
Big Data Mining - Classification, Techniques and Issues par Karan Deep Singh
Big Data Mining - Classification, Techniques and IssuesBig Data Mining - Classification, Techniques and Issues
Big Data Mining - Classification, Techniques and Issues
Karan Deep Singh459 vues
The Big Data Importance – Tools and their Usage par IRJET Journal
The Big Data Importance – Tools and their UsageThe Big Data Importance – Tools and their Usage
The Big Data Importance – Tools and their Usage
IRJET Journal32 vues
Data modeling techniques used for big data in enterprise networks par Dr. Richard Otieno
Data modeling techniques used for big data in enterprise networksData modeling techniques used for big data in enterprise networks
Data modeling techniques used for big data in enterprise networks
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANC... par ijdpsjournal
LEVERAGING CLOUD BASED BIG DATA ANALYTICS  IN KNOWLEDGE MANAGEMENT FOR ENHANC...LEVERAGING CLOUD BASED BIG DATA ANALYTICS  IN KNOWLEDGE MANAGEMENT FOR ENHANC...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANC...
ijdpsjournal5 vues
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par ijdpsjournal
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
ijdpsjournal20 vues
introduction to data science par Johnson Ubah
introduction to data scienceintroduction to data science
introduction to data science
Johnson Ubah36 vues
Big data service architecture: a survey par ssuser0191d4
Big data service architecture: a surveyBig data service architecture: a survey
Big data service architecture: a survey
ssuser0191d470 vues

Dernier

Literature review and Case study on Commercial Complex in Nepal, Durbar mall,... par
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...AakashShakya12
72 vues115 diapositives
Object Oriented Programming with JAVA par
Object Oriented Programming with JAVAObject Oriented Programming with JAVA
Object Oriented Programming with JAVADemian Antony D'Mello
140 vues28 diapositives
DevOps-ITverse-2023-IIT-DU.pptx par
DevOps-ITverse-2023-IIT-DU.pptxDevOps-ITverse-2023-IIT-DU.pptx
DevOps-ITverse-2023-IIT-DU.pptxAnowar Hossain
9 vues45 diapositives
What is Whirling Hygrometer.pdf par
What is Whirling Hygrometer.pdfWhat is Whirling Hygrometer.pdf
What is Whirling Hygrometer.pdfIIT KHARAGPUR
12 vues3 diapositives
GDSC Mikroskil Members Onboarding 2023.pdf par
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdfgdscmikroskil
51 vues62 diapositives
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx par
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptxlwang78
53 vues19 diapositives

Dernier(20)

Literature review and Case study on Commercial Complex in Nepal, Durbar mall,... par AakashShakya12
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...
Literature review and Case study on Commercial Complex in Nepal, Durbar mall,...
AakashShakya1272 vues
GDSC Mikroskil Members Onboarding 2023.pdf par gdscmikroskil
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdf
gdscmikroskil51 vues
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx par lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang7853 vues
DevOps to DevSecOps: Enhancing Software Security Throughout The Development L... par Anowar Hossain
DevOps to DevSecOps: Enhancing Software Security Throughout The Development L...DevOps to DevSecOps: Enhancing Software Security Throughout The Development L...
DevOps to DevSecOps: Enhancing Software Security Throughout The Development L...
Anowar Hossain13 vues
Effect of deep chemical mixing columns on properties of surrounding soft clay... par AltinKaradagli
Effect of deep chemical mixing columns on properties of surrounding soft clay...Effect of deep chemical mixing columns on properties of surrounding soft clay...
Effect of deep chemical mixing columns on properties of surrounding soft clay...
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ... par AltinKaradagli
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) par Tsuyoshi Horigome
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)

An Comprehensive Study of Big Data Environment and its Challenges.

  • 1. ISSN (e): 2250 – 3005 || Volume, 06 || Issue, 03||March – 2016 || International Journal of Computational Engineering Research (IJCER) www.ijceronline.com Open Access Journal Page 62 An Comprehensive Study of Big Data Environment and its Challenges. Saravanan.C Assistant Professor, Department of Master of Computer Applications, R.V.College of Engineering, Bangalore I. Introduction Big data is a term that refers to data sets or combinations of data sets whose size, complexity and rate of growth make them difficult to be captured, managed, and processed by conventional technologies and tools, within the time necessary to make them useful [1]. Big Data are high-volume, high-velocity, and high-variety information assets that require new forms of processing to enable enhanced decision making and process optimization [2]. There is no clear definition for ‗Big Data‘. It is defined based on some of its characteristics. There are three basic characteristics that can be used for defining big data namely volume, variety, and velocity [3] (figure 3). Volume: It refers to size of the data such as Terabytes (TB), Petabytes (PB), Zettabytes (ZB), etc. Variety : It refers to different types of data and sources of data. The different mediums that will produce big data are sensors, devices, social networks, the web, mobile phones, etc. Velocity: It refers to the frequency of data generation. and how data is generated frequently For example, every millisecond, second, minute, hour, day, week, month, year. Processing frequency may also differ from the user requirements. When big data is effectively and efficiently captured, processed, and analyzed, companies are able to gain more complete understanding of their business, customers, products, competitors which can lead to efficiency improvements, increased sales, lower costs, improved customer service, and improved products and services. II. Big Data Generation. In Earlier generations, fairly small volume of data generated was available through limited number of channels. Today an huge amount of data is frequently being produced and flowing from various sources, through different channels, every minute in today‘s digital age [4]. Traditionally few Companies were generating data and all others were consuming data. Now the scenario has been changed. All of us are generating data and everyone is consuming data. Big Data is generated by many channels such as Social Media and Networks like Facebook, twitter, YouTube Flickr ,Scientific instruments Mobile devices and Sensor Technology and networks Figure 1. Shows the different channels that produce big data. ABSTRACT Big Data is a data analysis methodology enabled by recent advances in technologies and Architecture. Big data is a massive volume of both structured and unstructured data, which is so large that it's difficult to process with traditional database and software techniques. This paper provides insight to Big data and discusses its nature, definition that include such features as Volume, Velocity, and Variety .This paper also provides insight to source of big data generation, tools available for processing large volume of variety of data, applications of big data and challenges involved in handling big data . Keywords: Big data, Challenges, Evolution, Structured data, unstructured data, Traditional database.
  • 2. An Comprehensive Study Of Big Data… www.ijceronline.com Open Access Journal Page 63 Figure 1. Different Medium generating Big data. The data generated is off different format namely Structured data ,Semi structure data and Unstructured data. Data that resides in fixed fields is termed as Structured data. Data that does not exist in fixed fields is unstructured data. Data that do not adapt to fixed fields but contain tags and other markers to separate data elements is semi structured data. According to Gartner Research, it is predicted that enterprise data will grow by 800 percent in five years, with 80 percent of data to be unstructured.[5]. According to recent survey of Gartner, September ,2015 It presents that more than 75 Percent of Companies are Investing or Planning to Invest in Big Data in the Next Two Years. Figure 2 shows the percentage of structured, unstructured and semi structured digital data. Figure 2. Percentage of Structured, Semi structure and unstructured data to be generated in next 5 years Table 1 describes the different data format and their sources from which the data is being generated. Table 1. Sources of Data format Sl.No Data Format Sources 1 Structured Data Access, SQL, Spread sheet , Online Transaction processing (OLTP). 2 Semi Structured Data Blog, XML, Emails, Electronic data Interchange [EDI] 3 Unstructured Data Social media data, chat messages, Location stream data, Streamed Video, Audio/Music files, Images, Sensor data Global data volumes—flowing from social Web sites, sensors, smartphones are increasing faster than every two years[6].The Amount of data stored in a country differs from one country to another. Fig 3 shows amount of new data stored across the globe[7].
  • 3. An Comprehensive Study Of Big Data… www.ijceronline.com Open Access Journal Page 64 Figure 3:Amount of new data stored across globe SOURCE:IDC storage reports; McKinsey Global Institute analysis III. Tools Available for Big Data There are various tools available for handling big data 3.1 Hadoop.: This is open source software platform helpful in storing and managing vast amounts of data cheaply and efficiently. It has two main parts – i) Data processing framework ii) Distributed file system for data storage. The distributed file system is array of storage clusters i,e the Hadoop component that holds the actual data The data processing framework is the tool used to work with the data itself. By default, this is the Java-based system known as Map Reduce. It is a general programming model and an associated implementation for generating and processing large data sets. [8] 3.2 NOSQL NoSQL databases are fast becoming an essential tool for managing big data while databases based on the relational model guarantee certain properties which at first sight might seem more important or even necessary nowadays it is impossible to handle certain volumes without relaxing some of them. It is precisely out of this relaxing and out of the need to provide other properties that a new data management paradigm has arisen: NoSQL. There are four different forms of NOSQL technologies namely Key value , document store, wide column stores and graph databases. Key-values Stores : This database uses hash table where there is a unique key and a pointer to a particular item of data. The Key/value model is the simplest and easiest to implement. Tokyo, Redis, Voldemort, Oracle BDB, Amazon Simple DB, Riak are few examples of Key value Stores databases. 3.3 Wide Column store This type of database were created to store, process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. Cassandra which is open source Nosql database [9] and Hbase are examples of wide column store databases. 3.4 Document Databases These databases are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. CouchDB[12] and MongoDb which is open source Nosql database[9] are examples for this type of databases. 3.5 Graph Databases This type of database are built with nodes, relationships between notes and the properties of nodes. Instead of tables of rows and columns and the rigid structure of SQL, a flexible graph model is used which can scale across many machines. Neo4J, InfoGrid, Infinite Graph are examples of Graph databases. IV .Applications Big data technology has become part and parcel of our life. It can be applied in every fields namely Optimization of IT Infrastructure ,Social Network Analysis, Optimization of Traffic flow in a city ,Web app Optimization, Natural resource exploration, Weather Forecasting ,Health care outcomes, Fraud detection, Life science research, Advertising analysis, Equipment monitoring, smart meter monitoring and so on.
  • 4. An Comprehensive Study Of Big Data… www.ijceronline.com Open Access Journal Page 65 V. Challenges In Handling Big Data There are various challenges to be addressed when handling big data. Some of the inherited challenges in big data are capture, storage, search, analysis, and virtualization. The other challenges are discussed below i. Organizational Challenge The management of the organization often lacks the understanding of the value in big data as well as how to solve this value. Many organizations do not have the talent pool or experts to derive insights from the new technology big data .According to Business Intelligence and Information Management Survey of 541 business technology professionals, October 2012, 31% people are not sure how big data analytics will create business opportunities for their business organization[10] ii. Lack of Analytical Skills There is scarcity of analytical and managerial skills to make best use of big data. It is found that United States alone faces a shortage of 1,40,000 to 1,90,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.[11].According to Business Intelligence and Information Management Survey of 541 business technology professionals, October 2012, around 38% people find scarcity in big data expertise is scarce and find big data is expensive.[10]. iii. Lack of Analytical tools in big data platform Big data can be managed by tools like Hadoop and Nosql. Analytical tools are lacking for big data platforms.Hadoop and nosql technologies also lack management features [10]. iv. Design hurdles The barrier is in technology i,e new architecture, algorithms, techniques are needed to handle or manage big data. v. Data discovery Large volumes of different data can be combined and explored freely using a visual analytic development environment. Data discovery provides a new insight to use the enriched data to make better-informed decisions. The challenge lies in finding out high quality data from the vast collections of data that is available. vi. Data volume The capacity to process the high volume of data at an adequate speed so that the information is available to decision makers when they are in need of it. vii. Data integration The capacity to integrate data which is dissimilar in nature,quickly with in short period of time from various source at reasonable cost is challenging task VI. Conclusion In recent times, big data has invited a lot of attention from academia, industry, public as well as government. The increasing volume and details of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future. Improvements in professional data management will result in better science and will benefit the social community. A proposed direction for the future work could be the study of big data analytics, which has become an immense tool affecting every part of the economy. References [1]. Mukherjee A. Datta, J. , Jorapur, R. ,Singhvi, R., Haloi, S., Akram, W. ―Shared disk big data analytics with Apache Hadoop‖ High Performance Computing (HiPC), 2012 19th International Conference. [2] Douglas and Laney, ―The importance of ‗big data‘: A definition,‖ 2008. [3] D. Laney, ―Application Delivery Strategies‖, http://blogs.gartner.com/douglaney/files/2012/01/ad949- 3D-Data- Management-Controlling Business Intelligence and Information Management Survey Data-Volume-Velocity-and-‗ Variety.pdf [4] King, Gary. ―Ensuring the Data-Rich Future of Social Science.‖ Science Mag 331, Feb 2011, 719-721 . [5] Win with Advanced Business Analytics.Creating Business Value from Your Data, Wiley and SAS Business Series,Oct12. [6]. Jacques Bughin, Michael Chui, and James Manyik,‖Ten IT-enabled business trends for the decade ahead‖, McKinsey & Company, Tech.rep June 2013.
  • 5. An Comprehensive Study Of Big Data… www.ijceronline.com Open Access Journal Page 66 [7]. James Manyika et al., "Big data: The next frontier for innovation, competition, and productivity," McKinsey Global Institute, Tech. rep. May 2011. [8] JeffreyDean,SanjayGhemawat, ‖ MapReduce: Simplified Data Processing on Large Clusters‖ USENIX Association OSDI ‘ 04: 6th Symposium On Operating Systems Design and Implementation [9] Van der Veen, J.S , van der Waaij. B , Meijer, R.J. Sensor Data Storage Performance: SQL or NoSQL, Physical or Virtual. In Proceedings of IEEE 5th International Conference on cloud Computing (CLOUD 2012), Nice,France ,22-27 July 2012; pp. 431–438 [10]. InformationWeek 2013 Analytics, Business Intelligence and Information Management Survey of 541 business technology professionals, October 2012 [11] Big data: The next frontier for innovation, ompetition and productivity , McKinsey Global Institute, June 2011. [12]. ―Post-relational data management for Ineractive software systems, NoSQL database Technology, www.Couchbase.com.