REPORT: BIG DATA
UNIVERSITY OF LEICESTER
Data Analysis for Business Intelligence MSc
BIG DATA
We have heard the term “flood” used with money, people or new
technologies. The same idea has now given rise to a new term that describes a
flood of data. In 1965 Moore observed that the number of transistors on a dense
integrated circuit doubles at a regular interval, a rate he later revised to
approximately every two years. This has held broadly true, and as a result
devices that fit in our hand and serve as personal computers have been
invented. We are surrounded by electronic machines. We may not realise it, but
at some moment of the day each of us is monitored by an electronic device, be
it a mobile phone, CCTV, a weighing machine or a computer, and the list never
ends. These devices in turn generate data, and the volume is so huge that we
have come to call it Big Data.
Big Data is a vast repository of data whose size is beyond the ability
of conventional databases to handle. The size of such a repository cannot be
fixed, since it grows every second. For instance, the social networking site
Facebook collects more than 500 terabytes of data every day. But this data is
just a collection of facts and events from day-to-day life. Data neither lies
nor tells the truth on its own. We need to understand what the data can tell
us, and the idea extracted from it is called information. Information tells us
what the data means. Information in turn is useless unless we can make use of
it, and its use is to change our actions. So we need to provide insight into
how the information will help us achieve our goal.
Data has swept into every industry and business sector. The McKinsey
Global Institute (MGI) estimated that enterprises globally stored 7 exabytes of
data in 2010, while consumers stored 6 exabytes. MGI also estimated that if US
health care used big data effectively, the potential value from data in this
sector could be more than $300 billion every year. European organisations
hold approximately 11 exabytes of data, and making efficient use of it could
generate nearly $149 billion in operational efficiency improvements. In the
near term there is also huge potential to leverage big data in developing
countries.
An organization can leverage big data to improve its design and
functionality. Big data can create value in several ways. It can create
transparency: simply making data available to stakeholders in a timely manner
can create tremendous value, and making data readily available to all
departments can steeply reduce search and processing time. It can also help
organizations experiment to discover needs. As most data is stored in digital
format, one can discover whether a product needs to be changed. Big data
enables us to segment populations according to their needs and to deliver
customized actions. It can also support decision making with automated
algorithms that minimize risk and dig up valuable insight; for instance, tax
agencies can use automated risk engines. Manufacturers are using data obtained
from current products to improve the development of new ones. Big data has
also created entirely new categories of companies, such as those that
aggregate and analyse industry data and provide useful information and insight
to manufacturing or financial companies. The value of big data can be measured
by estimating the total value created by taking a particular action with its
use. But to capture the full potential of big data, several issues have to be
addressed. Organizations have to consider the legal aspects of handling and
analysing data, and there is no room for an information breach, which can have
serious consequences. An organization that handles a nation’s data needs to
take care that none of the information is exposed, as this might cause a very
high loss to the nation. Organizations also have to be extremely careful when
analysing health care data, as a wrong prescription might cost someone’s life.
Finally, companies have to hire new personnel who understand big data.
A wide variety of techniques have been developed that can be applied
to big data to get useful insight from it, and researchers continue to develop
new ones. The set of techniques used to extract patterns from large databases
by combining methods from statistics, machine learning and database management
is called Data Mining. Association rule learning is a technique used to
discover interesting relationships among different variables in large
databases; it can help to determine which products are frequently bought
together. To study the buying strategy of a customer, or to determine the
product a customer consumes most, a technique called Classification is used.
It learns categories from existing data that has already been classified and
then predicts the category of new data; because it relies on labelled
examples, it is a form of Supervised Learning. Cluster Analysis is another
statistical method, used to group similar objects whose characteristics of
similarity are not known in advance; it is a form of Unsupervised Learning. A
technique for collecting data submitted by a large group of people through an
open call is known as Crowdsourcing. Analysing data from a single source might
not be of great use, so it is more efficient to take multiple sources into
consideration, which is called Data fusion and data integration. Natural
language processing can be used to analyse data from social media websites
such as Twitter and Facebook. The idea of natural evolution, that is “survival
of the fittest”, can be used to optimize the parameters of business or
manufacturing models; this technique is called a Genetic Algorithm. Finally, a
technique called spatial analysis is used to analyse geographic properties,
which helps in decisions such as the selection of manufacturing sites.
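As an illustration of association rule learning, the sketch below counts how
often pairs of products appear in the same basket (support) and how often one
product appears given another (confidence). The baskets, product names and
function names are invented for this example; it is a minimal sketch rather
than a full rule-mining algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]


def pair_support(transactions):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is bought."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


support = pair_support(baskets)
print(support[("bread", "milk")])               # share of baskets with both
print(confidence(baskets, "butter", "bread"))   # bread given butter
```

In this toy data every basket that contains butter also contains bread, so the
rule "butter implies bread" has the highest possible confidence, which is the
kind of relationship a retailer would look for.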
To support these data mining techniques, many technologies have been
developed. Google developed Bigtable to store data in compressed form on the
Google File System. Business Intelligence (BI) is a type of application
software used to report, analyse and present data. It reads data that has
previously been stored in a data warehouse and enables users to analyse it.
The computing paradigm that provides highly scalable computing resources
through a network is known as cloud computing. Hadoop is an open source
software framework for processing huge datasets on distributed systems,
managed by the Apache Software Foundation. Data can be structured, that is,
data that resides in fixed fields, as in a spreadsheet. In contrast,
unstructured data includes free-form text and untagged audio, image and video
data. All the analyses and the information drawn from them will be in vain if
we are unable to present them to people in an easily consumable form. So
proper visualization is a key challenge that must be met if the results of the
analyses are to lead to proper action.
Before the internet revolution, the methods used to mine data were
restricted to small data sets with little variability in data types. In the
information age, however, accumulating data has become easier and less costly.
It is estimated that the amount of information stored doubles every twenty
months. Making effective use of this data is the new challenge we face. The
automatic, exploratory analysis and modelling of large data repositories is
known as Knowledge Discovery in Databases (KDD). It is an approach for
identifying understandable patterns in large data sets. The KDD process starts
with determining the goal of a particular project and ends with the
implementation of the discovered knowledge. It is a nine-step process,
starting with a managerial step. First, an understanding of the application
domain is developed. This sets the scene for understanding what should be done
with the available decision variables. The people involved in this step need
to understand the requirements of the end user and the environment in which
knowledge discovery will take place. After that, the data sets need to be
pre-processed. For this we need to determine which data is available and which
will be used for the particular application. For the process to succeed we
should consider all the relevant data available, because if some important
attributes are missed the whole process may fail. Then we clean the data, that
is, we handle missing values and remove noise and outliers. This step enhances
data reliability. For example, if one suspects that a certain less important
attribute is unreliable or has a lot of missing data, then ignoring that
attribute is a smart choice. But if the attribute is dominant for the
application, we can make it the goal of a supervised data mining algorithm and
predict the missing values. Certain attributes may not be useful or may not
affect the goal of the application, yet a person may not be able to spot them.
So after cleaning we have a data transformation step. This includes methods of
data reduction, such as record sampling and feature selection, and attribute
transformation, such as discretization of numerical attributes and functional
transformation. This step is crucial for the success of the entire KDD process
but is usually very project-specific. Having completed the above four steps on
the data, we focus on the algorithmic aspects of the project, which form the
data mining part. First we choose the appropriate Data Mining task, for
example regression, classification or clustering. There are two major goals in
Data Mining: prediction and description. Prediction is considered supervised
data mining, while descriptive data mining includes unsupervised and
visualization aspects. Most data mining techniques are based on inductive
learning, where a model is constructed by generalizing from a sufficient
number of training samples. Next we choose the Data Mining algorithm, that is,
the specific method used to search for patterns. For example, a Neural Network
may be the better choice for precision, while a Decision tree is the better
choice for understanding the attributes. Each algorithm has parameters and
learning tactics, such as cross-validation or a division into training and
testing sets. Then the chosen algorithm is employed. We might have to run it
several times to get satisfying results, for instance by tuning its control
parameters. The final step of the data mining part is the evaluation and
interpretation of the mined patterns with respect to the goal defined in the
first step. Here we also reconsider the pre-processing steps with respect to
their effect on the results of the data mining algorithm. The discovered
knowledge is also documented for further use. Lastly we need to make proper
use of the discovered knowledge. The success of this step determines the
effectiveness of the entire KDD process. As we will now use the results in
real life, many challenges need to be considered, such as losing the
laboratory conditions under which we have operated. For example, the knowledge
was discovered from certain snapshots (samples), but now the data is dynamic.
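The cleaning, transformation and train/test division steps described above can
be sketched in a few lines. The records, attribute names, bin boundaries and
the choice of median imputation are all assumptions made for this
illustration; a real project would choose them to suit its own data.

```python
import random
import statistics

# Hypothetical records: (age, monthly_spend); None marks a missing value.
records = [
    (25, 120.0),
    (31, None),
    (47, 310.0),
    (52, 295.0),
    (38, None),
    (29, 150.0),
]


def impute_missing(rows):
    """Cleaning: replace a missing spend with the median of observed spends."""
    observed = [spend for _, spend in rows if spend is not None]
    fill = statistics.median(observed)
    return [(age, fill if spend is None else spend) for age, spend in rows]


def discretize_age(age):
    """Transformation: map a numerical attribute onto a small set of bins."""
    if age < 30:
        return "under_30"
    if age < 45:
        return "30_to_44"
    return "45_plus"


def holdout_split(rows, test_fraction=0.33, seed=0):
    """Divide the data into a training part and a testing part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


cleaned = impute_missing(records)
transformed = [(discretize_age(age), spend) for age, spend in cleaned]
train, test = holdout_split(transformed)
print(len(train), len(test))
```

Median imputation is only one simple way to handle missing values. As noted
above, when the attribute is dominant for the application, predicting the
missing values with a supervised algorithm may be the better option.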
Data Mining methods are classified into two subgroups: Verification and
Discovery. Discovery methods are those that automatically identify hidden
patterns in the data. They branch into prediction and description. Description
methods are oriented to data interpretation, which focuses on understanding
the way the data relate to each other (for example by visualization).
Prediction-oriented methods aim to automatically build a behavioural model
that takes new and unseen samples and predicts the values of one or more
variables related to them. Such a model can also help us understand the data.
Verification methods deal with the evaluation of a hypothesis proposed by an
external source, such as an expert. They are drawn from traditional statistics
and include tests of hypotheses (e.g., the t-test of means), the goodness of
fit test and analysis of variance (ANOVA). These methods are less associated
with data mining, as most data mining tasks are concerned with discovering a
hypothesis (out of a very large set of hypotheses) rather than testing one
that is already known. Among discovery methods, prediction is also called
supervised learning, as opposed to unsupervised learning. Unsupervised
learning generally maps high-dimensional data to a reduced dimension. It
groups data without a prespecified, dependent attribute. Unsupervised learning
covers a portion of the description methods. For instance, it covers
clustering methods (such as K-means, K-medoids and Adaptive Resonance Theory
(ART) 2) but does not cover visualization methods. Supervised methods try to
discover the relationship between input attributes and a target attribute. It
is useful to distinguish between two supervised models: classification models
and regression models. A regressor maps the input space into a real-valued
domain; for example, it can predict the demand for a certain product given its
characteristics. A classifier, on the other hand, maps the input space into
predefined classes.
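The difference between the two supervised models can be made concrete with a
small sketch. The regressor below is an ordinary least-squares line and the
classifier is a nearest-centroid rule; both are deliberately simple stand-ins
chosen for this illustration, and the prices, demand figures and class names
are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input attribute."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_centroids(xs, labels):
    """Store the mean input value of each predefined class."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: statistics.fmean(vals) for label, vals in groups.items()}


def classify(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


# Hypothetical training data: product price and observed demand.
prices = [1.0, 2.0, 3.0, 4.0]
demand = [90.0, 80.0, 70.0, 60.0]

# Regressor: the output is a real number.
slope, intercept = fit_line(prices, demand)
predicted_demand = slope * 5.0 + intercept

# Classifier: the output is one of the predefined classes.
tiers = ["budget", "budget", "premium", "premium"]
centroids = fit_centroids(prices, tiers)
predicted_tier = classify(centroids, 3.6)

print(predicted_demand, predicted_tier)
```

Both models learn from input attributes paired with a known target, which is
what makes them supervised; they differ only in whether the target is a number
or a class label.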
So we can say that Data Mining is a new science which consists of
techniques and methods developed using statistics, artificial intelligence,
machine learning and database systems.
Reforming the US health care service in order to reduce the rate at
which costs have been increasing, while sustaining its current strengths, is
critical to the United States both as a society and as an economy. It is
possible to address the challenges faced by emulating and implementing best
practices in health care, which may require the analysis of large datasets.
MGI has identified different areas through which US health care can generate
revenue and bring down spending. One of them is the development of
personalized medicine, which will produce value in the R&D arena. The goal of
this application is to examine the relationships among genetic variation,
predisposition for specific diseases, and specific drug responses, and then to
account for the genetic variability of individuals in the drug development
process. Personalized medicine holds the promise of improving health care in
three main ways: offering early detection and diagnosis; more effective
therapies, because patients with the same disease often don’t respond in the
same way to the same therapy; and the adjustment of drug dosages according to
a patient’s molecular profile to minimize side effects and maximize response.
But to thoroughly understand the structure of any disease, one needs to
consider all the available clinical data, which is massive in amount. This can
be done by proper modelling and by applying data mining methods efficiently.
This lever has been successful in the early detection of breast cancer.
Governments in many parts of the world are under increasing pressure to
raise their productivity. Big data can offer them a powerful arsenal of
strategies and techniques for boosting productivity and achieving higher
levels of effectiveness. The public sector poses a challenge because it is
very diverse in its functions and budgets. MGI focussed on administration in
two types of government agencies, tax and labour. These agencies collect data
on a large scale from different sectors, but they can face significant
performance challenges. For instance, Europe’s public sector accounts for
almost half of its GDP. This high share of economic output puts considerable
long-term strain on Europe’s budgets. It has been estimated that by 2025 over
30 percent of the population in mature economies across the globe will be aged
60 or over, so social security, health care and pensions will face increasing
demand.
As big data and its levers become increasingly valuable assets, their
use will become a key basis of competition across sectors. So it is important
for organization leaders to incorporate big data into their business plans.
They also need to ensure that, along with sufficient skills in back-office
analytics, they manage a transition towards the right managerial talent on the
front line. Leaders should understand the assets (i.e. data) that they hold or
could gain access to. An organization should keep an inventory of its own data
and should also systematically catalogue other data to which it could gain
access, such as government data and internet data. There might also be third
parties who have not considered sharing their data, so the organization needs
to think carefully about how to present a compelling value to such a party in
order to gain access to its data. Leaders need to consider adopting a process
of purposeful experimentation, which can be a powerful path to leveraging big
data, rather than specifying a complete plan prior to any implementation. At
first one can consider just a few high-potential areas in which to experiment
with big data, and the effort can then be scaled to larger domains. A
sophisticated leader will first apply techniques such as “scrubbing” to the
data, which generate, structure and organize it and improve its quality. Next,
the data should be made easily accessible to all the departments of the
organization through networks. Then basic and simple analytics are applied to
it, e.g., techniques that don’t require customized analyses to be designed by
people with deep analytics skills. The fourth and highest level is applying
advanced and complex analytics, such as automated algorithms and real-time
data analysis, that can create new business models. Leaders should build a
team with deep analytics capability, which will supply new information to the
company and new insight for further business growth. These leaders will also
need a baseline understanding of analytics techniques in order to become
effective users of these types of analyses. The lack of a customer-centric
view can limit the organization’s ability to use any big data levers to create
new value. Organizations might therefore need to invest in IT hardware,
software and services to capture, store, organize and analyse large datasets.
Data privacy and security will become paramount as data travels across
boundaries for various purposes. Privacy not only requires compliance with
laws and regulations, but is also fundamental to an organization’s trust
relationship with its customers and partners. Organizational leaders will also
have to wrestle with legal issues relating to their stance on intellectual
property for data.
A significant constraint on realizing value from big data will be a
shortage of talent, particularly of people with expertise in statistics and
machine learning. MGI has estimated that demand for people with deep
analytical talent in the US could be 50 to 60 percent greater than its
projected supply by 2018. According to USA Today, it is considered the sexiest
job of the 21st century. Current trends indicate that 4,000 new positions are
being created annually, perhaps significantly more. This has brought a new
wave to the market, as most sectors want to gain more from big data. Hal
Varian, the chief economist at Google, is known to have said, “The sexy job in
the next 10 years will be statisticians. People think I’m joking, but who
would’ve guessed that computer engineers would’ve been the sexy job of the
1990s?”

 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

REPORT: BIG DATA
UNIVERSITY OF LEICESTER
Data Analysis for Business Intelligence MSc
BIG DATA

We have heard the term “flood” applied to money, people and new technologies. It now applies to data as well. Moore’s law, first stated in 1965 and revised in 1975, holds that the number of transistors on a dense integrated circuit doubles approximately every two years. This has largely held true, and as a result devices that fit in the hand and can serve as personal computers have been invented. We are surrounded by electronic machines. We may not realise it, but at some moment of every day we are monitored by an electronic device: a mobile phone, a CCTV camera, a weighing machine, a computer, and the list never ends. All of this generates data, in volumes so huge that we have come to call it Big Data.

Big Data is a vast repository of data whose size is beyond the ability of a conventional database to handle. The size of such a repository cannot be fixed, because it grows every second. For instance, the social networking site Facebook collects more than 500 terabytes of data every day. But this data is just a collection of facts and events from day-to-day life. Data by itself neither lies nor tells the truth. We need to understand what the data can tell us, and that extracted idea is called information: information tells us what the data means. Information in turn is useless unless we can make use of it, that is, use it to change our actions. So we need to provide insight into how the information will help us achieve our goal.

Data has swept into every industry and business sector. The McKinsey Global Institute (MGI) estimated that in 2010 enterprises globally stored 7 exabytes of data while consumers stored 6 exabytes. It also estimated that if US health care used big data effectively, the potential value from data in this sector could be more than $300 billion every year. European organisations hold approximately 11 exabytes of data, and making efficient use of it could generate nearly $149 billion in operational efficiency improvements.
However, there is also huge potential in the near term to leverage big data in developing countries. An organization can leverage big data to improve its design and functionality. Big data can create value in several ways. It can create transparency: simply making data available to stakeholders in a timely manner can create tremendous value, and making data readily available to all departments can steeply reduce search and processing time. It can help organizations experiment to discover needs: as most data is stored in digital form, one can discover whether a product needs to change. It enables us to segment populations according to their needs and so deliver customized actions. It can support decision making with automated algorithms, minimizing risk and digging up valuable insight; for instance, tax agencies can use automated risk engines. Manufacturers are using data obtained from current products to improve the development of new ones. Big data has also created entirely new categories of companies, such as those that aggregate and analyse industry data and provide useful information and insight to manufacturing or financial companies. The value of big data can be measured by estimating the total value created by taking a particular action with its use.

To capture the full potential of big data, however, several issues have to be addressed. Organizations have to consider the legal aspects of handling and analysing data; there is no room for information breaches, which can have serious consequences. An organization that handles a nation’s data needs to take care that none of the information is exposed, as exposure could cause very high losses to the nation. Organizations also have to be extremely careful when analysing health care data, as a wrong prescription might cost someone’s life. Companies also have to hire new personnel who understand big data.

An abundant variety of techniques and technologies has been developed that can be applied to big data to extract useful insight, and researchers continue to develop new ones. Data mining is the set of techniques used to extract patterns from large databases by combining methods from statistics, machine learning and database management. Association rule learning is used to discover interesting relationships among variables in large databases; it can help determine which products are frequently bought together. Classification is used to study a customer’s buying behaviour or to determine the product they consume most. It assigns new data points to categories on the basis of existing data that has already been categorized; because it learns from labelled examples, it is a form of supervised learning.
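As a rough illustration of association rule learning, the sketch below counts how often pairs of products appear together and reports two common measures, support and confidence. The baskets, the function name `pair_rules` and the `min_support` threshold are all invented for this example; real rule mining (for instance the Apriori algorithm) handles larger itemsets and much bigger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs.

    support    = share of baskets that contain both items
    confidence = share of baskets with the antecedent that also contain
                 the consequent
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be interesting
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


for antecedent, consequent, support, confidence in sorted(pair_rules(baskets)):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket that contains milk also contains butter, so the rule "milk -> butter" has a confidence of 1.0 even though its support is modest.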
Cluster analysis is another statistical method, used to group similar objects when the characteristics that make them similar are not known in advance. Crowdsourcing is a technique for collecting large amounts of data submitted by a crowd through open calls. Analysing data from a single source might not be of great use, so it is more effective to take multiple sources into consideration, which is called data fusion and data integration. Natural language processing can be used to analyse data from social media websites such as Twitter and Facebook. The idea of natural evolution, “survival of the fittest”, can be used to optimize the parameters of business or manufacturing models; this is called a genetic algorithm. Spatial analysis is used to analyse geographic properties, which helps in decisions such as the selection of manufacturing sites.

Many technologies have been developed to support these techniques. Google developed Bigtable to store data in compressed form on the Google File System. Business Intelligence (BI) is application software used to report, analyse and present data. It reads data that has previously been stored in a data warehouse and enables analysis to be performed on it. Cloud computing is the computing paradigm that provides highly scalable computing resources through a network. Hadoop is an open-source software framework for processing huge datasets on distributed systems, managed by the Apache Software Foundation.

Data can be structured, that is, residing in fixed fields as in a spreadsheet. In contrast, unstructured data includes free-form text and untagged audio, image and video data. All the analyses and the information drawn from them will be in vain if we are unable to present them to people in an easily consumable form. Proper visualization is therefore a key challenge that must be met if the results of an analysis are to lead to proper action.

Before the internet revolution, the methods used to mine data were restricted to small data sets with little variability in data types. In the information age, accumulating data has become easier and less costly, and it is estimated that the amount of information stored doubles every twenty months. Making effective use of this data is the new challenge we face.

Knowledge Discovery in Databases (KDD) is the automatic, exploratory analysis and modelling of large data repositories. It is a novel approach to identifying understandable patterns in large data sets. The KDD process starts with determining the goal of a particular project and ends with the implementation of the discovered knowledge. It is a nine-step process that starts with a managerial step. First, an understanding of the application domain is developed, which sets the scene for understanding what should be done with the available decision variables. The people involved in this step need to understand the requirements of the end user and the environment in which knowledge discovery will take place. After that, the data sets need to be pre-processed. For this we need to determine which data is available and which will be used for the particular application.
For the process to succeed we should consider all the relevant data available, because if some attributes are missed the whole process may fail. Then we should clean the data, that is, handle missing values and remove noise and outliers. This step enhances data reliability. For example, if one suspects that a certain less important attribute is unreliable or has a lot of missing data, then ignoring that attribute is a smart choice. But if the attribute is dominant for the application, we can make it the target of a supervised data mining algorithm and predict its missing values. Certain attributes may not be useful or may not affect the goal of the application, yet a person may not be able to spot them. So after cleaning comes the data transformation step, which includes data reduction methods such as record sampling and feature selection, and attribute transformations such as discretization of numerical attributes and functional transformation. This step is crucial for the success of the entire KDD process but is usually very project-specific.

Having completed the above four steps, we focus on the algorithmic aspects of the project, which form the data mining part. First we choose the appropriate data mining task, for example regression, classification or clustering. There are two major goals in data mining: prediction and description. Prediction is considered supervised data mining, while descriptive data mining includes unsupervised and visualization aspects. Most data mining techniques are based on inductive learning, where a model is constructed by generalizing from a sufficient number of training samples. Next we choose the data mining algorithm, that is, the specific method used to search for patterns. For example, a neural network may give better precision, while a decision tree is the better choice for understanding the attributes. Each algorithm has parameters and learning tactics, such as cross-validation or a division into training and testing sets. Finally the chosen algorithm is implemented. We might have to run it several times to get satisfying results, for instance by tuning the algorithm’s control parameters.

The final step of the data mining part is the evaluation and interpretation of the mined patterns with respect to the goal defined in the first step. Here we also consider the pre-processing steps with respect to their effect on the results of the data mining algorithm. The discovered knowledge is also documented for further use. Lastly we need to make proper use of the discovered knowledge; the success of this step determines the effectiveness of the entire KDD process. Because the results are now used in real life, many challenges arise, such as losing the laboratory conditions under which we have operated. For example, the knowledge was discovered from a certain snapshot (sample) of the data, but now the data is dynamic.

Data mining methods are classified into two subgroups: verification and discovery. Discovery methods automatically identify hidden patterns in the data, and branch into prediction and description. Description methods are oriented to data interpretation and focus on understanding the way the data items relate to each other (for example by visualization). Prediction-oriented methods aim to automatically build a behavioural model that takes new and unseen samples and predicts the values of one or more variables related to them. Such models can also help us understand the data.

Verification methods deal with the evaluation of a hypothesis proposed by an external source, such as an expert. They are drawn from traditional statistics and include tests of hypotheses (e.g., the t-test of means), goodness-of-fit tests and analysis of variance (ANOVA). These methods are less associated with data mining, as most data mining tasks are concerned with discovering a hypothesis (out of a very large set of hypotheses) rather than testing a known one.

Among discovery methods, prediction is also called supervised learning, as opposed to unsupervised learning. Unsupervised learning generally maps high-dimensional data to a reduced dimension and groups data without a prespecified dependent attribute. It covers only a portion of the description methods: for instance, it covers clustering methods (such as K-means, K-medoids and Adaptive Resonance Theory (ART) 2) but not visualization methods. Supervised methods try to discover the relationship between input attributes and a target attribute. It is useful to distinguish between two kinds of supervised model: classification models and regression models. A regressor maps the input space into a real-valued domain; for example, it can predict the demand for a certain product given its characteristics. A classifier, on the other hand, maps the input space into predefined classes.
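The difference between the two kinds of supervised model can be sketched with a small example. The regressor below fits a straight line by ordinary least squares and returns a real number; the classifier uses a simple nearest-centroid rule and returns one of two predefined labels. The price and demand figures, the labels and all function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical training data, invented for illustration:
# product price, units sold, and a hand-assigned demand class.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = [9.0, 7.0, 5.0, 3.0, 1.0]
labels = ["high", "high", "high", "low", "low"]


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_demand(price, model):
    """Map an input to a real value."""
    slope, intercept = model
    return slope * price + intercept


def fit_centroids(xs, ys):
    """Classification: store the mean input value of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, ys):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(price, centroids):
    """Map an input to the predefined class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(price - centroids[label]))


line = fit_line(prices, demand)
centroids = fit_centroids(prices, labels)
print(predict_demand(2.5, line))   # a real-valued prediction
print(classify(2.5, centroids))    # one of the labels "high" or "low"
```

Both models learn from the same labelled samples; what differs is the type of output, a number in one case and a class in the other.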
So we can say that data mining is a new science consisting of techniques and methods developed using statistics, artificial intelligence, machine learning and database systems.

Reforming the US health care system to reduce the rate at which costs have been increasing, while sustaining its current strengths, is critical to the United States both as a society and as an economy. The challenges can be addressed by emulating and implementing best practices in health care, which may require analysing large datasets. MGI has identified different areas through which US health care can generate revenue and bring down spending. One of them is developing personalized medicine, which produces value in the R&D arena. The goal of this application is to examine the relationships among genetic variation, predisposition for specific diseases, and specific drug responses, and then to account for the genetic variability of individuals in the drug development process. Personalized medicine holds the promise of improving health care in three main ways: offering early detection and diagnosis; providing more effective therapies, because patients with the same disease often don’t respond in the same way to the same therapy; and adjusting drug dosages according to a patient’s molecular profile to minimize side effects and maximize response. To understand the structure of any disease thoroughly, however, one needs to consider all the available clinical data, which is massive in volume. This can be done through proper modelling and the efficient application of data mining methods. This new lever has been successful in the early detection of breast cancer.

Governments in many parts of the world are under increasing pressure to increase their productivity. Big data can offer them a powerful arsenal of strategies and techniques for boosting productivity and achieving higher levels of effectiveness.
The public sector poses challenges because it is very diverse in its functions and budgets. MGI focused on administration in two types of government agency, tax and labour. These agencies collect data on a large scale from different sectors, but they can face significant performance challenges. For instance, Europe’s public sector accounts for almost half of its GDP. This high share of economic output puts considerable long-term strain on Europe’s budgets. It has been estimated that by 2025 over 30 percent of the population in mature economies across the globe will be aged 60 or over, so social security, health care and pensions will face increasing demand.

As big data and its levers become increasingly valuable assets, their use will become a key basis of competition across sectors. It is therefore important for organizational leaders to incorporate big data into their business plans. They also need to ensure that, along with sufficient skills in back-office analytics, they manage a transition towards the right managerial talent on the front line. Leaders should understand the assets (i.e. the data) that they hold or could gain access to. An organization should keep an inventory of its own data and should also systematically catalogue other data to which it could gain access, such as government data and internet data. There may also be third parties who have not considered sharing their data, so the organization needs to think carefully and present a compelling value proposition to such a party in order to gain access to its data.

Leaders should consider adopting a process of purposeful experimentation, which can be a powerful path to leveraging big data, rather than specifying a complete plan before any implementation. At first one can experiment with big data in just a few high-potential areas and then scale up to larger domains. A sophisticated leader will work through four levels. First, techniques such as “scrubbing” are applied to generate, structure and organize the data, which improves its quality. Next, the data is made easily accessible to all departments of the organization through networks. Then very basic analytics are applied, e.g. techniques that don’t require customized analyses designed by people with deep analytical skills. The fourth and highest level is applying advanced analytics, such as automated algorithms and real-time data analysis, that can create new business models.

Leaders should build a team with deep analytical capability, which will supply new information to the company and new insight for further business growth. These leaders will also need a baseline understanding of analytical techniques in order to become effective users of such analyses. The lack of a customer-centric view can limit an organization’s ability to use big data levers to create new value, so organizations might need to invest in IT hardware, software and services to capture, store, organize and analyse large datasets. Data privacy and security will become paramount as data travels across boundaries for various purposes.
Privacy not only requires compliance with laws and regulations but is also fundamental to an organization’s trust relationship with its customers and partners. Organizational leaders will also have to wrestle with legal issues relating to their stance on intellectual property for data.

A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with expertise in statistics and machine learning. MGI has estimated that demand for people with deep analytical talent in the US could be 50 to 60 percent greater than the projected supply by 2018. According to USA Today, this is the sexiest job of the 21st century. Current trends indicate that 4,000 new positions are being created annually, perhaps significantly more. This has brought a new wave into the market, as most sectors want to gain more from big data. Hal Varian, the chief economist at Google, is known to have said, “The sexy job in the next 10 years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”