SlideShare une entreprise Scribd logo
1  sur  9
Télécharger pour lire hors ligne
Abstract.................................................................................................................................................3
Introduction...........................................................................................................................................3
Data lifecycle – phases and their challenges........................................................................................4
• Data Acquisition & Data Warehousing
• Data Extraction & Structuring
• Data Modeling & Data Analysis
• Data Interpretation
5Vs of Big Data – Problems & Challenges...........................................................................................6
• Volume & Scalability
• Heterogeneous & unstructured nature of Big Data
• Data governance and security
• Infrastructure and system architecture
Securing Success in Big Data Projects................................................................................................7
Businesses in the Era of Big Data........................................................................................................8
Conclusion............................................................................................................................................8
References...........................................................................................................................................8
Copyright Information...........................................................................................................................9
Contents
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved2
Abstract
The big impact of Big Data in the post-modern world is
unquestionable, un-ignorable and unstoppable today.
While there are certain discussions around Big Data being
really big, here to stay or just an over hyped fad; there are
facts as shared in the following sections of this whitepaper
that validate one thing - there is no knowing of the limits
and dimensions that data in the digital world can assume.
As of now, there are only predictions, forecasts that form
the basis for businesses to decide their own course of
action by taking advantage of this new window of opportu-
nity with high-end intricate technology based resources
and tools. These are nothing but ways and means to
harness the mammoth and draw necessary inferences for
making smart, intelligent business decisions.
The other outstanding feature of this century is the mush-
rooming of more billion-dollar enterprises around the world
than ever before. This phenomenon is making the world a
more competitive, knowledge-rich, mercurial and dynamic
place. Survival of the fittest has never been more pertinent
in the business sector. This backdrop serves as the most
important reason for businesses to be data-driven. It is the
only premise that can yield confident, sure-shot and
actionable insights to make future-ready, fool-proof
decisions. Even though, intuition and acumen – the human
factors will always have a large role to play in deciding
course of success, a strong data foundation can go a long
way in minimizing risks of observational understanding or
guess.
While this sounds like a good game to get on, the
challenge however, lies in data itself. As the name
suggests – it is big, heterogeneous, unstructured and
scattered. In that case, how to mine diamonds off rocks?
From acquisition of data that comes in an ad hoc manner,
to storing the right metadata and data integration – are all
different facets with both opportunities and challenges.
Data analysis, structuring, retrieval and modeling are other
key aspects with their own set of challenges. This paper
looks at deriving real value from the big data giant through
a look at the data lifecycle, its dimensions and challenges,
best practices and application benefits.
Introduction
Eric Schmidt of Google had stated in 2010 that every two
days we create as much information as we did from the
dawn of civilization up until 2003. That was 2010.
In 2014, an IDC and EMC report stated that the digital
universe is doubling every two years, and will reach
40,000 exabytes (40 trillion gigabytes) by 2020. Here are
some more interesting and latest statistics on data:
• 90% of the world’s data has been created in last two
years
• Big Data market growth projection is $53.4 billion by
2017, from $10.2 billion in 2013
• 70% of the digital universe, approx. 900 exabytes are
generated by users
• 98% of global information is now digital, that was
25% in 2000
• 10% increase in data visibility means additional
$65.7 million for a typical Fortune 1000 company
These facts contribute to the imperative that enterprises
must develop effective methodologies and upscale
capabilities to gather, process, and harvest data, now
before it gets too late. The world spent a large part of
2013 in data collection. It is now essential to create
capabilities that can narrow down big data into relevance
and importance, and carve out only the information that
matters most to the business. The Big Data Revolution is
not in the volume explosion of data, but in the capability
of actually doing something with the data, making more
sense out of it.
In order to build a capability that can achieve beneficial
data targets, enterprises need to understand the data
lifecycle and challenges at different stages.
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved3
Data Extraction & Structuring
Data that has been collected, even after filtering, is not in
a format ready for analysis. It is has multiple modes of
content, such as text, pictures, videos, multiple sources
of data with different file formats. This mandates for a
Data Extraction Strategy that integrates data from
diverse enterprise information repositories and trans-
forms it into a consumable format.
Data is basically of two categories – structured and
unstructured. Structured data is that which is available in
a pre-set format such as row and column based data-
bases. These are easy to enter, store and analyze. This
type of data is mostly actual and transactional.
Unstructured data on the other hand is free form, attitudi-
nal and behavioral. This does not come in traditional
formats. It is heterogeneous, variable and comes in
multiple formats, such as text, document, image, video
and so on. Unstructured data is growing at a super-fast
speed. In 2011, IDC held a study that stated that 90
percent of all data in the next decade will be unstruc-
tured. However, from a business benefit perspective,
true value and insights reside in this massive volume of
unstructured data that is rather difficult to tame and
channelize.
Data
Acquisition &
Warehousing
Data
Extraction &
Structuring
Data Modeling
& Analysis
Data
Interpretation
Data lifecycle – phases and their challenges
The end-to-end lifecycle of data involves multiple phases as represented below, with challenges at each phase. Big Data
needs contextual management from a haphazard, heterogeneous mix. There is also a question of credibility, uncertainty
and error in understanding the relevance of data. This management of data requires smart systems and also better human
collaboration for user interaction. A lot of debate on Big Data is focused on the technology aspect. However, there is also
a lot more than technology required to set up the fundamental basis of managing data analysis. It doesn’t reckon throwing
away of existing structures, warehouses and analytics. Instead, needs to build up on existing capabilities in data quality,
Master Data Management and data protection frameworks. Data Management needs to be seen from a business perspec-
tive, by prioritizing business needs and taking realistic actions.
Data Acquisition & Data Warehousing
Data always has a source. It doesn’t come out of nowhere.
And just as big as data is, so are the multifarious sources
that can produce up to 1 million terabytes of raw data
every day. This enormity and dispersion in data is not of
much use, unless it is filtered and compressed on the
basis of several criteria. The foremost challenge in this
aspect is to define these criteria for filters, so as to not lose
out any valuable information. For instance, customer
preference data can sourced from the information they
share on key social media channels. But then, how to tap
the non-social media users who might also be an impor-
tant customer segment. What are the data sources for
them?
Data reduction is a science that needs substantial
research to establish an intelligent process that brings
down raw data to a user-friendly size without missing out
the minute information pieces of relevance. And this is
required in real-time, as it would be an expensive and
arduous affair to store the data first and reduce later.
An important part of building a robust Data Warehousing
platform is the consolidation of data across various
sources to create a good repository of master data, which
will help in providing consistent information across the
organization.
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved4
Data Interpretation
The most important aspect of success in Big Data Analyt-
ics is the presentation of analyzed data in a user-friendly,
re-usable and intelligible format. And the complexity of
data is adding to the complexity of its presentation as well.
Sometimes, simple tabular representations may not be
sufficient to represent data in certain cases, requiring
further explanations, historical incidences, etc. Some-
times, predictive or statistical analysis from the data is
also expected from the analytics tool to support decision-
making. In other words, the final phase or culmination of
the entire Big Data exercise is Data Interpretation or Data
Visualization.
Visualization of data is a key component of Business Intel-
ligence. Here’s a snapshot of the Visualization framework
that assists in Business Intelligence.
Functional
Features
Non
Functional
Features
Others
Semantic Layer Reports
Development
Advanced
Visualization
Write Back Offline Analysis Collaboration
Search Data Mashup Custom
Development
Query
Writing
Advanced
Analytics
Integration
What-If
Modeling
Metrics
Management
OLAP
Metadata
Exchange
Dashboards Event
Management
Big Data
Integration
Recoverability
Concurrent
Sessions
Scale Up /
Scale Out
Cloud Platform Security In Memory
Data Storage Aggregate
Aware
In-DB
Processing
Licensing
& Pricing
Release
Frequency
Community
Size
Geo Presence
Analyst
References
3rd Party
Support,
Extensions
Customer
Adoption
Marketplace Support Cost
Data manipulation & analytic applications addressing
automation, application development and testing
Data modelling covering key areas like experimental
design, graphical models and path analysis
Statistics and machine learning through classical and
spatial statistics, simulation and optimization techniques
Text data analysis through pattern analysis, text mining
and NLP by developing and integrating solutions or
deploying packaged solutions
•
•
•
•
Extract-Transform-Load (ETL) is the process that covers
the entire stage of getting data loaded in the proper,
cleaned format from the source to the target data ware-
house. There are several ETL tools in available, principles
of making the right selection are same as that of deciding
the right course of big data implementation as explained
later in the paper.
Data Modeling & Data Analysis
Once the proper mechanism of creating a data repository
is established, then sets in the rather complex procedure
of Data Analysis. Big Data Analytics is one of the most
crucial aspects and room for development in the data
industry. Data analysis is not only about locating, identify-
ing, understanding, and presenting data. Industries
demand for large-scale analysis that is entirely automated
which requires processing of different data structures and
semantics in an understandable and computer intelligent
format.
Technological advancements in this direction are making
this kind of analytics of unstructured possible and cost-
effective. A distributed grid of computing resources utiliz-
ing easily scalable architecture, processing framework
and non-relational, parallel-relational databases is rede-
fining data management and governance. Databases
today have shifted to non-relational to meet the complex-
ity of unstructured data. NoSQL database solutions are
capable of working without fixed table schemas, avoid join
operations, and scale horizontally.
Data sciences has emerged as a sophisticated discipline
that draws from various elements across statistical tech-
niques, mathematical modelling and visualization. It
encapsulates:
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved5
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved6
Interactive Data Visualization is the next big thing that the industry is moving into. From static graphs and spreadsheets to
using mobile devices and interacting with data in real time – the future of data interpretation is becoming more agile and
responsive.
In 2001, Gartner had introduced the 3V’s definition of data growth that was then in the inception stages. 3V signified the
three dimensions of – Volume, Velocity and Variety. Now, the industry recognizes 5V, adding Veracity and Value as the
additional aspects of data. While volume, velocity and variety are spices that the world has already tasted and still continue
to be a major consideration in the data domain, veracity and value are aspects of the modernized data that throw up major
challenges. Increasing channels or sources of data such as social media has made users a major part of data contribution
and consumption. This is a boon and a bane at the same time. While it opens up a huge window into understanding
consumers, there is a massive amount of junk that is created at various levels. Credibility and trustworthiness of data has
never been of more concern before. Value, of course, is the big thing out of big data that is the driver of the entire revolu-
tion. Businesses no longer look only for access to Big Data, but generating true value out of it is the real objective.
This five-dimensional emergence of big data has given rise to related problems that cut across all the phases of the data
lifecycle.
Volume
Value Velocity
Veracity Variety
5Vs of Big Data – Problems & Challenges
Securing Success in Big
Data Projects
With all its opportunities and challenges, there are certain
guiding principles in the implementation of Big Data that
can push the envelope for success. The secret lies in a
robust data strategy and data management program that
is aligned to the business goals and strategies.
Points to be noted and considered before jumping into
big data are:
Volume & Scalability
This is the basic problem that every system or tool
grapples with when dealing with Big Data – it is big and
there is no knowing of the limits of its scale. Therefore,
Big Data tools and infrastructure need to ensure
enough scalability and elasticity to be able to handle
the sonic speed of data growth.
Heterogeneous & unstructured nature of Big
Data
As explained earlier, most data is unstructured and
therefore, heterogeneous is nature. In terms of
sources, formats, modes and feeds – data influx
happens in all shapes and sizes. Analytic tools there-
fore need to be smart enough to decipher all the
diverse natures of data, assimilate them with advanced
algorithm development, optimization and automation to
bring it on a uniform, consumable format.
Data governance and security
Increase in mobility and access to information has led to
massive discussions around data governance, protec-
tion and security. Industries such as banking, health-
care, pharma, and defense are under strict compliance
and regulatory mandates that make it a tough job to
create a proper data protection framework. It is not
enough to have an IT infrastructure and security in
place. Data governance has taken primary importance
in these sectors where opportunity is big in Big Data, but
risks can be huge.
Infrastructure and system architecture
While the advanced technologies of Hadoop and
MapReduce are scaled to meet the 5Vs of big data, they
assert significant demands on infrastructure in terms of
scale, storage capacities that are efficient and cost-
effective. Intelligent storage capacities can leverage
through data compression, automatic data tiering and
data deduplication. The question is how much is needed
to implement Big Data and how much is enough.
•
•
•
•
Discern business requirements before beginning to
gather data – what is the actual business need
Big Data implementation is a business decision, not an
IT or technological function
Take small steps to Big Data – an agile and iterative
implementation approach can go a long way in yielding
results giving the business space to respond, adapt
and realize the true value in Big Data
Standardize Big Data efforts into IT governance
program in order to make up for skill shortages
Align Big Data strategy with Cloud Strategy to tackle
several issues around storage, security and scalability
Embed Data Analytics into the system DNA to see real
value by including big data into organizational data and
breaking silos of teams
•
•
•
•
•
•
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved7
Conclusion
Big Data is the biggest opportunity in the modern world.
There is a whole set of advantages that businesses can
yield from big data and evolve as smart and connected
enterprises. Also, it has now become evident that Big
Data is permeating all aspects of life, making it an
imperative for businesses, whether they are ready or not.
However, there are also unique and new challenges
thrown up by the data revolution that enterprises need to
be careful about. With proper caution and preparedness,
a business can thrive into excellence with big data,
without being exposed to a risk situation. Knowing how
big data works and being aware of the challenges is one
of the first steps towards preparing for taking on the big
elephant.
http://www.ibmbigdatahub.com/blog/10-big-data-
implementation-best-practices
http://www.intel.in/content/dam/www/public/us/en/doc
uments/solution-briefs/big-data-101-brief.pdf
http://hbr.org/special-collections/insight/big-data
http://www.experian.co.uk/assets/business-
strategies/white--
papers/wp-big-data-evolution-not-revolution.pdf
http://tdwi.org/whitepapers/list/advanced-
analytics/big-data-analytics.aspx
•
•
•
•
•
References:
Businesses in the Era of Big
Data
Big Data, being big enough, is merely just warming up.
Soon, every aspect of the human society is going to be
affected by big data. There is scope for every sector in
every field to make great use of Big Data in enabling better
lives for users. Here are some classic examples of Big
Data in play in the field of business:
Deeper levels of understanding and targeting custom-
ers: A large US retailer has been able to accurately
predict when a customer of theirs is expecting a baby.
Churn management has become easily predictable for
telecom companies and car insurance companies are
able to understand how well their customers are driving.
Optimizing Business Process: Big Data is not only giving
a peek into the external audience, but also a great way
for introspection into business processes. Stock optimi-
zation in retail through predictive analysis from social
media, web trends and weather forecasts is leading
huge cost benefits. Supply chain management is particu-
larly benefitting from data analytics. Geographic
positioning and radio frequency identification sensors
can now track goods or delivery vehicles and optimize
routes by integrating live traffic data.
Driving smarter machines and devices: The recently
launched and widely talked about Google’s self-driven
car is majorly using Big Data tools. The energy sector is
also taking advantage by optimizing energy grids using
data from smart meters. Big Data tools are also being
used to improve performance of computers and data
warehouses.
Smarter Financial Trading: High-Frequency Trading
(HFT) is finding huge application of Big Data today. Big
Data algorithms used to make trading decisions has led
to a majority of equity trading data algorithms taking into
account data feeds from social media networks and
news websites to make decisions in split seconds.
These are some of the existing illustrations where Big
Data is in application in the business sector. There are
several other avenues, with newer ones opening by the
day, where Big Data can drive organizations into being
smarter, secured and connected.
•
•
•
•
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved8
© Happiest Minds Technologies Pvt. Ltd. All Rights Reserved
Happiest Minds is focused on helping customers build Smart Secure and Connected experience by leveraging disruptive
technologies like mobility, analytics, security, cloud computing, social computing and unified communications. Enterprises
are embracing these technologies to implement Omni-channel strategies, manage structured & unstructured data and
make real time decisions based on actionable insights, while ensuring security for data and infrastructure. Happiest Minds
also offers high degree of skills, IPs and domain expertise across a set of focused areas that include IT Services, Product
Engineering Services, Infrastructure Management, Security, Testing and Consulting.
Headquartered in Bangalore, India, Happiest Minds has operations in the US, UK, Singapore and Australia. It secured a
$45 million Series-A funding led by Canaan Partners, Intel Capital and Ashok Soota.
Anand Veeramani has 15 years of IT experience, including 3 years in Big Data technologies. He has
worked in various roles including Chief Architect, SME, Practice Lead, Account Manager, Transition
Manager, Senior DBA for different database & Big data services. Key past projects include: - Archi-
tecting Enterprise Data warehouse solutions for the largest Bank in USA - Managing Complex data-
base Consolidation & Migration Initiative for largest Bank in USA - Managing end to end database
service transition for leading Airlines in North America
About the Author
Happiest Minds
Anand Veeramani
© 2014 Happiest Minds. All Rights Reserved.
E-mail: Business@happiestminds.com
Visit us: www.happiestminds.com
Follow us on
9
Copyright Information
This document is an exclusive property of Happiest Minds Technologies Pvt. Ltd. It is intended for limited circulation.
Practice Director - Big Data & Database
Services

Contenu connexe

Tendances

Making Sense of NoSQL and Big Data Amidst High Expectations
Making Sense of NoSQL and Big Data Amidst High ExpectationsMaking Sense of NoSQL and Big Data Amidst High Expectations
Making Sense of NoSQL and Big Data Amidst High ExpectationsRackspace
 
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...IRJET Journal
 
How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management Abhishek Sood
 
Snowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big DataSnowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big DataSnowball Group
 
Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0CrowdFlower
 
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Dana Gardner
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data scienceVipul Kalamkar
 
Ibm 1129-the big data zoo
Ibm 1129-the big data zooIbm 1129-the big data zoo
Ibm 1129-the big data zooAccenture
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Jennifer Walker
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...IT Support Engineer
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderJennifer Walker
 
The Case for Business Modeling
The Case for Business ModelingThe Case for Business Modeling
The Case for Business ModelingNeil Raden
 
Policy paper need for focussed big data & analytics skillset building throu...
Policy  paper  need for focussed big data & analytics skillset building throu...Policy  paper  need for focussed big data & analytics skillset building throu...
Policy paper need for focussed big data & analytics skillset building throu...Ritesh Shrivastava
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paperJohn Enoch
 

Tendances (16)

Making Sense of NoSQL and Big Data Amidst High Expectations
Making Sense of NoSQL and Big Data Amidst High ExpectationsMaking Sense of NoSQL and Big Data Amidst High Expectations
Making Sense of NoSQL and Big Data Amidst High Expectations
 
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...
IRJET- A Scrutiny on Research Analysis of Big Data Analytical Method and Clou...
 
How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management
 
Snowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big DataSnowball Group Whitepaper - Spotlight on Big Data
Snowball Group Whitepaper - Spotlight on Big Data
 
Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0Virtual Data Steward: Data Management 3.0
Virtual Data Steward: Data Management 3.0
 
The ABCs of Big Data
The ABCs of Big DataThe ABCs of Big Data
The ABCs of Big Data
 
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
 
Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
 
Ibm 1129-the big data zoo
Ibm 1129-the big data zooIbm 1129-the big data zoo
Ibm 1129-the big data zoo
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
 
Big Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not HarderBig Data Management: Work Smarter Not Harder
Big Data Management: Work Smarter Not Harder
 
The Case for Business Modeling
The Case for Business ModelingThe Case for Business Modeling
The Case for Business Modeling
 
Policy paper need for focussed big data & analytics skillset building throu...
Policy  paper  need for focussed big data & analytics skillset building throu...Policy  paper  need for focussed big data & analytics skillset building throu...
Policy paper need for focussed big data & analytics skillset building throu...
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
 

Similaire à Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds

Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052kavi172
 
Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Lora Cecere
 
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...Sokho TRINH
 
From Hype to Action-Getting What's Needed from Big Data A
From Hype to Action-Getting What's Needed from Big Data AFrom Hype to Action-Getting What's Needed from Big Data A
From Hype to Action-Getting What's Needed from Big Data Agwdeodhar
 
From hype to action getting what's needed from big data a
From hype to action getting what's needed from big data aFrom hype to action getting what's needed from big data a
From hype to action getting what's needed from big data agwdeodhar
 
From Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data EngineeringFrom Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data EngineeringRy Walker
 
Big data analytics in Business Management and Businesss Intelligence: A Lietr...
Big data analytics in Business Management and Businesss Intelligence: A Lietr...Big data analytics in Business Management and Businesss Intelligence: A Lietr...
Big data analytics in Business Management and Businesss Intelligence: A Lietr...IRJET Journal
 
IRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its ChallengesIRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its ChallengesIRJET Journal
 
Analysis of Big Data
Analysis of Big DataAnalysis of Big Data
Analysis of Big DataIRJET Journal
 
Information economics and big data
Information economics and big dataInformation economics and big data
Information economics and big dataMark Albala
 
Big Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin MalhotraBig Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin MalhotraVin Malhotra
 
Starting small with big data
Starting small with big data Starting small with big data
Starting small with big data WGroup
 
Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Angie Jorgensen
 
Analytics Trends 20145 - Deloitte - us-da-analytics-analytics-trends-2015
Analytics Trends 20145 -  Deloitte - us-da-analytics-analytics-trends-2015Analytics Trends 20145 -  Deloitte - us-da-analytics-analytics-trends-2015
Analytics Trends 20145 - Deloitte - us-da-analytics-analytics-trends-2015Edgar Alejandro Villegas
 
White_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copyWhite_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copyTania Mushtaq
 
Top 10 areas of expertise in data science
Top 10 areas of expertise in data scienceTop 10 areas of expertise in data science
Top 10 areas of expertise in data scienceGlobalTechCouncil
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big dataDigimark
 
What's the Big Deal About Big Data?
What's the Big Deal About Big Data?What's the Big Deal About Big Data?
What's the Big Deal About Big Data?Logi Analytics
 

Similaire à Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds (20)

Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052
 
Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013
 
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
data-to-insight-to-action-taking-a-business-process-view-for-analytics-to-del...
 
From Hype to Action-Getting What's Needed from Big Data A
From Hype to Action-Getting What's Needed from Big Data AFrom Hype to Action-Getting What's Needed from Big Data A
From Hype to Action-Getting What's Needed from Big Data A
 
From hype to action getting what's needed from big data a
From hype to action getting what's needed from big data aFrom hype to action getting what's needed from big data a
From hype to action getting what's needed from big data a
 
From Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data EngineeringFrom Volume to Value - A Guide to Data Engineering
From Volume to Value - A Guide to Data Engineering
 
Big data analytics in Business Management and Businesss Intelligence: A Lietr...
Big data analytics in Business Management and Businesss Intelligence: A Lietr...Big data analytics in Business Management and Businesss Intelligence: A Lietr...
Big data analytics in Business Management and Businesss Intelligence: A Lietr...
 
IRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its ChallengesIRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its Challenges
 
Analysis of Big Data
Analysis of Big DataAnalysis of Big Data
Analysis of Big Data
 
Information economics and big data
Information economics and big dataInformation economics and big data
Information economics and big data
 
Big Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin MalhotraBig Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin Malhotra
 
Achieving Business Success with Data.pdf
Achieving Business Success with Data.pdfAchieving Business Success with Data.pdf
Achieving Business Success with Data.pdf
 
Starting small with big data
Starting small with big data Starting small with big data
Starting small with big data
 
Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...
 
Analytics Trends 20145 - Deloitte - us-da-analytics-analytics-trends-2015
Analytics Trends 20145 -  Deloitte - us-da-analytics-analytics-trends-2015Analytics Trends 20145 -  Deloitte - us-da-analytics-analytics-trends-2015
Analytics Trends 20145 - Deloitte - us-da-analytics-analytics-trends-2015
 
Big data upload
Big data uploadBig data upload
Big data upload
 
White_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copyWhite_Paper_Beyond_Visualisation copy
White_Paper_Beyond_Visualisation copy
 
Top 10 areas of expertise in data science
Top 10 areas of expertise in data scienceTop 10 areas of expertise in data science
Top 10 areas of expertise in data science
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big data
 
What's the Big Deal About Big Data?
What's the Big Deal About Big Data?What's the Big Deal About Big Data?
What's the Big Deal About Big Data?
 

Dernier

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds

  • 1.
  • 2. Abstract.................................................................................................................................................3 Introduction...........................................................................................................................................3 Data lifecycle – phases and their challenges........................................................................................4 • Data Acquisition & Data Warehousing • Data Extraction & Structuring • Data Modeling & Data Analysis • Data Interpretation 5Vs of Big Data – Problems & Challenges...........................................................................................6 • Volume & Scalability • Heterogeneous & unstructured nature of Big Data • Data governance and security • Infrastructure and system architecture Securing Success in Big Data Projects................................................................................................7 Businesses in the Era of Big Data........................................................................................................8 Conclusion............................................................................................................................................8 References...........................................................................................................................................8 Copyright Information...........................................................................................................................9 Contents © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved2
  • 3. Abstract The big impact of Big Data in the post-modern world is unquestionable, un-ignorable and unstoppable today. While there are certain discussions around Big Data being really big, here to stay or just an over hyped fad; there are facts as shared in the following sections of this whitepaper that validate one thing - there is no knowing of the limits and dimensions that data in the digital world can assume. As of now, there are only predictions, forecasts that form the basis for businesses to decide their own course of action by taking advantage of this new window of opportu- nity with high-end intricate technology based resources and tools. These are nothing but ways and means to harness the mammoth and draw necessary inferences for making smart, intelligent business decisions. The other outstanding feature of this century is the mush- rooming of more billion-dollar enterprises around the world than ever before. This phenomenon is making the world a more competitive, knowledge-rich, mercurial and dynamic place. Survival of the fittest has never been more pertinent in the business sector. This backdrop serves as the most important reason for businesses to be data-driven. It is the only premise that can yield confident, sure-shot and actionable insights to make future-ready, fool-proof decisions. Even though, intuition and acumen – the human factors will always have a large role to play in deciding course of success, a strong data foundation can go a long way in minimizing risks of observational understanding or guess. While this sounds like a good game to get on, the challenge however, lies in data itself. As the name suggests – it is big, heterogeneous, unstructured and scattered. In that case, how to mine diamonds off rocks? From acquisition of data that comes in an ad hoc manner, to storing the right metadata and data integration – are all different facets with both opportunities and challenges. Data analysis, structuring, retrieval and modeling are other key aspects with their own set of challenges. This paper looks at deriving real value from the big data giant through a look at the data lifecycle, its dimensions and challenges, best practices and application benefits. Introduction Eric Schmidt of Google had stated in 2010 that every two days we create as much information as we did from the dawn of civilization up until 2003. That was 2010. In 2014, an IDC and EMC report stated that the digital universe is doubling every two years, and will reach 40,000 exabytes (40 trillion gigabytes) by 2020. Here are some more interesting and latest statistics on data: • 90% of the world’s data has been created in last two years • Big Data market growth projection is $53.4 billion by 2017, from $10.2 billion in 2013 • 70% of the digital universe, approx. 900 exabytes are generated by users • 98% of global information is now digital, that was 25% in 2000 • 10% increase in data visibility means additional $65.7 million for a typical Fortune 1000 company These facts contribute to the imperative that enterprises must develop effective methodologies and upscale capabilities to gather, process, and harvest data, now before it gets too late. The world spent a large part of 2013 in data collection. It is now essential to create capabilities that can narrow down big data into relevance and importance, and carve out only the information that matters most to the business. The Big Data Revolution is not in the volume explosion of data, but in the capability of actually doing something with the data, making more sense out of it. In order to build a capability that can achieve beneficial data targets, enterprises need to understand the data lifecycle and challenges at different stages. © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved3
  • 4. Data Extraction & Structuring Data that has been collected, even after filtering, is not in a format ready for analysis. It is has multiple modes of content, such as text, pictures, videos, multiple sources of data with different file formats. This mandates for a Data Extraction Strategy that integrates data from diverse enterprise information repositories and trans- forms it into a consumable format. Data is basically of two categories – structured and unstructured. Structured data is that which is available in a pre-set format such as row and column based data- bases. These are easy to enter, store and analyze. This type of data is mostly actual and transactional. Unstructured data on the other hand is free form, attitudi- nal and behavioral. This does not come in traditional formats. It is heterogeneous, variable and comes in multiple formats, such as text, document, image, video and so on. Unstructured data is growing at a super-fast speed. In 2011, IDC held a study that stated that 90 percent of all data in the next decade will be unstruc- tured. However, from a business benefit perspective, true value and insights reside in this massive volume of unstructured data that is rather difficult to tame and channelize. Data Acquisition & Warehousing Data Extraction & Structuring Data Modeling & Analysis Data Interpretation Data lifecycle – phases and their challenges The end-to-end lifecycle of data involves multiple phases as represented below, with challenges at each phase. Big Data needs contextual management from a haphazard, heterogeneous mix. There is also a question of credibility, uncertainty and error in understanding the relevance of data. This management of data requires smart systems and also better human collaboration for user interaction. A lot of debate on Big Data is focused on the technology aspect. However, there is also a lot more than technology required to set up the fundamental basis of managing data analysis. It doesn’t reckon throwing away of existing structures, warehouses and analytics. Instead, needs to build up on existing capabilities in data quality, Master Data Management and data protection frameworks. Data Management needs to be seen from a business perspec- tive, by prioritizing business needs and taking realistic actions. Data Acquisition & Data Warehousing Data always has a source. It doesn’t come out of nowhere. And just as big as data is, so are the multifarious sources that can produce up to 1 million terabytes of raw data every day. This enormity and dispersion in data is not of much use, unless it is filtered and compressed on the basis of several criteria. The foremost challenge in this aspect is to define these criteria for filters, so as to not lose out any valuable information. For instance, customer preference data can sourced from the information they share on key social media channels. But then, how to tap the non-social media users who might also be an impor- tant customer segment. What are the data sources for them? Data reduction is a science that needs substantial research to establish an intelligent process that brings down raw data to a user-friendly size without missing out the minute information pieces of relevance. And this is required in real-time, as it would be an expensive and arduous affair to store the data first and reduce later. An important part of building a robust Data Warehousing platform is the consolidation of data across various sources to create a good repository of master data, which will help in providing consistent information across the organization. © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved4
  • 5. Data Interpretation The most important aspect of success in Big Data Analyt- ics is the presentation of analyzed data in a user-friendly, re-usable and intelligible format. And the complexity of data is adding to the complexity of its presentation as well. Sometimes, simple tabular representations may not be sufficient to represent data in certain cases, requiring further explanations, historical incidences, etc. Some- times, predictive or statistical analysis from the data is also expected from the analytics tool to support decision- making. In other words, the final phase or culmination of the entire Big Data exercise is Data Interpretation or Data Visualization. Visualization of data is a key component of Business Intel- ligence. Here’s a snapshot of the Visualization framework that assists in Business Intelligence. Functional Features Non Functional Features Others Semantic Layer Reports Development Advanced Visualization Write Back Offline Analysis Collaboration Search Data Mashup Custom Development Query Writing Advanced Analytics Integration What-If Modeling Metrics Management OLAP Metadata Exchange Dashboards Event Management Big Data Integration Recoverability Concurrent Sessions Scale Up / Scale Out Cloud Platform Security In Memory Data Storage Aggregate Aware In-DB Processing Licensing & Pricing Release Frequency Community Size Geo Presence Analyst References 3rd Party Support, Extensions Customer Adoption Marketplace Support Cost Data manipulation & analytic applications addressing automation, application development and testing Data modelling covering key areas like experimental design, graphical models and path analysis Statistics and machine learning through classical and spatial statistics, simulation and optimization techniques Text data analysis through pattern analysis, text mining and NLP by developing and integrating solutions or deploying packaged solutions • • • • Extract-Transform-Load (ETL) is the process that covers the entire stage of getting data loaded in the proper, cleaned format from the source to the target data ware- house. There are several ETL tools in available, principles of making the right selection are same as that of deciding the right course of big data implementation as explained later in the paper. Data Modeling & Data Analysis Once the proper mechanism of creating a data repository is established, then sets in the rather complex procedure of Data Analysis. Big Data Analytics is one of the most crucial aspects and room for development in the data industry. Data analysis is not only about locating, identify- ing, understanding, and presenting data. Industries demand for large-scale analysis that is entirely automated which requires processing of different data structures and semantics in an understandable and computer intelligent format. Technological advancements in this direction are making this kind of analytics of unstructured possible and cost- effective. A distributed grid of computing resources utiliz- ing easily scalable architecture, processing framework and non-relational, parallel-relational databases is rede- fining data management and governance. Databases today have shifted to non-relational to meet the complex- ity of unstructured data. NoSQL database solutions are capable of working without fixed table schemas, avoid join operations, and scale horizontally. Data sciences has emerged as a sophisticated discipline that draws from various elements across statistical tech- niques, mathematical modelling and visualization. It encapsulates: © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved5
  • 6. © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved6 Interactive Data Visualization is the next big thing that the industry is moving into. From static graphs and spreadsheets to using mobile devices and interacting with data in real time – the future of data interpretation is becoming more agile and responsive. In 2001, Gartner had introduced the 3V’s definition of data growth that was then in the inception stages. 3V signified the three dimensions of – Volume, Velocity and Variety. Now, the industry recognizes 5V, adding Veracity and Value as the additional aspects of data. While volume, velocity and variety are spices that the world has already tasted and still continue to be a major consideration in the data domain, veracity and value are aspects of the modernized data that throw up major challenges. Increasing channels or sources of data such as social media has made users a major part of data contribution and consumption. This is a boon and a bane at the same time. While it opens up a huge window into understanding consumers, there is a massive amount of junk that is created at various levels. Credibility and trustworthiness of data has never been of more concern before. Value, of course, is the big thing out of big data that is the driver of the entire revolu- tion. Businesses no longer look only for access to Big Data, but generating true value out of it is the real objective. This five-dimensional emergence of big data has given rise to related problems that cut across all the phases of the data lifecycle. Volume Value Velocity Veracity Variety 5Vs of Big Data – Problems & Challenges
  • 7. Securing Success in Big Data Projects With all its opportunities and challenges, there are certain guiding principles in the implementation of Big Data that can push the envelope for success. The secret lies in a robust data strategy and data management program that is aligned to the business goals and strategies. Points to be noted and considered before jumping into big data are: Volume & Scalability This is the basic problem that every system or tool grapples with when dealing with Big Data – it is big and there is no knowing of the limits of its scale. Therefore, Big Data tools and infrastructure need to ensure enough scalability and elasticity to be able to handle the sonic speed of data growth. Heterogeneous & unstructured nature of Big Data As explained earlier, most data is unstructured and therefore, heterogeneous is nature. In terms of sources, formats, modes and feeds – data influx happens in all shapes and sizes. Analytic tools there- fore need to be smart enough to decipher all the diverse natures of data, assimilate them with advanced algorithm development, optimization and automation to bring it on a uniform, consumable format. Data governance and security Increase in mobility and access to information has led to massive discussions around data governance, protec- tion and security. Industries such as banking, health- care, pharma, and defense are under strict compliance and regulatory mandates that make it a tough job to create a proper data protection framework. It is not enough to have an IT infrastructure and security in place. Data governance has taken primary importance in these sectors where opportunity is big in Big Data, but risks can be huge. Infrastructure and system architecture While the advanced technologies of Hadoop and MapReduce are scaled to meet the 5Vs of big data, they assert significant demands on infrastructure in terms of scale, storage capacities that are efficient and cost- effective. Intelligent storage capacities can leverage through data compression, automatic data tiering and data deduplication. The question is how much is needed to implement Big Data and how much is enough. • • • • Discern business requirements before beginning to gather data – what is the actual business need Big Data implementation is a business decision, not an IT or technological function Take small steps to Big Data – an agile and iterative implementation approach can go a long way in yielding results giving the business space to respond, adapt and realize the true value in Big Data Standardize Big Data efforts into IT governance program in order to make up for skill shortages Align Big Data strategy with Cloud Strategy to tackle several issues around storage, security and scalability Embed Data Analytics into the system DNA to see real value by including big data into organizational data and breaking silos of teams • • • • • • © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved7
  • 8. Conclusion Big Data is the biggest opportunity in the modern world. There is a whole set of advantages that businesses can yield from big data and evolve as smart and connected enterprises. Also, it has now become evident that Big Data is permeating all aspects of life, making it an imperative for businesses, whether they are ready or not. However, there are also unique and new challenges thrown up by the data revolution that enterprises need to be careful about. With proper caution and preparedness, a business can thrive into excellence with big data, without being exposed to a risk situation. Knowing how big data works and being aware of the challenges is one of the first steps towards preparing for taking on the big elephant. http://www.ibmbigdatahub.com/blog/10-big-data- implementation-best-practices http://www.intel.in/content/dam/www/public/us/en/doc uments/solution-briefs/big-data-101-brief.pdf http://hbr.org/special-collections/insight/big-data http://www.experian.co.uk/assets/business- strategies/white-- papers/wp-big-data-evolution-not-revolution.pdf http://tdwi.org/whitepapers/list/advanced- analytics/big-data-analytics.aspx • • • • • References: Businesses in the Era of Big Data Big Data, being big enough, is merely just warming up. Soon, every aspect of the human society is going to be affected by big data. There is scope for every sector in every field to make great use of Big Data in enabling better lives for users. Here are some classic examples of Big Data in play in the field of business: Deeper levels of understanding and targeting custom- ers: A large US retailer has been able to accurately predict when a customer of theirs is expecting a baby. Churn management has become easily predictable for telecom companies and car insurance companies are able to understand how well their customers are driving. Optimizing Business Process: Big Data is not only giving a peek into the external audience, but also a great way for introspection into business processes. Stock optimi- zation in retail through predictive analysis from social media, web trends and weather forecasts is leading huge cost benefits. Supply chain management is particu- larly benefitting from data analytics. Geographic positioning and radio frequency identification sensors can now track goods or delivery vehicles and optimize routes by integrating live traffic data. Driving smarter machines and devices: The recently launched and widely talked about Google’s self-driven car is majorly using Big Data tools. The energy sector is also taking advantage by optimizing energy grids using data from smart meters. Big Data tools are also being used to improve performance of computers and data warehouses. Smarter Financial Trading: High-Frequency Trading (HFT) is finding huge application of Big Data today. Big Data algorithms used to make trading decisions has led to a majority of equity trading data algorithms taking into account data feeds from social media networks and news websites to make decisions in split seconds. These are some of the existing illustrations where Big Data is in application in the business sector. There are several other avenues, with newer ones opening by the day, where Big Data can drive organizations into being smarter, secured and connected. • • • • © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved8
  • 9. © Happiest Minds Technologies Pvt. Ltd. All Rights Reserved Happiest Minds is focused on helping customers build Smart Secure and Connected experience by leveraging disruptive technologies like mobility, analytics, security, cloud computing, social computing and unified communications. Enterprises are embracing these technologies to implement Omni-channel strategies, manage structured & unstructured data and make real time decisions based on actionable insights, while ensuring security for data and infrastructure. Happiest Minds also offers high degree of skills, IPs and domain expertise across a set of focused areas that include IT Services, Product Engineering Services, Infrastructure Management, Security, Testing and Consulting. Headquartered in Bangalore, India, Happiest Minds has operations in the US, UK, Singapore and Australia. It secured a $45 million Series-A funding led by Canaan Partners, Intel Capital and Ashok Soota. Anand Veeramani has 15 years of IT experience, including 3 years in Big Data technologies. He has worked in various roles including Chief Architect, SME, Practice Lead, Account Manager, Transition Manager, Senior DBA for different database & Big data services. Key past projects include: - Archi- tecting Enterprise Data warehouse solutions for the largest Bank in USA - Managing Complex data- base Consolidation & Migration Initiative for largest Bank in USA - Managing end to end database service transition for leading Airlines in North America About the Author Happiest Minds Anand Veeramani © 2014 Happiest Minds. All Rights Reserved. E-mail: Business@happiestminds.com Visit us: www.happiestminds.com Follow us on 9 Copyright Information This document is an exclusive property of Happiest Minds Technologies Pvt. Ltd. It is intended for limited circulation. Practice Director - Big Data & Database Services