2. WHAT IS DATA?
• Data is represented using characters such as
letters (A-Z, a-z), digits (0-9), or special characters
(+, -, /, *, <, >, =, etc.).
• It may take the form of text documents,
images, audio clips, software programs, or other
content.
3. WHAT IS BIG DATA?
• Big data is also data, but of enormous size.
• The term “big data” refers to data that is so large, fast or complex that it’s difficult
or impossible to process using traditional methods.
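• The concept of big data gained momentum in the early 2000s when industry analyst
Doug Laney articulated the now-mainstream definition of big data as the three V's:
volume, velocity, and variety. Later accounts often extend the list to five V's by
adding veracity and value.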
• As apps and social media have grown and people and businesses have moved online,
the volume of data has increased enormously. Social media platforms alone
attract more than a million users daily, generating more data than ever before.
The next question is how this huge amount of data is handled, processed,
and stored. This is where big data comes into play.
• The term not only refers to the data, but also to the various frameworks, tools, and
techniques involved.
• As you build your big data solution, consider open source software such as Apache
Hadoop, Apache Spark and the entire Hadoop ecosystem as cost-effective, flexible data
processing and storage tools designed to handle the volume of data being generated today.
• Big data is most often stored in computer databases and is analyzed using software
specifically designed to handle large, complex data sets. Many software-as-a-service (SaaS)
companies specialize in managing this type of complex data.
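The processing model that Hadoop-style tools distribute across many machines can be illustrated on a single machine. The following is a minimal sketch of the map, shuffle, and reduce steps in plain Python, not the Hadoop or Spark API; the function names and the sample input are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run map, shuffle, and reduce in sequence on a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    sample = ["big data is data", "data is everywhere"]  # hypothetical input
    print(word_count(sample))
```

In a real cluster each phase would run in parallel on separate nodes; here they run one after another only to show how the data flows.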
7. SOURCES OF BIG DATA
• Social media : such as Twitter, Facebook, Instagram, Pinterest, and Google+.
• Cloud : public, private, or third party cloud platforms.
• Web : data publicly available on the web.
• IoT : data generated by interconnected devices (medical devices, video games,
cameras, meters).
• Databases : traditional and modern databases.
8. HOW DOES IT WORK?
• Integrate : Big data brings together data from many disparate sources and
applications. During integration, you need to bring in the data, process it, and make
sure it’s formatted and available in a form that your business analysts can get
started with.
• Manage : Big data requires storage. Your storage solution can be in the cloud, on
premises, or both.
• Analyze : Get new clarity with a visual analysis of your varied data sets. Explore
the data further to make new discoveries. Build data models with machine learning
and Artificial Intelligence.
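The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two inline sources, their field names (region, amount, area, total), and the in-memory list used as storage are all hypothetical stand-ins for real systems.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical inputs standing in for two separate source systems.
CSV_SOURCE = "region,amount\nnorth,120.5\nsouth,80\n"
JSON_SOURCE = '[{"area": "north", "total": 30}, {"area": "east", "total": 45.5}]'


def integrate(csv_text, json_text):
    """Bring both sources into one list of records with a shared format."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"region": row["region"], "amount": float(row["amount"])})
    for item in json.loads(json_text):
        records.append({"region": item["area"], "amount": float(item["total"])})
    return records


def manage(records, store):
    """Persist the records; here the store is just an in-memory list."""
    store.extend(records)
    return store


def analyze(store):
    """Compute the total amount per region across all stored records."""
    totals = defaultdict(float)
    for record in store:
        totals[record["region"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    store = manage(integrate(CSV_SOURCE, JSON_SOURCE), [])
    print(analyze(store))
```

The integrate step does most of the work: once every record shares the same field names and types, storage and analysis become straightforward.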
10. TYPES OF BIG DATA
1. Structured data
2. Unstructured data
3. Semi-structured data
11. STRUCTURED DATA
• Any data that can be stored, accessed, and processed in a fixed format is termed
'structured' data.
• Structured data is the easiest to work with.
• Sources : machine-generated data (for example, sensor readings) and human-generated data (data entered by a user as
input).
• It is typically quantitative data.
• The ETL (extract, transform, load) process for structured data stores the finished product in what is called a data warehouse.
These databases are highly structured and filtered for the specific analytics purpose the initial data was
harvested for.
• Structured data is the easiest type of data to analyze because it requires little to no preparation before
processing.
• A table in a database is an example of structured data.
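As a rough illustration of a fixed format and a small transform-and-load step, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The orders table, its columns, and the sample rows are assumptions made up for this example; a real data warehouse would be far larger and use dedicated tooling.

```python
import sqlite3


def load_orders(rows):
    """Create a fixed-schema table and load cleaned rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    # Transform step: trim customer names and convert amounts to numbers.
    cleaned = [(order_id, name.strip(), float(amount)) for order_id, name, amount in rows]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return conn


def total_per_customer(conn):
    """Query the table: the fixed schema makes aggregation a one-line query."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical raw rows as they might arrive from a source system.
    raw = [(1, " alice ", "10.50"), (2, "bob", "4"), (3, "alice", "2.5")]
    print(total_per_customer(load_orders(raw)))
```

Because every row has the same columns and types, the data needs almost no preparation before it can be queried.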
12. UNSTRUCTURED DATA
• Any data whose form or structure is unknown is classified as unstructured data.
• Unstructured data accounts for about 80% of all big data.
• Unstructured data is further divided into:
I. Captured data (GPS)
II. User-Generated data (likes, shares, comments, tweets).
• The hardest part of analyzing unstructured data is teaching an application to understand the
information it is extracting. More often than not, this means translating it into some form of
structured data.
• Examples of unstructured data are text, video, audio, social media activity, satellite imagery,
and email.
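Translating unstructured text into a structured form can be as simple as pulling out recognizable patterns. The sketch below uses regular expressions to turn a free-text social media post into a record with fixed fields; the field names and the sample post are assumptions for illustration, and real systems usually need far more sophisticated language processing.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def to_record(post):
    """Extract a structured record (fixed fields) from a free-text post."""
    return {
        "hashtags": HASHTAG.findall(post),
        "mentions": MENTION.findall(post),
        "word_count": len(post.split()),
    }


if __name__ == "__main__":
    post = "Loving the new phone from @acme! #tech #review"  # hypothetical post
    print(to_record(post))
```

Once each post is reduced to the same set of fields, the records can be stored and analyzed like any other structured data.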
13. SEMI-STRUCTURED DATA
• Semi-structured data can contain both structured and unstructured elements.
• Most of the time, this translates to unstructured data with metadata attached to it.
• For example, when you send an email, the time sent, the sender and recipient addresses,
the IP address of the sending device, and other pieces of information are linked to the actual
content of the email.
• Semi-structured data has no set schema.
• One example of semi-structured data is data represented in an XML file.
NoSQL documents are also considered semi-structured, since they contain
keys or tags that can be used to process the document easily.
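The email case can be sketched with Python's standard email package: the headers act as metadata with recognizable keys, while the body remains free text. The message below, including its addresses and subject, is a made-up example.

```python
from email import message_from_string

# Hypothetical raw message: structured headers followed by an unstructured body.
RAW_EMAIL = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 01 Jan 2024 09:30:00 +0000\n"
    "Subject: Quarterly numbers\n"
    "\n"
    "Hi Bob, the report is attached.\n"
)


def split_email(raw):
    """Separate the keyed metadata (headers) from the free-text body."""
    message = message_from_string(raw)
    metadata = {key.lower(): value for key, value in message.items()}
    body = message.get_payload()
    return metadata, body


if __name__ == "__main__":
    metadata, body = split_email(RAW_EMAIL)
    print(metadata)
    print(body)
```

The headers can be queried by key even though no schema fixes which headers a given message must carry, which is what makes the format semi-structured.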
16. CONCLUSION
• As more and more data is generated and collected, data analysis requires scalable,
flexible, and high performing tools to provide insights in a timely fashion. However,
organizations are facing a growing big data ecosystem where new tools emerge and
“die” very quickly. Therefore, it can be very difficult to keep pace and choose the
right tools.
• By using big data-related systems, engineers and scientists have been able to
more easily design cars, airplanes, and other vehicles.
• They have also been able to predict daily weather and natural disasters more
accurately.
17. FUTURE ENHANCEMENT
• Increasing demand for data analytics
• Increasing enterprise adoption of big data
• Big data finds application across many industry sectors (Professional,
Scientific and Technical Services (25%), Information Technology (17%), Manufacturing
(15%), Finance and Insurance (9%), and Retail Trade (8%)).
• Flexible career options.
• Promises strong salary growth.