This document provides an overview of social media and big data analytics. It discusses key concepts such as Web 2.0, social media platforms, and the big data characteristics of volume, velocity, variety, veracity and value. The document also discusses how social media data can be extracted and analyzed using big data tools like Hadoop and techniques like social network analysis and sentiment analysis. It provides examples of analyzing social media data at scale to gain insights and make informed decisions.
3. WHAT IS WEB 2.0?
Web 2.0 is a complex, organic online conversation.
Web 2.0 is powered by:
• Social Networks
• News and Bookmarking
• Blogs
• Microblogging
• Video/Photo-sharing
• Message Boards
• Wikis
• Virtual reality
• Social gaming
• Podcasts
• Really Simple Syndication (RSS)
• Social Media Press Release
4. TECHNOLOGY OVERVIEW
Web 2.0 websites typically include some of the following features/techniques (SLATES):
Search: The ease of finding information through keyword search
Links: Ad-hoc guides to other relevant information
Authoring: The ability to create constantly updating content on a platform, so that content shifts from being the creation of a few to being constantly updated, interlinked work
Tags: Categorization of content by creating tags: simple, one-word, user-determined descriptions that facilitate searching and avoid rigid, pre-made categories
Extensions: Powerful algorithms that leverage the Web as an application platform as well as a document server
Signals: The use of RSS technology to rapidly notify users of content changes
5. WEB 2.0 TECHNOLOGIES: SOCIAL MEDIA
Social media is an umbrella term that defines the various activities that integrate technology, social interaction, and the construction of words, pictures, videos and audio.
6. In Simple Language…
“Creation of web content, by the people, for the people”
9. UNSTRUCTURED DATA
The variety of sources from which data is generated has also undergone a shift.
The types of data being created have changed from structured to semi-structured to unstructured data.
Data types: Structured Data, Semi-Structured Data, Unstructured Data
Need to manage a broad range of data types
Need to process analytic queries across numerous data types
The need to extract meaningful analysis from this data has led several technologies to gain traction.
Examples include NoSQL databases to store unstructured data, as well as innovative processing methods like Hadoop and massively parallel processing (MPP).
Today, 80% of the data existing in any enterprise is unstructured data.
Unstructured data from social media has to be approached in a non-traditional manner.
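To make the three data types concrete, the sketch below represents one hypothetical social media post in each form. The field names and sample values are illustrative assumptions, not taken from any real platform's schema.

```python
import json

# Structured: fixed columns, as a row in a relational table would have.
# (Column names are hypothetical: user_id, post_date, like_count.)
structured_row = ("u123", "2015-03-01", 42)

# Semi-structured: self-describing JSON; fields can vary between records.
semi_structured = json.loads(
    '{"user": "u123", "likes": 42, "tags": ["bigdata", "hadoop"]}'
)

# Unstructured: free text with no predefined schema.
unstructured = "Loving the new #bigdata course!! Hadoop is gr8 :)"


def describe(record):
    """Return a rough label for how much structure a record carries."""
    if isinstance(record, tuple):
        return "structured"
    if isinstance(record, dict):
        return "semi-structured"
    return "unstructured"


for rec in (structured_row, semi_structured, unstructured):
    print(describe(rec))
```

The structured row can be queried by column, the JSON record by key, while the free text needs parsing before any analysis is possible.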
10. IDENTIFYING UNSTRUCTURED DATA SOURCES
Facebook
- User Likes and Favorites
- Article/Video/Link Shares
- Views
- Comments
- Location / Geospatial
Twitter (Tweet Characteristics)
- Length
- Language Model
- Semantics
- Emoticons
- Location / Geospatial
Google / YouTube
- Blogs
- Comments
- Search Statistics
- Likes vs Dislikes
- Shares / Views / Comments
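As a rough illustration of how some of the tweet characteristics listed for Twitter could be pulled out of raw text, the following sketch extracts length, hashtags and emoticons. The function name, the emoticon set and the sample tweet are assumptions made for this example only.

```python
import re

# Small illustrative emoticon set; a real system would use a fuller list.
EMOTICONS = {":)", ":(", ":D", ";)", ":-)", ":-("}


def tweet_features(text):
    """Extract a few surface characteristics from a tweet's text."""
    tokens = text.split()
    return {
        "length": len(text),
        "hashtags": re.findall(r"#(\w+)", text),
        "emoticons": [t for t in tokens if t in EMOTICONS],
    }


sample = "Great talk on #BigData and #Hadoop :)"
features = tweet_features(sample)
print(features)
```

Characteristics such as language model, semantics and geolocation need additional data or models and are deliberately left out of this sketch.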
13. BIG DATA IS…
“Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
15. THE GLOBAL DATA GROWTH
Implication for an organization
Global data volume (zettabytes):
2009: 0.8
2011: 1.9
2015: 7.9
2020: 35.0
CAGR (2009-2020): 41.0%
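The 41.0% compound annual growth rate can be checked from the 2009 and 2020 figures. The sketch below applies the standard CAGR formula, (end / start)^(1 / years) - 1; the function name is an assumption for this example.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Values from the slide: 0.8 ZB in 2009 and 35.0 ZB in 2020.
growth = cagr(0.8, 35.0, 2020 - 2009)
print(f"CAGR 2009-2020: {growth:.1%}")  # prints "CAGR 2009-2020: 41.0%"
```

Growing 0.8 ZB by about 41% a year for 11 years compounds to roughly a 44-fold increase, which is how the 2020 figure reaches 35 ZB.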
16. NORTH AMERICA & EUROPE DRIVE THE BIG DATA OPPORTUNITY WITH OVER 85% OF THE WORLD’S DATA
[Figure: world map with regions labelled North America, South America, Europe, Middle East, India, China and Japan, carrying the values >3,500, >40, >2,000, >200, >400, >250 and >50]
Regional callouts:
- Key verticals: Healthcare, Manufacturing, Retail, Digital Marketing. Demand trend: High demand for Big Data analytics
- Key verticals: Telecom, Retail, Banking. Demand trend: Still embryonic; most organizations have a wait-and-watch approach
- Demand trend: Current demand appears to be limited; however, lack of skills may drive outsourcing of Big Data analytics. Low awareness levels
- Key verticals: Technology, Financial services, Oil & Gas, Utilities, Manufacturing. Demand trend: European MNCs are still in the early stages of the adoption cycle
- Key verticals: Manufacturing, Telecom, Health & Life Sciences. Demand trend: Demand for BI to derive operational efficiency
- Key verticals: Telecom, Bioinformatics, Retail. Demand trend: Industry is in a nascent stage with demand catching up, particularly in retail
17. BIG DATA TOOLS
Tool: Description
The Hadoop Distributed File System (HDFS): HDFS divides the data into smaller parts and distributes it across the various servers/nodes.
SQL Server Integration Services, Apache Flume: These tools allow posts to be downloaded and loaded into Hadoop.
MapReduce: MapReduce is a process that transforms data loaded into Hadoop into a format that can be used for analysis.
Hive: A runtime Hadoop support architecture that leverages Structured Query Language (SQL) with the Hadoop platform.
Jaql: Jaql converts high-level queries into low-level queries and […]
Zookeeper: Zookeeper coordinates parallel processing across big clusters.
HBase: HBase is a column-oriented database management system that sits on top of HDFS and uses a non-SQL approach.
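The MapReduce idea of transforming loaded data into an analyzable form can be illustrated with a word count, the customary introductory example. This is a single-process sketch in plain Python, not the Hadoop API: the function names and sample posts are assumptions, and a real job would run the map and reduce steps in parallel across cluster nodes.

```python
from collections import defaultdict


def map_phase(post):
    """Map: emit a (word, 1) pair for every word in one post."""
    return [(word.lower(), 1) for word in post.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


posts = ["big data is big", "social data"]
pairs = [pair for post in posts for pair in map_phase(post)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

Because each post is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the model suit large clusters.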
19. BIG DATA: VOLUME
Volume refers to the vast amounts of data generated every second. We are not talking terabytes but zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology.
20. BIG DATA: VELOCITY
Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyse the data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
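Analysing data while it is being generated can be sketched as a running aggregate that is updated per message and never writes the messages to storage. The generator below stands in for a live feed; its contents and all names are assumptions for illustration.

```python
from collections import Counter


def message_stream():
    """Stand-in for a live feed; a real source would be a socket or queue."""
    yield from ["#launch is live", "loving #launch", "#outage again", "#launch wow"]


def running_hashtag_counts(stream):
    """Update hashtag counts per message and yield a snapshot after each one."""
    counts = Counter()
    for message in stream:
        counts.update(w for w in message.split() if w.startswith("#"))
        yield dict(counts)


latest = {}
for latest in running_hashtag_counts(message_stream()):
    print(latest)
```

Only the small counter is kept in memory; each message is discarded once it has been counted.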
21. BIG DATA: VARIETY
Variety refers to the different types of data we can now use. In the past we only focused on structured data that neatly fitted into tables or relational databases, such as financial data. In fact, 80% of the world’s data is unstructured (text, images, video, voice, etc.).
22. BIG DATA: VERACITY
Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but technology now allows us to work with this type of data.
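One common way of working with messy posts is to normalize them before analysis. The sketch below lowercases text, strips hashtag markers, squeezes elongated words and expands a few abbreviations. The abbreviation table and function name are hypothetical, and this is only one of many possible cleaning strategies.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"gr8": "great", "u": "you", "thx": "thanks"}


def normalize(post):
    """Lowercase, strip '#', squeeze repeated letters, expand abbreviations."""
    text = post.lower().replace("#", "")
    # Collapse runs of 3+ identical characters ("sooooo" -> "soo").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    words = [ABBREVIATIONS.get(w, w) for w in re.findall(r"[a-z0-9']+", text)]
    return " ".join(words)


cleaned = normalize("Thx u!!! #BigData is sooooo gr8")
print(cleaned)  # prints "thanks you bigdata is soo great"
```

Normalization addresses messiness only; judging whether the content itself is reliable is a separate problem.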
23. BIG DATA: VALUE
Then there is another V to take into account when looking at Big Data: Value! Having access to big data is no good unless we can turn it into value. Companies are starting to generate amazing value from their big data.
26. BIG DATA & REAL TIME USE
Big Data is also characterized by velocity or speed, i.e. the frequency of data generation or the frequency of data delivery.
New-age communication channels such as mobile phones, email and social networking have increased the rate of information flows.
Examples:
- Telcos adopting location-based marketing based on user location sensed by mobile towers
- Satellite images can help monitor and analyze troop movements, a flood plain, cloud patterns, or forest fires
- Video analysis systems could monitor a sensitive or valuable facility, watching for possible intruders and alerting authorities in real time
Big Data velocity enabling real-time use of data
Data velocity per minute:
- 600+ videos on YouTube
- 200 million+ emails sent
- 2 million+ Google search queries
- 400,000+ minutes of Skype calling
- 400,000+ tweets on Twitter
- US$ 300,000+ spent on online shopping
- 700,000+ Facebook updates
- 7,000+ photos on Flickr
- 1,500+ blog posts
- 3,500+ ticks per minute in securities trading
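To get a feel for what these per-minute rates mean at scale, the sketch below converts three of the quoted lower-bound figures into daily totals. The selection of figures and the variable names are choices made for this example.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Lower-bound per-minute figures quoted on the slide.
per_minute = {
    "tweets": 400_000,
    "Facebook updates": 700_000,
    "Google search queries": 2_000_000,
}

per_day = {name: rate * MINUTES_PER_DAY for name, rate in per_minute.items()}
for name, total in per_day.items():
    print(f"{name}: {total:,} per day")
```

At these rates, tweets alone exceed half a billion per day, which is why batch loading into a traditional database struggles to keep up.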
27. BIG DATA FOR SOCIAL MEDIA ANALYTICS PROCESS MODEL
28. CONCEPTUAL VIEW OF FRAMEWORK FOR BIG DATA EXTRACTION, MESSAGING AND STORE
This phase has a composite pattern that is based on the store-and-explore pattern and focuses on obtaining and storing the relevant data from sources outside our establishment.
29. CONCEPTUAL VIEW OF DISCUSSION TOPIC AND OPINION ANALYSIS COMPONENT
This phase has a composite pattern that is based on purposeful-and-predictive analytics to gain advanced insight.
30. WHAT IS HADOOP?
* Hadoop is an open-source framework used for storing and processing large-scale data sets on large clusters of hardware.
* Hadoop's specialty lies in HDFS, which stores data across large numbers of commodity machines and provides very high aggregate bandwidth across the cluster.
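The way HDFS spreads a file over commodity machines can be sketched as two steps: cut the file into fixed-size blocks, then place copies of each block on several nodes. The code below is a simplified illustration, not the HDFS implementation: the function names, node names and tiny block size are assumptions, and the round-robin placement stands in for HDFS's actual rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` nodes, round-robin.

    Nodes per block are distinct as long as replication <= len(nodes).
    """
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"])
print([len(b) for b in blocks])
print(placement)
```

Because every block lives on several machines, reads can be served in parallel from different nodes and the loss of one machine does not lose data.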
32. CONCEPTUAL VIEW OF DATA VISUALIZATION AND DECISION-MAKING COMPONENT
This project has a composite pattern based on actionable analysis, with the aim of taking the next best actions that lead related customers to take appropriate actions.
38. SENTIMENT ANALYSIS
Sentiment analysis…
• Analyzes people’s sentiments, opinions, appraisals, attitudes, evaluations, and emotions
• Towards entities such as organizations, products, services, individuals, topics, issues, events, and their attributes
• As presented online via text, video and other means of communication.
• These communications can fall into three broad categories: positive, neutral or negative.
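A minimal way to sort text into the three categories is to count hits against positive and negative word lists. The sketch below assumes tiny hypothetical lexicons and a simple counting rule; production systems typically use much larger lexicons or trained models and handle negation, sarcasm and context.

```python
# Tiny hypothetical lexicons; real ones contain thousands of entries.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}


def classify(text):
    """Label text positive, negative or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in ["I love this phone, great camera!", "Terrible service.", "It arrived today."]:
    print(post, "->", classify(post))
```

Text with no lexicon hits, or with equal positive and negative hits, falls into the neutral category under this rule.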
39. LEVEL OF ANALYSIS
We can inquire about sentiment at various linguistic levels:
• Words – objective, positive, negative, neutral
• Clauses – “going out of my mind”
• Sentences – possibly multiple sentiments
• Documents
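The difference between levels can be seen by scoring the same text per sentence and as a whole document. The sketch below uses a tiny hypothetical word-level lexicon and a naive sentence splitter; both are assumptions for illustration.

```python
import re

# Tiny hypothetical word-level lexicon: +1 positive, -1 negative.
LEXICON = {"love": 1, "great": 1, "awful": -1, "slow": -1}


def score(text):
    """Sum the word-level polarities found in the text."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))


def sentence_scores(document):
    """Split on sentence-ending punctuation and score each sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]
    return [(s, score(s)) for s in sentences]


review = "I love the screen. The battery is awful and slow."
per_sentence = sentence_scores(review)
document_score = score(review)
print(per_sentence)
print(document_score)
```

The document-level score is negative overall, yet the sentence-level view shows a positive opinion about the screen that the aggregate hides.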
41. TRUTHY: A SOCIAL MEDIA RESEARCH PROJECT
Truthy is a research project to study how memes spread on social
media. A meme is a transmissible unit of information, such as a hashtag,
phrase, or link. This website highlights some of the research coming from
this effort and showcases some visualizations, tools, and data resources
demonstrating broader impacts of the project.