Social Media with Big Data Analytics
Mohammed Zuhair Al-Taie
Big Data Centre - Universiti Teknologi Malaysia - 2016

Social media is an umbrella term for the various activities that integrate technology, social interaction, and the construction of words, pictures, videos and audio.
Big Data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.
Hadoop is an open-source framework used for storing and processing large-scale data sets on large clusters of hardware.

Published in: Data & Analytics

  1. Social Media with Big Data Analytics. Mohammed Zuhair Al-Taie, Big Data Centre, Universiti Teknologi Malaysia, 2016.
  2. AGENDA: Web 2.0; Social Media; Big Data; Social Media with Big Data Analytics; Social Network Analysis; Sentiment Analysis.
  3. WHAT IS WEB 2.0? Web 2.0 is a complex, organic online conversation. It is powered by: social networks; news and bookmarking; blogs; microblogging; video/photo sharing; message boards; wikis; virtual reality; social gaming; podcasts; Really Simple Syndication (RSS); social media press releases.
  4. TECHNOLOGY OVERVIEW. Web 2.0 websites typically include some of the following features and techniques, summarized by the acronym SLATES:
     • Search: the ease of finding information through keyword search.
     • Links: ad-hoc guides to other relevant information.
     • Authoring: the ability to create constantly updating content on a platform, shifting from the creation of a few to constantly updated, interlinked work.
     • Tags: categorization of content with simple, one-word, user-determined descriptions that facilitate searching and avoid rigid, pre-made categories.
     • Extensions: powerful algorithms that leverage the Web as an application platform as well as a document server.
     • Signals: the use of RSS technology to rapidly notify users of content changes.
  5. WEB 2.0 TECHNOLOGIES: SOCIAL MEDIA. Social media is an umbrella term for the various activities that integrate technology, social interaction, and the construction of words, pictures, videos and audio.
  6. In simple language: “Creation of web content, by the people, for the people.”
  7. SOCIAL MEDIA PLATFORMS
  8. WHAT HAPPENS EVERY 1 MIN?
  9. UNSTRUCTURED DATA. The variety of sources generating data has shifted, and the types of data being created have moved from structured to semi-structured to unstructured. Organizations need to manage a broad range of data types and process analytic queries across all of them. The need to extract meaningful analysis from this data has helped several technologies gain traction; examples include NoSQL databases for storing unstructured data and processing approaches such as Hadoop and massively parallel processing (MPP). Today, 80% of the data in any enterprise is unstructured, and unstructured data from social media has to be approached in a non-traditional manner.
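The three data types named on this slide can be illustrated with a short Python sketch. The record layout, field names and the emoticon pattern below are assumptions made for illustration; they do not come from the slides.

```python
import json
import re

# Structured: fixed columns that fit a relational table row.
# Hypothetical schema: (user_id, date, like_count).
structured_row = ("u123", "2016-03-01", 42)

# Semi-structured: self-describing JSON whose fields may vary per record.
semi_structured = json.loads('{"user": "u123", "likes": 42, "geo": null}')

# Unstructured: free text; structure must be derived before analysis.
unstructured = "Great talk at UTM today :)"

def to_features(text):
    """Derive a small structured record from free text (a toy extractor)."""
    return {
        "length": len(text),
        # Very rough emoticon pattern, e.g. :) ;-) :D
        "has_emoticon": bool(re.search(r"[:;]-?[)(D]", text)),
    }

features = to_features(unstructured)
print(structured_row[2], semi_structured["likes"], features)
```

The point of the sketch is only that the first two forms can be queried directly, while the free text needs an extraction step first.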
  10. IDENTIFYING UNSTRUCTURED DATA SOURCES.
     • Facebook: user likes and favorites; article/video/link shares; views; comments; location / geospatial.
     • Twitter (tweet characteristics): length; language model; semantics; emoticons; location / geospatial.
     • Google / YouTube: blogs; comments; search statistics; likes vs dislikes; shares / views / comments.
  11. BIG DATA IS… “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.
  12. BIG DATA VS USUAL DATA
  13. THE GLOBAL DATA GROWTH. Implication for an organization. Data volume in zettabytes: 2009: 0.8; 2011: 1.9; 2015: 7.9; 2020: 35.0. CAGR (2009-2020): 41.0%.
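The growth rate on this slide can be checked from its own endpoints. The sketch below applies the standard compound annual growth rate formula to the 2009 and 2020 figures; the function name `cagr` is an illustrative choice.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# Endpoints taken from the slide: 0.8 ZB in 2009, 35.0 ZB in 2020.
rate = cagr(0.8, 35.0, 2020 - 2009)
print(f"{rate:.1%}")  # prints 41.0%
```

The result agrees with the 41.0% quoted on the slide, which suggests the figure was computed over the eleven years from 2009 to 2020.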
  14. NORTH AMERICA & EUROPE DRIVE THE BIG DATA OPPORTUNITY WITH OVER 85% OF THE WORLD’S DATA. Regions shown: North America, South America, Europe, Middle East, India, China, Japan. Map labels: >3,500; >40; >2,000; >200; >400; >250; >50. Regional notes, in slide order:
     • Key verticals: Healthcare, Manufacturing, Retail, Digital Marketing. Demand trend: high demand for Big Data analytics.
     • Key verticals: Telecom, Retail, Banking. Demand trend: still embryonic; most organizations have a wait-and-watch approach.
     • Demand trend: current demand appears to be limited; however, lack of skills may drive outsourcing of Big Data analytics. Low awareness levels.
     • Key verticals: Technology, Financial Services, Oil & Gas, Utilities, Manufacturing. Demand trend: European MNCs are still in the early stages of the adoption cycle.
     • Key verticals: Manufacturing, Telecom, Health & Life Sciences. Demand trend: demand for BI to derive operational efficiency.
     • Key verticals: Telecom, Bioinformatics, Retail. Demand trend: the industry is in a nascent stage with demand catching up, particularly in retail.
  15. BIG DATA TOOLS (tool: description).
     • The Hadoop Distributed File System (HDFS): divides the data into smaller parts and distributes it across the various servers/nodes.
     • SQL Server Integration Services, Apache Flume: these tools allow posts to be downloaded and loaded into Hadoop.
     • MapReduce: a process that transforms data loaded into Hadoop into a format that can be used for analysis.
     • Hive: a runtime Hadoop support architecture that brings Structured Query Language (SQL) to the Hadoop platform.
     • Jaql: converts high-level queries into low-level queries.
     • ZooKeeper: coordinates parallel processing across big clusters.
     • HBase: a column-oriented database management system that sits on top of HDFS and uses a non-SQL approach.
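To make the MapReduce entry more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, counting words across a few posts. It is plain Python rather than the Hadoop API, and the function names and sample posts are assumptions for illustration only.

```python
from collections import defaultdict

def map_phase(post):
    """Map: emit a (word, 1) pair for every word in one post."""
    for word in post.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

posts = ["big data big insight", "social data"]  # hypothetical input
mapped = [pair for post in posts for pair in map_phase(post)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different nodes; here they run sequentially so the data flow is easy to follow.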
  16. BIG DATA IS OFTEN DESCRIBED USING FIVE Vs: Volume, Velocity, Variety, Veracity, Value.
  17. BIG DATA: VOLUME. Volume refers to the vast amounts of data generated every second. We are not talking terabytes but zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology.
  18. BIG DATA: VELOCITY. Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyse the data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
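Analysing data while it is being generated can be sketched as a rolling aggregate over a stream, where only a small window is held in memory and nothing is written to a database. The function name, the window size and the sample hashtags below are hypothetical.

```python
from collections import Counter, deque

def rolling_hashtag_counts(stream, window_size):
    """Yield hashtag counts over the most recent `window_size` items."""
    window = deque()
    counts = Counter()
    for tag in stream:
        window.append(tag)
        counts[tag] += 1
        if len(window) > window_size:
            # Evict the oldest item so memory use stays bounded.
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        yield dict(counts)

stream = iter(["#a", "#b", "#a", "#c", "#a"])  # stand-in for a live feed
snapshots = list(rolling_hashtag_counts(stream, window_size=3))
print(snapshots[-1])
```

Each snapshot reflects only the last three items seen, which is the kind of bounded, in-memory view that lets trending topics be spotted as messages arrive.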
  19. BIG DATA: VARIETY. Variety refers to the different types of data we can now use. In the past we focused only on structured data that fitted neatly into tables or relational databases, such as financial data. In fact, 80% of the world’s data is unstructured (text, images, video, voice, etc.).
  20. BIG DATA: VERACITY. Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but technology now allows us to work with this type of data.
  21. BIG DATA: VALUE. Then there is another V to take into account when looking at Big Data: Value! Having access to big data is no good unless we can turn it into value. Companies are starting to generate amazing value from their big data.
  22. THE INTERSECTION OF SOCIAL MEDIA AND BIG DATA
  23. BIG DATA & REAL TIME USE. Big Data is also characterized by velocity or speed, i.e. the frequency of data generation or of data delivery. New-age communication channels such as mobile phones, email and social networking have increased the rate of information flow. Big Data velocity enables real-time use of data. Examples:
     • Telcos adopting location-based marketing based on user location sensed by mobile towers.
     • Satellite images can help monitor and analyze troop movements, a floodplain, cloud patterns, or forest fires.
     • Video analysis systems could monitor a sensitive or valuable facility, watching for possible intruders and alerting authorities in real time.
     Data velocity per minute: 600+ videos on YouTube; 200 million+ emails sent; 2 million+ Google search queries; 400,000+ minutes of Skype calling; 400,000+ tweets on Twitter; US$ 300,000+ spent on online shopping; 700,000+ Facebook updates; 7,000+ photos on Flickr; 1,500+ blog posts; 3,500+ ticks per minute in securities trading.
  24. BIG DATA FOR SOCIAL MEDIA ANALYTICS PROCESS MODEL
  25. CONCEPTUAL VIEW OF FRAMEWORK FOR BIG DATA EXTRACTION, MESSAGING AND STORE. This phase has a composite pattern based on store-and-explore, and it focuses on obtaining and storing the relevant data from sources outside our establishment.
  26. CONCEPTUAL VIEW OF DISCUSSION TOPIC AND OPINION ANALYSIS COMPONENT. This phase has a composite pattern based on purposeful-and-predictive analytics to gain advanced insight.
  27. WHAT IS HADOOP? Hadoop is an open-source framework used for storing and processing large-scale data sets on large clusters of hardware. Hadoop's speciality lies in HDFS, which stores data across large numbers of commodity machines and provides very high bandwidth across the cluster.
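The idea of storing one data set across many commodity machines can be sketched as splitting data into fixed-size blocks and assigning each block to several nodes. This is a toy model under stated assumptions: the block size, node names and round-robin placement are illustrative and are not HDFS's actual defaults or placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication):
    """Assign each block index to `replication` distinct nodes, round-robin."""
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

data = b"x" * 10                        # stand-in for a large file
blocks = split_into_blocks(data, block_size=4)
nodes = ["node1", "node2", "node3"]     # hypothetical cluster
placement = place_blocks(len(blocks), nodes, replication=2)
print(len(blocks), placement)
```

Because every block lives on more than one node, losing a single machine does not lose data, and different blocks can be read from different machines at the same time.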
  28. CONCEPTUAL VIEW OF APACHE HADOOP ARCHITECTURE
  29. CONCEPTUAL VIEW OF DATA VISUALIZATION AND DECISION-MAKING COMPONENT. This phase has a composite pattern based on actionable analysis, with the aim of identifying the next best actions, which in turn lead related customers to take appropriate actions.
  30. SOCIAL NETWORK ANALYSIS
  31. A GLOBAL SOCIAL NETWORK
  32. NETWORK PERSPECTIVE
  33. WHY SOCIAL NETWORK ANALYSIS MATTERS
  34. SOCIAL NETWORK ANALYSIS: THE NEW SCIENCE OF NETWORKS
  35. SENTIMENT ANALYSIS. Sentiment analysis:
     • Analyzes people’s sentiments, opinions, appraisals, attitudes, evaluations, and emotions
     • Towards entities such as organizations, products, services, individuals, topics, issues, events, and their attributes
     • As presented online via text, video and other means of communication.
     • These communications can fall into three broad categories: positive, neutral or negative.
  36. LEVEL OF ANALYSIS. We can inquire about sentiment at various linguistic levels:
     • Words: objective, positive, negative, neutral
     • Clauses: “going out of my mind”
     • Sentences: possibly multiple sentiments
     • Documents
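The positive/neutral/negative categories and the word, sentence and document levels can be illustrated with a small lexicon-based sketch. The word lists are tiny hypothetical lexicons and the scoring rule is a deliberate simplification; it is not the method used in the presentation.

```python
import re

# Hypothetical polarity lexicons, far smaller than any real resource.
POSITIVE = {"good", "great", "love", "amazing"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def word_polarity(word):
    """Word level: +1 positive, -1 negative, 0 neutral/objective."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0

def sentence_sentiment(sentence):
    """Sentence level: label by the sign of the summed word polarities."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(word_polarity(w) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def document_sentiment(document):
    """Document level: one label per sentence, split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    return [sentence_sentiment(s) for s in sentences]

review = "I love this phone. The battery is terrible! It arrived on Monday."
labels = document_sentiment(review)
print(labels)  # ['positive', 'negative', 'neutral']
```

A word-counting approach like this misses idiomatic clauses such as “going out of my mind”, which is one reason analysis at the clause level is treated separately.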
  37. Elections 2012 Dashboard. Filter by: Facebook, Twitter, Google. Panels: Mitt Romney; Republican Primary; Democratic Vote; Republican Vote; Democratic Sentiment; Republican Sentiment.
  38. TRUTHY: A SOCIAL MEDIA RESEARCH PROJECT. Truthy is a research project to study how memes spread on social media. A meme is a transmissible unit of information, such as a hashtag, phrase, or link. This website highlights some of the research coming from this effort and showcases some visualizations, tools, and data resources demonstrating broader impacts of the project.
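As a rough illustration of tracking how a meme spreads, the sketch below treats hashtags and links as memes and counts how many distinct users posted each one. The regular expression, the function name `meme_reach` and the sample posts are assumptions for illustration and are not taken from the Truthy project.

```python
import re
from collections import defaultdict

# Treat hashtags and http(s) links as memes (a simplifying assumption).
MEME_PATTERN = re.compile(r"#\w+|https?://\S+")

def meme_reach(posts):
    """Return the number of distinct users who posted each meme."""
    users_by_meme = defaultdict(set)
    for user, text in posts:
        for meme in MEME_PATTERN.findall(text):
            users_by_meme[meme.lower()].add(user)
    return {meme: len(users) for meme, users in users_by_meme.items()}

posts = [  # hypothetical (user, text) pairs
    ("alice", "Vote today #Election2012"),
    ("bob", "#election2012 results http://example.com/r"),
    ("alice", "Still watching #election2012"),
]
reach = meme_reach(posts)
print(reach)
```

Counting distinct users rather than raw mentions keeps one very active account from making a meme look more widespread than it is.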
