http://www.zd8a.com
Slide Deck Focus
Sentiment analysis through Facebook and Twitter leveraging
-Hadoop
-MongoDB
-Mahout
-Greenplum
-Solr
This slide deck was a product of developing a sentiment and text analytics engine. We leveraged Facebook Connect, the Twitter Firehose, and web scraping to gather text and store it in both MongoDB and Hadoop. Once it was stored, we used Mahout and Solr for text search and analytics to determine trends within the data. Although our dataset was not large enough to need it, we used Greenplum as a test MPP database to tie all three of those technologies into one dashboard using Pentaho.
2. Z DATA’S AGILE ANALYSIS – THE “BIG DATA STACK”
• How do we leverage the “Big Data” stack?
– Technology
• Don’t back your problem into available technologies; leave your toolset open.
• Organically grow new skillsets, hire the right individuals
– Development
• Be agile in your approach
• Comparative analysis using both new mathematical methods and open source
technologies
– Embrace the shift into a data driven world
• Empower your Engineering and Science team to be creative
• Let the data lead your direction
• Use new data types previously unavailable to drive insights
“Associating structured and unstructured data at relevant points is
where the most value is gained and where the highest level of
challenge is presented.” – Ryan Abo, PhD – Z Data Inc.
3. ANALYZING THE POLITICAL LANDSCAPE
Phase 1
• Location-based Google Search and Twitter mentions
• Word pair mentions
Phase 2
• Facebook and Twitter Sentiment and Geospatial Analysis
4. UNSTRUCTURED AND STRUCTURED DATA
COMPLEMENTING YOUR TECHNOLOGIES
Structured Data
• Standard Data Warehouse – finance, sales
• GeoSpatial – locations, places
• Technologies – Greenplum, Netezza, Teradata
Unstructured Data
• Textual Objects - Social Media, Blogs, forums
• Bitmap Objects – images, video, audio
• Technologies – Hadoop, Cassandra, Solr, NoSQL
5. Identifying Unstructured Data Sources
Objective: Identify and leverage social media outlets to better predict the overall
sentiment across political candidates.
Facebook
- User Likes and Favorites
- Article/Video/Link Shares
- Comments
- Location / Geospatial
Twitter
- Tweet Characteristics: Length, Language Model, Semantics, Emoticons
- Location / Geospatial
Google / YouTube
- Views
- Blogs
- Comments
- Search Statistics
- Likes vs Dislikes
- Shares / Views / Comments
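A few of the tweet characteristics named on this slide (length and emoticons) can be pulled from raw text with very little code. The sketch below is illustrative only: the function name `tweet_features` and the emoticon pattern are assumptions made for this example, not part of the engine described in the deck, and the pattern covers only simple Western-style emoticons.

```python
import re

# Hypothetical pattern: eyes [:;=], optional nose [-'], mouth [)(DPp].
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")


def tweet_features(text):
    """Return simple surface features for a single tweet."""
    return {
        "length": len(text),                      # character count
        "word_count": len(text.split()),          # whitespace-separated tokens
        "emoticons": EMOTICON_RE.findall(text),   # matched emoticons, in order
    }


if __name__ == "__main__":
    print(tweet_features("Great rally today :) :-("))
```

Features like these would normally be stored alongside the raw text so they can be joined with location and engagement data later.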
6. SEARCH, MENTION AND WORD PAIR ANALYSIS
Search Engine Data
• Number of Searches for a candidate or
political party
• Word pair / combination analysis
Why should we care?
• Determine the most successful candidate
online
• Effectiveness of campaigns and conversion
to online competitive content
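As a rough illustration of the word pair / combination analysis mentioned above, the sketch below counts how often a target term (for example, a candidate name) appears in the same text as each other word. The function name `word_pair_counts`, the tokenizer, and the sample names are assumptions made for this example; the deck does not specify how pairs were computed.

```python
import re
from collections import Counter


def word_pair_counts(texts, targets):
    """Count co-occurrences of each target term with other words, per text."""
    counts = Counter()
    for text in texts:
        # Lowercase and deduplicate so each pair counts once per text.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for target in targets:
            if target in words:
                for other in words - {target}:
                    counts[(target, other)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample mentions.
    sample = ["Smith tax plan", "smith tax cuts", "jones jobs"]
    for pair, n in word_pair_counts(sample, {"smith"}).most_common(3):
        print(pair, n)
```

In practice, stop words would be filtered out first so that frequent pairs reflect topics rather than grammar.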
7. ADVANCED SENTIMENT ANALYSIS
What is this sentiment they speak of?
• Unstructured Text Data
• Using computational linguistics to
accurately determine the attitude of a
writer with respect to a topic.
Why should we care?
• Use “Opinion Mining” to predict political
bias
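To make the idea of determining a writer's attitude concrete, here is a minimal lexicon-based sketch. It is not the method used in the engine described here (which relied on Mahout and Solr); the word lists, the negation rule, and the name `sentiment_score` are all assumptions chosen to keep the example small.

```python
import re

# Hypothetical toy lexicon; a real system would use a much larger, curated one.
POSITIVE = {"good", "great", "love", "support", "win"}
NEGATIVE = {"bad", "terrible", "hate", "oppose", "lose"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Return positive hits minus negative hits, flipping a hit that follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = int(word in POSITIVE) - int(word in NEGATIVE)
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity  # "not good" counts as negative
        score += polarity
    return score


if __name__ == "__main__":
    print(sentiment_score("I love this candidate"))
    print(sentiment_score("not good at all"))
```

A score above zero suggests a favorable attitude toward the topic and a score below zero an unfavorable one; aggregating scores by candidate and location is one way such output could feed an opinion-mining dashboard.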