2. What is webhose.io?
On-demand access to structured, clean, and organized web data
Anyone can consume it through a RESTful API
Easy to integrate & machine readable (JSON, XML, RSS, Excel)
Operating since 2015
With over 20K users consuming data via Webhose
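As a rough sketch of what consuming the data over a RESTful API can look like, the snippet below only builds a request URL. The endpoint path and the parameter names (`token`, `format`, `q`) are assumptions made for illustration and should be checked against the Webhose.io API documentation before use.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names are illustrative placeholders,
# not confirmed by the slides. Check the official API documentation.
BASE_URL = "https://webhose.io/search"


def build_query_url(token, query, fmt="json"):
    """Return a GET URL for a keyword query in the requested output format."""
    params = {"token": token, "format": fmt, "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


# "YOUR_API_TOKEN" is a placeholder for the token issued at sign-up.
url = build_query_url("YOUR_API_TOKEN", "hotel reviews")
print(url)
```

The URL can then be fetched with any HTTP client; the slide lists JSON, XML, RSS, and Excel as available output formats.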
6. Example: social media analytics companies
The need: To enrich machine analysis and models using publicly available data for deep insights
Webhose.io solution: Webhose.io DaaS Platform
Your software
7. Example from our clients: social media analytics companies
This is what we do
The needs: To enrich the analyzed data sets with extensive relevant web data, in order to derive deeper business insight
Data: Title, Date, Author, Content, Replies, Website, Arena
Our client software
This is how it looks on the Webhose.io side:
Title: Follow vBulletin on Google+
Date: Thu 4th Apr ’14, 2:43pm
Author: Lawrence Cole
Content: The team here at vBulletin is working hard to become more engaged with our market and customer base both in the forums and on social media. To help us in this effort, we are asking that you please go and “follow” us on Google+ by adding us to your G+ circles.
Website: Vbulletin.com
Arena: Message board
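One way client software might hold such a structured post is sketched below. The `Post` class and `post_from_dict` helper are hypothetical names introduced for illustration, and the keys simply mirror the labels on the slide; the field names in the actual API response may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Structured view of one crawled post, using the slide's field labels."""
    title: str
    date: str
    author: str
    content: str
    website: str
    arena: str
    replies: Optional[int] = None  # the slide lists this field without a value


def post_from_dict(raw):
    """Build a Post from a dict, ignoring keys that Post does not define."""
    fields = ("title", "date", "author", "content", "website", "arena", "replies")
    return Post(**{key: raw[key] for key in fields if key in raw})


# Values taken from the example on the slide (content shortened here).
example = post_from_dict({
    "title": "Follow vBulletin on Google+",
    "date": "Thu 4th Apr '14, 2:43pm",
    "author": "Lawrence Cole",
    "content": "The team here at vBulletin is working hard to become more engaged",
    "website": "Vbulletin.com",
    "arena": "Message board",
})
print(example.title, "|", example.arena)
```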
9. How to win
Sign up with the SAME email you used to sign up for the hackathon
Use webhose.io in your Colman hack project
Share your project and mention webhose.io
16. Analyze Datasets
General model, 80/20 train/test on Booking.com data
            Precision   Recall   F1-Measure
Positive    0.864       0.843    0.8536585366
Negative    0.852       0.872    0.8620689655

General model, test on Expedia.com data
            Precision   Recall   F1-Measure
Positive    0.878       0.805    0.8400520156
Negative    0.659       0.77     0.7105882353

Domain specific model, 80/20 train/test on Booking.com data
            Precision   Recall   F1-Measure
Positive    0.878       0.908    0.8926553672
Negative    0.902       0.871    0.8862275449

Domain specific model, test on Expedia.com data
            Precision   Recall   F1-Measure
Positive    0.899       0.892    0.8926553672
Negative    0.825       0.836    0.8307692308
http://blog.webhose.io/2017/02/09/how-to-use-rated-reviews-for-sentiment-classification
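For readers who want to relate the three columns, F1 is the harmonic mean of precision and recall. The sketch below recomputes it for a few rows of the results above; the function name is my own, and because the inputs here are the rounded precision and recall figures, the recomputed values can differ slightly from the reported ones.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results above.
rows = {
    "General, Booking.com, positive": (0.864, 0.843),
    "General, Expedia.com, negative": (0.659, 0.77),
    "Domain specific, Booking.com, positive": (0.878, 0.908),
}

for name, (precision, recall) in rows.items():
    print(f"{name}: F1 = {f1_measure(precision, recall):.3f}")
```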
17. Natural language processing & generation
• Extract training datasets
• Identify patterns (e.g. positive reviews)
• Generate new reviews
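The slide does not say which generation technique is meant. As a deliberately simple stand-in, the sketch below learns word-to-next-word transitions from a handful of reviews and samples a new one (a bigram Markov chain). The function names and the toy reviews are invented for illustration; a real pipeline would train on an extracted dataset of rated reviews.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def build_bigram_model(reviews):
    """Map each word to the list of words observed directly after it."""
    model = defaultdict(list)
    for review in reviews:
        words = [START] + review.split() + [END]
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return model


def generate_review(model, rng, max_words=20):
    """Sample a word sequence by following the learned transitions."""
    word, output = START, []
    while len(output) < max_words:
        word = rng.choice(model[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)


# Toy positive reviews, made up for this example.
positive_reviews = [
    "great hotel with friendly staff",
    "great location and clean rooms",
    "friendly staff and great breakfast",
]
model = build_bigram_model(positive_reviews)
print(generate_review(model, random.Random(0)))
```

Passing a seeded `random.Random` keeps the output reproducible between runs.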
19. The Fraudster Pattern
1. Identify a victim talking about CC information on Twitter while using a
benign account (e.g. @harmless-good-guy1)
2. Create a new dummy account and engage with the victim
(follow, friend, RT using the fresh account @harmless-good-guy2)
3. Send victim link to blog/forum that contains malicious phishing site
4. Harvest victim CC information
5. Post harvested CC database for sale
20. Research Mission: Expose Fraud Pattern
Obtain two datasets over a 48-hour period
by querying for fraud signal keywords
(ICQ, cvv, cvv2, amex)
Multi-layered graph-based model for
social engineering vulnerability assessment
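The keyword querying described on this slide can be sketched in two parts: composing a boolean query from the fraud signal keywords, and flagging which keywords appear in a collected post. The `OR` syntax is an assumption about the query language, the helper names are hypothetical, and the sample post is made up.

```python
import re

# Fraud signal keywords listed on the slide.
FRAUD_KEYWORDS = ("ICQ", "cvv", "cvv2", "amex")


def build_boolean_query(keywords):
    """Join keywords with OR (assumed query syntax, not confirmed by the slides)."""
    return " OR ".join(keywords)


# Whole-word, case-insensitive match for any of the keywords.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, FRAUD_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(text):
    """Return the sorted, lowercased set of fraud keywords found in text."""
    return sorted({match.lower() for match in _PATTERN.findall(text)})


print(build_boolean_query(FRAUD_KEYWORDS))
print(matched_keywords("Selling fresh CVV2 and amex, contact on icq"))
```

Keyword hits alone are only a signal; the slide's research mission pairs them with datasets collected over a fixed time window.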