Data Mining for Moderation of Social Data

Fernando G. Guerrero
CEO SolidQ
fguerrero@solidq.com

Introductions
• Fernando G. Guerrero
• Global CEO of SolidQ
• fguerrero@solidq.com

• Microsoft Regional Director for Spain since 2004
• SQL Server MVP from 2000 to 2007
• Usual suspect at many international conferences

SolidQ 2012… 10th anniversary
• 160 people in 23 countries:
• Argentina, Australia, Austria, Bulgaria, Canada, Chile,
Costa Rica, Croatia, Denmark, France, Germany, India,
Israel, Italy, Mexico, Saudi Arabia, Serbia, Slovakia,
Slovenia, Spain, Sweden, UK, USA
• 50 current or former RDs or MVPs
• Authors of many books, articles, and whitepapers
• Research Collaboration with:
• Universidad de Alicante
• Universidad de les Illes Balears
• Universidad de Santiago de Compostela
• The European Union
• The Spanish Ministry of Economy and Innovation

Agenda

• Social Data
• Market Research
• Sentiment Analysis, Text Mining
• Moderation, Data Mining
• SolidQ Research Lines in Social Data

© 2012 SolidQ 6

Social data is everywhere

Social data is about everything

Music

Social is there
• Is your organization promoting social conversation about you?

Products
Services
Stories

Social is there, reputation
• What is social saying about you?
• Product
• Services
• Decisions
• Image

Market Research
• What is social requesting from you?
• Future Services
• Product updates

• Can you ask social questions?
• Is this service going to succeed?
• How can I fix the current problem?
• Is society ready for this law?

Sentiment Analysis, Text Mining

• "The movie was fabulous!" [ Sentimental ]
• "The movie stars Mr. X" [ Factual ]
• "The movie was horrible!" [ Sentimental ]
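
The sentimental/factual split above can be sketched with a tiny opinion lexicon. This is only an illustration, not the method used in the project: the word list `OPINION_WORDS` and the function `classify` are hypothetical names, and a real system would need a far richer lexicon or a trained model.

```python
# Hypothetical opinion lexicon; a real system would use a much larger, curated list.
OPINION_WORDS = {"fabulous", "horrible", "great", "awful", "amazing", "terrible"}


def classify(sentence: str) -> str:
    """Label a sentence 'Sentimental' if it contains an opinion word, else 'Factual'."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return "Sentimental" if words & OPINION_WORDS else "Factual"


for s in ["The movie was fabulous!", "The movie stars Mr. X", "The movie was horrible!"]:
    print(f"{s!r} -> {classify(s)}")
```

A lexicon lookup like this ignores negation and sarcasm, which is one reason sentiment analysis is usually treated as a text mining problem rather than a keyword search.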

What is Data Mining?
• Inform actionable business decisions
• Contrasts with “machine learning”

Media Case Study
• Millions of posts per year (different moderation
scenarios)
• About 25% are human moderated
• About 10% of the moderated posts fail
• No Business Intelligence applications for analysis
or reporting

Moderation, Data Mining
• Contextual Information
• Time
• Location
• User
• At 10 AM, comments are safer than at 2 AM.
• A user may be safe talking about science but
dangerous talking about sports.
• If a thread is hot (dangerous), a comment in it may be hot too.
• By combining context patterns, the system assigns risk to
posts without going into the text.
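
One way to picture combining context signals into a single risk is a noisy-OR over historical fail rates. This is a sketch under stated assumptions: the rate tables, the default rate, the names `PostContext` and `context_risk`, and the noisy-OR rule itself are all illustrative choices, not the model described in the talk.

```python
from dataclasses import dataclass


@dataclass
class PostContext:
    hour: int        # hour of day the post was submitted (0-23)
    user_id: str
    topic: str
    thread_id: str


# Hypothetical historical fail rates; every number here is invented for illustration.
FAIL_RATE_BY_HOUR = {10: 0.02, 2: 0.15}
FAIL_RATE_BY_USER_TOPIC = {("alice", "science"): 0.01, ("alice", "sports"): 0.30}
FAIL_RATE_BY_THREAD = {"t-hot": 0.40, "t-calm": 0.03}
DEFAULT_RATE = 0.05  # assumed fallback when there is no history for a signal


def context_risk(ctx: PostContext) -> float:
    """Combine per-signal fail rates with a noisy-OR: risk = 1 - prod(1 - p_i)."""
    rates = [
        FAIL_RATE_BY_HOUR.get(ctx.hour, DEFAULT_RATE),
        FAIL_RATE_BY_USER_TOPIC.get((ctx.user_id, ctx.topic), DEFAULT_RATE),
        FAIL_RATE_BY_THREAD.get(ctx.thread_id, DEFAULT_RATE),
    ]
    safe = 1.0
    for p in rates:
        safe *= 1.0 - p
    return 1.0 - safe


calm = PostContext(hour=10, user_id="alice", topic="science", thread_id="t-calm")
hot = PostContext(hour=2, user_id="alice", topic="sports", thread_id="t-hot")
print(round(context_risk(calm), 3), round(context_risk(hot), 3))
```

Note that the function never looks at the post text: the same user gets a low score in one context and a high score in another purely from time, topic, and thread history.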

Solution – Logical Model
• Post Context (behavior analysis)
• Patterns, data mining.
• Post Content (text analysis)
• Profanity, low score sentences, text mining, mood or
tone (sentiment analysis)

Typically Available Data on Posts
• Historical and real time data for:
• User (e.g. userid, email, nationalid)
• Location (e.g. Life & Style > Fashion)
• Time (e.g. 12 March 2011 18:56)
• Content (e.g. text, link, picture, video).
• Moderation result

• Other attributes like geography, age, education
could be used

Post context, Patterns, Data Mining
• User behavior.
• Time behavior.
• Location behavior.

Building useful attributes
• 1.- Thread (% Fails in a certain thread)
• 2.- User (% Fails per User)
• 3.- Diff Hour Forum Created (TimeDatePosted-TimeForumCreated)
• 4.- User Forum (% Fails in a certain forum)
• 5.- Diff Last for User (TimeDatePosted - TimeLastFailUser)
• 6.- Hour of the day
• 7.- Diff hour UserJoined-Now (TimeDatePosted-TimeUserJoined)
• 8.- User Thread (% Fails per User in a thread)
• 9.- Diff Hour Thread Created (TimeDatePosted-TimeThreadCreated)
• 10.- Day of Week
• More than 100 attributes.
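
A few of the attributes listed above could be derived from moderation history roughly as follows. This is a minimal sketch: the `HISTORY` rows, the tuple layout, and the names `fail_pct` and `build_attributes` are assumptions made for illustration, and only past posts are counted so that an attribute never leaks the outcome it is meant to predict.

```python
from datetime import datetime

# Hypothetical moderation history rows: (user, thread, posted_at, failed)
HISTORY = [
    ("u1", "t1", datetime(2011, 3, 12, 9, 0), False),
    ("u1", "t1", datetime(2011, 3, 12, 10, 0), True),
    ("u2", "t1", datetime(2011, 3, 12, 11, 0), True),
    ("u1", "t2", datetime(2011, 3, 12, 12, 0), False),
]


def fail_pct(rows):
    """Percentage of rows whose moderation result was a fail (0.0 if no rows)."""
    return 100.0 * sum(failed for *_, failed in rows) / len(rows) if rows else 0.0


def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600.0


def build_attributes(user, thread, posted_at, thread_created, user_joined):
    # Only history strictly before the new post is used.
    by_user = [r for r in HISTORY if r[0] == user and r[2] < posted_at]
    by_thread = [r for r in HISTORY if r[1] == thread and r[2] < posted_at]
    by_user_thread = [r for r in by_user if r[1] == thread]
    last_fails = [r[2] for r in by_user if r[3]]
    return {
        "thread_fail_pct": fail_pct(by_thread),            # attribute 1
        "user_fail_pct": fail_pct(by_user),                # attribute 2
        "user_thread_fail_pct": fail_pct(by_user_thread),  # attribute 8
        "hours_since_thread_created": hours(posted_at - thread_created),  # attribute 9
        "hours_since_user_joined": hours(posted_at - user_joined),        # attribute 7
        "hours_since_last_user_fail": (
            hours(posted_at - max(last_fails)) if last_fails else None    # attribute 5
        ),
        "hour_of_day": posted_at.hour,        # attribute 6
        "day_of_week": posted_at.weekday(),   # attribute 10 (0 = Monday)
    }


attrs = build_attributes(
    "u1", "t1",
    posted_at=datetime(2011, 3, 12, 18, 56),
    thread_created=datetime(2011, 3, 12, 8, 0),
    user_joined=datetime(2011, 1, 1, 0, 0),
)
print(attrs)
```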


Hard Work
• Periods.
• Algorithms.
• Algorithms' parameters.
• Model refreshing.
• Attribute analysis.
• Outliers.
• Overpopulating.
• Behavior after this system is in production.


Data Mining Algorithms
• Decision Trees/Linear Regression
• Sequence Analysis
• Neural Networks/Logistic Regression
• Clustering
• Text Mining (Words and Phrases)
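
As an illustration of one algorithm family from this list, the sketch below fits a logistic regression by plain gradient descent on two context attributes. It is not the project's implementation: the training rows are invented, the learning rate and epoch count are arbitrary, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(w, b, x):
    """Estimated probability that a post with attributes x fails moderation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented rows: [user fail fraction, thread fail fraction] -> 1 if the post failed.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.7, 0.8], [0.9, 0.6], [0.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
print(predict_risk(w, b, [0.05, 0.05]), predict_risk(w, b, [0.85, 0.85]))
```

The output of `predict_risk` is a probability, which makes it straightforward to rank posts by risk and send only the riskiest ones to human moderators.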

Conclusion on Context
• Risk based on context of the post
• Time
• User’s history
• Publish location
• Enables risk analysis for all types of content
• Comments (in any language)
• Links
• Pictures
• Videos

Logical Model: Post content
• Profanity Analysis
• Text Mining
The first minister and his secretary found sleeping together last night. They got
drunk at a nearby pub.

• Sentiment Analysis
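
Profanity analysis, the simplest of the three content checks, can be sketched as a blocklist lookup. The blocklist entries and the function name `profanity_hits` are placeholders chosen for illustration; a production list would be much larger and maintained per language.

```python
import re

# Placeholder blocklist with mild stand-in words; purely illustrative.
BLOCKLIST = {"darn", "heck"}


def profanity_hits(text: str) -> list[str]:
    """Return blocklisted words found in the text, matched case-insensitively."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in BLOCKLIST]


example = (
    "The first minister and his secretary found sleeping together last night. "
    "They got drunk at a nearby pub."
)
print(profanity_hits("Well, DARN it!"))
print(profanity_hits(example))
```

The example sentence from the slide contains no blocklisted word, so a check like this would let it through. That is presumably why text mining and sentiment analysis are listed alongside profanity analysis.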

Analysis and Reporting
• Published through integrated web application
• Moderation statistics.
• User statistics.
• News and Stories Statistics.
• Peaks.

Conclusion: Benefits
• By moderating only half of all posts, the solution
captures 90% of failing posts. The remaining 10%
appear likely to be safe posts.
• Using Intelligent Moderation, media companies
scan the whole universe of posts at a
comparatively low cost.
• At peak times, Intelligent Moderation works
perfectly.
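
The capture figure quoted above is the kind of number one gets by ranking posts by risk, reviewing only the top fraction, and counting how many failing posts fall inside it. The sketch below shows that calculation on invented data; the scores, outcomes, and the name `capture_rate` are assumptions, and the toy result is not meant to reproduce the 90% reported in the talk.

```python
def capture_rate(posts, review_fraction):
    """posts: list of (risk_score, failed) pairs.

    Review the riskiest `review_fraction` of posts and return the share of
    all failing posts that land in the reviewed set.
    """
    ranked = sorted(posts, key=lambda p: p[0], reverse=True)
    reviewed = ranked[: round(len(ranked) * review_fraction)]
    total_fails = sum(failed for _, failed in posts)
    caught = sum(failed for _, failed in reviewed)
    return caught / total_fails if total_fails else 0.0


# Invented scores and outcomes, chosen only to make the arithmetic easy to follow.
POSTS = [
    (0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False),
    (0.4, False), (0.3, True), (0.2, False), (0.1, False), (0.05, False),
]

print(capture_rate(POSTS, 0.5))
```

With these ten posts, reviewing the top five catches three of the four failing posts. Plotting the capture rate against the review fraction gives a simple way to choose how much human moderation to pay for.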

Football night in Europe
• On January 25th, 2012:
• Liverpool defeated Manchester City in the Carling Cup
• Barcelona defeated Real Madrid in Copa del Rey
• More than 100,000 comments arrived at the
different BBC sites over 10 hours
• All comments were filtered through our system
• No problems observed during that time

SolidQ Team in this project
• Project Managers
• Francisco Gonzalez, Javier Torrenteras, Alejandro
Leguizamo
• Developers
• Itzik Ben-Gan, Enrique Puig, Ruben Pertusa, Carlos
Martinez, Fernando G. Guerrero
• Technical reviewers
• Mark Tabladillo, Dejan Sarka
• Social Media Specialists
• Jose Quinto, Rocio Díaz

SolidQ Research
• Incomplete Grammar Analysis
• Human interaction with IT systems
• Collaboration
• Contextual analysis
• Sentiment Analysis
• Market Research
• Reputation
• Data Mining of Social Context
• Moderation
• Market Research
• Reputation

Invisible computing…
… Driven by Social Data
