Data Mining for Moderation of Social Data



Fernando G. Guerrero
CEO SolidQ
fguerrero@solidq.com
© 2011 SolidQ   3
Introductions
• Fernando G. Guerrero
• Global CEO of SolidQ
• fguerrero@solidq.com

• Microsoft Regional Director for Spain since 2004
• SQL Server MVP from 2000 to 2007
• Usual suspect at many international conferences
SolidQ 2012… 10th anniversary
• 160 people in 23 countries:
  • Argentina, Australia, Austria, Bulgaria, Canada, Chile,
      Costa Rica, Croatia, Denmark, France, Germany, India,
      Israel, Italy, Mexico, Saudi Arabia, Serbia, Slovakia,
      Slovenia, Spain, Sweden, UK, USA
• 50 current or former RDs or MVPs
• Authors of many books, articles, and whitepapers
• Research Collaboration with:
  •   Universidad de Alicante
  •   Universidad de les Illes Balears
  •   Universidad de Santiago de Compostela
  •   The European Union
  •   The Spanish Ministry of Economy and Innovation
Agenda

• Social Data
• Market Research
• Sentiment Analysis, Text Mining
• Moderation, Data Mining
• SolidQ Research Lines in Social Data




© 2012 SolidQ
Social data is everywhere




Social data is about everything




                  Music

Social is there
 • Is your organization promoting social conversation about you?

Products
Services
Stories




Social is there, reputation
• What is social saying about you?
     •     Product
     •     Services
     •     Decisions
     •     Image




Market Research
• What is social requesting from you?
     • Future services
     • Product updates

• Can you ask social questions?
     • Is this service going to succeed?
     • How can I fix the current problem?
     • Is society ready for this law?


Sentiment Analysis, Text Mining

     "The movie was fabulous!"    [ Sentimental ]
     "The movie stars Mr. X"      [ Factual ]
     "The movie was horrible!"    [ Sentimental ]
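
A minimal sketch of the sentimental/factual split shown above, assuming a tiny hand-made opinion lexicon. The word list and the function name `label_sentence` are illustrative assumptions, not part of the original system; a production sentiment analyzer would use a far richer model.

```python
import re

# Hypothetical opinion lexicon; a real system would use a much larger resource.
SENTIMENT_WORDS = {"fabulous", "horrible", "great", "awful", "love", "hate"}


def label_sentence(sentence: str) -> str:
    """Return 'Sentimental' if any opinion word appears, else 'Factual'."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if any(token in SENTIMENT_WORDS for token in tokens):
        return "Sentimental"
    return "Factual"


if __name__ == "__main__":
    for text in ("The movie was fabulous!",
                 "The movie stars Mr. X",
                 "The movie was horrible!"):
        print(f"{text!r:28} -> {label_sentence(text)}")
```

The lookup only detects whether an opinion is present; deciding whether it is positive or negative would need a polarity value per word.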




What is Data Mining?
• Inform actionable business decisions
• Contrasts with “machine learning”




Media Case Study
• Millions of posts per year (different moderation
   scenarios)
• About 25% are human moderated
• About 10% of the moderated posts fail
• No Business Intelligence applications for analysis
   or reporting




Moderation, Data Mining
• Contextual Information
     •     Time
     •     Location
     •     User
• At 10 AM comments are safer than at 2 AM.
• A user may be safe talking about science but
   dangerous talking about sports.
• If a thread is hot (dangerous), a comment in it may be hot too.
• By combining context patterns, the system assigns risk to
   posts without looking at the text.
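
One way to picture context-only risk scoring is the sketch below. It assumes historical fail rates are already known per hour, per user and topic, and per thread, and it combines them with a simple "noisy-OR" rule. All rates, identifiers, the fallback value, and the combination rule are hypothetical choices for illustration; the slides do not say how the real system combines its patterns.

```python
from dataclasses import dataclass


@dataclass
class PostContext:
    hour: int        # hour of day, 0-23
    user_id: str
    topic: str
    thread_id: str


# Hypothetical historical fail rates (fraction of moderated posts that failed).
FAIL_RATE_BY_HOUR = {10: 0.02, 2: 0.15}
FAIL_RATE_BY_USER_TOPIC = {("u1", "science"): 0.01, ("u1", "sports"): 0.30}
FAIL_RATE_BY_THREAD = {"t-calm": 0.03, "t-hot": 0.40}
DEFAULT_RATE = 0.10  # assumed fallback for values with no history


def context_risk(ctx: PostContext) -> float:
    """Combine per-signal fail rates as 1 - product(1 - rate); text is never read."""
    rates = [
        FAIL_RATE_BY_HOUR.get(ctx.hour, DEFAULT_RATE),
        FAIL_RATE_BY_USER_TOPIC.get((ctx.user_id, ctx.topic), DEFAULT_RATE),
        FAIL_RATE_BY_THREAD.get(ctx.thread_id, DEFAULT_RATE),
    ]
    safe_probability = 1.0
    for rate in rates:
        safe_probability *= 1.0 - rate
    return 1.0 - safe_probability


if __name__ == "__main__":
    morning = PostContext(hour=10, user_id="u1", topic="science", thread_id="t-calm")
    night = PostContext(hour=2, user_id="u1", topic="sports", thread_id="t-hot")
    print(f"10 AM science post risk: {context_risk(morning):.3f}")
    print(f"2 AM sports post risk:   {context_risk(night):.3f}")
```

The same user gets a low score in one context and a high score in another, which is the point the bullets make.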
Solution – Logical Model
• Post Context (behavior analysis)
     • Patterns, data mining.
• Post Content (text analysis)
     • Profanity, low score sentences, text mining, mood or
           tone (sentiment analysis)




Typically Available Data on Posts
• Historical and real time data for:
     •     User (e.g. userid, email, nationalid)
     •     Location (e.g. Life & Style > Fashion)
     •     Time (e.g. 12 March 2011 18:56)
     •     Content (e.g. text, link, picture, video).
     •     Moderation result


• Other attributes like geography, age, education
   could be used

Post context, Patterns, Data Mining
• User behavior.
• Time behavior.
• Location behavior.




Building useful attributes
        •   1. Thread (% fails in a certain thread)
        •   2. User (% fails per user)
        •   3. Hours since forum created (TimeDatePosted - TimeForumCreated)
        •   4. User Forum (% fails per user in a certain forum)
        •   5. Time since the user's last fail (TimeDatePosted - TimeLastFailUser)
        •   6. Hour of the day
        •   7. Hours since the user joined (TimeDatePosted - TimeUserJoined)
        •   8. User Thread (% fails per user in a thread)
        •   9. Hours since thread created (TimeDatePosted - TimeThreadCreated)
        •   10. Day of week
        •   More than 100 attributes in total.
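
A few of the attributes above can be derived with nothing more than timestamp arithmetic and fail counts, as in this sketch. The record layout (`posted`, `thread_created`, `user_joined`), the helper names, and the sample values are assumptions made for illustration; only the timestamp format follows the "12 March 2011 18:56" example given earlier in the deck.

```python
from datetime import datetime

TS_FORMAT = "%d %B %Y %H:%M"  # matches e.g. "12 March 2011 18:56"


def fail_pct(results):
    """Percentage of failed moderation results; 0.0 when there is no history."""
    return 100.0 * sum(results) / len(results) if results else 0.0


def hours_between(later, earlier):
    """Elapsed time between two datetimes, in hours."""
    return (later - earlier).total_seconds() / 3600.0


def build_attributes(post, user_results, thread_results):
    """Derive context attributes for one post (hypothetical record layout)."""
    posted = datetime.strptime(post["posted"], TS_FORMAT)
    thread_created = datetime.strptime(post["thread_created"], TS_FORMAT)
    user_joined = datetime.strptime(post["user_joined"], TS_FORMAT)
    return {
        "thread_fail_pct": fail_pct(thread_results),
        "user_fail_pct": fail_pct(user_results),
        "hour_of_day": posted.hour,
        "day_of_week": posted.weekday(),  # Monday = 0 ... Sunday = 6
        "hours_since_thread_created": hours_between(posted, thread_created),
        "hours_since_user_joined": hours_between(posted, user_joined),
    }


if __name__ == "__main__":
    sample_post = {
        "posted": "12 March 2011 18:56",
        "thread_created": "12 March 2011 16:56",
        "user_joined": "11 March 2011 18:56",
    }
    attrs = build_attributes(
        sample_post,
        user_results=[True, False, False, False],    # True = post failed moderation
        thread_results=[True, True, False, False],
    )
    for name, value in attrs.items():
        print(f"{name}: {value}")
```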




Hard Work
• Periods.
• Algorithms.
• Algorithms' parameters.
• Model refreshing.
• Attribute analysis.
• Outliers.
• Overpopulating.
• Behavior after this system is in production.

Data Mining Algorithms
• Decision Trees/Linear Regression
• Sequence Analysis
• Neural Networks/Logistic Regression
• Clustering
• Text Mining (Words and Phrases)
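
To make one of the listed algorithms concrete, here is a small logistic regression trained by batch gradient descent in plain Python. It is only a sketch: the two features (user fail rate and thread fail rate, both scaled to 0-1), the six training rows, and the hyperparameters are invented for illustration, and nothing here reflects the tooling or models the project actually used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that a post fails moderation."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, epochs=2000, learning_rate=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


if __name__ == "__main__":
    # Hypothetical rows: [user_fail_rate, thread_fail_rate]; label 1 = post failed.
    rows = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.8, 0.7], [0.9, 0.9], [0.7, 0.8]]
    labels = [0, 0, 0, 1, 1, 1]
    weights, bias = train_logistic(rows, labels)
    print("risky context:", round(predict(weights, bias, [0.9, 0.8]), 3))
    print("calm context: ", round(predict(weights, bias, [0.05, 0.05]), 3))
```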




Conclusion on Context
• Risk based on context of the post
     • Time
     • User’s history
     • Publish location
• Enables risk analysis for all types of content
     •     Comments (in any language)
     •     Links
     •     Pictures
     •     Videos

Logical Model: Post content
• Profanity Analysis
• Text Mining
      The first minister and his secretary found sleeping together last night. They got
      drunk at a nearby pub.

• Sentiment Analysis
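
The profanity step is the simplest of the three content checks to sketch: tokenize the post and look each word up in a blocklist. The two blocklist entries and the function name `profanity_hits` are hypothetical placeholders; a real deployment would maintain a much larger list per language and handle obfuscated spellings.

```python
import re

# Hypothetical placeholder terms standing in for a real profanity list.
BLOCKLIST = {"darn", "heck"}


def profanity_hits(text: str) -> list:
    """Return the blocklisted words found in the text, in order of appearance."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token in BLOCKLIST]


if __name__ == "__main__":
    for post in ("Well, heck, that was a DARN bad call",
                 "The movie stars Mr. X"):
        hits = profanity_hits(post)
        verdict = "flag for review" if hits else "pass"
        print(f"{post!r}: {verdict} {hits}")
```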




Moderation, Data Mining System




Analysis and Reporting
• Published through integrated web application
     •     Moderation statistics.
     •     Users statistics.
     •     News and Stories Statistics.
     •     Peaks.




Conclusion: Benefits
• By moderating half of all posts, the solution
   captures 90% of failing posts. The remaining 10%
   seem likely to be safe posts.
• Using Intelligent Moderation, media companies
   scan the whole universe of posts at a
   comparatively low cost.
• At peak times, Intelligent Moderation works
   perfectly.
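
The "moderate half, capture most failures" claim can be checked on any scored sample with a calculation like the one below: rank posts by risk score, send the top fraction to human moderators, and measure what share of all failing posts landed in that fraction. The ten scored posts are made-up illustration data, so the resulting figure says nothing about the real system's 90% result.

```python
def capture_rate(scored_posts, budget_fraction):
    """Share of all failing posts found when only the riskiest fraction is moderated.

    scored_posts: list of (risk_score, failed) pairs, where failed is a bool.
    """
    ranked = sorted(scored_posts, key=lambda pair: pair[0], reverse=True)
    n_reviewed = int(len(ranked) * budget_fraction)
    total_fails = sum(1 for _, failed in ranked if failed)
    if total_fails == 0:
        return 0.0
    caught = sum(1 for _, failed in ranked[:n_reviewed] if failed)
    return caught / total_fails


if __name__ == "__main__":
    # Hypothetical (risk_score, failed) pairs for ten posts.
    sample = [
        (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.50, False),
        (0.40, False), (0.30, False), (0.20, True), (0.10, False), (0.05, False),
    ]
    print(f"capture rate at 50% budget: {capture_rate(sample, 0.5):.2f}")
```

Sweeping `budget_fraction` from 0 to 1 gives the trade-off curve between moderation cost and missed failures.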



Football night in Europe
• On January 25th, 2012:
     • Liverpool defeated Manchester City in the Carling Cup
     • Barcelona defeated Real Madrid in Copa del Rey
• More than 100,000 comments arrived at the
   different BBC sites over 10 hours
• All comments were filtered through our system
• No problems observed during that time


SolidQ Team in this project
• Project Managers
     • Francisco Gonzalez, Javier Torrenteras, Alejandro
           Leguizamo
• Developers
     • Itzik Ben-Gan, Enrique Puig, Ruben Pertusa, Carlos
           Martinez, Fernando G. Guerrero
• Technical reviewers
     • Mark Tabladillo, Dejan Sarka
• Social Media Specialists
     • Jose Quinto, Rocio Díaz
SolidQ Research
• Incomplete Grammar Analysis
• Human interaction with IT systems
     • Collaboration
     • Contextual analysis
• Sentiment Analysis
     • Market Research
     • Reputation
• Data Mining of Social Context
     • Moderation
     • Market Research
     • Reputation
Invisible computing…
… Driven by Social Data




THANK YOU!

Fernando G. Guerrero
Global CEO SolidQ
fguerrero@solidq.com

Contenu connexe

Similaire à Data Mining for Moderation of Social Data

Big Data in Media
Big Data in MediaBig Data in Media
Big Data in MediaKris Tuttle
 
Social Listening – Gateway to Innovation
Social Listening – Gateway to InnovationSocial Listening – Gateway to Innovation
Social Listening – Gateway to InnovationNetBase Solutions Inc.
 
Content is King: Presentation to Cross Media Innocation Center at RIT
Content is King: Presentation to Cross Media Innocation Center at RITContent is King: Presentation to Cross Media Innocation Center at RIT
Content is King: Presentation to Cross Media Innocation Center at RITMatt Turner
 
Scott Whitmire - Just What is Architecture Anyway
Scott Whitmire - Just What is Architecture AnywayScott Whitmire - Just What is Architecture Anyway
Scott Whitmire - Just What is Architecture Anywayiasaglobal
 
Crowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and HadoopCrowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and HadoopGrant Ingersoll
 
Big Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInBig Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInMinh-Hoang Nguyen
 
Desktop Network Systems what we do
Desktop Network Systems what we doDesktop Network Systems what we do
Desktop Network Systems what we doVince Bailey
 
doolyk_rev_p_001.compressed
doolyk_rev_p_001.compresseddoolyk_rev_p_001.compressed
doolyk_rev_p_001.compressedDoolytics
 
Advanced Persistent Threat - Evaluating Effective Responses
Advanced Persistent Threat - Evaluating Effective ResponsesAdvanced Persistent Threat - Evaluating Effective Responses
Advanced Persistent Threat - Evaluating Effective ResponsesNetIQ
 
Technical_Update_Germany
Technical_Update_GermanyTechnical_Update_Germany
Technical_Update_GermanyBogdan Doinea
 
"How to create usless software... and distribute it" (Alto university lecture...
"How to create usless software... and distribute it" (Alto university lecture..."How to create usless software... and distribute it" (Alto university lecture...
"How to create usless software... and distribute it" (Alto university lecture...Marcin Kokott
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Embracing Disruptive Change with OpenCredo and Google
Embracing Disruptive Change with OpenCredo and GoogleEmbracing Disruptive Change with OpenCredo and Google
Embracing Disruptive Change with OpenCredo and GoogleDaniel Bryant
 
Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...
Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...
Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...AVEVA Group plc
 
Track c 1015_slideshare_roerdenmckenziechadwickdias
Track c 1015_slideshare_roerdenmckenziechadwickdiasTrack c 1015_slideshare_roerdenmckenziechadwickdias
Track c 1015_slideshare_roerdenmckenziechadwickdiasBentleyDUC
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science OverviewDavide Mauri
 
Optimizing and Accelerating your SharePoint Farm
Optimizing and Accelerating your SharePoint FarmOptimizing and Accelerating your SharePoint Farm
Optimizing and Accelerating your SharePoint FarmChris McNulty
 

Similaire à Data Mining for Moderation of Social Data (20)

Big Data in Media
Big Data in MediaBig Data in Media
Big Data in Media
 
Social Listening – Gateway to Innovation
Social Listening – Gateway to InnovationSocial Listening – Gateway to Innovation
Social Listening – Gateway to Innovation
 
Content is King: Presentation to Cross Media Innocation Center at RIT
Content is King: Presentation to Cross Media Innocation Center at RITContent is King: Presentation to Cross Media Innocation Center at RIT
Content is King: Presentation to Cross Media Innocation Center at RIT
 
Scott Whitmire - Just What is Architecture Anyway
Scott Whitmire - Just What is Architecture AnywayScott Whitmire - Just What is Architecture Anyway
Scott Whitmire - Just What is Architecture Anyway
 
Crowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and HadoopCrowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and Hadoop
 
Big Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInBig Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedIn
 
Desktop Network Systems what we do
Desktop Network Systems what we doDesktop Network Systems what we do
Desktop Network Systems what we do
 
Npd 2 0 Product Camp
Npd 2 0 Product CampNpd 2 0 Product Camp
Npd 2 0 Product Camp
 
doolyk_rev_p_001.compressed
doolyk_rev_p_001.compresseddoolyk_rev_p_001.compressed
doolyk_rev_p_001.compressed
 
Advanced Persistent Threat - Evaluating Effective Responses
Advanced Persistent Threat - Evaluating Effective ResponsesAdvanced Persistent Threat - Evaluating Effective Responses
Advanced Persistent Threat - Evaluating Effective Responses
 
Technical_Update_Germany
Technical_Update_GermanyTechnical_Update_Germany
Technical_Update_Germany
 
"How to create usless software... and distribute it" (Alto university lecture...
"How to create usless software... and distribute it" (Alto university lecture..."How to create usless software... and distribute it" (Alto university lecture...
"How to create usless software... and distribute it" (Alto university lecture...
 
Developing Social Networks
Developing Social NetworksDeveloping Social Networks
Developing Social Networks
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Embracing Disruptive Change with OpenCredo and Google
Embracing Disruptive Change with OpenCredo and GoogleEmbracing Disruptive Change with OpenCredo and Google
Embracing Disruptive Change with OpenCredo and Google
 
Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...
Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...
Risking all you have, for what you can’t leave behind by Adam Cooke, Yancoal ...
 
Track c 1015_slideshare_roerdenmckenziechadwickdias
Track c 1015_slideshare_roerdenmckenziechadwickdiasTrack c 1015_slideshare_roerdenmckenziechadwickdias
Track c 1015_slideshare_roerdenmckenziechadwickdias
 
Data Science Overview
Data Science OverviewData Science Overview
Data Science Overview
 
Optimizing and Accelerating your SharePoint Farm
Optimizing and Accelerating your SharePoint FarmOptimizing and Accelerating your SharePoint Farm
Optimizing and Accelerating your SharePoint Farm
 
Ds01 data science
Ds01   data scienceDs01   data science
Ds01 data science
 

Plus de Fernando G. Guerrero

Itinerarios de Grado de Ingenieria Informatica EPS Alicante
Itinerarios de Grado de Ingenieria Informatica EPS AlicanteItinerarios de Grado de Ingenieria Informatica EPS Alicante
Itinerarios de Grado de Ingenieria Informatica EPS AlicanteFernando G. Guerrero
 
New gTLDs between two rounds: trade mark challenges
 New gTLDs between two rounds: trade mark challenges New gTLDs between two rounds: trade mark challenges
New gTLDs between two rounds: trade mark challengesFernando G. Guerrero
 
Dealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NETDealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NETFernando G. Guerrero
 
Concurrency problems and locking techniques in SQL Server 2000 and VB.NET
Concurrency problems and locking techniques in SQL Server 2000 and VB.NETConcurrency problems and locking techniques in SQL Server 2000 and VB.NET
Concurrency problems and locking techniques in SQL Server 2000 and VB.NETFernando G. Guerrero
 
Achieve the Impossible: Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...
Achieve the Impossible:Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...Achieve the Impossible:Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...
Achieve the Impossible: Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...Fernando G. Guerrero
 
Dealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NETDealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NETFernando G. Guerrero
 
Datos Geométricos y Espaciales en SQL Server 2008
Datos Geométricos y Espaciales en SQL Server 2008Datos Geométricos y Espaciales en SQL Server 2008
Datos Geométricos y Espaciales en SQL Server 2008Fernando G. Guerrero
 
Microsoft Changed the Game Again and Gave New Wings to an Entire Industry
Microsoft Changed the Game Again and Gave New Wings to an Entire IndustryMicrosoft Changed the Game Again and Gave New Wings to an Entire Industry
Microsoft Changed the Game Again and Gave New Wings to an Entire IndustryFernando G. Guerrero
 
Designing Role-Based Database Systems to Achieve Unlimited Database Scalability
Designing Role-Based Database Systems to Achieve Unlimited Database ScalabilityDesigning Role-Based Database Systems to Achieve Unlimited Database Scalability
Designing Role-Based Database Systems to Achieve Unlimited Database ScalabilityFernando G. Guerrero
 
Solid q universidad empresa 2011 10 27
Solid q universidad empresa 2011 10 27Solid q universidad empresa 2011 10 27
Solid q universidad empresa 2011 10 27Fernando G. Guerrero
 

Plus de Fernando G. Guerrero (12)

Udf eficientes
Udf eficientesUdf eficientes
Udf eficientes
 
Itinerarios de Grado de Ingenieria Informatica EPS Alicante
Itinerarios de Grado de Ingenieria Informatica EPS AlicanteItinerarios de Grado de Ingenieria Informatica EPS Alicante
Itinerarios de Grado de Ingenieria Informatica EPS Alicante
 
New gTLDs between two rounds: trade mark challenges
 New gTLDs between two rounds: trade mark challenges New gTLDs between two rounds: trade mark challenges
New gTLDs between two rounds: trade mark challenges
 
Dealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NETDealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NET
 
Concurrency problems and locking techniques in SQL Server 2000 and VB.NET
Concurrency problems and locking techniques in SQL Server 2000 and VB.NETConcurrency problems and locking techniques in SQL Server 2000 and VB.NET
Concurrency problems and locking techniques in SQL Server 2000 and VB.NET
 
Vda305 concurrency guerrero
Vda305 concurrency guerreroVda305 concurrency guerrero
Vda305 concurrency guerrero
 
Achieve the Impossible: Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...
Achieve the Impossible:Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...Achieve the Impossible:Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...
Achieve the Impossible: Use INSTEAD OF triggers in SQL Server 2000 to Deal Tr...
 
Dealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NETDealing with SQL Security from ADO.NET
Dealing with SQL Security from ADO.NET
 
Datos Geométricos y Espaciales en SQL Server 2008
Datos Geométricos y Espaciales en SQL Server 2008Datos Geométricos y Espaciales en SQL Server 2008
Datos Geométricos y Espaciales en SQL Server 2008
 
Microsoft Changed the Game Again and Gave New Wings to an Entire Industry
Microsoft Changed the Game Again and Gave New Wings to an Entire IndustryMicrosoft Changed the Game Again and Gave New Wings to an Entire Industry
Microsoft Changed the Game Again and Gave New Wings to an Entire Industry
 
Designing Role-Based Database Systems to Achieve Unlimited Database Scalability
Designing Role-Based Database Systems to Achieve Unlimited Database ScalabilityDesigning Role-Based Database Systems to Achieve Unlimited Database Scalability
Designing Role-Based Database Systems to Achieve Unlimited Database Scalability
 
Solid q universidad empresa 2011 10 27
Solid q universidad empresa 2011 10 27Solid q universidad empresa 2011 10 27
Solid q universidad empresa 2011 10 27
 

Dernier

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Data Mining for Moderation of Social Data

  • 1. Data Mining for Moderation of Social Data Fernando G. Guerrero CEO SolidQ fguerrero@solidq.com
  • 2.
  • 4. Introductions • Fernando G. Guerrero • Global CEO of SolidQ • fguerrero@solidq.com • Microsoft Regional Director for Spain since 2004 • SQL Server MVP from year 2000 till 2007 • Usual suspect at many international conferences
  • 5. SolidQ 2012… 10th anniversary • 160 people in 23 countries: • Argentina, Australia, Austria, Bulgaria, Canada, Chile, Costa Rica, Croatia, Denmark, France, Germany, India, Israel, Italy, Mexico, Saudi Arabia, Serbia, Slovakia, Slovenia, Spain, Sweden, UK, USA • 50 current or former RDs or MVPs • Authors of many books, articles, and whitepapers • Research Collaboration with: • Universidad de Alicante • Universidad de les Illes Balears • Universidad de Santiago de Compostela • The European Union • The Spanish Ministry of Economy and Innovation
  • 6. Agenda • Social Data • Market Research • Sentiment Analysis, Text Mining • Moderation, Data Mining • SolidQ Research Lines in Social Data © 2012 SolidQ 6
  • 7. Social data is everywhere © 2012 SolidQ 7
  • 8. 8
  • 9. Social data is about everything Music © 2012 SolidQ 9
  • 10. Social is there • Is your organization promoting social about you? Products Services Stories © 2012 SolidQ 10
  • 11. Social is there, reputation • What is social saying about you? • Product • Services • Decisions • Image © 2012 SolidQ 11
  • 12. Market Research • What is social requesting you? • Future Services • Product updates • Can you ask questions to social? • Is this service going to succeed • How can I fixed the current problem • Is society ready for this law © 2012 SolidQ 12
  • 13. Sentiment Analysis, Text Mining The movie The movie The movie was fabulous! stars Mr. X was horrible! [ Sentimental ] [ Factual ] [ Sentimental ] © 2012 SolidQ 13
  • 15. What is Data Mining? • Informs actionable business decisions • Contrasts with “machine learning”
  • 16. Media Case Study • Millions of posts per year (different moderation scenarios) • About 25% are human moderated • About 10% of the moderated posts fail • No Business Intelligence applications for analysis or reporting
  • 17. Moderation, Data Mining • Contextual Information • Time • Location • User • At 10 AM comments are safer than at 2 AM. • A user may be safe talking about science but dangerous talking about sports. • If a thread is hot (dangerous), a new comment in it may be hot too. • By combining context patterns, the system assigns risk to posts without going into the text.
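
One way to picture "assigning risk without going into the text" is to combine historical fail rates for the hour, the user on that topic, and the thread. The sketch below uses a noisy-OR combination, which is an assumption made for illustration; the deck does not describe its scoring formula, and the rates are invented.

```python
def context_risk(hour_fail_rate, user_topic_fail_rate, thread_fail_rate):
    """Combine three historical fail rates (each in [0, 1]) with a noisy-OR.

    The post is treated as safe only if every context signal is safe,
    so the risk is 1 minus the product of the individual "safe" shares.
    """
    safe = 1.0
    for rate in (hour_fail_rate, user_topic_fail_rate, thread_fail_rate):
        safe *= 1.0 - rate
    return 1.0 - safe

# Invented rates: a 10 AM science comment in a calm thread
# versus a 2 AM sports comment in a hot thread.
morning = context_risk(0.02, 0.01, 0.05)
night = context_risk(0.10, 0.30, 0.40)
print(round(morning, 3), round(night, 3))
```

No post text is read at any point; only where, when, and by whom the post was made.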
  • 18. Solution – Logical Model • Post Context (behavior analysis) • Patterns, data mining. • Post Content (text analysis) • Profanity, low-scoring sentences, text mining, mood or tone (sentiment analysis)
  • 19. Typically Available Data on Posts • Historical and real time data for: • User (e.g. userid, email, nationalid) • Location (e.g. Life & Style > Fashion) • Time (e.g. 12 March 2011 18:56) • Content (e.g. text, link, picture, video). • Moderation result • Other attributes like geography, age, education could be used
  • 20. Post context, Patterns, Data Mining • User behavior. • Time behavior. • Location behavior.
  • 21. Building useful attributes • 1. Thread (% fails in a certain thread) • 2. User (% fails per user) • 3. Diff Hour Forum Created (TimeDatePosted - TimeForumCreated) • 4. User Forum (% fails in a certain forum) • 5. Diff Last for User (TimeDatePosted - TimeLastFailUser) • 6. Hour of the day • 7. Diff Hour UserJoined-Now (TimeDatePosted - TimeUserJoined) • 8. User Thread (% fails per user in a thread) • 9. Diff Hour Thread Created (TimeDatePosted - TimeThreadCreated) • 10. Day of week • More than 100 attributes.
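
A few of the attributes on slide 21 can be derived from the post data listed on slide 19 roughly as follows. This is a minimal sketch: the record layout (`user`, `thread`, `posted`, `failed`) and the function names are assumptions for illustration, not the project's schema.

```python
from datetime import datetime

def pct_fails(history, key, value):
    """Share of past posts with history[key] == value that failed moderation."""
    matching = [p for p in history if p[key] == value]
    if not matching:
        return 0.0
    return sum(p["failed"] for p in matching) / len(matching)

def build_attributes(post, history, thread_created):
    """Derive a handful of context attributes for one post from earlier posts."""
    return {
        "thread_pct_fails": pct_fails(history, "thread", post["thread"]),
        "user_pct_fails": pct_fails(history, "user", post["user"]),
        "hours_since_thread_created":
            (post["posted"] - thread_created).total_seconds() / 3600,
        "hour_of_day": post["posted"].hour,
        "day_of_week": post["posted"].weekday(),  # Monday is 0
    }

# Invented history of already-moderated posts.
history = [
    {"user": "u1", "thread": "t1", "failed": True},
    {"user": "u1", "thread": "t2", "failed": False},
    {"user": "u2", "thread": "t1", "failed": False},
]
post = {"user": "u1", "thread": "t1", "posted": datetime(2011, 3, 12, 18, 56)}
attrs = build_attributes(post, history, datetime(2011, 3, 12, 12, 56))
print(attrs)
```

The remaining attributes (per-forum rates, time since the user's last fail, time since the user joined) follow the same two patterns: a grouped fail percentage or a timestamp difference.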
  • 22. Hard Work • Periods. • Algorithms. • Algorithms' parameters. • Model refreshing. • Attribute analysis. • Outliers. • Overpopulating. • Behavior after this system is in production.
  • 23. Data Mining Algorithms • Decision Trees/Linear Regression • Sequence Analysis • Neural Networks/Logistic Regression • Clustering • Text Mining (Words and Phrases)
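
To make one of the listed algorithms concrete, here is a small logistic regression fitted by batch gradient descent on two context attributes. It is a stand-in written in plain Python: the deck does not say which tool or settings were used, and the training rows, learning rate, and epoch count are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Probability of failing moderation; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))

def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            grads[0] += error
            for j, xj in enumerate(x):
                grads[j + 1] += error * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

# Toy data (invented): [user_pct_fails, thread_pct_fails] -> failed moderation?
rows = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]
weights = train_logistic(rows, labels)
print(predict(weights, [0.9, 0.9]), predict(weights, [0.05, 0.05]))
```

The output is a probability per post, which is what lets a moderation queue be ordered by risk rather than split into a hard yes or no.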
  • 24. Conclusion on Context • Risk based on context of the post • Time • User’s history • Publish location • Enables risk analysis for all types of content • Comments (in any language) • Links • Pictures • Videos
  • 25. Logical Model: Post content • Profanity Analysis • Text Mining: "The first minister and his secretary found sleeping together last night. They got drunk at a nearby pub." • Sentiment Analysis
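
Profanity analysis, the first content check on slide 25, is sketched below as a block-list lookup. The list contents and the function name `find_profanity` are placeholders; the deck does not describe how its profanity check is implemented.

```python
import re

# Placeholder block list; a real deployment would maintain its own.
BLOCKED_WORDS = {"darn", "heck"}

def find_profanity(text):
    """Return the blocked words found in text, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in BLOCKED_WORDS]

slide_example = ("The first minister and his secretary found sleeping together "
                 "last night. They got drunk at a nearby pub.")
print(find_profanity("Well, DARN it. Heck!"))
print(find_profanity(slide_example))
```

The slide's example sentence contains no blocked word, so a lookup like this returns nothing for it. That is presumably why the model pairs profanity analysis with text mining and sentiment analysis.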
  • 27. Moderation, Data Mining System
  • 29. Analysis and Reporting • Published through integrated web application • Moderation statistics. • User statistics. • News and stories statistics. • Peaks.
  • 30. Conclusion: Benefits • By moderating half of the total posts, the solution captures 90% of failing posts. The remaining 10% appear likely to be safe posts. • Using Intelligent Moderation, media companies scan the whole universe of posts at a comparatively low cost. • At peak times, Intelligent Moderation works perfectly.
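
A claim of the form "reviewing half of the posts captures 90% of the failing ones" can be measured on labeled history by ranking posts by risk score and counting the failures that land in the reviewed share. The sketch below shows that calculation; the scores and outcomes are invented and do not reproduce the project's figures.

```python
def capture_rate(risk_scores, failed, review_fraction):
    """Share of failing posts caught when only the top-risk fraction is reviewed."""
    ranked = sorted(zip(risk_scores, failed), key=lambda pair: pair[0], reverse=True)
    reviewed = ranked[: int(len(ranked) * review_fraction)]
    total_fails = sum(failed)
    if total_fails == 0:
        return 0.0
    return sum(f for _, f in reviewed) / total_fails

# Invented example: ten posts, four of which failed moderation (1 = failed).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
failed = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(capture_rate(scores, failed, 0.5))
```

Sweeping `review_fraction` from 0 to 1 gives the trade-off between moderator workload and the share of failing posts that slip through.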
  • 31. Football night in Europe • On January 25th, 2012: • Liverpool defeated Manchester City in the Carling Cup • Barcelona defeated Real Madrid in Copa del Rey • More than 100,000 comments arrived at the different BBC sites over 10 hours • All comments were filtered through our system • No problems observed during that time
  • 32. SolidQ Team in this project • Project Managers • Francisco Gonzalez, Javier Torrenteras, Alejandro Leguizamo • Developers • Itzik Ben-Gan, Enrique Puig, Ruben Pertusa, Carlos Martinez, Fernando G. Guerrero • Technical reviewers • Mark Tabladillo, Dejan Sarka • Social Media Specialists • Jose Quinto, Rocio Díaz
  • 33. SolidQ Research • Incomplete Grammar Analysis • Human interaction with IT systems • Collaboration • Contextual analysis • Sentiment Analysis • Market Research • Reputation • Data Mining of context Social • Moderation • Market Research • Reputation
  • 35. THANK YOU! Fernando G. Guerrero Global CEO SolidQ fguerrero@solidq.com