Marc Hadfield is the CTO of Inform Technologies, a semantic technology company. Inform provides a semantic technology service called the Inform Service that uses natural language processing and graph algorithms to track subjects, entities, and users' interests in order to personalize content selection and ad targeting. The Interest Graph captures how content and users are related based on topics to improve relevance. Inform has seen a 30% boost in engagement for publisher customers using the Inform Service for in-article links and related content. Inform also operates the Yuku forums and is testing interest graph algorithms and personalization techniques on the forum content and traffic.
Inform: Targeting the Interest Graph
1. Targeting the Interest Graph:
Personalization of content and ad selection
using the Inform Service
Marc Hadfield
CTO, Inform
Semantic Technology Conference, 2011
2. Introduction
Marc Hadfield is CTO of Inform Technologies.
Interests:
Natural Language Processing, Semantics, Life Science
Graph Algorithms, Machine Learning, Big Data
Inform Technologies is a semantic technology company.
Inform provides semantic technology (NLP and Analytics) to
Publishers, and operates a user-generated forum site,
Yuku.com.
We at Inform have been extending our technology into the
user-generated content space. We’ve adapted our technology to
different kinds of content such as informal text, photos, videos,
and questions.
We’ve recently addressed Ad Selection, Video Selection, and
Personalization.
I’ll discuss some of our results with the Interest Graph.
3. Inform Service
Semantic Software-as-a-Service for Publishers
Advantage: ~30% boost in engagement in “traditional” publisher
websites.
Tracks 4,000+ Subjects and 320,000+ Entities: Inform Topics
Inform Service:
– In-Article links to Topics Pages
– Related Articles from the Archive
– Related Articles around the Web
– Related Photos
– Related Videos
– Topic Pages including mix of content sources
– Tools (Publishing Tools, etc.)
5. Yuku Forums
Forum Content
– “Old School” user generated content
– ~40,000 forums
– Top 100 forums account for about 50% of traffic
– ~1 Billion short form content pieces
– ~1 Million monthly unique users
– ~150K new content objects per day
– ~1 Million Page Views per Day
Subscription / Advertising Revenue
Inform is adapting / integrating our Semantic Tech
Great laboratory for testing algorithms / theories
– Apply more broadly than Yuku platform
Nice A/B testing environment
Testing new algorithms on our ForumFind search engine
– And embedded widgets in Yuku
Good reason to improve Ad Selection
6. Today: Personalization for Enhanced Targeting
• Capturing the Interest Graph
• Personalized experience
Help People find interesting content
Make Ads relevant
7. Inform Content & Analytics Platform
[Diagram]
Content / Data Ingestion
– Licensed / Crawled Content
– 3rd Party Activity Data
Core Engine: Occam
– Text Analysis
– Algorithms
– Categorization / Personalization
Content Distribution
– Publisher site, Yuku, Widgets
8. Inform “Occam” Architecture
Example Workflow:
Receive Message
• REST Webservice Call
• Queue
Extract
• Get URL
• Extract Document Features
• Extract Text
NLP
• NLP Features (Machine Learning)
• Inference Engine (Prolog / Frame Logic)
• Discourse / Behavior / Sentiment Models (Prolog / Frame Logic) (New)
Analysis
• Trend Analysis (incremental data)
• Graph Analysis (incremental data)
Reply
• Store in Semantic Repository (if needed)
• Send Reply Message (via Queue or Webservice)
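The workflow stages on this slide (receive, extract, NLP, analysis, reply) can be pictured as a chain of functions applied to one message. The sketch below is only an illustration of that staging: the `Message` class, the stage functions, and the toy token counting are all hypothetical stand-ins, not Occam's actual components, and nothing here touches the network, a queue, or a repository.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message that flows through the workflow stages."""
    url: str
    raw_text: str = ""
    features: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def receive(payload: dict) -> Message:
    # Receive: a real system would get this via a REST call or a queue.
    msg = Message(url=payload["url"], raw_text=payload.get("raw_text", ""))
    msg.log.append("receive")
    return msg


def extract(msg: Message) -> Message:
    # Extract: stand-in for fetching the URL and extracting its text.
    msg.features["text"] = msg.raw_text.strip()
    msg.log.append("extract")
    return msg


def nlp(msg: Message) -> Message:
    # NLP: toy tokenizer in place of machine learning and inference.
    msg.features["tokens"] = msg.features["text"].lower().split()
    msg.log.append("nlp")
    return msg


def analysis(msg: Message) -> Message:
    # Analysis: toy token counts in place of trend and graph analysis.
    msg.features["counts"] = dict(Counter(msg.features["tokens"]))
    msg.log.append("analysis")
    return msg


def reply(msg: Message) -> Message:
    # Reply: a real system would store the result and send a reply message.
    msg.log.append("reply")
    return msg


STAGES = [extract, nlp, analysis, reply]


def run_workflow(payload: dict) -> Message:
    msg = receive(payload)
    for stage in STAGES:
        msg = stage(msg)
    return msg


result = run_workflow({"url": "http://example.com/a",
                       "raw_text": "Graph graph analysis"})
print(result.log)
print(result.features["counts"])
```

Keeping each stage as a plain function over one message makes it easy to reorder stages or run many messages in parallel, which is one plausible reading of why the slide pairs the workflow with a queue.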
9. Inform API
REST Based
Queue for high volume content exchange
Returns data in RDF, XML, or JSON
All Content has a URI
All Inform Topics have URIs (can be dereferenced)
Insert Content, Update Content, Delete Content
Login / Logout
Change Status of Content (Published, Unpublished)
Content can be retrieved via “GET”
– Associated Topics (Subjects and Entities) returned
– Include scores
Search Inform Topics
Semantic Search
– Simplified queries (not full SPARQL)
– Typical Query: Get Content of Type “Article” about “Barack Obama”
ranked by score
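To make the typical query concrete, here is a small in-memory sketch of "get content of a type about a topic, ranked by score". It does not call the Inform API: the `ContentItem` class, the `find_content` function, the example URIs, and the scores are all invented for illustration, and the real service's request and response formats are not shown in these slides.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Hypothetical content record: a URI, a type, and scored topics."""
    uri: str
    content_type: str
    topic_scores: dict  # topic name -> relevance score


def find_content(items, content_type, topic):
    """Return items of the given type that are about the topic, best score first."""
    matches = [
        item for item in items
        if item.content_type == content_type and topic in item.topic_scores
    ]
    return sorted(matches, key=lambda item: item.topic_scores[topic], reverse=True)


# Made-up sample data.
ITEMS = [
    ContentItem("http://example.org/content/1", "Article", {"Barack Obama": 0.4}),
    ContentItem("http://example.org/content/2", "Article", {"Barack Obama": 0.9}),
    ContentItem("http://example.org/content/3", "Video", {"Barack Obama": 0.8}),
    ContentItem("http://example.org/content/4", "Article", {"Golf": 0.7}),
]

ranked = find_content(ITEMS, "Article", "Barack Obama")
ranked_uris = [item.uri for item in ranked]
print(ranked_uris)
```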
11. AdContext™: IAB Ad Standards
IAB (Interactive Advertising Bureau) Standard to return a set of
metadata about a website, webpage, section of a webpage to
assist advertising within web content.
Defines how a Topic may be associated with web content.
Defines a set of standard upper level Topics such as “Science”,
“Sports”, and “Business”, and mid-level Topics such as “Golf” and
“Fashion”. These are tier-1 and tier-2.
Inform has aligned the IAB Topics with Inform’s Topics. Inform can
deliver more specific Topics (the full set of Inform Topics) as “tier-3”
IAB Topics.
The AdContext™ service returns this metadata. Ad Networks may
use the service to assist in ad selection.
Semantic Ad Selection may improve yield 2X – 5X (as per various
external studies).
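One way to picture the tier alignment is as a walk up a topic hierarchy: the root is tier-1, its child is tier-2, and the most specific topic is reported as tier-3. The sketch below assumes a simple parent map; the `PARENT` table, the `tiers_for` function, and the "PGA Tour" topic are hypothetical examples, not the actual IAB or Inform taxonomies.

```python
# Hypothetical fragment of a topic hierarchy: child -> parent.
PARENT = {
    "Golf": "Sports",
    "PGA Tour": "Golf",
}


def tiers_for(topic, parent=PARENT):
    """Map a topic to tier labels by walking up to the root of the hierarchy."""
    path = [topic]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    path.reverse()  # root (most general topic) first

    tiers = {"tier-1": path[0]}
    if len(path) > 1:
        tiers["tier-2"] = path[1]
    if len(path) > 2:
        # Anything below tier-2 is reported as the most specific topic.
        tiers["tier-3"] = path[-1]
    return tiers


print(tiers_for("PGA Tour"))
print(tiers_for("Golf"))
print(tiers_for("Science"))
```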
12. Aside: rNews RDFa Standard
rNews: embedding metadata in online news
rNews is a proposed standard for using RDFa to annotate
news-specific metadata in HTML documents. The rNews
proposal has been developed by the IPTC, a consortium
of the world's major news agencies, news publishers and
news industry vendors. rNews is currently in draft form
and the IPTC welcomes feedback on how to improve the
standard in the rNews Forum.
http://dev.iptc.org/rNews
Why?
SEO, Rich Snippets, Reduce “scraper” error, better metadata.
The Inform API returns rNews metadata ready to embed in
news articles (in testing).
13. Publisher Customer Example:
Inform automatically tags entities (people, places, companies,
and organizations) and provides related topics, articles, and media.
The Related News Widget pulls in the most relevant and recent
articles from within the New York Daily News Archive.
14. Customer Example:
Inform’s tags can be brought together in numerous ways to create a
richer experience for consumers.
Inform also generates highly engaging and relevant slideshows.
15. Demo Inform API w/Facebook
How to connect Inform to the social graph?
18. Demo Inform API w/Facebook
Inform Topics mapped to
Wikipedia Pages and thus
to other Concepts –
including the Facebook
“Like” Graph
19. Interest Graph
• Inform Topics
4,000+ Subjects in Hierarchy (SKOS)
320,000+ Entities
Wikipedia Pages
Wikipedia Categories
Inform “same-as” links to Wikipedia
• 1 Million+ Monthly Unique Users
• ~1 Billion content pieces total
Forum Messages, Replies, Photos, Videos
• 150K new content pieces per day
• 1 Million+ PageViews per Day
• ~5 Million ads serviced per Day
Goal: Link Users to Topics for selection of content and ads
20. Personalization Signals
• Content is “about” a Topic (subject or entity)
• User submits Content (“write”)
Message, Reply, Photo, Video, Question, …
• User reads Content (“view”)
Message, Reply, Photo, Video, Question
Trends / Global Aggregation:
• Importance Metric
• Bursty / Velocity
• Sentiment ( “:-)”, “LOL”, …)
“Like” the topic? “Dislike” the topic? Context?
– e.g. a user dislikes a Football Team, so “likes” to hear when they lose
(negative sentiment)
• Other features…
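The first three signals above can be combined into per-user topic weights: each "write" or "view" event contributes the content's topic scores, scaled by how strong the action is. This is a minimal sketch under assumed values; the `ACTION_WEIGHT` numbers, the `user_topic_weights` function, and the sample users and content are hypothetical, and the trend and sentiment signals are left out.

```python
from collections import defaultdict

# Assumed weights: writing about a topic counts more than viewing it.
ACTION_WEIGHT = {"write": 2.0, "view": 1.0}


def user_topic_weights(events, content_topics):
    """Accumulate topic weights per user.

    events: iterable of (user, action, content_id)
    content_topics: content_id -> {topic: "aboutness" score}
    """
    weights = defaultdict(lambda: defaultdict(float))
    for user, action, content_id in events:
        for topic, score in content_topics.get(content_id, {}).items():
            weights[user][topic] += ACTION_WEIGHT[action] * score
    return {user: dict(topics) for user, topics in weights.items()}


# Made-up sample data.
CONTENT_TOPICS = {
    "msg1": {"Sneakers": 0.5, "Basketball": 0.25},
    "msg2": {"Sneakers": 1.0},
}
EVENTS = [
    ("alice", "view", "msg1"),
    ("alice", "write", "msg2"),
    ("bob", "view", "msg2"),
]

profiles = user_topic_weights(EVENTS, CONTENT_TOPICS)
print(profiles)
```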
21. Interest Graph Algorithms
Criteria:
• Near Real-Time
• Highly parallel to allow for scaling
• Fuzzy Data, Flexible data model
Implementation:
• General Graph Representation
Node Weights, Edge Weights, Node Types, Edge Types
• Graph walk to extract a User’s Interest Graph
• Parallel Message-Passing Algorithms for Graph Analysis
Importance, PageRank, Centrality
Spreading Activation
Pregel-like implementation (Signal/Collect)
• Add Graph Analytics to Workflow
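Spreading activation in a signal/collect style can be sketched in a few lines: in each step, active nodes send a decayed, edge-weighted signal to their neighbors, and each receiving node adds what it collected to its activation. The code below is a single-process illustration under assumed parameters; the `spread_activation` function, the `decay` and `steps` values, and the tiny graph are hypothetical and are not Inform's implementation.

```python
def spread_activation(edges, seeds, decay=0.5, steps=2):
    """Spread activation from seed nodes over a weighted directed graph.

    edges: node -> list of (neighbor, edge_weight)
    seeds: node -> initial activation
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        # Signal phase: each active node sends a decayed signal along its edges.
        inbox = {}
        for node, value in frontier.items():
            for neighbor, weight in edges.get(node, []):
                inbox[neighbor] = inbox.get(neighbor, 0.0) + value * weight * decay
        # Collect phase: each node adds its incoming signals to its activation.
        for node, signal in inbox.items():
            activation[node] = activation.get(node, 0.0) + signal
        # Only nodes that received a signal are active in the next step.
        frontier = inbox
    return activation


# Made-up graph: one user linked to a topic, which links to related topics.
EDGES = {
    "user:alice": [("topic:Sneakers", 1.0)],
    "topic:Sneakers": [("topic:Basketball", 0.5), ("topic:Fashion", 0.25)],
}

scores = spread_activation(EDGES, {"user:alice": 1.0})
print(scores)
```

Because every node's signals within a step depend only on the previous step's values, the signal phase can be split across workers, which fits the "highly parallel" criterion on this slide.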
24. Niketalk User Interest Graph (global)
With global importance metric:
Recommendations can
be made reflecting the
shifting interests of the
global community.
29. Interest Graph – User Insights
• “Everybody Lies” (“House” TV Show)
– The only way to know the user’s interests is to have an implicit channel
to detect interests without impacting user behavior
• People have broad / dynamic interests
• People read “trash”
– e.g. everyone reads Celebrity Gossip
– If convenient / no one looking
• Global Data can be used to make recommendations
No surprise, but nice to have confirmation
• People move on
“Likes” need to expire
• Recommendations for content and ads can be
implemented in a highly dynamic and parallel fashion
running in real time with reasonable resources using
graph analysis
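The point that “Likes” need to expire can be illustrated with a simple time decay: each interest weight is halved after a fixed number of days since the user last showed that interest, and weights that fall below a threshold are dropped. This is one possible scheme, not the one Inform uses; the half-life, the threshold, and the function names are assumptions for illustration.

```python
def decayed_weight(weight, age_days, half_life_days=30.0):
    """Halve the weight for every half_life_days that have passed."""
    return weight * 0.5 ** (age_days / half_life_days)


def current_interests(interests, now_day, half_life_days=30.0, threshold=0.1):
    """Return decayed interest weights, dropping those that have expired.

    interests: topic -> (weight, day the interest was last observed)
    """
    result = {}
    for topic, (weight, last_seen_day) in interests.items():
        w = decayed_weight(weight, now_day - last_seen_day, half_life_days)
        if w >= threshold:
            result[topic] = w
    return result


# Made-up profile: one fresh interest and one that is 90 days old.
PROFILE = {
    "Sneakers": (1.0, 90),
    "Celebrity Gossip": (0.5, 0),
}

active = current_interests(PROFILE, now_day=90)
print(active)
```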
30. Interest Graph – Conclusion
• Using a User’s Graph of Interests can
dramatically improve the user’s engagement
Data still being gathered within Inform as to percentage
increase, but so far very encouraging numbers!
• The Inform Service can be used to implement a
more personalized content and ad experience
with minimal implementation effort.
• Talk to me about using our API!
Content / Activity Ingestion
– Diversity of content sources
– Data / activity ingestion
Occam
– Big Data Processing and Scale
– Search, Storage, Archive
– Text analysis for categorization and organization
– Algorithms drive content discovery
– Intersection of content and activity data yields trends and personalization
Content Distribution
– Dynamic content assignment and publishing
– Cross-platform publishing via apps and APIs
– Emphasis on integration with emerging data & content standards