With the growing volume of user-generated content, Web sites need to create a content moderation strategy that is scalable, effective and cost-efficient, while providing an enriched, socially-enabled user experience.
De-Risk User-Generated Content Moderation
Cognizant 20-20 Insights
How to De-Risk the Creation and Moderation of User-Generated Content
Executive Summary

Since the inception of the Internet, Web sites have enabled easy creation and distribution of user-generated content (UGC) to global audiences. The tremendous growth of UGC, following the advent of Web 2.0, has highlighted the need for Web sites that can more proactively alert organizations to the existence of mal-content (i.e., bad content).

How a Web site moderates its UGC is an essential part of its online brand identity. While some Web sites allow an open sharing atmosphere where almost anything goes, others ensure that UGC not only meets the highest standards but also reflects positively on the brand. The persona of a Web site is reflected in its approach to, and consistency in, determining which UGC is acceptable and which is not. To generate traffic, some Web sites may be very minimally moderated, often at the expense of overall public perception. Web sites with established, strong brands known to be family- and age-appropriate often require a more sophisticated approach toward UGC moderation.

While the explosion in UGC presents numerous opportunities, it is not without risk. With the ability of individuals, groups and machines to publish inappropriate, irrelevant or copyright-infringing materials, online companies must ensure that this content does not reflect poorly on the brand, negatively impact loyal customers or damage the bottom line. To mitigate UGC risks, content moderation must continuously adopt and leverage scalable, effective and cost-efficient options to continue providing an enriched, socially-enabled user experience.

This white paper discusses the growth of user-generated content, the challenges of effectively moderating UGC and how to think through approaching these challenges to provide the best long-term UGC moderation solution.

Growth of User-Generated Content

“4.1 million minutes of video are uploaded to YouTube every day … six billion images per month are uploaded to Facebook … 40% of images and 80% of videos [created] are inappropriate for business.”1

Broadly speaking, UGC is any material piece of content that a user creates or leverages from existing sources and uploads or shares on a Web site for others to view (see Figure 1). UGC comes in many different forms, including short-text content such as tweets and forum comments; long-text posts on blogs and profiles; and multimedia material such as images, audio, video and applications (see Figure 2). Such content often further manifests itself as targeted or non-targeted online display ads, search engine results and archived Web content, as well as tags, posts or hyperlinks on various Web sites.

The online industry has adopted numerous business models that create, capture and deliver
cognizant 20-20 insights | january 2012
Typical Ways Users Generate UGC
• Computers
• Cameras
• Web cams
• Mobile devices
• From existing content
Figure 1

Types of UGC
Short Text
• Posts
• Tweets
• SMS/text messages
• Comments/feedback/likes
• Chat rooms
Long Text
• Blogs
• Wiki
• Discussion forums/Q&A
• Product/service reviews
Multimedia
• Video
• Audio/podcasts
• Images
• Flash-enabled content
Other
• User contact and profile
• Location-based check-ins
• Game content
• Opinion polls
Figure 2
business value. In recent years, the social phenomenon has become embedded within the DNA of many online-only companies. Web sites such as YouTube and Facebook rely on UGC (and the resulting visitor traffic) to attract and sell advertisements. Mobile and gaming companies provide platforms to sell third-party-created applications and virtual elements and collect a transaction fee for each sale. Other companies, such as professional networking Web sites, sell access to user-created profile information. Various online retail and media companies thrive on user opinion, reviews and feedback to enable social and viral marketing/selling, as well as to help business partners sell their own product and service offerings.

Some Web sites have been able to monetize UGC beyond just selling advertising, such as by offering subscription access to user-generated business profiles, user-provided data and the like. The list of companies building communities around UGC to buoy their bottom line is large and growing, much like the number of users who create and consume UGC.

The monetary benefits of UGC are obvious: content stickiness that builds customer loyalty, sustained platform traffic that achieves critical mass, and top-line growth for companies that are able to monetize subscription services and page views via advertising or affiliate relationships. Non-monetary incentives include status-building with those who like and/or follow site content, network and relationship-building with viewers and affiliated sites, and content sharing/communication with viewers’ colleagues and friends.

Users may also have the opportunity for financial incentives by creating UGC for Web sites that leverage crowdsourcing as a specific approach to content generation, which is the sourcing of tasks (in this case, the generation of content) to a group or community of people. Business models
Projected Annual Growth of UGC (2011 – 2013)
Source: Cognizant research
Figure 3
and new technologies such as mobile devices and cloud computing have made UGC creation and publishing more convenient, thereby amplifying UGC growth. Overall, UGC creation and distribution have grown astronomically (see Figure 3).

Need for Content Moderation

It is crucial to ensure that only appropriate UGC is posted on the Web site by screening and filtering for mal-content. Failing to do so could severely impact user traffic, company brand and the bottom line. The huge growth and pervasiveness of UGC within companies’ core online user experience pose potentially complex challenges and heighten unnecessary exposure to risk.

Strict policies must be set and applied to govern content authenticity, originality, privacy, political/social correctness and legalities, both locally and globally. Such policies should permit and promote UGC and sustain viable social network interactions. Inappropriate content includes, but is not limited to, profanity, sedition, violence, bare skin, false and outdated information, and spam.

Three main methods exist for Web sites to moderate content (see Figure 4):
• Automated moderation, using computer applications and algorithms.
• Community moderation, leveraging the online community to self-moderate content (such as flagging or volunteer administration).
• Human moderation, whether by a dedicated staff or crowdsourced.

Additionally, within each method, there are various ways to determine whether content is unacceptable. In many cases, in addition to validating
Thinking Through Content Moderation
Figure 4
copyright or determining inappropriateness of UGC, content may also need to be moderated for quality, structure and relevance.

Lapses in content regulation can result in costly lawsuits from either original content rights-holders or offended Web site visitors in countries where these laws apply. Such lapses can additionally result in Web site traffic reduction, loss of advertisers and subscribers, as well as a poor user/buying experience that may severely impact future earnings. In 2007, YouTube was sued for nearly $1 billion by Viacom for publishing copyrighted material.2 Facebook has often been criticized for publishing posts and providing a platform to user groups that are politically or culturally sensitive.3

Challenges in Content Moderation

There are many challenges to determining not only the optimal content moderation strategy that corresponds to the Web site’s identity, brand and visitors but also how to put it into effect.

Cost, Time, and Quality Tradeoff

Organizations must be diligent when choosing the right mix of real-time moderation, pre-moderation and post-moderation of UGC on their Web sites, as well as whether or how to apply a combination of machine-automated and community- and/or human-moderated approaches. While real-time moderation appears ideal, the associated costs may not be viable for all kinds of UGC; thus, a diligent analysis of the quality and cost tradeoffs must be performed. The dimension of moderation time further complicates the analysis, as specific UGC, such as tweets and blog posts, requires real-time publishing to align with user expectations.

When to moderate is not the only question. How to implement moderation that is scalable and cost-efficient is just as important. For instance, should you be reactive or proactive (e.g., should you conduct post-moderation for all UGC or just for the content reported on or flagged by users)?

Inefficient Moderation Techniques

A large percentage of content moderation costs can be attributed to process inefficiencies. Choosing the appropriate moderation technique is also critical. Effective methods include automated algorithms, such as Bayesian filtering and pattern detection of blacklisted words and phrases, color tone and user/location profiling. However, most automated techniques do not moderate every piece of content but only samples of it; this can lead to mal-content leakage.

Consider scenarios where too few image samples of a video piece are taken or when the script is in English but is merely a translation of another language. In these cases, automated moderation is insufficient, and either human or community moderation is also required. This is easier said than done.

Human moderation, although effective, can be highly inefficient if one has to continue moderating the same UGC in different formats or if multiple moderators must continually track previously
Cost Estimates by Content Type
Content Type | Estimated Average Size (per piece) | Estimated Moderation Time (per piece) | Approximate Machine Moderation Cost (per 1,000 pieces) | Approximate Manual Moderation Cost (per 1,000 pieces)
Video | 6 min (100 MB) | 1.7 min | $2.61 | $277
Audio | 6 min (5 MB) | 1.4 min | $0.13 | $230
Images | 500 KB | 0.4 sec | $0.013 | $0.70
Text | 200 words (200 KB) | 1 min | $0.005 | $167
Source: Cognizant research
Figure 5
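
As a rough illustration of how the Figure 5 estimates can be used, the sketch below totals the cost of moderating a content mix entirely by machine and entirely by hand. The per-1,000-piece rates are the ones in Figure 5; the monthly volumes and the function name moderation_cost are assumptions made up for this example and do not come from the paper.

```python
# Per-1,000-piece cost estimates taken from Figure 5 (US dollars).
FIGURE_5_COST_PER_1000 = {
    "video": {"machine": 2.61, "manual": 277.0},
    "audio": {"machine": 0.13, "manual": 230.0},
    "images": {"machine": 0.013, "manual": 0.70},
    "text": {"machine": 0.005, "manual": 167.0},
}


def moderation_cost(volumes, method):
    """Estimate the cost of moderating `volumes` (pieces per content type) with one method."""
    total = 0.0
    for content_type, pieces in volumes.items():
        rate = FIGURE_5_COST_PER_1000[content_type][method]
        total += pieces / 1000 * rate
    return total


# Hypothetical monthly volumes (assumption, not from the paper).
monthly_volumes = {
    "video": 10_000,
    "audio": 5_000,
    "images": 200_000,
    "text": 1_000_000,
}

machine = moderation_cost(monthly_volumes, "machine")
manual = moderation_cost(monthly_volumes, "manual")
print(f"machine: ${machine:,.2f}  manual: ${manual:,.2f}")
```

Under these assumed volumes, text accounts for most of the manual total simply because of its volume, which is the kind of tradeoff the cost, time and quality analysis is meant to expose.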
moderated UGC. The biggest challenge with human moderation, however, is the lack of scalability, which is an issue given UGC’s proliferation and the high cost of sustaining such operations.

Figure 5 estimates the average cost of moderation, assuming a modest rate for a human moderator. Depending on the moderation rules and policies that need to be applied, the cost of certain types of moderation may be significantly higher.

Holistic Moderation

Organizations must moderate not only the content but also the users, as mal-content is often the result of user ignorance or lack of awareness. Much mitigation can be achieved through user-friendly and upfront communication of policies and guidelines. Some Web sites allow the possibility of self-moderation through tagging, filtering and warnings. Ideally, the more upfront moderation is done with simple computerized checking and labeling, the lower the downstream volume and impact on content moderation processes after submission.

Localization Challenges

Web sites may often need to serve global users across countries and locales. In order to determine the appropriateness of text-based UGC, moderators must be able to understand the language in which UGC is written, as well as the content’s localized context and intent. Furthermore, perception of the content’s inappropriateness may change based on the acceptable norms of the locale in which the Web site and the owner reside. Content that is deemed appropriate in the U.S. may be perceived as highly inappropriate in areas within Europe, Asia and the Middle East. In certain parts of the Middle East, for instance, any degree of skin revealed on a woman is unacceptable. In certain Asian countries, the color red may be perceived as inappropriate. Acceptable speech in different countries may also vary, especially pertaining to elements of politics or religion.

Likewise, perception may even vary in demographic groups within a single geography. In a growing number of cases, the requirements for content moderation are starting to become more “hyper-localized” to focus on UGC from specific regions or populations, thereby mandating the need to create separate rules and guidelines for each locale or user group. The personalization and relevance of more hyper-localized Web sites and content will not only increase the demand for hyper-localized content moderation but will also result in expanding the amount of UGC created, thus increasing the demand for content moderation even further.

While the challenges involved in moderating UGC are multifaceted and complex to navigate, it is vital to choose the right combination of moderation techniques. These combinations will be governed not just by the dimension of accuracy but by the total cost of operations, as well as the moderating time required.

The Right Content Moderation Approach

Content moderation has grown into a discipline that requires expertise in pattern detection and analysis. Although there are numerous software-based solutions in the market, they do not address the custom needs of particular businesses. With the right level of investment in moderating content, Web site operators can create an optimal strategy that maximizes customer satisfaction while minimizing abuse and impact on the company’s brand and reputation (see Figure 6).

With the increasing growth of UGC, as well as the technology, cost and scale needed to moderate it, achieving an optimal long-term solution requires detailed strategic planning and execution. A variety of options currently exist to help Web site operators protect and optimize their investments and reduce increasing moderation costs. Many organizations have outsourced their content moderation operations to reduce costs and enable more scalable and predictable business outcomes. Others have implemented both custom and standardized technology options to replace existing technology or to cut development and maintenance expenses.

Some companies have also experimented with content moderation crowdsourcing solutions to replace human moderation, with mixed success. While seemingly leveraging an unlimited
Content Moderation Decision Framework
[Chart labels: Customer satisfaction (Min to Max); Content Moderation Strategy; Number of reactive abuses; Cost of cleaning UGC; Desired Strategy]
Figure 6
number of resources at minimal cost may seem ideal, crowdsourcing to date still produces poor moderation quality. Other Web sites have gone purely with community moderation to reduce costs, but this also may produce mixed results, as the moderation of UGC can become overwhelming even for members.

Thus, many solutions are available for content moderation and evaluation, implementation and management. However, finding the right content moderation solution could be a difficult endeavor without solid strategic advice and a well-thought-out approach, leveraging industry best practices customized for specific needs, as well as a clear understanding of the objectives and ecosystem of your Web site.
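
To make the idea of combining moderation methods concrete, here is a minimal sketch of a tiered triage step for text UGC: an automated blacklist check runs first, community flags escalate borderline items, and anything escalated goes to human review. The blacklist terms, the flag threshold and the function name triage are all hypothetical; this illustrates the three methods working together and does not describe any particular product or the authors' own approach.

```python
import re

# Hypothetical blacklist; a real deployment would likely maintain one per locale.
BLACKLIST = {"spamword", "badword"}

# Hypothetical number of community flags that sends an item to human review.
FLAG_THRESHOLD = 3


def triage(text, community_flags=0):
    """Return 'reject', 'human_review' or 'publish' for one piece of text UGC."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & BLACKLIST:
        # Automated moderation: a blacklisted word was detected.
        return "reject"
    if community_flags >= FLAG_THRESHOLD:
        # Community moderation: enough flags to escalate to a human moderator.
        return "human_review"
    return "publish"


print(triage("Great product, would buy again"))
print(triage("This is a SPAMWORD offer"))
print(triage("Borderline joke", community_flags=4))
```

Keeping the cheap automated check first and reserving human review for flagged items reflects the cost ordering in Figure 5, where manual moderation is far more expensive per piece than machine moderation.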
Footnotes
1 Quotes from Digitalrecognition.net and various sources, Nov. 15, 2011.
2 Anne Broache and Greg Sandoval, “Viacom Sues Google over YouTube Clips,” C-NET, March 13, 2007. http://news.cnet.com/Viacom-sues-Google-over-YouTube-clips/2100-1030_3-6166668.html?tag=mncol;txt
3 “Criticism of Facebook,” Wikipedia, Nov. 1, 2011. http://en.wikipedia.org/wiki/Criticism_of_Facebook