Executive Summary
Social media monitoring tools such as Radian6, Sysomos
and Scout have tremendous capabilities for pulling in highly
targeted conversations taking place around a topic, person
or brand from all social media platforms. These tools also
enable a corporation or researcher to find the influencers
around a topic.
While the targeting potential is astonishing, variations
in languages, slang, regional idioms, misspellings and
nicknames for topics and brands make accurate targeting
difficult. What’s more, the influencers around a brand or
topic are the most likely to use a nickname, slang term or
personal parlance known only to their social circle. This
makes understanding of these language variants critical.
Campbell-Ewald’s Social Media team has addressed these
linguistic challenges in their own client monitoring projects
over the past five years by:
• Determining current search trends around a topic.
This identifies not only how users are searching (which
indicates intent), but aids in the identification of
misspellings and relevant associated topics.
• Determining the age and gender of the writer through the use of various tools, knowledge of generational writing patterns and comparing regional variations against reliable reference sources of slang.
• Identifying the influencers and recording their
linguistic patterns.
• Identifying emoticons and comparing them to known
regional and generational variants.
This paper details these challenges — which are largely unknown to most users of these monitoring tools — in the hope that readers can make their own monitoring more accurate and complete.
Background

History
Campbell-Ewald has been an active participant in social media since early 2006. Their lead social media planners, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring tools, such as Fat Pipe and Sentimentor. It was the development of these early tools — created to meet their own needs as researchers — that led to understanding the challenges raised in this paper.
Linabury and Macemore quickly discovered that monitoring
tools were not always able to spider all of the conversations
that were known to exist. At first they theorized that
conversations weren’t being pulled in because different
coding methods and naming conventions for Web site
sections made it difficult for the tools to parse data.
As new technologies made parsing data easier, the initial theory
proved to be an incorrect assessment. It was ascertained in
mid-2007 that linguistic variants were the cause.
Successes
Since the discovery of the linguistic variant sets, Campbell-Ewald has become the nation’s leader in social media monitoring. They have been tasked with monitoring data for several United States government agencies, including the United States Navy, the United States Mint, the United States Naval Academy, the FBI, the Centers for Disease Control and Prevention (CDC) and the Environmental Protection Agency (EPA).
In addition to government clients and projects, Campbell-
Ewald’s social media team, under the leadership of
Linabury, also provides monitoring for dozens of Fortune
500 clients, while garnering numerous awards, such as a
Gold Echo Award, a Silver Effie, Best Military Site of 2009
and Best Social Media Strategy, among others.
Target Market
Based on usage trends, the target audience for social media
monitoring applications can be divided into two main
segments: internal and external.
Categorically, corporations and public relations firms
tend to use monitoring tools for internally-driven ends.
These typically include reputation management, crisis
management and as a clipping service to capture media
mentions.
Keyword strategies for these approaches are typically
limited to formal brand names, the CEO’s name, and
associated marketing terminology. They rarely take into
consideration linguistic variations, context or subtle
sentiment variations.
Conversely, advertising agencies, researchers and social media agencies tend to retain an external focus in their monitoring efforts, concentrating on sentiment analysis, brand perception, and marketing effectiveness and awareness.

External monitoring tends to consider contextual relevance far more than PR firms do, but most efforts still fail to incorporate (or even recognize the existence of) the language variants that must be considered for accurate and inclusive brand monitoring in the social space.
Business Challenge

The User Base Grows Annually
Nearly 40% of corporations are turning to social monitoring to keep abreast of what’s being said about
their brand. Many take this on internally, but most hire
outside social media companies or agencies. However,
virtually none of them are aware that they are not seeing
the entire conversation and blindly put faith in their chosen
monitoring tool that it will fulfill their needs and find all of
the relevant online discussions about their brand, product
or services.
This is not the case. The tools are limited by the
thoroughness of the tool’s operator, and how much
time is spent determining appropriate keywords. Most
administrators make the assumption that the terms they
use as marketing descriptors (e.g., marketing copy, search
terms, and PR copy) are enough.
Many monitoring tools are set up for the corporation by the
tool manufacturer. It is highly unlikely the tool creator could
understand a brand as well as the employees, agencies or
long-standing vendors of the corporation.
The reality is, the marketing descriptors are generally one-sided, somewhat aspirational and rarely match customer expectations and perceptions. Few companies use keywords describing themselves as “cheap”, “average”, “acceptable”, “poor”, “pathetic”, “good enough”, etc.; however, those are precisely the terms consumers use with respect to brands. For proof of this, one need only see how brands are described at BrandTags.net, where tens of thousands of consumers have used those exact terms to describe hundreds of corporations in ever-growing tag clouds of user-generated terms.

[Figure: Tag cloud about Chrysler, from BrandTags.net]
The Problem with Monitoring Language
Languages constantly evolve. They evolve nationally, regionally and hyper-locally. For example, a popular phrase
among teens nationally to describe something amazing
is “off the hook.” Regional variants, such as “off the chain”
[Detroit] and “off the heezy” [Brooklyn] exist as well. Hyper-locally, a neighborhood may have yet another variant, shared
among friends, but not generally known outside that block.
This presents unique challenges to the researcher who is
using social media monitoring tools. If a phrase is known,
it will be used as a key search term for the tool to use. If,
however, more people are using lesser known regional
variants, the tool loses effectiveness.
1337speak
There exist several linguistic phenomena online that do not exist offline. One is the well-known variant known as hacker speak or “1337speak” (Elite speak). This variation goes back more than a decade online. It was developed
by computer hackers in an effort to make their messages
to each other difficult to read by outsiders. Words are
deconstructed to their visual elements and replaced with
alpha-numeric and punctuation equivalents that bear a
passing resemblance to the original letter form.
For example, a capital ‘T’ may be replaced with the number
7 or a + sign. The word ‘at’ will be replaced with the @ sign.
Capital ‘E’ becomes a 3 and so on. There is no sequencing
to the replacements; it is simply a matter of finding
letters, numbers and punctuation that can be substituted.
Indeed, cleverness is praised, and while online “1337speak
generators” exist which “translate” text back and forth
between English and 1337speak, each hacker has her own
style of writing and will make personal substitutions that
others may or may not choose to adopt.
Here is a sample sentence in English first, then 1337speak:
“Time Magazine’s reporter had no idea what we were after.”
“71M3 M464z1n3’5 |23p0|273|2 H4|} n0 1|}34 wH47 w3
w3|23 4f73|2.”
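Because the substitutions operate on individual characters, a monitoring workflow can partially recover such text by reversing a known substitution table before keyword matching. The Python sketch below is illustrative only: the mapping is one common choice, and, as noted above, individual writers use personal variants that no fixed table will cover.

```python
# Toy 1337speak transliterator. The substitution table below is one
# illustrative choice; real writers improvise, so a monitoring tool
# would need several candidate tables and fuzzy matching on top.
LEET_MAP = {
    "a": "4", "e": "3", "i": "1", "o": "0",
    "s": "5", "t": "7", "l": "|",
}

def to_leet(text: str) -> str:
    """Encode plain text by swapping letters for look-alike symbols."""
    return "".join(LEET_MAP.get(ch.lower(), ch) for ch in text)

def from_leet(text: str) -> str:
    """Naively decode by reversing the table (ambiguous in practice)."""
    reverse = {v: k for k, v in LEET_MAP.items()}
    return "".join(reverse.get(ch, ch) for ch in text)
```

Decoding “1n73|” this way yields “intel”, which a keyword list could then match; multi-character substitutions such as |2 for ‘r’ or |} for ‘d’ (visible in the sample above) would need a longest-match pass rather than this character-by-character one.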
If hackers were discussing a new Intel processor in
1337speak, no monitoring tool would be able to pick up
that conversation as no complete English words exist in
hacker speak for the tool to pick up.
LOLCATS
Sites such as General Mayhem, 4chan.org and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants known as LOLCATS
(pronounced, “LAHL cats”). The meme originated as a series
of cute pictures of kittens doing things with the accompanying
text purported to be the voice of the cat. Cats, according to the
meme, have unique spellings of English, poor grammar, and
prefer the “Impact” font. Eventually kids began using LOLCATS
as an accepted form of writing in text messages, instant
messages, email and even speech.
Like 1337speak, LOLCATS speak can be difficult, if not
impossible for monitoring tools to parse as plain English.
Consider the sentence used for the 1337speak example in
English, then in LOLCATS:
“Time Magazine’s reporter had no idea what we were after.”
“TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R
AFTERZ.”
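One hedged approach to this problem is to normalize known variant spellings back to plain English before keyword matching runs. A minimal Python sketch, with an illustrative dictionary that a real deployment would grow from observed conversations:

```python
# Map known LOLCATS-style spellings back to plain English so that
# standard keyword matching can run on the normalized text.
# The dictionary here is a small illustrative sample.
VARIANT_SPELLINGS = {
    "tiem": "time", "wut": "what", "teh": "the",
    "iz": "is", "r": "are",
}

def normalize(text: str) -> str:
    """Lowercase the text and replace any known variant spellings."""
    return " ".join(
        VARIANT_SPELLINGS.get(word, word) for word in text.lower().split()
    )
```

Applied to the sample above, words like TIEM and WUT become matchable, while trailing-Z plurals (MAGAZEENZ, AFTERZ) would additionally need a suffix-stripping rule.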
Intentional Misspellings
Finally, members of Generation Y generally do not spell correctly, sometimes out of laziness, sometimes — like hackers —
to intentionally disguise their messages from authority
figures. This may not matter to a company monitoring the
conversations of senior citizens, but if the target audience is
the highly sought after 18-24 crowd, it is an issue that must
be understood. Here is a real example, found on MySpace,
from a 16 year-old girl to her friends:
“HAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA
WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND
SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969
BUT BECAREFUL HE DOSNT AFRAID OF ANYTHING”
Generational Differences in Emoticons
Microblogging platforms like Twitter and Foursquare, which necessitate short messaging, seem almost devoid
of emoticons. It is our theory that hashtags—short linked
codes preceded by the pound sign (#)—take the place of
emoticons on microblogging as many hashtags are used
sarcastically, such as #whatever or #ilovemylife.
There are distinct differences between the types of
emoticons created by the different birth generations in the
United States. Notice that with each generation, the “faces”
become slightly more realistic.
• The so-called Silent Generation (1925-1945) are the least likely to use emoticons other than the most basic (smiles and frowns). Silent Generation wink: ;)
• The Baby Boomers (1946-1963) use emoticons sparingly, but nevertheless use more than just :-) and :-( symbols. They will include others such as :-/ (unsure), :-O (surprised) and ;-) (wink). Notice the addition of a nose formed with the hyphen key. Baby Boomer wink: ;-)
• Generation Xers (1964-1980) use the most emoticons of the older three generations. They include unusual emoticons, such as >-}}}}-(°> (dead fish) and :^p (sticking out tongue), even emoticons meant sexually, such as (o) for breasts. Noses are often present, usually with a caret ^ in place of a hyphen, although hyphens are prevalent as well. Generation X wink: ;^)
• It is with Generation Y (1981-2000) that we see the greatest change in emoticons, where the “faces” move from sideways to forward facing, taken from the Japanese kaomoji. Compare the symbol for wink between Generations X and Y: ;^) and (O_-). Generation Y wink: (O_-)
Gender Analysis
Gender analysis may be unfamiliar to most, and many may question even why it is necessary. The reason is simple.
Comments may arise where either the screen name of
the writer is ambiguous, or the writing style of a known
individual seems to drastically change suddenly. In the
latter case, there is the distinct possibility of profile fraud.
Some individuals misrepresent their identity or gender for various reasons: as a prank, to assume another’s identity for fraud, to pose as the opposite gender for sexual reasons, or for undercover work such as vice-squad or detective investigations.
Solutions
Relying solely on internal industry and marketing keywords
will not suffice. It is crucial to take additional steps.
The following sources can be used to determine additional relevant keywords:
• The Urban Dictionary: http://urbandictionary.com
Continually updated, the Urban Dictionary is easily the
largest source of regional, national and international
slang on the Internet. It is excellent for looking up industry terms to see whether variations exist and in which regions they are used.
• Google Insights: http://google.com/insights/search/
Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts and news headlines, plotted on trend lines.
• Google AdWords: http://adwords.google.com/
AdWords is a free tool from Google designed to assist
companies in making better choices when selecting
keywords for paid search buys. The tool can also be
used to help select better keywords for social media
monitoring. Keywords are shown by the latest search
patterns, with search quantities displayed.
• Influencers: Ask active and influential customers for
terms, nicknames, etc. If your company does not have
a personal relationship with its influencers, find and
read their blogs and tweets, paying close attention to
the responses from their audiences. Flag unusual words,
spellings and abbreviations.
• Gender Genie: http://bookblog.net/gender/genie.php
A free tool that estimates the gender of a writer: paste text into a field and run the algorithm.
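Tools of this kind generally work by keyword weighting: each word in the text carries a gender-associated weight, the weights are summed per gender, and the larger sum wins. The Python sketch below shows only that mechanism; the words and weights are invented for illustration and are not Gender Genie’s actual values.

```python
# Illustrative keyword-weighted gender scoring. The word lists and
# weights below are invented for demonstration; real tools derive
# them from large annotated writing corpora.
FEMALE_WEIGHTS = {"with": 52, "if": 47, "she": 96, "not": 27}
MALE_WEIGHTS = {"the": 17, "around": 42, "what": 35, "it": 6}

def guess_gender(text: str) -> str:
    """Sum per-gender keyword weights and return the larger side."""
    words = text.lower().split()
    female = sum(FEMALE_WEIGHTS.get(w, 0) for w in words)
    male = sum(MALE_WEIGHTS.get(w, 0) for w in words)
    return "female" if female > male else "male"
```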
With these additional keywords, misspellings, slang,
nicknames and regional variants, the new keyword list
will not only yield more data, but will finally tell the whole
consumer story surrounding the brand.
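In practice, the output of these research steps is a merged keyword set. A trivial Python sketch, using terms from the Chevrolet Cobalt case study that follows:

```python
# Fold researched variants (slang, misspellings, nicknames) into the
# base keyword set before configuring the monitoring tool.
def build_keyword_set(base_terms, variant_terms):
    """Union both sources, normalized to lowercase, sorted for review."""
    normalized = {t.lower() for t in base_terms} | {t.lower() for t in variant_terms}
    return sorted(normalized)
```

For example, build_keyword_set({"Chevrolet Cobalt"}, {"Balt", "C-Balt"}) returns ["balt", "c-balt", "chevrolet cobalt"], ready to paste into the tool’s keyword configuration.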
Benefits
It is no longer tenable to believe that no one is talking about a brand. All brands are being discussed by someone. Only through the proper configuration of
professional-grade monitoring tools like Radian6—and
preferably under the guidance of a social media agency that
specializes in monitoring and analysis—can a company
expect to truly know what is being said about their brand.
Not knowing how your brand is being discussed and
described means that brand is not getting the entire
picture, as is the case with the reports from PR agencies
and most internal social media monitoring.
By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and gleaning far more insight from its customer base.
Case Study: Chevrolet Cobalt
The phenomenon of linguistic variants was first noticed and
described by Linabury and Macemore in 2007 to General
Motors while they were monitoring conversations pertaining
to the Chevrolet Cobalt—a small car that young males were
customizing—along with Honda Accords—into street rods
(known regionally as Rice Rods, Rice Burners, Rice Rockets,
etc.). The assignment was to find out what these young
men were saying about the Cobalt as they were deemed by
Chevrolet to be influencers to non-Chevrolet owners.
Campbell-Ewald’s monitoring was confined geographically
to the Great Lakes states. During the course of the
monitoring, Macemore noticed that some of the Chicago
and Ohio conversations in forums were referring to the
Cobalt as a “Balt”. Linabury noticed that conversations
on the West side of Michigan referred to it as a “C-Car” or
“C-Balt”. C-Car was the internal name of the vehicle used by
engineers, but in Michigan (where the car is produced), it is
possible that engineering names are known externally.
Macemore then theorized that these terms were surfacing enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.
Result: By adding the additional terms that were manually
identified, the conversational data set increased by more than
50% and the client gained insight and learnings into how their
vehicles were referred to by the most influential purchasers of
their product.
Case Study: OnStar™
OnStar™ is a multimillion-dollar company that produces a telematics system for vehicles. As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents, OnStar™’s corporate marketing team wanted up-to-the-minute reports on what their subscribers, their detractors and the media were saying. In 2007, OnStar™ hired Campbell-Ewald’s Social Media Team
to monitor conversations and report back with weekly
findings, and daily with any outstanding conversations or
topics.
Campbell-Ewald’s Social Media Team quickly discovered
there would be a few barriers to accurate monitoring. For
example, people discussing certain television shows were
appearing in the feed. Sentences like “Did you see what happened on Star Search last night?” or “There was one episode on Star Trek where…” appeared regularly. These false positives were
quickly weeded out through exclusionary phrases added to
the keyword set.
The team also discovered linguistic variants of OnStar™
appearing in the conversations of loyal fans and influencers,
which included several hackers. Some hackers were tweaking
OnStar™ at home (similar to the jail-breaking of iPhones) for
fun. We found that they used numerous variants of OnStar™
including: On*, On Star, On_Star, NOnStar, ON.Star, OffStar,
On-Star, OnsStar and BlondeStar (in reference to a YouTube
parody of OnStar™).
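Variant lists like this one can be folded into a single matching pass. A Python sketch, under the assumption that a regular expression over the raw text is acceptable for the tool in use:

```python
import re

# One pattern covering the OnStar variants listed above. Spacing and
# punctuation between "On" and "Star" vary, so a character class
# handles On Star, On_Star, On-Star, ON.Star and OnStar itself;
# the irregular forms are listed explicitly.
ONSTAR_VARIANTS = re.compile(
    r"\b(?:on[\s_.\-]?star|nonstar|offstar|onsstar|blondestar)\b|on\*",
    re.IGNORECASE,
)

def mentions_onstar(text: str) -> bool:
    """Return True if any known OnStar variant appears in the text."""
    return ONSTAR_VARIANTS.search(text) is not None
```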
Result: By adding the additional terms that were manually
identified, the conversational data set increased by more
than 109% and the client gained insight and learnings into
how OnStar was being referred to by the most influential
purchasers of their product and by an unexpected fan base:
hackers.
Technical Specs
Assigning new keywords to any social
monitoring tool is simple. Finding the
keywords is the challenge. The following
demonstration shows how to add new
keywords to an existing set using the
popular social media monitoring tool,
Radian6.
[Screenshot: Radian6]
In this example, the new Dell Mini 3
cellphone has been chosen as a topic to
monitor. Narrowing the feed to cell phones
and removing “noise” about Dell laptops
makes the results more accurate.
By adding the keyword ‘cellphone’ and the
exclusionary keyword ‘laptop’, the feed
examples are more targeted.
[Screenshot: Radian6]
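Outside any particular tool, the include/exclude logic just described can be expressed directly. A Python sketch with hypothetical feed items:

```python
# Keep a post only if it mentions the topic and at least one required
# keyword, and drop it if any exclusionary keyword appears: the same
# logic as the Radian6 setup described above.
def filter_feed(posts, topic, include, exclude):
    kept = []
    for post in posts:
        text = post.lower()
        if (topic in text
                and any(word in text for word in include)
                and not any(word in text for word in exclude)):
            kept.append(post)
    return kept
```

For example, with the topic “dell mini”, the include keyword “cellphone” and the exclusionary keyword “laptop”, a post about the Dell Mini 3 cellphone is kept while a Dell Mini laptop deal is dropped.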
A search on Google Insights for ‘Dell Mini 3’
shows us that consumers are also searching
for it as a ‘cellular dell’, ‘dell android’, ‘dell
android phone’, ‘dell smartphone’, and ‘dell
mini 5’ (a different model).
A look at the Urban Dictionary indicates any
cellphone may be referred to as a “cellie” by
youth.
[Screenshot: Google Insights]
These additional keywords (except perhaps
the Mini 5) should be added to Radian6’s
keywords as they represent the intent
of users. That these keywords are listed
by Google as “Breakouts” is significant;
breakouts represent a recent increase in
search volume of more than 5,000%.
[Screenshot: Urban Dictionary]
Summary
Campbell-Ewald has been an active participant in social media since early 2006. Their lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. The development of these early tools, created to meet their own needs as researchers, led to their understanding of the linguistic challenges raised in this paper.
Campbell-Ewald’s Social Media team addressed these
linguistic challenges in their own client monitoring projects
over the past five years utilizing the following approaches:
• Determining current search trends around a topic
• Determining the age and gender of the writer
• Identifying the influencers and recording their linguistic
patterns
• Identifying emoticons and comparing them to known
regional and generational variants
It is critical in monitoring to understand that internal
marketing descriptors and paid search terms are not
enough to effectively crawl all of the conversations taking
place around a brand. Nor is it enough to rely on basic
tools like Google Alerts. Accurate monitoring is done with
professional grade tools like Radian6, under the guidance of
experienced monitoring teams, like those at Campbell-Ewald.
The monitor must use the Urban Dictionary to determine
any industry or brand slang, check Google AdWords for
misspellings and current search trends and check Google
Insights for regional interest. Finally, the researcher must
either directly contact influential fans of the brand or
failing that, spend time reading blog posts by influencers
and responses to their content from their audience.
Only then can a keyword set be considered accurate and
comprehensive.
Contact
Dave Linabury, Group Director, Social Media
Dave.Linabury@c-e.com
Jason Macemore, Digital Strategist
Jason.Macemore@c-e.com
Gary Olson, Senior Social Media Planner
Gary.Olson@c-e.com
Campbell-Ewald
30400 Van Dyke Ave.
Warren, Michigan 48093
+1 (586) 574-3400
http://c-e.com