SlideShare une entreprise Scribd logo
1  sur  64
Télécharger pour lire hors ligne
How Netflix Reverse
Engineered Hollywood
+ The Antlantic
- Alexis C. Madrigal
/ 맹욱재
x 2014 winter
http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/
http://thefoxisblack.com/blogimages//netflix-redesign.jpg
edited from http://thefoxisblack.com/blogimages//netflix-redesign.jpg
악마 같은 아이가 나오는 컬트 공포 영화
(Cult Evil Kid Horror Movies)
유럽 배경의 60년대 영국 공상과학/판타지물
(British set in Europe Sci-Fi & Fantasy from the 1960s)
비평가들에게 호평받은 감동적 패배자 영화
(Critically-acclaimed Emotional Underdog Movies)
Some of them just seem so specific that it's absurd.
http://digxtal.com/insight/20140107/76897-altgenres/
If Netflix can show such tiny slices of cinema
to any given user, and they have 40 million users,
how vast did their set of "personalized genres"
need to be to describe the entire Hollywood universe?
http://www.huffingtonpost.com/2013/03/07/netflix-categories_n_2813921.html
http://www.huffingtonpost.com/2013/03/07/netflix-categories_n_2813921.html
Asking my followers to submit the categories that showed up for
them on Netflix to a shared document.
"To my knowledge, no such list exists, but obviously one should,"
I wrote. "And then we can see what Netflix is really doing to us."
.
.
.

150 genre
https://docs.google.com/spreadsheet/ccc?key=0AlC_pAJFqGnHdDdhUWUzSV9mU2RmUW11Q0Rub0RrWHc&usp=drive_web#gid=0
Sarah Pavis(a writer and engineer) pointed out
Netflix's genre URLs were sequentially numbered.
One could pull up more and more genres
by simply changing the number at the end of the web address.
http://movies.netflix.com/WiAltGenre?agid=1 linked to
"African-American Crime Documentaries" http://movies.netflix.
com/WiAltGenre?agid=2 linked to
"Scary Cult Movies from the 1980s." And so on.
1000: Movies directed by Otto Preminger.
3000: Dramas Starring Sylvester Stallone.
5000! Critically-Acclaimed Crime Movies from the 1940s.
20000! Mother-Son Movies from the 1970s.
There were a lot of blanks in the data, but the entries extended into
the 90,000s.
This database probing revealed three things:
1) Netflix had an absurdly large number of genres, an order of
magnitude or two more than I had thought
2) it was organized in a way that I didn't understand
3) there was no way I could go through all those genres by hand.
there was a way to scrape all this data
expensive piece of software called UBot Studio that lets you easily
write scripts for automating things on the web.
http://www.ubotstudio.com/index7
http://www.ubotstudio.com/index7
the bot got up and running and simply copied and pasted
from URL after URL,
essentially replicating a human doing the work.
It took nearly a day of constantly running a little Asus laptop
in the corner of our kitchen to grab it all.
Emotional Independent Sports Movies
Spy Action & Adventure from the 1930s
Cult Evil Kid Horror Movies
Cult Sports Movies
Sentimental set in Europe Dramas from the 1970s
Visually-striking Foreign Nostalgic Dramas
Japanese Sports Movies
Gritty Discovery Channel Reality TV
Romantic Chinese Crime Movies
Mind-bending Cult Horror Movies from the 1980s
Dark Suspenseful Sci-Fi Horror Movies
Gritty Suspenseful Revenge Westerns
Violent Suspenseful Action & Adventure from the 1980s
Time Travel Movies starring William Hartnell
Romantic Indian Crime Dramas
Evil Kid Horror Movies
Visually-striking Goofy Action & Adventure
British set in Europe Sci-Fi & Fantasy from the 1960s
Dark Suspenseful Gangster Dramas
Critically-acclaimed Emotional Underdog Movies
Not every genre had streaming movies attached to it.
The reason for that is the genres that I was looking at
represented the total possible universe of different genres,
not just the ones that people were being shown on that
particular day in this particular geography (the United States).
Category 91,300,"Feel-good Romantic Spanish-Language TV Shows"
doesn't show me anything I can stream.
But Category 91,307, "Visually Striking Latin American Comedies"
has two movies
Category 6,307, "Visually Striking Romantic Dramas" has 20.
The existence of a genre in the database doesn't precisely
correspond to the number of movies that Netflix has in its
vaults.
All the genre's existence means is that there are some movies
out there that fit the description.
Patterns in the data: Netflix had a defined vocabulary.
The same adjectives appeared over and over.
Countries of origin
a larger-than-expected number of noun descriptions like
Westerns and Slashers.
Ways of saying where the idea for the movie came from
("Based on Real Life" "Based on Classic Literature")
where the movies were set ("Set in Edwardian Era").
various time periods — from the 1980s
references to children ("For Ages 8 to 10").
Netflix grammar - how it pieced together the words to form comprehensible genres
If a movie was both romantic and Oscar-winning,
Oscar-winning always went to the left: Oscar-winning Romantic Dramas.
Time periods always went at the end of the genre:
Oscar-winning Romantic Dramas from the 1950s.
The single-word adjectives (such as romantic) could basically just pile up,
though, at least to a point: Oscar-winning Romantic Forbidden-Love Movies.
the Content-area categories were generally tacked onto the end: Oscarwinning Romantic Movies about Marriage.
hierarchy for each category of descriptor

Region + Adjectives + Noun Genre + Based On... +
Set In... + From the... + About... + For Age X to Y
wildcards(exception) like everyone's favorite,
"With a Strong Female Lead", "For Hopeless Romantics."
starring or directed by certain individuals
76,897 genres that my bot eventually returned,
were formed from these basic components.
the atoms and logic that were used to create them were
comprehensible.
Ian Bogost suggested building the generator
http://www.antlab.sci.waseda.ac.jp/software/antconc335/AntConc_readme.pdf
generally used by linguists, digital humanities scholars, and
librarians
for dealing with corpuses, large amounts of text.
Google's Ngram tool

https://books.google.com/ngrams/
AntConc turn a bunch of text into data that can be manipulated.
count the number of times each word appears in the mass of text
that forms Netflix's database
list of the top 10 ways that Netflix likes to describe movies in their
personalized genres.
count the appearance of all 3-word phrases that begin with "from"
By searching for phrases beginning with "Set in"
By searching for phrases beginning with "For,"
a list of the age-specific genre descriptions.
Netflix has content "for kids" generally,
as well as for ages 0 to 2, 0 to 4, 2 to 4, 5 to 7, 8 to 10, 8 to 12,
and 11 to 12.
.
.
.

https://docs.google.com/spreadsheet/ccc?key=0AlC_pAJFqGnHdGxFNGlLdlVpcmc0OTBOeWNiamROMVE&usp=sharing#gid=0
Separately, calculated the top actors, directors, and creators
From spreadsheets, created several different grammars.
GONZO setting in the generator
The first and easiest method:
just lets lots of adjectives pile up and throws all the different
descriptors into the mix very often.
● Deep Sea Father-and-Son Period Pieces Based on Real Life Set in the
Middle East For Kids
● Assassination Bounty-Hunter Secret Society Dramas Based on Books Set
in Europe About Fame For Ages 8 to 10
● Post-Apocalyptic Comedies About Friendship
Scaled back the fun stuff, allowing only a few adjectives into the
titles.
Became like movie-production logic of the Hollywood studios.
Basically: endless recombination of the same few themes.
● Classic Action Movies
● Family-Friendly Westerns
● Buddy Period Pieces

Finally, played and played around with different grammatical
structures until started to see Netflix's trademark level of
specificity

● Raunchy Absurd Slashers
● Fight-the-System Political Love Triangle Mysteries
● Chilling Action Movies About Royalty
As worked on the generator, could tell someone had gone down this road before.

A single human brain had had to make the decisions that we had.
How many adjectives? How long should they be?
And even more basic: what should the adjectives be?
Why cerebral and not brainy?
Why differentiate between gory and violent?

As a writer, I kept asking myself: why are the adjectives just right?
Mind-bending, Twisty Tale, Mad Scientist, Underdog, Feel-Good,
Understated.
Cadre of film buffs helps Netflix viewers sort through the clutter
Is it 'suspenseful,' 'cerebral,' or a 'sentimental dysfunctional-family drama'? Netflix's taggers will
find out to better personalize recommendations

After Greg Harty rolls out of bed, he grabs a cup of coffee and starts his work day at a desk in the
corner of his living room. His assignment: Watch three episodes of "Modern Family."
As the hit sitcom plays, the aspiring screenwriter opens another window on his laptop and pulls up a
spreadsheet. He begins picking labels — his employer, Netflix, calls them tags — to describe what he
sees.

The comedy: "quirky."
The humor: "light dark."
The tone: "humorous," "irreverent" and "heartfelt."
Ty Burrell's Phil: "silly," "childish," a "lovable dumbass."
Julie Bowen's Claire: "controlling," "assertive."
Ed O'Neill's Jay: "surly," "alpha dog."
Later that morning, Harty uploads the spreadsheet to Netflix's computers at its Silicon Valley
headquarters; then he starts watching the movie "50/50."
"It's a perfect job because I don't go to an office. I work on my own time, and I get paid to watch
movies, which makes me a better writer," said Harty, 33
Harty's choices will become part of an algorithm that produces the personal recommendations made
to about 30 million viewers worldwide when they sign into Netflix. Some of the descriptions are seen
by subscribers, while others are used internally.
If you like "Twin Peaks," the algorithm says you might also enjoy "Quincy M.E." and "Law & Order:
Criminal Intent" because Harty and others on a team of freelancers have tagged each of them as
"cerebral," "suspenseful," "TV mysteries."
about 40 independent contractors, a group that includes a travel writer in Hawaii, a stay-at-home
mother in Illinois, an independent filmmaker in Mexico City, and several aspiring screenwriters in
Los Angeles.

The taggers were hired for their love of entertainment and their ability to evaluate it quickly. Many are
film school graduates who once worked in Hollywood — or dream of doing so.

They're the ones who pick from more than 1,000 tags to describe thousands of movies and television
series offered by Netflix to viewers in the U.S., Canada, Latin America, Great Britain and Ireland.
Every movie is watched by a single person, as are at least three episodes of every TV series.
Taggers are paid several hundred dollars per week — pocket change for a company that generated
$1.76 billion of revenue in the first six months of this year — to watch between 10 to 20 hours of
content.
This system, dependent upon taggers' cultural sensibilities and judgment, stands out among
companies that rent or sell products online. Amazon, for instance, bases its recommendations entirely
on computer analyses of comparable purchases.
Without the characterizations, viewers face an overwhelming number of choices.
Although Netflix doesn’t disclose the number of its films and TV shows available to stream,
the website InstantWatcher estimates the inventory is in excess of 14,000 titles. There are also
thousands of titles on DVD.

Netflix's tagging system is used on all of the company's content whether it's distributed in the U.S. or
to its overseas markets. The format of that content may be different however. "Modern Family," for
instance, will soon be available to stream in the United Kingdom and Ireland, but is only on DVD in
the U.S.
The company has found that 75% of its subscribers decide what to watch based on its
recommendations.
"People consume more hours of video and stick with the service longer when we use these tags," said
Todd Yellin, vice president of product innovation who oversees Netflix's recommendation engine.
When Yellin came to Netflix six years ago, computers alone were making recommendations based on
what the subscriber had previously watched. The results were not always as effective as Netflix
wanted, in one instance enigmatically pairing "Death Wish 3" with "The Bible Collection: Moses."
"We were doing a really good job with mathematical crunching, but we needed to know our content
better," Yellin said. "That requires real humans."
A film school graduate with experience making promotional videos and documentaries, Yellin wrote
the first several hundred tags in 2006. Now there are more than 1,000, created by Yellin's in-house
team.
Yellin conceived of "squirm factor" to describe material that causes viewers to feel embarrassed for the characters on the screen. That tag resulted, for example, in
linking the cringe-inducing British version of "The Office" with the disturbing, yet funny 1998 film "Happiness" and the cable comedy "Louie," which chronicles bizarre
and awkward events in a fictional version of comedian Louis C.K.'s life.
Yellin also decided, after much internal debate, that intellectually challenging films are better described on the website as "cerebral" rather than "brainy."
"A lot of people thought it was too much of a ten-dollar word, but we concluded that if you like cerebral titles, then 'cerebral' will be a good word for you," he recalled.
"Campy," on the other hand, had to be abandoned for subscribers in Latin America. The term, used in the U.S. and the United Kingdom to describe such movies as "Dr.
Horrible's Sing-Along Blog" and the 1986 musical "Little Shop of Horrors," didn't translate into Spanish.
Analysts say Netflix's recommendation engine is a critical reason the 15-year-old company has become a powerhouse in the home entertainment industry, despite
consumer uproar last year when it raised monthly fees as much as 60% and announced, then quickly abandoned, plans to launch a separate brand for DVD rentals
called Qwikster.
When customers are satisfied with what they're watching, they're more likely to continue paying $8 or more per month for subscriptions, according to marketing
consultant Stuart Skorman.
"There's so much content online right now and people hate wasting time watching things they don't like. Knowing what your customer wants helps to stand out from
competition," said Skorman, who has more than 10 years experience in digital media.
For Harty, tagging makes the daily struggle to become a successful screenwriter a little easier.
Since moving seven years ago to Los Angeles from South Boston, where he grew up, he has worked as a production assistant on the medical drama "House" and as a
video game tester. He was one of the first taggers hired by Yellin in late 2006 and has assessed more than 1,200 movies and TV shows.
Tagging, along with his work as a freelance artist for television, movies and video games, allows Harty to pay rent on his modest apartment, which is decorated with
little more than a Red Sox blanket on the couch. DVDs crowd a bookcase and shelves over his desk. A lottery ticket is pinned to a bulletin board.
The work is compatible with a life dedicated to writing screenplays in hopes of landing an agent and making a sale to a studio. Harty grew up loving what he calls
"escapist, popcorn films" like "Die Hard" and his latest script is set in the world of industrial espionage — "a James Bond meets Danny Ocean kind of thing."
Like other Netflix taggers who meet once a year at the company's Beverly Hills office to share complaints, learn new tags and receive feedback from Yellin's
department, Harty suffers at least one occupational hazard: It's impossible to go to a theater, sit back with popcorn and enjoy a movie.
"I'm always tagging in the back of my mind," he said. "When I was watching 'Prometheus,' I was tagging the gore. I was like, 'It's gory, but it's not high gory.'"
How did the tags relate to Netflix's "personalized genres"?
What algorithm converted this mass of tags into precisely
76,897 genres?
Needed someone to explain the back end.
So, after securing the data, called up Netflix's PR liaison, a
Dutch guy named Joris Evers
“And now you want to come in and talk to Todd Yellin, I guess?”
Yellin is Netflix's VP of Product , responsible for the creation
of Netflix's system. Tagging all the movies was his idea.
How to tag them began with a 24-page document he wrote himself.
He tagged the early movies and guided the creation of all the systems
Yellin had become my Wizard of Oz, who made the machine,
the human whose intelligence and sensibility I'd been
tracking through the data.

At our interview, Yellin turned to me and said,
"I've been waiting for someone to bubble up like this for years."
Though Yellin seems impressed at our nerdiness,
he patiently explains that we've merely skimmed one endproduct of the entire Netflix data infrastructure.
There is so much more data and a whole lot more
intelligence baked into the system than we've captured.
Here's how he told me all the pieces fit together.
"My first goal was: tear apart content!" he said
How do you systematically dismember thousands of movies using a bunch of different people who all
need to have the same understanding of what a given microtag means?
In 2006, Yellin holed up with a couple of engineers and spent months developing a document called

"Netflix Quantum Theory", "microtag."
The Netflix
the "social

Quantum Theory doc spelled out ways of tagging movie endings,

acceptability" of lead characters, and dozens of other facets of a movie.

Many values are "scalar," that is to say, they go from 1 to 5.
So,

every movie gets a romance rating, not just the ones

labeled "romantic" in the personalized genres. Every movie's ending is
rated from happy to sad, passing through ambiguous. Every plot is tagged. Lead
characters' jobs are tagged. Movie locations are tagged. Everything. Everyone.
That's the data at the base of the pyramid. It is the basis for creating all the

altgenres that I scraped. Netflix's engineers took the microtags and created a
syntax for the genres, much of which we were able to reproduce in our
generator.

It's where the human intelligence of the taggers gets
combined with the machine intelligence of the algorithms.
There's something in the Netflix personalized genres that I
think we can tell is not fully human, but is revealing in a way
that humans alone might not be.
For example, the adjective "feel good" gets attached to movies that have a certain set of features, most
importantly a happy ending. It's not

a direct tag that people attach so much as a computed movie

category based on an underlying set of tags.
The only semi-similar project = Pandora's once-lauded Music
but what's amazing about Netflix

Genome Project,

is that its descriptions of movies are

foregrounded. It's not just show you things you might like, but tell you what
kinds of things those are. A tool for introspection.
Netflix's old way of recommending movies. The company used to could kind of
predict how many stars you might give a movie. And so, the company encouraged
its users to rate movie after movie, so that it could take those numeric values
and develop a taste profile for you.
They even offered a $1

million prize to the team that could design an algorithm that

would improve the company's ability to predict how many stars users would
give movies. It took years to improve the algorithm by a mere 10 percent.
The prize was awarded in 2009, but

Netflix never actually incorporated the new

models. Because Netflix had decided to "go

beyond the 5 stars," which is

where the personalized genres come in.
The

human language of the genres helps people identify with the recommendations.

"Predicting something is 3.2 stars is kind of fun if you have an engineering
sensibility, but it would be more useful to talk about dysfunctional families
and viral plagues. We wanted to put in more language,"
"Highlight our personalization because we pride ourselves on putting the right title in front of the
right person at the right time."
And nothing highlights their personalization like throwing you a very, very specific altgenre.
Why aren't they ultraspecific,, super long, like the gonzo genres?
Genres were limited by three main factors:

1) Display 50 characters for various UI reasons
2) "critical mass" of content that fit the description of the genre, at least in
Netflix's extended DVD catalog;
3) they only wanted genres that made syntactic sense.

In Netflix's real world, there are no genres that have more than five descriptors.
Four descriptors are rare, but they do show up for users: Scary Cult Mad-Scientist Movies from the
1970s.
Three descriptors are more common: Feel-good Foreign Comedies for Hopeless Romantics.
Two are widely used: Steamy Mind Game Movies. And, of course, there are many ones: Quirky
Movies.
underlying tagging data isn't just used to create genres, but also to increase
the level of personalization in all the movies a user is shown.
If Netflix knows you love Action Adventure movies with high romantic ratings (on their 1-5 scale), it
might show you that kind of movie, without ever saying, "Romantic Action Adventure Movies."

"We're gonna tag how much romance is in a movie. We're not gonna tell you
how much romance is in it, but we're gonna recommend it,"
Netflix has built a system that really only has one analog in the tech world: Facebook's NewsFeed.
But instead of serving you up the pieces of web
Netflix is serving you up filmed

content that the algorithm thinks you'll like,

entertainment.

hybrid human and machine intelligence approach. They could have purely used
computation. For example, looking at people with similar

viewing habits and

recommending movies based on what they watched. (And Netflix does use this kind
of data, too.) But they went beyond that approach

"It's

to look at the content itself.

a real combination: machine-learned, algorithms, algorithmic syntax, "

"also a bunch of geeks who love this stuff going deep."
As a thought experiment: Imagine if

Facebook broke down individual websites

according to a 36-page tagging document that let the company truly understand what it
was people liked about Atlantic orPopular Science or 4chan or ViralNova?

It might be impossible with web content. But if Netflix's system didn't already exist, most people
would probably say that it couldn't exist either.
Perry Mason Mystery
Sitting atop

the list of mostly expected Hollywood stars is Raymond Burr, who starred in the

1950s television series Perry Mason. Then, at number seven, we find Barbara Hale,
who starred opposite

Burr in the show.

How can Hale and Burr outrank Meryl Streep and Doris Day, not to mention Samuel L. Jackson,
Nicholas Cage, Fred Astaire, Sean Connery, and all these other actors in the top few dozen?
It's not that the list is nonsensical.
Netflix's actor-based genre-creation doesn't make much sense.
But that's not the case at all. The

rest of the actors at the top of the list make a lot of

sense, even if it does not precisely reflect the top box-office earners.
list of the top 15 directors

Christian I. Nyby II directed several Perry Mason
made-for-TV movies in the 1980s.
(His father, Christian I. Nyby, directed episodes of
the original series, too!)
strange thing is that these lists seem pretty spot-on, except for this weird
Perry Mason thing.
In the DVD days,
Perry Mason fans ordered a

ton of Perry Mason, one after the

other after the other,", "It created sufficient demand that you guys
thought there should be categories."

That is not an accurate theory. That's just not how it worked.
On the other hand, no one — not even Yellin — is quite sure why there are so
many altgenres that feature Raymond Burr and Barbara Hale. It's
inexplicable with human logic. It's just something that happened.
when companies combine human intelligence and machine intelligence,
some things happen that we cannot understand.
philosophical for a minute. In a human world, life is made
interesting by serendipity," "The more complexity you add to
a machine world, you're adding serendipity that you couldn't
imagine. Perry Mason is going to happen. These ghosts in the machine are always going to be a
by-product of the complexity.
"Let me get

sometimes we call it a bug and sometimes we call
it a feature."
Perry Mason episodes were famous for the reveal, the pivotal
moment in a trial when Mason would reveal the crucial piece of
evidence that makes it all makes sense and wins the day.

Now, reality

gets coded into data for the machines, and then

decoded back into descriptions for humans. Along the way,
humans ability to understand what's happening gets thinned
out. When we go looking for answers and causes, we rarely find that aha!
evidence or have the Perry Mason moment. Because it all doesn't actually
make sense.
Netflix may have solved the mystery of what to watch next, but that
generated its own smaller mysteries.
컴퓨터가 만들어낸 데이터 중에서
우리가 이해할 수 없는 것들이 많아졌
다.
우린 그것을 오류라 하기도 하고,
주목해야 할 것이라 여기기도 한다.
sometimes we call that a bug and
sometimes we call it a feature.
To understand how people look for movies,
Netflix (the video service) created 76,897 micro-genres.
The writer and coworker took the genre descriptions,
broke them down to their key words,
…
and built our own new-genre generator.

Contenu connexe

Dernier

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Dernier (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

En vedette (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

How Netflix Reverse Engineered Hollywood

  • 1. How Netflix Reverse Engineered Hollywood + The Antlantic - Alexis C. Madrigal / 맹욱재 x 2014 winter
  • 5. 악마 같은 아이가 나오는 컬트 공포 영화 (Cult Evil Kid Horror Movies) 유럽 배경의 60년대 영국 공상과학/판타지물 (British set in Europe Sci-Fi & Fantasy from the 1960s) 비평가들에게 호평받은 감동적 패배자 영화 (Critically-acclaimed Emotional Underdog Movies) Some of them just seem so specific that it's absurd. http://digxtal.com/insight/20140107/76897-altgenres/
  • 6. If Netflix can show such tiny slices of cinema to any given user, and they have 40 million users, how vast did their set of "personalized genres" need to be to describe the entire Hollywood universe?
  • 9. Asking my followers to submit the categories that showed up for them on Netflix to a shared document. "To my knowledge, no such list exists, but obviously one should," I wrote. "And then we can see what Netflix is really doing to us."
  • 11. Sarah Pavis(a writer and engineer) pointed out Netflix's genre URLs were sequentially numbered. One could pull up more and more genres by simply changing the number at the end of the web address. http://movies.netflix.com/WiAltGenre?agid=1 linked to "African-American Crime Documentaries" http://movies.netflix. com/WiAltGenre?agid=2 linked to "Scary Cult Movies from the 1980s." And so on.
  • 12. 1000: Movies directed by Otto Preminger. 3000: Dramas Starring Sylvester Stallone. 5000! Critically-Acclaimed Crime Movies from the 1940s. 20000! Mother-Son Movies from the 1970s. There were a lot of blanks in the data, but the entries extended into the 90,000s.
  • 13. This database probing revealed three things: 1) Netflix had an absurdly large number of genres, an order of magnitude or two more than I had thought 2) it was organized in a way that I didn't understand 3) there was no way I could go through all those genres by hand.
  • 14. there was a way to scrape all this data expensive piece of software called UBot Studio that lets you easily write scripts for automating things on the web.
  • 17. the bot got up and running and simply copied and pasted from URL after URL, essentially replicating a human doing the work. It took nearly a day of constantly running a little Asus laptop in the corner of our kitchen to grab it all.
  • 18. Emotional Independent Sports Movies Spy Action & Adventure from the 1930s Cult Evil Kid Horror Movies Cult Sports Movies Sentimental set in Europe Dramas from the 1970s Visually-striking Foreign Nostalgic Dramas Japanese Sports Movies Gritty Discovery Channel Reality TV Romantic Chinese Crime Movies Mind-bending Cult Horror Movies from the 1980s Dark Suspenseful Sci-Fi Horror Movies Gritty Suspenseful Revenge Westerns Violent Suspenseful Action & Adventure from the 1980s Time Travel Movies starring William Hartnell Romantic Indian Crime Dramas Evil Kid Horror Movies Visually-striking Goofy Action & Adventure British set in Europe Sci-Fi & Fantasy from the 1960s Dark Suspenseful Gangster Dramas Critically-acclaimed Emotional Underdog Movies
  • 19. Not every genre had streaming movies attached to it. The reason for that is the genres that I was looking at represented the total possible universe of different genres, not just the ones that people were being shown on that particular day in this particular geography (the United States). Category 91,300,"Feel-good Romantic Spanish-Language TV Shows" doesn't show me anything I can stream. But Category 91,307, "Visually Striking Latin American Comedies" has two movies Category 6,307, "Visually Striking Romantic Dramas" has 20.
  • 20. The existence of a genre in the database doesn't precisely correspond to the number of movies that Netflix has in its vaults. All the genre's existence means is that there are some movies out there that fit the description.
  • 21. Patterns in the data: Netflix had a defined vocabulary. The same adjectives appeared over and over. Countries of origin a larger-than-expected number of noun descriptions like Westerns and Slashers. Ways of saying where the idea for the movie came from ("Based on Real Life" "Based on Classic Literature") where the movies were set ("Set in Edwardian Era"). various time periods — from the 1980s references to children ("For Ages 8 to 10").
  • 22.
  • 23. Netflix grammar - how it pieced together the words to form comprehensible genres If a movie was both romantic and Oscar-winning, Oscar-winning always went to the left: Oscar-winning Romantic Dramas. Time periods always went at the end of the genre: Oscar-winning Romantic Dramas from the 1950s. The single-word adjectives (such as romantic) could basically just pile up, though, at least to a point: Oscar-winning Romantic Forbidden-Love Movies. the Content-area categories were generally tacked onto the end: Oscarwinning Romantic Movies about Marriage.
  • 24. hierarchy for each category of descriptor Region + Adjectives + Noun Genre + Based On... + Set In... + From the... + About... + For Age X to Y wildcards(exception) like everyone's favorite, "With a Strong Female Lead", "For Hopeless Romantics." starring or directed by certain individuals
  • 25. 76,897 genres that my bot eventually returned, were formed from these basic components. the atoms and logic that were used to create them were comprehensible. Ian Bogost suggested building the generator
  • 26.
  • 28. generally used by linguists, digital humanities scholars, and librarians for dealing with corpuses, large amounts of text. Google's Ngram tool https://books.google.com/ngrams/
  • 29. AntConc turn a bunch of text into data that can be manipulated. count the number of times each word appears in the mass of text that forms Netflix's database list of the top 10 ways that Netflix likes to describe movies in their personalized genres.
  • 30.
  • 31. count the appearance of all 3-word phrases that begin with "from"
  • 32. By searching for phrases beginning with "Set in"
  • 33. By searching for phrases beginning with "For," a list of the age-specific genre descriptions. Netflix has content "for kids" generally, as well as for ages 0 to 2, 0 to 4, 2 to 4, 5 to 7, 8 to 10, 8 to 12, and 11 to 12.
  • 35. Separately, calculated the top actors, directors, and creators From spreadsheets, created several different grammars. GONZO setting in the generator The first and easiest method: just lets lots of adjectives pile up and throws all the different descriptors into the mix very often. ● Deep Sea Father-and-Son Period Pieces Based on Real Life Set in the Middle East For Kids ● Assassination Bounty-Hunter Secret Society Dramas Based on Books Set in Europe About Fame For Ages 8 to 10 ● Post-Apocalyptic Comedies About Friendship
  • 36. Scaled back the fun stuff, allowing only a few adjectives into the titles. Became like movie-production logic of the Hollywood studios. Basically: endless recombination of the same few themes. ● Classic Action Movies ● Family-Friendly Westerns ● Buddy Period Pieces Finally, played and played around with different grammatical structures until started to see Netflix's trademark level of specificity ● Raunchy Absurd Slashers ● Fight-the-System Political Love Triangle Mysteries ● Chilling Action Movies About Royalty
  • 37. As worked on the generator, could tell someone had gone down this road before. A single human brain had had to make the decisions that we had. How many adjectives? How long should they be? And even more basic: what should the adjectives be? Why cerebral and not brainy? Why differentiate between gory and violent? As a writer, I kept asking myself: why are the adjectives just right? Mind-bending, Twisty Tale, Mad Scientist, Underdog, Feel-Good, Understated.
  • 38.
  • 39. Cadre of film buffs helps Netflix viewers sort through the clutter Is it 'suspenseful,' 'cerebral,' or a 'sentimental dysfunctional-family drama'? Netflix's taggers will find out to better personalize recommendations After Greg Harty rolls out of bed, he grabs a cup of coffee and starts his work day at a desk in the corner of his living room. His assignment: Watch three episodes of "Modern Family." As the hit sitcom plays, the aspiring screenwriter opens another window on his laptop and pulls up a spreadsheet. He begins picking labels — his employer, Netflix, calls them tags — to describe what he sees. The comedy: "quirky." The humor: "light dark." The tone: "humorous," "irreverent" and "heartfelt." Ty Burrell's Phil: "silly," "childish," a "lovable dumbass." Julie Bowen's Claire: "controlling," "assertive." Ed O'Neill's Jay: "surly," "alpha dog."
  • 40. Later that morning, Harty uploads the spreadsheet to Netflix's computers at its Silicon Valley headquarters; then he starts watching the movie "50/50." "It's a perfect job because I don't go to an office. I work on my own time, and I get paid to watch movies, which makes me a better writer," said Harty, 33 Harty's choices will become part of an algorithm that produces the personal recommendations made to about 30 million viewers worldwide when they sign into Netflix. Some of the descriptions are seen by subscribers, while others are used internally. If you like "Twin Peaks," the algorithm says you might also enjoy "Quincy M.E." and "Law & Order: Criminal Intent" because Harty and others on a team of freelancers have tagged each of them as "cerebral," "suspenseful," "TV mysteries."
  • 41. about 40 independent contractors, a group that includes a travel writer in Hawaii, a stay-at-home mother in Illinois, an independent filmmaker in Mexico City, and several aspiring screenwriters in Los Angeles. The taggers were hired for their love of entertainment and their ability to evaluate it quickly. Many are film school graduates who once worked in Hollywood — or dream of doing so. They're the ones who pick from more than 1,000 tags to describe thousands of movies and television series offered by Netflix to viewers in the U.S., Canada, Latin America, Great Britain and Ireland. Every movie is watched by a single person, as are at least three episodes of every TV series. Taggers are paid several hundred dollars per week — pocket change for a company that generated $1.76 billion of revenue in the first six months of this year — to watch between 10 to 20 hours of content. This system, dependent upon taggers' cultural sensibilities and judgment, stands out among companies that rent or sell products online. Amazon, for instance, bases its recommendations entirely on computer analyses of comparable purchases.
  • 42. Without the characterizations, viewers face an overwhelming number of choices. Although Netflix doesn’t disclose the number of its films and TV shows available to stream, the website InstantWatcher estimates the inventory is in excess of 14,000 titles. There are also thousands of titles on DVD. Netflix's tagging system is used on all of the company's content whether it's distributed in the U.S. or to its overseas markets. The format of that content may be different however. "Modern Family," for instance, will soon be available to stream in the United Kingdom and Ireland, but is only on DVD in the U.S. The company has found that 75% of its subscribers decide what to watch based on its recommendations. "People consume more hours of video and stick with the service longer when we use these tags," said Todd Yellin, vice president of product innovation who oversees Netflix's recommendation engine. When Yellin came to Netflix six years ago, computers alone were making recommendations based on what the subscriber had previously watched. The results were not always as effective as Netflix wanted, in one instance enigmatically pairing "Death Wish 3" with "The Bible Collection: Moses." "We were doing a really good job with mathematical crunching, but we needed to know our content better," Yellin said. "That requires real humans." A film school graduate with experience making promotional videos and documentaries, Yellin wrote the first several hundred tags in 2006. Now there are more than 1,000, created by Yellin's in-house team.
  • 43. Yellin conceived of "squirm factor" to describe material that causes viewers to feel embarrassed for the characters on the screen. That tag resulted, for example, in linking the cringe-inducing British version of "The Office" with the disturbing, yet funny 1998 film "Happiness" and the cable comedy "Louie," which chronicles bizarre and awkward events in a fictional version of comedian Louis C.K.'s life. Yellin also decided, after much internal debate, that intellectually challenging films are better described on the website as "cerebral" rather than "brainy." "A lot of people thought it was too much of a ten-dollar word, but we concluded that if you like cerebral titles, then 'cerebral' will be a good word for you," he recalled. "Campy," on the other hand, had to be abandoned for subscribers in Latin America. The term, used in the U.S. and the United Kingdom to describe such movies as "Dr. Horrible's Sing-Along Blog" and the 1986 musical "Little Shop of Horrors," didn't translate into Spanish. Analysts say Netflix's recommendation engine is a critical reason the 15-year-old company has become a powerhouse in the home entertainment industry, despite consumer uproar last year when it raised monthly fees as much as 60% and announced, then quickly abandoned, plans to launch a separate brand for DVD rentals called Qwikster. When customers are satisfied with what they're watching, they're more likely to continue paying $8 or more per month for subscriptions, according to marketing consultant Stuart Skorman. "There's so much content online right now and people hate wasting time watching things they don't like. Knowing what your customer wants helps to stand out from competition," said Skorman, who has more than 10 years experience in digital media. For Harty, tagging makes the daily struggle to become a successful screenwriter a little easier. Since moving seven years ago to Los Angeles from South Boston, where he grew up, he has worked as a production assistant on the medical drama "House" and as a video game tester. He was one of the first taggers hired by Yellin in late 2006 and has assessed more than 1,200 movies and TV shows. Tagging, along with his work as a freelance artist for television, movies and video games, allows Harty to pay rent on his modest apartment, which is decorated with little more than a Red Sox blanket on the couch. DVDs crowd a bookcase and shelves over his desk. A lottery ticket is pinned to a bulletin board. The work is compatible with a life dedicated to writing screenplays in hopes of landing an agent and making a sale to a studio. Harty grew up loving what he calls "escapist, popcorn films" like "Die Hard" and his latest script is set in the world of industrial espionage — "a James Bond meets Danny Ocean kind of thing." Like other Netflix taggers who meet once a year at the company's Beverly Hills office to share complaints, learn new tags and receive feedback from Yellin's department, Harty suffers at least one occupational hazard: It's impossible to go to a theater, sit back with popcorn and enjoy a movie. "I'm always tagging in the back of my mind," he said. "When I was watching 'Prometheus,' I was tagging the gore. I was like, 'It's gory, but it's not high gory.'"
  • 44. How did the tags relate to Netflix's "personalized genres"? What algorithm converted this mass of tags into precisely 76,897 genres? Needed someone to explain the back end. So, after securing the data, called up Netflix's PR liaison, a Dutch guy named Joris Evers “And now you want to come in and talk to Todd Yellin, I guess?” Yellin is Netflix's VP of Product , responsible for the creation of Netflix's system. Tagging all the movies was his idea. How to tag them began with a 24-page document he wrote himself. He tagged the early movies and guided the creation of all the systems
  • 45. Yellin had become my Wizard of Oz, who made the machine, the human whose intelligence and sensibility I'd been tracking through the data. At our interview, Yellin turned to me and said, "I've been waiting for someone to bubble up like this for years."
  • 46. Though Yellin seems impressed at our nerdiness, he patiently explains that we've merely skimmed one endproduct of the entire Netflix data infrastructure. There is so much more data and a whole lot more intelligence baked into the system than we've captured. Here's how he told me all the pieces fit together. "My first goal was: tear apart content!" he said
  • 47. How do you systematically dismember thousands of movies using a bunch of different people who all need to have the same understanding of what a given microtag means? In 2006, Yellin holed up with a couple of engineers and spent months developing a document called "Netflix Quantum Theory", "microtag." The Netflix the "social Quantum Theory doc spelled out ways of tagging movie endings, acceptability" of lead characters, and dozens of other facets of a movie. Many values are "scalar," that is to say, they go from 1 to 5. So, every movie gets a romance rating, not just the ones labeled "romantic" in the personalized genres. Every movie's ending is rated from happy to sad, passing through ambiguous. Every plot is tagged. Lead characters' jobs are tagged. Movie locations are tagged. Everything. Everyone.
  • 48. That's the data at the base of the pyramid. It is the basis for creating all the altgenres that I scraped. Netflix's engineers took the microtags and created a syntax for the genres, much of which we were able to reproduce in our generator. It's where the human intelligence of the taggers gets combined with the machine intelligence of the algorithms. There's something in the Netflix personalized genres that I think we can tell is not fully human, but is revealing in a way that humans alone might not be. For example, the adjective "feel good" gets attached to movies that have a certain set of features, most importantly a happy ending. It's not a direct tag that people attach so much as a computed movie category based on an underlying set of tags.
  • 49. The only semi-similar project = Pandora's once-lauded Music but what's amazing about Netflix Genome Project, is that its descriptions of movies are foregrounded. It's not just show you things you might like, but tell you what kinds of things those are. A tool for introspection.
  • 50. Netflix's old way of recommending movies. The company used to could kind of predict how many stars you might give a movie. And so, the company encouraged its users to rate movie after movie, so that it could take those numeric values and develop a taste profile for you. They even offered a $1 million prize to the team that could design an algorithm that would improve the company's ability to predict how many stars users would give movies. It took years to improve the algorithm by a mere 10 percent. The prize was awarded in 2009, but Netflix never actually incorporated the new models. Because Netflix had decided to "go beyond the 5 stars," which is where the personalized genres come in.
  • 51. The human language of the genres helps people identify with the recommendations. "Predicting something is 3.2 stars is kind of fun if you have an engineering sensibility, but it would be more useful to talk about dysfunctional families and viral plagues. We wanted to put in more language," "Highlight our personalization because we pride ourselves on putting the right title in front of the right person at the right time." And nothing highlights their personalization like throwing you a very, very specific altgenre.
  • 52. Why aren't they ultraspecific,, super long, like the gonzo genres? Genres were limited by three main factors: 1) Display 50 characters for various UI reasons 2) "critical mass" of content that fit the description of the genre, at least in Netflix's extended DVD catalog; 3) they only wanted genres that made syntactic sense. In Netflix's real world, there are no genres that have more than five descriptors. Four descriptors are rare, but they do show up for users: Scary Cult Mad-Scientist Movies from the 1970s. Three descriptors are more common: Feel-good Foreign Comedies for Hopeless Romantics. Two are widely used: Steamy Mind Game Movies. And, of course, there are many ones: Quirky Movies.
  • 53. underlying tagging data isn't just used to create genres, but also to increase the level of personalization in all the movies a user is shown. If Netflix knows you love Action Adventure movies with high romantic ratings (on their 1-5 scale), it might show you that kind of movie, without ever saying, "Romantic Action Adventure Movies." "We're gonna tag how much romance is in a movie. We're not gonna tell you how much romance is in it, but we're gonna recommend it,"
  • 54. Netflix has built a system that really only has one analog in the tech world: Facebook's NewsFeed. But instead of serving you up the pieces of web Netflix is serving you up filmed content that the algorithm thinks you'll like, entertainment. hybrid human and machine intelligence approach. They could have purely used computation. For example, looking at people with similar viewing habits and recommending movies based on what they watched. (And Netflix does use this kind of data, too.) But they went beyond that approach "It's to look at the content itself. a real combination: machine-learned, algorithms, algorithmic syntax, " "also a bunch of geeks who love this stuff going deep."
  • 55. As a thought experiment: Imagine if Facebook broke down individual websites according to a 36-page tagging document that let the company truly understand what it was people liked about Atlantic orPopular Science or 4chan or ViralNova? It might be impossible with web content. But if Netflix's system didn't already exist, most people would probably say that it couldn't exist either.
  • 57. Sitting atop the list of mostly expected Hollywood stars is Raymond Burr, who starred in the 1950s television series Perry Mason. Then, at number seven, we find Barbara Hale, who starred opposite Burr in the show. How can Hale and Burr outrank Meryl Streep and Doris Day, not to mention Samuel L. Jackson, Nicholas Cage, Fred Astaire, Sean Connery, and all these other actors in the top few dozen?
  • 58. It's not that the list is nonsensical. Netflix's actor-based genre-creation doesn't make much sense. But that's not the case at all. The rest of the actors at the top of the list make a lot of sense, even if it does not precisely reflect the top box-office earners. list of the top 15 directors Christian I. Nyby II directed several Perry Mason made-for-TV movies in the 1980s. (His father, Christian I. Nyby, directed episodes of the original series, too!)
  • 59. strange thing is that these lists seem pretty spot-on, except for this weird Perry Mason thing.
  • 60. In the DVD days, Perry Mason fans ordered a ton of Perry Mason, one after the other after the other,", "It created sufficient demand that you guys thought there should be categories." That is not an accurate theory. That's just not how it worked. On the other hand, no one — not even Yellin — is quite sure why there are so many altgenres that feature Raymond Burr and Barbara Hale. It's inexplicable with human logic. It's just something that happened. when companies combine human intelligence and machine intelligence, some things happen that we cannot understand.
  • 61. philosophical for a minute. In a human world, life is made interesting by serendipity," "The more complexity you add to a machine world, you're adding serendipity that you couldn't imagine. Perry Mason is going to happen. These ghosts in the machine are always going to be a by-product of the complexity. "Let me get sometimes we call it a bug and sometimes we call it a feature."
  • 62. Perry Mason episodes were famous for the reveal, the pivotal moment in a trial when Mason would reveal the crucial piece of evidence that makes it all makes sense and wins the day. Now, reality gets coded into data for the machines, and then decoded back into descriptions for humans. Along the way, humans ability to understand what's happening gets thinned out. When we go looking for answers and causes, we rarely find that aha! evidence or have the Perry Mason moment. Because it all doesn't actually make sense. Netflix may have solved the mystery of what to watch next, but that generated its own smaller mysteries.
  • 63. 컴퓨터가 만들어낸 데이터 중에서 우리가 이해할 수 없는 것들이 많아졌 다. 우린 그것을 오류라 하기도 하고, 주목해야 할 것이라 여기기도 한다. sometimes we call that a bug and sometimes we call it a feature.
  • 64. To understand how people look for movies, Netflix (the video service) created 76,897 micro-genres. The writer and coworker took the genre descriptions, broke them down to their key words, … and built our own new-genre generator.