There's something about Netflix: Descriptions of their films 🎞

Published Nov 20, 2019 by Yeoun Yi
#advertising #marketing



Netflix is famous for its data analysis: it changes posters and recommends films based on data. However, I haven’t heard much about how it writes the descriptions of its TV shows and movies.

While searching for this, I came across this posting. It says Netflix runs A/B tests on its descriptions.

I also looked at Netflix job postings, and they did ask for A/B testing experience.



I got curious about how much better the A/B-tested descriptions are than descriptions written without any testing. So I compared the descriptions of the same movies on Netflix and on Naver, the biggest web portal in Korea.

1. Crawling


I crawled the Netflix descriptions from 4FLIX and the Naver descriptions from Naver.

Crawling the 4FLIX website was easy, but crawling Naver search results was a bit more challenging.

First, find the tag that contains the description on the Naver search page.



Since I want the Naver descriptions of the same movies that are on Netflix, I built the list of movie titles to search on Naver from the crawled Netflix data.



But a URL built from the raw title doesn’t work: the Korean characters have to be percent-encoded with quote.

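import pandas as pd
from bs4 import BeautifulSoup
from urllib.parse import quote
from urllib.request import urlopen

# `drama` is the dataframe of Netflix titles crawled from 4FLIX
df2 = pd.DataFrame(columns=["title", "naver"])
count = 0

for title in drama['title']:
    # percent-encode the Korean title so that it is valid inside a URL
    url = "https://search.naver.com/search.naver?sm=top_hty&fbm=1&ie=utf8&query=" + quote(title)
    try:
        with urlopen(url) as response:
            doc = response.read()
        soup = BeautifulSoup(doc, "html.parser")
        # the description is in the element with id "layer_sy"
        naver = soup.find_all(id="layer_sy")[0].text.strip()
        df2.loc[count] = [title, naver]
        count += 1
    except (OSError, IndexError):
        # skip titles whose page fails to load or has no description box
        continue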
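To see what quote does to a Korean title, here is a minimal sketch. The helper name build_search_url and the sample title are assumptions made for illustration; only the base URL and the use of quote come from the crawler above.

```python
from urllib.parse import quote, unquote

# Base URL taken from the crawler above
BASE = "https://search.naver.com/search.naver?sm=top_hty&fbm=1&ie=utf8&query="


def build_search_url(title: str) -> str:
    """Return a Naver search URL with the title percent-encoded as UTF-8."""
    return BASE + quote(title)


# Hypothetical sample title, chosen only for illustration
sample_title = "킹덤"
url = build_search_url(sample_title)

# The encoded URL contains only ASCII characters
print(url)

# Decoding the query part gives the original title back
print(unquote(url.split("query=", 1)[1]))
```

The Korean characters are replaced by %XX escapes of their UTF-8 bytes, so the resulting URL is plain ASCII and can be passed to urlopen.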

2. Analysis


Korean is a morphologically rich language, so an accurate analysis requires part-of-speech (POS) tagging.

from collections import Counter

import pandas as pd
from konlpy.tag import Kkma

kkma = Kkma()

# one dict of POS-tag counts per description
rows = []
for desc in df['desc']:
    # kkma.pos returns (morpheme, tag) pairs;
    # dict() keeps one tag per distinct morpheme, so repeated morphemes count once
    tags = dict(kkma.pos(desc)).values()
    rows.append(dict(Counter(tags)))

# dataframe for POS frequency analysis;
# tags that do not appear in a description get a frequency of zero
pos_df = pd.DataFrame(rows).fillna(0)

The result is a dataframe with one row per description and one column per POS tag, holding the tag counts.



I concatenated this dataframe with the crawled data and normalized the counts by the total length of each description. I also saved all the POS-tagged descriptions.

Based on the POS tags, I could compare the frequency of certain words and phrases in Netflix and Naver.
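The normalization step could look roughly like the sketch below. It is written in plain Python for brevity and assumes that "total length" means the number of tagged morphemes in a description, which the post does not spell out. The function name and the sample counts are made up for illustration.

```python
from collections import Counter


def normalize_counts(tag_counts: dict) -> dict:
    """Divide each POS count by the total number of counted morphemes."""
    total = sum(tag_counts.values())
    if total == 0:
        # an empty description has no shares to compute
        return {tag: 0.0 for tag in tag_counts}
    return {tag: count / total for tag, count in tag_counts.items()}


# Hand-made counts for one hypothetical description (not real Kkma output)
example = Counter({"NNG": 6, "VV": 3, "JKO": 1})
shares = normalize_counts(example)
print(shares)
```

With a dataframe of counts such as pos_df, the same idea can be written as pos_df.div(pos_df.sum(axis=1), axis=0), again under the assumption that the row sum is the intended length.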

# collect the Naver descriptions that contain the passive phrase "-게 되다"
naver_passive = pd.DataFrame(columns=['title', 'naver_desc'])
count = 0
for i in range(len(naver['pos'])):
    # column 3 holds the POS-tagged description; search its string form
    # for the tag sequence of the Korean passive phrase
    if "('게', 'ECD'), ('되', 'VV')" in str(naver.iloc[i, 3]):
        title = naver.iloc[i, 0]
        naver_desc = naver.iloc[i, 1]
        naver_passive.loc[count] = [title, naver_desc]
        count += 1

2.1. Independent Characters


Netflix used the verbs ‘come forward (나서다)’ and ‘protect (지키다)’ more frequently than Naver. These verbs express the independence of characters, e.g. “Someone comes forward to protect the city.” Even when describing the same movie, Naver used the verb ‘get/receive (받다)’, e.g. “Someone gets a call” or “Someone receives a command.”

| Verb | Netflix: total freq. | Netflix: freq. per 10,000 morphemes | Naver: total freq. | Naver: freq. per 10,000 morphemes |
|---|---|---|---|---|
| come forward (나서다) | 60 | 10.913 | 79 | 3.567 |
| protect (지키다) | 40 | 7.276 | 97 | 4.379 |
| get/receive (받다) | 86 | 15.643 | 356 | 16.072 |
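The "freq. per 10,000 morphemes" figures are a simple rate: occurrences divided by the total number of morphemes, times 10,000. Below is a minimal sketch with hypothetical helper names and a hand-written tagged sample rather than real Kkma output. The corpus size of 54,980 morphemes in the last lines is not reported in the post; it is back-calculated from the Netflix row for ‘come forward’ and used only as a sanity check.

```python
def count_morpheme(tagged_descriptions, morpheme, tag):
    """Count how often a (morpheme, tag) pair occurs across all descriptions."""
    return sum(pairs.count((morpheme, tag)) for pairs in tagged_descriptions)


def per_10k(count, total_morphemes):
    """Convert a raw count into a frequency per 10,000 morphemes."""
    return count / total_morphemes * 10_000


# Hand-written (morpheme, tag) pairs for two hypothetical descriptions
tagged = [
    [("그", "NP"), ("가", "JKS"), ("나서", "VV"), ("ㄴ다", "EFN")],
    [("도시", "NNG"), ("를", "JKO"), ("지키", "VV"), ("ㄴ다", "EFN")],
]
total = sum(len(pairs) for pairs in tagged)
hits = count_morpheme(tagged, "나서", "VV")
print(hits, total, per_10k(hits, total))

# Sanity check against the table, assuming about 54,980 Netflix morphemes
print(round(per_10k(60, 54_980), 3))
```

Normalizing this way matters because the two corpora differ a lot in size, so raw totals alone would make Naver look more frequent for almost every word.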

Naver also used passive expressions more, which makes characters seem less independent. The table below shows the number of descriptions that contain at least one passive expression.

| Passive expression | Netflix | Naver |
|---|---|---|
| -어 지다 | 43 | 179 |
| -게 되다 | 148 | 590 |

2.2. Empathy to Characters


Netflix used imperative and interrogative sentences more. The table shows the frequency of each type of verb ending per 10,000 morphemes.

| Verb endings | Netflix | Naver |
|---|---|---|
| Imperative verb endings | 0.139 | 0.025 |
| Interrogative verb endings | 1.988 | 0.726 |
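Sentence types can be read off the POS tags of the final endings. The sketch below assumes Kkma's tag names EFO (imperative final ending) and EFQ (interrogative final ending) and uses hand-written tagged pairs rather than real tagger output. The function name is hypothetical, and this is not necessarily how the figures above were produced.

```python
from collections import Counter

# Assumed Kkma tags for the two sentence types of interest
ENDING_TAGS = {"EFO": "imperative", "EFQ": "interrogative"}


def ending_rates(tagged_descriptions):
    """Return the frequency per 10,000 morphemes of each ending type."""
    tag_counts = Counter(
        tag for pairs in tagged_descriptions for _, tag in pairs
    )
    total = sum(tag_counts.values())
    if total == 0:
        return {label: 0.0 for label in ENDING_TAGS.values()}
    return {
        label: tag_counts[tag] / total * 10_000
        for tag, label in ENDING_TAGS.items()
    }


# Hand-written sample: one interrogative and one imperative sentence
sample = [
    [("누구", "NP"), ("이", "VCP"), ("ㄴ가", "EFQ")],
    [("도망치", "VV"), ("어라", "EFO")],
]
rates = ending_rates(sample)
print(rates)
```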

Netflix also used the first-person pronouns ‘I’ and ‘we’ more. With these, they wrote sentences that the characters themselves might have said. Naver used ‘self’ instead of first-person pronouns.

| Word | Netflix: total freq. | Netflix: freq. per 10,000 morphemes | Naver: total freq. | Naver: freq. per 10,000 morphemes |
|---|---|---|---|---|
| I (나) | 37 | 6.730 | 94 | 4.244 |
| We (우리) | 36 | 6.548 | 58 | 2.619 |
| She (그녀) | 81 | 14.733 | 479 | 21.626 |
| He (그) | 213 | 38.743 | 1118 | 50.475 |
| Self (자신) | 74 | 13.460 | 619 | 27.946 |

2.3. Narrative vs Descriptive writing


Netflix used verbs more, while Naver used adverbs and adjectives more. Naver also used ‘have (가지다)’ more to describe characters. Netflix’s descriptions were narrative, focused more on the plot, while Naver’s were descriptive, focused more on emotions. The table below shows the average frequency of each POS per 10,000 morphemes.

| POS | Netflix | Naver |
|---|---|---|
| Adverbs | 0.643 | 1.027 |
| Adjectives | 3.201 | 4.463 |
| Verbs | 2.367 | 1.139 |

In short, I found that characters in Netflix descriptions are portrayed as more independent and easier to empathize with. Netflix descriptions also focus more on the plot than on emotion.



*****

© Yeoun Yi