There's something about Netflix: Descriptions of their films 🎞

Published Nov 20, 2019 by Yeoun Yi
#advertising #marketing

Netflix is famous for data analysis. They change posters and recommend films based on data. However, I haven’t been told a lot about their descriptions of TV shows or movies.

Searching about this, I came across this posting . It says Netflix do A/B Testing for their descriptions.

I searched for job descriptions of Netflix, and they did ask for A/B Testing experience.

I got curious how much the descriptions tested by A/B Testing improved, compared to other descriptions without any testing. So, I compared the descriptions of the same movies in Netflix and Naver, which is the biggest web portal in Korea.

1. Crawling

I crawled Netflix descriptions from 4FLIX and Naver descriptions from Naver

Crawling 4FLIX website was easy, but crawling search results in Naver was a little bit challenging.

First, find the tag that contains description in Naver search page.

As I want the descriptions in Naver of the same movies in Netflix,make the list of movie names that I want to crawl in Naver, using Netflix crawled data.

But this url doesn’t work. Korean characters should be converted using quote.

df2 = pd.DataFrame(columns=["title", "naver"])
count=0

from urllib.parse import quote
from urllib.request import urlopen

for title in drama['title']:
    url = "https://search.naver.com/search.naver?sm=top_hty&fbm=1&ie=utf8&query=" + quote(title)
    with urllib.request.urlopen(url) as url:
        try:
            doc = url.read()
            soup = BeautifulSoup(doc, "html.parser")
            naver = soup.find_all(id="layer_sy")[0].text.strip()
            df2.loc[count] = [title, naver]
            count+=1
            
        except:
            pass    

2. Analysis

Korean is a morphologically complicated language. For an accurate analysis, POS analysis should be done.

from konlpy.tag import Kkma 
kkma = Kkma()

# dataframe for POS frequency analysis
pos_df = pd.DataFrame()

# for loop for all descriptions 
for i in range(len(df['desc'])):
    pos = pd.DataFrame.from_dict(dict(Counter(list(dict(kkma.pos(df['desc'][i])).values()))), orient='index').transpose()
    
    pos_df = pd.concat([pos_df,pos], axis=0, ignore_index=True)
    
    # freq. value for POS not appearing in current descriptions set to be zero  
    pos_df = pos_df.fillna(0) 

Result for this code looks like this.

I concated this dataframe with crawled data and normalized the values by total length of each description. I also saved all POS tagged descriptions.

Based on POS taggings, I could compare the frequency of certain words and phrases in Netflix and Naver.

naver_passive = pd.DataFrame(columns=['title', 'naver_desc'])  
count = 0
for i in range(len(naver['pos'])):
	# searching expression for Korean passive phrase in Naver descriptions
    if "('게', 'ECD'), ('되', 'VV')" in str(naver.iloc[i,3]):
        title = naver.iloc[i,0]
        naver_desc = naver.iloc[i,1]
        naver_passive.loc[count] = [title, naver_desc]
        count += 1

2.1. Independent Characters

Netflix used verb ‘come forward(나서다)’ or ‘protect(지키다)’ more frequently than Naver. These verbs can express the indepedency of characters: e.g. Someone come forward to protect the city. Even describing the same movie, Naver used verb ‘get(받다)’: e.g. Someone get a call or receive a command.

	Netflix	Netflix	Naver	Naver
verb	total freq.	freq. per 10,000 morphemes	total freq.	freq. per 10,000 morphemes
come forward (나서다)	60	10.913	79	3.567
protect (지키다)	40	7.276	97	4.379
get/receive (받다)	86	15.643	356	16.072

Netflix: “루브르 박물관의 큐레이터가 살해되고, 하버드대 교수와 암호 해독가가 레오나르도 다빈치 작품을 둘러싼 난해한 속임수를 해결하러 나선다.”
Naver: “특별강연을 위해 파리에 체류 중이던 하버드대 기호학자 로버트 랭던(톰 행크스)은 깊은 밤 급박한 호출을 받는다.”

Naver also used passive expressions more, which makes characters less indepedent. Below is the table of number of descriptions which contained at least one passive expression.

Passive Expression	Netflix	Naver
-어 지다	43	179
-게 되다	148	590

Netflix: “1900년대 초, 안락한 상류층 도시 생활을 뒤로하고 캐나다 서부의 탄광촌에서 교사로 사는 삶을 택한 씩씩한 여성의 이야기.”
Naver: “젊은 여교사가 서부의 작은 탄광촌에서 아이들을 가르치게 되면서 일어나는 이야기”

2.2. Empathy to Characters

Netflix used imperative, interrogative sentences more. The table shows the frequency of verb endings per 10,000 morphemes.

Verb endings	Netflix	Naver
Imperative verb endings	0.139	0.025
Interrogative verb endings	1.988	0.726

They also used 1st person pronouns, ‘I’ or ‘We’ more. By using these, they wrote sentences that the characters would have said. Naver used ‘self’ instead of 1st person pronouns.

	Netflix	Netflix	Naver	Naver
words	total freq.	freq. per 10,000 morphemes	total freq.	freq. per 10,000 morphemes
I (나)	37	6.730	94	4.244
We (우리)	36	6.548	58	2.619
She (그녀)	81	14.733	479	21.626
He (그)	213	38.743	1118	50.475
Self (자신)	74	13.460	619	27.946

Netflix: “복수를 원하는가? 그렇다면 누가 왜 가뒀는지 비밀을 풀어라.”
Netflix: “죽음 직전, 뇌를 이식한 인공 신체로 부활했다. 사이버 범죄를 소탕하는 전사가 됐다. 하지만 사건을 파고들면서 찾아드는 스스로에 대한 의문. 내 과거는 무엇이고 나는 누구인가. 이제 무엇도 믿을 수 없다. 알아내야 해, 내가 누군지!”
Naver: “사건을 깊이 파고들수록 메이저는 자신의 과거와 존재에 대한 의문을 갖게 되는데…! 스스로의 존재를 찾기 위한, 그리고 세계를 구하기 위한 거대 조직과의 전투가 시작된다!”

2.3. Narrative vs Descriptive writing

Netflix used verbs more, and Naver used adverbs and adjectives more. Naver also used ‘have(가지다)’ more to describe characters. Netflix’s descriptions were narrative, focued more on the plots, while Naver’s were descriptive, focued more on the emotions. Below is the table of averaged frequency of POS per 10,000 morphemes.

POS	Netflix	Naver
Adverbs	0.643	1.027
Adjectives	3.201	4.463
Verbs	2.367	1.139

Netflix: “정우는 학교에서 왕따를 당하는 수연과 친구가 되기로 한다. 한편 정우와 수연은 불난 집에서 남자아이를 구한다.”
Naver: “열 다섯, 가슴 설렌 첫 사랑의 기억을 송두리째 앗아간 쓰라린 상처를 가슴에 품고 살아가는 두 남녀의 숨바꼭질 같은 사랑이야기를 그린 정통 멜로 드라마”

In short, I found out characters in Netflix descriptions are expressed in a way more independent and emphathable. Netflix descriptions also focused more on the plot, not on the emotion.

*****