Published Nov 20, 2019 by Yeoun Yi

Netflix is famous for data analysis. They change posters and recommend films based on data. However, I hadn't heard much about how they write their descriptions of TV shows and movies. Searching for this, I came across this posting. It says Netflix runs A/B tests on its descriptions. I also looked at Netflix job postings, and they did ask for A/B testing experience. I got curious how much better the A/B-tested descriptions were than descriptions written without any testing. So I compared the descriptions of the same movies on Netflix and on Naver, the biggest web portal in Korea.

Crawling the 4FLIX website was easy, but crawling search results on Naver was a little more challenging. First, find the tag that contains the description on the Naver search page. Since I wanted the Naver descriptions of the same movies that are on Netflix, I built the list of titles to search on Naver from the crawled Netflix data. A search URL with the raw Korean title appended doesn't work, though: the Korean characters have to be percent-encoded using `quote`.

I concatenated the resulting POS-frequency dataframe with the crawled data and normalized the values by the total length of each description. I also saved all POS-tagged descriptions. Based on the POS tags, I could compare the frequency of certain words and phrases in Netflix and Naver.
#advertising
#marketing
1. Crawling
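I crawled Netflix descriptions from 4FLIX and Naver descriptions from Naver search results.

    import pandas as pd
    from bs4 import BeautifulSoup
    from urllib.parse import quote
    from urllib.request import urlopen

    # `drama` is the dataframe of Netflix titles crawled from 4FLIX
    df2 = pd.DataFrame(columns=["title", "naver"])
    count = 0
    for title in drama['title']:
        # quote() percent-encodes the Korean title so the query URL is valid
        url = "https://search.naver.com/search.naver?sm=top_hty&fbm=1&ie=utf8&query=" + quote(title)
        try:
            with urlopen(url) as response:
                doc = response.read()
            soup = BeautifulSoup(doc, "html.parser")
            # the description is taken from the element with id="layer_sy"
            naver = soup.find_all(id="layer_sy")[0].text.strip()
            df2.loc[count] = [title, naver]
            count += 1
        except Exception:
            # skip titles whose page cannot be fetched or has no description
            pass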
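
To see what `quote` does to a Korean title, here is a minimal sketch that builds the search URL for a single title. The helper name `build_search_url` and the sample title are assumptions made for illustration; only the base URL comes from the crawler above.

```python
from urllib.parse import quote, unquote

# Base URL copied from the crawler above.
NAVER_SEARCH = "https://search.naver.com/search.naver?sm=top_hty&fbm=1&ie=utf8&query="

def build_search_url(title):
    """Return a Naver search URL with the title percent-encoded as UTF-8."""
    # quote() replaces every non-ASCII byte with a %XX escape
    return NAVER_SEARCH + quote(title)

# Hypothetical title, used only for illustration.
url = build_search_url("기생충")
print(url)

# The encoded URL is pure ASCII and decodes back to the original title.
print(url.isascii(), unquote(url[len(NAVER_SEARCH):]))
```

Keeping the encoding in one small function makes it easy to check a single title by hand before running the full loop.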
2. Analysis
Korean is a morphologically complex language. For an accurate analysis, the descriptions should be POS-tagged first.

    from collections import Counter
    import pandas as pd
    from konlpy.tag import Kkma

    kkma = Kkma()
    # dataframe for POS frequency analysis
    pos_df = pd.DataFrame()
    # loop over all descriptions
    for i in range(len(df['desc'])):
        # count the POS tags of one description and turn the counts into a one-row dataframe
        pos = pd.DataFrame.from_dict(dict(Counter(list(dict(kkma.pos(df['desc'][i])).values()))), orient='index').transpose()
        pos_df = pd.concat([pos_df, pos], axis=0, ignore_index=True)
    # POS tags that do not appear in a description get a frequency of zero
    pos_df = pos_df.fillna(0)
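
The counting step can be tried without KoNLPy by feeding it a hand-written list of `(morpheme, tag)` pairs, which is the shape `kkma.pos()` returns. The sample data and both helper names below are assumptions for illustration. The sketch also shows one detail worth checking in the loop above: passing the pairs through `dict()` keeps a single entry per distinct morpheme, so a repeated morpheme is counted only once.

```python
from collections import Counter

def count_tags(tagged):
    """Count every tag occurrence in a list of (morpheme, tag) pairs."""
    return Counter(tag for _, tag in tagged)

def count_tags_via_dict(tagged):
    """Count tags after dict() has collapsed repeated morphemes."""
    return Counter(dict(tagged).values())

# Hand-written stand-in for tagger output; the tags are illustrative only.
sample = [
    ("그", "NP"), ("는", "JX"), ("도시", "NNG"), ("를", "JKO"), ("지키", "VV"),
    ("그", "NP"), ("는", "JX"), ("전화", "NNG"), ("를", "JKO"), ("받", "VV"),
]

all_counts = count_tags(sample)
dict_counts = count_tags_via_dict(sample)

# Compare the two counting strategies for a repeated pronoun and for verbs.
print(all_counts["NP"], dict_counts["NP"])
print(all_counts["VV"], dict_counts["VV"])
```

Whether token counts or distinct-morpheme counts are wanted depends on the question being asked; the point is only that the two can differ.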
    # collect Naver descriptions that contain the passive phrase "-게 되다"
    naver_passive = pd.DataFrame(columns=['title', 'naver_desc'])
    count = 0
    for i in range(len(naver['pos'])):
        # search the POS-tagged description (column 3) for the tagged passive phrase
        if "('게', 'ECD'), ('되', 'VV')" in str(naver.iloc[i,3]):
            title = naver.iloc[i,0]
            naver_desc = naver.iloc[i,1]
            naver_passive.loc[count] = [title, naver_desc]
            count += 1
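
Matching against `str(...)` of the tagged list depends on the exact text formatting of that list. A hedged alternative, sketched below, scans the list of `(morpheme, tag)` pairs for the two adjacent items directly. The function name and the sample data are assumptions; the pattern is the `-게 되다` passive (an `ECD` ending followed by a `VV` verb) searched for above.

```python
# The passive pattern as two consecutive (morpheme, tag) pairs.
PASSIVE = [("게", "ECD"), ("되", "VV")]

def contains_sequence(tagged, pattern):
    """Return True if `pattern` occurs as consecutive items in `tagged`."""
    n = len(pattern)
    return any(
        list(tagged[i:i + n]) == pattern
        for i in range(len(tagged) - n + 1)
    )

# Hand-written stand-ins for tagger output; the tags are illustrative only.
passive_desc = [("가르치", "VV"), ("게", "ECD"), ("되", "VV"), ("면서", "ECE")]
active_desc = [("도시", "NNG"), ("를", "JKO"), ("지키", "VV"), ("ㄴ다", "EFN")]

print(contains_sequence(passive_desc, PASSIVE))
print(contains_sequence(active_desc, PASSIVE))
```

This only works if the tagged descriptions are stored as lists of tuples rather than as strings.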
2.1. Independent Characters
Netflix used the verbs "come forward (나서다)" and "protect (지키다)" more frequently than Naver. These verbs express the independence of characters, e.g. "Someone comes forward to protect the city."
Even when describing the same movie, Naver used the verb "get/receive (받다)", e.g. "Someone gets a call or receives a command."

Verb | Netflix total freq. | Netflix freq. per 10,000 morphemes | Naver total freq. | Naver freq. per 10,000 morphemes |
---|---|---|---|---|
come forward (나서다) | 60 | 10.913 | 79 | 3.567 |
protect (지키다) | 40 | 7.276 | 97 | 4.379 |
get/receive (받다) | 86 | 15.643 | 356 | 16.072 |
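
The "per 10,000 morphemes" columns are raw counts divided by corpus size. The post does not state the corpus sizes, so the sketch below backs them out from the "come forward" row and then re-derives the "protect" row as a consistency check. The function names are assumptions, and the inferred totals are approximate because the published rates are rounded.

```python
def per_10k(count, total_morphemes):
    """Frequency per 10,000 morphemes."""
    return count / total_morphemes * 10_000

def implied_total(count, rate_per_10k):
    """Back out the corpus size from a count and its reported rate."""
    return count / rate_per_10k * 10_000

# Corpus sizes inferred from the "come forward" row (not stated in the post).
netflix_total = implied_total(60, 10.913)
naver_total = implied_total(79, 3.567)
print(round(netflix_total), round(naver_total))

# Re-derive the "protect" row from the inferred totals.
print(round(per_10k(40, netflix_total), 3), round(per_10k(97, naver_total), 3))
```

The inferred totals come out at roughly 55,000 morphemes for Netflix and 221,000 for Naver. Because the Naver corpus is about four times larger, the raw counts alone (60 vs. 79) would point the wrong way; the normalized rates are the comparable figures.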
Netflix
: β루λΈλ₯΄ λ°λ¬Όκ΄μ νλ μ΄ν°κ° μ΄ν΄λκ³ , νλ²λλ κ΅μμ μνΈ ν΄λ
κ°κ° λ μ€λλ₯΄λ λ€λΉμΉ μνμ λλ¬μΌ λν΄ν μμμλ₯Ό ν΄κ²°νλ¬ λμ λ€.β
Naver
: βνΉλ³κ°μ°μ μν΄ ν리μ 체λ₯ μ€μ΄λ νλ²λλ κΈ°νΈνμ λ‘λ²νΈ λλ(ν° νν¬μ€)μ κΉμ λ°€ κΈλ°ν νΈμΆμ λ°λλ€.β
Naver also used passive expressions more, which makes characters seem less independent. The table below shows the number of descriptions that contain at least one passive expression.
Passive Expression | Netflix | Naver |
---|---|---|
-어 지다 | 43 | 179 |
-게 되다 | 148 | 590 |
Netflix
: β1900λ
λ μ΄, μλ½ν μλ₯μΈ΅ λμ μνμ λ€λ‘νκ³ μΊλλ€ μλΆμ νκ΄μ΄μμ κ΅μ¬λ‘ μ¬λ μΆμ νν μ©μ©ν μ¬μ±μ μ΄μΌκΈ°.β
Naver
: βμ μ μ¬κ΅μ¬κ° μλΆμ μμ νκ΄μ΄μμ μμ΄λ€μ κ°λ₯΄μΉκ² λλ©΄μ μΌμ΄λλ μ΄μΌκΈ°β
Netflix used imperative and interrogative sentences more often. The table shows the frequency of each type of verb ending per 10,000 morphemes.
Verb endings | Netflix | Naver |
---|---|---|
Imperative verb endings | 0.139 | 0.025 |
Interrogative verb endings | 1.988 | 0.726 |
Netflix also used the first-person pronouns "I" and "we" more. With these, they wrote sentences that the characters themselves might have said. Naver used "self" instead of first-person pronouns.

Word | Netflix total freq. | Netflix freq. per 10,000 morphemes | Naver total freq. | Naver freq. per 10,000 morphemes |
---|---|---|---|---|
I (나) | 37 | 6.730 | 94 | 4.244 |
We (우리) | 36 | 6.548 | 58 | 2.619 |
She (그녀) | 81 | 14.733 | 479 | 21.626 |
He (그) | 213 | 38.743 | 1118 | 50.475 |
Self (자신) | 74 | 13.460 | 619 | 27.946 |
Netflix
: β볡μλ₯Ό μνλκ°? κ·Έλ λ€λ©΄ λκ° μ κ°λλμ§ λΉλ°μ νμ΄λΌ.β
Netflix
: βμ£½μ μ§μ , λλ₯Ό μ΄μν μΈκ³΅ μ μ²΄λ‘ λΆννλ€. μ¬μ΄λ² λ²μ£λ₯Ό μννλ μ μ¬κ° λλ€. νμ§λ§ μ¬κ±΄μ νκ³ λ€λ©΄μ μ°Ύμλλ μ€μ€λ‘μ λν μλ¬Έ. λ΄ κ³Όκ±°λ 무μμ΄κ³ λλ λꡬμΈκ°. μ΄μ 무μλ λ―Ώμ μ μλ€. μμλ΄μΌ ν΄, λ΄κ° λκ΅°μ§!β
Naver
: βμ¬κ±΄μ κΉμ΄ νκ³ λ€μλ‘ λ©μ΄μ λ μμ μ κ³Όκ±°μ μ‘΄μ¬μ λν μλ¬Έμ κ°κ² λλλ°β¦! μ€μ€λ‘μ μ‘΄μ¬λ₯Ό μ°ΎκΈ° μν, κ·Έλ¦¬κ³ μΈκ³λ₯Ό ꡬνκΈ° μν κ±°λ μ‘°μ§κ³Όμ μ ν¬κ° μμλλ€!β
Netflix used verbs more, while Naver used adverbs and adjectives more. Naver also used "have (가지다)" more to describe characters. Netflix's descriptions were narrative, focused more on the plot, while Naver's were descriptive, focused more on emotions. The table below shows the average frequency of each POS per 10,000 morphemes.
POS | Netflix | Naver |
---|---|---|
Adverbs | 0.643 | 1.027 |
Adjectives | 3.201 | 4.463 |
Verbs | 2.367 | 1.139 |
Netflix
: βμ μ°λ νκ΅μμ μλ°λ₯Ό λΉνλ μμ°κ³Ό μΉκ΅¬κ° λκΈ°λ‘ νλ€. ννΈ μ μ°μ μμ°μ λΆλ μ§μμ λ¨μμμ΄λ₯Ό ꡬνλ€.β
Naver
: βμ΄ λ€μ―, κ°μ΄ μ€λ 첫 μ¬λμ κΈ°μ΅μ μ‘λ리째 μμκ° μ°λΌλ¦° μμ²λ₯Ό κ°μ΄μ νκ³ μ΄μκ°λ λ λ¨λ
μ μ¨λ°κΌμ§ κ°μ μ¬λμ΄μΌκΈ°λ₯Ό κ·Έλ¦° μ ν΅ λ©λ‘ λλΌλ§β
In short, I found that characters in Netflix descriptions are portrayed as more independent and easier to empathize with. Netflix descriptions also focus more on the plot than on emotion.