A Simple Analysis of Comments on the Bilibili Trailer for iPartment 5 (爱情公寓5)

Oct 21, 2019

1. Data scraping

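from bs4 import BeautifulSoup
import pandas as pd
import requests

# Danmaku (bullet comment) XML for the trailer
url = "http://comment.bilibili.com/123519261.xml"
html = requests.get(url, timeout=10)
html.raise_for_status()  # stop early on an HTTP error
html.encoding = "utf8"

# Each danmaku is stored in a <d> element
soup = BeautifulSoup(html.text, "lxml")
results = soup.find_all("d")

comments = [comment.text for comment in results]
comments_dict = {"comments": comments}

# utf-8-sig keeps the Chinese text readable when the CSV is opened in Excel
df = pd.DataFrame(comments_dict)
df.to_csv("bilibili_data.csv", encoding="utf-8-sig")

print("Scraping finished!")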
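
To try the parsing step without hitting the network, here is a minimal sketch that pulls the text out of `<d>` elements with the standard library instead of BeautifulSoup. The sample XML, the `<root>` wrapper, and the helper name `extract_comments` are made up for illustration; only the `<d>` tag name comes from the script above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <d> element per danmaku, as the scraper assumes.
SAMPLE_XML = """<root>
  <d>啊啊啊啊啊啊啊</d>
  <d>高三</d>
  <d>高二加1</d>
</root>"""


def extract_comments(xml_text):
    """Return the text of every <d> element, in document order."""
    root = ET.fromstring(xml_text)
    # d.text is None for an empty element, so fall back to an empty string
    return [d.text or "" for d in root.iter("d")]


comments = extract_comments(SAMPLE_XML)
print(comments)
```

The resulting list can be passed to `pd.DataFrame({"comments": comments})` in the same way as in the script.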

2. Data analysis

2.1 Viewing the data

import pandas as pd

data = pd.read_csv("bilibili_data.csv")
# Drop the index column that to_csv wrote out as "Unnamed: 0"
del data["Unnamed: 0"]
data.head(5)

comments
0 啊啊啊啊啊啊啊
1 高三
2 高二加1
3 可惜了关谷,,悠悠
4 没多少戏应该是请不起了

2.2 Nostalgia (age) analysis

# One counter per school stage that viewers mention in their danmaku
dayicount = 0      # 大一: university, year 1
daercount = 0      # 大二: university, year 2
dasancount = 0     # 大三: university, year 3
gaoyicount = 0     # 高一: senior high school, year 1
gaoercount = 0     # 高二: senior high school, year 2
gaosancount = 0    # 高三: senior high school, year 3
xiaoxuecount = 0   # 小学: primary school
chuzhongcount = 0  # 初中: junior high school
# Each danmaku is counted once, under the first keyword it matches
for comment in data["comments"].astype(str):
    if "大一" in comment:
        dayicount += 1
    elif "大二" in comment:
        daercount += 1
    elif "大三" in comment:
        dasancount += 1
    elif "高一" in comment:
        gaoyicount += 1
    elif "高二" in comment:
        gaoercount += 1
    elif "高三" in comment:
        gaosancount += 1
    elif "小学" in comment:
        xiaoxuecount += 1
    elif "初中" in comment:
        chuzhongcount += 1
print("大一:", dayicount)
print("大二:", daercount)
print("大三:", dasancount)
print("高一:", gaoyicount)
print("高二:", gaoercount)
print("高三:", gaosancount)
print("小学:", xiaoxuecount)
print("初中:", chuzhongcount)
大一: 60
大二: 34
大三: 19
高一: 72
高二: 59
高三: 70
小学: 5
初中: 2
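
The same count can be written more compactly with a keyword list and a dictionary. This is only a sketch of the idea, not the code that produced the numbers above: the helper name `count_first_match` and the sample list are made up for illustration, and the last-but-one sample comment is invented to show the first-match rule.

```python
# Keywords in the same order as the if/elif chain, so the first match wins.
KEYWORDS = ["大一", "大二", "大三", "高一", "高二", "高三", "小学", "初中"]


def count_first_match(comments, keywords=KEYWORDS):
    """Count each comment once, under the first keyword it contains."""
    counts = {kw: 0 for kw in keywords}
    for text in comments:
        text = str(text)  # guards against non-string values such as NaN
        for kw in keywords:
            if kw in text:
                counts[kw] += 1
                break  # mirrors elif: stop at the first matching keyword
    return counts


# Hypothetical sample data
sample = ["啊啊啊啊啊啊啊", "高三", "高二加1", "大一看的,现在高三", "没多少戏应该是请不起了"]
result = count_first_match(sample)
for kw, n in result.items():
    print(kw + ":", n)
```

As in the if/elif version, a comment that mentions two stages is only counted for the one that comes first in the keyword list. With the real data the call would be `count_first_match(data["comments"])`.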

Out of just 1,000 danmaku (bullet comments), this many people are reminiscing about their youth, haha.
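
To put a number on "this many", the printed counts can be added up. The sketch below copies the counts from the output above and uses the round figure of 1,000 danmaku from the post, which may not be the exact row count, so the share is approximate. The variable names are made up for illustration.

```python
# Counts copied from the printed output of the analysis step
counts = {
    "大一": 60, "大二": 34, "大三": 19,  # university years 1 to 3
    "高一": 72, "高二": 59, "高三": 70,  # senior high school years 1 to 3
    "小学": 5, "初中": 2,                # primary and junior high school
}

TOTAL_DANMAKU = 1000  # assumption: the round figure quoted in the post

total = sum(counts.values())
university = sum(n for kw, n in counts.items() if kw.startswith("大"))
high_school = sum(n for kw, n in counts.items() if kw.startswith("高"))
share = total / TOTAL_DANMAKU

print("matches:", total)
print("university:", university)
print("senior high school:", high_school)
print("approximate share:", format(share, ".1%"))
```

Under that assumption, roughly a third of the danmaku mention a school stage, and senior high school mentions outnumber university mentions.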

3. Source file download

Download via OneIndex