
This is the week-3 Python mini-project assignment from the Nanjing University MOOC "用Python玩转数据".

Assignment 1:

Task: scrape the first 50 hot comments of any book on Douban and compute the average of all the ratings given (note: some comments carry no rating).

import time
import requests
from bs4 import BeautifulSoup

def crawler():
    # Fetch the first three pages of hot comments (roughly 20 per page)
    html = []
    url = "https://book.douban.com/subject/2567698/comments/hot?p="
    for i in range(3):
        resp = requests.get(url + str(i + 1))   # named resp so it doesn't shadow the re module
        html.append(resp.text)
        time.sleep(5)                           # pause between requests to be polite
    html = ' '.join(html)
    return html

def parse(html):
    comments = []
    grades = []
    soup = BeautifulSoup(html, 'lxml')
    commenttags = soup.select(".short")
    for comment in commenttags:
        comments.append(comment.string)
    starttags = soup.select(".user-stars")
    for start in starttags:
        # Extract the rating: the second class looks like "allstar40",
        # so dropping the 7-character "allstar" prefix leaves the numeric score
        grade = int(' '.join(start['class']).split(' ')[1][7:])
        grades.append(grade)
    mean = sum(grades) / len(grades)
    return comments, mean

if __name__ == "__main__":
    html = crawler()
    allcomments, mean = parse(html)
    print(mean)
    for comment in allcomments:
        print(comment)
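For reference, the rating tag's class attribute comes back from BeautifulSoup as a list, so the extraction line in parse() works roughly as below; the exact class names are an assumption based on Douban's markup, where allstar10 through allstar50 encode 1 to 5 stars.

# Hypothetical class list for one rating tag, as BeautifulSoup would return it
classes = ['user-stars', 'allstar40', 'rating']
# Join, split, then drop the 7-character "allstar" prefix to get the score
grade = int(' '.join(classes).split(' ')[1][7:])
print(grade)   # 40, i.e. a 4-star rating on Douban's 10-50 scale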

Think about the many types of writing you do, for example, writing for school, to family and friends, or on your job. What kind of writing would you consider to be technical writing? How would you define technical writing? Hello! I'm Chen Meihua from Southeast University. In this section, I will provide you with an overview of technical writing and will focus on its definition, main features and purposes.


The main idea is to first fetch a set of URLs with requests.get(), parse the returned HTML with BeautifulSoup, and extract four attributes for each user: id, name, href and comment. These records are stored in a database (I also bolted on some trivial CRUD features). All comments are then read back out of the database, useless ones are filtered with regular expressions as they are retrieved, and the cleaned comments are simply kept in a list (the data set is small anyway). The text is then segmented with the jieba Chinese word-segmentation library, and finally a word cloud is generated with the wordcloud library. The UI can be customized; mine turned out really ugly (mainly because I didn't find that part interesting).
The important part: I have only just started learning Python, so the code is very rough and the in-memory lists are allocated carelessly. I will refactor it later.
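Since the post only describes the pipeline in words, here is a minimal sketch of the final segmentation and word-cloud step, assuming the cleaned comments are already sitting in a list; the database and regex-cleaning steps are omitted, and the font path is a placeholder for any font on your machine that contains CJK glyphs.

import jieba
from wordcloud import WordCloud

# Cleaned comments, as if already read back from the database and filtered
comments = ["这本书写得很好", "情节有点拖沓", "值得一读"]

# Segment the Chinese text with jieba; WordCloud expects space-separated tokens
text = ' '.join(jieba.cut(' '.join(comments)))

# font_path must point to a CJK-capable font (placeholder path here)
wc = WordCloud(font_path='simhei.ttf', width=800, height=600,
               background_color='white').generate(text)
wc.to_file('comments_wordcloud.png')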
