Source code release: scraping Douban's Top 250 movies

Published: 2022-09-18

Updated: 2022-09-18

流浪的墨墨Depth_小数点工作室_oc

Description:

Source code:

import requests
from bs4 import BeautifulSoup


def get_html_text(url, code='utf-8'):
    """get html source code"""
    # Send a browser-like User-Agent header with every request.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win**; x**) AppleWebKit/537.36 '
                             '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # raise on 4xx/5xx responses
        response.encoding = code
        return response.text
    except requests.exceptions.RequestException:
        # On any request error, return an empty page.
        return ''


# Output file name means "Douban TOP250.txt".
with open('豆瓣TOP250.txt', 'w', encoding='utf-8') as f:
    # 10 pages of 25 movies each.
    for i in range(10):
        # min(1, i) is 0 for the first page, so no query string is appended there.
        homepage = 'https://movie.douban.com/top250' + min(1, i) * f'?start={i*25}&filter='
        soup = BeautifulSoup(get_html_text(homepage), 'html.parser')
        # Calling a soup or tag object is shorthand for find_all().
        movielist = soup('ol')[0]
        for movie in movielist('li'):
            title = movie('span', class_='title')[0].string
            try:
                rating = movie('span', class_='rating_num')[0].string
            except IndexError:
                rating = '无评分'  # "no rating"
            try:
                comments = movie('span', class_='inq')[0].string
            except IndexError:
                comments = '无评论'  # "no comment"
            # One tab-separated line per movie: title, rating, one-line quote.
            f.write(f'{title:10}\t{rating:3}\t{comments}\n')
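The page URL in the loop relies on a compact trick: multiplying a string by `min(1, i)` yields an empty string for the first page and the full query string for every later page. As a rough sketch of the same construction written more explicitly (the names `BASE_URL`, `PAGE_SIZE` and `page_url` are hypothetical, introduced here only for illustration and not part of the original source):

```python
BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25  # assumption: taken from the i*25 step in the original loop


def page_url(page):
    """Return the list URL for a zero-based page index.

    Mirrors the original expression: the first page uses the bare URL,
    later pages append '?start=<offset>&filter='.
    """
    if page == 0:
        return BASE_URL
    return f'{BASE_URL}?start={page * PAGE_SIZE}&filter='


if __name__ == '__main__':
    # Print the ten URLs the scraper would visit.
    for page in range(10):
        print(page_url(page))
```

Both forms produce the same strings; the explicit branch is only easier to read at a glance.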

Usage notes:

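Note: this project uses third-party libraries such as requests and bs4. When you run it, a prompt will say that the third-party libraries have not been downloaded; just click OK.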
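When running the script outside this platform, there is presumably no such prompt, so it can help to check up front whether the two libraries are importable. The following is a minimal sketch using only the standard library; `REQUIRED` and `missing_packages` are hypothetical names, and the mapping assumes the usual install names, where the `bs4` module comes from the `beautifulsoup4` package.

```python
import importlib.util

# Import name -> install name. Assumption: 'bs4' is provided by 'beautifulsoup4'.
REQUIRED = {'requests': 'requests', 'bs4': 'beautifulsoup4'}


def missing_packages(required):
    """Return the install names of modules that cannot be found."""
    return [
        install_name
        for module_name, install_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == '__main__':
    missing = missing_packages(REQUIRED)
    if missing:
        print('Missing packages, try: pip install ' + ' '.join(missing))
    else:
        print('All required packages are available.')
```

The check only looks the modules up without importing them, so it is safe to run before the scraper itself.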
