A Simple Python Crawler Example: Scraping All of imooc's Practical Course Info (2)

The main function can then be driven roughly like this:

for i in range(4):
    main(i + 1)
Complete code:

# -*- coding: utf-8 -*-
import requests
import re
import json
from requests.exceptions import RequestException
from multiprocessing import Pool


def get_one_page(url):
    # Fetch one listing page; return the decoded HTML, or None on failure.
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.content.decode("utf-8")
        return None
    except RequestException:
        return None


def parse_one_page(html):
    # Extract teacher, title, grade, enrollment, score and description from the page HTML.
    pattern = re.compile('<div>.*?lecturer-info.*?<span>(.*?)</span>.*?shizhan-intro-box.*?title=".*?">'
                         '(.*?)</p>.*?class="grade">(.*?)</span>.*?imv2-set-sns.*?</i>'
                         '(.*?)</span>.*?class="big-text">(.*?)</p>.*?shizan-desc.*?>'
                         '(.*?)</p>.*?</div>', re.S)
    items = re.findall(pattern, html)
    for item in items:
        yield {
            'teacher': item[0],
            'title': item[1],
            'grade': item[2],
            'people': item[3],
            'score': item[4],
            'describe': item[5]
        }


def write_to_file(content):
    # Append one course record per line as JSON; the with-block closes the file automatically.
    with open('imoocAll.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main(page):
    url = 'http://coding.imooc.com/?page=' + str(page)
    html = get_one_page(url)
    if html is None:  # request failed, nothing to parse
        return
    for item in parse_one_page(html):
        print(item)
        write_to_file(item)


if __name__ == '__main__':
    pool = Pool()
    pool.map(main, [i + 1 for i in range(4)])
    # Sequential alternative:
    # for i in range(4):
    #     main(i + 1)

At this point we can scrape the information for all of imooc's practical courses. With this data in hand, you can run whatever analysis you like.
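For example, since write_to_file() stores one JSON object per line, the data can be read back with just a few lines of code. The sketch below is only an illustration under that assumption: it counts the scraped courses and groups them by the 'grade' field, keeping the values as the raw strings taken from the page.

# A minimal sketch of reading imoocAll.txt back for analysis.
# Assumes the file was produced by write_to_file() above, i.e. one JSON
# object per line with keys teacher/title/grade/people/score/describe.
import json
from collections import Counter

courses = []
with open('imoocAll.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines
            courses.append(json.loads(line))

print('total courses:', len(courses))

# Count how many courses fall into each difficulty level.
grade_counts = Counter(course['grade'] for course in courses)
for grade, count in grade_counts.most_common():
    print(grade, count)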
