《Python3网络爬虫开发实战》 (Python 3 Web Crawler Development in Practice) -- Using the Basic Libraries

Date: 2022-01-16  Category: Programmer's Life

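request: the most basic HTTP request module (urllib.request), used to simulate sending requests. Just as you would type a URL into a browser and press Enter, you only need to pass the URL and any extra parameters to the library's functions to reproduce that process.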
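As a minimal sketch of passing a URL plus extra parameters, the example below builds a urllib.request.Request with form data and a custom header. The helper names build_request and fetch, the example.com URL, and the User-Agent string are assumptions made for illustration; they do not come from the original text.

```python
import urllib.parse
import urllib.request


def build_request(url, form=None, user_agent="Mozilla/5.0 (demo)"):
    """Build a Request object; supplying form data turns it into a POST."""
    data = None
    if form:
        # Form fields must be URL-encoded and then converted to bytes.
        data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"User-Agent": user_agent}
    )


def fetch(request, timeout=10):
    """Send the request and return (status code, decoded body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder URL (assumption); nothing is sent until fetch() is called.
    req = build_request("https://example.com/post", {"name": "germey"})
    print(req.get_method(), req.full_url)
    # status, body = fetch(req)  # uncomment to send; needs network access
```

Separating request construction from sending makes it possible to inspect the method, headers, and body before anything goes over the network.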

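error: the exception module (urllib.error). It defines the exceptions raised by request, such as URLError and its subclass HTTPError, so that failed requests can be caught and handled.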
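A minimal sketch of handling request failures with urllib.error follows. HTTPError is a subclass of URLError, so it is caught first. The helper name safe_fetch and the nonexistent file: URL used for the demonstration are assumptions chosen so the example runs without network access.

```python
import urllib.error
import urllib.request


def safe_fetch(url, timeout=10):
    """Open a URL and return a short description of the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"OK {response.status}"
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return f"HTTPError {e.code} {e.reason}"
    except urllib.error.URLError as e:
        # No usable response at all: DNS failure, refused connection,
        # missing local file, and so on.
        return f"URLError {e.reason}"


if __name__ == "__main__":
    # Hypothetical path that should not exist, so URLError is raised.
    print(safe_fetch("file:///no/such/dir/missing-file.txt"))
```

Catching the more specific HTTPError before URLError keeps the status code available when the server did respond.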

parse: a utility module (urllib.parse) that provides many URL-handling functions, such as splitting, parsing, and joining.
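The splitting, parsing, and joining operations can be sketched with a few urllib.parse calls. The example.com URLs and query values are placeholders chosen for illustration.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlparse, urlunparse

# Split a URL into its six components.
parts = urlparse("http://www.example.com/index.html;user?id=5#comment")
print(parts.scheme, parts.netloc, parts.path)
print(parts.params, parts.query, parts.fragment)

# Reassemble the components into the original URL.
rebuilt = urlunparse(parts)

# Resolve a relative link against a base URL.
joined = urljoin("http://www.example.com/a/b.html", "c.html")

# Convert between dictionaries and query strings.
query = urlencode({"id": 5, "name": "germey"})
decoded = parse_qs(query)

print(rebuilt)
print(joined)
print(query, decoded)
```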

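robotparser: mainly used to read a site's robots.txt file and decide which pages may be crawled and which may not (urllib.robotparser). In practice it is used relatively rarely.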
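A minimal sketch of checking crawl permissions with urllib.robotparser is shown below. To stay runnable offline it parses an inline robots.txt rather than downloading one; the rules, the example.com URLs, and the helper name build_parser are assumptions for illustration. Against a real site you would typically call set_url() and read() instead of parse().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def build_parser(robots_text):
    """Create a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("http://example.com/public/page.html",
                "http://example.com/private/data.html"):
        print(url, rp.can_fetch("*", url))
```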

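2. Handler classes:

When you need more advanced features, such as cookie handling, proxies, or authentication, use a Handler together with an opener built by urllib.request.build_opener.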
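As a rough sketch of how handlers are combined, the example below builds an opener that routes traffic through a proxy and supplies HTTP Basic credentials. The proxy address, target URL, credentials, and the helper name build_custom_opener are all hypothetical values for illustration; nothing is sent over the network unless opener.open() is called.

```python
import urllib.request


def build_custom_opener(proxy_url, protected_url, username, password):
    """Combine a proxy handler and a basic-auth handler into one opener."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )

    # realm=None means the credentials apply to any realm under this URL.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, protected_url, username, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    return urllib.request.build_opener(proxy_handler, auth_handler)


if __name__ == "__main__":
    opener = build_custom_opener(
        "http://127.0.0.1:8080",   # assumed local proxy
        "http://example.com/",     # assumed protected site
        "user", "secret",          # placeholder credentials
    )
    print([type(h).__name__ for h in opener.handlers])
    # opener.open("http://example.com/") would send the request via the proxy
```

build_opener also installs the default handlers, so the resulting opener still handles ordinary HTTP and HTTPS requests.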

    import http.cookiejar
    import urllib.request

    filename = 'cookies.txt'
    # Alternatives: an in-memory jar, or the Mozilla cookies.txt format
    # cookie = http.cookiejar.CookieJar()
    # cookie = http.cookiejar.MozillaCookieJar(filename)
    cookie = http.cookiejar.LWPCookieJar(filename)
    # Load previously saved cookies, including session and expired ones
    cookie.load(filename, ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('')  # the target URL is empty in the source
    # for item in cookie:
    #     print(item.name + "=" + item.value)

    # cookie.save(ignore_discard=True, ignore_expires=True)
    print(response.read().decode('utf-8'))
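Loading cookies from a file only works if the file was saved earlier in the same format. The sketch below illustrates that save-then-load round trip with LWPCookieJar, without touching the network. The cookie is constructed by hand purely for the demonstration; its name, value, and domain, along with the helpers make_cookie and round_trip, are assumptions. In a real crawl the jar would be filled by HTTPCookieProcessor from server responses.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a session cookie by hand (illustrative values only)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def round_trip(path):
    """Save one cookie to `path`, reload it, and return {name: value}."""
    jar = http.cookiejar.LWPCookieJar(path)
    jar.set_cookie(make_cookie("session", "abc123", "example.com"))
    # ignore_discard=True keeps session cookies, which are skipped otherwise.
    jar.save(ignore_discard=True, ignore_expires=True)

    loaded = http.cookiejar.LWPCookieJar()
    loaded.load(path, ignore_discard=True, ignore_expires=True)
    return {c.name: c.value for c in loaded}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(round_trip(os.path.join(tmp, "cookies.txt")))
```

The same ignore_discard and ignore_expires flags are passed to both save() and load() so that session cookies survive the round trip.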

When reposting, please credit the source: https://www.heiqu.com/zwxwxz.html
