Python Crawler 05: Dynamic Page Content Loaded via Ajax

1. Fetching the content of AJAX-loaded dynamic pages

1.1. Introduction

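If the site you are crawling loads its content via Ajax, capture the network traffic and go straight to the request that actually carries the data.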

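Some page content is loaded with AJAX. Keep in mind that an AJAX endpoint usually returns JSON, so sending a POST or GET request directly to the AJAX URL typically gets you the JSON data.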
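As a rough illustration of the difference between the two request types, the sketch below builds a GET and a POST request with the standard library without sending either one. The endpoint `http://example.com/api/list` and the `page` parameter are hypothetical placeholders, not taken from any real site.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only for illustration.
AJAX_URL = "http://example.com/api/list"


def build_get(params):
    """Build a GET request: parameters travel in the query string."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(AJAX_URL + "?" + query)


def build_post(params):
    """Build a POST request: parameters travel in the request body as bytes."""
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(AJAX_URL, data=body)


get_req = build_get({"page": 1})
post_req = build_post({"page": 1})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
# Passing either object to urllib.request.urlopen() would actually send it.
```

With `urllib.request.Request`, supplying `data` is what switches the method from GET to POST.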

Once you have the JSON, you have the page's data.
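To make this concrete, here is a minimal sketch of turning a JSON response body into Python objects with the standard `json` module. The payload and its field names (`total`, `items`, `name`) are made up for illustration; a real endpoint defines its own structure.

```python
import json

# A made-up response body; real endpoints define their own field names.
raw_body = b'{"total": 2, "items": [{"name": "A"}, {"name": "B"}]}'

# Decode the bytes, then parse the JSON text into a dict.
payload = json.loads(raw_body.decode("utf-8"))

# From here on the data is ordinary Python: dicts, lists, strings, numbers.
names = [item["name"] for item in payload["items"]]
print(payload["total"], names)
```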

Example:


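The listing spans many pages, and each page of data is loaded via Ajax. If you request the page's own URL directly with Python, you will most likely get none of the data.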

We can use a packet capture tool to inspect the request that the Ajax call sends:
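A capture tool shows the form body in URL-encoded form, which is hard to read. As an illustrative aside (not part of the original walkthrough), the sketch below decodes the form body of the request captured in this example back into a dictionary using the standard library's `urllib.parse.parse_qs`.

```python
from urllib.parse import parse_qs, urlencode

# Form body copied from the captured request.
captured_body = "cname=%E4%B8%8A%E6%B5%B7&pid=&pageIndex=1&pageSize=10"

# parse_qs returns a list of values per key; keep_blank_values=True
# preserves empty fields such as pid.
parsed = parse_qs(captured_body, keep_blank_values=True)
fields = {key: values[0] for key, values in parsed.items()}
print(fields)

# Encoding the dictionary again reproduces the captured body.
assert urlencode(fields) == captured_body
```

The decoded dictionary is the form data a script needs to send to imitate the browser's request.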


    POST ?op=cname HTTP/1.1
    Host:
    Connection: keep-alive
    Content-Length: 53
    Accept: application/json, text/javascript, */*; q=0.01
    Origin:
    X-Requested-With: XMLHttpRequest
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    Referer:
    Accept-Encoding: gzip, deflate
    Accept-Language: zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7,en-CA;q=0.6
    Cookie: ASP.NET_SessionId=qxmmf43wkyy5alkxyp15pvce; KLBRSID=76cac537517c99f2fa8f912b4403b8f8|1546668159|1546667575
    x-hd-token: rent-your-own-vps

    cname=%E4%B8%8A%E6%B5%B7&pid=&pageIndex=1&pageSize=10

1.2. Crawling the KFC query page, method 1:

    import urllib.parse
    import urllib.request

    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    }

    place = input("Which place: ")
    page = input("Please see which page: ")

    # Form fields, matching the body of the captured request
    formdata = {
        'cname': place,
        'pid': '',
        'pageIndex': page,
        'pageSize': '10'
    }
    # URL-encode the form and convert it to bytes; passing data makes this a POST
    data = urllib.parse.urlencode(formdata).encode('utf-8')

    request = urllib.request.Request(url, data=data, headers=headers)
    html = urllib.request.urlopen(request).read()  # raw response bytes
    print(html)

Result:

    Which place: 上海
    Please see which page: 1
    b'{"Table":[{"rowcount":410}],"Table1":[{"rownum":1,"storeName":"\xe8\x8c\x82\xe5\x90\x8d","addressDetail":"\xe5\x90\xb4\xe6\xb1\x9f\xe8\xb7\xaf269\xe5\x8f\xb72\xe5\xb1\x82","pro":"\xe5\xba\x97\xe5\x86\x85\xe5\x8f\x82\xe8\xa7\x82,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe6\x89\x8b\xe6\x9c\xba\xe7\x82\xb9\xe9\xa4\x90,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":2,"storeName":"\xe7\xbf\x94\xe5\xb7\x9d","addressDetail":"\xe5\xa6\x99\xe9\x95\x9c\xe8\xb7\xaf1118\xe5\x8f\xb7E\xe5\x8f\xb7\xe5\x95\x86\xe9\x93\xba","pro":"Wi-Fi,\xe5\xba\x97\xe5\x86\x85\xe5\x8f\x82\xe8\xa7\x82,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":3,"storeName":"\xe5\x8a\xa8\xe5\x8a\x9b\xe5\x8d\x97\xe5\xb9\xbf\xe5\x9c\xba\xef\xbc\x88\xe6\xb1\x87\xe9\x87\x91\xe5\xa5\xa5\xe7\x89\xb9\xe8\x8e\xb1\xe6\x96\xafB1\xe5\xb1\x82\xef\xbc\x89","addressDetail":"\xe7\x9f\xb3\xe9\xbe\x99\xe8\xb7\xaf750-3\xe5\x8f\xb7\xe4\xb8\x8a\xe6\xb5\xb7\xe5\x8d\x97\xe7\xab\x99\xe5\x9c\xb0\xe4\xb8\x8b\xe5\x95\x86\xe5\x9c\xba\xe5\x8d\x97\xe9\xa6\x86","pro":"\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":4,"storeName":"\xe6\xb1\x9f\xe8\x8b\x8f","addressDetail":"\xe6\xb1\x9f\xe8\x8b\x8f\xe8\xb7\xaf398\xe5\x8f\xb71\xe3\x80\x812\xe5\xb1\x82\xef\xbc\x88\xe8\x88\x9c\xe5\x85\x83\xe5\xbc\x98\xe5\x9f\xba\xe5\xa4\xa9\xe5\x9c\xb01\xe6\xa5\xbc\xef\xbc\x89","pro":"Wi-Fi,\xe5\xba\x97\xe5\x86\x85\xe5\x8f\x82\xe8\xa7\x82,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":5,"storeName":"\xe5\xa8\x81\xe5\xae\x81","addressDetail":"\xe5\xa4\xa9\xe5\xb1\xb1\xe8\xb7\xaf352\xe5\x8f\xb7101\xe5\x92\x8c201","pro":"Wi-Fi,\xe7\x82\xb9\xe5\x94\xb1\xe6\x9c\xba,\xe5\xba\x97\xe5\x86\x85\xe5\x8f\x82\xe8\xa7\x82,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe7\x94\x9f\xe6\x97\xa5\xe9\xa4\x90\xe4\xbc\x9a,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":6,"storeName":"\xe6\x80\x9d\xe8\xb4\xa4","addressDetail":"\xe6\x80\x9d\xe8\xb4\xa4\xe8\xb7\xaf778--780\xe5\x8f\xb7","pro":"Wi-Fi,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":7,"storeName":"\xe6\x83\xa0\xe4\xb9\x90","addressDetail":"\xe4\xba\xba\xe6\xb0\x91\xe8\xa5\xbf\xe8\xb7\xaf955\xe5\x8f\xb7","pro":"Wi-Fi,\xe5\xba\x97\xe5\x86\x85\xe5\x8f\x82\xe8\xa7\x82,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe7\x94\x9f\xe6\x97\xa5\xe9\xa4\x90\xe4\xbc\x9a,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":8,"storeName":"\xe6\x9f\xb3\xe5\xb7\x9e","addressDetail":"\xe6\xb2\xaa\xe9\x97\xb5\xe8\xb7\xaf9001\xe5\x8f\xb7\xe4\xb8\x8a\xe6\xb5\xb7\xe5\x8d\x97\xe7\xab\x99\xe7\xab\x99\xe5\x8e\x85\xe5\xb1\x82","pro":"\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":9,"storeName":"\xe7\x9c\x9f\xe5\x8c\x97","addressDetail":"\xe6\xa1\x83\xe6\xb5\xa6\xe8\xb7\xaf328\xe5\x8f\xb7","pro":"Wi-Fi,\xe5\xba\x97\xe5\x86\x85\xe5\x8f\x82\xe8\xa7\x82,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"},{"rownum":10,"storeName":"\xe9\xa9\xac\xe9\x99\x86\xe5\xbc\x98\xe5\x9f\xba","addressDetail":"\xe9\xa9\xac\xe9\x99\x86\xe9\x95\x87\xe6\xb2\xaa\xe5\xae\x9c\xe5\x85\xac\xe8\xb7\xaf2398/2400\xe5\x8f\xb7","pro":"Wi-Fi,\xe7\xa4\xbc\xe5\x93\x81\xe5\x8d\xa1,\xe7\x94\x9f\xe6\x97\xa5\xe9\xa4\x90\xe4\xbc\x9a,\xe6\xba\xaf\xe6\xba\x90","provinceName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82","cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"}]}'

1.3. Crawling the KFC query page, method 2:

    import requests

    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
    page = 1
    while True:
        data = {
            'cname': '上海',
            'pid': '',
            'pageIndex': page,
            'pageSize': '10'
        }
        response = requests.post(url, data=data)
        result = response.json()  # parse the JSON body once
        print(result)
        # An empty or missing Table1 means there are no more pages
        if result.get('Table1'):
            page += 1
        else:
            break
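The raw bytes printed by method 1 are hard to read as they stand. The following is a minimal sketch of decoding such a response and pulling out a few fields. It runs offline on a shortened sample in the same shape as the result above. The `summarize` helper is hypothetical, and reading `rowcount` as the total number of matching stores is an assumption based on the field name, not something the endpoint documents.

```python
import json
import math

# Shortened sample in the shape of the printed result (one store only).
raw = (b'{"Table":[{"rowcount":410}],'
       b'"Table1":[{"rownum":1,"storeName":"\xe8\x8c\x82\xe5\x90\x8d",'
       b'"cityName":"\xe4\xb8\x8a\xe6\xb5\xb7\xe5\xb8\x82"}]}')


def summarize(raw_bytes, page_size=10):
    """Decode a response and return (total_pages, store names on this page)."""
    payload = json.loads(raw_bytes.decode("utf-8"))
    # Assumption: rowcount is the total number of stores matching the query.
    row_count = payload["Table"][0]["rowcount"]
    total_pages = math.ceil(row_count / page_size)
    names = [store["storeName"] for store in payload.get("Table1", [])]
    return total_pages, names


total_pages, names = summarize(raw)
print(total_pages, names)
```

If the assumption about `rowcount` holds, the page count could replace the open-ended `while True` loop of method 2 with a bounded `for` loop.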
