Puppeteer in Practice: A Powerful Tool for Web Scraping (2)

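Puppeteer can crawl single-page applications (SPAs) and generate pre-rendered content (in effect, server-side rendering, or "SSR"). Put simply, anything the page displays in the browser is something we can retrieve. Below, we get to know it by scraping vehicle information from Guazi (瓜子二手车直卖网), a used-car direct-sales site.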
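To make "pre-rendered content" a little more concrete, here is a minimal sketch that is not part of the original walkthrough. It assumes a Puppeteer browser has already been launched (as in the full script later in this article). The function name snapshotHtml and the choice of the networkidle2 wait condition are assumptions made for illustration only.

```javascript
// Hypothetical helper: return the HTML of a page after its scripts have run.
// `browser` is expected to be a Puppeteer Browser (or any object exposing the
// same newPage / goto / content / close methods).
async function snapshotHtml(browser, url) {
  const page = await browser.newPage();
  try {
    // Assumption: "no more than 2 network connections for at least 500 ms"
    // is a good enough signal that the SPA has finished rendering.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // page.content() serializes the current DOM, including nodes added by JS.
    return await page.content();
  } finally {
    // Always release the tab, even if navigation fails.
    await page.close();
  }
}
```

The returned string could then be saved or served as a static snapshot. Whether networkidle2 is the right readiness signal depends on the site being crawled.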

First, let's try it with axios:

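    const axios = require('axios');

    // Fetch the raw HTML of the listing page with a plain HTTP request.
    const useAxios = () => {
      axios.get('https://www.guazi.com/hz/buy/')
        .then((result) => {
          console.log(result.data);
        })
        .catch((err) => {
          console.log(err);
        });
    };

    // The function has to be called for the request to be sent.
    useAxios();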

The response it returns is clearly not the content I was after.
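One way to confirm this kind of mismatch without eyeballing the output is to check whether the fetched HTML contains the markup you intend to scrape. The sketch below is only an illustration: missingMarkers is a hypothetical helper, the marker string is taken from the ul.carlist selector used later in this article, and both sample HTML strings are made up rather than real responses from the site.

```javascript
// Hypothetical helper: report which expected markers are absent from an HTML string.
function missingMarkers(html, markers) {
  return markers.filter((marker) => !html.includes(marker));
}

// Made-up sample responses, for illustration only.
const shellHtml =
  '<html><body><div id="app"></div><script src="/app.js"></script></body></html>';
const renderedHtml =
  '<html><body><ul class="carlist"><li><a><h2 class="t">Car</h2></a></li></ul></body></html>';

console.log(missingMarkers(shellHtml, ['carlist']));    // [ 'carlist' ]
console.log(missingMarkers(renderedHtml, ['carlist'])); // []
```

If the list of missing markers is non-empty for a plain HTTP response, that is a hint that the content is produced in the browser and a tool such as Puppeteer is needed.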


Scraping with Puppeteer

    const fs = require('fs');
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({
        // Path to a local Chromium binary; adjust it for your own machine.
        executablePath: '/Users/huqiyang/Documents/project/z/chromium/Chromium.app/Contents/MacOS/Chromium',
        headless: true
      });
      const page = await browser.newPage();

      // Open the page
      await page.goto('https://www.guazi.com/hz/buy/');

      // Get the page title
      const title = await page.title();
      console.log(title);

      // Get the car brands.
      // Note: `$` inside page.evaluate runs in the page, so this assumes
      // the target page itself loads jQuery.
      const BRANDS_INFO_SELECTOR = '.dd-all.clearfix.js-brand.js-option-hid-info';
      const brands = await page.evaluate((sel) => {
        const ulList = Array.from($(sel).find('ul li p a'));
        return ulList.map((v) => v.innerText.replace(/\s/g, ''));
      }, BRANDS_INFO_SELECTOR);
      console.log('Car brands: ', JSON.stringify(brands));

      // Write the brands to a file
      let writerStream = fs.createWriteStream('car_brands.json');
      writerStream.write(JSON.stringify(brands, undefined, 2), 'UTF8');
      writerStream.end();

      // Get the list of cars for sale
      const CAR_LIST_SELECTOR = 'ul.carlist';
      const carList = await page.evaluate((sel) => {
        const catBoxs = Array.from($(sel).find('li a'));
        return catBoxs.map((v) => {
          const title = $(v).find('h2.t').text();
          // The subtitle has the form "year|mileage"
          const subTitle = $(v).find('div.t-i').text().split('|');
          return { title: title, year: subTitle[0], milemeter: subTitle[1] };
        });
      }, CAR_LIST_SELECTOR);
      console.log(`${carList.length} cars in total: `, JSON.stringify(carList, undefined, 2));

      // Write the car information to a file
      writerStream = fs.createWriteStream('car_info_list.json');
      writerStream.write(JSON.stringify(carList, undefined, 2), 'UTF8');
      writerStream.end();

      // Wait for the browser to shut down before the script exits
      await browser.close();
    })();
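The script above depends on two things it does not check: that the target page exposes jQuery as $, and that the list is already in the DOM when page.evaluate runs. As a hedged alternative, the sketch below waits for the selector first and uses only standard DOM APIs inside the page. The names scrapeCarList and splitSubTitle and the 30-second timeout are assumptions made for illustration; the selectors are the ones from the script above.

```javascript
// Hypothetical helper: split a "year|mileage" subtitle into trimmed fields.
function splitSubTitle(text) {
  const [year = '', milemeter = ''] = text.split('|').map((s) => s.trim());
  return { year, milemeter };
}

// Hypothetical helper: `page` is a Puppeteer Page that has already navigated.
async function scrapeCarList(page, listSelector) {
  // Wait until the list exists instead of assuming it is already rendered.
  await page.waitForSelector(listSelector, { timeout: 30000 });

  // This callback runs in the browser context and uses no jQuery.
  const rows = await page.evaluate((sel) => {
    const links = Array.from(document.querySelectorAll(`${sel} li a`));
    return links.map((a) => {
      const title = a.querySelector('h2.t');
      const sub = a.querySelector('div.t-i');
      return {
        title: title ? title.textContent.trim() : '',
        subTitle: sub ? sub.textContent : '',
      };
    });
  }, listSelector);

  // Post-process in Node, where splitSubTitle is in scope.
  return rows.map(({ title, subTitle }) => ({ title, ...splitSubTitle(subTitle) }));
}
```

Inside the async function of the main script, this could be called as scrapeCarList(page, 'ul.carlist') after page.goto has resolved. Doing the string clean-up in Node rather than in the page keeps the in-page callback small, since functions defined in Node are not visible inside page.evaluate.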

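Run results

The script prints the page title, the car brands, and the car list to the console, and writes the brands and the car list to car_brands.json and car_info_list.json respectively.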

Copyright notice: unless otherwise stated, all articles on this site are original.

When reposting, please credit the source: http://www.heiqu.com/2b12b486c0cba99c6bd2719ca90329f9.html