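1. Define the crawler's goal
We need to be clear about what the crawler is for; crawling without a goal is just messing around! This time, my goal is simply to see whether the data can be scraped from the web page at all.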
2. How do we crawl it?
a. First, find a tag that uniquely identifies the content we want
<li class="game-live-item" gid="1">
  <a href="http://www.huya.com/baozha" class="video-info new-clickstat " target="_blank" report="{"eid":"click/position","position":"lol/0/1/1","game_id":"1","ayyuid":"17363578"}">
    <img class="pic" data-original="//screenshot.msstatic.com/yysnapshot/1801cfa4fc99aabc841eb9e25fa43f15a608b02d1055?imageview/4/0/w/338/h/190/blur/1" src="//screenshot.msstatic.com/yysnapshot/1801cfa4fc99aabc841eb9e25fa43f15a608b02d1055?imageview/4/0/w/338/h/190/blur/1/format/webp" onerror="this.onerror=null; this.src='//a.msstatic.com/huya/main/assets/img/default/338x190.jpg';" alt="炸姐ADC的直播" title="炸姐ADC的直播">
    <em class="tag tag-recommend">大神推荐</em>
    <div class="item-mask"></div>
    <i class="btn-link__hover_i"></i>
    <p class="tag-right"> <!-- 蓝光 --> <!-- 热舞 --> <!-- 存活人数 --> </p>
  </a>
  <a href="http://www.huya.com/baozha" class="title new-clickstat" report="{"eid":"click/position","position":"lol/0/1/1","game_id":"1","ayyuid":"17363578"}" title="S8定位赛开始了11-0 裁决已解决" target="_blank">S8定位赛开始了11-0 裁决已解决</a>
  <span class="txt">
    <span class="avatar fl">
      <img data-original="//huyaimg.msstatic.com/avatar/1095/83/2aa2f6905fe4382221d08b66d7cdcb_180_135.jpg" src="//huyaimg.msstatic.com/avatar/1095/83/2aa2f6905fe4382221d08b66d7cdcb_180_135.jpg" onerror="this.onerror=null; this.src='//a.msstatic.com/huya/main/assets/img/default/84x84.jpg';" alt="炸姐ADC" title="炸姐ADC">
      <i class="nick" title="炸姐ADC">炸姐ADC</i>
    </span>
    <span class="num"><i class="num-icon"></i><i class="js-num">67.0万</i></span>
  </span>
</li>
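
In the snippet above, class="game-live-item" on the <li> looks like the unique marker for one live room, and the inner classes "title", "nick" and "js-num" mark the fields worth keeping. Here is a minimal sketch of pulling those fields out with Python's standard-library html.parser. It is only an illustration, not necessarily the approach used later in this post: the names SAMPLE, LiveItemParser and parse_live_items are made up for this example, and SAMPLE is a trimmed copy of the snippet (the report and onerror attributes and the images are left out).

```python
from html.parser import HTMLParser

# Trimmed, hypothetical copy of the list item above (report/onerror/img omitted).
SAMPLE = '''
<li class="game-live-item" gid="1">
  <a href="http://www.huya.com/baozha" class="title new-clickstat" title="S8定位赛开始了11-0 裁决已解决" target="_blank">S8定位赛开始了11-0 裁决已解决</a>
  <span class="txt">
    <span class="avatar fl"><i class="nick" title="炸姐ADC">炸姐ADC</i></span>
    <span class="num"><i class="num-icon"></i><i class="js-num">67.0万</i></span>
  </span>
</li>
'''


class LiveItemParser(HTMLParser):
    """Collect one dict per <li class="game-live-item"> element."""

    # Class token on an inner tag -> name of the output field for its text.
    FIELDS = {"title": "title", "nick": "nick", "js-num": "viewers"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # dict being filled, or None outside an item
        self._field = None    # field currently capturing text, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "li" and "game-live-item" in classes:
            self._current = {"url": None}
            return
        if self._current is None:
            return
        for cls in classes:
            if cls in self.FIELDS:
                self._field = self.FIELDS[cls]
                if tag == "a":
                    # The title link also carries the room URL.
                    self._current["url"] = attr_map.get("href")
                break

    def handle_data(self, data):
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if self._field and tag in ("a", "i"):
            self._field = None
        if tag == "li" and self._current is not None:
            self.items.append(self._current)
            self._current = None


def parse_live_items(html):
    """Return a list of dicts with url, title, nick and viewers."""
    parser = LiveItemParser()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for item in parse_live_items(SAMPLE):
        print(item)
```

Keying on the class of the <li> means every other <li> on the page is ignored, which is the whole point of looking for a unique tag first. The real page would still have to be downloaded before being passed to parse_live_items, and that step is not shown here.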