自己做采集程序(2)

日期：2020-05-13 栏目：程序人生浏览：次

然后就可以分析采集网站的网页结构，写采集程序了。
下面给个例子：



<%
server.ScriptTimeout = 1000
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>BT采集器</title>
</head>
<body>
<form name="form1" method="post" action="get81bt.asp">
分类ID：
  <input type="text" name="cid" value="<%=request("cid")%>"><br>
开始ID：
  <input type="text" name="startid" value="<%=request("startid")%>">
  <br>
  结束ID：
  <input type="text" name="overid" value="<%=request("overid")%>">
  <br>
  分类名称：<input type="text" name="classname" value="<%=request("classname")%>">为空自动获取
  <br>
  <input name="action" type="hidden" id="action" value="getdata">
  <input type="submit" name="Submit" value="采集">
</form>
当前ID：<%=request("id")%> <br>
<%
dim action

action = Request("action")
if action = "getdata" then
cid = Request("cid")
startid = Request("startid")
overid = Request("overid")
id = Request("id")
if id = "" then id = startid

set objxhttp = new xhttp

objxhttp.URL = "http://www.81dd.com/Class/"&cid&"_"&id&".htm"
content = objxhttp.Html

if InStr(content,"网站维护中") then
call NextID
response.End()
end if

list = GetContent(content,"","",0)

Dim regEx, Match, Matches,patrn
Set regEx = New RegExp
patrn = "<a href=""../BtHtml/(.+?)"">"
regEx.Pattern = patrn
regEx.IgnoreCase = True
regEx.Global = True
Set Matches = regEx.Execute(list)
on error resume next
For Each Match in Matches

'response.write Match.Value & "<br>"
weburl = "http://www.81dd.com/BtHtml/" & regEx.Replace(Match.Value,"$1")
response.write weburl & "<br>"

转载注明出处：http://www.heiqu.com/3637.html

自己做采集程序(2)

相关推荐