Web page encoding problems when scraping articles with VBS or ASP (2)

Date: 2020-05-13  Category: 程序人生 (Programmer's Life)

        For i = 0 To Matches.Count -1
            If Matches(i).Value<>"" Then RetStr = RetStr & Matches(i).SubMatches(0) & "柳永法"
        Next
    Else
        For Each oMatch in Matches
            If oMatch.Value<>"" Then RetStr = RetStr & oMatch.Value & "柳永法"
        Next
    End If
    getContents = Split(RetStr, "柳永法")
End Function

Function getHTTPPage(url)
    On Error Resume Next
    Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
    xmlhttp.Open "Get", url, False
    xmlhttp.Send
    If xmlhttp.Status<>200 Then Exit Function
    GetBody = xmlhttp.ResponseBody
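    'Approach by 柳永法 (www.yongfa365.com): look for the charset in the returned text first, then in the response headers; if neither yields one, fall back to GB2312. This usually finds the encoding directly.
    'Check the returned text first: the Chinese characters in it may be garbled, but the charset declaration can still be read.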
    GetCodePage = getContents("charset=[""']*([^"",']+)", xmlhttp.ResponseText , True)(0)
    'Then check the Content-Type response header for the encoding
    If Len(GetCodePage)<3 Then GetCodePage = getContents("charset=[""']*([^"",']+)", xmlhttp.getResponseHeader("Content-Type") , True)(0)
    If Len(GetCodePage)<3 Then GetCodePage = "gb2312"
    Set xmlhttp = Nothing
    'Comment out the next line in production use (it is debug output only)
    WScript.Echo url & "-->" & GetCodePage
    getHTTPPage = BytesToBstr(GetBody, GetCodePage)
End Function
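
The lookup order used in getHTTPPage (page text first, then the Content-Type header, then a gb2312 default) can be tried on its own, without a network request. The following is a minimal sketch under stated assumptions: FindCharset and DetectCharset are hypothetical helper names that do not appear in the original article, and the pattern is the same one passed to getContents above.

```vbscript
' Hypothetical helper (not from the original article):
' returns the first charset name found in text, or "" if there is none.
Function FindCharset(text)
    Dim re, matches, result
    result = ""
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = False
    ' Same pattern as in getHTTPPage: optional quotes, then the charset name.
    re.Pattern = "charset=[""']*([^"",']+)"
    Set matches = re.Execute(text)
    If matches.Count > 0 Then result = Trim(matches(0).SubMatches(0))
    FindCharset = result
End Function

' Hypothetical helper: page text first, then the header, then the default.
Function DetectCharset(bodyText, contentType)
    Dim result
    result = FindCharset(bodyText)
    If Len(result) < 3 Then result = FindCharset(contentType)
    If Len(result) < 3 Then result = "gb2312"
    DetectCharset = result
End Function
```

For example, DetectCharset("", "text/html; charset=GBK") takes the name from the header, while DetectCharset("", "text/html") falls through to "gb2312". Note that this order lets a declaration inside the page override the HTTP header, which is the choice the listing above makes.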


Function BytesToBstr(Body, Cset)
    On Error Resume Next
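
The listing breaks off at this point on this page. For orientation only, here is a minimal sketch of how raw response bytes are commonly decoded into a string in VBScript with ADODB.Stream. It is not the author's BytesToBstr body; BytesToText is a hypothetical stand-in name, and the sketch assumes a Windows host where ADODB.Stream is available.

```vbscript
' Hypothetical stand-in (not the original BytesToBstr):
' decodes a byte array, such as xmlhttp.ResponseBody, using the given charset name.
Function BytesToText(bytes, charsetName)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1              ' 1 = adTypeBinary: accept raw bytes
    stream.Mode = 3              ' 3 = adModeReadWrite
    stream.Open
    stream.Write bytes
    stream.Position = 0          ' Type may only be changed at position 0
    stream.Type = 2              ' 2 = adTypeText: read back as text
    stream.Charset = charsetName ' e.g. "gb2312" or "utf-8"
    BytesToText = stream.ReadText
    stream.Close
    Set stream = Nothing
End Function
```

A call would look like BytesToText(xmlhttp.ResponseBody, "gb2312"), with the charset name coming from the detection step in getHTTPPage.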

Please credit the source when reposting: http://www.heiqu.com/2526.html
