Scraper Series: The Rocky Road of Scraping a .NET WebForm Site

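Today I was handed a job: compile the staff ID numbers of our personnel. For various reasons I couldn't connect to the database directly [sigh][sigh][sigh]. So I took a roundabout route and wrote a tool that logs in to the website automatically and scrapes the user information.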

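This isn't my first time scraping an ASP.NET site. From earlier jobs I already knew these sites are troublesome to scrape, and ASP.NET sites built on WebForms in particular are a real slog.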
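Much of that pain comes from the WebForms postback model: each form post is expected to send back the hidden fields the page rendered, such as `__VIEWSTATE` and `__EVENTVALIDATION`; without them the server may reject the request or lose the page state. The sketch below is an illustration, not the code used for this job. `HiddenFields.Extract` is a hypothetical name, and it assumes the hidden inputs are rendered with double-quoted `type`, `name` and `value` attributes, which is how WebForms normally emits them. It collects them into a dictionary that can be merged with the login fields before posting.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: collects hidden <input> fields (__VIEWSTATE,
// __VIEWSTATEGENERATOR, __EVENTVALIDATION, ...) from a WebForm page.
// Assumes double-quoted attributes, as WebForms usually renders them.
public static class HiddenFields
{
    // Matches each <input ...> tag in the page.
    private static readonly Regex InputTag =
        new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches the type, name and value attributes inside a single tag.
    private static readonly Regex Attr =
        new Regex(@"\s(type|name|value)\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in InputTag.Matches(html))
        {
            string type = null, name = null, value = "";
            foreach (Match a in Attr.Matches(tag.Value))
            {
                string attr = a.Groups[1].Value.ToLowerInvariant();
                string text = a.Groups[2].Value;
                if (attr == "type") type = text;
                else if (attr == "name") name = text;
                else value = WebUtility.HtmlDecode(text); // turn &amp; etc. back into characters
            }
            // Only named hidden inputs need to be echoed back on the next postback.
            if (name != null && string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
                fields[name] = value;
        }
        return fields;
    }
}
```

Regex extraction is brittle against unusual markup; an HTML parser such as HtmlAgilityPack would be more robust, at the cost of an extra dependency.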

I much prefer the RESTful style that is popular nowadays; scraping a site through its data API is far more pleasant.

Enough small talk, let's get to it.

The workload is small, so I wrote the HTTP code entirely by hand.

First, prepare GET and POST helpers for later use:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Text;

    // The members below live inside the author's helper class (declaration not shown).

    // Static cookie container shared by every request, so the session cookie
    // obtained at login is sent along with all later GETs and POSTs.
    private static readonly CookieContainer sznyCookie = new CookieContainer();

    public static string Get(string url, Action<string> SuccessCallback, Action<string> FailCallback)
    {
        // Note: the callbacks are accepted but not used by the GET helper.
        HttpWebRequest req = WebRequest.Create(url) as HttpWebRequest;
        req.Method = "GET";
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36";
        req.Accept = "*/*";
        req.KeepAlive = true;
        req.ServicePoint.ConnectionLimit = int.MaxValue;
        req.ServicePoint.Expect100Continue = false;
        req.CookieContainer = sznyCookie; // static field
        req.Credentials = CredentialCache.DefaultCredentials;

        string msg = "";
        using (HttpWebResponse rsp = req.GetResponse() as HttpWebResponse)
        using (StreamReader reader = new StreamReader(rsp.GetResponseStream()))
        {
            msg = reader.ReadToEnd();
        }
        return msg;
    }

    public static string Post(string url, Dictionary<string, string> dicParms, Action<string> SuccessCallback, Action<string> FailCallback)
    {
        // Build the x-www-form-urlencoded body; keys starting with "header" are skipped.
        StringBuilder data = new StringBuilder();
        foreach (var kv in dicParms)
        {
            if (kv.Key.StartsWith("header")) continue;
            // Common.UrlEncode is the author's own helper (not shown).
            data.Append($"&{Common.UrlEncode(kv.Key, Encoding.UTF8)}={Common.UrlEncode(kv.Value, Encoding.UTF8)}");
        }
        if (data.Length > 0) data.Remove(0, 1); // drop the leading '&'

        HttpWebRequest req = WebRequest.Create(url) as HttpWebRequest;
        req.Method = "POST";
        req.KeepAlive = true;
        req.CookieContainer = sznyCookie;
        req.ContentType = "application/x-www-form-urlencoded";
        req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
        req.Referer = url;
        // A ScriptManager1 field means an ASP.NET AJAX (UpdatePanel) partial postback,
        // so send the same headers the browser's MicrosoftAjax script sends.
        if (dicParms.ContainsKey("ScriptManager1"))
        {
            req.Headers.Add("X-MicrosoftAjax", "Delta=true");
            req.Headers.Add("X-Requested-With", "XMLHttpRequest");
            req.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";
            req.Accept = "*/*";
        }
        req.Headers.Add("Cache-Control", "no-cache");
        req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36";
        req.ServicePoint.ConnectionLimit = int.MaxValue;
        req.ServicePoint.Expect100Continue = false;
        req.AllowAutoRedirect = true;
        req.Credentials = CredentialCache.DefaultCredentials;

        byte[] buffer = Encoding.UTF8.GetBytes(data.ToString());
        using (Stream reqStream = req.GetRequestStream())
        {
            reqStream.Write(buffer, 0, buffer.Length);
        }

        string msg = "";
        using (HttpWebResponse rsp = req.GetResponse() as HttpWebResponse)
        using (StreamReader reader = new StreamReader(rsp.GetResponseStream()))
        {
            msg = reader.ReadToEnd();
            if (msg.Contains("images/dl.jpg") || msg.Contains("pageRedirect||%2flogin.aspx"))
            {
                // Login failed
                FailCallback?.Invoke(msg);
            }
            else
            {
                SuccessCallback?.Invoke(msg);
            }
        }
        return msg;
    }
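When `Post` adds `X-MicrosoftAjax: Delta=true`, an UpdatePanel page replies with a pipe-delimited "delta" instead of a full HTML page, which is why the failure check looks for `pageRedirect||%2flogin.aspx`. As far as I know, the delta is a sequence of `length|type|id|content|` records, where `length` is the character count of `content` (so `content` may itself contain `|`). Below is a minimal parser sketch under that assumption; `DeltaRecord` and `DeltaParser` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for one record of an ASP.NET AJAX delta response.
public sealed class DeltaRecord
{
    public DeltaRecord(string type, string id, string content)
    {
        Type = type; Id = id; Content = content;
    }
    public string Type { get; }
    public string Id { get; }
    public string Content { get; }
}

public static class DeltaParser
{
    // Parses "length|type|id|content|" records (format assumed, see above).
    public static List<DeltaRecord> Parse(string delta)
    {
        var records = new List<DeltaRecord>();
        int pos = 0;
        while (pos < delta.Length)
        {
            int length = int.Parse(ReadField(delta, ref pos));
            string type = ReadField(delta, ref pos);
            string id = ReadField(delta, ref pos);

            // Content is taken by length, not by searching for '|',
            // and must be followed by exactly one '|'.
            if (pos + length >= delta.Length || delta[pos + length] != '|')
                throw new FormatException("Delta content length does not match");
            string content = delta.Substring(pos, length);
            pos += length + 1; // skip the content and its trailing '|'

            records.Add(new DeltaRecord(type, id, content));
        }
        return records;
    }

    // Reads up to the next '|' and moves pos just past it.
    private static string ReadField(string s, ref int pos)
    {
        int bar = s.IndexOf('|', pos);
        if (bar < 0) throw new FormatException("Truncated delta record");
        string field = s.Substring(pos, bar - pos);
        pos = bar + 1;
        return field;
    }
}
```

Parsing by record instead of calling `Contains` also exposes the `hiddenField` records (for example `__VIEWSTATE`), which, as far as I know, carry the refreshed hidden-field values that the next partial postback would need to send.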

Copyright notice: unless otherwise stated, all articles on this site are original.

When reposting, please credit the source: https://www.heiqu.com/wpfxjp.html