Encoding issues in Python (encoding and decode, str and bytes) (2)

Date: 2020-06-10  Category: 程序人生 (Programmer Life)

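    s1 = '你好，Linux公社'
    # In 'w' (text) mode, pass encoding explicitly; otherwise the platform
    # default is used, which may fail with UnicodeEncodeError
    with open('linuxidc.txt', 'w', encoding='utf-8') as f1:
        f1.write(s1)

    s2 = s1.encode('utf-8')  # convert to bytes
    # bytes must be written in 'wb' mode, and the encoding argument must not be passed
    with open('linuxidc.com.txt', 'wb') as f2:
        f2.write(s2)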
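To make the rule concrete, the following is a minimal sketch of what happens when the mode and the data type do not match. The file names and the temporary directory are illustrative assumptions, not part of the original example.

```python
import os
import tempfile

text = '你好，Linux公社'
data = text.encode('utf-8')  # bytes

# Hypothetical scratch files in a temporary directory
tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, 'text_mode.txt')
bytes_path = os.path.join(tmp_dir, 'binary_mode.txt')
scratch_path = os.path.join(tmp_dir, 'scratch.txt')

# Text mode: write a str and let open() do the encoding
with open(text_path, 'w', encoding='utf-8') as f:
    f.write(text)

# Binary mode: write bytes that are already encoded
with open(bytes_path, 'wb') as f:
    f.write(data)

errors = []

# Mistake 1: writing bytes to a text-mode file
try:
    with open(scratch_path, 'w', encoding='utf-8') as f:
        f.write(data)
except TypeError as exc:
    errors.append(type(exc).__name__)

# Mistake 2: passing encoding together with binary mode
try:
    open(scratch_path, 'wb', encoding='utf-8')
except ValueError as exc:
    errors.append(type(exc).__name__)

# Both correct approaches leave the same bytes on disk
with open(text_path, 'rb') as f:
    text_mode_bytes = f.read()
with open(bytes_path, 'rb') as f:
    binary_mode_bytes = f.read()

print(errors)
print(text_mode_bytes == binary_mode_bytes)
```

The two correct approaches differ only in where the encoding step happens: inside `open()` for text mode, or in your own call to `encode()` for binary mode.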

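Some readers ask: when I open linuxidc.com.txt, the file that was written as bytes, in a text editor, it shows '你好，Linux公社' rather than b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8cLinux\xe5\x85\xac\xe7\xa4\xbe'. Why? The file contains only the raw bytes; the b'...' form is just how Python prints a bytes object. When a text editor opens linuxidc.com.txt, it decodes those bytes with a suitable encoding (UTF-8 here) and shows you the resulting characters.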
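The difference between the bytes on disk, their Python repr, and what an editor displays can be sketched as follows. The file name `demo.txt` and the temporary directory are assumptions for illustration; decoding with Latin-1 stands in for a viewer that guesses the wrong encoding.

```python
import os
import tempfile

text = '你好，Linux公社'
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file

with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))

# Read the file back exactly as stored
with open(path, 'rb') as f:
    raw = f.read()

as_repr = repr(raw)               # Python's printable description of the bytes
as_editor = raw.decode('utf-8')   # what a UTF-8 aware editor displays
misread = raw.decode('latin-1')   # a wrong guess: one character per byte

print(as_repr)
print(as_editor)
print(len(raw), len(as_editor), len(misread))
```

The same 20 bytes yield 10 characters when decoded as UTF-8 and 20 unrelated characters when decoded as Latin-1, which is why the choice of encoding made by the viewer matters.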

5 Web page encoding

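Web page encoding works much like file encoding. If the page downloaded with urlopen is read with read() and then decoded with decode('utf-8'), the result is a str and must be written to a file in 'w' mode. If you only call read() and do not decode, the result is still bytes and must be written in 'wb' mode.
Writing in 'w' mode: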

    # url_open is the author's own page-download helper (not shown here)
    response = url_open('https://www.linuxidc.com/Linux/2018-12/155956.htm', timeout=5)
    # Decode as UTF-8; the result is held in html as a Unicode str
    html = response.read().decode('UTF-8')
    print(type(html))  # output: <class 'str'>
    # Pass encoding when opening in 'w' mode, so that the str is
    # encoded as UTF-8 when it is written
    with open('linuxidc.html.txt', 'w', encoding='UTF-8') as f:
        f.write(html)

Writing in 'wb' mode:

    response = url_open('https://www.linuxidc.com/Linux/2018-12/155956.htm', timeout=5)
    html = response.read()  # no decoding here; the downloaded data stays as bytes
    print(type(html))  # output: <class 'bytes'>
    with open('linuxidc.html.txt', 'wb') as f:
        f.write(html)
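The two ways of saving a page can be compared side by side without the custom helper. This is only a sketch: `fetch`, `save_as_text`, and `save_as_bytes` are hypothetical names, `fetch` wraps the standard `urllib.request.urlopen` and is not called here, and a small hand-made byte string stands in for a downloaded page so the example runs offline.

```python
import os
import tempfile
import urllib.request


def fetch(url, timeout=5):
    """Return the raw response body as bytes (defined for reference, not called below)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_as_text(raw, path, encoding='utf-8'):
    """Decode the bytes to str, then write in text mode."""
    html = raw.decode(encoding)
    # newline='' keeps line endings exactly as they were downloaded
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.write(html)


def save_as_bytes(raw, path):
    """Write the downloaded bytes unchanged in binary mode."""
    with open(path, 'wb') as f:
        f.write(raw)


# Stand-in for fetch(url): a tiny UTF-8 encoded page
raw = '<html><body>你好，Linux公社</body></html>'.encode('utf-8')

tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, 'page_text.html')
bytes_path = os.path.join(tmp_dir, 'page_bytes.html')

save_as_text(raw, text_path)
save_as_bytes(raw, bytes_path)

with open(text_path, 'rb') as f:
    saved_via_text = f.read()
with open(bytes_path, 'rb') as f:
    saved_via_bytes = f.read()

print(saved_via_text == saved_via_bytes)
```

When the page really is UTF-8, both routes produce the same file. The binary route is the safer choice if you only want to store the page, since it never has to guess the encoding.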

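In Python 3, if you want to process a page downloaded with urlopen as text (for example, regular expression matching with str patterns, or extraction with lxml), you must first decode it into a str (Unicode).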
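A short sketch of why decoding comes first, using the standard `re` module. The sample page is a made-up byte string standing in for downloaded data.

```python
import re

# Stand-in for the bytes returned by read()
raw = '<html><head><title>Linux公社</title></head></html>'.encode('utf-8')

str_pattern = re.compile(r'<title>(.*?)</title>')

# A str pattern cannot be applied to bytes
try:
    str_pattern.search(raw)
    outcome = 'matched'
except TypeError:
    outcome = 'TypeError'

# After decoding, the same pattern works and returns text
html = raw.decode('utf-8')
title = str_pattern.search(html).group(1)

# A bytes pattern does match the raw data, but the result is bytes,
# so lengths and slices count bytes rather than characters
bytes_title = re.search(rb'<title>(.*?)</title>', raw).group(1)

print(outcome)
print(title, len(title))
print(len(bytes_title))
```

Matching on the raw bytes is possible with a bytes pattern, but the result still has to be decoded before it can be treated as characters.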

Please credit the source when reposting: https://www.heiqu.com/2e3fb4801c09dfbd1ec4ffe5f09ddebd.html
