[Big Data in Practice] Importing a Plain Text File into ElasticSearch

Date: 2021-06-09  Category: Programmer's Life

The example used here is a plain text file containing the Criminal Law (《刑法》文本.txt).

I. Formatting the Data

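1. ElasticSearch stores and indexes documents as JSON, so it cannot take a free-form text file directly. The first step is therefore to convert the text file into structured data, namely JSON.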
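As a minimal sketch of what "structured" means here, the snippet below wraps one line of plain text in a JSON object. The field name text_entry is the one used later in this article; the helper name to_json_doc and the sample sentence are hypothetical. Using json.dumps rather than string concatenation is a choice made for this sketch, so that quotes and backslashes in the text are escaped correctly.

```python
import json


def to_json_doc(line):
    """Wrap one line of plain text in a JSON object (hypothetical helper)."""
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
    return json.dumps({"text_entry": line}, ensure_ascii=False)


# Hypothetical sample line containing double quotes, which naive
# string concatenation would leave unescaped.
sample = 'He said "hello" to the court.'
doc = to_json_doc(sample)
print(doc)

# Round trip: parsing the JSON gives back the original text.
assert json.loads(doc)["text_entry"] == sample
```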

The figure below shows the unprocessed text file.

[Figure: screenshot of the unprocessed text file]

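2. Next, use Python file operations to convert the text into JSON that ElasticSearch can ingest. Each line of the source text becomes two lines of output: an action line naming the target index and document id, followed by the document itself.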

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# python 3.6
import json

__author__ = 'BH8ANK'

'''
Target output format, two lines per record:
{"index":{"_index":"xingfa","_id":1}}
{"text_entry":"犯罪的行为或者结果有一项发生在中华人民共和国领域内的，就认为是在中华人民共和国领域内犯罪。"}
'''

# Read the source file and split it into lines.
with open(r"D:\xingfa.txt", "r", encoding='utf-8') as src:
    TypeList = src.read().split('\n')
print(len(TypeList))

number = 1
# Open the output once, in write mode, so that reruns do not append duplicates.
with open(r"D:\out.json", "w", encoding='utf-8') as dst:
    for x in TypeList:
        if not x.strip():
            continue  # skip blank lines so that no empty documents are created
        # Action line: target index and document id.
        res_1 = json.dumps({"index": {"_index": "xingfa", "_id": number}}, separators=(',', ':')) + '\n'
        # Document line: json.dumps escapes quotes and backslashes in the text.
        res_2 = json.dumps({"text_entry": x}, ensure_ascii=False, separators=(',', ':')) + '\n'
        print(res_1)
        print(res_2)
        dst.write(res_1)
        dst.write(res_2)
        number += 1
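Before loading the file, it can help to check that it really has the alternating action/document layout described in the script. The following is a minimal sketch and not part of the original script: the function name check_bulk_lines is hypothetical, and it only verifies that every non-empty line parses as JSON and that index action lines alternate with text_entry document lines.

```python
import json


def check_bulk_lines(lines):
    """Return the number of action/document pairs, or raise ValueError.

    Hypothetical helper: expects non-empty lines alternating between
    {"index": {...}} and {"text_entry": "..."}.
    """
    rows = [ln for ln in lines if ln.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")
    for i in range(0, len(rows), 2):
        # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
        action = json.loads(rows[i])
        source = json.loads(rows[i + 1])
        if "index" not in action:
            raise ValueError(f"non-empty line {i + 1}: missing 'index' action")
        if "text_entry" not in source:
            raise ValueError(f"non-empty line {i + 2}: missing 'text_entry' field")
    return len(rows) // 2


# Two lines in the same layout as the generated file.
sample = [
    '{"index":{"_index":"xingfa","_id":1}}',
    '{"text_entry":"first line of text"}',
]
print(check_bulk_lines(sample))
```

To check the generated file itself, pass an open file object, since iterating over it yields one line at a time, for example: with open(r"D:\out.json", encoding="utf-8") as f: check_bulk_lines(f).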

Please credit the source when reposting: https://www.heiqu.com/wpsswy.html
