{
"tokens": [
{
"token": "联想",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "全球",
"start_offset": 3,
"end_offset": 5,
"type": "CN_WORD",
"position": 1
},
{
"token": "最大",
"start_offset": 5,
"end_offset": 7,
"type": "CN_WORD",
"position": 2
},
{
"token": "笔记本",
"start_offset": 8,
"end_offset": 11,
"type": "CN_WORD",
"position": 3
},
{
"token": "笔记",
"start_offset": 8,
"end_offset": 10,
"type": "CN_WORD",
"position": 4
},
{
"token": "笔",
"start_offset": 8,
"end_offset": 9,
"type": "CN_WORD",
"position": 5
},
{
"token": "记",
"start_offset": 9,
"end_offset": 10,
"type": "CN_CHAR",
"position": 6
},
{
"token": "本厂",
"start_offset": 10,
"end_offset": 12,
"type": "CN_WORD",
"position": 7
},
{
"token": "厂商",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 8
}
]
}
As the output shows, the two analyzers split the same text quite differently.
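To make that kind of comparison concrete, the small Python sketch below lists the tokens that appear in one `_analyze` response but not in the other. The function names are hypothetical. The first sample is trimmed from the ik output above. The second sample is made-up illustration data, not real output from either analyzer in this article.

```python
def tokens_of(response: dict) -> list[str]:
    """Return the token strings of an _analyze response, in order."""
    return [t["token"] for t in response.get("tokens", [])]


def compare_analyzers(resp_a: dict, resp_b: dict) -> tuple[list[str], list[str]]:
    """Return (tokens only in A, tokens only in B), deduplicated, first-seen order."""
    a, b = tokens_of(resp_a), tokens_of(resp_b)
    set_a, set_b = set(a), set(b)
    only_a = [t for t in dict.fromkeys(a) if t not in set_b]
    only_b = [t for t in dict.fromkeys(b) if t not in set_a]
    return only_a, only_b


# Trimmed from the ik output above.
ik_resp = {"tokens": [{"token": "笔记本"}, {"token": "笔记"},
                      {"token": "笔"}, {"token": "记"}]}
# Hypothetical output of some other analyzer, for illustration only.
other_resp = {"tokens": [{"token": "笔"}, {"token": "记"}, {"token": "本"}]}

only_ik, only_other = compare_analyzers(ik_resp, other_resp)
print(only_ik)     # ['笔记本', '笔记']
print(only_other)  # ['本']
```

Deduplicating with `dict.fromkeys` keeps the order in which tokens first appear, which makes the diff easier to read next to the raw JSON.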
To extend the dictionary, add the words you need to mydict.dic under config\ik\custom, then restart Elasticsearch. Note that the file must be encoded as UTF-8 without a BOM.
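Some editors save UTF-8 files with a byte-order mark, which is exactly what the note above warns against. As a hedged sketch, the Python helper below appends words to a custom dictionary file and always writes it back as UTF-8 without a BOM, stripping one if it is already there. `add_words`, `has_bom` and `demo` are hypothetical names. The real dictionary location follows this article's layout (config\ik\custom\mydict.dic) and may differ between IK plugin versions, and Elasticsearch still has to be restarted afterwards.

```python
import tempfile
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark; the dictionary must not start with it


def has_bom(dic_path: Path) -> bool:
    """True if the file starts with a UTF-8 BOM."""
    return dic_path.read_bytes().startswith(BOM)


def add_words(dic_path: Path, words: list[str]) -> None:
    """Append words (one per line) to a custom dictionary, saved as UTF-8 without BOM."""
    existing: list[str] = []
    if dic_path.exists():
        raw = dic_path.read_bytes()
        if raw.startswith(BOM):
            raw = raw[len(BOM):]  # drop a BOM left behind by an editor
        existing = raw.decode("utf-8").splitlines()
    # Strip whitespace, skip blank lines, and remove duplicates while keeping order.
    merged = list(dict.fromkeys(w.strip() for w in existing + words if w.strip()))
    # The plain "utf-8" codec never writes a BOM ("utf-8-sig" would).
    dic_path.write_bytes(("\n".join(merged) + "\n").encode("utf-8"))


def demo() -> tuple[bool, list[str]]:
    """Run add_words on a throwaway file rather than the real mydict.dic."""
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "mydict.dic"
        p.write_bytes(BOM + "联想\n".encode("utf-8"))  # simulate a BOM-tagged file
        add_words(p, ["赛克蓝德", "联想"])
        return has_bom(p), p.read_text(encoding="utf-8").splitlines()


print(demo())  # (False, ['联想', '赛克蓝德'])
```

For the real file, the call would look like `add_words(Path("config/ik/custom/mydict.dic"), ["赛克蓝德"])`, with the path adjusted to wherever your installation keeps the IK custom dictionaries.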
For example, after adding the word 赛克蓝德, run the query again:
Request: POST :9200/_analyze/
Body:
{
"analyzer": "ik",
"text": "赛克蓝德是一家数据安全公司"
}
Response:
{
"tokens": [
{
"token": "赛克蓝德",
"start_offset": 0,
"end_offset": 4,
"type": "CN_WORD",
"position": 0
},
{
"token": "克",
"start_offset": 1,
"end_offset": 2,
"type": "CN_WORD",
"position": 1
},
{
"token": "蓝",
"start_offset": 2,
"end_offset": 3,
"type": "CN_WORD",
"position": 2
},
{
"token": "德",
"start_offset": 3,
"end_offset": 4,
"type": "CN_CHAR",
"position": 3
},
{
"token": "一家",
"start_offset": 5,
"end_offset": 7,
"type": "CN_WORD",
"position": 4
},
{
"token": "一",
"start_offset": 5,
"end_offset": 6,
"type": "TYPE_CNUM",
"position": 5
},
{
"token": "家",
"start_offset": 6,
"end_offset": 7,
"type": "COUNT",
"position": 6
},
{
"token": "数据",
"start_offset": 7,
"end_offset": 9,
"type": "CN_WORD",
"position": 7
},
{
"token": "安全",
"start_offset": 9,
"end_offset": 11,
"type": "CN_WORD",
"position": 8
},
{
"token": "公司",
"start_offset": 11,
"end_offset": 13,
"type": "CN_WORD",
"position": 9
}
]
}
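The result shows that 赛克蓝德 is now emitted as a single token (alongside the finer-grained pieces 克, 蓝 and 德), so the custom word has been picked up.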
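To repeat this check from a script instead of a REST client, a minimal Python sketch using only the standard library could look like the following. The `http://localhost:9200` base URL is an assumption (the host is omitted in the request above), `analyze` and `is_single_token` are hypothetical helper names, and the `"ik"` analyzer name matches this article's setup; later releases of the IK plugin expose `ik_smart` and `ik_max_word` instead.

```python
import json
import urllib.request


def analyze(text: str, analyzer: str = "ik",
            base_url: str = "http://localhost:9200") -> dict:
    """POST {"analyzer", "text"} to the _analyze API and return the parsed JSON.

    base_url is an assumed default; point it at your own node.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_single_token(response: dict, word: str) -> bool:
    """True if `word` was emitted as one whole token."""
    return any(t.get("token") == word for t in response.get("tokens", []))


# Offline check against part of the response shown above.
sample = {"tokens": [
    {"token": "赛克蓝德", "start_offset": 0, "end_offset": 4},
    {"token": "克", "start_offset": 1, "end_offset": 2},
    {"token": "数据", "start_offset": 7, "end_offset": 9},
]}
print(is_single_token(sample, "赛克蓝德"))  # True
print(is_single_token(sample, "数据安全"))  # False

# Against a live node (requires Elasticsearch with the IK plugin running):
# resp = analyze("赛克蓝德是一家数据安全公司")
# print(is_single_token(resp, "赛克蓝德"))
```

Keeping the network call separate from the token check means the check can be exercised on saved responses, such as the JSON dumps in this article, without a running cluster.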