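character filter: preprocesses the raw text at the character level, for example by stripping HTML tags, before handing it to the tokenizer. An analyzer may contain zero or more character filters; when there are several, they are applied in the configured order.
tokenizer: splits the text into tokens. An analyzer must contain exactly one tokenizer.
token filter: post-processes the tokens emitted by the tokenizer, for example lowercasing, stop-word removal, and synonym handling. An analyzer may contain zero or more token filters, applied in the configured order.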
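To make the ordering of the three stages concrete, here is a minimal Python sketch of an analyzer as a plain function pipeline. It is a toy model, not how Elasticsearch implements analysis: the function names (`replace_ampersand`, `word_tokenizer`, `lowercase`, `remove_stopwords`, `analyze`) and the tiny stop-word set are assumptions made up for illustration.

```python
import re

# Character filter: str -> str, runs on the raw text before tokenization.
def replace_ampersand(text):
    return text.replace("&", " and ")

# Tokenizer: str -> list of tokens. An analyzer has exactly one.
def word_tokenizer(text):
    return re.findall(r"\w+", text)

# Token filters: list of tokens -> list of tokens, applied in order.
def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stopwords(tokens, stopwords=frozenset({"the", "a", "is"})):
    # Hypothetical stop-word list, only for this sketch.
    return [t for t in tokens if t not in stopwords]

def analyze(text, char_filters, tokenizer, token_filters):
    for char_filter in char_filters:      # zero or more, in configured order
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one tokenizer
    for token_filter in token_filters:    # zero or more, in configured order
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "The Cat & the HAT",
    char_filters=[replace_ampersand],
    tokenizer=word_tokenizer,
    token_filters=[lowercase, remove_stopwords],
)
print(tokens)  # ['cat', 'and', 'hat']
```

The order of token filters matters in this sketch: with `[remove_stopwords, lowercase]` the leading "The" is still capitalized when the stop-word check runs, so it survives and the result is `['the', 'cat', 'and', 'hat']`.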
Testing an analyzer
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [ "html_strip" ],
  "filter": [ "lowercase", "asciifolding" ],
  "text": "Is this déja vu?"
}

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "微知"
}

Built-in analyzers
Standard Analyzer
Simple Analyzer
Whitespace Analyzer
Stop Analyzer
Keyword Analyzer
Pattern Analyzer
Language Analyzers
Fingerprint Analyzer
Custom analyzers
Built-in character filters
HTML Strip Character Filter
html_strip: strips HTML tags and decodes HTML entities such as &amp;.
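As a rough illustration of what tag stripping plus entity decoding means, the following Python sketch approximates the idea with the standard library. It is not the html_strip implementation; the helper name `html_strip_sketch` is hypothetical, and the real filter handles details this version ignores (for example, how block-level tags are treated).

```python
import html
import re

def html_strip_sketch(text):
    # Drop anything that looks like an HTML tag.
    without_tags = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; into the characters they stand for.
    return html.unescape(without_tags)

result = html_strip_sketch("<p>Tom &amp; Jerry</p>")
print(result)  # Tom & Jerry
```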
Mapping Character Filter
mapping: replaces every occurrence of the configured strings in the text with their specified replacements.
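The behaviour can be pictured with a small Python sketch, assuming a plain dictionary of key to replacement pairs. The helper `mapping_sketch` and the emoticon mappings are illustrative assumptions, not the filter's actual code. Keys are tried longest first in a single pass, so that a short key cannot clobber part of a longer one.

```python
import re

def mapping_sketch(text, mappings):
    if not mappings:
        return text
    # Longest keys first: regex alternation picks the first alternative
    # that matches at a position, so this yields longest-match behaviour.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Single pass over the text; replacements are never re-scanned.
    return pattern.sub(lambda m: mappings[m.group(0)], text)

mapped = mapping_sketch("I feel :) today, not :(", {":)": "_happy_", ":(": "_sad_"})
print(mapped)  # I feel _happy_ today, not _sad_
```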
Pattern Replace Character Filter
pattern_replace: replaces text that matches a regular expression with a replacement string.
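A minimal Python sketch of the same idea, using `re.sub`. The helper name `pattern_replace_sketch` is an assumption for illustration. Note that Elasticsearch uses Java regular expressions, where a captured group is referenced as `$1` in the replacement; Python writes the same reference as `\1`.

```python
import re

def pattern_replace_sketch(text, pattern, replacement):
    # Replace every match of the regular expression in the raw text.
    return re.sub(pattern, replacement, text)

# Turn a dash that sits between digits into an underscore, so that a
# later tokenizer would not split the number on the dash.
replaced = pattern_replace_sketch(
    "My credit card is 123-456-789",
    r"(\d+)-(?=\d)",
    r"\1_",
)
print(replaced)  # My credit card is 123_456_789
```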
Built-in tokenizers
Standard Tokenizer
Letter Tokenizer
Lowercase Tokenizer
Whitespace Tokenizer
UAX URL Email Tokenizer
Classic Tokenizer
Thai Tokenizer
NGram Tokenizer
Edge NGram Tokenizer
Keyword Tokenizer
Pattern Tokenizer
Simple Pattern Tokenizer
Simple Pattern Split Tokenizer
Path Hierarchy Tokenizer
Example
PUT customer
{
  "mappings": {
    "_doc": {
      "properties": {
        "customerName": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_smart"
        },
        "companyId": {
          "type": "text"
        }
      }
    }
  }
}

POST /customer/_doc/_bulk
{ "index": { "_id": 1 }}
{ "companyId": "55", "customerName": "微知(上海)服务外包有限公司" }
{ "index": { "_id": 2 }}
{ "companyId": "55", "customerName": "上海微盟" }
{ "index": { "_id": 3 }}
{ "companyId": "55", "customerName": "上海知道广告有限公司" }
{ "index": { "_id": 4 }}
{ "companyId": "55", "customerName": "微鲸科技有限公司" }
{ "index": { "_id": 5 }}
{ "companyId": "55", "customerName": "北京微尘大业电子商务" }
{ "index": { "_id": 6 }}
{ "companyId": "55", "customerName": "福建微冲企业咨询有限公司" }
{ "index": { "_id": 7 }}
{ "companyId": "55", "customerName": "上海知盛企业管理咨询有限公司" }

GET /customer/_doc/_search
{
  "query": {
    "match": {
      "customerName": "知道"
    }
  }
}

GET /customer/_doc/_search
{
  "query": {
    "match": {
      "customerName": "微知"
    }
  }
}

More learning resources
https://www.cnblogs.com/leeSmall/category/1210814.html
https://blog.csdn.net/ricky110/article/category/7336900
Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Title    Link
elasticsearch series 1: elasticsearch (ES introduction, installation & configuration, integrating Ikanalyzer)    https://www.cnblogs.com/leeSmall/p/9189078.html
elasticsearch series 2: indexing in depth (quick start, index management, mappings, index aliases)    https://www.cnblogs.com/leeSmall/p/9193476.html
elasticsearch series 3: indexing in depth (analyzers, document management, routing (cluster))    https://www.cnblogs.com/leeSmall/p/9195782.html
elasticsearch series 4: search in depth (search API, Query DSL)    https://www.cnblogs.com/leeSmall/p/9206641.html
elasticsearch series 5: search in depth (introduction to query suggestions, introduction to Suggesters)    https://www.cnblogs.com/leeSmall/p/9206646.html
elasticsearch series 6: aggregations (introduction to aggregations, metric aggregations, bucket aggregations)    https://www.cnblogs.com/leeSmall/p/9215909.html
elasticsearch series 7: ES Java client - Elasticsearch Java client    https://www.cnblogs.com/leeSmall/p/9218779.html
elasticsearch series 8: ES cluster management (cluster planning, cluster setup, cluster management)    https://www.cnblogs.com/leeSmall/p/9220535.html