[Original] Getting Started with Elasticsearch (4)

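character filter: preprocesses the raw text at the character level, for example by stripping HTML tags, and then hands the result to the tokenizer. An analyzer may contain zero or more character filters; they are applied in the order in which they are configured.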

tokenizer: splits the text into tokens. An analyzer must contain exactly one tokenizer.

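token filter: post-processes the tokens produced by the tokenizer, for example lowercasing, stop-word removal, and synonym handling. An analyzer may contain zero or more token filters, applied in the order in which they are configured.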
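
The three stages above can be sketched in plain Python to make the order of operations concrete. This is only an illustration, not how Elasticsearch implements analysis: the helper names (strip_html, split_words, lowercase, remove_stopwords, analyze) and the stop-word list are hypothetical stand-ins chosen for this sketch.

```python
import html
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def strip_html(text: str) -> str:
    """Toy character filter: drop tags, then decode HTML entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def split_words(text: str) -> List[str]:
    """Toy tokenizer: every run of word characters becomes one token."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Toy token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Toy token filter: drop tokens found in an illustrative stop-word list."""
    stopwords = {"is", "this", "the", "a"}  # assumption: sample list only
    return [t for t in tokens if t not in stopwords]


def analyze(
    text: str,
    char_filters: List[CharFilter],
    tokenizer: Tokenizer,
    token_filters: List[TokenFilter],
) -> List[str]:
    """Run 0..n character filters, exactly one tokenizer, then 0..n token filters."""
    for char_filter in char_filters:      # applied in configured order
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one tokenizer
    for token_filter in token_filters:    # applied in configured order
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<p>Is this the Quick &amp; Brown FOX?</p>",
    [strip_html],
    split_words,
    [lowercase, remove_stopwords],
)
print(tokens)  # ['quick', 'brown', 'fox']
```

Swapping the order of the two token filters would change the result here, because the stop-word list is lowercase; that is why the configured order matters.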

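Testing an analyzer

The _analyze API runs a piece of text through either an ad hoc combination of character filters, tokenizer, and token filters, or a named analyzer, and returns the resulting tokens. The second request uses ik_smart, which is provided by the IK analysis plugin.

    POST _analyze
    {
      "tokenizer": "standard",
      "char_filter": [ "html_strip" ],
      "filter": [ "lowercase", "asciifolding" ],
      "text": "Is this déja vu?"
    }

    POST _analyze
    {
      "analyzer": "ik_smart",
      "text": "微知"
    }

Built-in analyzers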

Standard Analyzer

Simple Analyzer

Whitespace Analyzer

Stop Analyzer

Keyword Analyzer

Pattern Analyzer

Language Analyzers

Fingerprint Analyzer

Custom analyzers

Built-in character filters

HTML Strip Character Filter
  html_strip: strips HTML tags and decodes HTML entities such as &amp;.

Mapping Character Filter
  mapping: replaces occurrences of the specified strings in the text with the specified replacements.

Pattern Replace Character Filter
  pattern_replace: replaces text matching a regular expression with the specified replacement.
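
To show what the mapping and pattern_replace filters do to the text before it is tokenized, here is a minimal Python sketch. The function names are hypothetical, and the code only approximates the behaviour: Elasticsearch uses Java regular expressions (with $1 style group references), whereas this sketch uses Python's re module (with \1 style references).

```python
import re
from typing import Dict


def mapping_filter(text: str, mappings: Dict[str, str]) -> str:
    """Replace every occurrence of a key with its value.

    Keys are tried longest first, so the longest key matching at a
    given position wins.
    """
    if not mappings:
        return text
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mappings[m.group(0)], text)


def pattern_replace_filter(text: str, pattern: str, replacement: str) -> str:
    """Replace every regex match with the replacement (Python regex syntax)."""
    return re.sub(pattern, replacement, text)


emoticons = {":)": "_happy_", ":(": "_sad_"}
print(mapping_filter("I feel :) today, not :(", emoticons))
# I feel _happy_ today, not _sad_

# Turn dashes between digit groups into underscores so that a later
# tokenizer would not split the number at the dashes.
print(pattern_replace_filter("My card is 123-456-789", r"(\d+)-(?=\d)", r"\1_"))
# My card is 123_456_789
```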

Built-in tokenizers

Standard Tokenizer

Letter Tokenizer

Lowercase Tokenizer

Whitespace Tokenizer

UAX URL Email Tokenizer

Classic Tokenizer

Thai Tokenizer

NGram Tokenizer

Edge NGram Tokenizer

Keyword Tokenizer

Pattern Tokenizer

Simple Pattern Tokenizer

Simple Pattern Split Tokenizer

Path Hierarchy Tokenizer
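
The tokenizers in this list differ mainly in where they cut the text. The Python sketch below imitates three of them (whitespace, edge n-gram, and path hierarchy) in a simplified form. The function names are hypothetical; min_gram and max_gram mirror the Elasticsearch parameter names, but edge_ngrams here works on a single, already isolated token, and none of the other real options are modelled.

```python
from typing import List


def whitespace(text: str) -> List[str]:
    """Split on runs of whitespace; tokens are otherwise left untouched."""
    return text.split()


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 2) -> List[str]:
    """Emit the prefixes of one token with lengths min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def path_hierarchy(text: str, delimiter: str = "/") -> List[str]:
    """Emit one token per level of a path, each containing its ancestors."""
    if not text:
        return []
    tokens = []
    # Skip a leading delimiter so that "/one" is the first token, not "".
    start = len(delimiter) if text.startswith(delimiter) else 0
    pos = text.find(delimiter, start)
    while pos != -1:
        tokens.append(text[:pos])
        pos = text.find(delimiter, pos + 1)
    tokens.append(text)
    return tokens


print(whitespace("The 2 QUICK  foxes"))    # ['The', '2', 'QUICK', 'foxes']
print(edge_ngrams("quick", 2, 4))          # ['qu', 'qui', 'quic']
print(path_hierarchy("/one/two/three"))    # ['/one', '/one/two', '/one/two/three']
```

Edge n-grams are a common choice for search-as-you-type fields, and the path hierarchy output lets a query for a parent directory match everything beneath it.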

Example

The requests below create a customer index whose customerName field is analyzed with ik_smart at both index time and search time, bulk-index seven documents, and then run two match queries against customerName. They are written for Elasticsearch 6.x, where the mapping and the URL paths still name the _doc type.

    PUT customer
    {
      "mappings": {
        "_doc": {
          "properties": {
            "customerName": {
              "type": "text",
              "analyzer": "ik_smart",
              "search_analyzer": "ik_smart"
            },
            "companyId": {
              "type": "text"
            }
          }
        }
      }
    }

    POST /customer/_doc/_bulk
    { "index": { "_id": 1 }}
    { "companyId": "55", "customerName": "微知(上海)服务外包有限公司" }
    { "index": { "_id": 2 }}
    { "companyId": "55", "customerName": "上海微盟" }
    { "index": { "_id": 3 }}
    { "companyId": "55", "customerName": "上海知道广告有限公司" }
    { "index": { "_id": 4 }}
    { "companyId": "55", "customerName": "微鲸科技有限公司" }
    { "index": { "_id": 5 }}
    { "companyId": "55", "customerName": "北京微尘大业电子商务" }
    { "index": { "_id": 6 }}
    { "companyId": "55", "customerName": "福建微冲企业咨询有限公司" }
    { "index": { "_id": 7 }}
    { "companyId": "55", "customerName": "上海知盛企业管理咨询有限公司" }

    GET /customer/_doc/_search
    {
      "query": {
        "match": { "customerName": "知道" }
      }
    }

    GET /customer/_doc/_search
    {
      "query": {
        "match": { "customerName": "微知" }
      }
    }

More learning resources

https://www.cnblogs.com/leeSmall/category/1210814.html

https://blog.csdn.net/ricky110/article/category/7336900

Official reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

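Title   Link
Elasticsearch series 1: Elasticsearch (introduction to ES, installation & configuration, integrating Ikanalyzer)   https://www.cnblogs.com/leeSmall/p/9189078.html
Elasticsearch series 2: Indexes in depth (quick start, index management, mappings in depth, index aliases)   https://www.cnblogs.com/leeSmall/p/9193476.html
Elasticsearch series 3: Indexes in depth (analyzers, document management, routing in depth (cluster))   https://www.cnblogs.com/leeSmall/p/9195782.html
Elasticsearch series 4: Search in depth (search API, Query DSL)   https://www.cnblogs.com/leeSmall/p/9206641.html
Elasticsearch series 5: Search in depth (introduction to query suggestions, introduction to Suggesters)   https://www.cnblogs.com/leeSmall/p/9206646.html
Elasticsearch series 6: Aggregations (introduction to aggregations, metric aggregations, bucket aggregations)   https://www.cnblogs.com/leeSmall/p/9215909.html
Elasticsearch series 7: ES Java client - Elasticsearch Java client   https://www.cnblogs.com/leeSmall/p/9218779.html
Elasticsearch series 8: ES cluster management (cluster planning, cluster setup, cluster management)   https://www.cnblogs.com/leeSmall/p/9220535.html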

Copyright notice: unless otherwise noted, all articles on this site are original.

When reposting, please credit the source: https://www.heiqu.com/wpgsfz.html