Elasticsearch Query DSL Summary (Part 4): Multi Match Query

Date: 2021-05-10

Resolve to perform what you ought; perform without fail what you resolve.

-- Benjamin Franklin

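Introduction

Lately I have been using mind maps a lot to study and summarize what I learn. If you are not familiar with mind maps but are curious about them, see this article. Back to the topic: this post covers Multi Match. Before the main text, here is the article's mind map as a warm-up: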

Multi Match

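Overview

The multi_match query builds on the match query. Its key feature is that it lets you run the same query against multiple fields.

Let's build an example first. The index multimatchtest has a type multimatch_test with two fields, subject and message. Using the fields parameter to search both fields for multimatch returns two matching documents. Note that the requests use a custom mapping type, which is Elasticsearch 6.x-era syntax; mapping types were removed in later versions.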

PUT multimatchtest
{
}

PUT multimatchtest/_mapping/multimatch_test
{
  "properties": {
    "subject": { "type": "text" },
    "message": { "type": "text" }
  }
}

PUT multimatchtest/multimatch_test/1
{
  "subject": "this is a multimatch test",
  "message": "blala blalba"
}

PUT multimatchtest/multimatch_test/2
{
  "subject": "blala blalba",
  "message": "this is a multimatch test"
}

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "multimatch",
      "fields": ["subject", "message"]
    }
  }
}
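To make the "match on any listed field" behaviour concrete, here is a minimal Python sketch that imitates it in memory. It is only a toy model, not how Elasticsearch works internally: the lowercase whitespace tokenizer and the helper name multi_match_ids are assumptions made for illustration.

```python
# Toy illustration of multi_match semantics: a document is a hit when a
# query term appears in at least one of the listed fields.
# Assumption: a lowercase whitespace tokenizer stands in for the real analyzer.

docs = {
    "1": {"subject": "this is a multimatch test", "message": "blala blalba"},
    "2": {"subject": "blala blalba", "message": "this is a multimatch test"},
}


def tokenize(text):
    """Split on whitespace and lowercase (a stand-in for the text analyzer)."""
    return text.lower().split()


def multi_match_ids(docs, query, fields):
    """Return the ids of documents where any query term occurs in any field."""
    terms = tokenize(query)
    hits = []
    for doc_id, source in docs.items():
        if any(t in tokenize(source.get(f, "")) for f in fields for t in terms):
            hits.append(doc_id)
    return hits


matched = multi_match_ids(docs, "multimatch", ["subject", "message"])
print(matched)  # ['1', '2']: each document matches through a different field
```

Restricting fields to ["subject"] in this sketch leaves only document 1, which mirrors why listing both fields is what produces two hits.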

The following explains how to use the fields parameter.

Wildcards in fields

The values in fields support the * wildcard. Specifying mess* still finds matches in the message field.
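The expansion of a wildcard pattern against the mapped field names can be pictured with a short Python sketch. It uses the standard-library fnmatch module as an approximation (fnmatch also gives special meaning to ? and [], which is not claimed for Elasticsearch here), and expand_fields is a hypothetical helper, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase

# Field names taken from the mapping used in this article.
mapping_fields = ["subject", "message"]


def expand_fields(patterns, available):
    """Expand wildcard patterns such as 'mess*' against known field names."""
    expanded = []
    for pattern in patterns:
        for name in available:
            # Keep each matching field once, in the order it is first seen.
            if fnmatchcase(name, pattern) and name not in expanded:
                expanded.append(name)
    return expanded


resolved = expand_fields(["subject", "mess*"], mapping_fields)
print(resolved)  # ['subject', 'message']
```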

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "multimatch",
      "fields": ["subject", "mess*"]
    }
  }
}

Boosting field weights

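Appending ^ and a number to a field name raises that field's weight, which increases its contribution to _score. For example, suppose we want to give the subject field more weight.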

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "multimatch",
      "fields": ["subject^3", "mess*"]
    }
  }
}

Documents 1 and 2 contain the same number of multimatch terms, yet the search result shows that the document with multimatch in subject scores 3 times as high as the other one.

"hits": {
  "total": 2,
  "max_score": 0.8630463,
  "hits": [
    {
      "_index": "multimatchtest",
      "_type": "multimatch_test",
      "_id": "1",
      "_score": 0.8630463,
      "_source": {
        "subject": "this is a multimatch test",
        "message": "blala blalba"
      }
    },
    {
      "_index": "multimatchtest",
      "_type": "multimatch_test",
      "_id": "2",
      "_score": 0.2876821,
      "_source": {
        "subject": "blala blalba",
        "message": "this is a multimatch test"
      }
    }
  ]
}
}
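The factor of 3 can be checked directly from the two _score values in the response. The Python sketch below does that arithmetic; parse_boosted_field is a hypothetical helper that merely splits the field^boost notation, and treating the boost as a plain multiplier is a simplification that holds here because both documents would otherwise score the same.

```python
import math


def parse_boosted_field(spec):
    """Split 'subject^3' into ('subject', 3.0); no caret means boost 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


# Scores copied from the search response above.
score_doc1 = 0.8630463  # multimatch found in the boosted subject field
score_doc2 = 0.2876821  # multimatch found in the unboosted message field

field, boost = parse_boosted_field("subject^3")
ratio = score_doc1 / score_doc2

# The ratio of the two scores equals the boost (up to float rounding).
print(field, boost, round(ratio, 4))  # subject 3.0 3.0
print(math.isclose(ratio, boost, rel_tol=1e-6))  # True
```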

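If a multi_match query omits the fields parameter, every field in the document is matched by default. This is not recommended: it can cause performance problems and is rarely meaningful.

Types of multi_match query

How a multi_match query is executed internally depends mainly on its type parameter. Its possible values include: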

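best_fields is the default type. It returns any document that matches the query, but uses only the _score of the best-matching field as the document's score.

most_fields returns any document that matches the query and combines the scores of all matching fields.

phrase runs a match_phrase query on each field in fields and uses the _score of the best field.

phrase_prefix runs a match_phrase_prefix query on each field in fields and, like phrase, uses the _score of the best field.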
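The difference between "use the best field" and "combine all fields" is easy to see with numbers. The following Python sketch is a simplified model under stated assumptions: the per-field scores are made up, and the combination for most_fields is modelled as a plain sum, ignoring any normalization a particular Elasticsearch version may apply.

```python
def best_fields_score(field_scores):
    """best_fields model: keep only the highest per-field score."""
    return max(field_scores.values(), default=0.0)


def most_fields_score(field_scores):
    """most_fields model: add up the scores of all matching fields."""
    return sum(field_scores.values())


# Hypothetical per-field scores for one document (not real Elasticsearch output).
field_scores = {"subject": 0.5, "message": 0.25}

print(best_fields_score(field_scores))  # 0.5
print(most_fields_score(field_scores))  # 0.75
```

In this model a document that matches weakly in many fields can outrank a single strong match under most_fields, but not under best_fields.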

Below we go through what each of these types means and how to use it.

The best_fields type

To understand the best_fields type, you first need to understand dis_max.

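dis_max: the Disjunction Max Query

dis_max is short for Disjunction Max Query.

Disjunction means "or": the queries on the individual fields of a document are evaluated separately, and each produces its own score.

A Disjunction Max Query returns every document that matches any of its queries, but uses only the score of the best-matching query as the document's score.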
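The scoring rule just described can be sketched in a few lines of Python. The function name dis_max_score and the sample scores are assumptions for illustration. The tie_breaker argument mirrors the dis_max parameter of the same name, which defaults to 0.0 so that only the best score counts, as described above.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best sub-query score plus tie_breaker times the sum of the others."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return best + tie_breaker * others


# Hypothetical scores of three sub-queries for one document.
scores = [1.0, 0.5, 0.25]

print(dis_max_score(scores))                   # 1.0: only the best score counts
print(dis_max_score(scores, tie_breaker=0.5))  # 1.375: 1.0 + 0.5 * (0.5 + 0.25)
```

With tie_breaker set to 1.0 the result becomes the plain sum of all sub-query scores, so the parameter slides between "best only" and "add everything".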

Let's look at an example. We overwrite the contents of the two documents above:

PUT multimatchtest/multimatch_test/1
{
  "subject": "food is delicious!",
  "message": "cook food"
}

PUT multimatchtest/multimatch_test/2
{
  "subject": "blabla blala",
  "message": "I like chinese food"
}

Now we search both the subject and message fields for chinese food and see what comes back. (For now we use dis_max with match queries rather than multi_match.)

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "chinese food" } },
        { "match": { "message": "chinese food" } }
      ]
    }
  }
}

The result is:

"hits": {
  "total": 2,
  "max_score": 0.5753642,
  "hits": [
    {
      "_index": "multimatchtest",
      "_type": "multimatch_test",
      "_id": "2",
      "_score": 0.5753642,
      "_source": {
        "subject": "blabla blala",
        "message": "I like chinese food"
      }
    },
    {
      "_index": "multimatchtest",
      "_type": "multimatch_test",
      "_id": "1",
      "_score": 0.2876821,
      "_source": {
        "subject": "food is delicious!",
        "message": "cook food"
      }
    }
  ]
}
}
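This dis_max request is relevant to multi_match because, according to the Elasticsearch documentation, the best_fields type generates one match query per field and wraps them in a dis_max query. The Python sketch below builds that equivalent request body from a query string and a field list. The helper name multi_match_to_dis_max is hypothetical, and the sketch deliberately ignores field boosts and other options.

```python
import json


def multi_match_to_dis_max(query, fields, tie_breaker=None):
    """Build the dis_max body that a best_fields multi_match corresponds to."""
    dis_max = {"queries": [{"match": {field: query}} for field in fields]}
    # Only emit tie_breaker when the caller sets it explicitly.
    if tie_breaker is not None:
        dis_max["tie_breaker"] = tie_breaker
    return {"query": {"dis_max": dis_max}}


body = multi_match_to_dis_max("chinese food", ["subject", "message"])
print(json.dumps(body, indent=2))
```

The printed body has the same structure as the dis_max request shown above, with one match clause for subject and one for message.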

When reprinting, please credit the source: https://www.heiqu.com/wsppyx.html
