Elasticsearch Query DSL Summary (Part 4): Multi Match Query (2)

Date: 2021-05-10  Category: Programmer's Life

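Both the subject and message fields of document 1 contain "food" and therefore match. However, the dis_max query scores each field on its own and keeps only the highest of those scores instead of adding them up. Document 2 matches only in its message field, but that single score is higher than either of document 1's field scores. Document 2 therefore outranks document 1, and this is exactly how the best_fields type computes its score.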
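To make the "keep only the highest score" rule concrete, here is a minimal Python sketch of how dis_max (and therefore best_fields) combines per-field scores. It assumes the per-field scores are already known; the numbers are the ones reported later in this article for the "chinese food" query (0.2876821 for each field of document 1, 0.5753642 for the message field of document 2). The function name `dis_max_score` is hypothetical, and the sketch does not model how Lucene computes each per-field score.

```python
def dis_max_score(field_scores: dict[str, float]) -> float:
    """Combine per-field scores the way dis_max does without a tie_breaker:
    only the best-matching field counts, the others are ignored."""
    return max(field_scores.values(), default=0.0)


# Per-field scores for the query "chinese food" (values taken from this article).
doc1 = {"subject": 0.2876821, "message": 0.2876821}  # "food" matches in both fields
doc2 = {"message": 0.5753642}                       # "chinese" and "food" match in message

print(dis_max_score(doc1))  # 0.2876821
print(dis_max_score(doc2))  # 0.5753642
```

Even though document 1 matches in two fields, only one of its scores survives, so document 2 ranks first.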

best_fields

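best_fields is most useful when you are searching for several words that are best found together in the same field, and it is equivalent to the dis_max approach. For example, the dis_max query from the previous subsection can be rewritten in the form below. best_fields is also the default type of a multi_match query, so the type parameter can be left out.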

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "chinese food",
      "fields": ["subject", "message"]
    }
  }
}

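With this approach only the best-matching clause counts; the other clauses contribute nothing to the score. That is arguably too strict. Is there a compromise, where the other clauses still take part in scoring but at a discount, so that their contribution stays small? There is: the tie_breaker parameter.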

tie_breaker, the champion of the other clauses

The tie_breaker parameter seems to exist precisely to stand up for the other clauses. Here is how it scores:

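1. Take the _score of the best-matching clause, as the best_fields type does.

2. Multiply the score of each of the other matching clauses by tie_breaker.

3. Add these scores together and normalize the result.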
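The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Elasticsearch's code: `tie_breaker_score` is a hypothetical name, it takes the per-field scores as given rather than computing them, and it skips the normalization in step 3. That is enough to reproduce the arithmetic for document 1 in this section.

```python
def tie_breaker_score(field_scores: list[float], tie_breaker: float = 0.0) -> float:
    """Combine per-field scores the way dis_max does when tie_breaker is set.

    Step 1: keep the best score.
    Step 2: multiply every other matching score by tie_breaker.
    Step 3: add them up (the normalization step is not modelled here).
    """
    if not field_scores:
        return 0.0  # nothing matched
    ordered = sorted(field_scores, reverse=True)
    best, others = ordered[0], ordered[1:]
    return best + tie_breaker * sum(others)


# Document 1: "food" matched subject and message with the same score.
doc1 = [0.2876821, 0.2876821]
# Document 2: only message matched.
doc2 = [0.5753642]

print(round(tie_breaker_score(doc1, 0.3), 7))  # 0.3739867
print(tie_breaker_score(doc2, 0.3))            # 0.5753642
print(tie_breaker_score(doc1, 0.0))            # 0.2876821, plain best_fields
```

With tie_breaker at 0.0 this collapses to plain dis_max, and at 1.0 it simply adds all field scores. The last-digit difference from the 0.37398672 that Elasticsearch returns most likely comes from Lucene computing scores as 32-bit floats, while Python uses 64-bit floats.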

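tie_breaker makes things better. All matching clauses are now taken into account, but tie_breaker does not let them take over: the best-matching clause still dominates, while the other clauses gain some say in the score.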

Let's add a tie_breaker parameter to the query from the previous section and look at the result again.

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "chinese food",
      "fields": ["subject", "message"],
      "tie_breaker": 0.3
    }
  }
}

The result is:

"hits": { "total": 2, "max_score": 0.5753642, "hits": [ { "_index": "multimatchtest", "_type": "multimatch_test", "_id": "2", "_score": 0.5753642, "_source": { "subject": "blabla blala", "message": "I like chinese food" } }, { "_index": "multimatchtest", "_type": "multimatch_test", "_id": "1", "_score": 0.37398672, "_source": { "subject": "food is delicious!", "message": "cook food" } } ] }

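Compare this with document 1's score from the previous section. Document 1's subject and message fields each contain the word "food" only once, and their scores are identical (0.2876821). With tie_breaker set to 0.3, the total is 0.2876821 + 0.3 × 0.2876821 = 0.2876821 × 1.3 ≈ 0.37398672, which matches the result exactly. Document 2 matches only in its message field, so tie_breaker has nothing to add to its score of 0.5753642.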

As mentioned at the start, the multi_match query is built on top of the match query, so multi_match accepts the match query's parameters as well. See my earlier article on the match query for details.

most_fields

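most_fields is mainly used when several fields contain the same text (for example, the same content analyzed in different ways). It adds up the scores of all matching fields.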
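For contrast with best_fields, here is a minimal sketch of the most_fields combination described above: the per-field scores are added together instead of keeping only the maximum. The function name `most_fields_score` and the sample scores are hypothetical; real per-field scores come from Lucene's similarity model.

```python
def most_fields_score(field_scores: dict[str, float]) -> float:
    """Add up the scores of every field that matched (most_fields style)."""
    return sum(field_scores.values())


# Hypothetical per-field scores for a document whose subject and message
# fields both contain the query text.
scores = {"subject": 0.5, "message": 0.25}
print(most_fields_score(scores))  # 0.75, every matching field contributes
print(max(scores.values()))       # 0.5, what best_fields would keep instead
```

Because every matching field adds to the total, documents that match the same text in more fields rise to the top.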

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "multimatch",
      "fields": ["subject", "message"],
      "type": "most_fields"
    }
  }
}

phrase and phrase_prefix

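The phrase and phrase_prefix types behave like the best_fields type; they differ only in which query runs against each field:

phrase is implemented with match_phrase combined with dis_max

phrase_prefix is implemented with match_phrase_prefix combined with dis_max

best_fields is implemented with match combined with dis_max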
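The mapping in the list above can be expressed as a small Python helper that builds the equivalent dis_max request body for a given multi_match type. This is a hypothetical sketch for illustration only: `to_dis_max` and `CLAUSE_FOR_TYPE` are made-up names, and the helper ignores tie_breaker, boosts and every other multi_match option. It only produces the JSON body; sending it to `GET multimatchtest/multimatch_test/_search` is left to whatever client you use.

```python
import json

# Which per-field query each multi_match type wraps in dis_max.
CLAUSE_FOR_TYPE = {
    "best_fields": "match",
    "phrase": "match_phrase",
    "phrase_prefix": "match_phrase_prefix",
}


def to_dis_max(query: str, fields: list[str], mm_type: str = "best_fields") -> dict:
    """Build the dis_max body equivalent to a multi_match of the given type
    (hypothetical helper; other multi_match options are ignored)."""
    clause = CLAUSE_FOR_TYPE[mm_type]
    return {
        "query": {
            "dis_max": {
                # One clause per field, all using the same query text.
                "queries": [{clause: {field: query}} for field in fields]
            }
        }
    }


body = to_dis_max("this is", ["subject", "message"], "phrase")
print(json.dumps(body, indent=2))
```

For the phrase type this yields the same dis_max body as the equivalent query shown in this section.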

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "multi_match": {
      "query": "this is",
      "fields": ["subject", "message"],
      "type": "phrase"
    }
  }
}

The query above is equivalent to:

GET multimatchtest/multimatch_test/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase": { "subject": "this is" } },
        { "match_phrase": { "message": "this is" } }
      ]
    }
  }
}

cross_fields

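The most_fields and best_fields types are both field-centric. What does that mean? Suppose we search for the string "blabla like" with operator set to and. The whole string is matched within each field separately, so a document matches only if a single field contains both words.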

GET multimatchtest/_search
{
  "query": {
    "multi_match": {
      "query": "blabla like",
      "operator": "and",
      "fields": ["subject", "message"],
      "type": "best_fields"
    }
  }
}

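The cross_fields type, by contrast, is term-centric. Take the same search for "blabla like" over the subject and message fields: the query string is first analyzed into a list of terms, and each term is then looked up across all the fields. With operator set to and, a document matches as long as every term is found in at least one of the fields.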
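The difference between the two matching rules can be sketched in Python using document 2 from the results above (subject "blabla blala", message "I like chinese food"). This is a simplified model, not Elasticsearch's implementation: the hypothetical `tokenize` function stands in for a real analyzer by lowercasing, splitting on whitespace and stripping a few punctuation marks, and only the operator=and matching rule is modelled, not scoring.

```python
def tokenize(text: str) -> list[str]:
    # Rough stand-in for an analyzer: split on whitespace, strip simple
    # punctuation, lowercase, and drop empty tokens.
    words = (w.strip("!?.,").lower() for w in text.split())
    return [w for w in words if w]


def matches_field_centric(query: str, doc: dict[str, str], fields: list[str]) -> bool:
    """best_fields / most_fields with operator=and: one field must hold every term."""
    terms = tokenize(query)
    return any(all(t in tokenize(doc[f]) for t in terms) for f in fields if f in doc)


def matches_term_centric(query: str, doc: dict[str, str], fields: list[str]) -> bool:
    """cross_fields with operator=and: every term must appear in at least one field."""
    terms = tokenize(query)
    return all(any(t in tokenize(doc[f]) for f in fields if f in doc) for t in terms)


doc2 = {"subject": "blabla blala", "message": "I like chinese food"}
fields = ["subject", "message"]
print(matches_field_centric("blabla like", doc2, fields))  # False
print(matches_term_centric("blabla like", doc2, fields))   # True
```

Document 2 has "blabla" only in subject and "like" only in message, so it fails the field-centric rule but passes the term-centric one.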

GET multimatchtest/_search
{
  "query": {
    "multi_match": {
      "query": "blabla like",
      "operator": "and",
      "fields": ["subject", "message"],
      "type": "cross_fields"
    }
  }
}

Scoring

So how does cross_fields compute its score?

cross_fields also supports tie_breaker, and this parameter is what controls cross_fields scoring. Its values mean the following:

0.0: use the score of the best field as the final score (the default)

1.0: add the scores of the matching fields together
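A rough Python sketch of what these two values mean for cross_fields: each query term is scored across the fields, the per-field scores for that term are combined as best + tie_breaker × the rest, and the per-term results are then added up. This is a simplification under stated assumptions: real cross_fields also blends term statistics across the fields before scoring, which is not modelled here, and the function names and per-field scores below are hypothetical. An intermediate value such as 0.3 falls in between, following the same tie_breaker arithmetic as earlier in this article.

```python
def blend_term(field_scores: list[float], tie_breaker: float) -> float:
    """Score of one query term across fields: best field + tie_breaker * the rest."""
    if not field_scores:
        return 0.0  # the term matched no field
    ordered = sorted(field_scores, reverse=True)
    return ordered[0] + tie_breaker * sum(ordered[1:])


def cross_fields_score(per_term: dict[str, list[float]], tie_breaker: float = 0.0) -> float:
    """Sum the blended score of every query term."""
    return sum(blend_term(scores, tie_breaker) for scores in per_term.values())


# Hypothetical per-field scores for some document:
# "blabla" matched one field, "like" matched two fields.
per_term = {"blabla": [0.5], "like": [0.5, 0.25]}
print(cross_fields_score(per_term, 0.0))  # 1.0, only the best field per term
print(cross_fields_score(per_term, 1.0))  # 1.25, field scores added together
```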

When reposting, please credit the source: https://www.heiqu.com/wsppyx.html
