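_shards: how many shards were searched, and how many of them succeeded, failed, or were skipped. A shard is, roughly, a partition of the data: the documents of an index are split into several shards, much like sharding a database table by id.
max_score: the score of the most relevant document.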
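Next, we can try queries with conditions.

Match query (analyzed)

Find addresses that contain mill or lane. The match query analyzes the search text into terms and, by default, returns documents that contain any of them.

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } },
  "size": 2
}

This returns: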
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 19, "relation" : "eq" },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "970",
        "_score" : 5.4032025,
        "_source" : {
          "account_number" : 970,
          "balance" : 19648,
          "firstname" : "Forbes",
          "lastname" : "Wallace",
          "age" : 28,
          "gender" : "M",
          "address" : "990 Mill Road",
          "employer" : "Pheast",
          "email" : "forbeswallace@pheast.com",
          "city" : "Lopezo",
          "state" : "AK"
        }
      }
    ]
  }
}

I set size to 2, so only two documents are returned, but 19 documents matched in total.
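As a rough illustration of how these response fields relate, the sketch below reads a trimmed copy of the response above in Python. The summarize helper is hypothetical (it is not part of any Elasticsearch client), and each entry in hits.hits is abbreviated to _id and _score.

```python
import json

# Trimmed copy of the response shown above; _source is omitted for brevity.
RESPONSE = json.loads("""
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 19, "relation": "eq"},
    "max_score": 9.507477,
    "hits": [
      {"_id": "136", "_score": 9.507477},
      {"_id": "970", "_score": 5.4032025}
    ]
  }
}
""")


def summarize(response):
    """Hypothetical helper: pull out the fields discussed in this section."""
    hits = response["hits"]
    return {
        "matched": hits["total"]["value"],   # documents that matched the query
        "returned": len(hits["hits"]),       # documents actually returned (limited by size)
        "max_score": hits["max_score"],      # score of the most relevant document
        "all_shards_ok": response["_shards"]["failed"] == 0,
    }


summary = summarize(RESPONSE)
print(summary)
```

The point of the sketch is that hits.total.value counts every match, while the length of hits.hits is bounded by size.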
Phrase match query

match_phrase requires the terms to appear next to each other and in the given order.

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

This time only one document matches the whole phrase:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}

Multi-condition queries

In practice, a query usually combines several conditions.
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}

bool combines multiple query clauses.
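must, should, and must_not are clauses of the bool query. must and should contribute to the relevance score, and results are sorted by score by default.
must_not acts as a filter: it affects which documents are returned but not their scores; it only removes documents from the results.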
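To make the difference between must and must_not concrete, here is a toy Python evaluation of a bool query. It is only a sketch under loud assumptions: the documents are made up (they are not from the bank index), each clause is a plain equality test rather than an analyzed match, and each satisfied must clause simply adds 1.0 to the score, which is not how Elasticsearch computes relevance.

```python
# Made-up documents for illustration; they are not taken from the bank index.
DOCS = [
    {"id": "a", "age": 40, "state": "ID"},
    {"id": "b", "age": 40, "state": "TX"},
    {"id": "c", "age": 25, "state": "TX"},
]


def bool_query(docs, must=(), must_not=()):
    """Toy bool evaluation: each clause is a (field, value) equality test.

    Simplification: every satisfied must clause adds 1.0 to the score,
    whereas Elasticsearch computes a relevance score per clause.
    """
    results = []
    for doc in docs:
        # must_not only excludes documents; it never changes a score.
        if any(doc[field] == value for field, value in must_not):
            continue
        # Every must clause has to match.
        if not all(doc[field] == value for field, value in must):
            continue
        results.append((doc["id"], float(len(must))))
    # Results are ordered by score, highest first.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


hits = bool_query(DOCS, must=[("age", 40)], must_not=[("state", "ID")])
print(hits)
```

Document a satisfies the must clause but is dropped by must_not, and document c fails the must clause, so only b remains.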
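You can also specify filters explicitly to include or exclude documents based on structured data.
For example, find the accounts whose balance is between 20000 and 30000, inclusive.

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": { "gte": 20000, "lte": 30000 }
        }
      }
    }
  }
}

Aggregations (group by)

Count the number of accounts per state. In SQL this might be written as: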
SELECT state AS group_by_state, COUNT(*) FROM tbl_bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 3;

The corresponding Elasticsearch request is:
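GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "size": 3
      }
    }
  }
}

size=0 limits the returned content: Elasticsearch would otherwise return the matching documents, and here we only want the aggregation values.
aggs is the keyword that introduces aggregations.
group_by_state is the name of this aggregation result; the name is arbitrary.
terms groups documents by the exact value of a field; here it names the field to group by.
state.keyword: state is a text field, and a string field has to be of type keyword to be counted and grouped, so the state.keyword sub-field is used.
size=3 limits the number of groups returned, here the top 3; the default is the top 10. The overall bucket limit (10000 in the setup used here; the default depends on the Elasticsearch version) can be changed through search.max_buckets. Note that with multiple shards the counts can be approximate; we will look into that later.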
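The grouping that the terms aggregation performs can be mimicked in a few lines of Python. This is only a single-process sketch with made-up state values, so it does not reproduce the multi-shard approximation mentioned above; the terms_aggregation function is a hypothetical name, and ordering ties by key is an assumption of the sketch. The key and doc_count names follow the bucket fields of an Elasticsearch terms aggregation response.

```python
from collections import Counter

# Made-up state values for illustration; not the real bank data.
STATES = ["TX", "ID", "TX", "AL", "ID", "TX", "MD", "AL", "TX"]


def terms_aggregation(values, size=10):
    """Toy terms aggregation: count documents per distinct value, keep the top `size`."""
    counts = Counter(values)
    # Order by count descending; ties are broken by key (an assumption of this sketch).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


buckets = terms_aggregation(STATES, size=3)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```

With size=3 the fourth state (MD, with one document) is dropped, which mirrors how the size parameter trims the list of groups.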