sgq0085

2. Queries

Request body search

Empty search

GET /index_2014*/type1,type2/_search
{
  "from": 30,
  "size": 10
}

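As with a query-string search, a request-body search can target one, several, or _all indices and one, several, or all types. The from and size parameters paginate the results. An empty request body {} is functionally equivalent to the query "query": {"match_all": {}}.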
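The from and size values in the request above correspond to the fourth page of ten results. A minimal sketch of that arithmetic, assuming 1-based page numbers (the helper name page_body is illustrative and not part of Elasticsearch):

```python
def page_body(page, size=10):
    """Return the from/size part of a search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    # "from" is the number of hits to skip before the requested page starts.
    return {"from": (page - 1) * size, "size": size}


print(page_body(4))  # the pagination used in the example request
```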

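Query DSL

The typical structure of a query clause is shown below. When the FIELD_NAME level is omitted, the clause is not restricted to one field and applies to the whole document.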

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

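Combining query clauses

Query clauses are simple building blocks that can be combined with one another to form more complex queries. A clause takes one of two forms:

(1) Leaf clauses (such as match) compare a query string against one field (or several fields).

(2) Compound clauses combine other query clauses. For example, a bool clause combines other clauses that must match, must_not match, or should match, and it can also contain non-scoring filters: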

{
    "bool": {
        "must":     { "match": { "tweet": "elasticsearch" }},
        "must_not": { "match": { "name":  "mary" }},
        "should":   { "match": { "tweet": "full text" }},
        "filter":   { "range": { "age" : { "gt" : 30 }} }
    }
}
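When such queries are assembled in application code, the compound clause is usually built from smaller pieces. A minimal sketch in Python that produces the same body as above; the helper names match and bool_query are hypothetical, not part of any Elasticsearch client:

```python
import json


def match(field, text):
    """Leaf clause: a match query on one field."""
    return {"match": {field: text}}


def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Compound clause: keep only the sections that were actually supplied."""
    sections = {"must": must, "must_not": must_not,
                "should": should, "filter": filter_}
    return {"bool": {name: clause for name, clause in sections.items() if clause}}


query = bool_query(
    must=match("tweet", "elasticsearch"),
    must_not=match("name", "mary"),
    should=match("tweet", "full text"),
    filter_={"range": {"age": {"gt": 30}}},
)
print(json.dumps(query, indent=4))
```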

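Queries and filters

When a clause is used in query context, it becomes a scoring query. A scoring query calculates how relevant each document is to the query, assigns that relevance to the _score field, and sorts the matching documents by it. This notion of relevance suits full-text search well, where there is rarely a single "correct" answer. In filter context, a clause only decides whether a document matches, and no score is computed.

As a general rule, use query clauses for full-text search or for any condition that should affect the relevance score, and use filters for everything else.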

The most important queries

match_all query

The match_all query simply matches all documents. It is the default query when none is specified:

{ "match_all": {}}

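match query

If you run a match query on a full-text field, it analyzes the query string with the correct analyzer for that field before executing the search. If you use it on a field containing an exact value, such as a number, a date, a Boolean, or a not_analyzed string field, it searches for that exact value.

When querying by time, it is best to include the time zone in the query, or to include it when indexing the data, for example 2014-09-01T00:00:00.000+0800.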

{ "match": { "age":    26           }}
{ "match": { "date":   "2014-09-01" }}
{ "match": { "date":   "now-8h" }}
{ "match": { "date":   "2014-09-01T00:00:00.000+0800" }}
{ "match": { "public": true         }}
{ "match": { "tag":    "full_text"  }}
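To avoid time zone ambiguity in date values, the string can be generated from a timezone-aware datetime. A minimal sketch, assuming the target format is the one used in the example above (the function name es_timestamp is illustrative):

```python
from datetime import datetime, timedelta, timezone


def es_timestamp(dt):
    """Format an aware datetime like 2014-09-01T00:00:00.000+0800."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    millis = dt.microsecond // 1000
    # %z renders the UTC offset as +HHMM, for example +0800.
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}" + dt.strftime("%z")


UTC_PLUS_8 = timezone(timedelta(hours=8))
query = {"match": {"date": es_timestamp(datetime(2014, 9, 1, tzinfo=UTC_PLUS_8))}}
print(query)
```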

multi_match query

The multi_match query runs the same match query on multiple fields:

{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}

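range query

The range query finds numbers or dates that fall within a specified range. It also supports date math, such as "gt": "now-1h" (within the last hour) or "lt": "now-5d" (more than 5 days ago).

gt: greater than

gte: greater than or equal to

lt: less than

lte: less than or equal to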

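A range clause takes a single field, so the two conditions are written as separate clauses (combine them in a bool query if both must hold). The first matches ages from 20 up to but not including 30. In the second, "now-1h" means later than one hour ago, and "||+1d" adds one day to the given date, so the upper bound is May 8.

{ "range": { "age": { "gte": 20, "lt": 30 } } }

{
    "range": {
        "timestamp": {
            "gt": "now-1h",
            "lt": "2014-05-07T00:00:00+0800||+1d"
        }
    }
}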
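The meaning of the four operators can be checked locally. A minimal sketch that emulates how a set of range bounds is evaluated against one value; this is an illustration of the semantics, not how Elasticsearch implements range queries:

```python
import operator

# Map each range operator to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 20, "lt": 30}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


age_bounds = {"gte": 20, "lt": 30}
for age in (19, 20, 29, 30):
    print(age, in_range(age, age_bounds))
```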

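term query

The term query matches exact values, which may be numbers, dates, Booleans, or not_analyzed strings. It does not analyze the input text, so it searches for exactly the value given.

This is comparable to T.A = B in SQL.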

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}

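terms query

The terms query is like the term query, but it lets you specify multiple values to match. A document matches if the field contains any of the specified values. Like term, terms does not analyze the input text; it looks for exact matches, so differences in case, accents, and whitespace all count.

This is comparable to IN in SQL: the document matches as long as one of its values is in the given list.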

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}
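Because neither query analyzes its input, matching behaves like plain equality and set membership on the stored values. A minimal local emulation under that assumption (the function names are illustrative and the document values are made up):

```python
def term_matches(field_values, value):
    """term: the field must contain exactly this value (like T.A = B)."""
    return value in field_values


def terms_matches(field_values, values):
    """terms: the field must contain at least one of the values (like IN)."""
    return any(term_matches(field_values, v) for v in values)


# Hypothetical not_analyzed tag values stored for one document.
doc_tags = ["search", "Full_Text"]

print(terms_matches(doc_tags, ["search", "full_text", "nosql"]))  # "search" matches
print(term_matches(doc_tags, "full_text"))  # case differs, so no exact match
```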

exists and missing queries

The exists and missing queries find documents in which the specified field has a value (exists) or has no value (missing). They are similar in nature to IS NULL (missing) and IS NOT NULL (exists) in SQL:

{
    "exists":   {
        "field":    "title"
    }
}
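The example above shows only exists. The missing query was removed in Elasticsearch 5.0; on those versions the same condition is written as an exists query inside must_not. A minimal sketch of both forms as Python dictionaries (the helper names are illustrative):

```python
def exists(field):
    """Documents in which the field has a value (IS NOT NULL)."""
    return {"exists": {"field": field}}


def missing(field):
    """Documents in which the field has no value (IS NULL),
    written as must_not + exists so it does not rely on the missing query."""
    return {"bool": {"must_not": exists(field)}}


print(exists("title"))
print(missing("title"))
```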

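Combining multiple queries

Use the bool query to combine multiple queries into the Boolean logic you need. It accepts the following parameters:

must: comparable to AND in SQL. Documents must match these clauses to be included.

must_not: comparable to NOT or <> in SQL. Documents must not match these clauses to be included.

should: comparable to OR in SQL. A document that matches any of these clauses gets a higher _score; otherwise there is no effect. These clauses mainly refine the relevance score of each document. (When the bool query has no must or filter clause, at least one should clause has to match.)

filter: must match, but runs in non-scoring filter mode. These clauses contribute nothing to the score; they only include or exclude documents.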

{
    "query": {
        "bool": {
            "must": {
                "match": { "title": "how to make millions" }
            },
            "must_not": {
                "match": { "tag": "spam" }
            },
            "should": [
                { "match": { "tag": "starred" } }
            ],
            "filter": [
                { "match": { "province": "19" } },
                { "match": { "city": "1657" } }
            ]
        }
    }
}

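Adding a filtering query

Moving a range query into the filter clause turns it into a non-scoring query that no longer affects the relevance ranking of documents. By wrapping a bool query in the filter clause, we can add Boolean logic to the filter criteria: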

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "bool": { 
              "must": [
                  { "range": { "date": { "gte": "2014-01-01" }}},
                  { "range": { "price": { "lte": 29.99 }}}
              ],
              "must_not": [
                  { "term": { "category": "ebooks" }}
              ]
          }
        }
    }
}

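constant_score query

The constant_score query applies a constant score to all matching documents. It is often used when you only need to run a filter and nothing else (no scoring query, for example). You can use it in place of a bool query that has only a filter clause. Performance is identical, but the query is simpler and clearer.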

GET /_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "user_id" : 1
                }
            }
        }
    }
}

Here the term query is placed inside constant_score, which turns it into a non-scoring filter.

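filtered query

Similar in purpose to constant_score; the example below nests a compound (bool) filter inside it. Note that filtered was deprecated in Elasticsearch 2.0 and removed in 5.0; on those versions, use a bool query with a filter clause instead.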

GET /my_store/products/_search
{
   "query" : {
      "filtered" : { 
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}}, 
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
              ],
              "must_not" : {
                 "term" : {"price" : 30} 
              }
           }
         }
      }
   }
}
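On Elasticsearch versions without the filtered query, the usual rewrite moves its query part into bool.must and its filter part into bool.filter. A minimal sketch of that rewrite on the query above; the function name filtered_to_bool is hypothetical, and only the two standard keys of filtered are handled:

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as an equivalent {"bool": {...}} query."""
    body = query["filtered"]
    bool_body = {}
    if "query" in body:
        bool_body["must"] = body["query"]      # scoring part
    if "filter" in body:
        bool_body["filter"] = body["filter"]   # non-scoring part
    return {"bool": bool_body}


old_query = {
    "filtered": {
        "filter": {
            "bool": {
                "should": [
                    {"term": {"price": 20}},
                    {"term": {"productID": "XHDK-A-1293-#fJ3"}},
                ],
                "must_not": {"term": {"price": 30}},
            }
        }
    }
}
new_query = filtered_to_bool(old_query)
print(new_query)
```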

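Validating queries

The validate-query API checks whether a query is valid, and it accepts an optional explain parameter. The request below is deliberately invalid (the query type match and the field name tweet are swapped); with explain, the response says why it is invalid. For a valid query, explain returns a human-readable description, which is very useful for understanding exactly how Elasticsearch interprets your query.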

POST /gb/tweet/_validate/query?explain 
{
   "query": {
      "tweet" : {
         "match" : "really powerful"
      }
   }
}

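Sorting by field values

In Elasticsearch, the relevance score is a floating-point number returned in the search results as _score. The default sort order is _score descending; sorting on any other field defaults to ascending order.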

GET /_search
{
    "query" : {
        "bool" : {
            "filter" : { "term" : { "user_id" : 1 }}
        }
    },
    "sort": { "date": { "order": "desc" }}
}

Multilevel sorting

Suppose we want to combine date and _score, so that matching results are sorted first by date and then by relevance:

GET /_search
{
    "query" : {
        "bool" : {
            "must":   { "match": { "tweet": "manage text search" }},
            "filter" : { "term" : { "user_id" : 2 }}
        }
    },
    "sort": [
        { "date":   { "order": "desc" }},
        { "_score": { "order": "desc" }}
    ]
}
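The effect of the two sort levels can be illustrated locally: hits are ordered by date, and _score only breaks ties between hits with the same date. A minimal sketch with made-up hits (the values are invented for illustration):

```python
# Hypothetical hits; ISO dates compare correctly as plain strings.
hits = [
    {"date": "2014-09-24", "_score": 0.3},
    {"date": "2014-09-22", "_score": 0.9},
    {"date": "2014-09-24", "_score": 0.7},
]

# Both levels are descending, so one reversed sort on the (date, _score)
# tuple reproduces "date desc, then _score desc".
ordered = sorted(hits, key=lambda h: (h["date"], h["_score"]), reverse=True)

for hit in ordered:
    print(hit["date"], hit["_score"])
```

Note that the hit with the highest score comes last, because the date level takes priority.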

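Sorting on multivalue fields

When sorting on a field with more than one value, remember that the values have no intrinsic order; a multivalue field is just a bag of values.

For numbers and dates, you can reduce a multivalue field to a single value by using the min, max, avg, or sum sort mode: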

"sort": {
    "dates": {
        "order": "asc",
        "mode":  "min"
    }
}
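Each mode reduces the values of one document to a single sort key. A minimal local emulation with made-up numeric values; it illustrates the idea only and is not how Elasticsearch computes the sort internally:

```python
from statistics import mean

# One reducer per sort mode.
MODES = {"min": min, "max": max, "avg": mean, "sum": sum}

# Hypothetical documents, each with a multivalue numeric field.
docs = {"a": [3, 9], "b": [5, 6], "c": [1, 20]}


def order(docs, mode, reverse=False):
    """Return document ids sorted by their multivalue field reduced with mode."""
    return sorted(docs, key=lambda doc_id: MODES[mode](docs[doc_id]), reverse=reverse)


print(order(docs, "min"))  # ascending by smallest value
print(order(docs, "max"))  # ascending by largest value
```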

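String sorting and multifields

The fields parameter: for a string field, we may want a not_analyzed version for exact sorting and an analyzed version for full-text search. Because both come from the same text, we do not want to waste space by simply storing the same value twice. The fields parameter turns a simple field mapping into a multifield mapping.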

Before:

"tweet": {
    "type":     "string",
    "analyzer": "english"
}

After:

"tweet": {
    "type":     "string",
    "analyzer": "english",
    "fields": {
        "raw": {
            "type":  "string",
            "index": "not_analyzed"
        }
    }
}

1. The main tweet field is the same as before: an analyzed full-text field.
2. The new tweet.raw subfield is not_analyzed.

In the example above, the tweet field is used for search and the tweet.raw field for sorting:

GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    },
    "sort": "tweet.raw"
}

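Scroll queries

A scroll query sorts by _doc. This tells Elasticsearch to simply return the next batch of results from every shard that still has results to return.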

GET /old_index/_search?scroll=1m 
{
    "query": { "match_all": {}},
    "sort" : ["_doc"], 
    "size":  1000
}

1m: keep the scroll window open for one minute. _doc: the keyword _doc is the most efficient sort order.

GET /_search/scroll
{
    "scroll": "1m", 
    "scroll_id" : "... ..."
}

This sets the scroll window to one minute again and passes the scroll_id returned by the previous request.
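The two requests are normally repeated in a loop until a batch comes back empty. A minimal sketch of that loop, assuming responses shaped like Elasticsearch search responses (a _scroll_id plus hits.hits). The callables first_page and next_page are hypothetical stand-ins for the two HTTP requests, and the fake responses exist only to make the sketch runnable:

```python
def scroll_all(first_page, next_page):
    """Yield every hit: first_page() opens the scroll,
    next_page(scroll_id) fetches the following batch."""
    response = first_page()
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always pass the most recent scroll id to the next request.
        response = next_page(response["_scroll_id"])


# Fake responses standing in for a real cluster.
fake_pages = iter([
    {"_scroll_id": "s1", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "s2", "hits": {"hits": [{"_id": "3"}]}},
    {"_scroll_id": "s3", "hits": {"hits": []}},
])
seen_scroll_ids = []


def fake_next(scroll_id):
    seen_scroll_ids.append(scroll_id)
    return next(fake_pages)


all_ids = [hit["_id"] for hit in scroll_all(lambda: next(fake_pages), fake_next)]
print(all_ids)
```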

2018-05-13 13:46