Elasticsearch: inverted index, index operations, and document operations

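1. Inverted index
An inverted index is built by tokenizing each document and recording, for every term, which documents contain it.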
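To make the idea concrete, here is a minimal sketch of an inverted index in Python. It assumes a naive analyzer that lowercases and splits on whitespace, which is far simpler than the analyzers Elasticsearch actually uses. The function names and sample documents are illustrative only.

```python
from collections import defaultdict


def tokenize(text):
    """Naive analyzer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents keyed by document id.
docs = {
    1: "Elasticsearch builds an inverted index",
    2: "An index maps terms to documents",
}
inverted = build_inverted_index(docs)
print(inverted["index"])  # ids of the documents containing "index"
print(inverted["terms"])  # ids of the documents containing "terms"
```

A lookup is then a dictionary access on the term instead of a scan over every document.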
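2. Index operations
Similar to the create, read, update, and delete operations of a database.
1. Create an index
PUT sy # sy is the index name
# The request body below is optional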
{
"settings": {
"index":{
"number_of_shards":5,
"number_of_replicas":1
}
}
}
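# number_of_shards: number of primary shards per index. The default was 5 before 7.0 and is 1 from 7.0 on; it cannot be changed after the index is created.
# number_of_replicas: number of replica copies of each primary shard. The default is 1, and it can be changed later.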
2. Query an index
GET sy/_settings
# Response
{
"sy" : {
"settings" : {
"index" : {
"creation_date" : "1588822389842",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "NBXIeVdHQ26vCuPn8_6uew",
"version" : {
"created" : "7050099"
},
"provided_name" : "sy"
}
}
}
}
3. Update an index
PUT sy/_settings
{
"number_of_replicas": 2
}
4. Delete an index
DELETE sy
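The four index operations above can also be issued from the Python client. The sketch below assumes a 7.x-style elasticsearch-py client, where the indices calls accept a body argument. The helper names index_settings and index_lifecycle are made up for illustration, and the client is passed in as a parameter so that no connection details are assumed.

```python
def index_settings(shards=1, replicas=1):
    """Build the optional settings body used when creating an index."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def index_lifecycle(es, name):
    """Create, inspect, update, and delete an index.

    `es` is assumed to be an elasticsearch.Elasticsearch client (7.x style).
    """
    es.indices.create(index=name, body=index_settings(shards=5, replicas=1))
    settings = es.indices.get_settings(index=name)
    # Only dynamic settings such as number_of_replicas can be changed here.
    es.indices.put_settings(index=name, body={"number_of_replicas": 2})
    es.indices.delete(index=name)
    return settings


print(index_settings(shards=5))
```

Calling index_lifecycle(es, "sy") with a real client would run the same sequence as the console requests above.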
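3. Basic document CRUD
3.1 Inserting documents
# Add a single document. From 7.x on, the only document type is _doc.
# The id (3 here) is optional: with POST test02/_doc and no id, ES generates a unique id automatically.
PUT test02/_doc/3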
{
"name":"李四",
"age":"22",
"dec":"合法公民",
"tags":["交友","旅游","爱吃鸡"]
}
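# Using the Python client
from elasticsearch import Elasticsearch

# Create an ES connection
def get_es_engine(host, port, user=None, pwd=None):
    if user and pwd:
        # Cluster with X-Pack security enabled
        es = Elasticsearch(host + ':' + str(port), http_auth=(user, pwd), maxsize=15)
    else:
        # Cluster without authentication
        es = Elasticsearch(host + ':' + str(port), maxsize=15)
    return es

# The calls below assume a module-level client; host and port here are placeholders
es = get_es_engine('localhost', 9200)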
data_ins={
"name" : "Rick.Wang",
"company" : "CSDN",
"age" : "10",
"email" : "wangyikai@"
}
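es.index(index='test02',body=data_ins,doc_type='_doc',id=8) # id is optional here
es.indices.refresh(index="test02") # A newly indexed document is not searchable right away: by default it becomes visible after the next refresh (every 1s), which moves it from the in-memory buffer into a segment in the filesystem cache. An explicit refresh makes it searchable immediately.
Bulk insert
# Each JSON object must sit on a single line, with no line breaks inside it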
batch_data= [
{"index": {}},
{"name": "王义凯", "age": 11, "email":"wangyikai1@", "company":"CSDN1"},
{"index": {}},
{"name": "wang,yi-kai", "age": 22, "email":"wangyikai2@", "company":"CSDN2"},
{"index": {}},
{"name": "Rick.Wang", "age": 33, "email":"wangyikai3@", "company":"CSDN3"},
{"index": {}},
{"name": "义凯王", "age": 44, "email":"wangyikai4@", "company":"CSDN4"},
]
es.bulk(index='test02',doc_type='_doc',body=batch_data)
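The alternating action/document layout of the bulk body can be generated rather than typed by hand. The following sketch builds that list from plain documents and shows the newline-delimited JSON the bulk API receives on the wire. The helper names and the sample documents are illustrative assumptions.

```python
import json


def build_bulk_body(docs):
    """Interleave an empty "index" action line with each document."""
    body = []
    for doc in docs:
        body.append({"index": {}})  # action line; ES generates the id
        body.append(doc)            # source line
    return body


def to_ndjson(body):
    """Serialize to newline-delimited JSON: one object per line, trailing newline."""
    lines = (json.dumps(item, ensure_ascii=False) for item in body)
    return "\n".join(lines) + "\n"


# Hypothetical sample documents.
docs = [
    {"name": "Rick.Wang", "age": 33},
    {"name": "wang,yi-kai", "age": 22},
]
body = build_bulk_body(docs)
print(to_ndjson(body))
```

The resulting list has the same shape as batch_data above and could be passed as the body argument of es.bulk.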
3.2 Deleting documents
DELETE test02/_doc/3
es.delete(index='test02',doc_type='_doc',id=8)
# Bulk delete (delete by query)
bd= {'query': {'bool': {'should': [{'match_phrase_prefix': {'email': 'yikai'}}]}}}
es.delete_by_query(index='test02',body=bd)
3.3 Updating documents
Full overwrite (fields missing from the new body are lost; rarely used)
PUT test02/_doc/3
{
"name":"李四",
"tags":["旅游","爱吃鸡"]
}
Partial update
A partial update changes only the listed fields; they must be wrapped in "doc".
POST test02/_update/3
{
"doc":{
"age": 30
}
}
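The difference between the two update styles can be imitated with plain dictionaries. This is only a sketch of the semantics, not what Elasticsearch runs internally: overwrite replaces the stored document, while partial_update merges the "doc" fields into it, merging nested objects recursively and replacing everything else, arrays included. The function names are illustrative.

```python
import copy


def overwrite(existing, new_doc):
    """PUT semantics: the stored document is replaced wholesale."""
    return copy.deepcopy(new_doc)


def partial_update(existing, doc):
    """_update semantics: merge the fields of "doc" into the stored document."""
    merged = copy.deepcopy(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # nested object
        else:
            merged[key] = copy.deepcopy(value)                # scalar or array
    return merged


stored = {"name": "李四", "age": "22", "tags": ["交友", "旅游"]}
print(overwrite(stored, {"name": "李四", "tags": ["旅游"]}))  # "age" is lost
print(partial_update(stored, {"age": 30}))                    # only "age" changes
```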
# Update a single document by id
data={
"doc":{
"age":77
}
}
es.update(index='test02',id=8,doc_type='_doc',body=data)
es.indices.refresh(index="test02") # As with inserts, an update is not searchable until the next refresh (1s by default); an explicit refresh makes it visible immediately
Bulk update
data_all={
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.age = params.age;",
"lang": "painless",
"params" : {
"age": "88"
}
}
}
es.update_by_query(index='test02',body=data_all)
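3.4 Querying documents
By default ES returns at most 10,000 hits through from/size paging. This comes from Elasticsearch's deep-pagination limit: to keep oversized page requests from exhausting memory on the nodes, from + size may not exceed index.max_result_window, which defaults to 10000.
Workaround: raise the 10,000 limit
# Set the maximum result window on the index
PUT /<index_name>/_settings
{"index": {"max_result_window" : 10000000}}
# Add "track_total_hits": true to a search so that hits.total reports the exact count instead of stopping at 10,000 (it does not change the paging limit)
GET /<index_name>/_search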
{
"track_total_hits": true,
"query":{
"match_all":{
}
}
}
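1. Match queries
match query: a document is returned if any one of the query terms matches. The _score of each hit differs: 1. the more terms that match, the higher the score. 2. when the same terms match, the larger the share of the field value those terms make up, the higher the score.
GET test02/_search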
{
"query": {
"match": {
"name": "伤狂神"
}
}
}
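Exact queries: term/terms, or a match query with operator set to and
A term query is mainly used for exact-value matching. The exact values may be numbers, dates, booleans, or unanalyzed (keyword) strings.
POST test02/_search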
{
"query":{
"term":{
"age":20
}
}
}
A terms query is similar to term, but it accepts several values for one field and matches documents containing any of them.
POST test02/_search
{
"query":{
"terms":{
"age":[
20,
21
]
}
}
}
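For a text field, use a match query with operator set to and: the query string is still analyzed, but every resulting term must be present in the field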
GET test02/_search
{
"query": {
"match": {
"name": {
"query": "伤感王五",
"operator": "and"
}
}
}
}
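The effect of the operator can be shown with sets of terms. The sketch below assumes the analyzer has already turned both the field value and the query string into terms; the sample terms are placeholders, and scoring is ignored.

```python
def match(doc_terms, query_terms, operator="or"):
    """Return True if the document satisfies the match query.

    Both arguments are terms already produced by an analyzer (assumed here).
    """
    doc_terms, query_terms = set(doc_terms), set(query_terms)
    if operator == "and":
        return query_terms <= doc_terms   # every query term must be present
    return bool(query_terms & doc_terms)  # "or": any one term is enough


doc = ["sad", "dj", "dance"]
print(match(doc, ["sad", "song"]))         # True
print(match(doc, ["sad", "song"], "and"))  # False
```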
For a text field, query its keyword sub-field to match the whole, unanalyzed value
GET test02/_search
{
"query": {
"match": {
"name.keyword": "伤感王五"
}
}
}
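A match query supports the minimum_should_match parameter, which sets how many query terms must match for a document to count as relevant. It can be an absolute number, but a percentage is more common because the number of words a user types cannot be controlled:
GET test02/_search
{
"query": {
"match": {
"name": {
"query": "伤感DJ舞曲",
"minimum_should_match": "70%" // required share of matching terms
}
}
}
}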
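# How it is computed: here 伤感DJ舞曲 is split into 3 terms, and 3 * 70% = 2.1, which is rounded down to 2. A document therefore qualifies as soon as it contains 2 of the terms.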
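The arithmetic can be checked with a few lines of Python. This sketch covers only the two simple forms, a positive integer and a positive percentage that is rounded down; negative values and combined expressions are left out. The function name is an illustrative assumption.

```python
import math


def required_matches(term_count, minimum_should_match):
    """Number of query terms that must match (positive integer or percentage only)."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        return math.floor(term_count * percent / 100)  # percentages round down
    return int(minimum_should_match)


print(required_matches(3, "70%"))  # 2
print(required_matches(4, "75%"))  # 3
```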

2. Returning only selected fields
GET test02/_search
{
"query": {
"match": {
"name": "学狂神"
}
},
"_source": ["name"]
}
3. Sorting
GET test02/_search
{
"query": {
"match": {
"name": "学狂神"
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
4. Pagination
GET test02/_search
{
"query": {
"match": {
"name": "学狂神"
}
},
"from": 0, // offset of the first hit
"size": 2 // number of hits to return
}
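Applications usually think in page numbers rather than offsets. The helper below is a small sketch that converts a 1-based page number into from and size, and rejects requests that would exceed the default index.max_result_window of 10000. The function name and the 1-based convention are assumptions.

```python
def page_body(query, page, size, max_result_window=10000):
    """Build a search body for a 1-based page number."""
    offset = (page - 1) * size
    if offset + size > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"query": query, "from": offset, "size": size}


body = page_body({"match": {"name": "学狂神"}}, page=3, size=2)
print(body["from"], body["size"])  # 4 2
```

The returned dictionary can be passed as the body argument of es.search.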
5. bool queries
must: every clause must match, equivalent to AND
GET test02/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "狂神"
}
},
{
"match": {
"age":"21"
}
}
]
}
}
}
should: at least one clause must match, equivalent to OR
GET test02/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": "狂神"
}
},
{
"match": {
"age":"21"
}
}
]
}
}
}
must_not: none of the clauses may match, equivalent to NOT
GET test02/_search
{
"query": {
"bool": {
"must_not": [
{
"match": {
"name": "狂神"
}
},
{
"match": {
"age":"21"
}
}
]
}
}
}
6. Filtering results
GET test02/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "狂神"
}
}
],
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 22
}
}
}
}
}
}
GET test02/_search
{
"query": {
"bool": {
"filter": [
{
"match": {"name": "狂神"} },
{
"range": {
"age": {
"gte": 18,
"lte": 30
}
}
}
]
}
}
}
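When these bodies are built in Python, a small helper keeps the nesting manageable. The sketch below assembles a bool query from optional clause lists and leaves out the clauses that were not supplied. The helper name is an illustrative assumption. Clauses under filter restrict the results without contributing to _score, unlike those under must.

```python
def bool_query(must=None, should=None, must_not=None, filter=None):
    """Assemble a bool query, leaving out clauses that were not supplied."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter,
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"name": "狂神"}}],
    filter=[{"range": {"age": {"gte": 10, "lte": 22}}}],
)
print(body)
```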
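7. Fuzzy queries
A fuzzy query is the fuzzy counterpart of a term query. It tolerates spelling differences between the search term and the indexed term, provided the edit distance is no more than 2.
GET test01/_search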
{
"query": {
"fuzzy": {
"dec": {
"value": "MAXDjSO",
"fuzziness": 1
}
}
}
}
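Edit distance is the number of single-character changes needed to turn one string into another. The sketch below counts insertions, deletions, substitutions, and swaps of two adjacent characters, each as one edit. It is a plain dynamic-programming illustration and makes no claim about how Elasticsearch implements fuzzy matching internally.

```python
def edit_distance(a, b):
    """Edit distance where an adjacent transposition counts as a single edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("ab", "ba"))           # 1
```

With "fuzziness": 1 as in the query above, only terms at distance 0 or 1 from the search value would match.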
8. Aggregations
aggs supports grouping (terms, comparable to GROUP BY in SQL) and the metrics max, min, avg, sum, and stats.
avg computes the mean; max, min, sum, and stats are used in the same way.
{
"size": 0,
"aggs": {
"return_avg_balance": { # name of the field in the response, user-defined
"avg": { # avg computes the mean
"field": "balance" # the document field to average
}
}
}
}
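What the metric aggregations compute can be reproduced over a list of dictionaries. This sketch returns the same five values that a stats aggregation reports (count, min, max, avg, sum). The sample balances are made up for illustration.

```python
def stats(docs, field):
    """Compute count, min, max, avg, and sum of a numeric field."""
    values = [doc[field] for doc in docs if field in doc]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


# Hypothetical documents with a numeric "balance" field.
accounts = [{"balance": 100}, {"balance": 300}, {"balance": 200}]
print(stats(accounts, "balance"))
```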
aggs with grouping
{
"size": 0, # match everything, but return no hits
"query": {
"match_all": {}
},
"aggs": { # produces the aggregations section of the response
"group_technics_name": { # user-defined name for the grouping
"terms": {
"field": "technics_name" # group by technics_name
}
}
}
}
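top_hits: within each bucket, returns the top n hits matching the query. Options: size (number of hits to return, default 3), from (offset), sort (sort order, by _score by default).
aggs with grouping, using top_hits to output the matching documents in each group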
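{
"size": 0, # match everything, but return no hits
"query": {
"match_all": {}
},
"aggs": { # produces the aggregations section of the response
"group_technics_name": { # user-defined name for the grouping
"terms": {
"field": "technics_name" # group by technics_name
},
"aggs": {
"top_technics_name": { # user-defined name in the output
"top_hits": { # returns the top matching documents of each group
"size": 100 # hits per group; limited by index.max_inner_result_window (default 100)
}
}
}
}
}
}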
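The shape of the result of a terms aggregation with a nested top_hits can be imitated in plain Python. In this sketch buckets are ordered by descending document count, as a terms aggregation does by default, and each bucket keeps its first size documents in input order, because there is no relevance score to sort by here. The field values and the function name are illustrative assumptions.

```python
from collections import defaultdict


def terms_with_top_hits(docs, field, size=3):
    """Group docs by `field`; report doc_count and the first `size` docs per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    # Buckets with more documents come first; ties keep first-seen order.
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [
        {"key": key, "doc_count": len(members), "top_hits": members[:size]}
        for key, members in ordered
    ]


# Hypothetical documents.
docs = [
    {"id": 1, "technics_name": "welding"},
    {"id": 2, "technics_name": "casting"},
    {"id": 3, "technics_name": "welding"},
]
for bucket in terms_with_top_hits(docs, "technics_name", size=1):
    print(bucket["key"], bucket["doc_count"], bucket["top_hits"])
```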