Rebuilding an Index - reindex

reindex is an API provided by Elasticsearch (ES) that copies data from one index to another, either within a cluster or across clusters.

Use cases

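Changing the shard count: when the data volume has grown too large for the number of shards the index was created with and indexing has become slow, the shard count needs to be increased. Reindex is one way to do this.
Changing a field mapping: when a mapping needs to change but a large amount of data has already been loaded into the index, re-importing everything from the original source into a new index takes too long. Because ES does not allow a field's mapping to be changed once it has been defined and data has been indexed, Reindex is worth considering here as well.
Changing the analysis rules: for example, a new analyzer is adopted or an analyzer's custom dictionary is extended. Data stored earlier was analyzed under the old rules, so the index must be rebuilt.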

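ES provides the _reindex API, which is much faster than exporting the data from one index and importing it into a new one. In the author's tests it ran roughly 5 to 10 times faster than importing the data with bulk. reindex is well suited to migrating data across indices and across clusters.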
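For the cross-cluster case, the request reads from a remote cluster through a `remote` block inside `source`. The following is a minimal sketch, not a tested setup: the host `http://otherhost:9200` and the credentials are placeholders, and it assumes the destination cluster lists `otherhost:9200` under `reindex.remote.whitelist` in its `elasticsearch.yml`.

```
# Run on the destination cluster. Host, username and password are placeholders.
POST /_reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "reviews"
  },
  "dest": {
    "index": "reviews_new"
  }
}
```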

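Reindex does not attempt to set up the destination index, and it does not copy the settings of the source index. The destination index must be configured before running _reindex, including its mappings, shard count, replicas, and so on.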
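As a rough sketch of that preparation step, the destination index can be created with explicit settings before any data is copied. The index name `reviews_v2` and the shard and replica counts below are assumptions chosen for illustration; a `mappings` section would normally go in the same request.

```
# Hypothetical destination index; pick shard and replica counts to suit the data.
PUT /reviews_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```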

Trying out reindex

In the previous chapter we created the /reviews index. Its mapping is defined as follows:

GET /reviews/_mapping
{
  "reviews": {
    "mappings": {
      "properties": {
        "author": {
          "properties": {
            "email": {
              "type": "keyword"
            },
            "first_name": {
              "type": "text"
            },
            "last_name": {
              "type": "text"
            }
          }
        },
        "content": {
          "type": "text"
        },
        "create_at": {
          "type": "date"
        },
        "product_id": {
          "type": "integer"
        },
        "rating": {
          "type": "float"
        }
      }
    }
  }
}

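Now suppose the business requires product_id to be represented as a string, so the original integer type has to become keyword. Because Elasticsearch does not support changing the type of a field on an existing index, we can use reindex instead: create a new index and copy the original data into it.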

Create the new index /reviews_new. Note that product_id is now of type keyword:

PUT /reviews_new
{
  "mappings": {
      "properties": {
        "author": {
          "properties": {
            "email": {
              "type": "keyword"
            },
            "first_name": {
              "type": "text"
            },
            "last_name": {
              "type": "text"
            }
          }
        },
        "content": {
          "type": "text"
        },
        "create_at": {
          "type": "date"
        },
        "product_id": {
          "type": "keyword"
        },
        "rating": {
          "type": "float"
        }
      }
    }
}

Then run the _reindex API:

POST /_reindex
{
  "source": {
    "index": "reviews"
  },
  "dest": {
    "index": "reviews_new"
  }
}

When the request completes, the response shows the number of documents inserted and the time taken.

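Behind the scenes, _reindex likewise first fetches the documents from the source index and then inserts them into the destination index.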
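That fetch-then-insert flow can be approximated by hand with a scroll search followed by a bulk request. This is only an illustration of the idea, not what the server literally executes. The batch size of 1000 matches the default reindex batch size, and the document `_id` and field values in the bulk body are made up.

```
# Step 1: open a scroll over the source index and fetch the first batch.
POST /reviews/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

# Step 2: write the returned documents to the destination index in one bulk request.
# The _id and field values below are made up for illustration.
POST /reviews_new/_bulk
{ "index": { "_id": "1" } }
{ "product_id": 123, "rating": 4.5, "content": "example review text" }
```

Repeating the two steps with the scroll ID returned by each search, until a batch comes back empty, would cover the whole index. _reindex does this server-side, which avoids shipping every document through the client.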

Run a query to confirm that the data was inserted into the new index.
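One simple way to check, sketched here as a suggestion, is to compare the document counts of the two indices. If the reindex copied everything, the `count` values in the two responses should match.

```
# Document count of the source index
GET /reviews/_count

# Document count of the destination index
GET /reviews_new/_count
```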

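However, product_id in the returned documents still appears as an integer. Changing the mapping to keyword only changes how the field is indexed; _source keeps the original JSON value. A script can be used during reindex to convert the value to a string.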

First delete the data that was imported earlier:

POST /reviews_new/_delete_by_query
{
  "query":{
    "match_all":{}
  }
}

Run reindex again, this time using a script to convert the integer to a string:

POST /_reindex
{
  "source": {
    "index": "reviews"
  },
  "dest": {
    "index": "reviews_new"
  },
  "script": {
    "source": """
    if (ctx._source.product_id != null) {
      ctx._source.product_id = ctx._source.product_id.toString(); 
    }
    """
  }
}

Looking at the data in the new index, product_id now appears as a string.
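A query along these lines can be used for that check. It is a minimal sketch that returns only the product_id field of a few documents; the `size` of 3 is an arbitrary choice. After the scripted reindex, the values in `_source` should be quoted strings rather than bare numbers.

```
# Return only product_id from a handful of documents in the new index.
GET /reviews_new/_search
{
  "_source": ["product_id"],
  "size": 3
}
```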