Elasticsearch compared with SQL and NoSQL

Elasticsearch   SQL        NoSQL
Index           Database   Database
Type            Table      Collection
Document        Row        Document
  • In ES, a document must be assigned to a Type.

The number of shards and replicas can be defined per index when the index is created. After the index is created, you can change the number of replicas dynamically at any time, but you cannot change the number of shards.
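
To make the shard/replica rule concrete, here is a minimal Python sketch that only builds the two requests involved: one that fixes both values at index creation, and one that later changes the replica count through the index settings endpoint. The function names and the index name "customer" with its numbers are illustrative assumptions, and nothing is sent to a cluster.

```python
import json


def create_index_request(index, shards, replicas):
    """Return (verb, path, body) for creating an index with fixed shard and replica counts."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", body


def update_replicas_request(index, replicas):
    """Return (verb, path, body) for changing the replica count of an existing index."""
    # Only the replica count is changed here; the shard count is fixed at creation.
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", body


if __name__ == "__main__":
    # "customer", 5, 1 and 2 are example values, not taken from a real cluster.
    for verb, path, body in (
        create_index_request("customer", 5, 1),
        update_replicas_request("customer", 2),
    ):
        print(verb, path)
        print(json.dumps(body, indent=2))
```

The printed verb, path and body can be pasted into a console or sent with curl.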

Elasticsearch's CRUD

ISUD       CRUD     REST Verb
indexing   create   PUT
search     read     GET
update     update   POST
delete     delete   DELETE
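
The table above can be sketched as a small lookup that turns an operation on a single document into a request line. This is a hedged illustration only: the function name and the operation keys are assumptions, "read" here means fetching one document by ID, and the /_update suffix follows the update examples in these notes.

```python
# Verb per operation, as in the ISUD/CRUD table.
VERBS = {"index": "PUT", "read": "GET", "update": "POST", "delete": "DELETE"}


def request_line(operation, index, doc_type, doc_id):
    """Build 'VERB /index/type/id' for one document-level operation."""
    if operation not in VERBS:
        raise ValueError(f"unknown operation: {operation}")
    path = f"/{index}/{doc_type}/{doc_id}"
    if operation == "update":
        # Partial updates go to the _update endpoint of the document.
        path += "/_update"
    return f"{VERBS[operation]} {path}"


if __name__ == "__main__":
    for op in ("index", "read", "update", "delete"):
        print(request_line(op, "customer", "external", 1))
```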

some terms:

  • JSON (JavaScript Object Notation)
  • ASAP (As Soon As Possible)

Startup command, setting the cluster name and node name

./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name

status color

  • Green means everything is good (cluster is fully functional)
  • Yellow means all data is available but some replicas are not yet allocated (cluster is fully functional)
  • Red means some data is not available for whatever reason. Note that even if a cluster is red, it is still partially functional (i.e. it will continue to serve search requests from the available shards), but you will likely need to fix it ASAP since you have missing data.
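
As a rough illustration of the three colors, the sketch below interprets the "status" field of a cluster health response (the JSON returned by GET /_cluster/health). The function name and the sample dictionary are hypothetical; the sample is hand-written, not output from a real cluster.

```python
# Meaning of each status color, paraphrased from the list above.
STATUS_MEANING = {
    "green": "everything is good",
    "yellow": "all data is available but some replicas are not yet allocated",
    "red": "some data is not available",
}


def summarize_health(health):
    """Return (message, needs_fix) for a parsed cluster health response."""
    status = health["status"]
    name = health.get("cluster_name", "cluster")
    message = f"{name}: {status} ({STATUS_MEANING[status]})"
    # Only red means data is missing and the cluster needs fixing ASAP.
    return message, status == "red"


if __name__ == "__main__":
    # Hypothetical sample response, trimmed to the fields used here.
    sample = {"cluster_name": "my_cluster_name", "status": "yellow"}
    print(summarize_health(sample))
```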

put/add/create document

REST_Verb /Index/Type/ID


# PUT /index/type/document_ID?pretty  (?pretty pretty-prints the JSON response)
# {the document to be added}
PUT /customer/external/1?pretty
{
  "name": "John Doe"
}

  • Use POST instead of PUT to index a document without an explicit ID; Elasticsearch then generates an ID for it.

get/find document

# GET /index/type/document_ID
GET /customer/external/1?pretty

update/post document

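# POST to _update changes specific fields of a document.
# Internally, Elasticsearch deletes the old document
# and indexes a new one with the changes applied.
# In "doc", a field that already exists (old_field) has its value
# replaced, and a field that does not exist yet (new_field) is added.

POST /index/type/doc_id/_update
{
    "doc": {
        "old_field": "term1",
        "new_field": "term2"
    }
}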

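# This scripted update fails, most likely because the document
# has no "age" field yet (see the null_pointer_exception below)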
POST /customer/external/1/_update?pretty
{
  "script" : "ctx._source.age += 1"
}
# response
{
   "error": {
      "root_cause": [
         {
            "type": "remote_transport_exception",
            "reason": "[gupern_first_node][127.0.0.1:9300][indices:data/write/update[s]]"
         }
      ],
      "type": "illegal_argument_exception",
      "reason": "failed to execute script",
      "caused_by": {
         "type": "script_exception",
         "reason": "runtime error",
         "script_stack": [
            "ctx._source.age = ctx._source.age+1",
            "                             ^---- HERE"
         ],
         "script": "ctx._source.age = ctx._source.age+1",
         "lang": "painless",
         "caused_by": {
            "type": "null_pointer_exception",
            "reason": null
         }
      }
   },
   "status": 400
}
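
One way to avoid the null_pointer_exception above is to have the script check for a missing field before incrementing it. The sketch below only builds such an _update body in Python; the function name is an assumption, the Painless snippet has not been run against the cluster from these notes, and the field name is pasted into the script text, so it should come from trusted input only.

```python
import json


def increment_script_body(field, amount=1):
    """Build an _update body whose script tolerates a missing field."""
    # If the field is absent (null), start it at `amount`; otherwise add to it.
    source = (
        f"if (ctx._source.{field} == null) {{ ctx._source.{field} = {amount} }} "
        f"else {{ ctx._source.{field} += {amount} }}"
    )
    return {"script": source}


if __name__ == "__main__":
    # Body for: POST /customer/external/1/_update?pretty
    print(json.dumps(increment_script_body("age"), indent=2))
```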

delete

  • It is worth noting that it is much more efficient to delete a whole index instead of deleting all documents with the Delete By Query API.
# POST index/_delete_by_query
# The body must contain a query (e.g. match or term) on a field and value

POST customer/_delete_by_query
{
  "query": { 
    "match": {
      "name": "John Doeaa"
    }
  }
}

bulk operation

  • In addition to being able to index, update, and delete individual documents, Elasticsearch also provides the ability to perform any of the above operations in batches using the _bulk API. This functionality is important in that it provides a very efficient mechanism to do multiple operations as fast as possible with as few network roundtrips as possible.
# indexes two documents in one bulk operation

POST /customer/external/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }

# This example updates the first document (ID of 1)
# and then deletes the second document (ID of 2) 

POST /customer/external/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
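
The bulk bodies above are newline-delimited JSON: one action line, followed by one source line for every action except delete, and a newline after the last line. A minimal Python sketch of building such a body follows; the helper name and the tuple layout are assumptions made for illustration.

```python
import json


def build_bulk_body(operations):
    """Build a _bulk request body.

    operations: list of (action, metadata, payload) tuples;
    payload is None for actions without a source line (delete).
    """
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))  # action line
        if payload is not None:
            lines.append(json.dumps(payload))  # source line
    # The body has to end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
        ("delete", {"_id": "2"}, None),
    ])
    print(body, end="")
```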

import data from json file

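  • The JSON file's content must follow the _bulk format (an action line followed by a source line for each document)
  • With curl, prefix the file name with '@' so that --data-binary reads the request body from that file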

curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
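
Before posting a file this way, it can help to check that its content really has the _bulk shape. The sketch below is a hypothetical pre-flight check, not part of Elasticsearch or curl: it walks the lines, expects a source line after index, create and update actions and none after delete, and counts the actions.

```python
import json

ACTIONS_WITH_SOURCE = {"index", "create", "update"}
ACTIONS_WITHOUT_SOURCE = {"delete"}


def count_bulk_actions(text):
    """Count the actions in bulk-formatted text; raise ValueError if malformed."""
    if not text.endswith("\n"):
        raise ValueError("bulk body must end with a newline")
    lines = text.splitlines()
    count = 0
    i = 0
    while i < len(lines):
        action_line = json.loads(lines[i])  # a JSON error is also a ValueError
        if not isinstance(action_line, dict) or len(action_line) != 1:
            raise ValueError(f"line {i + 1}: expected a single action")
        (action,) = action_line
        if action in ACTIONS_WITH_SOURCE:
            if i + 1 >= len(lines):
                raise ValueError(f"line {i + 1}: missing source line")
            json.loads(lines[i + 1])  # the source line must be valid JSON
            i += 2
        elif action in ACTIONS_WITHOUT_SOURCE:
            i += 1
        else:
            raise ValueError(f"line {i + 1}: unknown action {action!r}")
        count += 1
    return count


if __name__ == "__main__":
    # "accounts.json" is the file name used in the curl command above.
    with open("accounts.json", encoding="utf-8") as handle:
        print(count_bulk_actions(handle.read()), "actions")
```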

curl command-line reference: https://ec.haxx.se/cmdline.html