ElasticSearch essentials; part 1
An article about ElasticSearch essentials.
- Author: Piotr Duda
- https://x.com/duqens
- Senior Software Engineer at AppDynamics @ Cisco
What is ElasticSearch?
ElasticSearch is a distributed search and analytics engine. It is designed for fast full-text search, which makes it a good fit for logging, real-time data analysis and workloads that need to scale. It is commonly used for indexing large volumes of data and running queries that retrieve information quickly. ElasticSearch is often used as part of the ELK stack (ElasticSearch, Logstash and Kibana). The most important use cases for ES are:
- log analysis,
- application monitoring,
- business analytics.
Important nomenclature
- Cluster - a group of nodes that work together and jointly hold all of the data.
- Nodes - the individual servers that make up a cluster.
- Index - the largest unit in ElasticSearch: a large collection of documents, similar to a table in SQL.
- Shards - smaller pieces of data that together make up an index.
- Replicas - copies of shards, used for HA. When a shard becomes unavailable, its replica takes its place.
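- Mapping - the schema of an index: it defines which fields a document has and how each field is stored and indexed.
- Documents - the individual JSON records stored in an index, similar to rows in an SQL table.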
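To make the terms above a bit more concrete, here is a minimal sketch in Python that builds the JSON body of a create-index request (sent as `PUT /<index-name>`) together with one document that would fit it. The field names (`timestamp`, `level`, `message`), the helper `build_index_request` and the sample log line are made up for illustration; only the `settings` and `mappings` keys come from ElasticSearch itself.

```python
import json


def build_index_request(shards: int, replicas: int) -> dict:
    """Build the JSON body for a create-index request (PUT /<index-name>)."""
    return {
        "settings": {
            "number_of_shards": shards,      # primary shards the index is split into
            "number_of_replicas": replicas,  # replica copies kept for each primary shard
        },
        "mappings": {
            # The mapping: which fields a document has and how they are indexed.
            # These three fields are hypothetical, chosen for a log-analysis use case.
            "properties": {
                "timestamp": {"type": "date"},
                "level": {"type": "keyword"},  # stored as-is, for exact matching
                "message": {"type": "text"},   # analyzed, for full-text search
            }
        },
    }


# A hypothetical document that matches the mapping above.
log_document = {
    "timestamp": "2024-01-01T12:00:00Z",
    "level": "ERROR",
    "message": "Connection to database timed out",
}

if __name__ == "__main__":
    print(json.dumps(build_index_request(shards=3, replicas=1), indent=2))
    print(json.dumps(log_document, indent=2))
```

With 3 primary shards and 1 replica, the cluster would hold 6 shard copies for this index in total.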
How does it scale?
Very well! You can scale both horizontally and vertically. A single node is fairly costly to start with (4 CPUs, 16 GB of RAM), but you can scale up from there in both directions. I’ve seen nodes with 32 CPUs and 256 GB of RAM! For horizontal scaling, a cluster can handle many nodes, and clusters of 50+ nodes are not unusual.
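A rough way to picture horizontal scaling: the shards of an index get spread over the available nodes, so each extra node lowers the load on the others. The sketch below is a toy model that assumes a plain round-robin placement. It is not how ElasticSearch's real allocator decides, and the function names are made up for this example.

```python
def spread_shards(shard_count: int, node_count: int) -> dict[int, list[int]]:
    """Assign shard ids to nodes round-robin (toy model, not the real allocator)."""
    placement: dict[int, list[int]] = {node: [] for node in range(node_count)}
    for shard in range(shard_count):
        placement[shard % node_count].append(shard)
    return placement


def max_shards_per_node(shard_count: int, node_count: int) -> int:
    """Return the number of shards held by the busiest node."""
    placement = spread_shards(shard_count, node_count)
    return max(len(shards) for shards in placement.values())


if __name__ == "__main__":
    for nodes in (3, 6, 12, 24):
        busiest = max_shards_per_node(12, nodes)
        print(f"{nodes} nodes -> busiest node holds {busiest} shard(s)")
```

In this model, going from 3 to 12 nodes drops the busiest node from 4 shards to 1, but going from 12 to 24 nodes changes nothing for a 12-shard index: the extra nodes simply have no shard to hold.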
High availability
This concept is always important in enterprise-level environments: banking, financial services, governments. Thankfully, ElasticSearch supports HA out of the box: the more nodes you add, with replica shards spread across them, the higher the availability you get. With 10+ nodes, one node going down does not impact your live production data.
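The effect of losing a node can be sketched with a small simulation. ElasticSearch never puts two copies of the same shard on one node, and it reports cluster health as green (all copies assigned), yellow (all primaries available, some replicas missing) or red (at least one shard has no copy left). The placement logic and function names below are simplified assumptions for illustration, and the result is only the state right after the failure.

```python
def place_copies(primaries: int, replicas: int, nodes: int) -> dict[int, list[int]]:
    """Place every copy of a shard on a different node (toy placement)."""
    copies = min(1 + replicas, nodes)  # two copies of one shard never share a node
    return {
        shard: [(shard + offset) % nodes for offset in range(copies)]
        for shard in range(primaries)
    }


def health_after_failure(
    placement: dict[int, list[int]], expected_copies: int, failed_nodes: set[int]
) -> str:
    """Return 'green', 'yellow' or 'red' once the given nodes are gone."""
    surviving = {
        shard: [node for node in nodes if node not in failed_nodes]
        for shard, nodes in placement.items()
    }
    if any(not nodes for nodes in surviving.values()):
        return "red"     # some shard has no copy left: data is unavailable
    if any(len(nodes) < expected_copies for nodes in surviving.values()):
        return "yellow"  # all data is reachable, but some redundancy is lost
    return "green"


if __name__ == "__main__":
    with_replica = place_copies(primaries=5, replicas=1, nodes=10)
    without_replica = place_copies(primaries=5, replicas=0, nodes=10)
    print(health_after_failure(with_replica, 2, {3}))     # yellow
    print(health_after_failure(without_replica, 1, {3}))  # red
```

With one replica per shard, losing node 3 leaves every shard with at least one copy, so the data stays available. Without replicas, the same failure takes a shard offline, no matter how many nodes the cluster has.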
https://discuss.elastic.co/t/elastic-master-node-high-cpu/228347