ElasticSearch Essentials, Part 2

Advanced Elasticsearch administration: setup, memory configuration, performance troubleshooting, index rollover, and node upgrades.

Author: Piotr Duda (https://x.com/duqens), Software Engineer at Splunk @ Cisco
Table of Contents

- Basic Setup
  - Installation Steps
  - Essential Configuration Files
- Determine How Much RAM is Needed
  - General Guidelines
  - Factors Affecting RAM Requirements
  - Production Recommendations
  - Checking Current Memory Usage
- Configure Heap
  - Heap Size Guidelines
  - Configuration Steps
  - Important Notes
  - Verifying Heap Configuration
- POV/Single Node Deployments
  - When to Use Single-Node
  - Limitations
  - Single-Node Configuration
  - Resource Requirements for POV
- Issues with Performance
  - Common Performance Issues
    - 1. High CPU Usage
    - 2. Memory Pressure
    - 3. Slow Queries
    - 4. Cluster Health Issues
  - Performance Monitoring
- Rolling Over an Index
  - When Indexes Roll Over
  - Manual Index Rollover
    - Parameters
  - Pre-Rollover Checklist
  - Common Rollover Scenarios
  - Index Lifecycle Management (ILM)
- Upgrading/Patching Nodes
  - Pre-Upgrade Checklist
  - Rolling Upgrade Process
  - Post-Upgrade Verification
  - Important Notes
- Single Node Deployment
  - Configuration
  - Starting Single Node
  - Verification
  - Limitations and Considerations
  - Resource Requirements
  - When to Scale
- Best Practices Summary
- References
Basic Setup
Setting up Elasticsearch involves installing the software, configuring essential settings, and ensuring the environment is optimized for performance. Begin by downloading the appropriate version of Elasticsearch for your operating system from the official website.
Installation Steps
1. Download and Extract: Download the Elasticsearch distribution and extract it to your desired location.
2. Configure Basic Settings: Edit config/elasticsearch.yml to set:
   - Cluster name
   - Node name
   - Network host (if needed)
   - Discovery settings (for multi-node clusters)
3. Set Environment Variables: Configure ES_HOME and ensure Java is properly installed and accessible.
4. Start Elasticsearch: Run the startup script:
   ./bin/elasticsearch
5. Verify Installation: Check that Elasticsearch is running:
   curl http://localhost:9200
Essential Configuration Files
- config/elasticsearch.yml: Main configuration file
- config/jvm.options: JVM heap and garbage collection settings
- config/log4j2.properties: Logging configuration
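Tying the basic settings together, here is an illustrative sketch of a small config/elasticsearch.yml; every value below (cluster name, node name, addresses) is a placeholder, not a default:

```yaml
# config/elasticsearch.yml -- illustrative placeholder values
cluster.name: my-cluster                         # must match on every node
node.name: node-1                                # unique per node
network.host: 0.0.0.0                            # bind address; omit to stay loopback-only
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2"]   # peers for multi-node discovery
cluster.initial_master_nodes: ["node-1"]         # used on first cluster bootstrap only
```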
Determine How Much RAM is Needed
Memory requirements for Elasticsearch depend on your workload, data volume, and cluster size. Proper RAM allocation is critical for performance and stability.
General Guidelines
As per AppDynamics recommendations for Event Service deployments (which use Elasticsearch), allocate half of the available RAM to the Elasticsearch process, with:
- Minimum: 7 GB heap
- Maximum: 31 GB heap
- Optimal Production: At least 62 GB of RAM on the system/host machine
Factors Affecting RAM Requirements
- License Units: Higher license unit consumption requires more RAM
- Event Volume: Transaction Analytics and Log Analytics events impact memory needs
- Cluster Size: More nodes can distribute the load, but each node still needs adequate RAM
- Query Complexity: Complex queries and aggregations consume more memory
Production Recommendations
For production environments:
- Minimum: 62 GB RAM per node
- Optimal: 122 GB RAM per node (i2.4xlarge equivalent)
- High-Volume: 244 GB RAM per node (i2.8xlarge equivalent)
- Node Count: At least 3 nodes for production (avoid single-node in production)
Systems with less than 62 GB of RAM should be used only for POV and testing purposes.
Checking Current Memory Usage
To check how much RAM the Elasticsearch process is using:
# Find the Elasticsearch process ID
netstat -anp | grep 9200 | grep LISTEN
# Check memory consumption (replace <pid> with actual process ID)
echo 0 $(awk '/Rss/ {print "+", $2}' /proc/<pid>/smaps) | bc
The output is in KB. Compare this against your heap configuration and available system RAM.
Configure Heap
Heap memory configuration determines the amount of memory allocated to the Java Virtual Machine (JVM) heap. The heap is the runtime data area from which memory for all class instances and arrays is allocated.
Heap Size Guidelines
- Allocation Rule: Set heap to 50% of available RAM, up to a maximum of 31 GB
- Why 31 GB?: Above roughly 31 GB, the JVM can no longer use compressed object pointers (compressed oops) and falls back to full 64-bit pointers, which increases memory overhead
- Why 50%?: The remaining RAM is needed for:
- Operating system operations
- File system cache (Lucene uses this for better performance)
- Other system processes
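The 50% rule and the 31 GB cap combine into a simple calculation; a minimal sketch, where recommended_heap_gb is a hypothetical helper rather than anything Elasticsearch ships:

```shell
# Heap sizing rule of thumb: half of system RAM, capped at 31 GB
recommended_heap_gb() {
  total_ram_gb=$1
  heap=$((total_ram_gb / 2))
  if [ "$heap" -gt 31 ]; then
    heap=31
  fi
  echo "$heap"
}

recommended_heap_gb 16    # half of 16 GB -> prints 8
recommended_heap_gb 122   # 61 GB would break the cap -> prints 31
```

Whatever value comes out should be written identically into -Xms and -Xmx.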
Configuration Steps
1. Edit config/jvm.options:
   # Set initial and maximum heap to the same value (prevents resizing)
   -Xms31g
   -Xmx31g
2. For systems with less RAM, adjust accordingly:
   # Example for 16 GB system: allocate 7-8 GB to heap
   -Xms7g
   -Xmx7g
3. Restart Elasticsearch after making changes:
   ./bin/elasticsearch -d
Important Notes
- Always set -Xms and -Xmx to the same value to prevent heap resizing during runtime
- Never allocate more than 50% of RAM to heap
- Never exceed 31 GB heap size
- Ensure total system RAM is sufficient (heap + OS + file cache)
Verifying Heap Configuration
After restart, verify the heap settings:
# Check JVM settings
curl -s 'http://localhost:9200/_nodes/jvm?pretty'
Look for heap_max_in_bytes in the response to confirm your settings.
POV/Single Node Deployments
Proof of Value (POV) and single-node deployments are suitable for development, testing, and evaluation purposes, but not recommended for production.
When to Use Single-Node
- Development and testing environments
- POV demonstrations
- Learning and experimentation
- Low-volume, non-critical applications
Limitations
- No High Availability: Single node failure means complete cluster unavailability
- No Data Replication: Risk of data loss if the node fails
- Performance Constraints: All roles run on one node, causing resource contention
- No Rolling Upgrades: Must take full downtime for maintenance
Single-Node Configuration
For a single-node cluster, set in config/elasticsearch.yml:
discovery.type: single-node
This tells the node to elect itself as master and skip discovery of other nodes, allowing the cluster to form with just one node.
Resource Requirements for POV
Even for POV, ensure minimum resources:
- RAM: At least 16 GB (8 GB heap minimum)
- CPU: 4+ cores
- Storage: SSD recommended, sufficient space for data
Issues with Performance
Performance issues in Elasticsearch can manifest as slow queries, high CPU usage, memory pressure, or cluster instability.
Common Performance Issues
1. High CPU Usage
Symptoms:
- Slow query responses
- High CPU utilization on nodes
- Cluster lag
Troubleshooting:
# Check node stats
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,cpu,load_1m,load_5m,load_15m'
# Check thread pool stats
curl -s 'http://localhost:9200/_cat/thread_pool?v'
Solutions:
- Optimize queries (avoid deep pagination, use filters instead of queries where possible)
- Increase node resources (CPU/RAM)
- Add more nodes to distribute load
- Review and optimize index mappings
2. Memory Pressure
Symptoms:
- Frequent garbage collection pauses
- Out of Memory (OOM) errors
- Process termination by OS
Troubleshooting:
# Check if Elasticsearch process was killed
dmesg | grep <process-id>
# Look for OOM killer messages
dmesg | grep -i "out of memory"
dmesg | grep -i "oom-kill"
Example OOM Output:
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1010.slice/session-978.scope,task=java,pid=495838,uid=1010
Out of memory: Killed process 495838 (java) total-vm:25599008kB, anon-rss:19702248kB
Solutions:
- Increase heap size (within 31 GB limit)
- Increase system RAM
- Reduce index shard count
- Optimize queries to use less memory
- Enable circuit breakers to prevent OOM
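On the last point: circuit breakers are on by default, so the tunable part is the limit at which they trip. A sketch of lowering the parent breaker so oversized requests are rejected before the heap fills; 70% is an illustrative value, not a recommendation:

```
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%"
  }
}
```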
3. Slow Queries
Symptoms:
- Query timeouts
- Slow search responses
- High latency
Troubleshooting:
# Enable slow log
PUT /my_index/_settings
{
"index.search.slowlog.threshold.query.warn": "10s",
"index.search.slowlog.threshold.query.info": "5s"
}
# Confirm the thresholds were applied
GET /my_index/_settings?pretty
Queries that breach the thresholds are then written to the search slow log file in the logs directory.
Solutions:
- Optimize query structure
- Use filters instead of queries for exact matches
- Add appropriate index mappings
- Consider using aggregations efficiently
- Review shard allocation
4. Cluster Health Issues
Symptoms:
- Yellow or red cluster status
- Unassigned shards
- Node failures
Troubleshooting:
# Check cluster health
curl -s 'http://localhost:9200/_cat/health?v'
# Check for unassigned shards
curl -s 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED
# Check node status
curl -s 'http://localhost:9200/_cat/nodes?v'
Solutions:
- Ensure cluster is in “green” status before operations
- Fix unassigned shards
- Ensure sufficient nodes for replica allocation
- Check disk space and permissions
Performance Monitoring
Regular monitoring helps identify issues early:
# Cluster health
curl -s 'http://localhost:9200/_cat/health?v'
# Node stats
curl -s 'http://localhost:9200/_nodes/stats?pretty'
# Index stats
curl -s 'http://localhost:9200/_cat/indices?v'
# Check processes
netstat -tulpn | grep 9200
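When scripting these checks, the status column can be pulled out of the _cat/health table with awk; a sketch that runs against a captured sample instead of a live cluster (in real use you would pipe curl -s 'http://localhost:9200/_cat/health?v' into it):

```shell
# Print the "status" column from the two-line header+row output of _cat/health?v
health_status() {
  awk 'NR==1 {for (i = 1; i <= NF; i++) if ($i == "status") c = i} NR==2 {print $c}'
}

# Captured sample output standing in for a live curl call
sample='epoch      timestamp cluster    status node.total node.data shards pri relo init unassign
1700000000 10:00:00  my-cluster green  3          3         10     5   0    0    0'

printf '%s\n' "$sample" | health_status   # prints green
```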
Rolling Over an Index
Index rollover is the process of creating a new index when the current one reaches a certain size or age. This is essential for managing large datasets and maintaining performance.
When Indexes Roll Over
Indexes typically roll over when:
- Average shard size breaches a threshold
- Index age exceeds the data retention period
- Manual rollover is triggered
Manual Index Rollover
For Event Service deployments using Elasticsearch 8, use the following curl command:
curl -XPOST http://<host>:<port>/v1/admin/cluster/<cluster>/index/<index>/rollover \
-H"Authorization: Basic <key>" \
-H"Content-Type: application/json" \
-H"Accept: application/json" \
-d '{"numberOfShards": "2"}'
Parameters
Replace the following values:
- <host>: Hostname (use localhost if running from the Event Service CLI)
- <port>: Event Service API port (default 9080). To find it:
  grep ad.dw.http.port events-service-api-store.properties
- <cluster>: Cluster name. To find it:
  grep ad.es.cluster.name events-service-api-store.properties
- <index>: Name of the index to roll over. To list indexes:
  curl http://localhost:9200/_cat/indices?v
- <key>: Base64-encoded value of ad.accountmanager.key.ops from the properties file. To encode it:
  echo -n "<value-from-ad.accountmanager.key.ops>" | base64
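The encoding step for <key>, shown on its own; "example-ops-key" is a stand-in for the real ad.accountmanager.key.ops value (echo -n / printf matters, because a trailing newline would end up inside the encoded key):

```shell
# Encode the ops key for the Authorization: Basic header
ops_key="example-ops-key"               # stand-in value, not a real key
encoded=$(printf '%s' "$ops_key" | base64)
echo "$encoded"

# Sanity check: decoding must round-trip to the original value
printf '%s' "$encoded" | base64 -d      # prints example-ops-key
```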
Pre-Rollover Checklist
Before rolling over any index, ensure cluster health is green:
curl -s 'http://localhost:9200/_cat/health?v'
Verify:
- Cluster status is “green”
- No unassigned shards
- All nodes are healthy
After rollover, verify again:
curl -s 'http://localhost:9200/_cat/health?v'
Common Rollover Scenarios
- After Migration: If data older than retention period was migrated, indexes may not roll over automatically
- Size Threshold: When index size exceeds recommended limits
- Maintenance: During scheduled maintenance windows
Index Lifecycle Management (ILM)
For automated rollover, consider implementing ILM policies:
PUT _ilm/policy/my_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "50GB",
"max_age": "30d"
}
}
}
}
}
}
Upgrading/Patching Nodes
Regular updates ensure access to new features, performance improvements, and security patches. Follow a rolling upgrade process to minimize downtime.
Pre-Upgrade Checklist
1. Backup Data: Always create a snapshot before upgrading
   # Create snapshot repository
   PUT /_snapshot/my_backup
   {
     "type": "fs",
     "settings": {
       "location": "/path/to/backup"
     }
   }
   # Create snapshot
   PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
2. Check Compatibility: Ensure plugins and configurations are compatible with the new version
3. Review Release Notes: Check for breaking changes and migration requirements
4. Verify Cluster Health: Ensure cluster is in “green” status
   curl -s 'http://localhost:9200/_cat/health?v'
Rolling Upgrade Process
For Event Service deployments, follow these steps:
1. Disable Shard Allocation (on the node to be upgraded):
   PUT /_cluster/settings
   {
     "persistent": {
       "cluster.routing.allocation.enable": "none"
     }
   }
2. Stop Elasticsearch on the node:
   # Find process
   netstat -anp | grep 9200
   # Stop gracefully
   kill -SIGTERM <pid>
3. Upgrade/Patch the node:
   - Install new version
   - Update configuration files if needed
   - Verify Java version compatibility
4. Start Elasticsearch:
   ./bin/elasticsearch -d
5. Verify Node Joined:
   curl -s 'http://localhost:9200/_cat/nodes?v'
6. Re-enable Shard Allocation:
   PUT /_cluster/settings
   {
     "persistent": {
       "cluster.routing.allocation.enable": "all"
     }
   }
7. Wait for Cluster to Stabilize:
   # Monitor until green
   curl -s 'http://localhost:9200/_cat/health?v'
8. Repeat for each node, one at a time
Post-Upgrade Verification
After all nodes are upgraded:
# Check cluster health
curl -s 'http://localhost:9200/_cat/health?v'
# Verify version
curl -s 'http://localhost:9200/'
# Check all nodes are on new version
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,version'
Important Notes
- Never upgrade more than one node at a time in a multi-node cluster
- Maintain cluster quorum: Ensure majority of master-eligible nodes are available
- Test in non-production first: Always test upgrade process in a test environment
- Have rollback plan: Keep backups and know how to restore if needed
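The "wait for the cluster to stabilize" step can be scripted as a bounded polling loop; a sketch, where wait_for_green and get_status are hypothetical helpers, and the stub get_status stands in for curl -s 'http://localhost:9200/_cat/health?h=status':

```shell
# Poll cluster status until green, with a bounded number of tries
wait_for_green() {
  tries=$1
  while [ "$tries" -gt 0 ]; do
    [ "$(get_status)" = "green" ] && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Demonstration stub: yellow twice, then green. The counter lives in a file
# because $(get_status) runs in a subshell that cannot mutate parent variables.
get_status() {
  n=$(cat /tmp/green_counter)
  n=$((n + 1))
  echo "$n" > /tmp/green_counter
  if [ "$n" -ge 3 ]; then echo green; else echo yellow; fi
}

echo 0 > /tmp/green_counter
wait_for_green 10 && echo "cluster is green"
```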
Single Node Deployment
For development, testing, or POV scenarios, you may need to run Elasticsearch as a single-node cluster.
Configuration
Set in config/elasticsearch.yml:
# Single node configuration
discovery.type: single-node
# Optional: reduce resource requirements
cluster.routing.allocation.disk.threshold_enabled: false
Starting Single Node
# Start Elasticsearch
./bin/elasticsearch
# Or as daemon
./bin/elasticsearch -d
Verification
# Check cluster status (should show 1 node)
curl -s 'http://localhost:9200/_cat/nodes?v'
# Check cluster health
curl -s 'http://localhost:9200/_cat/health?v'
Limitations and Considerations
- No Replication: Data is stored only once, no redundancy
- No High Availability: Node failure means complete downtime
- Resource Constraints: All roles (data, master, ingest) run on one node
- Not for Production: Use only for development, testing, or POV
Resource Requirements
Even for single-node deployments:
- Minimum RAM: 8 GB (4 GB heap)
- Recommended RAM: 16 GB (8 GB heap)
- CPU: 2-4 cores minimum
- Storage: SSD recommended
When to Scale
Consider moving to a multi-node cluster when:
- Moving to production
- Requiring high availability
- Handling production workloads
- Need for better performance
Best Practices Summary
1. Memory Management:
   - Allocate 50% of RAM to heap (max 31 GB)
   - Monitor memory usage regularly
   - Ensure sufficient system RAM
2. Cluster Health:
   - Always check cluster health before operations
   - Ensure “green” status before rollovers/upgrades
   - Monitor for unassigned shards
3. Performance:
   - Optimize queries and mappings
   - Monitor CPU and memory usage
   - Use appropriate shard sizes (10-50 GB)
4. Operations:
   - Always backup before upgrades
   - Use rolling upgrades for zero downtime
   - Test changes in non-production first
5. Production:
   - Minimum 3 nodes for production
   - At least 62 GB RAM per node
   - Never use single-node in production