ElasticSearch essentials; part 2

Advanced Elasticsearch administration: setup, memory configuration, performance troubleshooting, index rollover, and node upgrades.

Table of Contents

  1. Basic Setup
  2. Determine How Much RAM is Needed
  3. Configure Heap
  4. POV/Single Node Deployments
  5. Issues with Performance
  6. Rolling Over an Index
  7. Upgrading/Patching Nodes
  8. Single Node Deployment

Basic Setup

Setting up Elasticsearch involves installing the software, configuring essential settings, and ensuring the environment is optimized for performance. Begin by downloading the appropriate version of Elasticsearch for your operating system from the official website.

Installation Steps

  1. Download and Extract: Download the Elasticsearch distribution and extract it to your desired location.

  2. Configure Basic Settings: Edit config/elasticsearch.yml to set:

    • Cluster name
    • Node name
    • Network host (if needed)
    • Discovery settings (for multi-node clusters)
  3. Set Environment Variables: Configure ES_HOME and ensure Java is properly installed and accessible.

  4. Start Elasticsearch: Run the startup script:

    ./bin/elasticsearch
  5. Verify Installation: Check that Elasticsearch is running:

    curl http://localhost:9200

Essential Configuration Files

  • config/elasticsearch.yml: Main configuration file
  • config/jvm.options: JVM heap and garbage collection settings
  • config/log4j2.properties: Logging configuration
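As a concrete starting point, the basic settings from step 2 can be written out in one go. This is a minimal sketch; the cluster name, node name, and host names below are placeholder values, not defaults, and must be adjusted for your environment:

```shell
# Write a minimal example config/elasticsearch.yml.
# All values below are illustrative placeholders.
mkdir -p config
cat > config/elasticsearch.yml <<'EOF'
cluster.name: my-demo-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["host1", "host2"]
cluster.initial_master_nodes: ["node-1"]
EOF
```

The last two lines only apply to multi-node clusters; a single-node deployment replaces them with `discovery.type: single-node` (covered later in this article).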

Determine How Much RAM is Needed

Memory requirements for Elasticsearch depend on your workload, data volume, and cluster size. Proper RAM allocation is critical for performance and stability.

General Guidelines

As per AppDynamics recommendations for Event Service deployments (which use Elasticsearch), allocate half of the available RAM to the Elasticsearch process, with:

  • Minimum: 7 GB heap
  • Maximum: 31 GB heap
  • Optimal Production: At least 62 GB of RAM on the system/host machine

Factors Affecting RAM Requirements

  1. License Units: Higher license unit consumption requires more RAM
  2. Event Volume: Transaction Analytics and Log Analytics events impact memory needs
  3. Cluster Size: More nodes can distribute the load, but each node still needs adequate RAM
  4. Query Complexity: Complex queries and aggregations consume more memory

Production Recommendations

For production environments:

  • Minimum: 62 GB RAM per node
  • Optimal: 122 GB RAM per node (i2.4xlarge equivalent)
  • High-Volume: 244 GB RAM per node (i2.8xlarge equivalent)
  • Node Count: At least 3 nodes for production (avoid single-node in production)

Systems with less than 62 GB RAM should be used only for POV and testing purposes.

Checking Current Memory Usage

To check how much RAM the Elasticsearch process is using:

# Find the Elasticsearch process ID
netstat -anp | grep 9200 | grep LISTEN

# Check memory consumption (replace <pid> with actual process ID)
echo 0 $(awk '/Rss/ {print "+", $2}' /proc/<pid>/smaps) | bc

The output is in KB. Compare this against your heap configuration and available system RAM.
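The awk/bc pipeline above can also be written as a single awk pass that sums the Rss lines and reports the result in MB directly (a small sketch; smaps reports Rss in KB):

```shell
# Sum all Rss lines from a smaps file and print the total in MB.
# Usage: rss_mb /proc/<pid>/smaps
rss_mb() {
  awk '/^Rss:/ {kb += $2} END {printf "%d\n", kb / 1024}' "$1"
}
```

For example, `rss_mb /proc/1234/smaps` prints the resident memory of process 1234 in MB.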

Configure Heap

Heap memory configuration determines the amount of memory allocated to the Java Virtual Machine (JVM) heap. The heap is the runtime data area from which memory for all class instances and arrays is allocated.

Heap Size Guidelines

  • Allocation Rule: Set heap to 50% of available RAM, up to a maximum of 31 GB
  • Why 31 GB?: Above roughly 31 GB, the JVM can no longer use compressed object pointers (compressed oops), so every object reference takes a full 64 bits, which increases memory overhead and wastes part of the larger heap
  • Why 50%?: The remaining RAM is needed for:
    • Operating system operations
    • File system cache (Lucene uses this for better performance)
    • Other system processes
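The allocation rule above (50% of RAM, capped at 31 GB) is simple enough to express as a small helper. This is a sketch; it takes total system RAM in GB and prints the value to use for both -Xms and -Xmx:

```shell
# Suggest a heap size in GB: half of total RAM, capped at 31 GB.
suggest_heap_gb() {
  local total_gb=$1
  local heap=$(( total_gb / 2 ))
  [ "$heap" -gt 31 ] && heap=31
  echo "$heap"
}
```

For example, `suggest_heap_gb 16` prints 8, while `suggest_heap_gb 122` prints 31 because of the cap.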

Configuration Steps

  1. Edit config/jvm.options:

    # Set initial and maximum heap to the same value (prevents resizing)
    -Xms31g
    -Xmx31g
  2. For systems with less RAM, adjust accordingly:

    # Example for 16 GB system: allocate 7-8 GB to heap
    -Xms7g
    -Xmx7g
  3. Restart Elasticsearch after making changes:

    ./bin/elasticsearch -d

Important Notes

  • Always set -Xms and -Xmx to the same value to prevent heap resizing during runtime
  • Never allocate more than 50% of RAM to heap
  • Never exceed 31 GB heap size
  • Ensure total system RAM is sufficient (heap + OS + file cache)

Verifying Heap Configuration

After restart, verify the heap settings:

# Check JVM settings
curl -s 'http://localhost:9200/_nodes/jvm?pretty'

Look for heap_max_in_bytes in the response to confirm your settings.
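Rather than eyeballing the byte count, you can extract and convert it. A sketch using grep/awk against a saved response file (the JSON value in the test is illustrative):

```shell
# Extract heap_max_in_bytes from a saved _nodes/jvm response and print GB.
# Usage: heap_gb_from_response /tmp/jvm.json
heap_gb_from_response() {
  grep -o '"heap_max_in_bytes" *: *[0-9]*' "$1" \
    | awk -F': *' '{printf "%d\n", $2 / (1024*1024*1024)}'
}
```

Save the curl output to a file first, e.g. `curl -s 'http://localhost:9200/_nodes/jvm?pretty' > /tmp/jvm.json`, then run the helper on it.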

POV/Single Node Deployments

Proof of Value (POV) and single-node deployments are suitable for development, testing, and evaluation purposes, but not recommended for production.

When to Use Single-Node

  • Development and testing environments
  • POV demonstrations
  • Learning and experimentation
  • Low-volume, non-critical applications

Limitations

  1. No High Availability: Single node failure means complete cluster unavailability
  2. No Data Replication: Risk of data loss if the node fails
  3. Performance Constraints: All roles run on one node, causing resource contention
  4. No Rolling Upgrades: Must take full downtime for maintenance

Single-Node Configuration

For a single-node cluster, set in config/elasticsearch.yml:

discovery.type: single-node

This tells the node to elect itself as master and skips the multi-node discovery bootstrap checks, allowing the cluster to form with just one node.

Resource Requirements for POV

Even for POV, ensure minimum resources:

  • RAM: At least 16 GB (8 GB heap minimum)
  • CPU: 4+ cores
  • Storage: SSD recommended, sufficient space for data

Issues with Performance

Performance issues in Elasticsearch can manifest as slow queries, high CPU usage, memory pressure, or cluster instability.

Common Performance Issues

1. High CPU Usage

Symptoms:

  • Slow query responses
  • High CPU utilization on nodes
  • Cluster lag

Troubleshooting:

# Check node stats
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,cpu,load_1m,load_5m,load_15m'

# Check thread pool stats
curl -s 'http://localhost:9200/_cat/thread_pool?v'

Solutions:

  • Optimize queries (avoid deep pagination, use filters instead of queries where possible)
  • Increase node resources (CPU/RAM)
  • Add more nodes to distribute load
  • Review and optimize index mappings

2. Memory Pressure

Symptoms:

  • Frequent garbage collection pauses
  • Out of Memory (OOM) errors
  • Process termination by OS

Troubleshooting:

# Check if Elasticsearch process was killed
dmesg | grep <process-id>

# Look for OOM killer messages
dmesg | grep -i "out of memory"
dmesg | grep -i "oom-kill"

Example OOM Output:

oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1010.slice/session-978.scope,task=java,pid=495838,uid=1010
Out of memory: Killed process 495838 (java) total-vm:25599008kB, anon-rss:19702248kB
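Given a kernel log line like the one above, the killed PID and its resident memory can be pulled out mechanically (a sketch parsing the "Killed process" line format shown in the sample):

```shell
# Extract the PID and anon RSS (in kB) from an OOM-killer "Killed process" line.
oom_pid()    { sed -n 's/.*Killed process \([0-9]*\).*/\1/p'; }
oom_rss_kb() { sed -n 's/.*anon-rss:\([0-9]*\)kB.*/\1/p'; }
```

For example, `dmesg | grep 'Killed process' | oom_pid` prints the PID of the most recently OOM-killed process; comparing the anon-rss value against your heap setting shows how far past the configured heap the process had grown.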

Solutions:

  • Increase heap size (within 31 GB limit)
  • Increase system RAM
  • Reduce index shard count
  • Optimize queries to use less memory
  • Enable circuit breakers to prevent OOM

3. Slow Queries

Symptoms:

  • Query timeouts
  • Slow search responses
  • High latency

Troubleshooting:

# Enable slow log
PUT /my_index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s"
}

# Slow queries are written to the search slow log file on each node, e.g.:
tail -f logs/<cluster-name>_index_search_slowlog.log

Solutions:

  • Optimize query structure
  • Use filters instead of queries for exact matches
  • Add appropriate index mappings
  • Consider using aggregations efficiently
  • Review shard allocation

4. Cluster Health Issues

Symptoms:

  • Yellow or red cluster status
  • Unassigned shards
  • Node failures

Troubleshooting:

# Check cluster health
curl -s 'http://localhost:9200/_cat/health?v'

# Check for unassigned shards
curl -s 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED

# Check node status
curl -s 'http://localhost:9200/_cat/nodes?v'

Solutions:

  • Ensure cluster is in “green” status before operations
  • Fix unassigned shards
  • Ensure sufficient nodes for replica allocation
  • Check disk space and permissions
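To turn the `_cat/shards` output into a number you can alert on, count the rows whose state column reads UNASSIGNED (a sketch; the sample lines in the test are illustrative of the `_cat/shards` column layout: index, shard, prirep, state, ...):

```shell
# Count UNASSIGNED shards in _cat/shards output read on stdin
# (state is the 4th column).
count_unassigned() {
  awk '$4 == "UNASSIGNED" {n++} END {print n+0}'
}
```

Typical usage: `curl -s 'http://localhost:9200/_cat/shards' | count_unassigned`; a non-zero result means replicas or primaries could not be allocated.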

Performance Monitoring

Regular monitoring helps identify issues early:

# Cluster health
curl -s 'http://localhost:9200/_cat/health?v'

# Node stats
curl -s 'http://localhost:9200/_nodes/stats?pretty'

# Index stats
curl -s 'http://localhost:9200/_cat/indices?v'

# Check processes
netstat -tulpn | grep 9200

Rolling Over an Index

Index rollover is the process of creating a new index when the current one reaches a certain size or age. This is essential for managing large datasets and maintaining performance.

When Indexes Roll Over

Indexes typically roll over when:

  • Average shard size breaches a threshold
  • Index age exceeds the data retention period
  • Manual rollover is triggered

Manual Index Rollover

For Event Service deployments using Elasticsearch 8, use the following curl command:

curl -XPOST http://<host>:<port>/v1/admin/cluster/<cluster>/index/<index>/rollover \
  -H"Authorization: Basic <key>" \
  -H"Content-Type: application/json" \
  -H"Accept: application/json" \
  -d '{"numberOfShards": "2"}'

Parameters

Replace the following values:

  • <host>: Hostname (use localhost if running from Event Service CLI)
  • <port>: Event Service port (default 9080)
    grep ad.dw.http.port events-service-api-store.properties
  • <cluster>: Cluster name
    grep ad.es.cluster.name events-service-api-store.properties
  • <index>: Index name to rollover
    curl http://localhost:9200/_cat/indices?v
  • <key>: Base64 encoded ad.accountmanager.key.ops from properties file
    echo -n "<value-from-ad.accountmanager.key.ops>" | base64
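As a worked example of the key encoding step, suppose the ops key value were the hypothetical string `secret` (stand-in only; use the real value from your properties file). Note that the newline must be suppressed, otherwise it gets encoded into the key and authorization fails:

```shell
# Base64-encode the ops key. printf (or echo -n) avoids encoding a
# trailing newline into the key. "secret" is a placeholder value.
key_b64=$(printf '%s' "secret" | base64)
echo "$key_b64"
# prints: c2VjcmV0
```

The resulting string is what goes after `Basic ` in the Authorization header.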

Pre-Rollover Checklist

Before rolling over any index, ensure cluster health is green:

curl -s 'http://localhost:9200/_cat/health?v'

Verify:

  • Cluster status is “green”
  • No unassigned shards
  • All nodes are healthy

After rollover, verify again:

curl -s 'http://localhost:9200/_cat/health?v'
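The before/after checks can be wrapped in a small guard that succeeds only when the status column of the `_cat/health?v` output reads green (a sketch; status is the 4th column of the data row):

```shell
# Succeed only if _cat/health?v output (header + one data row) reports green.
is_green() {
  awk 'NR == 2 { ok = ($4 == "green") } END { exit (ok ? 0 : 1) }'
}
```

Typical usage before a rollover: `curl -s 'http://localhost:9200/_cat/health?v' | is_green || { echo "cluster not green, aborting"; exit 1; }`.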

Common Rollover Scenarios

  1. After Migration: If data older than retention period was migrated, indexes may not roll over automatically
  2. Size Threshold: When index size exceeds recommended limits
  3. Maintenance: During scheduled maintenance windows

Index Lifecycle Management (ILM)

For automated rollover, consider implementing ILM policies:

PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      }
    }
  }
}

Upgrading/Patching Nodes

Regular updates ensure access to new features, performance improvements, and security patches. Follow a rolling upgrade process to minimize downtime.

Pre-Upgrade Checklist

  1. Backup Data: Always create a snapshot before upgrading

    # Create snapshot repository
    PUT /_snapshot/my_backup
    {
      "type": "fs",
      "settings": {
        "location": "/path/to/backup"
      }
    }
    
    # Create snapshot
    PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
  2. Check Compatibility: Ensure plugins and configurations are compatible with the new version

  3. Review Release Notes: Check for breaking changes and migration requirements

  4. Verify Cluster Health: Ensure cluster is in “green” status

    curl -s 'http://localhost:9200/_cat/health?v'

Rolling Upgrade Process

For Event Service deployments, follow these steps:

  1. Disable Shard Allocation (on the node to be upgraded):

    PUT /_cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.enable": "none"
      }
    }
  2. Stop Elasticsearch on the node:

    # Find process
    netstat -anp | grep 9200
    
    # Stop gracefully
    kill -SIGTERM <pid>
  3. Upgrade/Patch the node:

    • Install new version
    • Update configuration files if needed
    • Verify Java version compatibility
  4. Start Elasticsearch:

    ./bin/elasticsearch -d
  5. Verify Node Joined:

    curl -s 'http://localhost:9200/_cat/nodes?v'
  6. Re-enable Shard Allocation:

    PUT /_cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.enable": "all"
      }
    }
  7. Wait for Cluster to Stabilize:

    # Monitor until green
    curl -s 'http://localhost:9200/_cat/health?v'
  8. Repeat for each node, one at a time
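Steps 1 and 6 above use the same API with a different value, so they can be wrapped in one helper that you call before and after each node's restart (a sketch; `ES_URL` is an assumed variable pointing at any reachable node):

```shell
# Toggle shard allocation around a node restart.
# ES_URL is assumed to point at a reachable node in the cluster.
ES_URL=${ES_URL:-http://localhost:9200}

set_allocation() {
  # $1 is "none" (before stopping a node) or "all" (after it rejoins)
  curl -s -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "{\"persistent\":{\"cluster.routing.allocation.enable\":\"$1\"}}"
}
```

Usage per node: `set_allocation none`, stop and upgrade the node, start it, wait for it to rejoin, then `set_allocation all` and wait for green before moving to the next node.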

Post-Upgrade Verification

After all nodes are upgraded:

# Check cluster health
curl -s 'http://localhost:9200/_cat/health?v'

# Verify version
curl -s 'http://localhost:9200/'

# Check all nodes are on new version
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,version'
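The name/version listing can be reduced to a single number: how many distinct versions are running. A sketch (pipe in the `_cat/nodes?h=name,version` output without the `v` header flag, or skip the header line first; the sample in the test is illustrative):

```shell
# Count distinct versions in "name version" lines read on stdin.
# After a complete upgrade this should print 1.
distinct_versions() {
  awk '{print $2}' | sort -u | wc -l | tr -d ' '
}
```

Typical usage: `curl -s 'http://localhost:9200/_cat/nodes?h=name,version' | distinct_versions`; anything greater than 1 means the rolling upgrade is not yet complete.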

Important Notes

  • Never upgrade more than one node at a time in a multi-node cluster
  • Maintain cluster quorum: Ensure majority of master-eligible nodes are available
  • Test in non-production first: Always test upgrade process in a test environment
  • Have rollback plan: Keep backups and know how to restore if needed

Single Node Deployment

For development, testing, or POV scenarios, you may need to run Elasticsearch as a single-node cluster.

Configuration

Set in config/elasticsearch.yml:

# Single node configuration
discovery.type: single-node

# Optional: reduce resource requirements
cluster.routing.allocation.disk.threshold_enabled: false

Starting Single Node

# Start Elasticsearch
./bin/elasticsearch

# Or as daemon
./bin/elasticsearch -d

Verification

# Check cluster status (should show 1 node)
curl -s 'http://localhost:9200/_cat/nodes?v'

# Check cluster health
curl -s 'http://localhost:9200/_cat/health?v'

Limitations and Considerations

  1. No Replication: Data is stored only once, no redundancy
  2. No High Availability: Node failure means complete downtime
  3. Resource Constraints: All roles (data, master, ingest) run on one node
  4. Not for Production: Use only for development, testing, or POV

Resource Requirements

Even for single-node deployments:

  • Minimum RAM: 8 GB (4 GB heap)
  • Recommended RAM: 16 GB (8 GB heap)
  • CPU: 2-4 cores minimum
  • Storage: SSD recommended

When to Scale

Consider moving to a multi-node cluster when:

  • Moving to production
  • Requiring high availability
  • Handling production workloads
  • Need for better performance

Best Practices Summary

  1. Memory Management:

    • Allocate 50% of RAM to heap (max 31 GB)
    • Monitor memory usage regularly
    • Ensure sufficient system RAM
  2. Cluster Health:

    • Always check cluster health before operations
    • Ensure “green” status before rollovers/upgrades
    • Monitor for unassigned shards
  3. Performance:

    • Optimize queries and mappings
    • Monitor CPU and memory usage
    • Use appropriate shard sizes (10-50 GB)
  4. Operations:

    • Always backup before upgrades
    • Use rolling upgrades for zero downtime
    • Test changes in non-production first
  5. Production:

    • Minimum 3 nodes for production
    • At least 62 GB RAM per node
    • Never use single-node in production
