Monitoring Elasticsearch health via HTTP APIs is the gold standard for operations. In fact, it is exactly how official tools (like Kibana or Elastic Agent) and third-party tools (like Prometheus, Grafana, or Datadog) talk to your cluster.
Elasticsearch provides incredibly descriptive, lightweight JSON monitoring endpoints right out of the box.
If you want to build a health-checking script or connect a monitoring tool, here are the most important native HTTP APIs you should target, ranked from "high-level" to "deep dive."
1. The Quick Check: The Cluster Health API
This is the absolute first API you should hit. It gives you an instant, high-level overview of whether your database is working or failing.
```json
GET /_cluster/health
```
Key Fields to Watch in the Response:
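- **status**: Can be green (all primary and replica shards are assigned), yellow (all primary shards are assigned, so all data is available, but one or more replica shards are unassigned; this is normal on a single-node test cluster because a replica is never placed on the same node as its primary), or red (at least one primary shard is unassigned, so some of your data is unavailable for search and indexing).
- **unassigned_shards**: If this number is high or growing, the cluster is failing to allocate shards to nodes.
- **number_of_nodes**: Make sure this matches the number of servers you expect to be online.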
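The following is a minimal sketch of turning these three fields into an automated check. It uses only the Python standard library to fetch /_cluster/health and list anything worth alerting on. The base URL, the expected node count, and the function names are assumptions made for illustration. The sketch also assumes an unsecured cluster; if security is enabled, add an Authorization header as described in the authentication section.

```python
import json
import urllib.request

# Hypothetical endpoint; point this at one of your own nodes.
ES_URL = "http://localhost:9200"


def fetch_health(base_url: str = ES_URL, timeout: float = 5.0) -> dict:
    """GET /_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def evaluate_health(health: dict, expected_nodes: int) -> list[str]:
    """Return human-readable problems; an empty list means nothing to report."""
    problems = []
    status = health.get("status")
    if status == "red":
        problems.append("status is red: at least one primary shard is unassigned")
    elif status == "yellow":
        problems.append("status is yellow: at least one replica shard is unassigned")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shard(s)")
    if health.get("number_of_nodes") != expected_nodes:
        problems.append(
            f"expected {expected_nodes} nodes, found {health.get('number_of_nodes')}"
        )
    return problems
```

Usage would look like `evaluate_health(fetch_health(), expected_nodes=3)`, with each returned string forwarded to whatever alerting channel you use. Keeping the evaluation separate from the HTTP call makes the check easy to test against saved responses.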
2. The Detailed Check: The Cluster Stats API
If the Health API tells you whether something is wrong, the Stats API helps you understand why by showing memory, CPU, and disk usage aggregated across the entire cluster.
```json
GET /_cluster/stats?human&pretty
```
(The ?human flag adds readable values such as 2.3gb or 12s next to the raw byte and millisecond counts, and ?pretty indents the JSON.)
Key Fields to Watch:
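- **nodes.fs.available_in_bytes**: How much disk space is left across all nodes combined. Because this is a cluster-wide total, also watch per-node disk. When any single node crosses the flood-stage watermark (95% disk used by default), Elasticsearch applies a read-only block to every index that has a shard on that node, so writes to those indices fail.
- **nodes.jvm.mem.heap_used_in_bytes / nodes.jvm.mem.heap_max_in_bytes**: Elasticsearch runs on the Java Virtual Machine (JVM). The cluster stats report heap as raw bytes rather than a percentage, so divide used by max. If heap usage stays consistently above 85-90%, you risk out-of-memory crashes or long garbage-collection pauses that slow down queries. (The Nodes Stats API exposes a ready-made per-node jvm.mem.heap_used_percent.)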
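Because the cluster stats give disk and heap as byte counts, a script has to derive the percentages itself. Below is a minimal sketch that assumes `stats` is the already-decoded JSON from GET /_cluster/stats. The 85% thresholds are illustrative defaults (85% also happens to be the default low disk watermark), and the function name is hypothetical.

```python
def resource_summary(stats: dict, heap_warn: float = 85.0, disk_warn: float = 85.0) -> dict:
    """Derive cluster-wide disk and heap usage percentages from GET /_cluster/stats."""
    fs = stats["nodes"]["fs"]
    mem = stats["nodes"]["jvm"]["mem"]
    # Disk: used share of the combined filesystem capacity of all nodes.
    disk_used_pct = 100.0 * (fs["total_in_bytes"] - fs["available_in_bytes"]) / fs["total_in_bytes"]
    # Heap: cluster stats give bytes, not a percentage, so divide used by max.
    heap_used_pct = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
    warnings = []
    if disk_used_pct >= disk_warn:
        warnings.append(f"disk {disk_used_pct:.1f}% used (cluster-wide)")
    if heap_used_pct >= heap_warn:
        warnings.append(f"heap {heap_used_pct:.1f}% used (cluster-wide)")
    return {
        "disk_used_percent": disk_used_pct,
        "heap_used_percent": heap_used_pct,
        "warnings": warnings,
    }
```

Because these are totals across all nodes, a single nearly full node can hide behind a healthy cluster-wide number. The per-node fs section of the Nodes Stats API is the place to catch that case.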
3. The Performance Check: The Nodes Stats API
If you run a multi-node cluster, this API breaks down performance metrics server by server. This is where you look if one specific node is running slow.
```json
GET /_nodes/stats/jvm,indices,fs
```
(Appending /jvm,indices,fs filters out the noise and fetches only JVM memory and garbage-collection, indexing and search, and filesystem statistics.)
Key Fields to Watch:
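- **indices.search.query_time_in_millis / indices.search.query_total**: Divide time by total to get the average latency per search query. Both are cumulative counters since the node started, so compare two samples (change in time divided by change in total) to see current latency. If it spikes, your users are experiencing slowness.
- **jvm.gc.collectors.old.collection_count**: A rapidly growing count of old-generation garbage collections means the JVM is under memory pressure. These collections can pause application threads, which shows up as stalled searches and indexing.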
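Since query_total, query_time_in_millis, and the GC collection_count are cumulative since each node started, an alert should be based on the difference between two polls. Below is a minimal sketch that assumes `prev` and `curr` are decoded responses from two successive GET /_nodes/stats/jvm,indices,fs calls. The function name and output shape are hypothetical.

```python
def node_deltas(prev: dict, curr: dict) -> dict:
    """Per-node average query latency and old-gen GC count between two snapshots."""
    report = {}
    for node_id, now in curr["nodes"].items():
        before = prev["nodes"].get(node_id)
        if before is None:
            continue  # node joined after the previous poll: no baseline yet
        s_now, s_before = now["indices"]["search"], before["indices"]["search"]
        queries = s_now["query_total"] - s_before["query_total"]
        millis = s_now["query_time_in_millis"] - s_before["query_time_in_millis"]
        gc_now = now["jvm"]["gc"]["collectors"]["old"]["collection_count"]
        gc_before = before["jvm"]["gc"]["collectors"]["old"]["collection_count"]
        if queries < 0 or gc_now < gc_before:
            continue  # counters went backwards: the node restarted, skip this interval
        report[now.get("name", node_id)] = {
            # Average latency over this interval only, not since node start.
            "avg_query_ms": millis / queries if queries else 0.0,
            "old_gc_collections": gc_now - gc_before,
        }
    return report
```

Keying the snapshots by node ID rather than by name means a renamed or replaced node is treated as new instead of producing a misleading delta.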
Best Practices for Building an HTTP Monitor
If you are writing a script (e.g., Python, Node.js, or curl via a cronjob) to hit these APIs, follow these architectural rules:
1. Don't hit them too often
Avoid hitting _cluster/stats or _nodes/stats every second. Each of these requests makes Elasticsearch collect statistics from every node in the cluster, which costs CPU and network round-trips. A polling interval of 10 to 30 seconds is common in production.
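Below is a minimal sketch of a fixed-interval poller. It assumes a 15-second default, which falls inside the 10 to 30 second range above. `fetch` and `handle` are hypothetical callables that you would wire to the requests and checks you need. Scheduling against time.monotonic() keeps the cadence steady even when a request is slow, and a failed request is logged instead of stopping the loop.

```python
import time
from typing import Callable, Optional


def poll(
    fetch: Callable[[], dict],
    handle: Callable[[dict], None],
    interval_s: float = 15.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() every interval_s seconds and pass each result to handle().

    Runs forever when max_polls is None; otherwise returns the number of polls made.
    """
    polls = 0
    next_run = time.monotonic()
    while True:
        try:
            handle(fetch())
        except OSError as exc:  # urllib's URLError/HTTPError and timeouts subclass OSError
            print(f"poll failed: {exc}")
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return polls
        # Schedule against a fixed cadence so slow responses don't make the interval drift.
        next_run += interval_s
        sleep(max(0.0, next_run - time.monotonic()))
```

The `sleep` parameter exists only so that the loop can be exercised in tests without real waiting. In a real monitor you would leave it at its default.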
2. Protect the Endpoint with Authentication
If security features are enabled on your cluster, your monitoring script must provide credentials in an HTTP Authorization header (either basic authentication with a username and password, or an API key).
For example, using curl:
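```bash
# If TLS is enabled on the HTTP layer (typical for 8.x clusters with security
# auto-configured), use https:// and pass the cluster's CA certificate with --cacert.
curl -u monitoring_user:password "http://localhost:9200/_cluster/health"
```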
(Tip: In production, create a custom role in Elasticsearch that grants only the monitor cluster privilege, which gives read-only access to cluster and node statistics. That way, if your script's credentials leak, an attacker cannot modify or delete data.)
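In a script, the same credentials travel in the Authorization header. Below is a minimal sketch using the Python standard library. Basic auth is "Basic " followed by the base64 encoding of username:password. An API key is sent as "ApiKey " followed by the base64 encoding of id:api_key; recent versions also return that pre-encoded value as the encoded field when you create the key. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> dict:
    """Authorization header for HTTP basic auth (base64 of "username:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def api_key_header(key_id: str, api_key: str) -> dict:
    """Authorization header for an Elasticsearch API key (base64 of "id:api_key")."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


def get_json(url: str, headers: dict, timeout: float = 5.0) -> dict:
    """GET a monitoring endpoint with the given headers and decode the JSON body."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("http://localhost:9200/_cluster/health", basic_auth_header("monitoring_user", "password"))` is the script equivalent of the curl command above. In practice, read the secret from an environment variable or secrets store rather than hard-coding it.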
3. Use the _cat APIs for Quick Terminal Diagnostics
If you aren't writing a script and just want a quick, readable view in a terminal, use the _cat (Compact and Aligned Text) APIs instead of the standard JSON ones. They return plain-text tables that read like command-line output:
```json
GET /_cat/health?v
GET /_cat/nodes?v&h=ip,name,cpu,heap.percent,disk.used_percent
```
(The ?v adds headers, and &h= lets you cherry-pick exactly which columns you want to view).
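If a script does end up reading _cat output, splitting on whitespace is enough for simple tables like the nodes example above. Below is a minimal sketch with a hypothetical function name. It breaks on values that contain spaces or cells that are empty, which is why scripts are usually better off adding format=json to the _cat request or using the JSON APIs.

```python
def parse_cat_table(text: str) -> list[dict]:
    """Turn _cat output requested with ?v (header row first) into a list of dicts.

    Values stay strings; naive whitespace splitting assumes no cell contains spaces.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]
```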
Are you planning to build a custom dashboard/alerting script (like an automated Slack or email alert when the cluster status turns red), or are you hooking this up to an open-source monitoring tool?