|
| 1 | +# Cluster state (cluster_state.json) and node-role reference |
| 2 | + |
| 3 | +Two quick references: (1) **jq** and **jless** for reading `cluster_state.json`, and (2) **how to tell which node type runs a piece of code** (master vs data, etc.) when reading the Elasticsearch codebase. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## 1. jq and jless cheat sheet for cluster_state.json |
| 8 | + |
| 9 | +### Getting cluster_state.json |
| 10 | + |
| 11 | +- **REST**: `GET /_cluster/state/_all`, or list specific metrics in the path, e.g. `GET /_cluster/state/master_node,nodes` (metric names: `version`, `master_node`, `nodes`, `metadata`, `routing_table`, `routing_nodes`, `blocks`, `customs`).
| 12 | +- **File**: If you have a dump, it’s typically a single JSON object (same shape as the API response). |
| 13 | + |
| 14 | +### Top-level shape (from `ClusterState`) |
| 15 | + |
| 16 | +| Key | Type | Description | |
| 17 | +|-----|------|-------------| |
| 18 | +| `cluster_name` | string | Cluster name | |
| 19 | +| `version` | number | Cluster state version | |
| 20 | +| `state_uuid` | string | UUID of this cluster state | |
| 21 | +| `master_node` | string | **Node ID** of the elected master (only if metric `master_node` or `_all`) | |
| 22 | +| `blocks` | object | Cluster blocks (if requested) | |
| 23 | +| `nodes` | object | Map **node_id → node descriptor** (if metric `nodes` or `_all`) | |
| 24 | +| `nodes_versions` | array | Compatibility versions per node | |
| 25 | +| `nodes_features` | object | Node feature flags | |
| 26 | +| `metadata` | object | Indices, templates, etc. | |
| 27 | +| `routing_table` | object | Index → shard routing (if requested) | |
| 28 | +| `routing_nodes` | object | Unassigned shards + per-node shard list (if requested) | |
| 29 | +| `customs` | object | Custom state components | |
| 30 | + |
| 31 | +### Node descriptor shape (each entry under `nodes`) |
| 32 | + |
| 33 | +Key = **node id**. Value is an object: |
| 34 | + |
| 35 | +| Key | Type | Description | |
| 36 | +|-----|------|-------------| |
| 37 | +| `name` | string | Human-readable node name | |
| 38 | +| `ephemeral_id` | string | Ephemeral node id (changes on restart) | |
| 39 | +| `transport_address` | string | Transport address | |
| 40 | +| `external_id` | string | External identifier | |
| 41 | +| `attributes` | object | Key-value attributes | |
| 42 | +| `roles` | array | **Array of role names**: `master`, `data`, `data_hot`, `ingest`, etc. | |
| 43 | +| `version` | string | Node version | |
| 44 | +| `min_index_version` | number | Min index version supported | |
| 45 | +| `max_index_version` | number | Max index version supported | |
| 46 | + |
| 47 | +### jq: essentials |
| 48 | + |
| 49 | +```bash |
| 50 | +# Pretty-print |
| 51 | +jq . cluster_state.json |
| 52 | + |
| 53 | +# Cluster name and version |
| 54 | +jq '{cluster_name, version, state_uuid}' cluster_state.json |
| 55 | + |
| 56 | +# Master node id |
| 57 | +jq '.master_node' cluster_state.json |
| 58 | + |
| 59 | +# All node ids |
| 60 | +jq '.nodes | keys' cluster_state.json |
| 61 | + |
| 62 | +# Node id → name |
| 63 | +jq '.nodes | to_entries | map({key: .key, name: .value.name}) | from_entries' cluster_state.json |
| 64 | + |
| 65 | +# Nodes that have the "master" role (master-eligible) |
| 66 | +jq '.nodes | to_entries | map(select(.value.roles | index("master"))) | from_entries' cluster_state.json |
| 67 | + |
| 68 | +# Nodes that have the "data" role |
| 69 | +jq '.nodes | to_entries | map(select(.value.roles | index("data"))) | from_entries' cluster_state.json |
| 70 | + |
| 71 | +# List node names with their roles |
| 72 | +jq '.nodes | to_entries[] | {name: .value.name, roles: .value.roles}' cluster_state.json |
| 73 | + |
| 74 | +# Find node by name (e.g. "node-1") |
| 75 | +jq '.nodes | to_entries[] | select(.value.name == "node-1") | {id: .key, node: .value}' cluster_state.json |
| 76 | + |
| 77 | +# Master node id and its name |
| 78 | +jq '{master_node, master_name: .nodes[.master_node].name}' cluster_state.json |
| 79 | + |
| 80 | +# Routing table: index names |
| 81 | +jq '.routing_table.indices | keys' cluster_state.json |
| 82 | + |
| 83 | +# Shards for an index (e.g. "my_index"); for names containing "-" or ".", use .routing_table.indices["my-index"].shards
| 84 | +jq '.routing_table.indices.my_index.shards' cluster_state.json |
| 85 | + |
| 86 | +# Unassigned shards (from routing_nodes) |
| 87 | +jq '.routing_nodes.unassigned' cluster_state.json |
| 88 | + |
| 89 | +# Metadata: index list |
| 90 | +jq '.metadata.indices | keys' cluster_state.json |
| 91 | +``` |
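The `routing_nodes` section can also be summarized per node. The sketch below assumes the layout described in the top-level table (an `unassigned` array plus a `nodes` object mapping node id to a list of shard entries). The fixture, its node ids (`n1`, `n2`), and the index name are made up for illustration, so check the field names against a real dump before relying on it.

```shell
# Hypothetical two-node fixture; ids, index names, and values are illustrative only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  "routing_nodes": {
    "unassigned": [
      {"index": "logs", "shard": 1, "primary": false, "state": "UNASSIGNED"}
    ],
    "nodes": {
      "n1": [
        {"index": "logs", "shard": 0, "primary": true, "state": "STARTED"},
        {"index": "logs", "shard": 1, "primary": true, "state": "STARTED"}
      ],
      "n2": [
        {"index": "logs", "shard": 0, "primary": false, "state": "STARTED"}
      ]
    }
  }
}
EOF

# Shard count per node id (assumes .routing_nodes.nodes maps node id -> array of shards)
shards_per_node=$(jq -c '.routing_nodes.nodes | map_values(length)' "$sample")
echo "$shards_per_node"

# Number of unassigned shards
unassigned_count=$(jq '.routing_nodes.unassigned | length' "$sample")
echo "$unassigned_count"
```

On a real dump, replace `"$sample"` with `cluster_state.json`; the node ids in the result can be mapped to names with the "Node id → name" query above.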
| 92 | + |
| 93 | +### jq: “who is master / data” from nodes |
| 94 | + |
| 95 | +```bash |
| 96 | +# Node ids that are master-eligible (have "master" in roles) |
| 97 | +jq '[.nodes | to_entries[] | select(.value.roles | index("master")) | .key]' cluster_state.json |
| 98 | + |
| 99 | +# Node ids that can hold data (have "data" or any data_* in roles) |
| 100 | +jq '[.nodes | to_entries[] | select(.value.roles | any(contains("data"))) | .key]' cluster_state.json |
| 101 | +``` |
| 102 | + |
| 103 | +### jless: essentials |
| 104 | + |
| 105 | +[jless](https://jless.io/) is a pager for JSON; keys are navigable. |
| 106 | + |
| 107 | +```bash |
| 108 | +# Open file (←/→ to collapse/expand, / to search)
| 109 | +jless cluster_state.json |
| 110 | + |
| 111 | +# Open from stdin |
| 112 | +jq -c . cluster_state.json | jless |
| 113 | +``` |
| 114 | + |
| 115 | +- **Navigation**: `Space` toggles expand/collapse, `←` collapses the focused node (or moves to its parent), `→` expands it (or moves to its first child), `/` starts a search, `n`/`N` jump to the next/previous match.
| 116 | +- **Filter by path**: Pre-filter with jq then pipe: |
| 117 | + `jq '.nodes' cluster_state.json | jless` |
| 118 | +- **Search**: `/master_node` then `Enter` to jump to that key; search for a node name or id to see where it appears. |
| 119 | + |
| 120 | +### One-liners to answer common questions |
| 121 | + |
| 122 | +```bash |
| 123 | +# Who is the current master? (id and name) |
| 124 | +jq '{master_id: .master_node, master_name: .nodes[.master_node].name}' cluster_state.json |
| 125 | + |
| 126 | +# All nodes with roles |
| 127 | +jq '.nodes | to_entries | map({name: .value.name, id: .key, roles: .value.roles})' cluster_state.json |
| 128 | + |
| 129 | +# Indices in cluster |
| 130 | +jq '.metadata.indices | keys' cluster_state.json |
| 131 | +``` |
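The master lookups above index `.nodes` with `.master_node`. If a dump was taken while no master was elected, `master_node` may be null, and jq typically fails when asked to index an object with null. A hedged variant that falls back to a placeholder string is sketched here; both fixtures are made up, and the null `master_node` is an assumption about how such a dump looks.

```shell
# Hypothetical fixtures: one with an elected master, one without.
with_master='{"master_node":"n1","nodes":{"n1":{"name":"node-1","roles":["master","data"]}}}'
without_master='{"master_node":null,"nodes":{"n1":{"name":"node-1","roles":["master","data"]}}}'

# `// ""` swaps a null master id for an empty key, so the lookup yields null
# instead of raising an error; the outer `//` then substitutes the placeholder.
master_name() {
  jq -r '.nodes[.master_node // ""].name // "no elected master"'
}

name_a=$(printf '%s' "$with_master" | master_name)
name_b=$(printf '%s' "$without_master" | master_name)
echo "$name_a"
echo "$name_b"
```

With a file, the same filter can be run directly: `jq -r '.nodes[.master_node // ""].name // "no elected master"' cluster_state.json`.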
| 132 | + |
| 133 | +--- |
| 134 | + |
| 135 | +## 2. Determining which node runs code (master vs data, etc.) |
| 136 | + |
| 137 | +When reading Elasticsearch server code, the checks below show whether a piece of logic runs on **master** nodes, **data** nodes, or **any** node.
| 138 | + |
| 139 | +### 2.1 “Which node is local?” and “What is its role?”
| 140 | + |
| 141 | +- **Local node** (the node this process is running on): |
| 142 | + - `ClusterService#localNode()` → `DiscoveryNode` |
| 143 | + Example: `clusterService.localNode()` |
| 144 | + - Or from cluster state: `clusterState.nodes().getLocalNode()` |
| 145 | + - In coordination layer: `transportService.getLocalNode()` (same idea; may be used before cluster state has the local node). |
| 146 | + |
| 147 | +- **Role checks on the local node** (use the `DiscoveryNode` from above): |
| 148 | + - **Master-eligible**: `localNode.isMasterNode()` |
| 149 | + True if this node has the `master` role (can participate in elections). |
| 150 | + - **Currently elected master**: `clusterState.nodes().isLocalNodeElectedMaster()` |
| 151 | + True iff the local node’s id equals `clusterState.nodes().getMasterNodeId()`. |
| 152 | + - **Can hold data**: `localNode.canContainData()` |
| 153 | + True if this node has any data role (e.g. `data`, `data_hot`, `data_content`, or stateless `index`/`search`). |
| 154 | +  - **Has the `data` role specifically**: `DiscoveryNode.hasDataRole(settings)` when working from settings; on a `DiscoveryNode`, check `localNode.getRoles().contains(DiscoveryNodeRole.DATA_ROLE)`.
| 155 | + - **Ingest**: `localNode.isIngestNode()` |
| 156 | + True if this node has the `ingest` role. |
| 157 | + |
| 158 | +**Relevant classes**: |
| 159 | + |
| 160 | +- `org.elasticsearch.cluster.service.ClusterService` — `localNode()`, `state()` |
| 161 | +- `org.elasticsearch.cluster.node.DiscoveryNode` — `isMasterNode()`, `canContainData()`, `isIngestNode()`, `getRoles()` |
| 162 | +- `org.elasticsearch.cluster.node.DiscoveryNodes` — `getLocalNode()`, `getLocalNodeId()`, `getMasterNodeId()`, `getMasterNode()`, `isLocalNodeElectedMaster()` |
| 163 | +- `org.elasticsearch.cluster.node.DiscoveryNodeRole` — `MASTER_ROLE`, `DATA_ROLE`, `INGEST_ROLE`, etc.; role names match JSON (e.g. `"master"`, `"data"`). |
| 164 | + |
| 165 | +### 2.2 “Does this code run only on the elected master?” |
| 166 | + |
| 167 | +- **MasterService** |
| 168 | + Code that runs inside a **MasterService** task (e.g. submitted via `masterService.submitStateUpdateTask(...)` or a task queue) runs **only on the node that is currently the elected master**. |
| 169 | + - The executor’s `runOnlyOnMaster()` (default `true`) means: if this node loses mastership before the task runs, the task is failed with `NotMasterException` and not executed. |
| 170 | +  - So: **if you see a `ClusterStateUpdateTask` / executor registered with MasterService, that code runs only on the master**, unless the executor overrides `runOnlyOnMaster()` to return `false`.
| 171 | + |
| 172 | +- **Explicit checks in code** |
| 173 | + - `clusterState.nodes().isLocalNodeElectedMaster()` — “am I the current master?” |
| 174 | + - `assert clusterState.nodes().isLocalNodeElectedMaster()` — often used in code that is only supposed to run on the master. |
| 175 | + - `state.nodes().getMasterNodeId()` — who is master (node id); compare with `state.nodes().getLocalNodeId()` to see “am I master?”. |
| 176 | + |
| 177 | +So when reading code: |
| 178 | + |
| 179 | +1. **Master-only logic**: Look for **MasterService** submissions or **`isLocalNodeElectedMaster()`** (or assertions using it). That code path runs only on the elected master. |
| 180 | +2. **Data-node / shard logic**: Look for **shard-level** operations (e.g. `IndexShard`, `TransportReplicationAction` executing on a shard). Those run on nodes that **hold that shard** (data/index/search nodes). You can also check `localNode().canContainData()` in branches that are only relevant for data-holding nodes. |
| 181 | +3. **Any node**: If there is no MasterService and no “only on master” or “only on data” check, the code may run on any node (e.g. coordination, or routing). |
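To apply these heuristics across a source tree, a plain `grep` for the markers named above is often enough as a first pass. The sketch below builds a tiny throwaway tree with a hypothetical `Example.java` so that it runs anywhere; against a real checkout you would point `grep` at the source directory instead.

```shell
# Throwaway tree with a hypothetical class, so the commands below run as-is.
src=$(mktemp -d)
cat > "$src/Example.java" <<'EOF'
class Example {
    void onClusterChanged(ClusterState state) {
        if (state.nodes().isLocalNodeElectedMaster()) {
            // master-only branch
        }
        if (clusterService.localNode().canContainData()) {
            // data-node branch
        }
    }
}
EOF

# Candidate master-only paths: elected-master checks or MasterService submissions
master_hits=$(grep -rnE 'isLocalNodeElectedMaster\(\)|submitStateUpdateTask' "$src" | wc -l | tr -d ' ')

# Candidate data-node paths: data-role checks
data_hits=$(grep -rn 'canContainData()' "$src" | wc -l | tr -d ' ')

echo "master-only candidates: $master_hits, data-node candidates: $data_hits"
```

A match is only a starting point: each hit still has to be read in context to confirm that the guarded branch is the code path in question.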
| 182 | + |
| 183 | +### 2.3 From Settings (startup / config) rather than cluster state |
| 184 | + |
| 185 | +When the question is “what **kind** of node is this configured to be?” (e.g. in constructors or at bootstrap): |
| 186 | + |
| 187 | +- `DiscoveryNode.isMasterNode(Settings settings)` — master-eligible? |
| 188 | +- `DiscoveryNode.hasDataRole(Settings settings)` — has the `data` role? |
| 189 | +- `DiscoveryNode.canContainData(Settings settings)` — any role that can contain data? |
| 190 | +- `DiscoveryNode.isIngestNode(Settings settings)` — ingest role? |
| 191 | + |
| 192 | +These use `node.roles` (or default roles) from config and don’t require cluster state. |
| 193 | + |
| 194 | +### 2.4 Quick lookup table (code) |
| 195 | + |
| 196 | +| Question | Code / pattern | |
| 197 | +|----------|----------------| |
| 198 | +| Which node is the local node? | `clusterService.localNode()` or `clusterState.nodes().getLocalNode()` |
| 199 | +| Is this node master-eligible? | `localNode.isMasterNode()` | |
| 200 | +| Am I the current elected master? | `clusterState.nodes().isLocalNodeElectedMaster()` | |
| 201 | +| Does this node hold data? | `localNode.canContainData()` | |
| 202 | +| Is this node ingest? | `localNode.isIngestNode()` | |
| 203 | +| This block runs only on master | Code inside a **MasterService** task, or guarded by `isLocalNodeElectedMaster()` | |
| 204 | +| This block runs on data nodes | Shard-level actions or checks like `localNode().canContainData()` | |
| 205 | + |
| 206 | +Together, these patterns cover the common cases for telling from the code whether a given path runs on master nodes, data nodes, or both.