Commit 976b734

study notes
1 parent 81cbffd commit 976b734

1 file changed: 206 additions, 0 deletions
@@ -0,0 +1,206 @@
# Cluster state (cluster_state.json) and node-role reference

Two quick references: (1) **jq** and **jless** for reading `cluster_state.json`, and (2) **how to tell which node type runs a piece of code** (master vs data, etc.) when reading the Elasticsearch codebase.

---

## 1. jq and jless cheat sheet for cluster_state.json

### Getting cluster_state.json

- **REST**: `GET /_cluster/state` returns every metric (equivalent to `_all`). To limit the response, list metrics in the path, e.g. `GET /_cluster/state/master_node,nodes`. The metrics are `version`, `master_node`, `nodes`, `metadata`, `routing_table`, `routing_nodes`, `blocks`, `customs`.
- **File**: If you have a dump, it’s typically a single JSON object (same shape as the API response).

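A minimal sketch of capturing a dump from a live cluster. It assumes `curl` is installed and that the cluster answers without authentication at the base URL you pass in; the helper names `cluster_state_url` and `fetch_cluster_state` are made up for illustration and are not part of Elasticsearch.

```bash
# Hypothetical helpers for saving a cluster state dump.

# Build the cluster-state URL from a base URL and an optional metrics list.
# With no metrics, the bare endpoint is used, which returns all metrics.
cluster_state_url() {
  local base="${1%/}"
  local metrics="${2:-}"
  if [ -n "$metrics" ]; then
    printf '%s/_cluster/state/%s\n' "$base" "$metrics"
  else
    printf '%s/_cluster/state\n' "$base"
  fi
}

# Fetch the state and write it to a file (default: cluster_state.json).
# -sS: no progress meter but still print errors; --fail: non-zero exit on HTTP errors.
fetch_cluster_state() {
  local base="$1"
  local metrics="${2:-}"
  local out="${3:-cluster_state.json}"
  curl -sS --fail "$(cluster_state_url "$base" "$metrics")" -o "$out"
}
```

For example, `fetch_cluster_state http://localhost:9200 master_node,nodes` would write only those two metrics to `cluster_state.json`; add your own authentication and TLS options to the `curl` call if the cluster requires them.
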
### Top-level shape (from `ClusterState`)

| Key | Type | Description |
|-----|------|-------------|
| `cluster_name` | string | Cluster name |
| `version` | number | Cluster state version |
| `state_uuid` | string | UUID of this cluster state |
| `master_node` | string | **Node ID** of the elected master (only if metric `master_node` or `_all`) |
| `blocks` | object | Cluster blocks (if requested) |
| `nodes` | object | Map **node_id → node descriptor** (if metric `nodes` or `_all`) |
| `nodes_versions` | array | Compatibility versions per node |
| `nodes_features` | object | Node feature flags |
| `metadata` | object | Indices, templates, etc. |
| `routing_table` | object | Index → shard routing (if requested) |
| `routing_nodes` | object | Unassigned shards + per-node shard list (if requested) |
| `customs` | object | Custom state components |

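Because several of these keys appear only when the matching metric was requested, it can help to check what a given dump actually contains before querying it. A small sketch (the function name `top_level_keys` is illustrative) that prints each top-level key with its JSON type:

```bash
# Print every top-level key and its JSON type, tab-separated, in file order.
# Keys from the table that do not show up were not requested, or are not
# emitted by the Elasticsearch version that produced the dump.
top_level_keys() {
  jq -r 'to_entries[] | "\(.key)\t\(.value | type)"' "$1"
}
```

Run it as `top_level_keys cluster_state.json` and compare the output with the table above.
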
### Node descriptor shape (each entry under `nodes`)

Key = **node id**. Value is an object:

| Key | Type | Description |
|-----|------|-------------|
| `name` | string | Human-readable node name |
| `ephemeral_id` | string | Ephemeral node id (changes on restart) |
| `transport_address` | string | Transport address |
| `external_id` | string | External identifier |
| `attributes` | object | Key-value attributes |
| `roles` | array | **Array of role names**: `master`, `data`, `data_hot`, `ingest`, etc. |
| `version` | string | Node version |
| `min_index_version` | number | Min index version supported |
| `max_index_version` | number | Max index version supported |

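The descriptor fields above flatten naturally into one row per node. A sketch, with the illustrative function name `nodes_table`, that emits id, name, transport address, and roles as tab-separated values:

```bash
# One line per node: id, name, transport address, comma-joined roles.
# A missing roles array is treated as empty instead of raising an error.
nodes_table() {
  jq -r '
    .nodes
    | to_entries[]
    | [ .key,
        .value.name,
        .value.transport_address,
        (.value.roles // [] | join(",")) ]
    | @tsv
  ' "$1"
}
```

Piping the result through `column -t` gives an aligned table on most systems.
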
### jq: essentials

```bash
# Pretty-print
jq . cluster_state.json

# Cluster name and version
jq '{cluster_name, version, state_uuid}' cluster_state.json

# Master node id
jq '.master_node' cluster_state.json

# All node ids
jq '.nodes | keys' cluster_state.json

# Node id → name (from_entries reads the value from .value, so use that field)
jq '.nodes | to_entries | map({key: .key, value: .value.name}) | from_entries' cluster_state.json

# Nodes that have the "master" role (master-eligible)
jq '.nodes | to_entries | map(select(.value.roles | index("master"))) | from_entries' cluster_state.json

# Nodes that have the "data" role
jq '.nodes | to_entries | map(select(.value.roles | index("data"))) | from_entries' cluster_state.json

# List node names with their roles
jq '.nodes | to_entries[] | {name: .value.name, roles: .value.roles}' cluster_state.json

# Find node by name (e.g. "node-1")
jq '.nodes | to_entries[] | select(.value.name == "node-1") | {id: .key, node: .value}' cluster_state.json

# Master node id and its name
jq '{master_node, master_name: .nodes[.master_node].name}' cluster_state.json

# Routing table: index names
jq '.routing_table.indices | keys' cluster_state.json

# Shards for an index (e.g. "my_index")
# For names containing dashes or dots, quote the key: .routing_table.indices["my-index"].shards
jq '.routing_table.indices.my_index.shards' cluster_state.json

# Unassigned shards (from routing_nodes)
jq '.routing_nodes.unassigned' cluster_state.json

# Metadata: index list
jq '.metadata.indices | keys' cluster_state.json
```

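The routing-table queries above can be combined into a quick health summary. The sketch below counts shard copies per routing state. It assumes the usual layout, `routing_table.indices.<index>.shards.<shard number>`, holding an array of shard copies that each carry a `state` field; verify this against your own dump, since the notes above do not spell the shape out. The function name `shard_states` is illustrative.

```bash
# Count shard copies per routing state (for example STARTED or UNASSIGNED).
# ASSUMPTION: each entry under .shards is an array of copies with a "state" field.
shard_states() {
  jq -r '
    [ .routing_table.indices[].shards[][] | .state ]
    | group_by(.)
    | map("\(.[0])\t\(length)")
    | .[]
  ' "$1"
}
```

Output is one `state<TAB>count` line per state, sorted by state name.
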
### jq: “who is master / data” from nodes

```bash
# Node ids that are master-eligible (have "master" in roles)
jq '[.nodes | to_entries[] | select(.value.roles | index("master")) | .key]' cluster_state.json

# Node ids that can hold data (have "data" or any data_* in roles)
jq '[.nodes | to_entries[] | select(.value.roles | any(contains("data"))) | .key]' cluster_state.json
```

### jless: essentials

[jless](https://jless.io/) is a pager for JSON; keys are navigable.

```bash
# Open file (arrow keys to navigate and expand/collapse, / to search)
jless cluster_state.json

# Open from stdin
jq -c . cluster_state.json | jless
```

- **Navigation**: `Space` expand/collapse, `←`/`→` collapse or go to the parent / expand or go to the first child, `/` search, `n`/`N` next/previous match.
- **Filter by path**: Pre-filter with jq then pipe:
  `jq '.nodes' cluster_state.json | jless`
- **Search**: `/master_node` then `Enter` to jump to that key; search for a node name or id to see where it appears.

### One-liners to answer common questions

```bash
# Who is the current master? (id and name)
jq '{master_id: .master_node, master_name: .nodes[.master_node].name}' cluster_state.json

# All nodes with roles
jq '.nodes | to_entries | map({name: .value.name, id: .key, roles: .value.roles})' cluster_state.json

# Indices in cluster
jq '.metadata.indices | keys' cluster_state.json
```

---

## 2. Determining which node runs code (master vs data, etc.)

When reading Elasticsearch server code, use the checks below to see whether logic runs on **master** nodes, **data** nodes, or **any** node.

### 2.1 “Which node is the local node?” and “What is my role?”

- **Local node** (the node this process is running on):
  - `ClusterService#localNode()` → `DiscoveryNode`
    Example: `clusterService.localNode()`
  - Or from cluster state: `clusterState.nodes().getLocalNode()`
  - In the coordination layer: `transportService.getLocalNode()` (same idea; may be used before cluster state has the local node).

- **Role checks on the local node** (use the `DiscoveryNode` from above):
  - **Master-eligible**: `localNode.isMasterNode()`
    True if this node has the `master` role (can participate in elections).
  - **Currently elected master**: `clusterState.nodes().isLocalNodeElectedMaster()`
    True iff the local node’s id equals `clusterState.nodes().getMasterNodeId()`.
  - **Can hold data**: `localNode.canContainData()`
    True if this node has any data role (e.g. `data`, `data_hot`, `data_content`, or stateless `index`/`search`).
  - **Has the “data” role specifically**: from settings, `DiscoveryNode.hasDataRole(settings)`; on a `DiscoveryNode`, check whether `getRoles()` contains `DiscoveryNodeRole.DATA_ROLE`.
  - **Ingest**: `localNode.isIngestNode()`
    True if this node has the `ingest` role.

**Relevant classes**:

- `org.elasticsearch.cluster.service.ClusterService`: `localNode()`, `state()`
- `org.elasticsearch.cluster.node.DiscoveryNode`: `isMasterNode()`, `canContainData()`, `isIngestNode()`, `getRoles()`
- `org.elasticsearch.cluster.node.DiscoveryNodes`: `getLocalNode()`, `getLocalNodeId()`, `getMasterNodeId()`, `getMasterNode()`, `isLocalNodeElectedMaster()`
- `org.elasticsearch.cluster.node.DiscoveryNodeRole`: `MASTER_ROLE`, `DATA_ROLE`, `INGEST_ROLE`, etc.; role names match JSON (e.g. `"master"`, `"data"`).

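Since role names in the JSON match `DiscoveryNodeRole`, the same checks can be approximated against a dump. The sketch below prints, per node, rough equivalents of `isMasterNode()`, `canContainData()`, and `isIngestNode()`. The set of data-holding roles is an assumption taken from these notes (`data`, the `data_*` tiers, and the stateless `index`/`search` roles); check `DiscoveryNodeRole` in your Elasticsearch version for the authoritative list. The function name `node_role_flags` is illustrative.

```bash
# Per node: name, master-eligible?, can contain data?, ingest?
# ASSUMPTION: data-holding roles are "data", any "data_*" tier, "index", "search".
node_role_flags() {
  jq -r '
    def can_contain_data:
      any(.[]; . == "data" or startswith("data_") or . == "index" or . == "search");
    .nodes
    | to_entries[]
    | (.value.roles // []) as $roles
    | [ .value.name,
        ($roles | any(.[]; . == "master")),
        ($roles | can_contain_data),
        ($roles | any(.[]; . == "ingest")) ]
    | @tsv
  ' "$1"
}
```

This reads roles from a static dump only; inside the server, prefer the `DiscoveryNode` methods listed above.
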
### 2.2 “Does this code run only on the elected master?”

- **MasterService**
  Code that runs inside a **MasterService** task (e.g. submitted via `masterService.submitStateUpdateTask(...)` or a task queue) runs **only on the node that is currently the elected master**.
  - The executor’s `runOnlyOnMaster()` (default `true`) means: if this node loses mastership before the task runs, the task fails with `NotMasterException` and is not executed.
  - So: **if you see a `ClusterStateUpdateTask` / executor registered with MasterService, that code runs only on the master.**

- **Explicit checks in code**
  - `clusterState.nodes().isLocalNodeElectedMaster()`: “am I the current master?”
  - `assert clusterState.nodes().isLocalNodeElectedMaster()`: often used in code that is only supposed to run on the master.
  - `state.nodes().getMasterNodeId()`: who is master (node id); compare with `state.nodes().getLocalNodeId()` to see “am I master?”.

So when reading code:

1. **Master-only logic**: Look for **MasterService** submissions or **`isLocalNodeElectedMaster()`** (or assertions using it). That code path runs only on the elected master.
2. **Data-node / shard logic**: Look for **shard-level** operations (e.g. `IndexShard`, `TransportReplicationAction` executing on a shard). Those run on nodes that **hold that shard** (data/index/search nodes). You can also check `localNode().canContainData()` in branches that are only relevant for data-holding nodes.
3. **Any node**: If there is no MasterService and no “only on master” or “only on data” check, the code may run on any node (e.g. coordination or routing).

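The distinction between master-eligible and currently elected can also be checked against a dump, using the same id comparison that `isLocalNodeElectedMaster()` is described as making: a node is the elected master when its id equals `master_node`. A sketch with the illustrative function name `master_status`:

```bash
# For every master-eligible node, print its name and whether it is the
# elected master (its id equals .master_node) or merely eligible.
master_status() {
  jq -r '
    .master_node as $elected
    | .nodes
    | to_entries[]
    | select(.value.roles // [] | any(.[]; . == "master"))
    | [ .value.name,
        (if .key == $elected then "elected" else "eligible" end) ]
    | @tsv
  ' "$1"
}
```

Nodes without the `master` role are omitted, which matches the idea that only master-eligible nodes can ever run MasterService tasks.
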
### 2.3 From Settings (startup / config) rather than cluster state

When the question is “what **kind** of node is this configured to be?” (e.g. in constructors or at bootstrap):

- `DiscoveryNode.isMasterNode(Settings settings)`: master-eligible?
- `DiscoveryNode.hasDataRole(Settings settings)`: has the `data` role?
- `DiscoveryNode.canContainData(Settings settings)`: any role that can contain data?
- `DiscoveryNode.isIngestNode(Settings settings)`: ingest role?

These use `node.roles` (or default roles) from config and don’t require cluster state.

### 2.4 Quick lookup table (code)

| Question | Code / pattern |
|----------|----------------|
| Which node is the local node? | `clusterService.localNode()` or `clusterState.nodes().getLocalNode()` |
| Is this node master-eligible? | `localNode.isMasterNode()` |
| Am I the current elected master? | `clusterState.nodes().isLocalNodeElectedMaster()` |
| Can this node hold data? | `localNode.canContainData()` |
| Is this node ingest? | `localNode.isIngestNode()` |
| This block runs only on master | Code inside a **MasterService** task, or guarded by `isLocalNodeElectedMaster()` |
| This block runs on data nodes | Shard-level actions or checks like `localNode().canContainData()` |

With these checks you can tell from the code whether a given path runs on master nodes, data nodes, or both.