Commit bc74501

docs: additional notes on troubleshooting messages
Signed-off-by: Andrew Richardson <[email protected]>
1 parent ee96117 commit bc74501

1 file changed: +22 -10 lines changed

doc-site/docs/troubleshooting/undelivered_messages.md

Lines changed: 22 additions & 10 deletions
@@ -16,16 +16,20 @@ In general, FireFly messages come in three varieties:
 
 1. **Unpinned private messages:** private messages delivered directly via data exchange
 2. **Pinned private messages:** private messages delivered via data exchange, with a hash of the message recorded on the blockchain ledger
-3. **Pinned broadcast messages:** messages stored in IPFS, with a hash and reference to the message shared on the blockchain ledger
+3. **Pinned broadcast messages:** messages stored in IPFS, with a hash and reference to the message shared
 
-Note that all messages are batched for efficiency, but in cases of low throughput, you may frequently see batches
+All messages are batched for efficiency, but in cases of low throughput, you may frequently see batches
 containing exactly one message.
 
 "Pinned" messages are those that use the blockchain ledger for reliable timestamping and ordering. These messages have
 two pieces which must be received before the message can be processed: the **batch** is the actual contents of
 the message(s), and the **pin** is the lightweight blockchain transaction that records the existence and ordering of
 that batch. We frequently refer to this combination as a **batch-pin**.
 
+> Note: there is a fourth type of message denoted with the type "definition", used for things such as identity claims
+> and advertisement of contract APIs. For most troubleshooting purposes these can be treated the same as pinned
+> broadcast messages, as they follow the same pattern (with only a few additional processing steps inside FireFly).
+
 ## Symptoms
 
 When some part of the multiparty messaging infrastructure requires troubleshooting, common symptoms include:
@@ -39,9 +43,13 @@ When troubleshooting one of the symptoms above, the main goal is to identify the
 experiencing an issue. This can lead you to diagnose specific issues such as misconfiguration, network problems, database
 integrity problems, or potential code bugs.
 
-In all cases, the **batch ID** is the most critical piece of data for determining the nature of the issue. This ID will be the
-same on all nodes involved in the messaging flow. The following two steps can be easily performed to check for the existence
-of the expected items:
+In all cases, the **batch ID** is the most critical piece of data for determining the nature of the issue. You can usually
+retrieve the batch for a particular message by querying `/messages/<message-id>` and looking for the `batch` field in the returned
+response. In rare cases, if this is not populated, you can also retrieve the message transaction via `/messages/<message-id>/transaction`,
+and then you can use the transaction ID to query `/batches?tx.id=<transaction-id>`.
+
+The batch ID will be the same on all nodes involved in the messaging flow. Therefore, the following two steps can be
+easily performed to check for the existence of the expected items:
 
 - query `/batches/<batch-id>` on each node that should have the message
 - query `/pins?batch=<batch-id>` on each node that should have the message (for pinned messages only)
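Reviewer sketch (not part of the commit): the batch lookup and the two per-node checks described in this hunk could be scripted roughly as below. The helper names and the node base URLs are hypothetical, and the code only builds request paths and reads the `batch` field; it assumes the caller performs the HTTP requests against each node's namespace API base.

```python
from typing import Dict, List, Optional


def batch_id_from_message(message: dict) -> Optional[str]:
    """Return the `batch` field of a /messages/<message-id> response, or None if unset."""
    return message.get("batch") or None


def fallback_paths(message_id: str, transaction_id: Optional[str] = None) -> List[str]:
    """Paths to query when the message's `batch` field is not populated."""
    paths = [f"/messages/{message_id}/transaction"]
    if transaction_id is not None:
        # Once the transaction ID is known, look up the batch by transaction.
        paths.append(f"/batches?tx.id={transaction_id}")
    return paths


def check_paths(batch_id: str, pinned: bool) -> List[str]:
    """Paths to query on each node that should have the message."""
    paths = [f"/batches/{batch_id}"]
    if pinned:
        # Pins only exist for pinned messages.
        paths.append(f"/pins?batch={batch_id}")
    return paths


def urls_per_node(node_bases: List[str], batch_id: str, pinned: bool) -> Dict[str, List[str]]:
    """Map each (hypothetical) node base URL to the full URLs to check on it."""
    return {
        base: [base.rstrip("/") + path for path in check_paths(batch_id, pinned)]
        for base in node_bases
    }
```

Because the batch ID is the same on every node, one ID is enough to build the checks for all members involved in the flow.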
@@ -50,18 +58,19 @@ Then choose one of these scenarios to focus in on an area of interest:
 
 #### 1) Is the batch missing on a node that should have received it?
 
-For private messages, this indicates a potential problem with **data exchange**. Check the sending node to see if the
-operations succeeded when sending the batch via data exchange, and check that the data exchange runtime is healthy.
+For private messages, this indicates a potential problem with **data exchange**. Check the sending node to see if the FireFly
+operations succeeded when sending the batch via data exchange, and check the data exchange logs for any issues processing it
+(the FireFly operation ID can be used to trace the operation through data exchange as well).
 If an operation failed on the sending node, you may need to retry it with `/operations/<op-id>/retry`.
 
-For broadcast messages, this indicates a potential problem with **IPFS**. Check the sending node to see if the
+For broadcast messages, this indicates a potential problem with **IPFS**. Check the sending node to see if the FireFly
 operations succeeded when uploading the batch to IPFS, and the receiving node to see if the operations succeeded when
 downloading the batch from IPFS. If an operation failed, you may need to retry it with `/operations/<op-id>/retry`.
 
 #### 2) Is the batch present, but the pin is missing?
 
 This indicates a potential problem with the **blockchain connector**. Check if the underlying blockchain node is
-healthy and mining blocks. Check the sending node to see if the operation succeeded when pinning the batch via the
+healthy and mining blocks. Check the sending FireFly node to see if the operation succeeded when pinning the batch via the
 blockchain. Check the blockchain connector logs (such as evmconnect or fabconnect) to see if it is
 successfully processing events from the blockchain, or if it is encountering any errors before forwarding those events
 on to FireFly.
@@ -71,7 +80,10 @@ on to FireFly.
 Check the pin details to see if it contains a field `"dispatched": true`. If this field is false or missing, it means
 that the pin was received but couldn't be matched successfully with the off-chain batch contents. Check the FireFly
 logs and search for the batch ID - likely this issue is in FireFly and it will have logged some problem while
-aggregating the batch-pin.
+aggregating the batch-pin. In some cases, the FireFly logs may indicate that the pin could not be dispatched because
+it was "stuck" behind another pin on the same context - so you may need to follow the trail to a batch-pin for a
+different batch and determine why that earlier one was not processed (by starting over on this rubric
+and troubleshooting that batch).
 
 ## Opening an issue
 
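Reviewer sketch (not part of the commit): the troubleshooting rubric touched by this diff can be summarized as a small decision function. This is only an illustration under stated assumptions: the function name and return strings are hypothetical, `pin` is assumed to be a single pin record from the `/pins?batch=<batch-id>` response (or `None` when no pin was found), and the final "none" result just means the rubric did not flag anything.

```python
from typing import Optional


def triage(batch_present: bool, pin: Optional[dict], *, private: bool, pinned: bool = True) -> str:
    """Return the area to investigate for one node, following the rubric's scenarios."""
    if not batch_present:
        # Scenario 1: the batch never arrived on this node.
        return "data exchange" if private else "IPFS"
    if not pinned:
        # Unpinned private messages have no pin to check.
        return "none"
    if pin is None:
        # Scenario 2: batch present, pin missing.
        return "blockchain connector"
    if pin.get("dispatched") is not True:
        # Pin received but not matched with the batch: search the FireFly logs
        # for the batch ID, and check for an earlier pin stuck on the same context.
        return "FireFly aggregation"
    return "none"
```

For example, `triage(True, {"dispatched": False}, private=False)` points at FireFly's own aggregation step, where the logs for that batch ID (or for an earlier stuck batch-pin) are the next place to look.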