Commit 34ae03e

Merge branch 'main' into lfdt-bot-patch-1
2 parents f36de41 + deac886 commit 34ae03e

4 files changed, 102 additions and 0 deletions

doc-site/docs/SUMMARY.md

Lines changed: 2 additions & 0 deletions
@@ -28,5 +28,7 @@
2828
* contributors/*
2929
* [API Spec](swagger/index.md)
3030
* [FAQs](faqs/index.md)
31+
* [Troubleshooting](troubleshooting/index.md)
32+
* troubleshooting/*
3133
* [Release Notes](releasenotes/index.md)
3234
* releasenotes/*
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
---
2+
title: Troubleshooting
3+
---
4+
5+
This section includes troubleshooting tips for identifying issues with a running FireFly node, and for gathering useful data before opening an issue.
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
---
2+
title: Undelivered messages
3+
---
4+
5+
When using FireFly in multiparty mode to deliver broadcast or private messages, one potential problem is
6+
undelivered messages. In general, FireFly's message delivery service should be extremely reliable, but understanding
7+
when something has gone wrong (and how to recover) can be important for maintaining system health.
8+
9+
## Background
10+
11+
This guide assumes some familiarity with how
12+
[multiparty event sequencing](../architecture/multiparty_event_sequencing.md) works.
13+
In general, FireFly messages come in three varieties:
14+
15+
![FireFly Message Types](../images/firefly_message_types.png "FireFly Message Types")
16+
17+
1. **Unpinned private messages:** private messages delivered directly via data exchange
18+
2. **Pinned private messages:** private messages delivered via data exchange, with a hash of the message recorded on the blockchain ledger
19+
3. **Pinned broadcast messages:** messages stored in IPFS, with a hash and reference to the message shared via the blockchain ledger
20+
21+
All messages are batched for efficiency, but in cases of low throughput, you may frequently see batches
22+
containing exactly one message.
23+
24+
"Pinned" messages are those that use the blockchain ledger for reliable timestamping and ordering. These messages have
25+
two pieces which must be received before the message can be processed: the **batch** is the actual contents of
26+
the message(s), and the **pin** is the lightweight blockchain transaction that records the existence and ordering of
27+
that batch. We frequently refer to this combination as a **batch-pin**.
28+
29+
> Note: there is a fourth type of message denoted with the type "definition", used for things such as identity claims
30+
> and advertisement of contract APIs. For most troubleshooting purposes these can be treated the same as pinned
31+
> broadcast messages, as they follow the same pattern (with only a few additional processing steps inside FireFly).
32+
33+
## Symptoms
34+
35+
When some part of the multiparty messaging infrastructure requires troubleshooting, common symptoms include:
36+
37+
- a message was sent, but is not present on some other node where it should have been received
38+
- a message is stuck indefinitely in "sent" or "pending" state
39+
40+
## Troubleshooting steps
41+
42+
When troubleshooting one of the symptoms above, the main goal is to identify the specific piece of the infrastructure that is
43+
experiencing an issue. This can lead you to diagnose specific issues such as misconfiguration, network problems, database
44+
integrity problems, or potential code bugs.
45+
46+
In all cases, the **batch ID** is the most critical piece of data for determining the nature of the issue. You can usually
47+
retrieve the batch for a particular message by querying `/messages/<message-id>` and looking for the `batch` field in the returned
48+
response. In rare cases, if this is not populated, you can also retrieve the message transaction via `/messages/<message-id>/transaction`,
49+
and then you can use the transaction ID to query `/batches?tx.id=<transaction-id>`.
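
The lookup described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the base URL, the `default` namespace, the helper names, and the assumption that `/batches` returns a JSON array are all hypothetical and should be adjusted to your deployment.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of one node's namespaced API (hypothetical; adjust as needed).
BASE_URL = "http://localhost:5000/api/v1/namespaces/default"


def http_get_json(path, base_url=BASE_URL):
    """GET a path relative to base_url and decode the JSON response body."""
    with urlopen(base_url + path) as resp:
        return json.load(resp)


def find_batch_id(message_id, fetch=http_get_json):
    """Return the batch ID for a message, or None if no batch can be found."""
    message = fetch(f"/messages/{quote(message_id)}")
    batch_id = message.get("batch")
    if batch_id:
        return batch_id

    # Fallback for the rare case where `batch` is not populated:
    # look up the message's transaction, then query batches by transaction ID.
    tx = fetch(f"/messages/{quote(message_id)}/transaction")
    batches = fetch(f"/batches?tx.id={quote(tx['id'])}")
    # Assumption: the list endpoint returns a JSON array of batch objects.
    return batches[0]["id"] if batches else None
```

The `fetch` parameter exists only so the lookup logic can be exercised without a running node; in normal use, `find_batch_id("<message-id>")` calls the node directly.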
50+
51+
The batch ID will be the same on all nodes involved in the messaging flow. Therefore, the following two steps can be
52+
easily performed to check for the existence of the expected items:
53+
54+
- query `/batches/<batch-id>` on each node that should have the message
55+
- query `/pins?batch=<batch-id>` on each node that should have the message (for pinned messages only)
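
As a rough sketch, the two checks above could be run against every node in one pass. The node names, URLs, and helper names below are hypothetical, and treating an HTTP 404 as "batch not present" is an assumption about how a node reports a missing batch.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def http_get_json(url):
    """GET a URL and decode the JSON body; an HTTP 404 is returned as None."""
    try:
        with urlopen(url) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise


def check_batch_on_nodes(node_urls, batch_id, pinned=True, fetch=http_get_json):
    """Report, per node, whether the batch exists and which pins reference it."""
    report = {}
    for name, base_url in node_urls.items():
        batch = fetch(f"{base_url}/batches/{batch_id}")
        # Pins only exist for pinned messages, so skip the query otherwise.
        pins = fetch(f"{base_url}/pins?batch={batch_id}") if pinned else []
        report[name] = {"has_batch": batch is not None, "pins": pins or []}
    return report
```

Here `node_urls` maps a label of your choosing to each node's namespaced API base URL, for example `{"org1": "http://localhost:5000/api/v1/namespaces/default"}`.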
56+
57+
Then choose one of these scenarios to focus in on an area of interest:
58+
59+
#### 1) Is the batch missing on a node that should have received it?
60+
61+
For private messages, this indicates a potential problem with **data exchange**. Check the sending node to see if the FireFly
62+
operations succeeded when sending the batch via data exchange, and check the data exchange logs for any issues processing it
63+
(the FireFly operation ID can be used to trace the operation through data exchange as well).
64+
If an operation failed on the sending node, you may need to retry it with `/operations/<op-id>/retry`.
65+
66+
For broadcast messages, this indicates a potential problem with **IPFS**. Check the sending node to see if the FireFly
67+
operations succeeded when uploading the batch to IPFS, and the receiving node to see if the operations succeeded when
68+
downloading the batch from IPFS. If an operation failed, you may need to retry it with `/operations/<op-id>/retry`.
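
A small sketch of the retry step follows, assuming you have already fetched a list of operation objects from the node. The `"Failed"` status value, the use of HTTP POST with an empty JSON body, and the helper names are assumptions to verify against your FireFly version's API reference.

```python
import json
from urllib.request import Request, urlopen


def failed_operation_retry_paths(operations):
    """Return the retry path for each operation whose status is 'Failed'."""
    # Assumption: each operation is a dict with `id` and `status` fields.
    return [
        f"/operations/{op['id']}/retry"
        for op in operations
        if op.get("status") == "Failed"
    ]


def post_json(url, body=None):
    """POST a JSON body (empty object by default) and decode the JSON reply."""
    data = json.dumps(body or {}).encode("utf-8")
    req = Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)
```

Each returned path can then be appended to the node's namespaced base URL and passed to `post_json`.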
69+
70+
#### 2) Is the batch present, but the pin is missing?
71+
72+
This indicates a potential problem with the **blockchain connector**. Check if the underlying blockchain node is
73+
healthy and mining blocks. Check the sending FireFly node to see if the operation succeeded when pinning the batch via the
74+
blockchain. Check the blockchain connector logs (such as evmconnect or fabconnect) to see if it is
75+
successfully processing events from the blockchain, or if it is encountering any errors before forwarding those events
76+
on to FireFly.
77+
78+
#### 3) Are the batch and pin both present, but the messages from the batch are still stuck in "sent" or "pending"?
79+
80+
Check the pin details to see if it contains a field `"dispatched": true`. If this field is false or missing, it means
81+
that the pin was received but couldn't be matched successfully with the off-chain batch contents. Check the FireFly
82+
logs and search for the batch ID; the issue is most likely in FireFly itself, which will have logged a problem while
83+
aggregating the batch-pin. In some cases, the FireFly logs may indicate that the pin could not be dispatched because
84+
it was "stuck" behind another pin on the same context, so you may need to follow the trail to a batch-pin for a
85+
different batch and determine why that earlier one was not processed (by starting over on this rubric
86+
and troubleshooting that batch).
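
The three scenarios above amount to a small decision procedure, sketched here for a single node. The function name and the returned labels are illustrative only; the inputs are the batch and pin query results gathered for that node.

```python
def diagnose(message_kind, pinned, has_batch, pins):
    """Map one node's batch/pin state to the scenario worth investigating.

    message_kind: "private" or "broadcast"
    pinned:       whether the message uses the blockchain ledger
    has_batch:    whether /batches/<batch-id> returned the batch
    pins:         list of pin objects returned by /pins?batch=<batch-id>
    """
    if not has_batch:
        # Scenario 1: the batch never arrived on this node.
        if message_kind == "private":
            return "check data exchange"
        return "check IPFS"
    if not pinned:
        return "batch present (unpinned message has no pin to check)"
    if not pins:
        # Scenario 2: batch arrived, but no pin was seen.
        return "check blockchain connector"
    if not all(pin.get("dispatched") is True for pin in pins):
        # Scenario 3: `dispatched` is false or missing on at least one pin.
        return "check FireFly logs for the batch ID"
    return "batch and pin dispatched"
```

For example, `diagnose("broadcast", True, True, [])` points at the blockchain connector, while a pin lacking `"dispatched": true` points at the FireFly logs.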
87+
88+
## Opening an issue
89+
90+
It's possible that the above steps may lead to an obvious solution (such as recovering a crashed service or retrying a
91+
failed operation). If they do not, you can open an issue. The more detail you can include from the troubleshooting above
92+
(including the type of message, the nodes involved, and the details on the batch and pin found when examining each node),
93+
the more likely it is that someone can help to suggest additional troubleshooting. Full logs from FireFly, and (as
94+
deemed relevant from the troubleshooting above) full logs from the data exchange or blockchain connector runtimes, will
95+
also make it easier to offer additional insight.
