Closed
Description
Done Criteria
There is good monitoring/alarming on the observer/aider instances that FilOz runs, so we are notified proactively of any production issues.
Why Important
filecoin-project/lotus#13208 revealed that the FilOz-run observer instance had silently stopped collecting data for a month because its storage had filled up. This in turn meant the f3-aider instance wasn't helping with rebroadcasting messages, which contributed to F3 not making progress for multiple days.
User/Customer
Implementers who will get called in if F3 stops making progress.
Overall network participants who care about faster finality.
Tasks
- Add monitoring for the f3-observer storage space. We should get a Slack notification if storage reaches 80% full.
- Effectively addressed by the monitors described in Better operationalize the FilOz-run f3-observer and f3-aider #1041 (comment)
- Add monitoring for the f3-observer process. If the process isn't emitting a healthy heartbeat, then we should alarm.
- Effectively addressed by the monitors described in Better operationalize the FilOz-run f3-observer and f3-aider #1041 (comment)
- Add monitoring for the f3-aider process. If the process isn't emitting a healthy heartbeat, then we should alarm.
- We decided not to do this per Better operationalize the FilOz-run f3-observer and f3-aider #1041 (comment)
- (bonus) Set up storage pruning if the f3-observer storage is filling up.
Notes
- The actual infrastructure for this will run in https://github.com/filecoin-project/filoz-infra, but this issue was created in go-f3 since that repo is public and to give better visibility to the work.
Status
Done