ThreadX: Fix auto-reconnect and session cleanup by g4sp3r · Pull Request #1126 · eclipse-zenoh/zenoh-pico

g4sp3r · 2025-12-23T19:31:03Z

In our system (stm32 - threadx) with serial link to rmw_zenohd we experience issues if router gets restarted - zenoh-pico client would just keep publishing with no indication anything is wrong.
After restarting the router the topics are of course not shown because no re-connection attempt is done by the client. This is even with enabling Z_FEATURE_AUTO_RECONNECT.

First issue is found in src/transport/unicast/read.c where peer->common._received is always set to true even if no data is received. Probably this problem was created by STM32 ThreadX network.c implementation which uses non blocking read in _z_read_serial_internal. Non-blocking read was needed because otherwise client never re-sends the initial handshake to the router, meaning router would need to be started before the client.
With 1 fixed, the client now detects loss of lease when the router is restarted. Next problem is closing and reopening the connection. Some ThreadX system support functions were missing to properly stop all running tasks and are added in this PR.

We tried using Z_FEATURE_AUTO_RECONNECT unsuccessfully, so in our system we currently check if the read_task is still running, if not we undeclare pub/sub and close the session. After small timeout we again open the session and declare pub/sub.
Like this we finally have a system that survives common communication disturbances without any reboots.

🏷️ Label-Based Checklist

No specific label requirements detected.

Current labels: No labels

Add one of these labels to this PR to see relevant checklist items: api-sync, breaking-change, bug, ci, dependencies, documentation, enhancement, new feature, internal

This section updates automatically when labels change.

…imeslice.

…ived = true.

g4sp3r · 2025-12-25T21:51:25Z

Needs more work to support Z_FEATURE_AUTO_RECONNECT=1

Implemented cleanup queue for threads that need to self clean up. Memory is released when creating new threads. In current implementation new threads are created before _z_task_exit() is called so there is always one z_task waiting to be released between reconnects.

Allows closing session and undeclaring resources when lease expires. When lease expires the link is cleared. In case of Z_FEATURE_AUTO_RECONNECT=0, we want to call z_close, z_drop_ and z_undeclare_ to release resources but that sends messages on the closed link.

g4sp3r · 2025-12-27T00:37:32Z

Current version working with both manual and automatic reconnect.

Added cleanup queue for tasks that need to self-terminate (auto-reconnect case). Tasks are freed from the queue when creating new tasks. Currently one task remains in queue between reconnects due to _z_reopen() creating new tasks before _z_task_exit() is called.

Another issue is found in manual reconnect case where trying to drop session and undeclare resources after lease expires leads to sending messages with cleared link. This locks the thread in our system in assert->exit() and leads to segmentation fault in tests with tcp Linux client. Adding the NULL checks in send_msg allows us to release all the memory before re-opening the connection again.

After quick test of memory usage we observe ~22KB used with Z_FEATURE_AUTO_RECONNECT=0 and ~40KB used with Z_FEATURE_AUTO_RECONNECT=1. Considering memory usage is x2 and we currently loose another 4KB with the "stuck" task we will be using the manual re-connection method handled by application logic for now.

Should not use TX_WAIT_FOREVER from ISR. Also properly check the return code. Messages are now dropped if read thread can't keep up with the load.

g4sp3r · 2026-01-30T15:40:55Z

Pinging again for review. Changes in src/transport/... are kind of unexpected at this stage to say the least.

The PR is already running in real deployment.

shoniecaplan · 2026-02-20T05:50:50Z

Following this

g4sp3r added 4 commits December 23, 2025 19:46

ThreadX: add missing thread managment in system.c. Increase default t…

e10b306

…imeslice.

Unicast:read: check data size is >0 before setting peer->common._rece…

9daf792

…ived = true.

Formating

23482f6

partly revert formating

a49bb24

g4sp3r marked this pull request as draft December 25, 2025 21:49

g4sp3r added 2 commits December 26, 2025 14:57

g4sp3r changed the title ~~ThreadX: Fix unicast link detection. Fix closing connection. (Router restart handling)~~ ThreadX: Fix auto-reconnect and session cleanup Dec 27, 2025

g4sp3r added 2 commits December 27, 2025 14:06

clang-format 18.1.8

aaa2fb3

ThreadX: fix blocking tx_semaphore_get in UART ISR.

e71e34e

Should not use TX_WAIT_FOREVER from ISR. Also properly check the return code. Messages are now dropped if read thread can't keep up with the load.

g4sp3r marked this pull request as ready for review January 9, 2026 10:11

g4sp3r mentioned this pull request Feb 4, 2026

Support NetX Duo #1144

Open

g4sp3r mentioned this pull request Mar 3, 2026

Resource cleanup Pico-ROS/Pico-ROS-software#15

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ThreadX: Fix auto-reconnect and session cleanup#1126

ThreadX: Fix auto-reconnect and session cleanup#1126
g4sp3r wants to merge 8 commits intoeclipse-zenoh:mainfrom
g4sp3r:main

g4sp3r commented Dec 23, 2025 •

edited by github-actions bot

Loading

Uh oh!

g4sp3r commented Dec 25, 2025

Uh oh!

g4sp3r commented Dec 27, 2025

Uh oh!

g4sp3r commented Jan 30, 2026

Uh oh!

shoniecaplan commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

g4sp3r commented Dec 23, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🏷️ Label-Based Checklist

Uh oh!

g4sp3r commented Dec 25, 2025

Uh oh!

g4sp3r commented Dec 27, 2025

Uh oh!

g4sp3r commented Jan 30, 2026

Uh oh!

shoniecaplan commented Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

g4sp3r commented Dec 23, 2025 •

edited by github-actions bot

Loading