Subtle corruption in RP2040 divider in threadsafe_background mode due to unsafe SDK defaults #2731

@earlephilhower

On the RP2040 there's a hardware divider which, I assume, is used by the SDK's ABI wrappers to handle integer division primitives (among other things). This is all well and good and runs nice and fast compared to the ARM aeabi stuff.

However, when running LWIP in threadsafe_background mode, LWIP callbacks are executed inside an IRQ context. So it's possible for the main code to start a division and then get interrupted by an async task IRQ doing some TCP work (e.g. "write(socket, some_value/scalefactor)") which also uses the HW divider, trashing the main app's divider state by the time the IRQ returns.
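To make the suspected race concrete, here is a minimal host-side sketch. It is a model only, not SDK code and not the real RP2040 register layout: `model_divider_t` and every `model_*` function are hypothetical names invented for illustration. The "IRQ" is just a function call placed between the operand write and the result read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a stateful divider: operands are written first and
 * the quotient is read back later. This is NOT the RP2040 register layout. */
typedef struct {
    uint32_t dividend;
    uint32_t divisor;
} model_divider_t;

/* One shared unit, standing in for the single divider a core has. */
static model_divider_t model_div;

static void model_div_start(uint32_t dividend, uint32_t divisor) {
    assert(divisor != 0);
    model_div.dividend = dividend;
    model_div.divisor = divisor;
}

static uint32_t model_div_result(void) {
    return model_div.dividend / model_div.divisor;
}

/* Stand-in for an IRQ handler that performs its own division.
 * With preserve_state != 0 it restores the operands it found on entry. */
static uint32_t model_irq_divide(uint32_t a, uint32_t b, int preserve_state) {
    model_divider_t saved = model_div;
    model_div_start(a, b);
    uint32_t q = model_div_result();
    if (preserve_state) {
        model_div = saved;
    }
    return q;
}

/* Main-line division with the "IRQ" landing between operand write and
 * result read. The IRQ computes 7 / 2 on the same shared unit. */
uint32_t model_interrupted_divide(uint32_t a, uint32_t b, int preserve_state) {
    model_div_start(a, b);
    (void)model_irq_divide(7, 2, preserve_state);
    return model_div_result();
}
```

In this model, `model_interrupted_divide(5000000, 1000, 0)` returns the IRQ's quotient instead of the main line's, while passing `1` for `preserve_state` returns the expected 5000. Whether the real SDK wrappers can be interrupted in an equivalent window is exactly the open question here.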

There are almost certainly integer divisions within the LWIP code proper, so in general I don't think this is something the user can avoid through careful coding.

So, I think it may make sense for the SDK to silently define PICO_DIVIDER_DISABLE_INTERRUPTS=1 when async_context_background_threadsafe is used (or emit a #warning or #error if it isn't set).
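As a rough idea of what the #error variant could look like, here is a sketch of a preprocessor guard. It assumes that `PICO_CYW43_ARCH_THREADSAFE_BACKGROUND` is the macro the build defines for threadsafe_background mode (substitute whatever your build actually defines), and `DIVIDER_CONFIG_IS_RISKY` is a hypothetical helper name, not an SDK macro.

```c
/* Assumption: the build defines PICO_CYW43_ARCH_THREADSAFE_BACKGROUND=1 when
 * threadsafe_background mode is selected. Default both macros to 0 so the
 * check below is well defined when they are absent. */
#ifndef PICO_CYW43_ARCH_THREADSAFE_BACKGROUND
#define PICO_CYW43_ARCH_THREADSAFE_BACKGROUND 0
#endif
#ifndef PICO_DIVIDER_DISABLE_INTERRUPTS
#define PICO_DIVIDER_DISABLE_INTERRUPTS 0
#endif

/* Hypothetical helper: true when IRQ-context LWIP callbacks are enabled but
 * the divider is not protected by disabling interrupts. */
#define DIVIDER_CONFIG_IS_RISKY \
    (PICO_CYW43_ARCH_THREADSAFE_BACKGROUND && !PICO_DIVIDER_DISABLE_INTERRUPTS)

#if DIVIDER_CONFIG_IS_RISKY
#error "threadsafe_background mode without PICO_DIVIDER_DISABLE_INTERRUPTS=1"
#endif
```

Swapping `#error` for `#warning` would give the softer behaviour. The other option, having the build add the definition automatically, would live in the build system rather than in a header.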

Background

I have a user stress-testing the WiFi connection chip (100s/1000s of disconnects and transmits) and running into some really strange issues which I think might be related to this. Essentially we use (time_us_32() / 1000L) + 5000 to set some timeouts (in milliseconds), and he's reported a random error where the result of that division is not monotonically increasing, so in some cases we end up waiting for up to a full wraparound of the 32-bit microsecond counter (2^32 µs, roughly 71.6 min).
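For reference, the arithmetic behind that timeout can be sketched in plain C. The function names below are hypothetical and the current time is passed in as a parameter so the snippet runs on a host; on the device the value would come from time_us_32().

```c
#include <stdint.h>

/* Hypothetical helper mirroring (time_us_32() / 1000L) + 5000:
 * convert a 32-bit microsecond timestamp to ms and add a 5 s timeout. */
uint32_t deadline_ms_from_us(uint32_t now_us) {
    return now_us / 1000u + 5000u;
}

/* Largest millisecond value the division alone can produce. */
uint32_t max_timer_ms(void) {
    return UINT32_MAX / 1000u;
}

/* Whole minutes in one wrap of the 32-bit microsecond counter (2^32 us). */
uint32_t wrap_period_whole_minutes(void) {
    const uint64_t wrap_us = (uint64_t)UINT32_MAX + 1u;
    return (uint32_t)(wrap_us / 60000000u);
}
```

A timestamp of 2,000,000 µs gives a deadline of 7000 ms. If the quotient comes back wrong, the deadline can land anywhere in the 0 to 4,294,967 ms range, which is how a 5 second timeout could turn into a wait of up to about 71 minutes.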
