
fix: Prefetch messages in limited batches (#6915) #7031


Open

iequidoo wants to merge 1 commit into main from iequidoo/prefetch-limit

Conversation


@iequidoo iequidoo commented Jul 22, 2025

No description provided.


iequidoo commented Jul 22, 2025

The actual problem for that user is that a timeout happens right after prefetching messages, so there's no progress over time:

...
2025-07-20T15:40:25.640Z    core/event              INFO    ""  1   "src/imap.rs:627: Not moving the message 51d56fb8-5f2d-441f-a01c-a29ff076304d@localhost that we have seen before."
2025-07-20T15:40:27.274Z    core/event              INFO    ""  1   "src/imap.rs:1323: Starting a full FETCH of message set \"243585:243648,243650:244098,244100:244813,244815:245041,245043:247823,247825:248833,248835:249504,249506:252485,252487:253338,253340:254613,254615:254643,254645,254644,254646:255159,255161:257016,257018:257400\"."
2025-07-20T15:40:27.293Z    core/event              WARNING ""  1   "src/scheduler.rs:731: Failed fetch_idle: fetch_move_delete: fetch_new_messages: fetch_many_msgs: Failed to process IMAP FETCH result: io: timed out: timed out"

Before this timeout there are lots of "Not moving the message..." entries within minutes, so I think adding a limit to prefetch() may help. Prefetching without a limit doesn't scale anyway. On the other hand, it looks like we have some bug in timeout handling, and this fix may hide it.

@link2xt maybe you have an idea why this timeout happens?


link2xt commented Jul 22, 2025

There is a just-fixed (core 2.4.0) bug in timeout handling: #7019.
A timeout immediately after the fetch starts could be explained by the stream already being in a broken state due to a previous timeout that was ignored (otherwise we would not have reached the fetch call).

src/imap.rs (outdated)

```rust
let mut fetch_more = false;
if fetch_res.is_ok() {
    let prefetch_uid_next = prefetch_uid_last + 1;
    fetch_more = prefetch_uid_next < mailbox_uid_next;
```
link2xt (Collaborator) commented:

mailbox_uid_next is updated when the folder is selected, so if we reached it with prefetch_uid_next, it does not mean there are no more mails. They could have arrived while we were looping.

Instead of a loop with break, I'd split the interval between the last remembered UIDNEXT and mailbox_uid_next into chunks of 100 UIDs (or 1000 as it is now), and then fetch L..L+100, L+101..L+200, L+201..* instead. That way the last chunk definitely picks up all the remaining mail that could have arrived while we were fetching.
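
For illustration, a minimal, self-contained sketch of that chunking idea (the function name, signature, and numbers are illustrative, not the actual deltachat-core code):

```rust
/// Split the UID interval `[uid_next, mailbox_uid_next)` into fixed-size
/// chunks; the last chunk is left open-ended (`*`) so that it also picks up
/// mail that arrived while the earlier chunks were being fetched.
fn uid_chunks(uid_next: u32, mailbox_uid_next: u32, batch: u32) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut lo = uid_next;
    while lo + batch < mailbox_uid_next {
        chunks.push(format!("{lo}:{}", lo + batch - 1));
        lo += batch;
    }
    chunks.push(format!("{lo}:*")); // final, open-ended chunk
    chunks
}

fn main() {
    // E.g. last remembered UIDNEXT = 1, mailbox UIDNEXT = 250, chunks of 100.
    assert_eq!(uid_chunks(1, 250, 100), ["1:100", "101:200", "201:*"]);
}
```

The open-ended last chunk corresponds to a `UID FETCH ...:*`, so it covers mail that arrived during the earlier chunks (though, as discussed below, mail can still arrive after it and before IDLE).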

link2xt (Collaborator) commented:

Also, prefetch and fetch of a single batch could maybe be moved into a separate function; fetch_new_messages is quite large.

iequidoo (Collaborator, Author) commented:

> mailbox_uid_next is updated when the folder is selected, so if we reached it with prefetch_uid_next, it does not mean there are no more mails. They could have arrived while we were looping.

True, but the problem already exists because we don't prefetch messages again after fetching (and fetching takes more time). I thought about it, but was unsure that issuing an extra UID FETCH in the typical scenario where only a couple of messages arrived is a good idea. Maybe it should be fixed in Session::prefetch() instead: it should look at the mailbox's uid_next (not the one used in fetch_new_messages()), and if the range isn't too large, replace the upper bound with *. Anyway, this doesn't guarantee that no new messages arrive before we issue IDLE, so maybe we should just go idle and let the server notify us? That's how the protocol is supposed to work, in my understanding.

iequidoo (Collaborator, Author) commented:

> Also, prefetch and fetch of a single batch could maybe be moved into a separate function; fetch_new_messages is quite large.

Done, and it indeed looks better. Also fixed the read_cnt calculation and reduced the batch size to 500 messages; 1000 may be too much. As for prefetching without an upper limit in the last batch, I suggest not complicating the code. In fact, this PR makes it even less probable that messages arriving in parallel are left unprocessed before IDLE, because the last batch will often overlap the mailbox's UIDNEXT, so the time window in which new messages may arrive is greatly reduced. Anyway, going to IDLE one more time isn't a big problem, and it may be better than issuing an extra UID FETCH: even if we do UID FETCH ...:* at the last iteration and see a new message, we would have to redo it, because processing that message takes time and yet more messages may have arrived.
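
A rough, self-contained sketch of that per-batch flow; `Session`, `prefetch_range`, and `fetch_many` are hypothetical stand-ins here, not the actual deltachat-core API:

```rust
/// Hypothetical stand-in for the IMAP session; only here so the sketch
/// compiles on its own.
struct Session;

impl Session {
    /// Prefetch headers for UIDs in `first..=last` and return the UIDs of
    /// the messages we actually want to download (stubbed out).
    fn prefetch_range(&mut self, first: u32, last: u32) -> Vec<u32> {
        (first..=last).collect()
    }

    /// Download the given messages (stubbed out).
    fn fetch_many(&mut self, uids: &[u32]) {
        println!("fetching {} messages", uids.len());
    }
}

/// Batch size; the PR settles on 500 messages per batch.
const PREFETCH_BATCH: u32 = 500;

/// Prefetch and download in bounded batches so the connection never has to
/// survive one huge, multi-minute prefetch. The last batch may overlap the
/// mailbox's UIDNEXT; anything arriving later is picked up on the next IDLE
/// round, as discussed above.
fn fetch_in_batches(session: &mut Session, mut uid_next: u32, mailbox_uid_next: u32) {
    while uid_next < mailbox_uid_next {
        let last = uid_next + PREFETCH_BATCH - 1; // may exceed mailbox_uid_next
        let uids = session.prefetch_range(uid_next, last);
        session.fetch_many(&uids);
        uid_next = last + 1;
    }
}

fn main() {
    fetch_in_batches(&mut Session, 243_585, 245_000);
}
```

The point is only the loop shape: each iteration issues a bounded prefetch and downloads its results before the next prefetch, so there is regular traffic on the connection instead of one long silent prefetch.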


iequidoo commented Jul 24, 2025

> A timeout immediately after the fetch starts could be explained by the stream already being in a broken state due to a previous timeout that was ignored (otherwise we would not have reached the fetch call).

I looked at the log again, but it doesn't look like the timeout occurs during prefetching: it happens several times during ~30 min, while the "Not moving the message..." log only grows by 11 messages over that time, so it seems that prefetch() returns all responses successfully.

Can it happen because of inactivity on the socket TX? We don't issue any commands for minutes.


link2xt commented Jul 24, 2025

> Can it happen because of inactivity on the socket TX?

As far as I can see, tokio-io-timeout resets the timeout on a successful poll_write or poll_write_vectored, so this should not happen. Also, the read timeout is separate from the write timeout. I suspect the write timeout more or less never fires, because there is an OS buffer and it is not full most of the time.
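
For reference, a minimal sketch of how separate read and write timeouts are typically configured with tokio-io-timeout; the host name, timeout values, and the NOOP write are placeholders, not taken from the deltachat-core code:

```rust
use std::time::Duration;

use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;
use tokio_io_timeout::TimeoutStream;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let tcp = TcpStream::connect("imap.example.org:143").await?;
    let mut stream = TimeoutStream::new(tcp);
    // Read and write timeouts are configured and tracked independently;
    // per the discussion above, a successful poll_write resets only the
    // write timer, and likewise poll_read resets only the read timer.
    stream.set_read_timeout(Some(Duration::from_secs(60)));
    stream.set_write_timeout(Some(Duration::from_secs(60)));
    // TimeoutStream is used through a pinned pointer.
    let mut stream = Box::pin(stream);
    stream.write_all(b"a1 NOOP\r\n").await?;
    Ok(())
}
```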

@iequidoo iequidoo force-pushed the iequidoo/prefetch-limit branch from 7531cdc to 1807934 on July 25, 2025 08:19
I have logs from a user where messages are prefetched for many minutes. While that's not a problem
on its own, we can't rely on the connection surviving such a period, so make
`fetch_new_messages()` prefetch (and then actually download) messages in batches of 500.
@iequidoo iequidoo force-pushed the iequidoo/prefetch-limit branch from 1807934 to f8ce467 on July 25, 2025 08:20
@iequidoo iequidoo requested a review from link2xt July 25, 2025 08:39