chore: Add doc and rename function for flushing strategy #740

lym953 · 2025-07-11T20:39:32Z

Motivation

It took me quite some effort to understand flushing strategies. I want to make it easier to understand for me and future developers.

This PR

Tries to make flushing strategy code more readable:

Add/move comments
Create an enum ConcreteFlushStrategy, which omits Default because Default must always be resolved to a concrete strategy before use
Rename should_adapt to evaluate_concrete_strategy()
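
A minimal sketch of the enum split described above, under stated assumptions: the variant names other than Default, the u64 millisecond payloads, the fast_invocations flag, and the placeholder rule for resolving Default are all hypothetical and are not taken from the bottlecap source. It only illustrates why a separate ConcreteFlushStrategy lets callers match without a Default arm.

```rust
/// User-facing configuration. Variant names and payloads are illustrative assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FlushStrategy {
    Default,
    End,
    Periodically(u64), // interval in milliseconds (assumed unit)
    Continuously(u64), // interval in milliseconds (assumed unit)
}

/// Same set of strategies minus `Default`, so downstream code never has to
/// handle an unresolved strategy.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ConcreteFlushStrategy {
    End,
    Periodically(u64),
    Continuously(u64),
}

/// Resolves a configured strategy to a concrete one.
/// `fast_invocations` is a hypothetical stand-in for whatever signal the real
/// code derives from recent invocation times; the rule for `Default` below is
/// a placeholder, not the project's actual logic.
pub fn evaluate_concrete_strategy(
    configured: FlushStrategy,
    fast_invocations: bool,
) -> ConcreteFlushStrategy {
    match configured {
        FlushStrategy::Default => {
            if fast_invocations {
                ConcreteFlushStrategy::Continuously(20_000)
            } else {
                ConcreteFlushStrategy::End
            }
        }
        FlushStrategy::End => ConcreteFlushStrategy::End,
        FlushStrategy::Periodically(ms) => ConcreteFlushStrategy::Periodically(ms),
        FlushStrategy::Continuously(ms) => ConcreteFlushStrategy::Continuously(ms),
    }
}
```

Because the function takes the full FlushStrategy and matches every variant, no unreachable!() arm is needed.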

To reviewers

There are still a few things I don't understand, which are marked with TODO. I'd appreciate an explanation!
Also, please correct me if any comment I added is wrong.

lym953 · 2025-07-11T20:39:56Z

bottlecap/src/config/flush_strategy.rs

@@ -8,13 +8,42 @@ pub struct PeriodicStrategy {

 #[derive(Clone, Copy, Debug, PartialEq)]
 pub enum FlushStrategy {
+    // Flush every 1s and at the end of the invocation


Some of these docs were moved here from flush_control.rs

lym953 · 2025-07-11T20:42:54Z

bottlecap/src/lifecycle/flush_control.rs

@@ -57,12 +52,15 @@ impl FlushControl {
                i.set_missed_tick_behavior(Skip);
                i
            }
+            // TODO: Why is this 15 minutes?


Need help here

Looking at the history/blame can be helpful here: #599

This timeout is used for the race flush block. But when users want to flush at the end, we should never hit the race flush. Initially it was set to a max unsigned int, but that caused an integer overflow in tokio when it calculated the timeout.
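
To illustrate the class of overflow mentioned above, here is a std-only sketch. It does not reproduce tokio's internals, and the exact value originally used is not shown in this thread, so Duration::MAX stands in for "a max unsigned int" as an assumption. deadline_after is a hypothetical helper.

```rust
use std::time::{Duration, Instant};

/// Returns the instant `timeout` after `now`, or `None` if the addition
/// cannot be represented. Hypothetical helper for illustration only.
fn deadline_after(now: Instant, timeout: Duration) -> Option<Instant> {
    now.checked_add(timeout)
}

/// A large but finite timeout, matching the 15 minutes in the diff above.
fn fifteen_minutes() -> Duration {
    Duration::from_secs(15 * 60)
}
```

A finite value such as 15 minutes still behaves as "effectively never" for this code path while leaving plenty of headroom in deadline arithmetic.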

lym953 · 2025-07-11T20:43:07Z

bottlecap/src/lifecycle/invocation_times.rs

@@ -40,22 +47,23 @@ impl InvocationTimes {
        let should_adapt = (elapsed as f64 / (LOOKBACK_COUNT - 1) as f64) < ONE_TWENTY_SECONDS;
        if should_adapt {
            // Both units here are in seconds
+            // TODO: What does this mean?


Need help here

The comment addresses an earlier misconception: the author (me) thought that flush_timeout was in milliseconds while the elapsed time was in seconds, so the condition never hit.

The logic first checks whether we should adapt to a periodic strategy (defined above on line 47: whether the last 20 requests arrived within 2 minutes). The next bit decides whether we should use the continuous strategy or a regular periodic strategy, which, judging from your comment above on lines 37-41, you seem to have handled pretty well.

I hope this helps!

Sorry if I was not clear. What I don't understand is why compare elapsed with flush_timeout.

For example, if flush_timeout is 30s and there's one invocation every 1.6s, then elapsed will be 1.6s * 19 = 30.4s, and we will choose the synchronous "periodic" strategy with a 20s interval. If a flush takes 29s, then this will block lots of invocations.

Should we compare the average invocation interval (elapsed / (LOOKBACK_COUNT - 1)) with flush_timeout instead?

The idea is that we want to make sure the periodicity is fast enough that nonblocking flushing is unlikely to be broken by Lambda pausing/freezing the CPU between invocations. Because the client will time out after the flush_timeout, we make this comparison. We're just choosing 20 invocations arbitrarily to pick "very fast functions".

It's unrelated to what happens if a periodic flush takes 29s. If we choose the periodic strategy, we'll block on the call to /next until the periodic flush is complete. If we choose the continuous flush, we don't.
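
A minimal sketch of the decision as discussed in this thread, not the actual bottlecap code. It assumes LOOKBACK_COUNT is 20 and ONE_TWENTY_SECONDS is 120.0, takes both inputs in seconds as f64, and uses a hypothetical NoAdapt variant for the case where the adaptive branch is not taken (the thread does not say what happens there).

```rust
const LOOKBACK_COUNT: usize = 20; // assumed from "last 20 requests"
const ONE_TWENTY_SECONDS: f64 = 120.0; // assumed value of the constant

#[derive(Clone, Copy, Debug, PartialEq)]
enum Decision {
    NoAdapt,      // hypothetical: adaptive branch not taken
    Periodically, // blocks the call to /next until the flush completes
    Continuously, // nonblocking flush
}

/// `elapsed_secs` is the time spanned by the last LOOKBACK_COUNT invocations.
fn decide(elapsed_secs: f64, flush_timeout_secs: f64) -> Decision {
    let avg_interval = elapsed_secs / (LOOKBACK_COUNT - 1) as f64;
    let should_adapt = avg_interval < ONE_TWENTY_SECONDS;
    if !should_adapt {
        return Decision::NoAdapt;
    }
    // Both units here are in seconds.
    if elapsed_secs < flush_timeout_secs {
        Decision::Continuously
    } else {
        Decision::Periodically
    }
}
```

With the numbers from the example above, decide(30.4, 30.0) lands on the periodic branch because 30.4 is not below 30.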

I get the overall idea that the threshold of avg interval should be positively correlated to flush_timeout, but I don't get why we directly compare elapsed (which is 19 * avg interval) with flush_timeout. I'm thinking of an alternative threshold, which more aggressively chooses the continuous strategy: (ignore int/float conversions and division errors in the code below)

let avg_interval = elapsed / (LOOKBACK_COUNT - 1);
if avg_interval * 4 < flush_timeout {
    return Continuously;
} else {
    return Periodically;
}

This is similar to the current code, except that we use 4 here instead of 19. About this threshold:

It's unlikely that the actual interval between any two invocations exceeds 4 * avg_interval, so the freeze is unlikely to last longer than 4 * avg_interval, and the risk of a flush timeout is small.

On the upside, it brings the benefits of the continuous strategy to a wider range of avg_interval values: the choice changes from Periodic to Continuous when avg_interval is in the range (flush_timeout / 19, flush_timeout / 4), i.e. (1.58s, 7.5s) if flush_timeout == 30s.
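
To make the comparison concrete, here is a runnable sketch that parameterizes the multiplier, so 19 approximates the current behavior and 4 is the proposal. The function name choose, the factor parameter, and the f64 seconds inputs are assumptions for illustration; LOOKBACK_COUNT is assumed to be 20.

```rust
const LOOKBACK_COUNT: usize = 20; // assumed from the discussion above

#[derive(Clone, Copy, Debug, PartialEq)]
enum Choice {
    Continuously,
    Periodically,
}

/// Chooses a strategy by comparing `factor * avg_interval` with the flush timeout.
/// A factor of 19 compares (approximately) `elapsed` itself, as the current code does;
/// a factor of 4 is the more aggressive threshold proposed in this comment.
fn choose(elapsed_secs: f64, flush_timeout_secs: f64, factor: f64) -> Choice {
    let avg_interval = elapsed_secs / (LOOKBACK_COUNT - 1) as f64;
    if avg_interval * factor < flush_timeout_secs {
        Choice::Continuously
    } else {
        Choice::Periodically
    }
}
```

For a 5s average interval (elapsed of 95s) and a 30s timeout, the factor of 19 picks the periodic strategy while the factor of 4 picks the continuous one; for an 8s average, both pick periodic.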

What do you think?

We can talk in person tomorrow if that's easier.

lym953 · 2025-07-14T19:02:07Z

Need another review. I made evaluate_concrete_strategy() accept a FlushStrategy to avoid the unreachable!().

astuyve · 2025-07-15T16:58:47Z

bottlecap/src/lifecycle/invocation_times.rs

+                    (elapsed as f64 / (LOOKBACK_COUNT - 1) as f64) < ONE_TWENTY_SECONDS;
+                if should_adapt {
+                    // Both units here are in seconds
+                    // TODO: What does this mean?


Did that answer your question (in the other thread)?

lym953 marked this pull request as ready for review July 11, 2025 20:43

lym953 requested a review from a team as a code owner July 11, 2025 20:43

astuyve approved these changes Jul 14, 2025

lym953 added 4 commits July 14, 2025 14:58

chore: Add doc and rename function for flushing strategy

053c452

fmt

3b99c03

Make evaluate_concrete_strategy() accept FlushStrategy param

ccd1819

Add comment

09e9c06

lym953 force-pushed the yiming.luo/flush-strategy-doc branch from d280bea to 09e9c06 Compare July 14, 2025 18:58

lym953 requested a review from astuyve July 15, 2025 14:11

astuyve reviewed Jul 15, 2025

astuyve approved these changes Jul 15, 2025
