-
Notifications
You must be signed in to change notification settings - Fork 13
chore: Add doc and rename function for flushing strategy #740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -8,13 +8,42 @@ pub struct PeriodicStrategy { | |||
|
|||
#[derive(Clone, Copy, Debug, PartialEq)] | |||
pub enum FlushStrategy { | |||
// Flush every 1s and at the end of the invocation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some doc is moved from flush_control.rs
@@ -57,12 +52,15 @@ impl FlushControl { | |||
i.set_missed_tick_behavior(Skip); | |||
i | |||
} | |||
// TODO: Why is this 15 minutes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need help here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the history/blame can be helpful here: #599
This timeout is used for the race flush block. But when users want to flush at end, we should never hit the race flush. Initially it was set to a max unsigned int but that caused tokio to integer overflow when calculating the timeout
@@ -40,22 +47,23 @@ impl InvocationTimes { | |||
let should_adapt = (elapsed as f64 / (LOOKBACK_COUNT - 1) as f64) < ONE_TWENTY_SECONDS; | |||
if should_adapt { | |||
// Both units here are in seconds | |||
// TODO: What does this mean? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need help here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment addresses an earlier misconception when the author (me) thought that the flush_timeout was in milliseconds, so the elapsed time was then in seconds and it never hit.
The logic says that if we should adapt to a periodic strategy (which is defined above on 47, if the last 20 requests arrived within 2 minutes). The next bit decides if we should use the continuous strategy or if we should use a regular periodic strategy, which from your comment above on line 37-41 seems you've got handled pretty well.
I hope this helps!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I was not clear. What I don't understand is why compare elapsed
with flush_timeout
.
For example, if flush_timeout
is 30s, and there's one invocation every 1.6s, then elapsed
will be 1.6s * 19 = 30.4s, and we will choose the synchronous "periodic" strategy with 20s interval. If flush takes 29s, then this will block lots of invocations.
Should we compare the average invocation interval (elapsed / (LOOKBACK_COUNT - 1)
) with flush_timeout
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that we want to make sure the periodicity is fast enough that nonblocking flushing is unlikely to be broken by lambda pausing/freezing the CPU between invocations. Because the client will timeout after the flush_timeout
, we make this comparison. We're just choosing 20 invocations arbitrarily to pick "very fast functions".
It's unrelated to what happens if a periodic flush takes 29s
. If we choose the periodic strategy, we'll block on the call to /next
until the periodic flush is complete. If we choose the continuous flush, we don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the overall idea that the threshold of avg interval should be positively correlated to flush_timeout
, but I don't get why we directly compare elapsed
(which is 19 * avg interval) with flush_timeout
. I'm thinking of an alternative threshold, which more aggressively chooses the continuous strategy: (ignore int/float conversions and division errors in the code below)
let avg_interval = elapsed / (LOOKBACK_COUNT - 1);
if avg_interval * 4 < flush_timeout {
return Continuously;
} else {
return Periodically;
}
This is similar to the current code, except that we use 4 here instead of 19. About this threshold:
- Usually it's not likely that the actual interval between any two invocations is longer than
4 * avg_interval
, so the freezing duration is unlikely longer than4 * avg_interval
, so the risk of flush timeout is small. - On the up side, it brings the benefits of the continuous strategy for a wider range of
avg_interval
values, i.e. we will transition fromPeriodic
toContinuous
when it's in the range of(flush_timeout / 19, flush_timeout / 4)
, i.e.(1.58s, 7.5s)
ifflush_timeout == 30s
.
What do you think?
We can talk in person tomorrow if that's easier.
d280bea
to
09e9c06
Compare
Need another review. I made |
(elapsed as f64 / (LOOKBACK_COUNT - 1) as f64) < ONE_TWENTY_SECONDS; | ||
if should_adapt { | ||
// Both units here are in seconds | ||
// TODO: What does this mean? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that answer your question (in the other thread)?
Motivation
It took me quite some effort to understand flushing strategies. I want to make it easier to understand for me and future developers.
This PR
Tries to make flushing strategy code more readable:
ConcreteFlushStrategy
, which doesn't containDefault
because it is required to be resolved to a concrete strategyshould_adapt
toevaluate_concrete_strategy()
To reviewers
There are still a few things I don't understand, which are marked with
TODO
. Appreciate explanation!Also correct me if any comment I added is wrong.