
Continuous aggregate invalidation log #9210

Draft
svenklemm wants to merge 5 commits into main from sven/cagg2

Conversation

svenklemm (Member) commented on Jan 30, 2026

  • Add column invalidation_log to _timescaledb_catalog.continuous_agg

svenklemm changed the title from sven/cagg2 to Continuous aggregate invalidation log on Jan 30, 2026
svenklemm force-pushed the sven/cagg2 branch 4 times, most recently from de6bfc5 to 90e63a1 on January 30, 2026 at 14:26

codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 94.82759% with 3 lines in your changes missing coverage. Please review.

Files with missing lines             Patch %   Lines
src/ts_catalog/continuous_agg.c      80.00%    0 missing, 2 partials ⚠️
tsl/src/continuous_aggs/insert.c     92.85%    1 missing ⚠️


Comment on lines +281 to +290
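/*
 * Fan out one invalidation range to every continuous aggregate defined
 * on hypertable ht_id: look up the materialization hypertable of each
 * associated cagg and add an invalidation entry for the given start/end
 * range to each of them.
 */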
invalidation_cagg_add_entries(int32 ht_id, Datum start, Datum end)
{
ContinuousAggInfo info = ts_continuous_agg_get_all_caggs_info(ht_id);
ListCell *lc;
foreach (lc, info.mat_hypertable_ids)
{
int32 mat_ht_id = lfirst_int(lc);
invalidation_cagg_add_entry(mat_ht_id, start, end);
}
}

@svenklemm will we change this semantic in this PR? Today all the invalidation logs are per-hypertable, and the refresh procedure then moves entries to the cagg invalidation log. According to the design document we agreed this will not change now, since we should benchmark first to make sure it does not hurt performance for cases where a user has 1 hypertable and N caggs. /cc @gayyappan

gayyappan (Member) commented on Feb 3, 2026

Yes, @fabriziomello and I discussed the potential problem with this.
With the current invalidation approach, we have 2x writes.
Directly writing to the mat invalidation log tables will result in N times write amplification (N = number of caggs defined on the same hypertable) and slow down the regular DML path.


Just to clarify: this change slows down DML only when it produces invalidation log entries, so it can potentially affect backfills.

@gayyappan I didn't get your point about 2x writes. Currently we write invalidations to the hypertable invalidation log based on the chunks affected. With this implementation, that number of rows will be inserted N times, depending on the number of associated caggs.

There's a tradeoff in moving to this implementation, and the price is potentially slower backfill operations. Is this a big problem? I really don't know, and I'd prefer to have some numbers before deciding to move in this direction.


Sorry, my phrasing was not precise. Today the write amplification is 1 extra write per transaction behind the invalidation threshold (i.e. this is not isolated to backfills; think 5-minute caggs on a hypertable with weekly chunks). With this change, the write amplification is N times.
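
To make the amplification concrete, here is a small standalone C sketch (not part of this PR; the workload figures of 3 caggs and 1000 transactions behind the threshold are purely illustrative assumptions) that tallies the extra invalidation-log rows written under both schemes:

#include <stdio.h>

int
main(void)
{
	/* Illustrative assumptions only, not benchmark results. */
	const long n_caggs = 3;   /* continuous aggregates on one hypertable */
	const long n_txns = 1000; /* transactions behind the invalidation threshold */

	/* Current scheme: one extra row per such transaction in the
	 * per-hypertable invalidation log. */
	const long rows_current = n_txns;

	/* Proposed scheme: one row per cagg written directly to the
	 * materialization invalidation logs. */
	const long rows_proposed = n_txns * n_caggs;

	printf("current scheme:  %ld extra rows\n", rows_current);
	printf("proposed scheme: %ld extra rows (%ldx amplification)\n",
		   rows_proposed, n_caggs);
	return 0;
}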
