Problems related to pruning old AppUsageEvent and ServiceUsageEvent records #4182

### Problem

A problem we've been facing involves the pruning of old AppUsageEvent and ServiceUsageEvent records. Often, these records are removed before the corresponding Apps or Services have actually stopped, making it difficult to determine how long those resources have been running. If a consumer starts polling after the `start` record is pruned, it won't know the true start time of that App or Service.

### Challenges and Use of Purge/Seed

A specific pain point relates to the `destructively_purge_all_and_reseed` endpoints for App and Service Usage Events. These endpoints are often used by a consumer when they initially start consuming event records or when they realize they have missed event records that have now been pruned. While `destructively_purge_all_and_reseed` recreates running resources in the database, it assigns new `start` timestamps that do not reflect actual creation or launch times. As a result, usage metrics can become misleading.
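
To make the effect concrete, here is a minimal sketch of how a consumer that derives run time from the `start` timestamp would under-report usage after a reseed. The dates and the `running_duration` helper are invented for illustration and are not taken from Cloud Controller.

```python
from datetime import datetime, timedelta, timezone


def running_duration(start_time: datetime, now: datetime) -> timedelta:
    """Run time as a consumer would compute it from a `start` event."""
    return now - start_time


# Hypothetical timeline: the app really launched on 1 January, but the
# consumer only sees the `start` event created by a reseed on 1 March.
actual_launch = datetime(2024, 1, 1, tzinfo=timezone.utc)
reseed_time = datetime(2024, 3, 1, tzinfo=timezone.utc)
now = datetime(2024, 3, 2, tzinfo=timezone.utc)

true_duration = running_duration(actual_launch, now)
observed_duration = running_duration(reseed_time, now)

print(f"true run time:     {true_duration.days} days")
print(f"observed run time: {observed_duration.days} days")
print(f"under-reported by: {(true_duration - observed_duration).days} days")
```

The gap between the two values is exactly the time the resource had been running before the reseed, which is the information the consumer can no longer recover.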

### Core Problems
* Pruning Before Completion
    * The system prunes old records to manage database growth. However, if an App/Service remains running for a long period, its `start` record may be deleted before the `stop` record exists
    * A newly added or recovering consumer will not see accurate start times
* Extended Downtime Leading to Missed Events
    * Sometimes, a usage-event polling service may go offline for an extended period (e.g. an unnoticed crash). By the time it resumes polling, older events may have been pruned, leaving gaps in historical data
* Accurate State Visibility
    * It becomes challenging to piece together which Apps or Services are still running when critical events have already been removed. This forces reliance on [destructively_purge_all_and_reseed](https://v3-apidocs.cloudfoundry.org/version/3.185.0/index.html#purge-and-seed-app-usage-events) to reset the data, at the cost of losing accurate historical start times
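
The first problem above can be reproduced with a small simulation. This is only a sketch: the `UsageEvent` fields are simplified, and `prune_by_age` is a stand-in for purely age-based pruning, not the actual Cloud Controller cleanup job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class UsageEvent:
    guid: str
    resource_guid: str
    state: str  # "STARTED" or "STOPPED"
    created_at: datetime


def prune_by_age(events: List[UsageEvent], now: datetime,
                 max_age: timedelta) -> List[UsageEvent]:
    """Keep only events newer than the cutoff, regardless of resource state."""
    cutoff = now - max_age
    return [e for e in events if e.created_at >= cutoff]


def known_start(events: List[UsageEvent], resource_guid: str) -> Optional[datetime]:
    """Latest STARTED timestamp a consumer can still see for a resource."""
    starts = [e.created_at for e in events
              if e.resource_guid == resource_guid and e.state == "STARTED"]
    return max(starts) if starts else None


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


events = [
    UsageEvent("e1", "app-long", "STARTED", utc(2024, 1, 1)),   # still running
    UsageEvent("e2", "app-short", "STARTED", utc(2024, 2, 25)),
    UsageEvent("e3", "app-short", "STOPPED", utc(2024, 2, 26)),
]

remaining = prune_by_age(events, now=utc(2024, 3, 1), max_age=timedelta(days=30))

# app-long is still running, but its start record is gone.
print(known_start(remaining, "app-long"))
print(known_start(remaining, "app-short"))
```

After pruning, a consumer that begins polling has no way to tell that `app-long` exists at all, let alone when it started.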

### Potential Approaches

After running into this issue repeatedly, I’ve created [a set of code changes](https://github.com/cloudfoundry/cloud_controller_ng/pull/4646) for addressing some of these issues:
* Keep `start` Records for Active Apps/Services
    * Records remain in place until the corresponding `stop` event exists, preventing the loss of essential lifecycle information
* Consumer Registration
    * By including `consumer_guid` and `after_guid` in usage-event requests, consumers can register themselves, allowing the Cloud Controller to avoid pruning events they have not yet processed
* Threshold-Based Pruning
    * A configurable limit (`threshold_for_keeping_unprocessed_records`) ensures the database does not grow indefinitely if a registered consumer stays offline. If the record count exceeds this threshold, older entries can still be pruned
* Endpoints for Managing Consumers
    * Operators or automated systems can view, remove, or otherwise manage registered consumers. This enables consumers to deregister themselves and make more informed decisions about when to request `destructively_purge_all_and_reseed`
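
The way the first three ideas could interact is sketched below. This is an illustration of the proposal as described here, not the code in the linked pull request: `select_prunable`, the `UsageEvent` fields, and the exact threshold semantics are assumptions. In particular, the sketch assumes that exceeding the threshold lifts only the consumer-based protection, while `start` records of running resources stay protected.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class UsageEvent:
    guid: str
    resource_guid: str
    state: str  # "STARTED" or "STOPPED"
    created_at: datetime


def select_prunable(events: List[UsageEvent], cutoff: datetime,
                    consumer_positions: Dict[str, str],
                    threshold: int) -> List[UsageEvent]:
    """Return events that may be pruned. `events` must be ordered oldest first.

    consumer_positions maps a consumer_guid to the guid of the last event
    that consumer processed (its most recent after_guid).
    """
    # A resource whose latest event is STARTED is still running;
    # that start record is never pruned.
    latest = {}
    for e in events:
        latest[e.resource_guid] = e
    protected_starts = {e.guid for e in latest.values() if e.state == "STARTED"}

    # Events after the slowest registered consumer's position are unprocessed.
    # The protection is dropped once the table exceeds the threshold.
    index = {e.guid: i for i, e in enumerate(events)}
    if consumer_positions and len(events) <= threshold:
        # An unknown guid (e.g. already pruned) conservatively protects everything.
        first_unprocessed = min(index.get(g, -1)
                                for g in consumer_positions.values()) + 1
    else:
        first_unprocessed = len(events)

    return [e for i, e in enumerate(events)
            if e.created_at < cutoff
            and e.guid not in protected_starts
            and i < first_unprocessed]


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


events = [
    UsageEvent("e1", "app-long", "STARTED", utc(2024, 1, 1)),  # still running
    UsageEvent("e2", "app-short", "STARTED", utc(2024, 1, 2)),
    UsageEvent("e3", "app-short", "STOPPED", utc(2024, 1, 3)),
    UsageEvent("e4", "app-new", "STARTED", utc(2024, 1, 4)),
    UsageEvent("e5", "app-new", "STOPPED", utc(2024, 1, 5)),
]
cutoff = utc(2024, 2, 1)
consumers = {"billing-consumer": "e3"}  # has processed everything up to e3

within_limit = select_prunable(events, cutoff, consumers, threshold=100)
over_limit = select_prunable(events, cutoff, consumers, threshold=3)

print([e.guid for e in within_limit])
print([e.guid for e in over_limit])
```

With the consumer registered and the table under the threshold, only events the consumer has already processed are pruned. Once the threshold is exceeded, unprocessed events become prunable again, which bounds database growth at the cost of that consumer missing events.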

### Questions for the Community
* Have folks run into a similar challenge with `start` events being pruned prematurely, leading to confusion about how long resources have been running?
* Have you had to use `destructively_purge_all_and_reseed` in a similar manner?
* Does retaining usage events of running Apps and Services sound like a beneficial idea? 
* Do consumer registration and threshold-based pruning strike a reasonable balance between data retention and database size management?
* Are there alternative approaches that could better manage event pruning while preserving critical usage data?
