Feature Request: Make model timeouts for unloading from memory configurable #84

@chpiatt

Summary

Allow developers to configure, when loading a model, how long it may sit idle before it is purged from memory.

Motivation

Many model use cases are latency-sensitive and potentially long-running. For tasks with gaps of more than 5 minutes between requests, the only way to avoid a cold start is to run a separate polling script, which is inefficient.

Proposed Implementation

Change the timeout from a static 5 minutes to a configurable period, perhaps anywhere from 1 minute (or even 1 request) to 24 hours.
Add an option for advanced users to keep the model loaded with no automatic purge, so that it leaves memory only when it is manually purged or replaced by another model.
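
To make the two proposals above concrete, here is a minimal sketch of what a per-model idle timeout could look like. It does not reflect the project's actual internals: `ModelCache`, `load`, `get`, `purge_idle`, `unload`, and the `idle_timeout` parameter are all hypothetical names. The default of 300 seconds mirrors the current static 5 minutes, and `idle_timeout=None` stands in for the "no automatic purge" option.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ModelCache:
    """Hypothetical in-memory model cache with a per-model idle timeout."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so the purge logic can be tested without sleeping.
        self._clock = clock
        # name -> (model, idle timeout in seconds or None, time of last use)
        self._entries: Dict[str, Tuple[object, Optional[float], float]] = {}

    def load(
        self,
        name: str,
        loader: Callable[[], object],
        idle_timeout: Optional[float] = 300.0,
    ) -> object:
        """Load a model; idle_timeout=None means it is never purged automatically."""
        if idle_timeout is not None and idle_timeout <= 0:
            raise ValueError("idle_timeout must be positive or None")
        model = loader()
        self._entries[name] = (model, idle_timeout, self._clock())
        return model

    def get(self, name: str) -> Optional[object]:
        """Return a loaded model and reset its idle timer, or None if not loaded."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        model, timeout, _ = entry
        self._entries[name] = (model, timeout, self._clock())
        return model

    def purge_idle(self) -> List[str]:
        """Drop every model idle for at least its timeout; return the purged names."""
        now = self._clock()
        expired = [
            name
            for name, (_, timeout, last_used) in self._entries.items()
            if timeout is not None and now - last_used >= timeout
        ]
        for name in expired:
            del self._entries[name]
        return expired

    def unload(self, name: str) -> bool:
        """Manually purge a model; returns True if it was loaded."""
        return self._entries.pop(name, None) is not None
```

In this sketch something else (a background timer, or a check on each request) would need to call `purge_idle()` periodically. A model loaded with `idle_timeout=None` only leaves the cache through `unload()` or by being overwritten with another `load()` under the same name.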

Technical Considerations

Should probably be coupled with a feedback mechanism and/or limits to prevent users from overusing system resources.
Should gracefully handle bad shutdowns.
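
One possible shape for the limits and feedback mentioned above is sketched below, assuming the 1 minute to 24 hour range floated in the proposal. The function name `resolve_idle_timeout`, the `allow_never` flag, and the choice to clamp rather than reject out-of-range values are all illustrative assumptions, not existing behavior. Handling of bad shutdowns is not covered here.

```python
import math
from typing import Optional, Tuple

# Bounds taken from the range suggested in this issue; the final values are
# an open question for maintainers.
MIN_TIMEOUT_S = 60.0             # 1 minute
MAX_TIMEOUT_S = 24 * 60 * 60.0   # 24 hours


def resolve_idle_timeout(
    requested: Optional[float],
    allow_never: bool = False,
) -> Tuple[Optional[float], bool]:
    """Return (effective_timeout_seconds, was_clamped).

    requested=None asks for "never purge automatically" and is only honored
    when allow_never is True (a hypothetical opt-in for advanced users).
    """
    if requested is None:
        if not allow_never:
            raise PermissionError("keeping a model loaded indefinitely is not enabled")
        return None, False
    if not math.isfinite(requested) or requested <= 0:
        raise ValueError("timeout must be a positive, finite number of seconds")
    effective = min(max(float(requested), MIN_TIMEOUT_S), MAX_TIMEOUT_S)
    # The flag lets the caller tell the user that the request was adjusted.
    return effective, effective != float(requested)
```

Returning the `was_clamped` flag is one simple form of feedback: the caller can warn the user that the requested value was adjusted instead of silently applying a different timeout.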

Questions for Maintainers

What is the right time range to allow for maximum flexibility?
