Description
Is your feature request related to a problem or challenge?
One of the dreams of the composable data ecosystem is to quickly assemble a system from various components (DataFusion, data formats
DataFusion still releases once a month, which allows code to flow quickly but also causes at least two challenges:
- Upgrading downstream projects takes non-trivial work, as mentioned in [Discuss] Release cadence / patch releases / Long Term Supported (lts) minor releases #5269
- Upgrading and using downstream third-party extensions becomes hard
Third-party extensions like delta-rs and iceberg provide TableProviders for DataFusion, which is really nice. However, to use those packages, the versions of DataFusion must match exactly.
This means an application that relies on multiple downstream packages must wait until ALL of them have upgraded to the new version before it can upgrade DataFusion. Any delay in a downstream library's update delays the application's upgrade as well.
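To illustrate why the versions must line up, here is a minimal sketch of the compatibility rule Cargo applies by default: two versions can be unified into a single copy of a crate only when their left-most non-zero component matches. When they cannot be unified, Cargo builds both copies, and Rust treats the traits from each copy as unrelated types, so a TableProvider built against one copy does not fit the other. The version numbers in the usage note and the helper names (`parse_version`, `cargo_compatible`, `single_datafusion_possible`) are hypothetical and chosen only for illustration; the sketch also assumes each DataFusion release bumps the major version.

```rust
/// A minimal (major, minor, patch) version. Pre-release tags are not handled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parses "MAJOR.MINOR.PATCH"; returns None for anything else.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let version = Version {
        major: parts.next()??,
        minor: parts.next()??,
        patch: parts.next()??,
    };
    // Reject trailing components such as "1.2.3.4".
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// Cargo's default (caret) rule: versions are compatible when their
/// left-most non-zero component is the same.
fn cargo_compatible(a: Version, b: Version) -> bool {
    if a.major != 0 || b.major != 0 {
        a.major == b.major
    } else if a.minor != 0 || b.minor != 0 {
        a.minor == b.minor
    } else {
        a.patch == b.patch
    }
}

/// Hypothetical helper: given the DataFusion version each downstream crate
/// was built against, can the application use one shared copy of DataFusion?
fn single_datafusion_possible(deps: &[(&str, Version)]) -> bool {
    match deps.split_first() {
        None => true,
        Some(((_, first), rest)) => rest.iter().all(|(_, v)| cargo_compatible(*first, *v)),
    }
}
```

With these assumptions, `single_datafusion_possible` returns `false` when one extension was built against `48.0.0` and another against `49.0.0`, and `true` when they differ only in the patch number.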
For example, for an application that wants to use delta-rs, iceberg, and the table-providers crate, there is a race after each upgrade of DataFusion.
Consider a release timeline like this:
- +0 days: DataFusion version X released
- +7 days: New delta-rs release upgraded to DataFusion X
- +11 days: New iceberg crate released, upgraded to DataFusion X
- +12 days: New table-providers version is released
- +13-30 days: End user app can upgrade DataFusion, delta-rs, and iceberg
- +31 days: New DataFusion is released again
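The arithmetic behind this timeline can be sketched as follows: the application is gated by the slowest downstream crate, and whatever remains of the release cycle after that is the window in which the upgrade can actually happen. The struct and function names are hypothetical, and the day offsets are the ones from the timeline above.

```rust
/// A downstream crate and the day (relative to a DataFusion release)
/// on which it published a version built against that release.
struct DownstreamRelease {
    name: &'static str,
    days_after_datafusion: u32,
}

/// Returns the crate that gates the upgrade and the number of days left
/// before the next DataFusion release once that crate has shipped.
/// Returns None when there are no downstream crates to wait for.
fn upgrade_window(
    releases: &[DownstreamRelease],
    cadence_days: u32,
) -> Option<(&'static str, u32)> {
    let slowest = releases.iter().max_by_key(|r| r.days_after_datafusion)?;
    let remaining = cadence_days.saturating_sub(slowest.days_after_datafusion);
    Some((slowest.name, remaining))
}

/// The example timeline from this issue.
fn example_timeline() -> Vec<DownstreamRelease> {
    vec![
        DownstreamRelease { name: "delta-rs", days_after_datafusion: 7 },
        DownstreamRelease { name: "iceberg", days_after_datafusion: 11 },
        DownstreamRelease { name: "table-providers", days_after_datafusion: 12 },
    ]
}
```

For the example timeline and a 31 day cadence, the gating crate is table-providers and 19 days remain before the next DataFusion release restarts the race. Any slip in a single downstream crate shrinks that window directly.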
Describe the solution you'd like
I would like downstream libraries to have more time and schedule flexibility when upgrading DataFusion and other dependent crates, so that it is easier to construct a system from different components
Describe alternatives you've considered
Option 1: Switch to major/minor release cadence
We could follow the model of arrow-rs which does releases monthly, but breaking releases only quarterly. Here is how it works in arrow-rs: https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule
This would mean continuing to release every month, but only allowing breaking API changes every 3rd release (or some other cadence).
The major cost here is that maintainers and contributors would have to be diligent about not merging breaking API changes until a major release.
This is possible to automate somewhat:
- Semver-checks for all crate on merge and push #16078 from @logan-keede
- Adds script to detect breaking API changes/ semver #16541 from @lic
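As a rough sketch of what Option 1 would look like in version numbers, the snippet below generates a monthly schedule in which only every Nth release is allowed to break APIs. The names (`ReleaseKind`, `release_kind`, `schedule`), the `(major, minor)` representation, and the starting version in the usage note are all hypothetical; this is not an existing DataFusion or arrow-rs tool.

```rust
/// Whether a given monthly release may contain breaking API changes.
#[derive(Debug, PartialEq, Eq)]
enum ReleaseKind {
    Breaking,
    Compatible,
}

/// Releases are counted from 1. Every `breaking_every`-th release is
/// breaking; a value of 0 means breaking releases never happen.
fn release_kind(release_number: u32, breaking_every: u32) -> ReleaseKind {
    if breaking_every != 0 && release_number % breaking_every == 0 {
        ReleaseKind::Breaking
    } else {
        ReleaseKind::Compatible
    }
}

/// Bumps the major version for breaking releases, the minor otherwise.
fn next_version((major, minor): (u32, u32), kind: &ReleaseKind) -> (u32, u32) {
    match kind {
        ReleaseKind::Breaking => (major + 1, 0),
        ReleaseKind::Compatible => (major, minor + 1),
    }
}

/// Produces the (major, minor) versions of the next `months` releases.
fn schedule(start: (u32, u32), months: u32, breaking_every: u32) -> Vec<(u32, u32)> {
    let mut current = start;
    (1..=months)
        .map(|n| {
            current = next_version(current, &release_kind(n, breaking_every));
            current
        })
        .collect()
}
```

Starting from a hypothetical `(50, 0)` with a breaking release every 3rd month, the next six releases would be 50.1, 50.2, 51.0, 51.1, 51.2, and 52.0. Downstream crates built against 50.x would then have roughly a quarter, rather than a month, before the next incompatible version appears.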
Option 2: LTS and feature branch
Keep (at least) two branches going: LTS and main, as proposed by @andygrove in #5269.
In this model we would likely backport changes to the LTS branch and make releases from there. The downside of this approach is that there is extra work to backport changes to LTS.
Additional context
No response