New method circular_array_windows(). #1086

Open
sgtatham wants to merge 1 commit into rust-itertools:master from sgtatham:circular-array-windows

Conversation

Contributor

@sgtatham sgtatham commented Feb 5, 2026

This is very like the existing circular_tuple_windows, but imposes the minimum possible bounds on the input iterator: it must have cloneable items because each item is returned N times, and it must be Sized so that it can be stored in a struct. Unlike circular_tuple_windows, it doesn't require the input iterator itself to have extra traits, like Clone or ExactSizeIterator.

Because the return type is an array (as suggested in #1084), we must handle the zero-length case, because you can't have a constraint N>0. In that situation we still read to the end of the input iterator, discard each item as we read it, and return a zero-length array per item, preserving the invariant that this iterator is the same length as the input one.
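As a rough illustration of the zero-length case only, the behaviour described above could be written as a plain `map`. The name `zero_length_windows` is hypothetical and is not taken from the PR; this is a sketch of the stated invariant, not the PR's code.

```rust
/// Hypothetical helper (not part of the PR): the `N == 0` behaviour written
/// directly. Every input item is consumed and dropped, and an empty array is
/// yielded in its place, so the output has the same length as the input.
fn zero_length_windows<I: Iterator>(iter: I) -> impl Iterator<Item = [I::Item; 0]> {
    iter.map(|_discarded| [])
}
```

For an input such as `1..5` this yields four empty arrays, one per input item.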

Contributor Author

sgtatham commented Feb 5, 2026

I'm not very happy with this code – I have a nasty feeling it's much longer than it needs to be. But it seemed surprisingly tricky to get all the edge cases right, especially when the input iterator runs out early and you have to recycle elements multiple times!

I'd like to remove the dependency on Vec, so that this adapter wouldn't have to be gated on use_alloc. But I don't know what the policy is on adding extra dependencies. Something like arrayvec would replace Vec in a sensible way, I think.

(But you would have to give the arrayvecs capacity N, where you only really need N-1, because you can't do arithmetic on integer type parameters!)


codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.76%. Comparing base (6814180) to head (8e4550c).
⚠️ Report is 173 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1086      +/-   ##
==========================================
- Coverage   94.38%   93.76%   -0.63%     
==========================================
  Files          48       51       +3     
  Lines        6665     6415     -250     
==========================================
- Hits         6291     6015     -276     
- Misses        374      400      +26     

@sgtatham sgtatham force-pushed the circular-array-windows branch 2 times, most recently from b780c38 to 832cd72 Compare February 5, 2026 18:34
Comment on lines 118 to 129
let window = std::array::from_fn(|i| {
if i + 1 < N {
// The first N-1 items come from `ringbuf`
self.ringbuf[(i + self.ringpos) % self.ringbuf.len()].clone()
} else {
// The last item is the new one we just read
new_item.clone()
}
});

// Replace the oldest item in `ringbuf` with the new one.
self.ringbuf[self.ringpos] = new_item;
Member

Couldn't this first modify self.ringbuf, and only afterwards compute window? That would avoid the if/else within the array::from_fn.
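A minimal sketch of that reordering, under the assumption that the ring buffer is widened to hold a full window of N items (in the snippet under review it appears to hold only N-1). The free function `push_and_window` is hypothetical and exists only to make the idea testable; it assumes N > 0.

```rust
/// Hypothetical sketch: `ringbuf` holds the N items of the current window
/// and `ringpos` indexes the logically oldest one. Assumes `N > 0`.
fn push_and_window<T: Clone, const N: usize>(
    ringbuf: &mut [T; N],
    ringpos: &mut usize,
    new_item: T,
) -> [T; N] {
    // Overwrite the oldest item first...
    ringbuf[*ringpos] = new_item;
    // ...so the next slot now holds the oldest surviving item.
    *ringpos = (*ringpos + 1) % N;
    // Every element of the window now comes from `ringbuf`, with no branch.
    std::array::from_fn(|i| ringbuf[(*ringpos + i) % N].clone())
}
```

Starting from `ringbuf = [1, 2, 3]` and `ringpos = 0`, pushing 4 returns `[2, 3, 4]` and leaves the buffer as `[4, 2, 3]` with `ringpos = 1`.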

Member

phimuemue commented Feb 5, 2026

Hi @sgtatham, thanks for tackling this.

It'll take some time to do a thorough review, but from what I've seen so far, this always generates as many windows as the original iterator had elements.

I think this method could afford to behave like slice::array_windows and slice::windows: they return no windows if N > underlying_iterator.count() (which is what tuple_windows already does).

I think we can exploit the N > underlying_iterator.count() case to make prefix an Option<[T; N]> (see footnote 1). Then, the first call to next() could call self.iter.next_array() to populate prefix, and eliminate prefix_needed. (prefix == None could then replace the CircularArrayWindowsState::NotYetStarted state.) I'd hope this also allows us to eliminate balanced.

Moreover, could ringbuf be a [T; N]? Then - if I'm not mistaken - we could provide circular_array_windows without use_alloc.

Regarding CircularArrayWindowsState::Done: I would hope that this can be derived from the condition cycledpos==prefix.len()-1 (or something similar, please double check).

If it simplifies your implementation, you can even panic if N==0 (that's what slice::[array_]windows does).

Footnotes

  1. Ideally [T; N-1] but afaik Rust does not allow N-1 (generic parameters may not be used in const operations). [T; N] is probably close enough, and we can add a // TODO.
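A small sketch of the workaround mentioned in the footnote: store N slots and treat only the first N-1 as meaningful. The `Prefix` type and its `used` method are hypothetical names for illustration, not the PR's struct.

```rust
// On stable Rust the ideal field type is rejected:
//     items: [T; N - 1]
// error: generic parameters may not be used in const operations

/// Hypothetical stand-in: allocate `N` slots and use only the first `N - 1`.
struct Prefix<T, const N: usize> {
    // TODO: shrink to `[T; N - 1]` once const arithmetic on generics is allowed.
    items: [T; N],
}

impl<T, const N: usize> Prefix<T, N> {
    /// The `N - 1` slots that are logically part of the prefix.
    fn used(&self) -> &[T] {
        &self.items[..N.saturating_sub(1)]
    }
}
```

The cost is one spare `T` per adapter, which seems acceptable if it avoids a dependency on an allocator.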

Contributor Author

sgtatham commented Feb 6, 2026

It'll take some time to do a thorough review, but from what I've seen so far, this always generates as many windows as the original iterator had elements.

Yes – that's the same behaviour as circular_tuple_windows which I'm replacing. In cases where you ask for a large window from a small list, the existing function repeats the input elements multiple times. For example, length-7 windows from a list (1,2,3) gives you (1,2,3,1,2,3,1), (2,3,1,2,3,1,2), and (3,1,2,3,1,2,3). The rationale seems to be to regard the input list as cyclic, and return a window of it for each possible start position in the cycle.
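To make that characterisation concrete, here is a hedged reference sketch over a slice: one window per start position, each read by walking the cycle N steps. The function `circular_windows_reference` is hypothetical, is not the adapter in this PR, and needs the whole input up front.

```rust
/// Hypothetical reference implementation (not the PR's code): treat `items`
/// as cyclic and return the window of length `N` starting at each position.
fn circular_windows_reference<T: Clone, const N: usize>(items: &[T]) -> Vec<[T; N]> {
    (0..items.len())
        .map(|start| {
            // Walk N steps round the cycle from `start`. The range above is
            // empty for empty input, so the modulus is never zero here.
            std::array::from_fn(|i| items[(start + i) % items.len()].clone())
        })
        .collect()
}
```

With `N = 7` and input `[1, 2, 3]` this returns exactly the three windows listed above, and with `N = 2` and a single item `a` it returns `[a, a]`.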

There certainly is an argument for the idea that perhaps asking for length-7 windows of a length-3 list should say "don't be silly, there's no such thing". I recently posted a poll on Mastodon about the case of length-2 windows, and almost everybody thought it was weird that if the input list has length 1 then the window (a,a) is returned. But I think it depends on why you want circular windows in the first place, and some people will want those degenerate elements.

You mention a comparison to slice::[array_]windows. But that's not cyclic. I agree it's uncontroversial that a non-cyclic windows operator returns no windows if there aren't enough input elements to make a whole one.

Then, the first call to next() could call self.iter.next_array() to populate prefix, and eliminate prefix_needed.

If we change the policy in edge cases, so that returned windows will never contain two copies of the same input item, then I believe that strategy would work, because if you can't .next_array a whole N items out of the start of the iterator, you were going to fail anyway and return nothing. So yes, in that case both prefix and ringbuf would either contain N items or none, and could be Option<[T;N]>.

But if we're keeping the behaviour like circular_tuple_windows, then that's not good enough, because if you can't fetch N things from the iterator (but can fetch at least one), then you have to keep the <N items you did get, and be prepared to cycle them.

That's where a lot of the complexity in this implementation comes from: the way that sometimes you have to start consuming things from prefix before you've even finished building prefix. In particular, that prevented me from constructing prefix as a fixed-size array using array::from_fn, because I would have needed the function to receive not just the current array index, but a slice showing me all the items in previous array slots, so that sometimes I could clone one of them.

Member

phimuemue commented Feb 6, 2026

Good point. Then let's try to stay in line with the existing circular_tuple_windows (see footnote 1).

Maybe this is a starting point for how the first call to next could fill the array. I know it's ugly and inefficient, but it contains the complexity within fn next. And it might be replaced by something better one day.

Footnotes

  1. We could think about making these special cases explicit (e.g. by offering two different functions, or adding a parameter), but I suggest going with what we have.

Contributor Author

sgtatham commented Feb 6, 2026

Fair's fair, my current implementation is ugly and inefficient too 🙂

Yes, I see – avoid having to keep an array of Option inside the struct, by instead using a temporary one during setup, and moving the items into the plain [T;N] once it's complete. I probably should have thought of that. OK, I'll rework on that basis.
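For readers following along, a minimal sketch of that setup step as I understand it: fill a temporary `[Option<T>; N]`, recycling already-filled slots once the input runs dry, then unwrap into a plain `[T; N]`. The function name `fill_prefix` and its exact signature are assumptions for illustration, not the code that ended up in the branch, and the sketch assumes N > 0.

```rust
/// Hypothetical sketch: build the first `N` items of the cyclic sequence
/// from `first` plus whatever `iter` can still supply. Assumes `N > 0`.
fn fill_prefix<T: Clone, const N: usize>(
    first: T,
    iter: &mut impl Iterator<Item = T>,
) -> [T; N] {
    assert!(N > 0, "this sketch does not cover N == 0");
    // Temporary storage, so slots can be filled one at a time.
    let mut slots: [Option<T>; N] = std::array::from_fn(|_| None);
    slots[0] = Some(first);
    let mut real_len = 1; // genuine input items seen so far
    let mut exhausted = false; // never poll `iter` again after a `None`
    for i in 1..N {
        let fresh = if exhausted { None } else { iter.next() };
        slots[i] = match fresh {
            Some(item) => {
                real_len += 1;
                Some(item)
            }
            None => {
                exhausted = true;
                // Recycle: the item one full cycle earlier is already stored.
                slots[i - real_len].clone()
            }
        };
    }
    // Every slot is `Some` by now, so this conversion cannot fail.
    slots.map(|slot| slot.expect("every slot was filled above"))
}
```

For example, `fill_prefix::<i32, 5>(1, &mut [2, 3].into_iter())` gives `[1, 2, 3, 1, 2]`, while a long enough input is only advanced by N-1 items.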

@sgtatham sgtatham force-pushed the circular-array-windows branch 2 times, most recently from 7afc0fa to 72b1b5c Compare February 7, 2026 09:24
Contributor Author

sgtatham commented Feb 7, 2026

Thanks for the suggestion! It worked pretty well. Here's the revised implementation, using no Vec and no longer depending on use_alloc. I've also managed to reduce the special cases for N<2 to only a few lines, instead of that huge entirely separate code path.

You were right that balanced wasn't needed. It turns out that an equivalent criterion is that N-1 items have been consumed from prefix, regardless of whether they were all pulled out after the input iterator ended, or whether it ended while they were still being built. So tracking the number of items consumed from prefix is enough.

(Obvious with hindsight! We produce k windows of length N, which cover a total of k+N-1 items, so we finish precisely when we've used up the input iterator and then consumed N-1 more items.)
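The counting argument can be checked with a small slice-based sketch: extend the k input items with N-1 recycled ones and slide an ordinary window over the result. The function `circular_windows_by_extension` is a hypothetical helper for illustration and takes the window length at run time; it is not how the adapter itself is written.

```rust
/// Hypothetical check of the counting argument: k input items followed by
/// n - 1 recycled items contain exactly k ordinary windows of length n.
fn circular_windows_by_extension<T: Clone>(items: &[T], n: usize) -> Vec<Vec<T>> {
    if items.is_empty() || n == 0 {
        return Vec::new(); // this sketch ignores the zero-length-window case
    }
    // The logical input: every item once, then n - 1 items from the start
    // again, cycling as often as needed when the input is short.
    let extended: Vec<T> = items
        .iter()
        .cloned()
        .chain(items.iter().cloned().cycle().take(n - 1))
        .collect();
    assert_eq!(extended.len(), items.len() + n - 1);
    extended.windows(n).map(|w| w.to_vec()).collect()
}
```

For `[1, 2, 3, 4]` with `n = 3` the extended sequence is `[1, 2, 3, 4, 1, 2]`, which has 4 + 3 - 1 = 6 items and yields four windows.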

@sgtatham sgtatham force-pushed the circular-array-windows branch from 72b1b5c to 8e4550c Compare February 7, 2026 09:36
Member

@phimuemue phimuemue left a comment

Hi there, thanks for this! Already much better now that balanced et al are out. Also nice to see the Vecs gone.

However, I'm wondering if we can simplify a bit further. In particular the case distinction in read_item worries me.

As I wrote, I think we should fuse the underlying iterator, and I think fn next could then do something like this:

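fn next(&mut self) -> Option<Self::Item> {
    match self.iter.next() {
        // Input exhausted: fall back to the stored prefix, if there is one.
        None => match &mut self.inner {
            Some(inner) => { /* start re-using stuff from inner.prefix */ }
            None => None,
        },
        // Input still producing: feed the item into the window state.
        Some(item) => match &mut self.inner {
            Some(inner) => { /* update inner */ }
            None => { /* initialize inner */ }
        },
    }
}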

The story would then be: "try self.iter as long as possible, and only once it returns None, go with self.prefix".

Which begs one question: Can we make all of this even simpler by doing a stupid trick: Should we first implement array_windows, and then redirect circular_array_windows so that the first call to fn next sets up prefix and chains it after iter?

I.e. given tuple_windows, could we say the following?

// Pseudocode: `N - 1` in a type position is not valid Rust today.
enum CircularTupleWindows<N, Iter> {
    NotYetStarted(Fuse<Iter>),
    Started(TupleWindows<N, Chain<Iter, [Iter::Item; N - 1]>>),
}

impl Iterator for CircularTupleWindows {
    fn next(&mut self) -> Option<Self::Item> {
        match self {
            NotYetStarted(iter) => {
                let prefix = /* compute prefix from iter */;
                *self = Started(iter.chain(prefix).tuple_windows());
            }
            Started(_) => {}
        }
        let Started(internal) = self else { panic!() }; // or similar
        internal.next()
    }
}

I'm sorry this came to my mind just now, but I think it's worth exploring.

As always: Use your judgement, and please excuse if I made obvious mistakes. (It's always too late in the evening when I do reviews...)

// windows of length `N` will cover `k+N-1` items in total. So we
// have output enough windows precisely when `prefix_pos` reaches
// `N-1`, whether or not we began incrementing `prefix_pos` during
// initial setup.
Member

Please boil this down to something like "prefix stores the first N items (to be used when cycled)".

In particular, the last paragraph may mislead, because even if the input iterator has k items, the output iterator does not necessarily have k items (e.g. if N>2*k).

Comment on lines +40 to +45
// `ringbuf` stores the _most recent_ N items from the input
// iterator, which were delivered in the most recent output
// window. It is stored in the form of a ring buffer, with
// `ringpos` identifying the element that logically comes first.
ringbuf: [T; N],
ringpos: usize,
Member

Please boil this down to something like "ringbuf stores the elements for the current window". Maybe even rename ringbuf to current_window_elements, with the comment "current_window_elements is stored in ring-buffer fashion".

/// Read the next item in the logical input sequence (consisting
/// of the contents of the input iterator followed by N-1 items
/// recycling from the beginning). Add it to the ring buffer.
fn read_item(&mut self, iter: &mut impl Iterator<Item = T>) -> bool {
Member

Please inline read_item: it relies on preconditions that are not evident from the function itself (e.g. it increments pos and accesses self.prefix[*pos]). Inlining it makes all this information local to fn next.

/// Make a new `CircularArrayWindowsInner`, in which `prefix`
/// contains the item `first`, plus `N-1` more items from the
/// provided iterator (or recycle existing items if necessary).
fn new(first: T, iter: &mut impl Iterator<Item = T>) -> Self {
Member

@phimuemue phimuemue Feb 7, 2026

Please inline this function. It manipulates iter, and if we inline everything into fn next, the complicated state manipulation is localized.

Comment on lines +53 to +57
// To allow building up `prefix` incrementally, we make it in
// the form of an array of `Option`. Once we've finished, and
// all its elements are `Some`, we can map it through `unwrap`
// to make the unconditional `[T; N]` that goes in the output
// struct.
Member

Please boil this down to something like: "construct [Option<T>; N] and convert to [T; N] afterwards // TODO can we improve this?"

I::Item: Clone,
{
fn len(&self) -> usize {
self.iter.len()
Member

@phimuemue phimuemue Feb 7, 2026

I think this is wrong for N>2*self.iter.len() (the case where we must reuse elements from prefix to even create the first window).

Moreover, implementing ExactSizeIterator requires that "the implementation of Iterator::size_hint must return the exact size of the iterator".

Maybe let's leave out ExactSizeIterator for now.

Contributor Author

I think it's right for all combinations of N and the input length, because of the characterisation of circular windows I mentioned last time: regard the input iterator as cyclic, and return a window for every possible start point in the cycle.

For example, for input length 1 we always get one window, [a,a,a,a,...,a]. For input length 2 we get [a,b,a,b,...] and [b,a,b,a,...], exactly 2 windows. Both of these are regardless of N.

Comment on lines +142 to +146
None => match self.iter.next() {
None => {
// The input iterator was completely empty
None
}
Member

@phimuemue phimuemue Feb 7, 2026

Behavior for unfused iterators (iterators that yield Some(...) after they already yielded None) is questionable here - it would create inner on a non-first-call to next.

Let's fuse the underlying iterator. windows for unfused iterators are confusing anyway.
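To illustrate the concern, here is a small std-only sketch of an unfused iterator and what `Iterator::fuse` does to it. The `flickering` function is a made-up example, not something from this PR.

```rust
/// Hypothetical unfused iterator: yields `Some` on odd-numbered calls and
/// `None` on even-numbered ones, so it comes back to life after reporting
/// exhaustion.
fn flickering() -> impl Iterator<Item = u32> {
    let mut calls = 0;
    std::iter::from_fn(move || {
        calls += 1;
        if calls % 2 == 1 {
            Some(calls)
        } else {
            None
        }
    })
}
```

Polling `flickering()` three times gives `Some(1)`, `None`, `Some(3)`, whereas `flickering().fuse()` gives `Some(1)`, `None`, `None`. With the fused form, an adapter can treat the first `None` as final.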

Comment on lines +915 to +928
/// use itertools::Itertools;
/// let mut v = Vec::new();
/// for [a, b] in (1..5).circular_array_windows() {
/// v.push([a, b]);
/// }
/// assert_eq!(v, vec![[1, 2], [2, 3], [3, 4], [4, 1]]);
///
/// let mut it = (1..5).circular_array_windows();
/// assert_eq!(Some([1, 2, 3]), it.next());
/// assert_eq!(Some([2, 3, 4]), it.next());
/// assert_eq!(Some([3, 4, 1]), it.next());
/// assert_eq!(Some([4, 1, 2]), it.next());
/// assert_eq!(None, it.next());
/// ```
Member

Please make the asserts uniform: either go with assert_eq!(..., vec![...]) or with the single assert_eq! calls. (I prefer the first one, maybe even condensed to something like assert_eq!((1..5).windows().collect(), vec![...]).)

Contributor Author

Ah, I should have known that copying the demo code directly from circular_tuple_windows wouldn't be good enough 🙂

}

// array iterators
quickcheck! {
Member

We should add tests that verify that circular_tuple_windows does the same as circular_array_windows.

Comment on lines +915 to +928
/// use itertools::Itertools;
/// let mut v = Vec::new();
/// for [a, b] in (1..5).circular_array_windows() {
/// v.push([a, b]);
/// }
/// assert_eq!(v, vec![[1, 2], [2, 3], [3, 4], [4, 1]]);
///
/// let mut it = (1..5).circular_array_windows();
/// assert_eq!(Some([1, 2, 3]), it.next());
/// assert_eq!(Some([2, 3, 4]), it.next());
/// assert_eq!(Some([3, 4, 1]), it.next());
/// assert_eq!(Some([4, 1, 2]), it.next());
/// assert_eq!(None, it.next());
/// ```
Member

@phimuemue phimuemue Feb 7, 2026

Should this describe what happens for e.g. [1,2].circular_windows<5>()?

Contributor Author

sgtatham commented Feb 8, 2026

Which begs one question: Can we make all of this even simpler by doing a stupid trick: Should we first implement array_windows, and then redirect circular_array_windows so that the first call to fn next sets up prefix and chains it after iter?

I had a try at this, but it didn't go well. After setting up prefix you have to chain the first copy as well as the second copy, and the edge cases are at least as painful as doing it directly.

But I did get what looks like a working implementation of array_windows itself out of it, and that's obviously worth having, because it would be silly to provide just the circular case and not the easier one!

I'll try again tomorrow with a different approach.
