New method circular_array_windows().#1086
New method circular_array_windows().#1086sgtatham wants to merge 1 commit intorust-itertools:masterfrom
Conversation
|
I'm not very happy with this code – I have a nasty feeling it's much longer than it needs to be. But it seemed surprisingly tricky to get all the edge cases right, especially when the input iterator runs out early and you have to recycle elements multiple times! I'd like to remove the dependency on (But you would have to give the arrayvecs capacity N, where you only really need N-1, because you can't do arithmetic on integer type parameters!) |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1086 +/- ##
==========================================
- Coverage 94.38% 93.76% -0.63%
==========================================
Files 48 51 +3
Lines 6665 6415 -250
==========================================
- Hits 6291 6015 -276
- Misses 374 400 +26 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
b780c38 to
832cd72
Compare
src/array_impl.rs
Outdated
| let window = std::array::from_fn(|i| { | ||
| if i + 1 < N { | ||
| // The first N-1 items come from `ringbuf` | ||
| self.ringbuf[(i + self.ringpos) % self.ringbuf.len()].clone() | ||
| } else { | ||
| // The last item is the new one we just read | ||
| new_item.clone() | ||
| } | ||
| }); | ||
|
|
||
| // Replace the oldest item in `ringbuf` with the new one. | ||
| self.ringbuf[self.ringpos] = new_item; |
There was a problem hiding this comment.
Couldn't this first modify self.ringbuf, and only afterwards compute window. This could avoid the if/else within the array::from_fn.
|
Hi @sgtatham, thanks for tackling this. It'll take some time to do a thorough review, but from what I've seen so far, this always generates as many windows as the original iterator had elements. I think this method could afford to behave similar to I think we can exploit the Moreover, could Regarding If it simplifies your implementation, you can even Footnotes
|
Yes – that's the same behaviour as There certainly is an argument for the idea that perhaps asking for length-7 windows of a length-3 list should say "don't be silly, there's no such thing". I recently posted a poll on Mastodon about the case of length-2 windows, and almost everybody thought it was weird that if the input list has length 1 then the window (a,a) is returned. But I think it depends on why you want circular windows in the first place, and some people will want those degenerate elements. You mention a comparison to
If we change the policy in edge cases, so that returned windows will never contain two copies of the same input item, then I believe that strategy would work, because if you can't But if we're keeping the behaviour like That's where a lot of the complexity in this implementation comes from: the way that sometimes you have to start consuming things from |
|
Good point. Then let's try to stay in line with the existing Maybe this is a starting point how the first Footnotes
|
|
Fair's fair, my current implementation is ugly and inefficient too 🙂 Yes, I see – avoid having to keep an array of |
7afc0fa to
72b1b5c
Compare
|
Thanks for the suggestion! It worked pretty well. Here's the revised implementation, using no You were right that (Obvious with hindsight! We produce |
This is very like the existing `circular_tuple_windows`, but imposes the minimum possible bounds on the input iterator: it must have cloneable items because each item is returned N times, and it must be Sized so that it can be stored in a struct. Unlike `circular_tuple_windows`, it doesn't require the input iterator itself to have extra traits, like Clone or ExactSizeIterator. Because the return type is an array (as suggested in rust-itertools#1084), we must handle the zero-length case, because you can't have a constraint `N>0`. In that situation we still read to the end of the input iterator, discard each item as we read it, and return a zero-length array per item, preserving the invariant that this iterator is the same length as the input one.
72b1b5c to
8e4550c
Compare
There was a problem hiding this comment.
Hi there, thanks for this! Already much better now that balanced et al are out. Also nice to see the Vecs gone.
However, I'm wondering if we can simplify a bit further. In particular the case distinction in read_item worries me.
As I wrote, I think we should fuse the underlying iterator, and I think fn next could then do sth like this:
fn next() {
match self.iter.next() {
None => match self.inner {
Some(inner) => {// start re-using stuff from inner.prefix}
None => None
},
Some(item) => match self.inner {
Some(inner) => {update inner},
None => {initialize inner}
}
}
}
The story would then be "try self.iter as long as possible, and only once it returns None, go with self.prefix...
Which begs one question: Can we make all of this even simpler by doing a stupid trick: Should we first implement array_windows, and then redirect circular_array_windows so that the first call to fn next sets up prefix and chains it after iter?
I.e. given tuple_windows, could we say the following?
enum CircularTupleWindows<N, Iter> {
NotYetStarted(<Fused<Iter>>),
Started(
TupleWindows<N, Chain<Iter, [Iter::Item; N-1]>>
),
}
impl Iterator for CircularTupleWindows {
fn next() {
match self {
NotYetStarted(iter) => self=Started(
{
let prefix = compute prefix from iter;
iter.chain(prefix).tuple_windows()
}
),
Started(tuple_windows_iter) => {}
}
let Started(internal) = self else {panic!()}; // or similar
internal.next()
}
}
I'm sorry this came to my mind just now, but I think it's worth exploring.
As always: Use your judgement, and please excuse if I made obvious mistakes. (It's always too late in the evening when I do reviews...)
| // windows of length `N` will cover `k+N-1` items in total. So we | ||
| // have output enough windows precisely when `prefix_pos` reaches | ||
| // `N-1`, whether or not we began incrementing `prefix_pos` during | ||
| // initial setup. |
There was a problem hiding this comment.
Please boil this down to sth like "prefix stores the first N items (to be used when cycled)" or something.
In particular, the last paragraph may mislead, because even if the input iterator has k items, the output iterator does not necessarily have k items (e.g. if N>2*k).
| // `ringbuf` stores the _most recent_ N items from the input | ||
| // iterator, which were delivered in the most recent output | ||
| // window. It is stored in the form of a ring buffer, with | ||
| // `ringpos` identifying the element that logically comes first. | ||
| ringbuf: [T; N], | ||
| ringpos: usize, |
There was a problem hiding this comment.
Please boil this down to sth like ringbuf stores the elements for the current window. Maybe even rename ringbuf to current_window_elements or something, and comment "current_window_elements stored in ringbuffer-fashion".
| /// Read the next item in the logical input sequence (consisting | ||
| /// of the contents of the input iterator followed by N-1 items | ||
| /// recycling from the beginning). Add it to the ring buffer. | ||
| fn read_item(&mut self, iter: &mut impl Iterator<Item = T>) -> bool { |
There was a problem hiding this comment.
Please inline read_item: It relies on preconditions that are not evident from the function itself (e.g. it increments pos and accesses self.prefix[*pos]). Inlining it makes all these information local to fn next.
| /// Make a new `CircularArrayWindowsInner`, in which `prefix` | ||
| /// contains the item `first`, plus `N-1` more items from the | ||
| /// provided iteraor (or recycle existing items if necessary). | ||
| fn new(first: T, iter: &mut impl Iterator<Item = T>) -> Self { |
There was a problem hiding this comment.
Please inline this function. It manipulates iter, and - if we inline everything into fn next - , we have the complicated state manipulation localized.
| // To allow building up `prefix` incrementally, we make it in | ||
| // the form of an array of `Option`. Once we've finished, and | ||
| // all its elements are `Some`, we can map it through `unwrap` | ||
| // to make the unconditional `[T; N]` that goes in the output | ||
| // struct. |
There was a problem hiding this comment.
Please boil this down to sth like: construct [Option<T>; N] and convert to [T; N] afterwards // TODO can we improve this?
| I::Item: Clone, | ||
| { | ||
| fn len(&self) -> usize { | ||
| self.iter.len() |
There was a problem hiding this comment.
I think this is wrong for N>2*self.iter.len() (the case where we must reuse elements from prefix to even create the first window).
Moreover, implementing ExactSizeIterator requires that "the implementation of Iterator::size_hint must return the exact size of the iterator".
Maybe let's leave out ExactSizeIterator for now.
There was a problem hiding this comment.
I think it's right for all combinations of N and the input length, because of the characterisation of circular windows I mentioned last time: regard the input iterator as cyclic, and return a window for every possible start point in the cycle.
For example, for input length 1 we always get one window, [a,a,a,a,...,a]. For input length 2 we get [a,b,a,b,...] and [b,a,b,a,...], exactly 2 windows. Both of these are regardless of N.
| None => match self.iter.next() { | ||
| None => { | ||
| // The input iterator was completely empty | ||
| None | ||
| } |
There was a problem hiding this comment.
Behavior for unfused iterators (iterators that yield Some(...) after they already yielded None) is questionable here - it would create inner on a non-first-call to next.
Let's fuse the underlying iterator. windows for unfused iterators are confusing anyway.
| /// use itertools::Itertools; | ||
| /// let mut v = Vec::new(); | ||
| /// for [a, b] in (1..5).circular_array_windows() { | ||
| /// v.push([a, b]); | ||
| /// } | ||
| /// assert_eq!(v, vec![[1, 2], [2, 3], [3, 4], [4, 1]]); | ||
| /// | ||
| /// let mut it = (1..5).circular_array_windows(); | ||
| /// assert_eq!(Some([1, 2, 3]), it.next()); | ||
| /// assert_eq!(Some([2, 3, 4]), it.next()); | ||
| /// assert_eq!(Some([3, 4, 1]), it.next()); | ||
| /// assert_eq!(Some([4, 1, 2]), it.next()); | ||
| /// assert_eq!(None, it.next()); | ||
| /// ``` |
There was a problem hiding this comment.
Please make the asserts uniform: Either go with assert_eq(..., vec![...]) or with the single assert_eq. (I prefer the first one, maybe even condensed to sth like assert_eq!((1..5).windows().collect(), vec![...]).)
There was a problem hiding this comment.
Ah, I should have known that copying the demo code directly from circular_tuple_windows wouldn't be good enough 🙂
| } | ||
|
|
||
| // array iterators | ||
| quickcheck! { |
There was a problem hiding this comment.
We should add tests that verify that tuple_circular_windows does the same as array_circular_windows.
| /// use itertools::Itertools; | ||
| /// let mut v = Vec::new(); | ||
| /// for [a, b] in (1..5).circular_array_windows() { | ||
| /// v.push([a, b]); | ||
| /// } | ||
| /// assert_eq!(v, vec![[1, 2], [2, 3], [3, 4], [4, 1]]); | ||
| /// | ||
| /// let mut it = (1..5).circular_array_windows(); | ||
| /// assert_eq!(Some([1, 2, 3]), it.next()); | ||
| /// assert_eq!(Some([2, 3, 4]), it.next()); | ||
| /// assert_eq!(Some([3, 4, 1]), it.next()); | ||
| /// assert_eq!(Some([4, 1, 2]), it.next()); | ||
| /// assert_eq!(None, it.next()); | ||
| /// ``` |
There was a problem hiding this comment.
Should this describe what happens for e.g. [1,2].circular_windows<5>()?
I had a try at this, but it didn't go well. After setting up But I did get what looks like a working implementation of I'll try again tomorrow with a different approach. |
This is very like the existing
circular_tuple_windows, but imposes the minimum possible bounds on the input iterator: it must have cloneable items because each item is returned N times, and it must be Sized so that it can be stored in a struct. Unlikecircular_tuple_windows, it doesn't require the input iterator itself to have extra traits, like Clone or ExactSizeIterator.Because the return type is an array (as suggested in #1084), we must handle the zero-length case, because you can't have a constraint
N>0. In that situation we still read to the end of the input iterator, discard each item as we read it, and return a zero-length array per item, preserving the invariant that this iterator is the same length as the input one.