Releases: neon-sunset/RangeExtensions
2.1.1
Changes
🚀 Enhancements
- Further speed-up vectorized array/span initialization (improves
ToList()too) (9a667be)
Loop body comparison:
| Old | New | New Aarch64 |
|-------------------------------|-------------------------------|-----------------------------------|
| M02_L01: | M02_L01: | G_M000_IG04: |
| vmovd xmm1,r9d | vmovupd [rax],ymm2 | str q18, [x1] |
| vpbroadcastd ymm1,xmm1 | vmovupd [rax+20],ymm0 | str q16, [x1, #0x10] |
| vpaddd ymm1,ymm1,ymm0 | vpaddd ymm2,ymm2,ymm1 | add v18.4s, v18.4s, v17.4s |
| add r9d,r11d | vpaddd ymm0,ymm0,ymm1 | add v16.4s, v16.4s, v17.4s |
| vmovd xmm2,r9d | add rax,40 | add x1, x1, #32 |
| vpbroadcastd ymm2,xmm2 | add r9d,10 | add w5, w5, #8 |
| vpaddd ymm2,ymm2,ymm0 | cmp r11d,r9d | cmp w4, w5 |
| add r9d,r11d | jg short M02_L01 | bgt G_M000_IG04 |
| movsxd r10,esi | | |
| vmovupd [rax+r10*4],ymm1 | | |
| lea r10d,[rsi+8] | | |
| movsxd r10,r10d | | |
| vmovupd [rax+r10*4],ymm2 | | |
| add esi,10 | | |
| cmp edi,esi | | |
| jg short M02_L01 | | |
|-------------------------------|-------------------------------|-----------------------------------|
🗒️ Notes
If you are interested, make sure to check the past two releases for more details!
https://github.com/neon-sunset/RangeExtensions/releases/tag/2.0.0
https://github.com/neon-sunset/RangeExtensions/releases/tag/2.1.0
Full Changelog: 2.1.0...2.1.1
Published with dotnet-releaser
2.1.0
Changes
🚀 Enhancements
- SIMDify array/span initialization which improves performance by appx. 2-4x depending on supported vector width (PR #16)
- Improve
.ToList()performance by 2x via directly passingRangeEnumerabletoList<int>constructor which subsequently callsICollection.ToArray(...)
BenchmarkDotNet=v0.13.3, OS=macOS 13.1 (22C65) [Darwin 22.2.0]
AMD Ryzen 7 5800X 3.80GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=8.0.100-alpha.1.23056.11
[Host] : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2
DefaultJob : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2| Method | Length | Mean | Error | Ratio | Allocated |
|---|---|---|---|---|---|
| RangeToArray | 10 | 8.249 ns | 0.1493 ns | 1.00 | 64 B |
| EnumerableToArray | 10 | 15.416 ns | 0.1819 ns | 1.87 | 104 B |
| RangeToList | 10 | 18.367 ns | 0.1284 ns | 2.23 | 120 B |
| EnumerableToList | 10 | 22.675 ns | 0.4197 ns | 2.75 | 136 B |
| RangeSelectToArray | 10 | 14.934 ns | 0.1617 ns | 1.81 | 96 B |
| EnumerableSelectToArray | 10 | 27.146 ns | 0.2635 ns | 3.30 | 152 B |
| RangeToArray | 100 | 19.584 ns | 0.1200 ns | 1.00 | 424 B |
| EnumerableToArray | 100 | 44.881 ns | 0.2353 ns | 2.29 | 464 B |
| RangeToList | 100 | 29.633 ns | 0.2484 ns | 1.51 | 480 B |
| EnumerableToList | 100 | 97.329 ns | 0.7041 ns | 4.97 | 496 B |
| RangeSelectToArray | 100 | 62.172 ns | 0.6928 ns | 3.18 | 456 B |
| EnumerableSelectToArray | 100 | 75.454 ns | 0.8087 ns | 3.85 | 512 B |
| RangeToArray | 10000 | 690.475 ns | 10.5986 ns | 1.00 | 40024 B |
| EnumerableToArray | 10000 | 3,285.207 ns | 22.7170 ns | 4.76 | 40064 B |
| RangeToList | 10000 | 1,231.378 ns | 11.5101 ns | 1.78 | 40080 B |
| EnumerableToList | 10000 | 7,269.712 ns | 75.8317 ns | 10.52 | 40096 B |
| RangeSelectToArray | 10000 | 5,024.135 ns | 49.6868 ns | 7.27 | 40056 B |
| EnumerableSelectToArray | 10000 | 5,044.309 ns | 67.8153 ns | 7.30 | 40112 B |
📦 Dependencies
- Bump BenchmarkDotNet from 0.13.2 to 0.13.3 (PR #15) by @dependabot (bot)
🧰 Misc
Full Changelog: 2.0.0...2.1.0
Published with dotnet-releaser
2.0.0
Changes
✨ New Features
Select and Where
This release expands the list of Range extensions with several convenience methods that bring it much closer to counterparts in Rust, Python and others.
var floats = (0..100).Select(i => (float)i);
var odd = (0..100).Where(i => i % 2 != 0);
var randomNumbers = (0..1000)
.Select(_ => Random.Shared.Next())
.ToArray();Iterating through these directly is significantly faster when compared to Enumerable.Range.
In fact, with DynamicPGO enabled, on osx-arm64 the loop foreach (var i in (0..1000).Select(i => i * 2)) gets within 1.3-1.5x ratio of a regular plain for loop. Delegate devirtualization has gotten so good!
IList on RangeEnumerable and SelectRange
Now both these types also implement IList.
While RangeEnumerable is effectively a materialized sequence of numbers, SelectRange serves a purpose of "Select View" over that sequence, with exact T values materialized on access similar to plain IEnumerable<T>.
While this does not conform 100% to existing semantics, it was a necessary change to take the advantage of IEnumerable internals for methods that lack bespoke implementations in this library or when RangeExtensions enumerators are boxed.
Bespoke Aggregate, First, Last and more
The library now exposes additional bespoke LINQ method implementations for shorthand usage together with Range that further improve its usability in functional scenarios.
var digits = (0..10)
.Aggregate(new StringBuilder(), (sb, i) => sb.Append(i))
.ToString();
Assert.Equal("0123456789", digits);
// None of these allocate, and all of them are O(1)
var fifteenthDecimal = (0..1337)
.Select(i => (decimal)i)
.Take(10)
.ElementAt(4); // or even just [4]
// Efficient scan from the end of the range
var lastEven = (0..1337)
.Where(i => i % 2 is 0)
.Last();Further optimizations on RangeEnumerable
Previously,Range was nested inside of RangeEnumerable only being unwrapped when creating an enumerator.
This was complicating the job for the JIT code generation and preventing it from seeing that we have already verified a few conditions like whether the range is valid and doesn't start from end for either of the indexes, whether its start value is greater or equal to zero, etc (depends on particular loop used with range).
In addition, this approach was causing further issues when RangeEnumerable was boxed or nested in either RangeSelect or RangeWhere which made the initial prototype have performance barely better or sometimes worse than private implementations of Select and Where iterators from BCL.
Therefore, the best solution turned out to be the most straightforward and likely intellectually disappointing to people who read release notes till the end.
First and foremost, reinterpret-casting Indexes nested inside of Range into ints allowed for better constant propagation and dead branch elimination reducing the size and improving the quality of codegen for precondition checks on an enumerable/enumerator creation.
Next, by flattening all nested structs into their parent holders i.e. storing int _start; int _end; instead of Range and copying RangeEnumerable implementation bits to RangeSelect and RangeWhere instead of nesting RangeEnumerable, it was possible to achieve pretty good enregistering of struct fields, effectively removing the abstraction and turning them into locals.
This in combination with delegate inlining (requires .NET 7 and DynamicPGO) allowed to bring the performance of direct foreach (var i in (0..Length).Select(i => i * 2)) loops to within 30% of plain for loop without any delegates. This also applies to (0..100).Aggregate(...) methods and any other that uses LINQ methods to produce a value functional-style.
However, I decided to postpone (or maybe it was small brain cope?) correctly abstracting away duplicate code in .SpeedOpt.cs via either generics or templating/code generation to deliver the update as is. If you are interested, I will be happy to review and accept a PR that closes #14
📦 Dependencies
- Bump Microsoft.NET.Test.Sdk from 17.3.1 to 17.3.2 (PR #9) by @dependabot (bot)
- Bump coverlet.collector from 3.1.2 to 3.2.0 (PR #11) by @dependabot (bot)
- Bump Microsoft.NET.Test.Sdk from 17.3.2 to 17.4.0 (PR #12) by @dependabot (bot)
- Bump Microsoft.NET.Test.Sdk from 17.4.0 to 17.4.1 (PR #13) by @dependabot (bot)
Full Changelog: 1.2.2...2.0.0
Published with dotnet-releaser
1.2.2
Changes
🐛 Bug Fixes
- Fix a typo in ThrowHelpers.EmptyRange() message. (89df618)
📦 Dependencies
- Bump MinVer from 4.1.0 to 4.2.0 (PR #6) by @dependabot (bot)
- Bump BenchmarkDotNet from 0.13.1 to 0.13.2 (PR #7) by @dependabot (bot)
- Bump Microsoft.NET.Test.Sdk from 17.3.0 to 17.3.1 (PR #8) by @dependabot (bot)
Full Changelog: 1.2.1...1.2.2
Published with dotnet-releaser
1.2.1
Changes
🐛 Bug Fixes
📦 Dependencies
- Bump Microsoft.NET.Test.Sdk from 17.2.0 to 17.3.0 (PR #4) by @dependabot (bot)
🧰 Misc
- Re-enable Coveralls publishing because it has been fixed (2248bda)
- Complete coverage to 100% (8d6f42b)
- Update description (f453133)
Full Changelog: 1.2.0...1.2.1
1.2.0
Changes
✨ New Features
- Add Span overloads for CopyTo/TryCopyTo for fun (6af9719)
- Implement ICollection on RangeEnumerable for performance (4318e63)
🐛 Bug Fixes
- Fix ICollection.CopyTo implementation and complete test coverage (c71c076)
📦 Dependencies
- Bump xunit from 2.4.1 to 2.4.2 (PR #3) by @dependabot (bot)
🧰 Misc
- Make tests compatible with XUnit 2.4.2 (76970d0)
- Update README, slighly improve Sum() codegen and replace Count() with Count where appropriate (2b3efb6)
Full Changelog: 1.1.0...1.2.0
1.1.0
Changes
✨ New Features
- Add
rangeEnumerable.Sum()
🧰 Misc
- Further optimizations, cleanup and additional tests (482e7b5)
Full Changelog: 1.0.0...1.1.0
1.0.0
✨Initial Release!
Full Changelog: 6ad37f19b58b2776f800017425ea51f58122e2a4...1.0.0