
draft: Improve map performance through unsafe uninitialized capacity #83297


Open
wants to merge 5 commits into
base: main
from

Conversation

JaapWijnen
Contributor

@JaapWijnen JaapWijnen commented Jul 24, 2025

I was analysing the performance of a piece of code where a call to map sat inside a hot path. Most of the time was spent in the append method inside map's implementation.
When we replaced the implementation with the changes above, we saw a significant performance increase (I will add numbers and a way to reproduce after running the CI benchmarks).
These changes of course need some additional work to make sure ABI is maintained, which I'd like to pursue once the performance benefits of these changes are confirmed.
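
For readers following along, the two shapes being compared can be sketched roughly as below. This is only an illustration, not the PR's actual diff: the method names `appendingMap` and `uninitializedMap` are hypothetical, and the `unsafe` marker that appears in the PR's snippet is omitted so the sketch compiles without strict memory safety enabled.

```swift
extension Collection {
    // Hypothetical name. Mirrors the append-based shape: reserve once,
    // then append each transformed element.
    func appendingMap<T>(_ transform: (Element) throws -> T) rethrows -> [T] {
        let n = count
        var result: [T] = []
        result.reserveCapacity(n)
        var i = startIndex
        for _ in 0..<n {
            result.append(try transform(self[i]))
            formIndex(after: &i)
        }
        return result
    }

    // Hypothetical name. Writes each element straight into the array's
    // uninitialized storage instead of going through append.
    func uninitializedMap<T>(_ transform: (Element) throws -> T) rethrows -> [T] {
        let n = count
        return try Array<T>(unsafeUninitializedCapacity: n) { buffer, initializedCount in
            var i = startIndex
            for offset in 0..<n {
                buffer.initializeElement(at: offset, to: try transform(self[i]))
                // Keep the count in sync so that a throwing transform leaves
                // exactly the initialized prefix to be cleaned up.
                initializedCount = offset + 1
                formIndex(after: &i)
            }
        }
    }
}

let doubled = [1, 2, 3].uninitializedMap { $0 * 2 }
print(doubled) // [2, 4, 6]
```

Both helpers return the same result; the difference is only in how the storage is filled, which is what the benchmarks later in the thread try to measure.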

@JaapWijnen JaapWijnen requested a review from a team as a code owner July 24, 2025 15:48
@asl
Contributor

asl commented Jul 24, 2025

@swift-ci please test

@asl
Contributor

asl commented Jul 24, 2025

@swift-ci please benchmark

@JaapWijnen
Contributor Author

There's a bit more going on wrt the old method where the mangled name shows up in SwiftOnoneSupport.swift here:

@_semantics("prespecialize.$sSa28_unsafeUninitializedCapacity16initializingWithSayxGSi_ySryxGz_SiztKXEtKcfC")

So I'm afraid I'll have to wrap my head around how to fix the ABI part before tests will pass.

@stephentyrone
Contributor

@glessard has been wrangling with Onone support for similar issues, and should be able to help out once he has a clearer picture of how to move these forward.

@glessard
Contributor

glessard commented Jul 24, 2025

Indeed. I have a PR here dealing with generalizing this Array initializer, and the next step is to solve the ABI issue with SwiftOnoneSupport.

@JaapWijnen
Contributor Author

Ah ok great thanks @glessard I'll adjust my PR once yours is merged!

@JaapWijnen JaapWijnen force-pushed the improve-map-performance-through-unsafeUninitializedCapacity branch from e686d28 to 2e6a661 Compare July 25, 2025 23:31
@clackary

@swift-ci please test

@glessard
Contributor

@swift-ci Apple Silicon benchmark

JaapWijnen and others added 2 commits July 26, 2025 01:51
Avoid capturing state that's also available from our parameters

Co-authored-by: Guillaume Lessard <[email protected]>
@JaapWijnen
Contributor Author

Thanks for the review, @glessard. Ready for another test run.

@glessard
Contributor

@swift-ci please smoke test

Co-authored-by: Guillaume Lessard <[email protected]>
@JaapWijnen
Contributor Author

Oops, fixed! Will clean up the commits a little once we have some results (don't have access to my machine right now sorry)

@glessard
Contributor

@swift-ci please smoke test

@glessard glessard self-requested a review July 26, 2025 06:06
@glessard glessard dismissed their stale review July 26, 2025 06:06

Corrected

@JaapWijnen
Contributor Author

Not sure what happened to the Windows build. Shall we try to run the benchmarks?

@glessard
Contributor

@swift-ci please benchmark

@glessard
Contributor

@swift-ci test Windows platform

for _ in 0..<n {
result.append(try transform(self[i]))
formIndex(after: &i)
return try unsafe Array<T>(unsafeUninitializedCapacity: n) { (
Contributor

@stephentyrone stephentyrone Jul 26, 2025

@glessard I assume we'll update this to use an OutputSpan-based init once that's feasible?

Contributor

That would be preferable indeed.

@glessard
Contributor

glessard commented Jul 27, 2025

The benchmarks aren't good news, with much worse regressions than improvements. It would be nice to know more about the example where you saw an improvement:

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
MapReduceClass2 11.031 129.368 +1072.7% 0.09x
MapReduceClassShort2 82.3 222.3 +170.1% 0.37x
MapReduceShortString 8.807 11.889 +35.0% 0.74x (?)
MapReduceShort 803.81 1064.375 +32.4% 0.76x (?)
MapReduceAnyCollectionShort 920.588 1132.667 +23.0% 0.81x (?)
DictOfArraysToArrayOfDicts 344.4 402.5 +16.9% 0.86x (?)
BucketSort 104.227 119.105 +14.3% 0.88x (?)
StringAdder 301.714 343.5 +13.8% 0.88x (?)
StringBuilder 301.143 339.333 +12.7% 0.89x (?)
DataToStringEmpty 91.0 102.4 +12.5% 0.89x (?)
StringBuilderSmallReservingCapacity 308.833 345.0 +11.7% 0.90x (?)
StringUTF16Builder 319.032 356.0 +11.6% 0.90x (?)
ArrayPlusEqualSingleElementCollection 882.556 984.063 +11.5% 0.90x (?)
ArraySetElement 293.0 323.2 +10.3% 0.91x (?)
Array2D 4382.0 4822.4 +10.1% 0.91x (?)
PrefixWhileSequence 179.455 193.8 +8.0% 0.93x (?)
StringInterpolationSmall 884.211 951.053 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
CxxSpanTests.map 459.4 43.818 -90.5% 10.48x
MapReduceAnyCollection 132.0 34.727 -73.7% 3.80x
MapReduce 90.941 33.76 -62.9% 2.69x
RangeAssignment 124.071 91.889 -25.9% 1.35x (?)
ArrayAppendGenericStructs 1430.0 1180.0 -17.5% 1.21x (?)
SortAdjacentIntPyramids 907.5 760.0 -16.3% 1.19x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
RomanNumbers.o 3972 4511 +13.6% 0.88x
AngryPhonebook.o 7688 8010 +4.2% 0.96x
DictOfArraysToArrayOfDicts.o 20855 21651 +3.8% 0.96x
StringEdits.o 8039 8326 +3.6% 0.97x
DriverUtils.o 124971 128549 +2.9% 0.97x
Monoids.o 147058 150414 +2.3% 0.98x
WordCount.o 33405 34033 +1.9% 0.98x
ReversedCollections.o 8063 8209 +1.8% 0.98x
Breadcrumbs.o 45577 46345 +1.7% 0.98x
 
Improvement OLD NEW DELTA RATIO
FlattenList.o 3288 2817 -14.3% 1.17x
BucketSort.o 8747 7779 -11.1% 1.12x
RangeAssignment.o 2672 2392 -10.5% 1.12x
ArrayRemoveAll.o 6873 6353 -7.6% 1.08x
CString.o 6432 5965 -7.3% 1.08x
RangeOverlaps.o 5361 4977 -7.2% 1.08x
LazyFilter.o 7509 7005 -6.7% 1.07x
RangeContains.o 6369 5985 -6.0% 1.06x
IntegerParsing.o 73653 69240 -6.0% 1.06x
DictionaryGroup.o 10835 10315 -4.8% 1.05x
QueueTest.o 10817 10329 -4.5% 1.05x
RemoveWhere.o 11787 11299 -4.1% 1.04x
UTF16Decode.o 18988 18483 -2.7% 1.03x
UTF8Decode.o 21296 20790 -2.4% 1.02x
FloatingPointConversion.o 31033 30492 -1.7% 1.02x
MapReduce.o 23003 22732 -1.2% 1.01x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
MapReduceAnyCollection 171.286 879.5 +413.5% 0.19x
MapReduceAnyCollectionShort 996.25 2146.667 +115.5% 0.46x
CxxSpanTests.map 390.75 674.667 +72.7% 0.58x
ArrayInClass 130.682 174.296 +33.4% 0.75x (?)
DistinctClassFieldAccesses 35.946 44.061 +22.6% 0.82x
MapReduceShortString 9.746 11.795 +21.0% 0.83x (?)
ArrayPlusEqualSingleElementCollection 1097.049 1296.326 +18.2% 0.85x
CxxSpanTests.filter 568.0 669.5 +17.9% 0.85x (?)
StrComplexWalk 3018.571 3556.667 +17.8% 0.85x (?)
CharIndexing_ascii_unicodeScalars_Backwards 7660.0 8975.0 +17.2% 0.85x
StringWalk 1225.6 1435.676 +17.1% 0.85x
ArraySetElement 261.75 305.286 +16.6% 0.86x (?)
StringAdder 301.667 346.5 +14.9% 0.87x (?)
CharIteration_japanese_unicodeScalars 5546.667 6367.273 +14.8% 0.87x (?)
CharIndexing_punctuated_unicodeScalars_Backwards 1785.333 2038.182 +14.2% 0.88x (?)
CharIndexing_tweet_unicodeScalars_Backwards 15240.0 17350.0 +13.8% 0.88x (?)
CharIteration_chinese_unicodeScalars 3460.0 3938.667 +13.8% 0.88x (?)
CharIteration_tweet_unicodeScalars 7977.778 9066.667 +13.6% 0.88x (?)
CharIteration_punctuatedJapanese_unicodeScalars 846.429 960.741 +13.5% 0.88x (?)
CharIteration_ascii_unicodeScalars 4062.222 4610.0 +13.5% 0.88x (?)
MapReduceShort 921.765 1045.625 +13.4% 0.88x
SubstringFromLongString2 21.767 24.667 +13.3% 0.88x (?)
CharIteration_punctuated_unicodeScalars 958.049 1080.0 +12.7% 0.89x (?)
Array2D 4594.0 5154.0 +12.2% 0.89x (?)
CharIndexing_chinese_unicodeScalars_Backwards 6730.909 7531.429 +11.9% 0.89x (?)
CharIndexing_japanese_unicodeScalars_Backwards 10665.0 11928.0 +11.8% 0.89x (?)
StringComparison_ascii 185.7 206.667 +11.3% 0.90x (?)
CharIndexing_russian_unicodeScalars_Backwards 7632.0 8432.0 +10.5% 0.91x (?)
StringInterpolationSmall 866.471 956.875 +10.4% 0.91x (?)
CharIndexing_punctuatedJapanese_unicodeScalars_Backwards 1573.846 1737.6 +10.4% 0.91x (?)
ParseInt.IntSmall.Decimal 261.111 287.0 +9.9% 0.91x (?)
PrefixWhileSequence 216.5 237.75 +9.8% 0.91x (?)
NaiveRRC.init.largeContiguous 6.527 7.126 +9.2% 0.92x
PrefixWhileAnySeqCRangeIter 206.333 224.8 +9.0% 0.92x (?)
SubstringFromLongStringGeneric2 26.385 28.688 +8.7% 0.92x (?)
WordCountHistogramASCII 2551.852 2772.0 +8.6% 0.92x (?)
PrefixWhileAnySeqCntRange 206.556 223.9 +8.4% 0.92x (?)
Set.subtracting.Seq.Empty.Box 105.643 114.368 +8.3% 0.92x (?)
String.replaceSubrange.String.Small 38.731 41.87 +8.1% 0.93x (?)
StaticArray 1.64 1.772 +8.0% 0.93x (?)
StringBuilderSmallReservingCapacity 396.5 427.5 +7.8% 0.93x (?)
NaiveRRC.append.largeContiguous 6.186 6.657 +7.6% 0.93x (?)
StringBuilder 388.6 418.0 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
MapReduceLazySequence 87.714 44.0 -49.8% 1.99x
MapReduce 123.929 80.105 -35.4% 1.55x
MapReduceLazyCollection 65.886 43.344 -34.2% 1.52x
CxxStringConversion.cxx.to.swift 142.0 109.5 -22.9% 1.30x (?)
ConvertFloatingPoint.MockFloat64ToInt64 654.0 512.5 -21.6% 1.28x (?)
Monoids 46348180.0 39515711.0 -14.7% 1.17x
Set.filter.Int100.16k 49.912 42.623 -14.6% 1.17x (?)
Set.filter.Int100.28k 88.37 75.594 -14.5% 1.17x (?)
Set.filter.Int100.24k 72.667 62.625 -13.8% 1.16x
Set.filter.Int100.20k 61.308 52.87 -13.8% 1.16x (?)
Set.isSuperset.Seq.Empty.Int 55.5 47.917 -13.7% 1.16x (?)
Set.isStrictSuperset.Seq.Empty.Int 70.8 61.389 -13.3% 1.15x (?)
Set.isDisjoint.Int.Empty 54.448 47.889 -12.0% 1.14x (?)
RangeAssignment 132.429 117.071 -11.6% 1.13x (?)
ObjectiveCBridgeStubToNSStringRef 85.962 76.586 -10.9% 1.12x (?)
NormalizedIterator_fastPrenormal 600.0 543.696 -9.4% 1.10x (?)
MapReduceString 42.4 38.69 -8.7% 1.10x (?)
NormalizedIterator_ascii 96.5 88.731 -8.1% 1.09x (?)
PrefixAnyCollectionLazy 33244.0 30823.0 -7.3% 1.08x (?)
Set.isSubset.Int.Empty 51.167 47.607 -7.0% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
RomanNumbers.o 3755 4427 +17.9% 0.85x
AngryPhonebook.o 7507 8116 +8.1% 0.92x
DictOfArraysToArrayOfDicts.o 17755 18824 +6.0% 0.94x
MapReduce.o 18295 19294 +5.5% 0.95x
StringComparison.o 28797 30273 +5.1% 0.95x
ReversedCollections.o 6545 6848 +4.6% 0.96x
StringEdits.o 7906 8221 +4.0% 0.96x
Monoids.o 126748 130619 +3.1% 0.97x
Breadcrumbs.o 38365 39533 +3.0% 0.97x
CxxSpanTests.o 14515 14955 +3.0% 0.97x
WordCount.o 31311 32203 +2.8% 0.97x
DriverUtils.o 111325 114142 +2.5% 0.98x
 
Improvement OLD NEW DELTA RATIO
RangeAssignment.o 2464 2008 -18.5% 1.23x
FlattenList.o 3008 2530 -15.9% 1.19x
RangeOverlaps.o 4818 4236 -12.1% 1.14x
RangeContains.o 5752 5170 -10.1% 1.11x
CString.o 5936 5491 -7.5% 1.08x
ArrayRemoveAll.o 6252 5793 -7.3% 1.08x
IntegerParsing.o 67992 63369 -6.8% 1.07x
LazyFilter.o 6764 6305 -6.8% 1.07x
DictionaryGroup.o 10384 9926 -4.4% 1.05x
QueueTest.o 10586 10146 -4.2% 1.04x
RemoveWhere.o 10604 10164 -4.1% 1.04x
FloatingPointConversion.o 35875 35378 -1.4% 1.01x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
CxxStringConversion.cxx.to.swift 113.667 145.0 +27.6% 0.78x (?)
StringAdder 389.167 447.4 +15.0% 0.87x
KeyPathReadPerformance 9.605 10.974 +14.3% 0.88x
KeyPathsSmallStruct 9.916 11.116 +12.1% 0.89x
Hanoi 7630.0 8310.0 +8.9% 0.92x (?)
ArrayAppendAscii 12503.5 13523.5 +8.2% 0.92x (?)
StrComplexWalk 4804.0 5192.5 +8.1% 0.93x
SubstringFromLongString2 100.208 108.273 +8.0% 0.93x
String.replaceSubrange.String.Small 42.702 46.023 +7.8% 0.93x
 
Improvement OLD NEW DELTA RATIO
RandomDoubleDef 37800.0 35140.0 -7.0% 1.08x
RandomDoubleOpaqueDef 38200.0 35542.857 -7.0% 1.07x (?)

Code size: -swiftlibs

Regression OLD NEW DELTA RATIO
libswiftSwiftPrivate.dylib 28672 32768 +14.3% 0.88x
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
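
The DELTA and RATIO columns are not defined in the report itself. Reading them off the rows, they appear to be the relative change `(NEW - OLD) / OLD` and the speed ratio `OLD / NEW`; that interpretation is an assumption checked against the figures above, not something the benchmark bot states. A small sketch using the MapReduceClass2 row from the -O table (the `BenchmarkRow` type is hypothetical):

```swift
// Hypothetical helper type for one row of the benchmark report.
struct BenchmarkRow {
    let name: String
    let old: Double
    let new: Double

    // Assumed meaning of DELTA: relative change in percent.
    var deltaPercent: Double { (new - old) / old * 100 }

    // Assumed meaning of RATIO: below 1.0x means NEW is slower.
    var ratio: Double { old / new }
}

let row = BenchmarkRow(name: "MapReduceClass2", old: 11.031, new: 129.368)
print(row.name, row.deltaPercent, row.ratio)
```

The computed values land within rounding distance of the reported +1072.7% and 0.09x; small differences are expected because the OLD and NEW columns are themselves rounded.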

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 32 GB

@glessard
Contributor

@swift-ci Apple Silicon benchmark

@natecook1000
Member

It looks like this change is identifying an optimization that sometimes takes place in map-reduce chains? The only real effect here should be the elimination of uniqueness checking in Array.append, but clearly something significant is blowing up in the MapReduceClass... benchmarks.

@stephentyrone
Contributor

Fairly sure the regression is due to Array vs ContiguousArray.

Avoid bridging by building a `ContiguousArray`
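
The follow-up commit itself is not shown in this thread, so the following is only a guess at its general shape. The idea, as far as it can be inferred from the commit message, is that `ContiguousArray` always uses native contiguous storage, whereas an `Array` of class instances may have to account for bridged Objective-C storage on Apple platforms; filling a `ContiguousArray` first and converting at the end avoids that on the hot path. The name `contiguousMap` and the `Box` class are hypothetical.

```swift
extension Collection {
    // Hypothetical name. Fills a ContiguousArray's uninitialized storage,
    // then converts to Array once at the end.
    func contiguousMap<T>(_ transform: (Element) throws -> T) rethrows -> [T] {
        let n = count
        let storage = try ContiguousArray<T>(unsafeUninitializedCapacity: n) { buffer, initializedCount in
            var i = startIndex
            for offset in 0..<n {
                buffer.initializeElement(at: offset, to: try transform(self[i]))
                // Track the initialized prefix in case transform throws.
                initializedCount = offset + 1
                formIndex(after: &i)
            }
        }
        return Array(storage)
    }
}

// A class element type, the case where the MapReduceClass benchmarks regressed.
final class Box {
    let value: Int
    init(_ value: Int) { self.value = value }
}

let boxes = (1...3).contiguousMap { Box($0) }
print(boxes.map(\.value)) // [1, 2, 3]
```

Whether this recovers the MapReduceClass regressions is exactly what the next benchmark run is meant to show.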
@glessard
Contributor

@swift-ci please benchmark

@JaapWijnen
Contributor Author

Sorry, I'm currently travelling until August 2nd. If this new benchmark run is still not conclusive, I will share our test when I'm back.

6 participants