Avoid calling expensive.toUri.toString and other micro-optimizations
#1629
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #1628
This PR micro-optimizes the top three hotspots i found in the JProfiler profiles compiling the Netty codebase using Javac in Zinc in Mill
sortBy
.sortBy(x => x.toUri().toString())creates a new URI object and String for every comparison during the sort - that's O(n log n) object allocations, and it's blowing up the profile on the linked issue..sortedinstead uses Path's nativeComparable<Path>implementation which compares paths directly without allocating objects.Both produce deterministic ordering (which is what's needed for reproducible content hashes). The only semantic difference is the comparison method (lexicographic path comparison vs URI string comparison), but for the purpose of consistent hashing, both work.
isDirectory
toVirtualFilecalledFiles.isDirectory()on every file from filtered list. AddedtoVirtualFileForRegularFile()that skips the syscall sys we know it is already a regular fileCaching contentHash
Changed all our
def contentHashes tolazy vals. Looking at the code it seems to be safe - we re-create the file instances every time, and when file metadata changesClasspathCachecreates a newVirtualFile. Furthermore,contentHashStris already alazy val, so it seems the code already assumes the files are re-created when changed and so it should be safe to cache the hashesManually tested this via
publishLocaland manually reproducing the original workload in the linked issue, the performance seems to have gone back to previous levels and the profiler no longer showstoURItaking up 34% of CPU time, and theisDirectoryandcontentHashhotspots each taking ~5% of CPU time dropped off the profiles as well../mill clean && time ./mill __.compilein the Netty codebase dropped from ~15s to ~8.5s