perf: increase interpolator3 speed and remove large minestom generator allocations #512
base: dev/7.0-2
    return cache.get(pack(x, z));
}

private long pack(final int x, final int z) {
I believe we already have a function somewhere which packs two integers into a long
I think this is from like the 3.0 days, I wrote it.
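For reference, packing two chunk coordinates into one long usually looks like the sketch below. The body of the PR's `pack` is not shown in this thread, so the bit layout here (x in the high 32 bits, z in the low 32 bits) and the class name `ChunkKey` are assumptions for illustration only.

```java
// Hypothetical helper; the layout is an assumption, not the PR's actual code.
final class ChunkKey {
    private ChunkKey() {
    }

    // x goes in the high 32 bits, z in the low 32 bits.
    // Masking z prevents sign extension from overwriting the x bits.
    static long pack(final int x, final int z) {
        return ((long) x << 32) | (z & 0xFFFFFFFFL);
    }

    // The arithmetic shift brings the high half back down; the cast restores the sign.
    static int unpackX(final long key) {
        return (int) (key >> 32);
    }

    // Truncating to int keeps exactly the low 32 bits.
    static int unpackZ(final long key) {
        return (int) key;
    }
}
```

The mask on `z` is the part that is easy to get wrong: without it, a negative `z` would sign-extend and clobber the packed `x`.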
.maximumSize(128)
.recordStats()
.build((Pair<Integer, Integer> key) -> generateChunk(key.getLeft(), key.getRight()));
.build((Long key) -> generateChunk(unpackX(key), unpackZ(key)));
I wonder if it would be worthwhile to have our own kind of cache which is backed by a fastutil Long2ObjectMap
and avoids boxing the primitives.
I have something in the works for this for layered, out of scope for this PR but good idea
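To make the boxing point concrete, here is a minimal sketch of a cache keyed by primitive `long`. It is not the fastutil-backed design suggested above and not the work-in-progress mentioned in the reply; it swaps in a simple direct-mapped table (one slot per hash bucket, newest entry wins) using only the standard library. The class name `LongKeyedCache` is hypothetical, the cache is not thread-safe, and it has none of the eviction policy or statistics the existing cache provides.

```java
import java.util.function.LongFunction;

// Hypothetical direct-mapped cache: no Long boxing on lookup, no per-entry node objects.
final class LongKeyedCache<V> {
    private final long[] keys;
    private final Object[] values;
    private final int mask;
    private final LongFunction<V> loader;

    LongKeyedCache(final int capacityPow2, final LongFunction<V> loader) {
        if (capacityPow2 <= 0 || Integer.bitCount(capacityPow2) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        this.keys = new long[capacityPow2];
        this.values = new Object[capacityPow2];
        this.mask = capacityPow2 - 1;
        this.loader = loader;
    }

    @SuppressWarnings("unchecked")
    V get(final long key) {
        final int slot = slotFor(key);
        final Object cached = values[slot];
        if (cached != null && keys[slot] == key) {
            return (V) cached; // hit: same key already occupies this slot
        }
        // Miss or collision: load and overwrite whatever was in the slot.
        // A null result is stored but never treated as a hit, so it is reloaded next time.
        final V loaded = loader.apply(key);
        keys[slot] = key;
        values[slot] = loaded;
        return loaded;
    }

    // Multiply by a large odd constant so nearby keys spread across slots,
    // then take the high bits and mask them down to a table index.
    private int slotFor(final long key) {
        final long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }
}
```

Under these assumptions a lookup would read `cache.get(pack(x, z))`, with the loader unpacking the key before generating the chunk. Whether the collision behaviour is acceptable compared with a size-bounded cache would need measuring.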
...n/java/com/dfsek/terra/addons/chunkgenerator/generation/math/interpolation/Interpolator.java
import org.gradle.api.Project
import org.gradle.kotlin.dsl.apply

fun Project.configureBenchmarking() {
    apply(plugin = "me.champeau.jmh")
}
I mentioned a while ago that it might be good to add some JMH benchmarks. I made a Gradle config for that; I forget whether it was identical to this or whether I had something else as well.
+1
public class Interpolator3Benchmark {
    private final Interpolator3 interpolator = new Interpolator3(0, 1, 0, 1, 0, 1, 0, 1);

    @Benchmark
    public void benchmarkInterpolator3(Blackhole blackhole) {
        blackhole.consume(interpolator.trilerp(0.5, 0.75, 0.5));
    }
}
This benchmark should probably use random values to stop the JVM from optimizing around the constant inputs.
Perhaps pre-populate an array with random values and step through that for each invocation of the benchmark to avoid the overhead of the random number generator, since this is rather low level.
Which architectures & JVMs has this been tested on? Low-level optimizations like this are extremely finicky.
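The pre-populated array idea can be sketched in plain Java as follows. The class name `PrecomputedInputs` and its parameters are hypothetical; in an actual JMH benchmark this would typically live in a state object that is filled during setup, so the random number generator never runs inside the measured method.

```java
import java.util.SplittableRandom;

// Hypothetical input source: random doubles generated once, then replayed cyclically.
final class PrecomputedInputs {
    private final double[] values;
    private final int mask;
    private int cursor;

    PrecomputedInputs(final int sizePow2, final long seed) {
        if (sizePow2 <= 0 || Integer.bitCount(sizePow2) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        final SplittableRandom rng = new SplittableRandom(seed); // fixed seed for repeatable runs
        this.values = new double[sizePow2];
        for (int i = 0; i < sizePow2; i++) {
            values[i] = rng.nextDouble(); // uniform in [0, 1)
        }
        this.mask = sizePow2 - 1;
    }

    // One array read plus an increment-and-mask per call; wraps around at the end.
    double next() {
        final double v = values[cursor];
        cursor = (cursor + 1) & mask;
        return v;
    }
}
```

The benchmark body would then call something like `interpolator.trilerp(inputs.next(), inputs.next(), inputs.next())` and hand the result to the `Blackhole`. The array should be small enough to stay in cache, otherwise the benchmark starts measuring memory latency instead of the interpolation.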
+1
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 2, time = 5)
probably good to do more iterations for a bit longer than that.
I usually see 1 warmup iteration for 5 seconds & 5 measurement iterations for 5 seconds, for a total of 60 seconds.
+1
double b = ArithmeticFunctions.fma(2, y, -1);
double g = ArithmeticFunctions.fma(2, z, -1);

// using explicit fma here somehow makes this slower
can you share the benchmarks for this?
in fact, it would be good if you could share all the benchmarks you did for different changes.
+1, that honestly should not be the case, as the JVM should just optimize fma into a*b + c on unsupported platforms
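As background for the fma discussion: the internals of `ArithmeticFunctions.fma` are not shown in this thread, so the sketch below uses the standard `java.lang.Math.fma` (Java 9+) instead. It illustrates that a fused multiply-add rounds once while `a * b + c` rounds twice, so the two forms are not always interchangeable. The class name `FmaRounding` is made up for the example.

```java
// Illustration only: uses java.lang.Math.fma, not the project's ArithmeticFunctions.fma.
final class FmaRounding {
    private FmaRounding() {
    }

    // Two roundings: the product is rounded to double, then the sum is rounded.
    static double unfused(final double a, final double b, final double c) {
        return a * b + c;
    }

    // One rounding: the exact value of a * b + c is rounded a single time.
    static double fused(final double a, final double b, final double c) {
        return Math.fma(a, b, c);
    }

    // a = 1 + 2^-28, so the exact square is 1 + 2^-27 + 2^-56.
    // The 2^-56 term is lost when the product is rounded to a double.
    static final double A = 0x1.0000001p0;
    static final double ROUNDED_SQUARE = A * A;
}
```

With these inputs, `unfused(A, A, -ROUNDED_SQUARE)` is exactly `0.0`, while `fused(A, A, -ROUNDED_SQUARE)` recovers the lost term, `0x1.0p-56`. Which form ends up faster on a given JVM and CPU is a separate question that only the requested benchmarks can answer.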
Pull Request
Description
Makes Interpolator3, a previous hotspot, run faster.
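For readers unfamiliar with the operation being optimized, a textbook trilinear interpolation can be sketched as below. This is not the PR's `Interpolator3` implementation: the corner naming (`c000` to `c111`), the argument order, and the class name `TrilinearSketch` are assumptions made for illustration.

```java
// Textbook trilinear interpolation; not the optimized code from this PR.
final class TrilinearSketch {
    private TrilinearSketch() {
    }

    // Linear interpolation between a (t = 0) and b (t = 1).
    static double lerp(final double t, final double a, final double b) {
        return a + t * (b - a);
    }

    // cXYZ is the value at corner (X, Y, Z) of the unit cube.
    static double trilerp(final double x, final double y, final double z,
                          final double c000, final double c100,
                          final double c010, final double c110,
                          final double c001, final double c101,
                          final double c011, final double c111) {
        // Collapse the four edges that run along x.
        final double c00 = lerp(x, c000, c100);
        final double c10 = lerp(x, c010, c110);
        final double c01 = lerp(x, c001, c101);
        final double c11 = lerp(x, c011, c111);
        // Collapse along y, then along z.
        final double c0 = lerp(y, c00, c10);
        final double c1 = lerp(y, c01, c11);
        return lerp(z, c0, c1);
    }
}
```

Each `lerp` is one multiply and two additions or subtractions, which is why the choice between fused and unfused multiply-add matters so much for a method called this often.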
Changelog
Checklist
Mandatory checks
ver/ prefix) or is a branch that is intended to be merged into a version branch.
CONTRIBUTING.md document in the root of the git repository.
Types of changes
Compatibility
Documentation
Testing
(Do benchmarks count here?)
Licensing
release it under GPLv3.
released under GPLv3 or a compatible license.