Skip to content

Conversation

Bloeckchengrafik
Copy link
Member

Pull Request

Description

Makes Interpolator3, a previous hotspot, run faster.

Changelog

Checklist

Mandatory checks

  • The base branch of this PR is an unreleased version branch (that has a ver/ prefix)
    or is a branch that is intended to be merged into a version branch.
  • There are no already existing PRs that provide the same changes.
  • The PR is within the scope of Terra (i.e. is something a configurable terrain generator should be doing).
  • Changes follow the code style for this project.
  • I have read the CONTRIBUTING.md
    document in the root of the git repository.

Types of changes

  • Bug Fix
  • Build system
  • Documentation
  • New Feature
  • Performance
  • Refactoring
  • Repository
  • Revert
  • Style
  • Tests
  • Translation

Compatibility

  • Introduces a breaking change
  • Introduces new functionality in a backwards compatible way.
  • Introduces bug fixes

Documentation

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Testing

  • I have added tests to cover my changes.
  • All new and existing tests passed.

(Do benchmarks count here?)

Licensing

  • [x ] I am the original author of this code, and I am willing to
    release it under GPLv3.
  • I am not the original author of this code, but it is in public domain or
    released under GPLv3 or a compatible license.

return cache.get(pack(x, z));
}

private long pack(final int x, final int z) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we already have a function somewhere which packs two integers into a long

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is from like the 3.0 days, I wrote it.

.maximumSize(128)
.recordStats()
.build((Pair<Integer, Integer> key) -> generateChunk(key.getLeft(), key.getRight()));
.build((Long key) -> generateChunk(unpackX(key), unpackZ(key)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be worthwhile to have our own kind of cache which uses is backed by a fastutil Long2ObjectMaps and avoids boxing the primitives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have something in the works for this for layered, out of scope for this PR but good idea

Comment on lines +1 to +6
import org.gradle.api.Project
import org.gradle.kotlin.dsl.apply

fun Project.configureBenchmarking() {
apply(plugin = "me.champeau.jmh")
} No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it a while ago that it might be good to add some jmh benchmarks, I made a gradle config for that, I forget if it was just identical to this or if I had smth else as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +15 to +22
public class Interpolator3Benchmark {
private final Interpolator3 interpolator = new Interpolator3(0, 1, 0, 1, 0, 1, 0, 1);

@Benchmark
public void benchmarkInterpolator3(Blackhole blackhole) {
blackhole.consume(interpolator.trilerp(0.5, 0.75, 0.5));
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this benchmark should probably use random values to stop the jvm from

perhaps pre-populate an array with random values and step through that for each invocation of the benchmark to avoid the overhead of the random number generator, since this is rather low level.

which architectures & jvms has this been tested on? low level optimizations like this are extremely finicky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +13 to +14
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 2, time = 5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably good to do more iterations for a bit longer than that.

I usually see 1 warmup iteration for 5 seconds & 5 measurement iterations for 5 seconds, for a total of 60 seconds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

double b = ArithmeticFunctions.fma(2, y, -1);
double g = ArithmeticFunctions.fma(2, z, -1);

// using explicit fma here somehow makes this slower
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you share the benchmarks for this?

in fact, it would be good if you could share all the benchmarks you did for different changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, that honestly should not be the case as the jvm should just optimize fma into a*b + c on non supported platforms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants