Rewrite `core.checkedint` functions. #1394

denis-sh · 2015-09-22T16:31:39Z

Use fast and simple sign comparison instead of long, slow and complicated value comparisons.

Also remove reference as it doesn't apply anymore.

P.S.

I planned to rewrite this module implementation since its creation because it's clearly written in a hurry (e.g. see r < x || r < y check in addu), sorry this took that long to get to it. IMHO this is one of the modules that have to show source code readers how simple, robust and efficient code should be written in D.

MartinNowak · 2015-09-24T21:09:52Z

Looks good, maybe add a few microbenchmarks to https://github.com/D-Programming-Language/druntime/tree/master/benchmark.

dnadlinger · 2015-09-24T21:51:06Z

Why all the style changes? And because you specifically mention the module as a potential style showcase: I don't think we should encourage the use of e.g. bitwise not for boolean values, nor relying on operator precedence wherever possible.

Also, you should add your name after Walter's. He is still the primary author of the API.

denis-sh · 2015-09-25T08:59:02Z

Looks good, maybe add a few microbenchmarks to

OK, I will able to do it probably tomorrow.

Why all the style changes?

Do you mean const additions, spaces after cast(X), and ulong(x) * ulong(y) to cast(ulong) x * y changes? If so, what exactly is the problem here?

And because you specifically mention the module as a potential style showcase: I don't think we should encourage the use of e.g. bitwise not for boolean values,

No boolean values are used. All types are size_t.

nor relying on operator precedence wherever possible.

Hm, I don't see any complicated cases in my code, the only thing used are a & b || c && d (it's known logical operators has lower priority compared to arithmetic/bitwise) and a >>> b + c (C++'s cin/cout taught us about these cases).
Just a note: In contrast, previous file version had this: && ((r == y)? r : (r / x) != y), where the only probably needed for clarity parantheses pair is around r / x != y and is missed but three useless pairs are added just to complicate expression reading as (r == y) tells the reader ?: evaluates before == and then the reader may think the last != y compares the result of the whole previous text with y.

Also, you should add your name after Walter's. He is still the primary author of the API.

Sorry, I supposed it was alphabetically sorted for some reason. Fixed.

WalterBright · 2015-09-25T09:34:00Z

src/core/checkedint.d

@@ -19,10 +19,9 @@
 * relative to the cost of the operation itself, compiler implementations are free
 * to recognize them and generate equivalent and faster code.
 *
- * References: $(LINK2 http://blog.regehr.org/archives/1139, Fast Integer Overflow Checks)


Why take out the reference? This is critical information.

Function implementations are completely rewritten so there is no connection with this blog any more. Also after looking into the blog it looks very silly: after the author performs bitshifts in checked_add_3 instead of using bitwise operators (like done in this pull) he uses this long and slow way:

(sa && sb && !sr) || (!sa && !sb && sr)

So is the really a reason to include a reference to this blog?

WalterBright · 2015-09-25T09:43:38Z

As noted in the reference that was elided by this PR (and should be reinstated), doing these checks is tricky because of the ways some compilers generate code. The way I wrote the code was to avoid various code generators invalidating the results.

I'm not saying your method is wrong, but I do suggest a careful review of the unittests to ensure that it really does work correctly and is tested thoroughly.

denis-sh · 2015-09-27T08:16:38Z

...doing these checks is tricky because of the ways some compilers generate code. The way I wrote the code was to avoid various code generators invalidating the results.

Do you mean there are code generation bugs and these bugs should be considered in every module we write? Because I don't see when this pull rely on some concrete implementation of UB case.

denis-sh · 2015-09-27T08:23:26Z

return cast(size_t) (n >>> T.sizeof * 8 - 1);

Many compilers generate very good code for bool b = a < b;. I suggest that instead of the shift.

size_t _differentSign(T)(in T a, in T b)

should return a bool

Do you suppose _signBit should be n => n < 0 instead of n => cast(size_t) (n >>> T.sizeof * 8 - 1) and _differentSign should be (a, b) => a._signBit != b._signBit instead of (a, b) => a._signBit ^ b._signBit?

ibuclaw · 2015-09-27T08:35:59Z

src/core/checkedint.d

+    const r = x * y;
+    if (  ~x._differentSign(y)  // x and y has the same sign
+         & r._signBit           // and result is negative (covers min * -1)
+        || x && r / x != y)     // or perform simple check for x != 0


Something amazing going on with whitespace here. Quite dizzying to review at a glance.

denis-sh · 2015-09-27T11:38:33Z

Improved the original version by using (a ^ b)._signBit instead of a._signBit ^ b._signBit to save one shift operation.
Restored reference to silly blog post (still against this change).

Do you suppose _signBit should be n => n < 0 instead of n => cast(size_t) (n >>> T.sizeof * 8 - 1) and _differentSign should be (a, b) => a._signBit != b._signBit instead of (a, b) => a._signBit ^ b._signBit?

Added commit which rewrites implementation this way. The generated code still looks pretty good on GDC (just uses two conditional jumps instead of one) and very good on Clang ((a < 0) != (b < 0) is fully optimized but still there are two conditional jumps for a && b construction) (and sorry, didn't test LDC, only Clang).

ibuclaw · 2015-09-27T11:50:48Z

The generated code still looks pretty good on GDC (just uses two conditional jumps instead of one)

Such a shame that GDC ignores these function bodies then. ;-)

dnadlinger · 2015-09-28T09:41:16Z

src/core/checkedint.d

-    if (r < int.min || r > int.max)
+    const r = x - y;
+    if (   x._differentSign(y)   // x and y has different sign
+        && x._differentSign(r))  // result and x has different sign


I would really rather put the && at the end of the first line (which is recommended in style guides anyway, e.g. Code Complete) so that you don't need the funky whitespace in the first line. Just makes it harder for the brain to parse.

dnadlinger · 2015-09-28T09:43:08Z

LDC won't use those function bodies either, except on platforms where there is no native instruction (e.g. long on x86).

tsbockman · 2015-09-29T14:49:47Z

This version of core.checkedint still passes my preciseops test suite, which is considerably more thorough than the druntime unittests.

Performance-wise, I find it a bit slower with DMD than the old version was.

denis-sh · 2015-09-29T14:57:18Z

Performance-wise, I find it a bit slower with DMD than the old version was.

With DMD, could you check the first commit (the one without the use of logical operations) please?

tsbockman · 2015-09-29T15:59:39Z

(I should note that my tests assume no one has gotten around to implementing intrinsics for core.checkedint on DMD yet; I need to retest if this is incorrect.)

The bitwise version was slightly slower than the logical version, even after I optimized it by combining some unnecessary shift operations.

denis-sh · 2015-09-29T16:11:11Z

The bitwise version was slightly slower than the logical version, even after I optimized it by combining some unnecessary shift operations.

What functions exactly are you measuring? Is adds for long slower? I.e. is ~cast(size_t)((x ^ y) >>> 63) & cast(size_t)((x ^ r) >>> 63) slower than x < 0 && y < 0 && r >= 0 || x >= 0 && y >= 0 && r < 0?

tsbockman · 2015-09-29T16:35:20Z

I am running the benchmark found in main.d of my CheckedInt project. It uses checked arithmetic to do various make-work calculations like prime finding, integer square root, the collatz sequence, and quicksort.

It was designed to test the performance of my checked integer type, but in the process it exercises all the contents of core.checkedint. On DMD, most of the runtime is taken up by other things. But, when holding other factors constant, the performance differences between various core.checkedint implementations are discernible.

I am not really a big fan of micro-benchmarks, because it is so hard to get them right. If you really want, I can take a stab at breaking things down in more detail later. However, I think it's probably a waste of time; the old core.checkedint implementation is already fast enough that, with a good optimizer (by which I mean GDC), the performance boost from enabling intrinsics is barely noticeable.

Compared to bitwise equivalents, the kind of test logic used in the old implementation looks slow, but in reality it is highly amenable to optimization and superscalar execution.

If this pull request is accepted, it should be for some reason other than runtime performance.

denis-sh · 2015-09-29T17:08:04Z

If this pull request is accepted, it should be for some reason other than runtime performance.

Agree. I tried to improve the whole module implementation style.

denis-sh · 2015-09-29T17:09:27Z

This is a && b / c != d, parenthesis really does help make it clear at a glance just exactly what it is doing.

Readded.

I would really rather put the && at the end of the first line

Changed as I'm probably the only one here using such style and ignoring Code Complete.

MartinNowak · 2015-09-29T17:09:52Z

I am not really a big fan of micro-benchmarks

Let's please add them anyhow. Number of instructions and latency is a good proxy, though you can hardly measure the impact on the branch cache.

denis-sh · 2015-09-29T17:18:58Z

Looks good, maybe add a few microbenchmarks to https://github.com/D-Programming-Language/druntime/tree/master/benchmark.

Not sure about benchmarks. For now I only can describe resulting machine code.

This is what Clang does (one can see full (a < 0) != (b < 0) optimization applied):

adds(int, int, bool*):
    lea eax, [rsi + rdi]
    xor esi, edi
    js  .LBB0_3
    xor edi, eax
    jns .LBB0_3
    mov byte ptr [rdx], 1
.LBB0_3:
    ret

and this is what GCC does (just direct shifting and comparison, looks rather slow):

adds(int, int, bool*):
    leal    (%rdi,%rsi), %eax
    shrl    $31, %esi
    shrl    $31, %edi
    cmpb    %dil, %sil
    jne .L4
    movl    %eax, %ecx
    shrl    $31, %ecx
    cmpb    %cl, %sil
    je  .L4
    movb    $1, (%rdx)
.L4:
    rep ret

So yes, one have to use (a ^ b)._signBit or GCC will not optimize it (but if one does use ^, GCC will produce the same assembly as Clang's shown above).

tsbockman · 2015-09-29T17:23:14Z

@MartinNowak

Looks good, maybe add a few microbenchmarks to https://github.com/D-Programming-Language/druntime/tree/master/benchmark.

Is there an equivalent standard benchmark for Phobos somewhere?

denis-sh · 2015-09-29T17:38:26Z

~diffSign(x, y) && diffSign(x, r) is a wrong code if diffSign returns 1.

~~Further looking at resulting assembly is even more interesting:~~

~~~diffSign(x, y) && diffSign(x, r) gives us this with with Clang:~~

adds(int, int, bool*):
    add esi, edi               # looks unoptimal, could use EAX via lea
    xor edi, esi
    jns .LBB0_2
    mov byte ptr [rdx], 1
.LBB0_2:
    mov eax, esi               # as EAX wasn't used
    ret

~~and this with GCC (if (a ^ b)._signBit is used of course):~~

adds(int, int, bool*):
    lea eax, [rsi+rdi]
    xor edi, eax
    jns .L4
    mov BYTE PTR [rdx], 1
.L4:
    rep ret

GCC is the "winner" here. The instruction count winner who can't optimize (a < 0) != (b < 0).

~~Usage of ! instead of ~ or & instead of && increases instruction count, sometimes a lot.~~

denis-sh · 2015-09-29T17:50:38Z

~~> Further looking at resulting assembly is even more interesting:~~

~~Good short assembly. Good except it's incorrect. Looks like it didn't take much time too find codegen bug.~~

denis-sh · 2015-09-29T18:31:39Z

Further looking at resulting assembly is even more interesting:

Good short assembly. Good except it's incorrect. Looks like it didn't take much time too find codegen bug.

Sorry for the noise, just my example code was incorrect. ~1 isn't zero.

Use fast and simple sign comparison instead of long, slow and complicated value comparisons. Also remove reference as it doesn't apply anymore.

At least Clang knows `(a < 0) != (b < 0)` pattern. The only possible shortcoming of this change is that use of logical operators currently causes compilers to emit two conditional jumps instead of one for bitwise operations.

denis-sh · 2015-09-29T18:44:22Z

Changed the first commit to use sign-extending shift to make ~diffSign(x, y) && diffSign(x, r) construction valid too because otherwise there should be an warning comment and I don't want to add one just for the first commit.

denis-sh · 2015-09-29T18:51:40Z

Not sure about benchmarks.

Will try to add one in a couple of days.

MartinNowak · 2015-09-29T19:59:02Z

Is there an equivalent standard benchmark for Phobos somewhere?

Not that I'm aware of, but this PR is druntime code anyhow.

tsbockman · 2015-09-29T20:36:42Z

@MartinNowak

Not that I'm aware of, but this PR is druntime code anyhow.

Sure. I was just wondering because it would be relevant to something else I'm working on.

DmitryOlshansky · 2016-03-21T09:29:33Z

Ping @denis-sh

denis-sh · 2016-03-21T18:19:14Z

Ping @denis-sh

version block styling issue or request for benchmark?

DmitryOlshansky · 2016-03-21T18:24:18Z

version block styling issue or request for benchmark?

I think benchmarking is more important and was your last intention.

denis-sh force-pushed the imprve-`core.checkedint` branch from a0b8b50 to 83ec881 Compare September 25, 2015 08:33

WalterBright reviewed Sep 25, 2015
View reviewed changes

ibuclaw reviewed Sep 27, 2015
View reviewed changes

denis-sh force-pushed the imprve-`core.checkedint` branch from 83ec881 to cdbc2c0 Compare September 27, 2015 11:30

dnadlinger reviewed Sep 28, 2015
View reviewed changes

denis-sh force-pushed the imprve-`core.checkedint` branch from cdbc2c0 to 2986f54 Compare September 29, 2015 17:06

denis-sh force-pushed the imprve-`core.checkedint` branch from 2986f54 to f57723c Compare September 29, 2015 18:37

denis-sh added 2 commits September 29, 2015 21:37

Rewrite core.checkedint functions.

a675dcb

Use fast and simple sign comparison instead of long, slow and complicated value comparisons. Also remove reference as it doesn't apply anymore.

Use logical operations in and let compiler to optimize.

f57723c

At least Clang knows `(a < 0) != (b < 0)` pattern. The only possible shortcoming of this change is that use of logical operators currently causes compilers to emit two conditional jumps instead of one for bitwise operations.

dlang-bot added Needs Rebase needs a `git rebase` performed stalled Needs Work labels Jan 1, 2018

Uh oh!

Rewrite core.checkedint functions. #1394

Are you sure you want to change the base?

Rewrite core.checkedint functions. #1394

Uh oh!

Conversation

denis-sh commented Sep 22, 2015

P.S.

Uh oh!

MartinNowak commented Sep 24, 2015

Uh oh!

dnadlinger commented Sep 24, 2015

Uh oh!

denis-sh commented Sep 25, 2015

Uh oh!

WalterBright Sep 25, 2015

Choose a reason for hiding this comment

Uh oh!

denis-sh Sep 27, 2015

Choose a reason for hiding this comment

Uh oh!

WalterBright commented Sep 25, 2015

Uh oh!

denis-sh commented Sep 27, 2015

Uh oh!

denis-sh commented Sep 27, 2015

Uh oh!

ibuclaw Sep 27, 2015

Choose a reason for hiding this comment

Uh oh!

denis-sh commented Sep 27, 2015

Uh oh!

ibuclaw commented Sep 27, 2015

Uh oh!

dnadlinger Sep 28, 2015

Choose a reason for hiding this comment

Uh oh!

dnadlinger commented Sep 28, 2015

Uh oh!

tsbockman commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

tsbockman commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

tsbockman commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

MartinNowak commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

tsbockman commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

denis-sh commented Sep 29, 2015

Uh oh!

MartinNowak commented Sep 29, 2015

Uh oh!

tsbockman commented Sep 29, 2015

Uh oh!

DmitryOlshansky commented Mar 21, 2016

Uh oh!

denis-sh commented Mar 21, 2016

Uh oh!

DmitryOlshansky commented Mar 21, 2016

Uh oh!

Uh oh!

Rewrite `core.checkedint` functions. #1394

Rewrite `core.checkedint` functions. #1394