I noticed that the compile command in the README did not contain the -ffast-math flag. Without that flag, there is almost no advantage for the optimized code (speedups of 0.9x - 1.1x). Adding -ffast-math gave me up to a 1,000,000x speedup on the largest problem size in the benchmark.
After doing some poking around, I found that the following patch, which changes the definition of WEIGHT_MAX from __builtin_inf() to DBL_MAX more-or-less removes the advantage that DMMSY Opt has, even when using -ffast-math:
diff --git a/include/common.h b/include/common.h
index 1234107..a31c568 100644
--- a/include/common.h
+++ b/include/common.h
@@ -3,12 +3,13 @@
#include <stdint.h>
#include <stdbool.h>
+#include <float.h>
typedef uint32_t node_t;
typedef double weight_t;
// Infinity equivalent for double
-#define WEIGHT_MAX __builtin_inf()
+#define WEIGHT_MAX DBL_MAX
#define NODE_MAX ((node_t)-1)
// Edge structure
I compiled the code with the following command: clang -O3 -ffast-math -march=native -flto -DNDEBUG src/*.c -I include -o dmmsy_bench -lm, using Clang version 21.1.8
Timing Results
Timing output before patch (WEIGHT_MAX = __builtin_inf())
Details
===========================================================================
n m Dijkstra DMMSY Res DMMSY Opt Spd(Opt)
---------------------------------------------------------------------------
1000 5000 0.0170 0.0037 0.0000 342.80 x
5000 25000 0.0008 0.0002 0.0000 18.41 x
10000 50000 0.9935 0.8738 0.0001 12480.83x
25000 125000 2.9053 2.6133 0.0001 51573.26x
50000 250000 6.4332 5.7430 0.0001 74372.67x
75000 375000 10.0353 9.1648 0.0001 129767.44x
100000 500000 13.8912 12.5027 0.0001 93438.25x
150000 750000 22.6547 20.2336 0.0002 114417.77x
200000 1000000 29.8072 27.1743 0.0003 96541.64x
250000 1250000 42.9747 35.8088 0.0002 179810.35x
350000 1750000 64.0961 60.6058 0.0005 133071.51x
500000 2500000 114.3707 102.8302 0.0002 646162.01x
750000 3750000 195.8446 189.5765 0.0002 979222.88x
1000000 5000000 305.1935 321.8353 0.0003 1220774.15x
===========================================================================
Timing output after patch (WEIGHT_MAX = DBL_MAX)
Details
===========================================================================
n m Dijkstra DMMSY Res DMMSY Opt Spd(Opt)
---------------------------------------------------------------------------
1000 5000 0.0180 0.0244 0.0200 0.90 x
5000 25000 0.0008 0.0002 0.0000 17.26 x
10000 50000 1.0094 0.9951 1.0156 0.99 x
25000 125000 2.9112 2.9222 2.9948 0.97 x
50000 250000 6.4643 6.4533 6.6439 0.97 x
75000 375000 10.0461 10.4196 10.2352 0.98 x
100000 500000 13.5921 13.7043 14.0339 0.97 x
150000 750000 21.5044 21.6953 22.5653 0.95 x
200000 1000000 30.4776 36.0661 32.5664 0.94 x
250000 1250000 42.2718 40.8033 44.2124 0.96 x
350000 1750000 68.5082 69.1117 71.3077 0.96 x
500000 2500000 113.9036 114.7898 121.7260 0.94 x
750000 3750000 189.4533 214.2840 214.3745 0.88 x
1000000 5000000 312.2623 317.0314 340.4416 0.92 x
===========================================================================
Both implementations appear to pass the correctness checks.
Unfortunately, I don't have the resources to dig in further to see if this can be rectified and the original advantage of DMMSY Opt restored.
This problem is mentioned in #1, but it is claimed there that the benchmarks should still show the speedup with -ffast-math removed. This does not appear to be the case on my computer, and further, it seems that -ffast-math shows no speedup when WEIGHT_MAX is modified as above. I do not know if this is specific to my system or a more general problem.
I noticed that the compile command in the README did not contain the
-ffast-mathflag. Without that flag, there is almost no advantage for the optimized code (speedups of 0.9x - 1.1x). Adding-ffast-mathgave me up to a 1,000,000x speedup on the largest problem size in the benchmark.After doing some poking around, I found that the following patch, which changes the definition of
WEIGHT_MAXfrom__builtin_inf()toDBL_MAXmore-or-less removes the advantage that DMMSY Opt has, even when using-ffast-math:I compiled the code with the following command:
clang -O3 -ffast-math -march=native -flto -DNDEBUG src/*.c -I include -o dmmsy_bench -lm, using Clang version 21.1.8Timing Results
Timing output before patch (
WEIGHT_MAX = __builtin_inf())Details
Timing output after patch (
WEIGHT_MAX = DBL_MAX)Details
Both implementations appear to pass the correctness checks.
Unfortunately, I don't have the resources to dig in further to see if this can be rectified and the original advantage of DMMSY Opt restored.
This problem is mentioned in #1, but it is claimed there that the benchmarks should still show the speedup with
-ffast-mathremoved. This does not appear to be the case on my computer, and further, it seems that-ffast-mathshows no speedup whenWEIGHT_MAXis modified as above. I do not know if this is specific to my system or a more general problem.The compiler optimization essentially breaks the algorithm's state-reset mechanism between consecutive runs.
When compiling with -ffast-math, the compiler implicitly enables -ffinite-math-only, allowing the compiler to assume that floating-point variables will never contain NaN or Infinity.
When the compiler encounters this conditional check in
if (ws->d[v] == WEIGHT_MAX) { // where WEIGHT_MAX is __builtin_inf() ws->dirty_d[ws->ds_cnt++] = v; }it evaluates
ws->d[v] == __builtin_inf()as statically false and completely optimizes away the block of code inside the if statement.As a result of this optimization,
ws->dirty_dis never populated with newly discovered vertices.Since it is empt…