Hello,
I'm solving large linear systems (130 million x 130 million, approximately 16 nonzeros per row, P1 FEM for a Poisson-like equation)
on a big machine (4 Intel Xeon Gold 6240L CPUs, 18 cores per CPU, with hyperthreading, which gives 144 hardware threads).
The systems come from a Newton iteration, so I solve a series of them (34 in total). I noticed that when I use all
144 threads, performance is worse than with a limited number of threads (the optimum seems to be around 32 concurrent threads).
Is there a recommended maximum number of threads to use with AMGCL?
Thank you in advance!
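One way to look for the best thread count on a given machine, independently of the solver, is to sweep candidate counts over a small bandwidth-bound kernel. The following is only a sketch under assumptions: it assumes the solver runs on an OpenMP-parallel backend, and the names `time_triad` and `best_thread_count` are hypothetical helpers, not part of AMGCL. Without OpenMP enabled at compile time the loop simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: time one pass of a bandwidth-bound triad
// (a[i] = b[i] + 0.5 * c[i]) with the requested number of OpenMP threads.
// Returns the elapsed wall-clock time in seconds.
double time_triad(int threads, std::vector<double> &a,
                  const std::vector<double> &b, const std::vector<double> &c) {
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads; // no OpenMP: the loop below runs on one thread
#endif
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    const auto t0 = std::chrono::steady_clock::now();
#pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        a[i] = b[i] + 0.5 * c[i];
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Hypothetical helper: try each candidate thread count on vectors of
// length n, keep the fastest of `repeats` runs per candidate, and return
// the candidate with the smallest time (0 if there are no candidates).
int best_thread_count(const std::vector<int> &candidates, std::size_t n,
                      int repeats = 3) {
    if (candidates.empty()) return 0;
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
    int best = candidates.front();
    double best_time = std::numeric_limits<double>::max();
    for (int t : candidates) {
        double fastest = std::numeric_limits<double>::max();
        for (int r = 0; r < repeats; ++r) {
            fastest = std::min(fastest, time_triad(t, a, b, c));
        }
        if (fastest < best_time) {
            best_time = fastest;
            best = t;
        }
    }
    return best;
}
```

A call such as `best_thread_count({16, 32, 64, 144}, n)` with `n` large enough to exceed the caches gives a rough idea of where memory bandwidth saturates. The standard OpenMP environment variables `OMP_NUM_THREADS`, `OMP_PROC_BIND` and `OMP_PLACES` control the thread count and thread placement without recompiling; whether placement explains the timings reported here is not something this sketch can establish.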
Running times with 144 threads
o-[Linear solve] 22 iters in 82.01 seconds 0 GFlop/s ||Ax-b||/||b||=7.8568e-05
o-[Linear solve] 10 iters in 127.75 seconds 0 GFlop/s ||Ax-b||/||b||=7.25841e-05
o-[Linear solve] 13 iters in 167.87 seconds 0 GFlop/s ||Ax-b||/||b||=6.33273e-05
o-[Linear solve] 13 iters in 81.49 seconds 0 GFlop/s ||Ax-b||/||b||=9.99431e-05
o-[Linear solve] 16 iters in 83.96 seconds 0 GFlop/s ||Ax-b||/||b||=6.81104e-05
o-[Linear solve] 17 iters in 271.67 seconds 0 GFlop/s ||Ax-b||/||b||=6.06977e-05
o-[Linear solve] 18 iters in 381.52 seconds 0 GFlop/s ||Ax-b||/||b||=7.01995e-05
o-[Linear solve] 18 iters in 187.06 seconds 0 GFlop/s ||Ax-b||/||b||=8.95339e-05
o-[Linear solve] 19 iters in 252.78 seconds 0 GFlop/s ||Ax-b||/||b||=9.93522e-05
o-[Linear solve] 20 iters in 250.98 seconds 0 GFlop/s ||Ax-b||/||b||=9.19154e-05
o-[Linear solve] 16 iters in 314.01 seconds 0 GFlop/s ||Ax-b||/||b||=6.86911e-05
o-[Linear solve] 16 iters in 284.93 seconds 0 GFlop/s ||Ax-b||/||b||=9.49837e-05
o-[Linear solve] 16 iters in 503.08 seconds 0 GFlop/s ||Ax-b||/||b||=8.53771e-05
o-[Linear solve] 21 iters in 393.95 seconds 0 GFlop/s ||Ax-b||/||b||=7.50488e-05
o-[Linear solve] 21 iters in 315.77 seconds 0 GFlop/s ||Ax-b||/||b||=9.23609e-05
o-[Linear solve] 24 iters in 82.01 seconds 0 GFlop/s ||Ax-b||/||b||=9.86193e-05
o-[Linear solve] 23 iters in 84.48 seconds 0 GFlop/s ||Ax-b||/||b||=7.68229e-05
o-[Linear solve] 23 iters in 518.88 seconds 0 GFlop/s ||Ax-b||/||b||=7.55897e-05
o-[Linear solve] 26 iters in 294.59 seconds 0 GFlop/s ||Ax-b||/||b||=7.50017e-05
o-[Linear solve] 26 iters in 452.91 seconds 0 GFlop/s ||Ax-b||/||b||=8.27896e-05
o-[Linear solve] 29 iters in 307.91 seconds 0 GFlop/s ||Ax-b||/||b||=8.07129e-05
o-[Linear solve] 30 iters in 738.95 seconds 0 GFlop/s ||Ax-b||/||b||=8.9117e-05
o-[Linear solve] 29 iters in 487.42 seconds 0 GFlop/s ||Ax-b||/||b||=8.3026e-05
o-[Linear solve] 29 iters in 169.96 seconds 0 GFlop/s ||Ax-b||/||b||=7.79059e-05
o-[Linear solve] 34 iters in 761.2 seconds 0 GFlop/s ||Ax-b||/||b||=7.77524e-05
o-[Linear solve] 33 iters in 540.74 seconds 0 GFlop/s ||Ax-b||/||b||=8.34394e-05
o-[Linear solve] 37 iters in 770.17 seconds 0 GFlop/s ||Ax-b||/||b||=8.85991e-05
o-[Linear solve] 22 iters in 103.75 seconds 0 GFlop/s ||Ax-b||/||b||=8.51125e-05
o-[Linear solve] 27 iters in 511.16 seconds 0 GFlop/s ||Ax-b||/||b||=9.32069e-05
o-[Linear solve] 35 iters in 293.12 seconds 0 GFlop/s ||Ax-b||/||b||=9.44283e-05
o-[Linear solve] 84 iters in 566.3 seconds 0 GFlop/s ||Ax-b||/||b||=8.68865e-05
o-[Linear solve] 93 iters in 579.91 seconds 0 GFlop/s ||Ax-b||/||b||=7.53275e-05
o-[Linear solve] 38 iters in 613.97 seconds 0 GFlop/s ||Ax-b||/||b||=9.31847e-05
o-[Linear solve] 35 iters in 145.38 seconds 0 GFlop/s ||Ax-b||/||b||=9.78387e-05
o-[Linear solve] 73 iters in 1307.2 seconds 0 GFlop/s ||Ax-b||/||b||=8.12692e-05
Running times with 64 threads
o-[Linear solve] 22 iters in 107.72 seconds 0 GFlop/s ||Ax-b||/||b||=7.8568e-05
o-[Linear solve] 10 iters in 88.31 seconds 0 GFlop/s ||Ax-b||/||b||=7.25841e-05
o-[Linear solve] 13 iters in 124.47 seconds 0 GFlop/s ||Ax-b||/||b||=6.33273e-05
o-[Linear solve] 13 iters in 150.9 seconds 0 GFlop/s ||Ax-b||/||b||=9.99431e-05
o-[Linear solve] 16 iters in 288.14 seconds 0 GFlop/s ||Ax-b||/||b||=6.81104e-05
o-[Linear solve] 17 iters in 151.65 seconds 0 GFlop/s ||Ax-b||/||b||=6.06977e-05
o-[Linear solve] 18 iters in 152.54 seconds 0 GFlop/s ||Ax-b||/||b||=7.01995e-05
o-[Linear solve] 18 iters in 330.22 seconds 0 GFlop/s ||Ax-b||/||b||=8.95339e-05
o-[Linear solve] 19 iters in 152.76 seconds 0 GFlop/s ||Ax-b||/||b||=9.93522e-05
o-[Linear solve] 20 iters in 87.05 seconds 0 GFlop/s ||Ax-b||/||b||=9.19154e-05
o-[Linear solve] 16 iters in 194.89 seconds 0 GFlop/s ||Ax-b||/||b||=6.86911e-05
o-[Linear solve] 16 iters in 265.67 seconds 0 GFlop/s ||Ax-b||/||b||=9.49837e-05
o-[Linear solve] 16 iters in 298.82 seconds 0 GFlop/s ||Ax-b||/||b||=8.53771e-05
o-[Linear solve] 21 iters in 183.34 seconds 0 GFlop/s ||Ax-b||/||b||=7.50488e-05
o-[Linear solve] 21 iters in 89.42 seconds 0 GFlop/s ||Ax-b||/||b||=9.23609e-05
o-[Linear solve] 24 iters in 176.19 seconds 0 GFlop/s ||Ax-b||/||b||=9.86193e-05
o-[Linear solve] 23 iters in 120.3 seconds 0 GFlop/s ||Ax-b||/||b||=7.68229e-05
o-[Linear solve] 23 iters in 96.84 seconds 0 GFlop/s ||Ax-b||/||b||=7.55897e-05
o-[Linear solve] 26 iters in 294.83 seconds 0 GFlop/s ||Ax-b||/||b||=7.50017e-05
o-[Linear solve] 26 iters in 517.7 seconds 0 GFlop/s ||Ax-b||/||b||=8.27896e-05
o-[Linear solve] 29 iters in 456.25 seconds 0 GFlop/s ||Ax-b||/||b||=8.07129e-05
o-[Linear solve] 30 iters in 177.95 seconds 0 GFlop/s ||Ax-b||/||b||=8.9117e-05
o-[Linear solve] 29 iters in 245.04 seconds 0 GFlop/s ||Ax-b||/||b||=8.3026e-05
o-[Linear solve] 29 iters in 99.87 seconds 0 GFlop/s ||Ax-b||/||b||=7.79059e-05
o-[Linear solve] 34 iters in 130.31 seconds 0 GFlop/s ||Ax-b||/||b||=7.77524e-05
o-[Linear solve] 33 iters in 114.76 seconds 0 GFlop/s ||Ax-b||/||b||=8.34394e-05
o-[Linear solve] 37 iters in 327.9 seconds 0 GFlop/s ||Ax-b||/||b||=8.85991e-05
o-[Linear solve] 22 iters in 110.32 seconds 0 GFlop/s ||Ax-b||/||b||=8.51125e-05
o-[Linear solve] 27 iters in 509.96 seconds 0 GFlop/s ||Ax-b||/||b||=9.32069e-05
o-[Linear solve] 35 iters in 97.31 seconds 0 GFlop/s ||Ax-b||/||b||=9.44283e-05
o-[Linear solve] 84 iters in 366.19 seconds 0 GFlop/s ||Ax-b||/||b||=8.68865e-05
o-[Linear solve] 93 iters in 705.1 seconds 0 GFlop/s ||Ax-b||/||b||=7.53275e-05
o-[Linear solve] 38 iters in 535.45 seconds 0 GFlop/s ||Ax-b||/||b||=9.31847e-05
o-[Linear solve] 35 iters in 232.96 seconds 0 GFlop/s ||Ax-b||/||b||=9.77832e-05
o-[Linear solve] 73 iters in 189.8 seconds 0 GFlop/s ||Ax-b||/||b||=8.12697e-05
Running times with 32 threads (best)
o-[Linear solve] 22 iters in 119.7 seconds 0 GFlop/s ||Ax-b||/||b||=7.8568e-05
o-[Linear solve] 10 iters in 86.19 seconds 0 GFlop/s ||Ax-b||/||b||=7.25841e-05
o-[Linear solve] 13 iters in 84.63 seconds 0 GFlop/s ||Ax-b||/||b||=6.33273e-05
o-[Linear solve] 13 iters in 82.87 seconds 0 GFlop/s ||Ax-b||/||b||=9.99431e-05
o-[Linear solve] 16 iters in 91.99 seconds 0 GFlop/s ||Ax-b||/||b||=6.81104e-05
o-[Linear solve] 17 iters in 97.42 seconds 0 GFlop/s ||Ax-b||/||b||=6.06977e-05
o-[Linear solve] 18 iters in 92.24 seconds 0 GFlop/s ||Ax-b||/||b||=7.01995e-05
o-[Linear solve] 18 iters in 100.1 seconds 0 GFlop/s ||Ax-b||/||b||=8.95339e-05
o-[Linear solve] 19 iters in 99.02 seconds 0 GFlop/s ||Ax-b||/||b||=9.93522e-05
o-[Linear solve] 20 iters in 101.49 seconds 0 GFlop/s ||Ax-b||/||b||=9.19154e-05
o-[Linear solve] 16 iters in 89.07 seconds 0 GFlop/s ||Ax-b||/||b||=6.86911e-05
o-[Linear solve] 16 iters in 91.68 seconds 0 GFlop/s ||Ax-b||/||b||=9.49837e-05
o-[Linear solve] 16 iters in 94.34 seconds 0 GFlop/s ||Ax-b||/||b||=8.53771e-05
o-[Linear solve] 21 iters in 96.1 seconds 0 GFlop/s ||Ax-b||/||b||=7.50488e-05
o-[Linear solve] 21 iters in 102.63 seconds 0 GFlop/s ||Ax-b||/||b||=9.23609e-05
o-[Linear solve] 24 iters in 117.35 seconds 0 GFlop/s ||Ax-b||/||b||=9.86193e-05
o-[Linear solve] 23 iters in 97.54 seconds 0 GFlop/s ||Ax-b||/||b||=7.68229e-05
o-[Linear solve] 23 iters in 111.96 seconds 0 GFlop/s ||Ax-b||/||b||=7.55897e-05
o-[Linear solve] 26 iters in 106.37 seconds 0 GFlop/s ||Ax-b||/||b||=7.50017e-05
o-[Linear solve] 26 iters in 114.08 seconds 0 GFlop/s ||Ax-b||/||b||=8.27896e-05
o-[Linear solve] 29 iters in 115.44 seconds 0 GFlop/s ||Ax-b||/||b||=8.07129e-05
o-[Linear solve] 30 iters in 121.03 seconds 0 GFlop/s ||Ax-b||/||b||=8.9117e-05
o-[Linear solve] 29 iters in 118.67 seconds 0 GFlop/s ||Ax-b||/||b||=8.3026e-05
o-[Linear solve] 29 iters in 115.42 seconds 0 GFlop/s ||Ax-b||/||b||=7.79059e-05
o-[Linear solve] 34 iters in 120.77 seconds 0 GFlop/s ||Ax-b||/||b||=7.77524e-05
o-[Linear solve] 33 iters in 124.65 seconds 0 GFlop/s ||Ax-b||/||b||=8.34394e-05
o-[Linear solve] 37 iters in 136.87 seconds 0 GFlop/s ||Ax-b||/||b||=8.85991e-05
o-[Linear solve] 22 iters in 111.16 seconds 0 GFlop/s ||Ax-b||/||b||=8.51125e-05
o-[Linear solve] 27 iters in 103.59 seconds 0 GFlop/s ||Ax-b||/||b||=9.32069e-05
o-[Linear solve] 35 iters in 137.49 seconds 0 GFlop/s ||Ax-b||/||b||=9.44283e-05
o-[Linear solve] 84 iters in 231.12 seconds 0 GFlop/s ||Ax-b||/||b||=8.68865e-05
o-[Linear solve] 93 iters in 279.63 seconds 0 GFlop/s ||Ax-b||/||b||=7.53275e-05
o-[Linear solve] 38 iters in 143.59 seconds 0 GFlop/s ||Ax-b||/||b||=9.31847e-05
o-[Linear solve] 35 iters in 120.24 seconds 0 GFlop/s ||Ax-b||/||b||=9.78314e-05
o-[Linear solve] 73 iters in 208.3 seconds 0 GFlop/s ||Ax-b||/||b||=8.12692e-05
Running times with 16 threads (times increase again)
o-[Linear solve] 22 iters in 155.64 seconds 0 GFlop/s ||Ax-b||/||b||=7.8568e-05
o-[Linear solve] 10 iters in 106.97 seconds 0 GFlop/s ||Ax-b||/||b||=7.25841e-05
o-[Linear solve] 13 iters in 113.38 seconds 0 GFlop/s ||Ax-b||/||b||=6.33273e-05
o-[Linear solve] 13 iters in 107.89 seconds 0 GFlop/s ||Ax-b||/||b||=9.99431e-05
o-[Linear solve] 16 iters in 126.62 seconds 0 GFlop/s ||Ax-b||/||b||=6.81104e-05
o-[Linear solve] 17 iters in 127.25 seconds 0 GFlop/s ||Ax-b||/||b||=6.06977e-05
o-[Linear solve] 18 iters in 129.72 seconds 0 GFlop/s ||Ax-b||/||b||=7.01995e-05
o-[Linear solve] 18 iters in 136.27 seconds 0 GFlop/s ||Ax-b||/||b||=8.95339e-05
o-[Linear solve] 19 iters in 139.25 seconds 0 GFlop/s ||Ax-b||/||b||=9.93522e-05
o-[Linear solve] 20 iters in 135.35 seconds 0 GFlop/s ||Ax-b||/||b||=9.19154e-05
o-[Linear solve] 16 iters in 125.27 seconds 0 GFlop/s ||Ax-b||/||b||=6.86911e-05
o-[Linear solve] 16 iters in 112.47 seconds 0 GFlop/s ||Ax-b||/||b||=9.49837e-05
o-[Linear solve] 16 iters in 118.59 seconds 0 GFlop/s ||Ax-b||/||b||=8.53771e-05
o-[Linear solve] 21 iters in 143.79 seconds 0 GFlop/s ||Ax-b||/||b||=7.50488e-05
o-[Linear solve] 21 iters in 145.26 seconds 0 GFlop/s ||Ax-b||/||b||=9.23609e-05
o-[Linear solve] 24 iters in 148.75 seconds 0 GFlop/s ||Ax-b||/||b||=9.86193e-05
o-[Linear solve] 23 iters in 157.27 seconds 0 GFlop/s ||Ax-b||/||b||=7.68229e-05
o-[Linear solve] 23 iters in 148.08 seconds 0 GFlop/s ||Ax-b||/||b||=7.55897e-05
o-[Linear solve] 26 iters in 165.84 seconds 0 GFlop/s ||Ax-b||/||b||=7.50017e-05
o-[Linear solve] 26 iters in 161.99 seconds 0 GFlop/s ||Ax-b||/||b||=8.27896e-05
o-[Linear solve] 29 iters in 174.06 seconds 0 GFlop/s ||Ax-b||/||b||=8.07129e-05
o-[Linear solve] 30 iters in 176.88 seconds 0 GFlop/s ||Ax-b||/||b||=8.9117e-05
o-[Linear solve] 29 iters in 173 seconds 0 GFlop/s ||Ax-b||/||b||=8.3026e-05
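To compare the runs above more directly, the per-solve lines can be reduced to a total time and an iteration count per thread setting. The sketch below assumes only the line format shown in these logs (`o-[Linear solve] N iters in T seconds ...`); `SolveStats`, `parse_solve_line`, `summarize` and `summarize_text` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Totals accumulated over the "[Linear solve]" lines of one run.
struct SolveStats {
    int solves = 0;       // number of matching log lines
    long iters = 0;       // sum of iteration counts
    double seconds = 0.0; // sum of solve times
};

// Parse one line of the form
//   o-[Linear solve] 22 iters in 82.01 seconds 0 GFlop/s ||Ax-b||/||b||=...
// On success, stores the iteration count and time and returns true.
// Returns false for any line that does not start with this pattern.
bool parse_solve_line(const std::string &line, int &iters, double &seconds) {
    return std::sscanf(line.c_str(),
                       " o-[Linear solve] %d iters in %lf seconds",
                       &iters, &seconds) == 2;
}

// Accumulate every matching line from a stream; other lines
// (for example the "Running times with ..." headers) are skipped.
SolveStats summarize(std::istream &in) {
    SolveStats s;
    std::string line;
    while (std::getline(in, line)) {
        int it = 0;
        double sec = 0.0;
        if (parse_solve_line(line, it, sec)) {
            ++s.solves;
            s.iters += it;
            s.seconds += sec;
        }
    }
    return s;
}

// Convenience overload for a log held in memory as one string.
SolveStats summarize_text(const std::string &text) {
    std::istringstream in(text);
    return summarize(in);
}
```

Feeding each block of the log to `summarize_text` separately and dividing `seconds` by `iters` gives seconds per iteration for each thread count. Since the 16-thread block lists fewer solves than the others, totals should only be compared over the solves that all runs have in common.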