Skip to content

Optimize and simplfy cob_decimal_set_mpf_core#282

Closed
dyoparis wants to merge 0 commit intoOCamlPro:gitside-gnucobol-3.xfrom
dyoparis:gitside-gnucobol-3.x
Closed

Optimize and simplfy cob_decimal_set_mpf_core#282
dyoparis wants to merge 0 commit intoOCamlPro:gitside-gnucobol-3.xfrom
dyoparis:gitside-gnucobol-3.x

Conversation

@dyoparis
Copy link
Copy Markdown

For integer input value avoid loop for multiply by 10

@GitMensch
Copy link
Copy Markdown
Collaborator

From first glance: looks good to me as mpf_integer_p is very cheap (even if will it only true when the mpf was assigned with mpf_set_ui and friends, which we don't do in numeric.c, where the data comes from cob_decimal_set_double [not sure if we have 0 fraction in that case] and via cob_decimal_set_mpf from intrinsic.c [where only cob_intr_e and cob_intr_log10 seem to possibly get into that code).

Can you please share your performance test in COBOL so we can easily compare the results before/after?
For most cases that should be slightly slower, but for the ones that may use the new code path it should be much faster.

@dyoparis
Copy link
Copy Markdown
Author

How do i test ?
My guess call for example factorial function with and without integer ?

@GitMensch
Copy link
Copy Markdown
Collaborator

We should only optimize when we know that it helps :-)
Yes, that test (calling it 10000 times) would be valid.

@dyoparis
Copy link
Copy Markdown
Author

Results are :

   IDENTIFICATION DIVISION.
   PROGRAM-ID . stuff .
  *>
   ENVIRONMENT DIVISION.
   DATA DIVISION .
   WORKING-STORAGE  SECTION.

   01 A               PIC S9(10)V999999 COMP-3     .
   01 B               FLOAT-DECIMAL-34      .

  *>
   PROCEDURE DIVISION .
  *>------------------
  *>
  *    KO means that the result in floatingg point is not meaningful
  *    ie the floating point result is 0
  *
  *

       perform do-it 10000 Times .
       goback .
  *>
   DO-IT.
  *>------------------
  *>
       move function EXP(10) TO A
       move function EXP(10) TO B
  *>
       move function EXP(10.234) TO A
       move function EXP(10.234) TO B .`

Before Patch

Performance counter stats for './stuff':

     18,132.18 msec task-clock                       #    1.000 CPUs utilized
           208      context-switches                 #   11.471 /sec
             0      cpu-migrations                   #    0.000 /sec
           264      page-faults                      #   14.560 /sec
27,196,528,945      cycles                           #    1.500 GHz
36,978,885,217      instructions                     #    1.36  insn per cycle                      
    56,112,259      branch-misses                    
11,685,200,186      L1-dcache-loads                  #  644.445 M/sec
     1,534,257      L1-dcache-load-misses            #    0.01% of all L1-dcache accesses

  18.137390899 seconds time elapsed

  18.133943000 seconds user
   0.000000000 seconds sys

After Patch

Performance counter stats for './stuff':

     18,142.78 msec task-clock                       #    1.000 CPUs utilized
           220      context-switches                 #   12.126 /sec
             5      cpu-migrations                   #    0.276 /sec
           263      page-faults                      #   14.496 /sec
27,212,323,350      cycles                           #    1.500 GHz
36,979,850,273      instructions                     #    1.36  insn per cycle
    55,902,956      branch-misses
11,687,964,183      L1-dcache-loads                  #  644.221 M/sec
    13,938,554      L1-dcache-load-misses            #    0.12% of all L1-dcache accesses


  18.146812620 seconds time elapsed

  18.140493000 seconds user
   0.004000000 seconds sys

@GitMensch
Copy link
Copy Markdown
Collaborator

Nice test program.

As you already checked with perf stat - do you checked also with perf record (or debug output) if the new code was executed?

The part why the difference - even if executed - would possibly be quite small is that the string version doesn't have any temporary allocations and is very small in case of integers.
While that looks more "artificial" as a test you could try to increase the integer used to the max without any trailing zero.

.. or that may only be used with log10? I've not tested the old/new codeflow, currently just wondering - and reminding myself that I had a bunch of interesting ideas that ... were totally mood when testing them, so I've dropped them.


Side note: a high performance gain can be reached with a different codegen in case of the target being a binary float item (COMP-1/COMP-2/FLOAT/DOUBLE/LONG-DOUBLE).

The case COMPUTE anything = anyfunc (= no calculation, no ROUNDED clause, no [NOT] ON SIZE ERROR) could be optimized to MOVE anyfunc TO anything

The case MOVE FUNCTION LOG/EXP/EXP10/... TO flt-item can be optimized to something like

cob_flt_log (double *(b_123456), cob_get_int (f_567)); which would only use plain log functions in libcob and no mpf_t or mpz_t at all

@dyoparis
Copy link
Copy Markdown
Author

dyoparis commented Mar 27, 2026

Hi Simon.
Yes the new code was executed but ony the non integer branch.

Here is a new test with SQRT in time the argument is a perfect square the result is an integer and the integer branch is executed.

       IDENTIFICATION DIVISION.
       PROGRAM-ID . stuff .
       ENVIRONMENT DIVISION.
       DATA DIVISION .
       WORKING-STORAGE  SECTION.
       01 B               FLOAT-DECIMAL-34      .
       01 C               BINARY-LONG  UNSIGNED .

      *>
       PROCEDURE DIVISION .
      *>------------------
      *>
      *>
           move 1 to C .
           perform do-it UNTIL C = 1000000 .
      *>
           goback .
      *>
       DO-IT.
      *>------------------
      *>
           move function SQRT(C) TO B
      *>
           add 1 to C .

Before Patch

 Performance counter stats for './stuff':

         16,465.71 msec task-clock                       #    1.000 CPUs utilized
               275      context-switches                 #   16.701 /sec
                 1      cpu-migrations                   #    0.061 /sec
               263      page-faults                      #   15.973 /sec
    24,696,175,668      cycles                           #    1.500 GHz
    34,725,884,648      instructions                     #    1.41  insn per cycle
           50,253,409      branch-misses
     9,265,970,737      L1-dcache-loads                  #  562.743 M/sec
        12,057,160      L1-dcache-load-misses            #    0.13% of all L1-dcache accesses
  
      16.471331887 seconds time elapsed

      16.463266000 seconds user
       0.003999000 seconds sys

After Patch

 Performance counter stats for './stuff':

         16,446.64 msec task-clock                       #    1.000 CPUs utilized
               481      context-switches                 #   29.246 /sec
                 4      cpu-migrations                   #    0.243 /sec
               261      page-faults                      #   15.869 /sec
    24,665,841,734      cycles                           #    1.500 GHz
    34,747,264,353      instructions                     #    1.41  insn per cycle
        45,002,709      branch-misses
     9,256,879,946      L1-dcache-loads                  #  562.843 M/sec
        22,813,303      L1-dcache-load-misses            #    0.25% of all L1-dcache accesses
 
      16.454028652 seconds time elapsed

      16.447815000 seconds user
       0.000000000 seconds sys

@GitMensch
Copy link
Copy Markdown
Collaborator

GitMensch commented Mar 27, 2026

OK, I think this is the point to agree to "good thought, but actually takes longer because the time used is not that often and when used is only a very minor shortcut", right?
In any case that shows you have increased performance check abilities now :-)

BTW: it was hard for me to read your last post, so I took the freedom to reformat it (pre-defined addition for perf output, COBOL code block) - you may want to have a look via "Edit" what I did there.

@dyoparis
Copy link
Copy Markdown
Author

dyoparis commented Mar 27, 2026

the last test with perfect squares integer only :

      IDENTIFICATION DIVISION.
       PROGRAM-ID . stuff .
      *>
       ENVIRONMENT DIVISION.
       DATA DIVISION .
       WORKING-STORAGE  SECTION.

       01 B               FLOAT-DECIMAL-34      .
       01 C               BINARY-LONG  UNSIGNED .
       01 D               BINARY-LONG  UNSIGNED .

      *>
       PROCEDURE DIVISION .
      *>------------------
      *>
      *>
           move 2 to C .
           perform do-it 400000 times     .
      *>
           goback .
      *>
       DO-IT.
      *>------------------
      *>
           compute D = C * C .
           move function SQRT(D) TO B
      *>
           add 1 to C .

Before Patch

Performance counter stats for './test_2':

      6,226.35 msec task-clock                       #    0.999 CPUs utilized
            84      context-switches                 #   13.491 /sec
             1      cpu-migrations                   #    0.161 /sec
           261      page-faults                      #   41.919 /sec
 9,268,127,202      cycles                           #    1.489 GHz
12,910,263,934      instructions                     #    1.39  insn per cycle
    17,093,392      branch-misses
 3,515,514,985      L1-dcache-loads                  #  564.619 M/sec
     8,458,299      L1-dcache-load-misses            #    0.24% of all L1-dcache accesses

   6.229694797 seconds time elapsed

   6.228214000 seconds user
   0.000000000 seconds sys

After Patch

Performance counter stats for './test_2':

      6,170.46 msec task-clock                       #    0.999 CPUs utilized
            91      context-switches                 #   14.748 /sec
             1      cpu-migrations                   #    0.162 /sec
           263      page-faults                      #   42.622 /sec
 9,254,803,563      cycles                           #    1.500 GHz
12,715,027,692      instructions                     #    1.37  insn per cycle
    17,339,018      branch-misses
 3,468,332,438      L1-dcache-loads                  #  562.087 M/sec
     6,548,557      L1-dcache-load-misses            #    0.19% of all L1-dcache accesses

   6.173998809 seconds time elapsed

   6.164363000 seconds user
   0.008000000 seconds sys

We get a slight time saving for integers and no regression for other types.

Personally, I find the code clearer. It's up to you to accept the patch or not :-)

NB: it's ok for the reformating :-)

@dyoparis dyoparis closed this Apr 4, 2026
@dyoparis dyoparis force-pushed the gitside-gnucobol-3.x branch from 249587f to 326ce55 Compare April 4, 2026 10:35
@GitMensch
Copy link
Copy Markdown
Collaborator

No need to push away anythng - just close next time :-)
This also shows in the future that (and what) we inspected here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants