|
| 1 | +--- |
| 2 | +title: They do it with mirrors, you know - that sort of thing |
| 3 | +description: "While comfort-watching the indomitable Joan Hickson as Agatha Christie\u2019s |
| 4 | + Miss Marple in The Body in the Library, it occurred to me that Miss Marple would |
| 5 | + have been a formidable debugger. Since returning from holiday one, two, three weeks |
| 6 | + ago, I\u2019ve been mostly straightening out and finalising the final Relocatable |
| 7 | + OCaml PR. A frustrating task, because I know these things will take weeks and have |
| 8 | + little to show for them at the end, so one spends the entire time feeling it should be
| 9 | + finished by now. It was just about there when this little testsuite failure
| 10 | + popped up:" |
| 11 | +url: https://www.dra27.uk/blog/platform/2025/06/22/they-do-it-with-mirrors.html |
| 12 | +date: 2025-06-22T00:00:00-00:00 |
| 13 | +preview_image: |
| 14 | +authors: |
| 15 | +- "" |
| 16 | +source: |
| 17 | +ignore: |
| 18 | +--- |
| 19 | + |
| 20 | +<p>While comfort-watching the indomitable <a href="https://en.wikipedia.org/wiki/Joan_Hickson">Joan Hickson</a> |
| 21 | +as Agatha Christie’s <a href="https://en.wikipedia.org/wiki/Miss_Marple_(TV_series)">Miss Marple</a> |
| 22 | +in <a href="https://en.wikipedia.org/wiki/The_Body_in_the_Library_(film)">The Body in the Library</a>, |
| 23 | +it occurred to me that Miss Marple would have been a formidable debugger. Since |
| 24 | +returning from holiday <del>one</del>, <del>two</del>, three weeks ago, I’ve been mostly |
| 25 | +straightening out and finalising the final Relocatable OCaml PR. A frustrating |
| 26 | +task, because I know these things will take weeks and have little to show for them at
| 27 | +the end, so one spends the entire time feeling it should be finished by now.
| 28 | +It was just about there when this little testsuite failure popped up:</p>
| 29 | + |
| 30 | +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>List of failed tests: |
| 31 | + tests/lib-unix/common/cloexec.ml |
| 32 | + tests/warnings/mnemonics.mll |
| 33 | +</code></pre></div></div> |
| 34 | + |
| 35 | +<p>In both cases there was a similar, very strange-looking error:</p> |
| 36 | + |
| 37 | +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the file '/home/runner/work/ocaml/ocaml/testsuite/tests/lib-unix/_ocamltest/tests/lib-unix/common/cloexec/ocamlc.byte/cloexec_leap.exe' is not a bytecode executable file |
| 38 | +</code></pre></div></div> |
| 39 | + |
| 40 | +<p>and</p> |
| 41 | + |
| 42 | +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>the file '/home/runner/work/ocaml/ocaml/testsuite/tests/warnings/_ocamltest/tests/warnings/mnemonics/ocamlc.byte/mnemonics.byte' is not a bytecode executable file |
| 43 | +Fatal error: exception File "mnemonics.mll", line 55, characters 2-8: Assertion failed |
| 44 | +</code></pre></div></div> |
| 45 | + |
| 46 | +<p>Now, as it happens, the diagnosis of <em>what</em> was happening was relatively quick |
| 47 | +for me. I’ve dusted off and thrown around so many obscure bits of the runtime |
| 48 | +system on so many diverse configurations and platforms with Relocatable OCaml |
| 49 | +that it’s resulted in a lot of other bugs being fixed <em>before</em> the main PRs, |
| 50 | +some bugs fixed <em>with</em> the main PRs and then a pile of follow-up work with the |
| 51 | +additional parts. There’s one particularly long-standing bug on Windows:</p> |
| 52 | + |
| 53 | +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C:\Users\DRA>where ocamlc.byte |
| 54 | +C:\Users\DRA\AppData\Local\opam\default\bin\ocamlc.byte.exe |
| 55 | + |
| 56 | +C:\Users\DRA>where ocamlc.byte.exe |
| 57 | +C:\Users\DRA\AppData\Local\opam\default\bin\ocamlc.byte.exe |
| 58 | + |
| 59 | +C:\Users\DRA>ocamlc.byte.exe --version |
| 60 | +5.2.0 |
| 61 | + |
| 62 | +C:\Users\DRA>ocamlc.byte --version |
| 63 | +unknown option --version |
| 64 | +</code></pre></div></div> |
| 65 | + |
| 66 | +<p>Strange, huh: <code class="language-plaintext highlighter-rouge">ocamlc.byte.exe</code> does one thing and <code class="language-plaintext highlighter-rouge">ocamlc.byte</code> does another! |
| 67 | +The precise diagnosis of what’s going on there is nearly a novel in itself. The |
| 68 | +fix is quite involved, and is at the “might get put into PR 3; might be left for |
| 69 | +the future” stage. The failures across CI were confined to the Unix builds which use
| 70 | +the stub launcher for bytecode (it’s an obscure corner of startup which lives in |
| 71 | +<a href="https://github.com/ocaml/ocaml/tree/trunk/stdlib/header.c"><code class="language-plaintext highlighter-rouge">stdlib/header.c</code></a> |
| 72 | +and which has received a pre-Relocatable overhaul in <a href="https://github.com/ocaml/ocaml/pull/13988">ocaml/ocaml#13988</a>). |
| 73 | +There are so many bits to Relocatable OCaml that I have a master script that |
| 74 | +puts them all together and then backports them: the CI failure was only on the |
| 75 | +“trunk” version of this, the 5.4, 5.3 and 5.2 versions passing as normal. The |
| 76 | +backports don’t include the “future” work, so that quickly pointed me at the |
| 77 | +work sitting in <a href="https://github.com/dra27/ocaml/pull/190/commits">dra27/ocaml#190</a>.</p> |
| 78 | + |
| 79 | +<p>Both those failures are from tests which themselves spawn executables as part of |
| 80 | +the test. The mnemonics failure was particularly strange, because that test doesn’t call
| 81 | +itself; rather, it calls the compiler:</p>
| 82 | + |
| 83 | +<div class="language-ocaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">mnemonics</span> <span class="o">=</span> |
| 84 | + <span class="k">let</span> <span class="n">stdout</span> <span class="o">=</span> <span class="s2">"warn-help.out"</span> <span class="k">in</span> |
| 85 | + <span class="k">let</span> <span class="n">n</span> <span class="o">=</span> |
| 86 | + <span class="nn">Sys</span><span class="p">.</span><span class="n">command</span> |
| 87 | + <span class="nn">Filename</span><span class="p">.(</span><span class="n">quote_command</span> <span class="o">~</span><span class="n">stdout</span> |
| 88 | + <span class="n">ocamlrun</span> <span class="p">[</span><span class="n">concat</span> <span class="n">ocamlsrcdir</span> <span class="s2">"ocamlc"</span><span class="p">;</span> <span class="s2">"-warn-help"</span><span class="p">])</span> |
| 89 | + <span class="k">in</span> |
| 90 | + <span class="k">assert</span> <span class="p">(</span><span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">);</span> |
| 91 | +</code></pre></div></div> |
| 92 | + |
| 93 | +<p>That’s invoking the <code class="language-plaintext highlighter-rouge">ocamlc</code> bytecode binary from the root of the build tree,
| 94 | +passing it as an argument directly to <code class="language-plaintext highlighter-rouge">runtime/ocamlrun</code> in the root of the |
| 95 | +build tree. The fact that ocamlrun is then displaying a message referring to |
| 96 | +<code class="language-plaintext highlighter-rouge">mnemonics.byte</code> is very strange, but was down to a bug in my fix for this other |
| 97 | +issue. The core of the bug-fix is that the stub launcher, having opened the |
| 98 | +bytecode image to find its <code class="language-plaintext highlighter-rouge">RNTM</code> section so it can search for the runtime to |
| 99 | +call, now leaves the file descriptor open and hands its number over to <code class="language-plaintext highlighter-rouge">ocamlrun</code>
| 100 | +as part of the <code class="language-plaintext highlighter-rouge">exec</code> call (works on Windows as well). The problem was the
| 101 | +cleanup from this in <code class="language-plaintext highlighter-rouge">ocamlrun</code> itself, where the environment variable carrying that number is reset once it has
| 102 | +been consumed:</p> |
| 103 | + |
| 104 | +<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if defined(_WIN32) |
| 105 | +</span> <span class="n">_wputenv</span><span class="p">(</span><span class="s">L"__OCAML_EXEC_FD="</span><span class="p">);</span> |
| 106 | +<span class="cp">#elif defined(HAS_SETENV_UNSETENV) |
| 107 | +</span> <span class="n">unsetenv</span><span class="p">(</span><span class="s">"__OCAML_EXEC_FD="</span><span class="p">);</span> |
| 108 | +<span class="cp">#endif |
| 109 | +</span></code></pre></div></div> |
| 110 | + |
| 111 | +<p>There’s a stray <code class="language-plaintext highlighter-rouge">=</code> at the end of the Unix branch there 🫣 Right, problem solved |
| 112 | +and, were I Inspector Slack, I should have zipped straight round to Basil |
| 113 | +Blake’s gaudy cottage, handcuffs at the ready.</p> |
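<p>For anyone wondering why one stray character matters: POSIX specifies that <code class="language-plaintext highlighter-rouge">unsetenv</code>
fails with <code class="language-plaintext highlighter-rouge">EINVAL</code> when the name contains <code class="language-plaintext highlighter-rouge">=</code>, so the call in the Unix branch
would have left the variable set, where any process spawned afterwards can inherit it. On Windows, by
contrast, <code class="language-plaintext highlighter-rouge">_wputenv</code> takes the <code class="language-plaintext highlighter-rouge">NAME=</code> form with an empty value to remove a
variable, so the <code class="language-plaintext highlighter-rouge">=</code> belongs in that branch. What follows is only a minimal sketch of the Unix
behaviour: the <code class="language-plaintext highlighter-rouge">DEMO_EXEC_FD</code> variable and all the function names are hypothetical, invented for
illustration, and nothing here is taken from the OCaml runtime.</p>

```c
/* Ask for the POSIX declarations of setenv/unsetenv even under strict -std modes. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical variable name for this sketch; not the one used by OCaml. */
#define DEMO_VAR "DEMO_EXEC_FD"

/* Set DEMO_VAR, then try to clear it using the given name.
   Returns the errno reported by unsetenv, or 0 if it succeeded. */
static int try_unsetenv(const char *name)
{
  if (setenv(DEMO_VAR, "3", 1) != 0)
    return errno;
  errno = 0;
  return unsetenv(name) == 0 ? 0 : errno;
}

/* Buggy form: the trailing '=' makes the name invalid, so POSIX requires
   unsetenv to fail with EINVAL and DEMO_VAR stays in the environment.
   Returns 1 when exactly that happens. */
int buggy_clear_leaves_variable(void)
{
  int err = try_unsetenv(DEMO_VAR "=");
  return err == EINVAL && getenv(DEMO_VAR) != NULL;
}

/* Corrected form: pass the bare name and the variable is removed.
   Returns 1 when the call succeeds and the variable is gone. */
int fixed_clear_removes_variable(void)
{
  int err = try_unsetenv(DEMO_VAR);
  return err == 0 && getenv(DEMO_VAR) == NULL;
}

/* Convenience entry point: aborts if either expectation does not hold. */
void check_unsetenv_demo(void)
{
  assert(buggy_clear_leaves_variable());
  assert(fixed_clear_removes_variable());
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">check_unsetenv_demo()</code> from a <code class="language-plaintext highlighter-rouge">main</code> should return silently on a
POSIX system. The failing call is easy to miss in real code because the return value of <code class="language-plaintext highlighter-rouge">unsetenv</code>
is so rarely checked.</p>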
| 114 | + |
| 115 | +<p>But what about the second murder? Which, in this case, is why the heck hadn’t |
| 116 | +this been seen before? That’s the kind of thing that terrifies me with a fix |
| 117 | +like this: the bug is obvious, but was something else being masked and, more to |
| 118 | +the point, have I just changed something which introduced a <em>different</em> bug |
| 119 | +which happened to cause this one to be visible? At this point, I made a note,
| 120 | +closed my laptop, and returned to my knitting (no, wait, that was Miss Marple). |
| 121 | +Then the penny dropped: the compiler’s being configured here with |
| 122 | +<code class="language-plaintext highlighter-rouge">--with-target-sh=exe</code> (on Unix, that means that bytecode executables |
| 123 | +intentionally avoid shebang-style scripts and use the stub), which should mean |
| 124 | +that those two tests are compiled using the stub. Except that because we test |
| 125 | +the compiler in the build tree, previously the compiler picked up
| 126 | +<code class="language-plaintext highlighter-rouge">stdlib/runtime-launch-info</code> which was the <em>build</em> version of that header, not
| 127 | +the <em>target</em> version. However, one of the refactorings I’ve done in <a href="https://github.com/dra27/ocaml/pull/189/commits/c60e4aafcf97bde037445e4cd94a9e659caf072a">c60e4aaf</a> |
| 128 | +stops using <code class="language-plaintext highlighter-rouge">runtime-launch-info</code> this way (I introduced that header in <a href="https://github.com/ocaml/ocaml/pull/12751">ocaml/ocaml#12751</a> |
| 129 | +as part of OCaml 5.2.0). A side-effect of that change is that |
| 130 | +<code class="language-plaintext highlighter-rouge">stdlib/runtime-launch-info</code> is now actually the target version of the header, and
| 131 | +the <em>root</em> bytecode compiler is <em>now</em> behaving as we’d always been expecting it
| 132 | +to in that test, using the target configuration defined in <code class="language-plaintext highlighter-rouge">utils/config.ml</code>… and so
| 133 | +only now revealing this latent bug in my fix.</p> |
| 134 | + |
| 135 | +<p><em>“They do it with mirrors, you know - that sort of thing - if you understand me.”
| 136 | +Inspector Curry did not understand. He stared and wondered if Miss Marple was |
| 137 | +quite right in the head.</em></p> |