Support multiple workers for NODEFS /wordpress mounts – Asyncify #2317

Merged
merged 36 commits into from
Jul 8, 2025

Conversation

brandonpayton
Member

@brandonpayton brandonpayton commented Jul 2, 2025

Overview

Adds multi-worker support for Node.js Asyncify builds to complement JSPI support and enable usage in Node < v23. Note this doesn't work with Asyncify in web browsers.

This PR also adds fileLockManager support for @php-wasm/cli

Implementation

This PR is different from other Asyncify PRs in that it doesn't actually make things work with Asyncify. Instead, it switches to synchronous message passing when JSPI is unavailable. This way, the Asyncify builds never have to engage in stack switching around fd_close() or fcntl(). This wasn't the first choice, but getting the Asyncify builds right was just too challenging, so we had to use another approach.

Synchronous message passing via Comlink

Important

This PR forks the Comlink library to add afterResponseSent?: (ev: MessageEvent) => void argument to the expose() function. The rest is unchanged. Comlink isn't getting many new PRs so skipping updates (or backporting occasionally) seems fine.

Playground already uses Comlink for RPC between workers. This PR adds synchronous bindings via exposeSync and wrapSync.

The specific technique of exchanging messages is described in https://github.com/adamziel/js-synchronous-messaging:

  • Worker A calls postMessage() on Worker B's MessagePort to start the RPC exchange.
  • Worker A uses Atomics.wait (with a SharedArrayBuffer) to synchronously wait for a notification from Worker B.
  • Worker A uses receiveMessageOnPort to synchronously read the data sent by Worker B.

For a usage example, see comlink-sync.spec.ts.
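The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code in comlink-sync.ts: the `callSync` helper, the message shape (`port`, `signal`, `payload`), and the inline responder are all assumptions made for the example. Only `Atomics.wait`, `Atomics.notify`, `MessageChannel`, and `receiveMessageOnPort` are real Node.js APIs.

```typescript
import { Worker, MessageChannel, receiveMessageOnPort } from 'node:worker_threads';

// Hypothetical "Worker B": replies on the port it receives, then wakes up the caller.
const responderSource = `
const { parentPort } = require('node:worker_threads');
parentPort.on('message', ({ port, signal, payload }) => {
  // Post the reply first so it is already queued when the caller wakes up.
  port.postMessage({ result: payload.a + payload.b });
  const flag = new Int32Array(signal);
  Atomics.store(flag, 0, 1);
  Atomics.notify(flag, 0);
});
`;

// Hypothetical helper playing the role of "Worker A" (here: the main thread).
function callSync(worker: Worker, payload: { a: number; b: number }): number {
  const { port1, port2 } = new MessageChannel();
  const signal = new SharedArrayBuffer(4);
  const flag = new Int32Array(signal);

  // Step 1: start the exchange with postMessage().
  worker.postMessage({ port: port2, signal, payload }, [port2]);

  // Step 2: block until the responder flips the flag (give up after 5 seconds).
  if (Atomics.wait(flag, 0, 0, 5000) === 'timed-out') {
    port1.close();
    throw new Error('Timed out waiting for the responder');
  }

  // Step 3: synchronously read the reply from the port.
  const received = receiveMessageOnPort(port1);
  port1.close();
  if (!received) {
    throw new Error('The responder signalled but sent no message');
  }
  return received.message.result as number;
}

const responder = new Worker(responderSource, { eval: true });
const sum = callSync(responder, { a: 2, b: 3 });
console.log(sum);
void responder.terminate();
```

Because the reply travels through `postMessage()`, anything `structuredClone()` supports can be returned; the SharedArrayBuffer only carries the wake-up flag.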

The upsides of this approach:

  • Saves dozens-to-hundreds of hours on debugging Asyncify issues
  • Increased reliability
  • Provides useful stack traces when errors do happen.

The downsides:

  • Fragmentation: Both synchronous and asynchronous handlers exist to get the best out of both Asyncify and JSPI.
  • Node.js-only: This extension does not implement a Safari-friendly transport. SharedArrayBuffer is an option, but
    it requires more restrictive CORP+COEP headers, which break, e.g., YouTube embeds. Synchronous XHR
    might work if we really need Safari support for one of the new asynchronous features, but other than
    that let's just skip adding new asynchronous WASM features to Safari until WebKit supports stack switching.
  • Message passing between workers is slow. Avoid using synchronous messaging for syscalls that are invoked frequently and handled asynchronously in the same worker.

Dual channel support

The Emscripten-built php.js requires either:

  • A synchronous or asynchronous fileLockManager – when JSPI is available
  • A synchronous fileLockManager – when JSPI is not available

This is implemented with preprocessor directives, e.g.

      #if ASYNCIFY == 2
          return Asyncify.handleAsync(async () => {
      #endif
          // ..code..
      #if ASYNCIFY == 2
          });
      #endif

Why support both methods and not always use synchronous calls? Because web browsers have no receiveMessageOnPort and can only handle asynchronous message passing. Supporting both sync and async message channels provides maximum compatibility. The only environment where fileLockManager is not supported is Safari.
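As a rough illustration of the rule above (sync or async manager with JSPI, sync-only without it), here is a sketch of a guard on the JavaScript side. The names `LockManagerLike`, `tryLock`, `acquireLock`, and `canSuspend` are hypothetical and are not taken from the @php-wasm packages.

```typescript
// Hypothetical types for illustration; not the actual fileLockManager interface.
type MaybePromise<T> = T | Promise<T>;

interface LockManagerLike {
  tryLock(path: string): MaybePromise<boolean>;
}

function isPromiseLike<T>(value: MaybePromise<T>): value is Promise<T> {
  return typeof (value as Promise<T>)?.then === 'function';
}

// `canSuspend` stands in for "JSPI is available", i.e. the caller is able to await.
function acquireLock(
  manager: LockManagerLike,
  path: string,
  canSuspend: boolean
): MaybePromise<boolean> {
  const result = manager.tryLock(path);
  if (isPromiseLike(result) && !canSuspend) {
    // Without stack switching there is no way to wait for this promise.
    throw new Error('A synchronous lock manager is required when JSPI is unavailable');
  }
  return result;
}

const syncManager: LockManagerLike = { tryLock: () => true };
const asyncManager: LockManagerLike = { tryLock: async () => true };

console.log(acquireLock(syncManager, '/wordpress/db.sqlite', false));
```

A real implementation would more likely pick the manager type up front rather than detect a promise after the call; the check here only makes the constraint visible.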

Testing Instructions (or ideally a Blueprint)

Playground CLI

Run Playground CLI server with 5 workers using Asyncify:

node --disable-warning=ExperimentalWarning --experimental-strip-types --experimental-transform-types --import ./packages/meta/src/node-es-module-loader/register.mts ./packages/playground/cli/src/cli.ts server --experimental-multi-worker=5 --mount-before-install ./tmp/new-site:/wordpress

Create some posts, install some plugins, confirm it does not crash.

Then do the same with JSPI and confirm everything continues to work:

node --experimental-wasm-jspi --disable-warning=ExperimentalWarning --experimental-strip-types --experimental-transform-types --import ./packages/meta/src/node-es-module-loader/register.mts ./packages/playground/cli/src/cli.ts server --experimental-multi-worker=5 --mount-before-install ./tmp/new-site:/wordpress

PHP.wasm CLI

Run the test script below and confirm it does not crash or corrupt the database:

node --loader=./packages/meta/src/node-es-module-loader/loader.mts ./packages/php-wasm/cli/src/main.ts test.php

test.php

<?php

$db = new SQLite3('./db.sqlite');

$db->exec('CREATE TABLE IF NOT EXISTS products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    price REAL NOT NULL,
    category TEXT NOT NULL
)');

// Insert some product data
$db->exec("INSERT INTO products (name, price, category) VALUES ('Laptop', 999.99, 'Electronics')");
$db->exec("INSERT INTO products (name, price, category) VALUES ('Coffee Mug', 12.50, 'Kitchen')");
$db->exec("INSERT INTO products (name, price, category) VALUES ('Notebook', 5.99, 'Office')");
$db->exec("INSERT INTO products (name, price, category) VALUES ('Headphones', 79.99, 'Electronics')");

$result = $db->query('SELECT * FROM products ORDER BY category, name');
while ($row = $result->fetchArray(SQLITE3_ASSOC)) {
    echo "- {$row['name']} ({$row['category']}): $" . number_format($row['price'], 2) . "\n";
}

$db->close();

@brandonpayton brandonpayton requested a review from adamziel July 2, 2025 21:22
@brandonpayton brandonpayton self-assigned this Jul 2, 2025
@brandonpayton
Member Author

I am encountering a null-or-signature-mismatch error here:
[Screenshot of the error, 2025-07-02 at 7:00:17 PM]

The line is this, the zend_rc_dtor_func call in this function:

ZEND_API void ZEND_FASTCALL rc_dtor_func(zend_refcounted *p)
{
	ZEND_ASSERT(GC_TYPE(p) <= IS_CONSTANT_AST);
	zend_rc_dtor_func[GC_TYPE(p)](p);
}

I was able to find the GC_TYPE by expanding the refcounted pointer struct while debugging in Cursor. The GC type num was 7, which is an offset in this array:

static const zend_rc_dtor_func_t zend_rc_dtor_func[] = {
	[IS_UNDEF] =        (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_NULL] =         (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_FALSE] =        (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_TRUE] =         (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_LONG] =         (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_DOUBLE] =       (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_STRING] =       (zend_rc_dtor_func_t)zend_string_destroy,
	[IS_ARRAY] =        (zend_rc_dtor_func_t)zend_array_destroy,
	[IS_OBJECT] =       (zend_rc_dtor_func_t)zend_objects_store_del,
	[IS_RESOURCE] =     (zend_rc_dtor_func_t)zend_list_free,
	[IS_REFERENCE] =    (zend_rc_dtor_func_t)zend_reference_destroy,
	[IS_CONSTANT_AST] = (zend_rc_dtor_func_t)zend_ast_ref_destroy
};

If the GC type is indeed 7, then it looks like the dtor is zend_array_destroy(). Its signature doesn't look that different from the other dtor signatures I've examined, but I can instrument that function and its caller above to see if we can learn more.
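For context on what this class of error means at the WebAssembly level, here is a small self-contained sketch. It is unrelated to the PHP build: the module below is hand-assembled for the example and only shows that an indirect call traps when the signature declared at the call site differs from the signature of the function stored in the table slot. The exact trap message varies by engine, so the sketch only checks for a `WebAssembly.RuntimeError`.

```typescript
// Hand-assembled module, for illustration only:
//   type 0 = (i32) -> ()        type 1 = (i32, i32) -> ()
//   table[0] = f0 (type 0)      table[1] = f1 (type 1)
//   export "call"(slot): call_indirect through table[slot] expecting type 0
const mismatchDemoBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x0a, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x02, 0x7f, 0x7f, 0x00, // type section
  0x03, 0x04, 0x03, 0x00, 0x01, 0x00, // function section: f0, f1, f2 ("call")
  0x04, 0x04, 0x01, 0x70, 0x00, 0x02, // table section: 2 funcref slots
  0x07, 0x08, 0x01, 0x04, 0x63, 0x61, 0x6c, 0x6c, 0x00, 0x02, // export "call" = f2
  0x09, 0x08, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x02, 0x00, 0x01, // elements: [f0, f1]
  0x0a, 0x11, 0x03, // code section, 3 bodies
  0x02, 0x00, 0x0b, // f0: empty body
  0x02, 0x00, 0x0b, // f1: empty body
  // f2: i32.const 0; local.get 0; call_indirect (type 0, table 0); end
  0x09, 0x00, 0x41, 0x00, 0x20, 0x00, 0x11, 0x00, 0x00, 0x0b,
]);

const mismatchDemo = new WebAssembly.Instance(new WebAssembly.Module(mismatchDemoBytes));
const callSlot = mismatchDemo.exports.call as (slot: number) => void;

// Returns null when the indirect call succeeds, or the trap message when it fails.
function trapMessage(slot: number): string | null {
  try {
    callSlot(slot);
    return null;
  } catch (error) {
    if (error instanceof WebAssembly.RuntimeError) {
      return error.message;
    }
    throw error;
  }
}

const matchingCall = trapMessage(0); // slot 0 holds a type-0 function
const mismatchedCall = trapMessage(1); // slot 1 holds a type-1 function
console.log({ matchingCall, mismatchedCall });
```

In C terms, this is what can happen when a function is called through a pointer whose type differs from the function's real signature, which is why the destructor table and its casts are a natural suspect.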

@adamziel
Collaborator

adamziel commented Jul 3, 2025

Any chance it would work on trunk with @mho22 wrapper? Or was that a completely different code path?

@brandonpayton
Member Author

Any chance it would work on trunk with @mho22 wrapper? Or was that a completely different code path?

It might! I think it is a different path, but it feels like it could be a similar thing. Will try shortly. :)

@brandonpayton
Member Author

After merging trunk into this branch, I'm still seeing the same issue. Will instrument php-src and see what can be found.

@mho22
Collaborator

mho22 commented Jul 3, 2025

@adamziel @brandonpayton For the record, my wrapper is only targeting PHP7.4 and PHP7.3 to replace the old EMULATE_FUNCTION_POINTER_CASTS option applied to PHP versions lower than 8 :

RUN if [ "${PHP_VERSION:0:1}" -lt "8" ]; then \
	echo -n ' -s EMULATE_FUNCTION_POINTER_CASTS=1' >> /root/.emcc-php-wasm-flags; \
fi

It may be the same issue but Brandon is right, that's a new one. And the zend_array_destroy function seems to have a correct signature :

ZEND_API void ZEND_FASTCALL zend_array_destroy(HashTable *ht)

But maybe not, I think HashTable *ht should be replaced with zend_array to fit with the zend_refcounted *p from :

typedef void (ZEND_FASTCALL *zend_rc_dtor_func_t)(zend_refcounted *p);

So maybe you'll need to apply a new patch [ to PHP8.4 and PHP8.3 at least ] like this :

+ static void zend_array_destroy_wrapper(zend_refcounted *p) {
+    HashTable *ht = (HashTable*)p;
+    zend_array_destroy(ht);
+}

static const zend_rc_dtor_func_t zend_rc_dtor_func[] = {
	[IS_UNDEF] =        (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_NULL] =         (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_FALSE] =        (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_TRUE] =         (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_LONG] =         (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_DOUBLE] =       (zend_rc_dtor_func_t)zend_empty_destroy,
	[IS_STRING] =       (zend_rc_dtor_func_t)zend_string_destroy,
-	[IS_ARRAY] =        (zend_rc_dtor_func_t)zend_array_destroy,
+	[IS_ARRAY] =        zend_array_destroy_wrapper,
	[IS_OBJECT] =       (zend_rc_dtor_func_t)zend_objects_store_del,
	[IS_RESOURCE] =     (zend_rc_dtor_func_t)zend_list_free,
	[IS_REFERENCE] =    (zend_rc_dtor_func_t)zend_reference_destroy,
	[IS_CONSTANT_AST] = (zend_rc_dtor_func_t)zend_ast_ref_destroy
};

I hope this helps.

@brandonpayton
Member Author

But maybe not, I think HashTable *ht should be replaced with zend_array to fit with the zend_refcounted *p from :

Thank you, @mho22. 🙇 This discussion is helpful. Hopefully, we are close to where the problem is.

I took a look at the HashTable and zend_array types, and they are both type aliases for the same struct like:

typedef struct _zend_array HashTable;
typedef struct _zend_array      zend_array;

My mental model suggests those types are equivalent and would not conflict, but it would be good to try a wrapper to confirm. Also, maybe there is some other type that is marked with IS_ARRAY that is not a HashTable or zend_array... We'll see!

@brandonpayton
Member Author

Note: I am still getting the issue with the wrapper, but debugging has gotten a little easier:
The issue occurs immediately after the first attempt to lock the sqlite DB, so we can just set a breakpoint on that first attempt and then enable a PHP rc_dtor_func breakpoint (we can't enable it before or we will hit it frequently).

I'm heading AFK soon but plan to spend more time on this in the morning.

@brandonpayton
Member Author

I added a lot more logging in php-src to try to track this but then started having "null or signature mismatch" errors with the logging. 🤦 But I'm learning some things and continuing to dig.

I wanted to step into the /emsdk/emscripten/ libc functions that are encountering the error and found that those debug source paths are set as part of another layer of the build process that builds cached versions of libc (among other libs). So I'm looking into how to affect the underlying libc build process.

@adamziel
Collaborator

adamziel commented Jul 4, 2025

Does it also fail in the same way on PHP 7.3 and 7.4? If not, maybe we can learn something from the differences in source?

@adamziel
Collaborator

adamziel commented Jul 4, 2025

I wanted to step into the /emsdk/emscripten/ libc functions that are encountering the error

Oh, interesting! I thought the error happened at the moment of a mismatched call – am I understanding it wrong?

@adamziel
Collaborator

adamziel commented Jul 5, 2025

Would safe_heap_log or assertions=2 reveal anything else? Also, would instrumenting the pointer casts internals reveal anything about the wrappers generated for these destructors?

https://emscripten.org/docs/tools_reference/settings_reference.html?utm_source=perplexity

@adamziel
Collaborator

adamziel commented Jul 5, 2025

Also -sEMSCRIPTEN_TRACING, -sSTRICT, -sSIGNATURE_CONVERSIONS

@brandonpayton
Member Author

Thanks, @adamziel! Those all sound like good suggestions, and I plan to try them next.

Oh, interesting! I thought the error happened at the moment of a mismatched call – am I understanding it wrong?

I don't know for sure, but your thinking matches my mental model. I think the error may have started occurring in an additional location when I added code that logged to a file. (I tried to use the wasm_trace function, but it was logging an empty message... not sure why yet)

@brandonpayton
Member Author

brandonpayton commented Jul 5, 2025

As a note regarding our pursuit of more/easier debugging:
Regarding the mystery of the /emsdk/emscripten source paths in the debug info for libc: the setting appears to be here in the Emscripten build code:

  flags += [f'-ffile-prefix-map={source_dir}={DETERMINISTIC_PREFIX}',
            f'-ffile-prefix-map={relative_source_dir}={DETERMINISTIC_PREFIX}',
            f'-fdebug-compilation-dir={DETERMINISTIC_PREFIX}']

where DETERMINISTIC_PREFIX is a constant set to "/emsdk/emscripten". There is not currently a way to override this, but perhaps the Emscripten team would consider allowing an override with an environment var or other setting.

We could also explore post-processing DWARF info in wasm, but it would likely involve writing code to do that. So far, I have not found good tools for it.

Complete source debugging is not required to solve this problem, but it would be amazing to be able to step into any function that is failing.

@brandonpayton
Member Author

We could also explore post-processing DWARF info in wasm, but it would likely involve writing code to do that. So far, I have not found good tools for it.

There is this Rust lib:
https://github.com/gimli-rs/gimli

Some libs like this seem to expect elf binaries but this one says:

Cross-platform: gimli makes no assumptions about what kind of object file you're working with. The flipside to that is that it's up to you to provide an ELF loader on Linux or Mach-O loader on macOS.

So it should be able to just process the DWARF debug info if it can be read from custom WebAssembly sections. Note how the debug info is placed in custom sections in a small .wasm lib I built with Emscripten:

 % wasm-objdump -h test.wasm 

test.wasm:	file format wasm 0x1
module name: <test.wasm>

Sections:

     Type start=0x0000000b end=0x000000c4 (size=0x000000b9) count: 28
   Import start=0x000000c7 end=0x000001c5 (size=0x000000fe) count: 9
 Function start=0x000001c7 end=0x00000227 (size=0x00000060) count: 95
    Table start=0x00000229 end=0x0000022e (size=0x00000005) count: 1
   Memory start=0x00000230 end=0x00000238 (size=0x00000008) count: 1
   Global start=0x0000023a end=0x00000251 (size=0x00000017) count: 4
   Export start=0x00000254 end=0x000003d3 (size=0x0000017f) count: 21
     Elem start=0x000003d5 end=0x000003e2 (size=0x0000000d) count: 1
DataCount start=0x000003e4 end=0x000003e5 (size=0x00000001) count: 2
     Code start=0x000003e9 end=0x000066f1 (size=0x00006308) count: 95
     Data start=0x000066f4 end=0x0000740d (size=0x00000d19) count: 2
   Custom start=0x00007410 end=0x00007a2a (size=0x0000061a) "name"
   Custom start=0x00007a2d end=0x0000aaec (size=0x000030bf) ".debug_abbrev"
   Custom start=0x0000aaf0 end=0x000182ec (size=0x0000d7fc) ".debug_info"
   Custom start=0x000182ef end=0x0001b74c (size=0x0000345d) ".debug_str"
   Custom start=0x0001b750 end=0x00024703 (size=0x00008fb3) ".debug_line"
   Custom start=0x00024707 end=0x0002aaf2 (size=0x000063eb) ".debug_loc"
   Custom start=0x0002aaf5 end=0x0002b7fb (size=0x00000d06) ".debug_ranges"
   Custom start=0x0002b7fe end=0x0002b89d (size=0x0000009f) ".debug_aranges"
   Custom start=0x0002b8a0 end=0x0002b934 (size=0x00000094) "target_features"
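As a small illustration of reading such custom sections without extra tooling, the standard `WebAssembly.Module.customSections()` API returns the raw bytes of every custom section with a given name. The sketch below builds a tiny stand-in module with a single `.debug_str` section; the section contents and the `readCustomSection` helper are made up for the example, and parsing the DWARF data itself would still need a library such as gimli.

```typescript
const textEncoder = new TextEncoder();
const sectionName = textEncoder.encode('.debug_str');
const sectionPayload = textEncoder.encode('main.c\0'); // made-up payload

// Minimal module: header plus one custom section (id 0).
// All lengths are below 128, so each fits in a single LEB128 byte.
const customSectionBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x00, // custom section id
  1 + sectionName.length + sectionPayload.length, // section size
  sectionName.length, // name length
  ...Array.from(sectionName),
  ...Array.from(sectionPayload),
]);

const debugModule = new WebAssembly.Module(customSectionBytes);

// Hypothetical helper: returns the first custom section with the given name, if any.
function readCustomSection(mod: WebAssembly.Module, name: string): Uint8Array | null {
  const [first] = WebAssembly.Module.customSections(mod, name);
  return first ? new Uint8Array(first) : null;
}

const debugStr = readCustomSection(debugModule, '.debug_str');
const decodedDebugStr = debugStr ? new TextDecoder().decode(debugStr) : null;
const missingSection = readCustomSection(debugModule, '.debug_info');
console.log({ decodedDebugStr, missingSection });
```

The same call against a real Emscripten build would hand back the `.debug_info`, `.debug_line`, and related sections listed above as plain byte buffers.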

@adamziel
Collaborator

adamziel commented Jul 5, 2025

I wonder if our override of proc_open.c and proc_open.h causes the bulk of these issues. We use these two, 7.4-specific, files for every PHP version from 7.4 to 8.4 and they don't account for any changes in PHP core 🤔 At the same time, for this specific issue PHP 7.4 still fails with the same error as 8.0 and it's not a problem with JSPI.

…de-es-module-loader/loader.mts ./packages/php-wasm/cli/src/main.ts test.php`
@adamziel
Collaborator

adamziel commented Jul 5, 2025

I found two more functions that were on the call stack during the crash but not on the asyncify list (for PHP 7.4):

+"zend_unclean_zval_ptr_dtor",\
+"php_sqlite3_object_free_storage",\

Adding them to the asyncify list did not resolve the problem.

@adamziel adamziel force-pushed the add-multi-worker-asyncify-support branch from de87c6f to 0e5387b Compare July 8, 2025 08:39
@adamziel adamziel marked this pull request as ready for review July 8, 2025 09:34
@adamziel adamziel requested a review from a team as a code owner July 8, 2025 09:34
@adamziel adamziel changed the title Expand php-wasm/node multi-worker support to Asyncify builds Support multiple workers for NODEFS /wordpress mounts – Asyncify Jul 8, 2025
@adamziel adamziel force-pushed the add-multi-worker-asyncify-support branch from 79ad795 to ea4cb3c Compare July 8, 2025 14:15
@adamziel
Collaborator

adamziel commented Jul 8, 2025

The tests are all passing, let's do it!

@adamziel adamziel merged commit a67cb5b into trunk Jul 8, 2025
25 checks passed
@adamziel adamziel deleted the add-multi-worker-asyncify-support branch July 8, 2025 16:21
adamziel added a commit that referenced this pull request Jul 14, 2025
…nc.ts (#2363)

## Motivation for the change, related issues

#2317 introduced a
dynamic import in a CJS code path (`comlink-sync.ts`). Node 20 requires
a `--experimental-vm-modules` flag to run dynamic imports in such a
context. This PR tries calling `require()` first to make it all work
again.
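A "try `require()` first" fallback could look roughly like the sketch below. It is an assumption about the general shape and not the code from #2363; `loadWorkerThreads` is a hypothetical helper name.

```typescript
type WorkerThreadsModule = typeof import('node:worker_threads');

// Hypothetical helper: prefer require() where it exists (CommonJS),
// and fall back to a dynamic import() elsewhere (ES modules).
async function loadWorkerThreads(): Promise<WorkerThreadsModule> {
  if (typeof require === 'function') {
    return require('node:worker_threads') as WorkerThreadsModule;
  }
  return import('node:worker_threads');
}

const workerThreadsReady = loadWorkerThreads().then((workerThreads) => {
  console.log(typeof workerThreads.receiveMessageOnPort);
  return workerThreads;
});
```

The `typeof require` check is safe in both module systems because `typeof` on an undeclared identifier does not throw.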

## Testing Instructions (or ideally a Blueprint)

Confirm the `test-built-npm-packages` test passed.
@brandonpayton
Member Author

Nice job, @adamziel! Using receiveMessageOnPort in Node.js is so handy. You can easily post anything that is supported by structuredClone(), and that likely has fewer limitations than writing the result to a SharedArrayBuffer like I do here:
https://github.com/brandonpayton/blocking-rpc/blob/179fe283774e5a4599cc2c49935e4d946eb7e449/index.ts#L552-L553

@brandonpayton
Member Author

Also, thank you for all your help and for closing out this PR! 🙇

@brandonpayton
Member Author

Perhaps we could support Safari with one of these:

  • localStorage / indexedDb + Atomics.pause()

This is a cool idea. Atomics.pause() offers an interesting possibility for compromise: Less burdensome busy waiting.

  • Synchronous XMLHttpRequest request routed via the service worker.

@adamziel, contrary to what I said in our offline discussion earlier today, it looks like this could be workable for us.

If I understand the following tests correctly, Service Worker...

Since php-wasm/web is run in worker threads, it seems like we could use sync xhr for blocking if needed.

3 participants