@JaroslavTulach
Member

JaroslavTulach commented Jul 21, 2025

Pull Request Description

  • Use Dual NI + JVM Mode for Loading Libraries #13172 to
    • avoid including test classes in enso native image binary
    • ENSO_LAUNCHER=test will only add -ea option
    • ENSO_LAUNCHER=test will not modify classpath
  • Let's build on Emit a warning on non-AOT ready libraries #12468
    • switch the non-AOT ready libraries into "guest JVM" loading mode
    • e.g. all test/Base_Tests/polyglot/java/*.jar are going to be loaded in this "dual JVM mode"
  • originally there were higher ambitions for this PR, but just ...
  • getting "dual JVM" working for test/Base_Tests is good progress to make
  • to mock "dual JVM" mode in HotSpot JVM, specify which libraries should use the "host JVM" and which the "guest JVM". For example:
sbt:enso> runEngineDistribution 
  --vm.D=polyglot.enso.classLoading=Standard.Base:hosted,guest 
  --run test/Base_Tests/
  • says that Standard.Base should use hosted JVM and all other libraries should use the guest JVM
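
To make the meaning of the option value concrete, here is a minimal Java sketch of how a value such as Standard.Base:hosted,guest could be read: an entry of the form library:mode applies to the named library, and a bare mode becomes the default for every other library. The class name ClassLoadingSpec, its methods, and the "hosted" fallback are assumptions made for illustration; the parsing inside the engine may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for values like "Standard.Base:hosted,guest". */
final class ClassLoadingSpec {
  private final Map<String, String> perLibrary = new LinkedHashMap<>();
  // Assumed fallback when the value contains no bare mode.
  private String defaultMode = "hosted";

  static ClassLoadingSpec parse(String spec) {
    var result = new ClassLoadingSpec();
    for (var entry : spec.split(",")) {
      var item = entry.trim();
      if (item.isEmpty()) {
        continue;
      }
      int colon = item.lastIndexOf(':');
      if (colon < 0) {
        // A bare mode applies to all libraries not listed explicitly.
        result.defaultMode = item;
      } else {
        result.perLibrary.put(item.substring(0, colon), item.substring(colon + 1));
      }
    }
    return result;
  }

  String modeFor(String library) {
    return perLibrary.getOrDefault(library, defaultMode);
  }

  public static void main(String[] args) {
    var spec = parse("Standard.Base:hosted,guest");
    System.out.println(spec.modeFor("Standard.Base"));  // hosted
    System.out.println(spec.modeFor("Standard.Table")); // guest
  }
}
```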

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • All code follows the
    Scala,
    Java,
  • Unit tests have been written where possible.
  • Verify extra nightly (e.g. snowflake) tests still run

@jdunkerley
Member

Don't think we can merge this until we have a test which runs in JVM mode as well (at least until the new mode is default).

@JaroslavTulach
Member Author

The removal of the jvm: true flag is causing CI failures:

All of the tests are failing in Generic_JDBC_Tests. It's great to see our support for jvm: true mode is tested. Now it is time to find out what it would take to load the classes in this test in dual JVM mode.

@JaroslavTulach
Member Author

JaroslavTulach commented Aug 20, 2025

Boundary between Standard.Database and Standard.Generic_JDBC

@GregoryTravis, we have a problem! I can now modify:

diff --git test/Generic_JDBC_Tests/package.yaml test/Generic_JDBC_Tests/package.yaml
index 488977689b..4a55b41bc6 100644
--- test/Generic_JDBC_Tests/package.yaml
+++ test/Generic_JDBC_Tests/package.yaml
@@ -5,4 +5,3 @@ license: MIT
 author: [email protected]
 maintainer: [email protected]
 prefer-local-libraries: true
-jvm: true

and it correctly uses dual JVM to load classes. The only problem is: the Generic_JDBC_Tests package is loading no classes! It contains a JAR with classes, but those classes are loaded by Standard.Database.

If I put Standard.Database into one JVM and Generic_JDBC_Tests into another JVM, they do not see each other's classes!

Putting Standard.Database and Standard.Generic_JDBC into the "other JVM"

The following works OK:

enso$ ENSO_LAUNCHER=native,fast,-ls sbt  buildEngineDistribution
enso$ ./built-distribution/enso-engine-*/enso-*/bin/enso --run test/Generic_JDBC_Tests
WARNING: Package Standard.Database forced to guest classloading. Use --jvm when encountering problems.
SEVERE: Using experimental OtherJvm support!
WARNING: Package Standard.Table forced to guest classloading. Use --jvm when encountering problems.
                                                                      
22 tests succeeded.
0 tests failed.
0 tests skipped.
0 groups skipped.

even when there is no jvm: true in the package.yaml.

It works because of the fast switch:

  • only Standard.Base is AOT compiled
  • all the other involved libraries - Standard.Table, Standard.Database and generic JDBC with tests
  • are loaded by the "other JVM" - e.g. HotSpot JVM
  • and thus they see each other's classes.

Summary

We have a standard library problem. What can we do about it?

@GregoryTravis
Contributor

Summary

We have a standard library problem. What can we do about it?

To use the Generic JDBC driver, Standard.Database has to run in JVM mode, because it is what runs the driver. Having only the test in JVM mode isn't enough. Can this test be configured to run this way?

@JaroslavTulach
Member Author

JaroslavTulach commented Aug 20, 2025

... Can this test be configured ...?

  • Alas, it is not about this test, but about how we want to run our standard libraries:
    • Should they run all in native image mode (current state with bloated enso binary)?
    • Should some of them run in native image mode and some of them in HotSpot JVM mode?
      • "by default" - e.g. without any --jvm or jvm: true option
    • If so, where do we create the boundary between these two sets of standard libraries?
    • Do we want Standard.Table with Excel, Postgres, SQLite to run in native image mode?
    • How do we refactor the code, so std-table, std-database and co. don't have to run in the same JVM, but can run in two JVMs?

To use the Generic JDBC driver, Standard.Database has to run in JVM mode, because it is what runs the driver.

  • Can we refactor it?
  • Can Standard.Generic_JDBC load the driver class, not Standard.Database?
  • What does Standard.Generic_JDBC need from Standard.Database?
    • Why does it try to pass some Java objects to Standard.Database?
  • Can such Java code be moved to Standard.Generic_JDBC?
  • Can those two projects only "talk" via Enso interface?

@GregoryTravis
Contributor

  • Do we want Standard.Table with Excel, Postgres, SQLite to run in native image mode?

But Postgres and SQLite are used from Standard.Database primarily, not just Standard.Table.

Is native mode faster to run? Or just to load? If it is faster to run then I would assume we want as much as possible to run in native mode.

  • Can we refactor it?
  • Can Standard.Generic_JDBC load the driver class, not Standard.Database?

It could load the JDBC driver, but the JDBC driver is used directly by code in Standard.Database.

What are the abilities and limitations of the bridge between the two JVMs? What can be shared between them, and what can be passed back and forth?

@JaroslavTulach
Member Author

JaroslavTulach commented Aug 21, 2025

Native mode (e.g. AOT) vs. JIT

Is native mode ...

The native mode has different performance characteristics.

AOT vs. JIT

... faster to run? Or just to load?

  • native mode is fast to start.
  • JIT usually makes it easier to reach better peak performance.

want as much as possible to run in native mode.

  • right now our huge problem is packaging
  • due to the high number of 3rd party libraries (Azure, Google, AWS) our native image packaging isn't small
  • in fact, it is too huge to be built on the CI
  • it is so huge that it blocks integration of needed PRs like Launching ydoc-server together with language-server #13178

if (this.findLibraries != null) {
  try {
    var iop = InteropLibrary.getUncached();
    var mayBePath = iop.execute(this.findLibraries, libName);
Member Author

@JaroslavTulach JaroslavTulach Aug 28, 2025

With changeset 6af4a5a we can execute:

sbt:enso> runEngineDistribution --jvm
   --vm.D=polyglot.enso.classLoading=Standard.Base:hosted,guest
   --run test/Image_Tests
174 tests succeeded.
0 tests failed.
0 tests skipped.
1 groups skipped.

because now the "dual JVM" can call OpenCV.loadShared() without any issues.

@JaroslavTulach
Member Author

JaroslavTulach commented Aug 28, 2025

Snowflake Tests

  • running snowflake tests fails
  • they are again trying to pass a mocked storage from guest Java to hosted Java

    [org.enso.table.data.column.storage.ColumnStorage]

without it being registered for runtime reflection. Add [org.enso.table.data.column.storage.ColumnStorage] to the dynamic-proxy metadata to solve this problem. Note: The order of interfaces used to create proxies matters. See https://www.graalvm.org/latest/reference-manual/native-image/metadata/#dynamic-proxy for help.
        at <java> org.graalvm.nativeimage.builder/com.oracle.svm.core.reflect.MissingReflectionRegistrationUtils.errorForProxy(MissingReflectionRegistrationUtils.java:108)
        at <java> org.graalvm.nativeimage.builder/com.oracle.svm.core.reflect.proxy.DynamicProxySupport.getProxyClass(DynamicProxySupport.java:180)
        at <java> [email protected]/java.lang.reflect.Proxy.getProxyConstructor(Proxy.java:64)
        at <java> [email protected]/java.lang.reflect.Proxy.newProxyInstance(Proxy.java:924)
        at <java> org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotObjectProxyHandler.newProxyInstance(PolyglotObjectProxyHandler.java:125)
        at <java> org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotHostAccess.toObjectProxy(PolyglotHostAccess.java:120)
        at <java> org.graalvm.truffle/com.oracle.truffle.host.HostToTypeNode.asJavaObject(HostToTypeNode.java:640)
        at <java> org.graalvm.truffle/com.oracle.truffle.host.HostToTypeNode.convertImpl(HostToTypeNode.java:212)
  • because the requested type is an interface, the Truffle system tries to "cast" an instance of OtherJvmObject to the hosted interface
  • that fails because of missing reflective configuration
  • however with that information it could even succeed!
    • ColumnStorage.uniqueKey would delegate to TruffleObject's invokeMember("uniqueKey")
    • ColumnStorage.iterator would delegate to TruffleObject's invokeMember("iterator") and then dynamically proxy the java.util.Iterator interface
    • proxying getStorageType will likely fail as that's a sealed interface...
  • let's try e5b6c2e and this run
  • this can be reproduced locally by running
enso$ ENSO_LAUNCHER=test sbt
sbt:enso> runEngineDistribution 
  --env ENSO_SNOWFLAKE_ACCOUNT=PUSMBUI-DP01445
  --env ENSO_SNOWFLAKE_DATABASE=CI_TEST_DB
  --env ENSO_SNOWFLAKE_USER=jaroslavtulach
   --env ENSO_SNOWFLAKE_PASSWORD=xyz
   --run  test/Snowflake_Tests Upload_Spec
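
The delegation idea from the bullets above (interface calls forwarded to invokeMember, with returned objects proxied again) can be sketched in plain Java with java.lang.reflect.Proxy. This is only an illustration: GuestObject and Storage are hypothetical stand-ins for OtherJvmObject and ColumnStorage, and the "guest" here is an in-process fake rather than an object in another JVM.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Iterator;
import java.util.List;

final class GuestProxyDemo {
  /** Hypothetical stand-in for an object living in the other JVM. */
  interface GuestObject {
    Object invokeMember(String name, Object... args);
  }

  /** Hypothetical stand-in for a hosted interface such as ColumnStorage. */
  interface Storage {
    String uniqueKey();
    Iterator<Object> iterator();
  }

  /** Wraps a guest object in a dynamic proxy implementing the given interface. */
  static <T> T asInterface(Class<T> type, GuestObject guest) {
    InvocationHandler handler = (proxy, method, args) -> {
      if (method.getDeclaringClass() == Object.class) {
        // toString/hashCode/equals are answered by the guest wrapper itself.
        return method.invoke(guest, args);
      }
      Object[] actual = args == null ? new Object[0] : args;
      Object result = guest.invokeMember(method.getName(), actual);
      // When the guest returns another guest object and the method's return
      // type is an interface, proxy the result as well (e.g. Iterator).
      if (result instanceof GuestObject && method.getReturnType().isInterface()) {
        return asInterface(method.getReturnType(), (GuestObject) result);
      }
      return result;
    };
    Object p = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    return type.cast(p);
  }

  /** In-process fake; a real guest would forward the call across JVMs. */
  static GuestObject fakeStorage(String key, List<Object> values) {
    return (name, args) -> {
      if (name.equals("uniqueKey")) return key;
      if (name.equals("iterator")) return fakeIterator(values.iterator());
      throw new UnsupportedOperationException(name);
    };
  }

  static GuestObject fakeIterator(Iterator<Object> it) {
    return (name, args) -> {
      if (name.equals("hasNext")) return it.hasNext();
      if (name.equals("next")) return it.next();
      throw new UnsupportedOperationException(name);
    };
  }

  public static void main(String[] args) {
    Storage storage = asInterface(Storage.class, fakeStorage("col-1", List.of(1, 2, 3)));
    System.out.println(storage.uniqueKey());
    var it = storage.iterator();
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```

In a native image the proxied interfaces still have to be listed in the dynamic-proxy metadata, which is what the error message above asks for.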

Friday Update

There are DateTimeParseExceptions. I've seen them locally too...

❌ should round-trip timestamptz column, preserving instant but converting to UTC
An unexpected panic was thrown: java.time.format.DateTimeParseException: Text '2022-05-04 15:30:00.000000000 Z' could not be parsed, unparsed text found at index 21
❌ will round-trip timestamp column without timezone by converting it to UTC
An unexpected panic was thrown: java.time.format.DateTimeParseException: Text '2022-05-04 15:30:00.000000000' could not be parsed, unparsed text found at index 21
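
For readers unfamiliar with this message: "unparsed text found at index 21" means the formatter stopped after the first fraction digit and left the rest of the text unconsumed. The sketch below reproduces the same index with a pattern that allows only one fraction digit. The patterns are assumptions chosen for illustration; the formatter actually used on the Snowflake code path is not shown in this thread, so this demonstrates the mechanism, not the confirmed cause.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class FractionWidthDemo {
  // Text taken from the failing test output above (variant without zone).
  static final String TEXT = "2022-05-04 15:30:00.000000000";

  /** Returns the error index, or -1 when the whole text parses. */
  static int errorIndex(String pattern) {
    var formatter = DateTimeFormatter.ofPattern(pattern);
    try {
      LocalDateTime.parse(TEXT, formatter);
      return -1;
    } catch (DateTimeParseException e) {
      return e.getErrorIndex();
    }
  }

  public static void main(String[] args) {
    // Hypothetical pattern with one fraction digit: parsing stops after "15:30:00.0".
    System.out.println(errorIndex("uuuu-MM-dd HH:mm:ss.S"));
    // Hypothetical pattern with nine fraction digits: the whole text is consumed.
    System.out.println(errorIndex("uuuu-MM-dd HH:mm:ss.SSSSSSSSS"));
  }
}
```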

_ : NullType -> Value_Type.Null
_ : AnyObjectType -> Value_Type.Mixed
proxy ->
if proxy.isNumeric.not then Error.throw "Unknown object "+proxy.to_text else
Member Author

  • temporary serde to mitigate the only failure in tests
[FAILED] [Snowflake] (Upload_Spec) Uploading an in-memory Table: [10/11]
    - [FAILED] should not create any table if upload fails [1593ms]
        Reason: An unexpected panic was thrown: (Inexhaustive_Pattern_Match.Error IntegerType[bits=BITS_64])
        at <enso> Storage.to_value_type(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Storage.enso:34-52)
        at <enso> In_Memory_Column_Implementation.inferred_precise_value_type(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Column_Implementation.enso:633:9-42)
        at <enso> Column.inferred_precise_value_type(distribution/lib/Standard/Table/0.0.0-dev/src/Column.enso:2233:9-60)
  • having an official serde would be better, @jdunkerley
  • something like StorageType.from(proxy.toString())
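
A text-based round trip of the kind suggested above could look roughly like the following sketch. It relies on the default record toString format, which matches the IntegerType[bits=BITS_64] text in the failure. Everything here is a simplified, hypothetical stand-in: the record and enum definitions, the BITS_32 constant, and the from method are assumptions and not the real Standard.Table storage type classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StorageTypeSerde {
  /** Hypothetical, simplified stand-ins for the real storage types. */
  interface StorageType {}

  enum Bits { BITS_32, BITS_64 } // BITS_32 is assumed for illustration

  record IntegerType(Bits bits) implements StorageType {}

  record NullType() implements StorageType {}

  record AnyObjectType() implements StorageType {}

  private static final Pattern INTEGER = Pattern.compile("IntegerType\\[bits=(\\w+)\\]");

  /** Rebuilds a storage type from the text produced by its toString(). */
  static StorageType from(String text) {
    Matcher m = INTEGER.matcher(text);
    if (m.matches()) {
      return new IntegerType(Bits.valueOf(m.group(1)));
    }
    if (text.equals("NullType[]")) {
      return new NullType();
    }
    if (text.equals("AnyObjectType[]")) {
      return new AnyObjectType();
    }
    throw new IllegalArgumentException("Unknown storage type: " + text);
  }

  public static void main(String[] args) {
    StorageType original = new IntegerType(Bits.BITS_64);
    String wire = original.toString(); // IntegerType[bits=BITS_64]
    System.out.println(wire);
    System.out.println(from(wire).equals(original)); // true
  }
}
```

Relying on toString is fragile, which is one reason an official serialization format would be preferable to this kind of parsing.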

@JaroslavTulach
Member Author

JaroslavTulach commented Aug 28, 2025

We seem to run Table_Tests in the other JVM mode, loading classes from a different JVM than Standard.Table. As a result there are 26 failures:

/runner/_work/enso/enso/target/test-results/Table_Tests/JUnit.xml 7134✅ 26❌ 178⚪ 1116s

Either we need to get the dual mode working for Table_Tests or convince the system to run them in a single JVM. Again, this seems to be related to internal communication with caches and environment variables:

Response caching

Hopefully solved by 47f94a0.

h x = Warning.attach "h("+x.to_text+")" "{x="+x.to_text+"}"
i x = Warning.attach "i("+x.to_text+")" Nothing
f x =
Warning.attach "f("+x.to_text+")" <| Pair.new "A" x+10
Contributor

Does this have any semantic effect or is it formatting?

Member Author

It is easier to put a breakpoint there in VSCode when the body is on a separate line. The semantics remain unchanged.

Member

To the best of my knowledge, this should only be formatting. They are still just methods.

Member

@Akirathan Akirathan left a comment

I am giving you a blind check approval. I failed to keep up with the whole dual JVM framework implementation. Once I find some decent time and energy, I will give you some post integration feedback.

@JaroslavTulach JaroslavTulach changed the title from "Avoid including test classes in ENSO_LAUNCHER=test native image" to "Use Dual JVM mode to run StdLib Native tests" Aug 30, 2025
@JaroslavTulach JaroslavTulach added the "CI: Ready to merge" label Aug 30, 2025
@mergify mergify bot merged commit e8e7d9d into develop Aug 30, 2025
272 of 285 checks passed
@mergify mergify bot deleted the wip/jtulach/DualJvmModeForGenericJdbcTests branch August 30, 2025 13:28
mergify bot pushed a commit that referenced this pull request Sep 3, 2025
- To effectively exchange huge chunks of memory among the _"dual JVMs"_  ...
- introduced by #13570
- ... as needed by _In-Memory table_ implementations, let's rely on _direct byte buffers_
- with this PR we have a way for the two _"dual JVMs"_ to exchange and share the same chunk of memory

# Important Notes
- care must be taken to work properly with GC & free
- holding just the [ByteBuffer instance](#13904 (comment)) doesn't prevent the other JVM from garbage-collecting and releasing its memory region
- holding a pointer to the original `Value` object while working with the `ByteBuffer` should be enough to prevent GC
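
One way to follow the note about keeping the original object alive is to store the owner next to the buffer and fence it while the buffer is in use. The sketch below is a plain-Java illustration under assumptions: OwnedBuffer is a hypothetical class, and a generic Object stands in for the polyglot `Value`. Whether this is sufficient across the two JVMs depends on how the other side tracks the owner.

```java
import java.lang.ref.Reference;
import java.nio.ByteBuffer;

/** Hypothetical holder that keeps a buffer's owner reachable while it is used. */
final class OwnedBuffer {
  private final Object owner;      // stand-in for the `Value` that owns the memory
  private final ByteBuffer buffer; // view of memory that belongs to `owner`

  OwnedBuffer(Object owner, ByteBuffer buffer) {
    this.owner = owner;
    this.buffer = buffer;
  }

  /** Sums all bytes; the owner stays strongly reachable until the loop is done. */
  long sum() {
    try {
      long total = 0;
      for (int i = 0; i < buffer.limit(); i++) {
        total += buffer.get(i);
      }
      return total;
    } finally {
      // Keeps `owner` from being treated as unreachable before this point.
      Reference.reachabilityFence(owner);
    }
  }

  public static void main(String[] args) {
    var memory = ByteBuffer.allocateDirect(4);
    memory.put(0, (byte) 1).put(1, (byte) 2).put(2, (byte) 3).put(3, (byte) 4);
    var owned = new OwnedBuffer(new Object(), memory);
    System.out.println(owned.sum()); // 10
  }
}
```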
core ++ stdLibsJars ++ extraNITestLibs.value
},
extraNITestLibs := Def.taskDyn {
if (GraalVM.EnsoLauncher.test) Def.task {
Member Author

  • those mysterious failures in Update to GraalVM 25 #14019
  • could be caused by the dual JVM mode used when testing
  • that could be verified by reverting back these lines
  • then the test helpers would be included again in the enso executable when ENSO_LAUNCHER=test mode is on (hopefully)

@Akirathan Akirathan mentioned this pull request Oct 8, 2025
5 tasks
Labels
-libs-API-change-Base, CI: Clean build required, CI: No changelog needed, CI: Ready to merge

4 participants