
Conversation

Nadahar
Contributor

@Nadahar Nadahar commented Oct 5, 2025

This isn't a traditional PR and might never become one. It's a way to illustrate some ideas so that others can judge if they might be useful. The code isn't tested, and isn't intended to be complete in any way. It's closer to "compilable pseudocode".

It's somewhat tricky to get the logic in the LockingCache right. I've made an attempt, but haven't tested or debugged it, so it might be flawed and need further refactoring if the idea is accepted as worthwhile.

This contains two potential implementations for shared caches:

  • ObjectCache is meant to store serialized objects. Objects would have to be serialized and deserialized on storage and retrieval. This would have to be done by the individual scripting language add-on, and objects are unlikely to be valid across different scripting languages. The deserialization procedure would have to have some way to signal that deserialization failed/that the object is invalid, preferably one that differs from the case where the object simply doesn't exist in the cache.
  • LockingCache is meant to store (mutable) objects that can be locked, processed and then unlocked. This could be used to "coordinate" several scripts, for example with counters. Cache entries are locked on an individual basis, so one entry can remain locked while other entries are processed normally. While locked, exclusive access to the object is obtained, so that decisions can be made based on the current value without the possibility of the value changing in the meanwhile, thus avoiding TOCTOU. What is especially tricky here is preventing potential deadlocks if the entry is removed from the cache while scripts still potentially have references to it: once it has been removed from the cache, it's no longer possible to get the Lock instance, and thus it can't be unlocked. I've therefore tried to design the logic so that this can't happen, but I'm not sure if it's actually "waterproof". A rough sketch of the intended API surface follows below.
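
To make the shape of this more concrete, here is a rough sketch of the LockingCache surface as I imagine it. This is hypothetical and only for illustration; the method names (lockAndGet()/lockAndPut()/unlock()) follow the discussion below, and the actual code in this branch may differ in details:

    // Hypothetical sketch only - not necessarily identical to the code in this branch
    public interface LockingCache {

        /**
         * Acquires the per-entry lock for the given key, then returns the value,
         * or null if no entry exists. The caller must later call unlock() for
         * the same key, ideally in a finally block.
         */
        Object lockAndGet(String key);

        /**
         * Acquires the per-entry lock for the given key, then stores the value.
         * The caller must later call unlock() for the same key.
         */
        void lockAndPut(String key, Object value);

        /**
         * Releases the per-entry lock for the given key. Must be called by the
         * same thread that acquired the lock.
         */
        void unlock(String key);
    }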

@openhab-bot
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/rules-and-concurrency/166577/49

Ravi Nadahar added 4 commits October 5, 2025 22:31
Signed-off-by: Ravi Nadahar <[email protected]>
Signed-off-by: Ravi Nadahar <[email protected]>
Signed-off-by: Ravi Nadahar <[email protected]>
Signed-off-by: Ravi Nadahar <[email protected]>
@Nadahar Nadahar force-pushed the script-cache-experimentation branch from 77660c2 to 7456138 on October 5, 2025 20:39
@jimtng
Contributor

jimtng commented Oct 6, 2025

  • LockingCache is meant to store (mutable) objects that can be locked, processed and then unlocked.

This is what #4704 was meant for.

  • ObjectCache is meant to store serialized objects. Objects would have to be serialized and deserialized on storage and retrieval. This would have to be done by the individual scripting language add-on, and objects are unlikely to be valid across different scripting languages.

IMO, this is kinda unnecessary. People can just store a JSON string if they deem that necessary.

Cross-language usage is probably extremely rare, but in such cases, JSON can be used and every language can deal with it, because JSON is JSON.

@Nadahar
Contributor Author

Nadahar commented Oct 6, 2025

This is what #4704 was meant for.

That might be, but you didn't really document what it was for; I assumed it was done as a convenience function. Either way, it doesn't provide thread safety: while only one thread is allowed to modify the cache object while the remappingFunction is applied, this object might already be referenced by other threads that have already "retrieved" it, and they are free to modify the object while the remappingFunction is running. Also, it uses the cache's lock to "protect" the remappingFunction, which means that the entire cache is locked for the duration of the function. No other, unrelated, values can be stored or retrieved until this function returns. If somebody puts blocking code in the function, you have effectively blocked everything that uses the cache.

This was the whole point of the solution I tried to make here: let the cache entries be lockable on an individual basis, while letting the cache continue to operate for unrelated entries.
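
To illustrate, here is a minimal (untested) sketch of that per-entry idea, assuming a ConcurrentHashMap of entries that each carry their own ReentrantLock. The class and method names are made up for this example, and it deliberately ignores the hard part described in the PR description: safely removing an entry while another thread may still hold or wait for its lock.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of per-entry locking: the ConcurrentHashMap only guards the
    // storage itself, while each entry gets its own ReentrantLock
    public class PerEntryLockingCache {

        private static class Entry {
            final ReentrantLock lock = new ReentrantLock();
            volatile Object value;
        }

        private final Map<String, Entry> entries = new ConcurrentHashMap<>();

        /** Locks the entry for the given key and returns its current value. */
        public Object lockAndGet(String key) {
            Entry entry = entries.computeIfAbsent(key, k -> new Entry());
            entry.lock.lock(); // blocks only callers using the same key
            return entry.value;
        }

        /** Stores the value and releases the entry lock. */
        public void putAndUnlock(String key, Object value) {
            Entry entry = entries.get(key);
            if (entry == null || !entry.lock.isHeldByCurrentThread()) {
                throw new IllegalStateException("Entry lock not held for " + key);
            }
            entry.value = value;
            entry.lock.unlock();
        }
    }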

IMO, this is kinda unnecessary. People can just store a JSON string if they deem that necessary.

Yes they can. But do they? Do they understand why they should? It's also very easy to do this "for them", and the point is that using this cache actually is thread-safe. Because of the serialization/deserialization, the same object is never returned. It's always "a new copy". The same happens when it's put into the cache. As a result, everything will operate on its own copy, and there will be no threading issues.
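
For illustration, the copy semantics could look something like this. It's a minimal sketch using plain Java serialization; the real add-ons would use whatever serialization fits their language, and SerializingCacheSketch is a made-up name:

    import java.io.*;

    // Every get/put round-trips through a byte[], so callers never share a
    // live object with the cache
    public class SerializingCacheSketch {

        private final java.util.Map<String, byte[]> storage =
                new java.util.concurrent.ConcurrentHashMap<>();

        public void put(String key, Serializable value) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value); // stores a snapshot, not the object itself
            }
            storage.put(key, bytes.toByteArray());
        }

        public Object get(String key) throws IOException, ClassNotFoundException {
            byte[] bytes = storage.get(key);
            if (bytes == null) {
                return null; // absent, as opposed to "present but undeserializable"
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return in.readObject(); // always a fresh, unshared copy
            }
        }
    }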

Let's not forget that the current shared cache isn't safe to use for anything other than objects that are in themselves thread-safe, which basically means Java objects written to be thread-safe. Timers will be if #5051 is merged, but not a lot of objects in general are thread-safe. Objects made by any of the "no concurrency" languages can't be thread-safe, since those languages lack the mechanisms to implement thread-safety.

The current shared cache is thus in reality only safe for Timers and primitive types. The reason that it's safe for primitive types is that they are inherently immutable in Java. You can never modify a Java String, the language doesn't let you; any modification results in a new String. It's the same for Integer, Long, Double etc. But it's not true for everything: Calendar, for example, is mutable and isn't safe for use with the current shared cache.

These ideas came as I thought about how to find safe ways to do the things that are unsafe with the current shared cache. The work of implementing this in OH is very minor; the biggest "problem" is that people will keep using the "unsafe" shared cache for things it can't handle for a long time anyway. But with these other solutions, there would at least be a documented, safe way to do these things, and it would be possible to explain how this can be done safely.

As of now, all you can really do is say "You shouldn't do that", or suggest that users implement serialization themselves or find some other creative solution, using some other signalling mechanism for when they can and cannot modify things. And that most likely won't be thread-safe anyway, because it's not easy to ensure thread-safety without the proper tools.

@jimtng
Contributor

jimtng commented Oct 6, 2025

This is what #4704 was meant for.

That might be, but you didn't really document what it was for; I assumed it was done as a convenience function. Either way, it doesn't provide thread safety: while only one thread is allowed to modify the cache object while the remappingFunction is applied, this object might already be referenced by other threads that have already "retrieved" it, and they are free to modify the object while the remappingFunction is running. Also, it uses the cache's lock to "protect" the remappingFunction, which means that the entire cache is locked for the duration of the function. No other, unrelated, values can be stored or retrieved until this function returns. If somebody puts blocking code in the function, you have effectively blocked everything that uses the cache.

Good catch! I was copying how it was done in TrackingValueCacheImpl.get, which takes the same lock. So both get and compute will lock the entire shared cache if someone calls them with a long-running remapping / supplier function!

@jimtng
Contributor

jimtng commented Oct 6, 2025

Because of the serialization/deserialization, the same object is never returned. It's always "a new copy". The same happens when it's put into the cache. As a result, everything will operate on its own copy, and there will be no threading issues.

Won't that cause confusion? "I tried modifying this object, and there were no errors, but why are my changes not "seen" by the others? I thought this was a shared cache".

@Nadahar
Contributor Author

Nadahar commented Oct 6, 2025

Won't that cause confusion? "I tried modifying this object, and there were no errors, but why are my changes not "seen" by the others? I thought this was a shared cache".

Its use must be documented; there's no other way I can think of to allow operations like shared counters. That said, I've named the methods lockAndGet() and lockAndPut() on purpose. There is no get() and put() here. The JavaDocs for these methods also explain that they must be followed up with a call to unlock(), but the JavaDocs probably never make it through so that they are shown in the scripting language editors? They could potentially be documented in a corresponding way for the scripting language in question by the scripting language add-on, so that this documentation is visible in editors.

Also, users that didn't unlock wouldn't see that "their changes were never applied". They would just experience that everything that tried to read that cache value would hang.

Sorry, I didn't read your comment well enough, so I answered as if you were talking about the locking cache. To address what you actually wrote:

That depends on the documentation too, I'd say. Maybe "cache" is the wrong name; maybe "storage" is more fitting. It would have to be pointed out that changes must be explicitly "put" in the storage, and refreshed with "get" as needed. But, if done like that, it should work well. This is also how the current "shared cache" works for primitives, which in some places is described as the only thing you can really use it for. They won't update "automatically" if you change them locally, nor will cache updates be reflected locally, unless there is some "real magic" going on with the conversion. That is because they are copied/converted during retrieval and storage, so they aren't actually the same objects. And that, again, is why "they work" with the current shared cache.

The bottom line is that you can't have both: You must either "break the link" to the cached object, or you must use locking in some form. That can either be in the code of the object itself (like a thread-safe Java object), or it must be done using explicit locking like LockingCache. The most convenient would be if you could achieve both, but that is only actually possible in single-threaded applications (because there is no actual concurrency).

@Nadahar
Contributor Author

Nadahar commented Oct 7, 2025

Good catch! I was copying how it was done in TrackingValueCacheImpl.get, which takes the same lock. So both get and compute will lock the entire shared cache if someone calls them with a long-running remapping / supplier function!

Yeah, neither of them is "ideal" for a thread-safe cache, but if the functions are quick, it won't hurt... there should probably be a warning somewhere about giving them long-running functions.

@rkoshak

rkoshak commented Oct 7, 2025

I like both ideas. I worry about people causing problems for themselves with the LockingCache but I don't think that's a reason not to offer it as an option.

Is there a way for the cache to track when the script that accesses the cache starts running and returns to idle? It could lock and unlock the entry based on that and relieve the user from needing to do this manually. Though that does give less control to the end users in how long the entry remains locked, and I don't know if timers have a way to track them in a way useful for the cache. We'd need to auto-lock/unlock for timers too.

Something to think about but definitely not required for an MVP PR.

From an interface perspective, are you proposing making two new caches, or augmenting the existing shared cache (there's no reason to touch the private cache, I think)? It looks like these are separate caches. I wonder if it would be easier to understand if we used arguments on the existing caches. I can see arguments either way. Probably best to keep them as separate caches.

Does there need to be a distinction between read access and write access? Is that even feasible? I'm not sure. But to read you don't necessarily need a lock for any longer than it takes to get the value, unless you need to ensure it doesn't change until you are done with it. Maybe the serialization/deserialization feature could be used in a lot of cases to avoid deadlocks. Though the complexity to the end users keeps growing.

@Nadahar
Contributor Author

Nadahar commented Oct 7, 2025

Is there a way for the cache to track when the script that accesses the cache starts running and returns to idle? It could lock and unlock the entry based on that and relieve the user from needing to do this manually. Though that does give less control to the end users in how long the entry remains locked, and I don't know if timers have a way to track them in a way useful for the cache. We'd need to auto-lock/unlock for timers too.

I can't quite answer what is and isn't possible. What I can say is that I think it's quite risky: lots of possibilities that it doesn't unlock because something doesn't happen in the order we planned. It would also partially defeat the purpose, because the idea is that you only lock it while you work on that specific value/object. Once you're out of the "locked block" (that's why the best practice is to always use try/finally with locks, where you do the unlocking inside the finally statement), things are free to move on in parallel to your script, so you don't "freeze everything".
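
For reference, this is the canonical Java pattern I'm referring to: the unlock happens in the finally block, so the lock is released even if the "locked block" throws. A generic illustration, nothing OH-specific:

    import java.util.concurrent.locks.ReentrantLock;

    public class LockPattern {
        private static final ReentrantLock LOCK = new ReentrantLock();
        private static int sharedCounter;

        public static void increment() {
            LOCK.lock();
            try {
                sharedCounter++; // keep the locked section short and non-blocking
            } finally {
                LOCK.unlock(); // always runs, whether the block completed or threw
            }
        }
    }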

I also did a little "trick" in this LockingCache implementation that's really out in the land of experimentation. I realized that it's a problem that the script variable still holds a reference to the shared object after leaving the "locked block" (that's the way it is in Java too, but regardless). To avoid it being used later, I came up with the idea that unlock() could return something, and that users should always assign the variable that they used for the shared object to unlock(). It could return null, thus ensuring that there's no reference to it after unlock. But my real "experiment" was that I made a twist to what unlock() returns: if the object implements Cloneable, a cloned instance (that isn't shared) will be returned (allowing further use that's not "live"/won't be reflected in the cache), and if it doesn't, null will be returned. I don't seriously think this is a good idea, because it will be way too complex for users to understand/handle; it's more that I wanted to see what is achievable. I haven't actually tested any of this, because I'm not sure what I need to do (if anything) in the scripting language add-ons to make this available.
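
Continuing the hypothetical per-entry sketch from my earlier comment, the clone-or-null "twist" could look roughly like this (untested; the reflective call is there because Object.clone() is protected, so an arbitrary Cloneable can only be cloned through a public clone() method):

    import java.lang.reflect.Method;

    // Callers are meant to write: myVar = cache.unlock(key);
    public Object unlock(String key) {
        Entry entry = entries.get(key);
        if (entry == null || !entry.lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("Entry lock not held for " + key);
        }
        Object detached = null;
        Object value = entry.value;
        if (value instanceof Cloneable) {
            try {
                // Clone while still holding the lock, so no other thread can
                // mutate the object halfway through the copy
                Method clone = value.getClass().getMethod("clone");
                detached = clone.invoke(value);
            } catch (ReflectiveOperationException e) {
                // no public clone() - fall through and return null
            }
        }
        entry.lock.unlock();
        return detached; // a detached copy, or null - never the shared object
    }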

From an interface perspective, are you proposing making two new caches, or augmenting the existing shared cache (there's no reason to touch the private cache, I think)? It looks like these are separate caches. I wonder if it would be easier to understand if we used arguments on the existing caches. I can see arguments either way. Probably best to keep them as separate caches.

I'm not sure that I'm proposing anything that concrete at this stage. I'm primarily focused on the "challenges" with shared objects and concurrency, and possible ways to handle them. The private cache definitely doesn't need any of this, and works best the way it is. When it comes to putting some of these things into the existing shared cache, or making new ones, I'm not really sure what would be the best design, but I'm afraid that it could be very confusing to users if what they think of as "one cache" is actually several different caches that work entirely differently. But, ultimately, that's a design decision, not so much a technical one.

Does there need to be a distinction between read access and write access? Is that even feasible? I'm not sure. But to read you don't necessarily need a lock for any longer than it takes to get the value, unless you need to ensure it doesn't change until you are done with it. Maybe the serialization/deserialization feature could be used in a lot of cases to avoid deadlocks. Though the complexity to the end users keeps growing.

Java does have a ReentrantReadWriteLock that allows concurrent reads but exclusive writes. It sounds great at first, but in reality I rarely use it (and it seems others rarely do either). There are multiple reasons for this, like the fact that it is magnitudes slower than other locks, but the most important issue, in my opinion, is that it's difficult to use. You must decide upfront (before the "lock block") if you want to acquire a read or write lock. You can't "upgrade" from a read to a write lock; or you can, but it will cause a deadlock. The reason is that the lock allows an unlimited number of threads holding a read lock, but only one holding a write lock. To acquire a write lock, all read locks must first be released. If you hold a read lock while waiting for a write lock, you'll never get it. It turns out that even code sections where you "just want to read" often have some condition where you need to write under specific circumstances. This means that by the time you've upgraded all the read locks to write locks where there's any possibility that you need to write, you've mostly gotten rid of all your read locks. A complicating factor in "the real world" is also that methods can acquire locks, so you don't always know/remember that calling a certain method might acquire a write lock, and thus that it can't be called inside a read-locked section.
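
A minimal demonstration of that upgrade deadlock, runnable as-is (it will simply hang):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // A thread holding a read lock can never acquire the write lock, because
    // the write lock waits for all read locks, including this thread's own,
    // to be released first
    public class UpgradeDeadlock {
        public static void main(String[] args) {
            ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

            rwl.readLock().lock();  // fine: any number of readers allowed
            rwl.writeLock().lock(); // blocks forever: our own read lock is never released
            // never reached
        }
    }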

So, in reality, I only really use it in situations where things are read a lot and only very rarely written. Then it can make sense. Beyond that, I've experienced that locks aren't as expensive "as many seem to think" if you just think a bit about what you're doing inside the lock. If we're at all worried that this could be too complicated for users, and I think we should be, read/write locks just aren't worth it, IMO.

With locks in general, you never want to hold them longer than you need to. You should organize your logic so that, to the extent possible, you do what you need to do while you hold the lock, inside the lock, and then do the rest outside of it. I often make local copies inside the lock, and then do longer operations with the local copies after having released the lock. Those local copies won't be updated if the shared variable changes, but that turns out not to be a problem in many situations. The main purpose I had in mind with the LockingCache was things like "shared counters", where you evaluate some shared object/value, make a decision based on that, and reflect that in the shared object/value, and you need to make sure that multiple threads don't try to make decisions "in parallel", overwriting each other's decisions and making decisions based on "outdated" data.
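
The "copy inside the lock, work outside it" pattern looks something like this (a generic sketch, not code from this PR):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.ReentrantLock;

    // The lock is held only long enough to take a snapshot; the long-running
    // work happens on the local copy after the lock is released
    public class CopyUnderLock {
        private final ReentrantLock lock = new ReentrantLock();
        private final List<String> shared = new ArrayList<>();

        public void process() {
            List<String> snapshot;
            lock.lock();
            try {
                snapshot = new ArrayList<>(shared); // quick: just copy
            } finally {
                lock.unlock();
            }
            // Long-running work on the snapshot; it won't see later updates,
            // which is often acceptable
            for (String s : snapshot) {
                System.out.println(s);
            }
        }
    }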

In every situation where you don't need such "shared decision-making", I think it would be better to not use LockingCache, and use either the existing shared cache or the serialization cache, depending on the nature of the object/value.

The serialization/deserialization cache avoids all deadlock potential and is generally easy to use, as long as people realize that their local instance isn't "live" and must be stored after change and re-read to get the up-to-date state.

@rkoshak

rkoshak commented Oct 7, 2025

I can't quite answer what is and isn't possible. What I can say is that I think it's quite risky: lots of possibilities that it doesn't unlock because something doesn't happen in the order we planned.

Then leaving the responsibility to the end user is downright frightening.

that's why the best practice is to always use try/finally with locks, where you do the unlocking inside the finally statement), things are free to move on in parallel to your script, so you don't "freeze everything".

Relying on best practice from a bunch of enthusiastic but inexperienced programmers (for a huge portion of OH's user base, OH is their only exposure to programming at all) is going to guarantee disaster for most of them. That's why it would be better if we could implement best practice on their behalf.

Not to mention that there is no guarantee that a Rules DSL rule will always execute finally in the rule code. Some errors kill execution of the rule right there and do not float the exception up to the rule.

If we have to depend on the end users locking and unlocking I foresee a return to the bad old days of OH 1/2 where rules were not locked and users had to use ReentrantLocks inside their rules. It worked some of the time.

We already hammer on users that rules should run fast and exit. If you need to wait for something, use a timer. If the locking and unlocking were done on their behalf, all we need to do is continue giving them that same advice we've always been giving them. They will still ignore it, but at least we are not introducing (for most of them) a new concept they have to worry about.

That's why I think it's worth it to lock the entry for the whole run of the script that uses it.

because it will be way too complex for users to understand/handle

I agree, using the return value from unlock() is likely to be misunderstood and to generate a lot of threads on the forum when it's not used, or is misused. There is no end to the creativity of OH users. Throw in vibe coding, and the strangeness of code people ask for help with has become quite interesting.

If we can do it for them, even if it means that most of the time the lock is held a few dozen msec longer than it strictly needs to be, that would greatly increase the likelihood that OH users would be able to use it successfully without having to hit up the forum for help.

if you just think a bit about what you're doing inside the lock.

Just remember that most OH users have neither the knowledge nor the skill to do that thinking. If we rely on the end users to make good decisions instead of bad ones, a lot of users will fail. Working with locks is hard even for experienced developers.

@Nadahar
Contributor Author

Nadahar commented Oct 7, 2025

Then leaving the responsibility to the end user is downright frightening.

Yes and no. It's much simpler for the user than it is to try to "handle it for them". But, clearly, it can be and probably would be misunderstood/misused. There would have to be documentation that explained what is essential when using the LockingCache, and when you would need it. I'd say that the most important problem-preventing measure might be to make it clear that you shouldn't use it unless you actually need it.

If we found a waterproof way to unlock any locks locked by the script when it ended (which I'm not even sure is doable), there would still be a lot of implications. That script would then have to not do long-running or potentially blocking operations, much less things like "sleep", for the rest of the duration of the script. Another problem would be that if a "set of scripts" used two such "locking objects/values", every script would have to acquire the locks in the same order. If you have two values A and B, and two scripts, S and T, where S first used A, then B, while T first used B, then A, a deadlock would occur. I think using such a solution would require at least as much "understanding" and education of users as explaining some rules that only apply to a limited block of code.
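
The classic demonstration of that ordering deadlock, written out in plain Java ("script S" and "script T" are just the two threads here):

    import java.util.concurrent.locks.ReentrantLock;

    // S locks A then B, T locks B then A. If both get their first lock before
    // either gets its second, each waits forever for the lock the other holds
    public class LockOrderingDeadlock {
        static final ReentrantLock a = new ReentrantLock();
        static final ReentrantLock b = new ReentrantLock();

        public static void main(String[] args) {
            new Thread(() -> { // "script S"
                a.lock();
                try {
                    sleep(100); // give T time to grab B
                    b.lock();   // waits for T
                    b.unlock();
                } finally {
                    a.unlock();
                }
            }).start();

            new Thread(() -> { // "script T"
                b.lock();
                try {
                    sleep(100); // give S time to grab A
                    a.lock();   // waits for S -> deadlock
                    a.unlock();
                } finally {
                    b.unlock();
                }
            }).start();
        }

        static void sleep(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }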

Obviously, issues like the one you mention with Rules DSL, where finally might never be executed, make these languages broken and completely unsuitable to have anything to do with locks at all.

If we have to depend on the end users locking and unlocking I foresee a return to the bad old days of OH 1/2 where rules were not locked and users had to use ReentrantLocks inside their rules. It worked some of the time.

We already hammer on users that rules should run fast and exit. If you need to wait for something, use a timer. If the locking and unlocking were done on their behalf, all we need to do is continue giving them that same advice we've always been giving them. They will still ignore it, but at least we are not introducing (for most of them) a new concept they have to worry about.

That's why I think it's worth it to lock the entry for the whole run of the script that uses it.

I think this boils down to a larger problem that many great brains have tried to, but been unable to, "solve" to this day: The desire to find a way to make concurrency-able code where the author of the code doesn't have to understand how it works.

Most programming languages face this challenge, but the situation remains pretty much the same: the "simpler" languages opt to simply deny their users these things (JavaScript, Python, Ruby etc.). The more "advanced" languages stick to leaving the entire responsibility with the developers. There are some attempts at choosing a "middle road", like Go and Rust, but it doesn't really work: there are solutions that have "training wheels", but this always comes with a cost. In the end, the language must either offer "the full system", like C, C++, Java etc., or accept limitations in what it can do. It's a tradeoff: the less complexity, the fewer possibilities, and vice versa.

Programming languages have far more possibilities to "design easier to handle mechanisms" than we do here. They can enforce immutability and all kinds of "rules" they want to avoid certain pitfalls. We have very little such "power" here, since neither Java nor the scripting languages will adhere to any "rules" we try to impose. And even with that much better control of the environment, and the ability to enforce "contracts", programming languages have only marginally managed to do things better, and only in some situations.

The LockingCache solution is the "safest"/simplest way I've been able to think of here. That doesn't mean that better ideas don't exist, or that the result is "simple to use". But I think it's either something like LockingCache, or simply deciding that "this can't be done".

The way it currently "works" isn't better than back when users used locks manually - it only has slightly different problems. With the current shared cache, you can do these things, but not in a thread-safe way, so sometimes this will fail. Maybe not most of the time; that heavily depends on the level of actual concurrency (how often these rules/scripts actually run simultaneously, despite the fact that they theoretically can) in reality. But it clearly isn't "better" than using locks manually, because with manual locks it's at least possible to make things work consistently. It might lead to fewer frustrated users on the forum, though, because it's easier to do, and they might not notice when it goes wrong, or at least not understand that this is the reason.

My idea was that maybe the best approach was to discourage doing things that require locks if at all possible, and then offer a solution where it could actually be done "safely" for those that want the ability to do that, despite the complexity. Maybe that's wrong; maybe that's highly subjective and depends on your perspective. I can't really say, other than that, to me, it looks like it might be the best overall approach.

@rkoshak

rkoshak commented Oct 8, 2025

That script would then have to not do long-running or potentially blocking operations, much less things like "sleep", for the rest of the duration of the script.

Which is something we already recommend constantly.

Another problem would be that if a "set of scripts" used two such "locking objects/values", every script would have to acquire the locks in the same order. If you have two values A and B, and two scripts, S and T, where S first used A, then B, while T first used B, then A, a deadlock would occur. I think using such a solution would require at least as much "understanding" and education of users as explaining some rules that only apply to a limited block of code.

I would assume we would be very clear in the docs for this not to do that. If you have more than one entry in this cache, use a map or array to put them all into one entry so only one lock is acquired. I imagine a warning could be written to the log in this case. Cluttering the logs with warnings is very good at changing users' behavior.

The number of users who would need a locking cache in the first place is relatively low (especially once the timer synchronization gets merged). The number who would need to share more than one synchronized value between two different rules is even lower. It is a pretty small population of users that such a limitation would impact.

The desire to find a way to make concurrency-able code where the author of the code doesn't have to understand how it works.

I'm not asking for that. I'm asking for simplification. I'm asking for a great reduction in the amount that the users need to understand and keep in their brain to use it mostly safely. Something where we can give the users three or four "rules" they can follow and end up with reasonable code.

  1. when using the locked cache the value is locked for the entire runtime of the script, make sure the script exits fast
  2. never use more than one locked cache value in one rule
  3. only use the locked cache for values used to communicate between two rules, use the other caches for all other uses

That's not too hard to follow and understand, nor is it too much for users to keep in their mind at the same time. But if they have to lock and unlock manually, we have to double or triple the number of "rules", the number of edge cases explodes, and their overall understanding plummets.

@jimtng
Contributor

jimtng commented Oct 8, 2025

Can't we just make the existing methods perform locking in a more granular way, i.e. lock per key instead of the entire cache?

In Java's ConcurrentHashMap, the documentation does note that your supplier/mapper function needs to finish its job quickly.
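
For illustration, this is roughly what that per-key granularity looks like with a plain ConcurrentHashMap (just a sketch of the idea, not the actual shared cache implementation):

    import java.util.concurrent.ConcurrentHashMap;

    // compute() only blocks operations that hash to the same bin, so unrelated
    // keys proceed; the javadoc still warns the mapping function "should be
    // short and simple" because other updates can be blocked while it runs
    public class PerKeyCompute {
        public static void main(String[] args) {
            ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();

            // Atomic check-then-update for one key only
            cache.compute("counter", (key, value) ->
                    value == null ? 1 : value + 1);

            // Operations on other keys are not serialized behind "counter"
            cache.put("unrelated", 42);

            System.out.println(cache); // e.g. {counter=1, unrelated=42}
        }
    }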

@Nadahar
Contributor Author

Nadahar commented Oct 8, 2025

Which is something we already recommend constantly.

While the recommendation to avoid sleeping is generally applicable and not something people should need to do anyway, avoiding blocking operations is much harder. There is nothing inherently "wrong" with blocking operations. It's often not easy to know if you call something that might block, either. A blocking operation will usually return quickly, until suddenly it doesn't. Any HTTP call, for example, is blocking; in fact, most communication operations are, unless you are using listeners with async callbacks (which makes things much more complicated).

I'd say that if you are to "ban" users from using blocking operations, you're significantly restricting what they can do with this.

I'm not asking for that. I'm asking for simplification. I'm asking for a great reduction in the amount that the users need to understand and keep in their brain to use it mostly safely. Something where we can give the users three or four "rules" they can follow and end up with reasonable code.

1. when using the locked cache the value is locked for the entire runtime of the script, make sure the script exits fast

2. never use more than one locked cache value in one rule

3. only use the locked cache for values used to communicate between two rules, use the other caches for all other uses

I just don't understand that this is actually easier to use than an explicit unlock. To me, it seems easier to explain that they can only operate on that variable in the "locked section", and that they should exit that section before doing unrelated or potentially longer-running operations.

I was imagining a use case like this:

    let decisionObject = cache.locking.get('decisionObject', () => ({ counter: 0 }));
    try {
        if (decisionObject.counter < 10 && time.ZonedDateTime.now().getHour() <= 15) {
            decisionObject.counter++;
        }
    } finally {
        decisionObject = cache.locking.unlock('decisionObject');
    }

However, when looking at the code now, I'm not convinced that this will work. I'd have to test it to be sure how it would play out, but I can't quite understand how they manage to keep objects "live" through the "translation" to and from Java objects. As far as I understand, a JavaScript object will be converted into a Java Map. But will these really be "connected", such that changes to either will be reflected in the other?

Because, if not, the pattern above wouldn't work. It would require a put after updating decisionObject, and with my current "design", that would cause a second "lock level", requiring two unlocks afterwards.

It's possible to change this; I could add a put() in addition to lockAndPut() (which should probably be named putAndLock() instead, because that's more accurate), but that put() would then have to throw some kind of exception if the lock isn't already held. How do exceptions behave with JSR223? Is that a meaningful way to handle things, or does it just cause mayhem?

Can't we just make the existing methods perform locking in a more granular way, i.e. lock per key instead of the entire cache?

We can; the question is if we should. My idea has been to make it as simple as possible, and to only expose those that need locking to locking. Of course, the regular cache could simply be equipped with "optional" lock() and unlock() methods that apply on a per-entry basis, and the users could handle the rest. I just feel that this is a lot to dump in their lap.

If the locking cache is separate, they would only use that for the particular purposes where locking is needed, when their focus is on how to do that, and they could forget about it the rest of the time.

It's hard to know, really.

I'd also like to add that since this is untested, I don't know if the code I've made behaves as expected.

@rkoshak

rkoshak commented Oct 8, 2025

I'd say that if you are to "ban" users from using blocking operations, you're significantly restricting what they can do with this.

Avoid != ban

I just don't understand that this is actually easier to use than an explicit unlock.

Then you have no experience helping non-coders on the forum to be successful creating moderately complicated rules. But I'm not going to argue about it. You've made up your mind, and I've learned there is no shifting that.

How do exceptions behave with JSR223?

I've never seen one "suppressed" like they are in Rules DSL. But it's certainly possible for the individual add-ons to do that. Hopefully they don't, but 🤷‍♂️. It's hard to prove a negative.

@Nadahar
Contributor Author

Nadahar commented Oct 8, 2025

But I'm not going to argue about it. You've made up your mind, and I've learned there is no shifting that.

That's not true, but I do resist if I'm not convinced. As I've said from the start, I'm a bit hesitant about whether it's at all doable. You'd have to track what they do very precisely. But I was contemplating one "possible solution", which I'm still not sure is doable, that would effectively track any "locking object/value" that a script came in contact with and, when the script execution was over, unlock all levels of locks held by that thread on the affected locks. It would allow users to explicitly unlock, but act as a "safety net" in case they didn't, so that any locks "left hanging" would automatically be released.

@jimtng
Contributor

jimtng commented Oct 9, 2025

Can you show some pseudocode / examples on how to deal with this locking issue, and why a separate lock/unlock is needed, from the user API perspective?

@Nadahar
Contributor Author

Nadahar commented Oct 9, 2025

Can you show some pseudocode / examples on how to deal with this locking issue, and why a separate lock/unlock is needed, from the user API perspective?

    let decisionObject = cache.locking.get('decisionObject', () => ({ counter: 0 }));
    try {
        if (decisionObject.counter < 10 && time.ZonedDateTime.now().getHour() <= 15) {
            decisionObject.counter++;
        }
    } finally {
        decisionObject = cache.locking.unlock('decisionObject');
    }

This is the pseudocode I gave above. As I also said above, I'm not so sure that it will work like that; it all depends on exactly how the ScriptEngine <-> Java connection works. If you actually get the same object in the script as is stored in the cache, the above will work. I just can't quite figure out how they'd be able to make that work, so I suspect that what you'll receive is a copy. In that case, the above won't work, and you'd have to do a put() after modifying the object.

As to why it's needed, it goes to the core of concurrency: everything can change at any moment, so if you evaluate something and then take action, it won't necessarily be correct, because the evaluation you just made might not be valid anymore. It's called TOCTOU (time-of-check to time-of-use), and it applies to concurrency in general; it's not particular to OH or scripts in any way.
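
In its simplest form (a generic Java illustration, nothing OH-specific):

    import java.util.concurrent.locks.ReentrantLock;

    // Between the check and the act, another thread can change the value, so
    // the decision is made on stale data; holding a lock across both steps
    // makes them one atomic unit
    public class Toctou {
        private int counter;
        private final ReentrantLock lock = new ReentrantLock();

        public void incrementUnsafe() {
            if (counter < 10) { // time of check
                // <- another thread can push counter past 10 right here
                counter++;      // time of use: the limit can be exceeded
            }
        }

        public void incrementSafe() {
            lock.lock();
            try {
                if (counter < 10) {
                    counter++; // check and act can no longer be interleaved
                }
            } finally {
                lock.unlock();
            }
        }
    }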

Some languages have decided to "shield" their users from such dilemmas, like JavaScript (single-threaded only), Python (GIL) and Ruby (GIL). That means that users of these languages never have to deal with this, but it also means that they are severely limited in what they can do. Threads are a great way to compartmentalize tasks that would otherwise be extremely complicated. In addition, they're required to be able to utilize more than one CPU core (CPUs typically get more and more cores, leaving the single-threaded languages further and further behind). When using a GIL, you can have multiple threads, but only one of them can do anything at any one time, so using threads is kind of pointless. A GIL wraps all your code in an "invisible lock", so even though you don't see it, it's there.

Java isn't like that. It supports full concurrency - luckily, because it would have been impossible to write something like OH in a single-threaded language. Just think of all the different things that go on at the same time; trying to "schedule" all this to run sequentially would have been a complete nightmare. The slightest "mishap" somewhere would have brought everything to a grinding halt, not to mention that it would have run much slower to begin with. But this is also where the problem comes in. We're trying to combine languages that can't handle concurrency with a system that actually does a lot of things concurrently. ScriptEngine instances "live in their own, isolated, world", so it doesn't normally matter so much what the other threads in the system are doing. You can run many scripts in parallel as long as they don't try to interact with each other, or with other parts of OH. But when you want things to run in parallel (like we do with rules/scripts), and you want them to coordinate with each other, you're forced to face concurrency in one form or another. It's simply "impossible" not to deal with it if you want things to work. You can ignore it, which seems to have partially been the approach here, but you will pay for it with "strange errors" and "strange behavior".

So, I don't know how to answer "why it's needed from a user API perspective" other than that if you want users to be able to make their independent rules/scripts exchange information or otherwise coordinate, you must address how to do it in a coordinated way. Locks/mutexes are the most obvious way. There are some "alternative approaches", like using circular buffers that pass messages back and forth, but as far as I know, they are all vastly more complicated to use than locks.

My suggestions here are what I've been able to come up with for the easiest way we can actually make independent rules/scripts communicate without experiencing "random behavior" as a result of ignoring concurrency.

The risks surrounding concurrency are many. Parts of it have to do with CPU caches and the fact that the same variable can have different values in different caches; some have to do with the fact that one thread can read while another is writing, and thus retrieve a partial or completely invalid value/object. Reordering is also a part of the puzzle: CPUs are free to reorder execution as they see fit, so things don't necessarily happen in the order that your code is written. Locks and other "concurrency tools" are also used to create barriers to reordering, where the CPU isn't allowed to "reorder across the barrier".

@jimtng
Contributor

jimtng commented Oct 9, 2025

Thanks for the explanation, and sorry that I haven't been following things closely and missed your example.

I asked AI about JRuby:

AI Overview
JRuby does not have a Global Interpreter Lock (GIL), also known as a Global VM Lock (GVL).
Unlike the reference implementation of Ruby (MRI/CRuby), which uses a GIL to ensure thread-safety of its internal C structures, JRuby runs on the Java Virtual Machine (JVM) and leverages the JVM's threading model. This allows JRuby to achieve true parallel execution of Ruby code across multiple CPU cores, as opposed to the concurrent but not parallel execution offered by MRI's GIL.
While JRuby still requires care regarding thread safety in your Ruby code (e.g., using mutexes for shared data), the absence of a GIL means that JRuby threads can run independently and concurrently, making it a suitable choice for applications that can benefit from parallelism.

@jimtng
Contributor

jimtng commented Oct 9, 2025

How about using the existing shared cache implementation, is this the same thing?

cache.shared.compute('decisionObject', (key, value) => {
    var object = value;
    if (object === null) {
        object = { counter: 0 };
    }

    if (object.counter < 10 && time.ZonedDateTime.now().getHour() <= 15) {
        object.counter++;
    }
    return object;
});

@Nadahar
Contributor Author

Nadahar commented Oct 9, 2025

I asked AI about JRuby:

Yes, careful with listening too much to the ML models 😉

All the JSR223 implementations "can run in parallel", because different Java threads can launch different "scripts". The "problem" is that since "standard Ruby" uses a GIL, the language itself has no mechanism to deal with concurrency, and the developers using it have no experience dealing with it. So, I'd say that while the ML answer isn't "directly wrong", it takes the part about mutexes/locks very lightly. This is the exact thing we're trying to deal with here; there is no problem if several JRuby scripts run in parallel as long as they have no interaction with each other or other threads in Core.

This leaves it up to us to figure out how to handle concurrency for the languages with no "native tools" for doing it. Another question is, if the language handles concurrency natively, whether those mechanisms are "compatible" with Java's mechanisms, since they have to "work together" here. I suspect that languages that are very close to Java, like Groovy, probably can cooperate nicely, but I'm not sure that it would be so easy if the language "is more different". That's pure speculation though.

How about using the existing shared cache implementation, is this the same thing?

cache.shared.compute('decisionObject', (key, value) => {
    var object = value;
    if (object === null) {
        object = { counter: 0 };
    }

    if (object.counter < 10 && time.ZonedDateTime.now().getHour() <= 15) {
        object.counter++;
    }
    return object;
});

No, that is "use the cache lock to protect my transaction". The cache lock is there to protect the Map holding the cache key/values itself from concurrency issues. It's only supposed to be locked quickly for getting things in and out of the cache. It is global for the entire cache. So, when you do the above, you lock the entire cache for the duration of the operation. Everything else, completely unrelated rules/scripts that tries to use the shared cache, will have to wait for this to finish. If they do the same thing, things will soon get very slow. It also has other implications. There are other locks at play here, locks protecting script execution for example. They will also be held for the time the scripts are waiting for this lock, so the whole thing can cascade throught the system and make OH as a whole barely move at all.

This is what I've been trying to avoid, and it also has other problems - because if you need to access this value from outside the compute() function, it will be unprotected by the cache lock (if you use get() to retrieve it, no lock is held after you receive it).

@jimtng
Contributor

jimtng commented Oct 9, 2025

This one is from JRuby's own wiki, not AI generated: https://github.com/jruby/jruby/wiki/Concurrency-in-jruby

No, that is "use the cache lock to protect my transaction". The cache lock is there to protect the Map holding the cache key/values itself from concurrency issues. It's only supposed to be locked quickly for getting things in and out of the cache. It is global for the entire cache. So, when you do the above, you lock the entire cache for the duration of the operation. Everything else, completely unrelated rules/scripts that tries to use the shared cache, will have to wait for this to finish. If they do the same thing, things will soon get very slow. It also has other implications. There are other locks at play here, locks protecting script execution for example. They will also be held for the time the scripts are waiting for this lock, so the whole thing can cascade throught the system and make OH as a whole barely move at all.

Is this a bit exaggerated? One could say that the get...unlock approach can do the same if the code inside takes some time to execute and a lot of other parallel executions are trying to get...unlock the same key.

What I'm trying to say is that we could make the existing locking mechanism more granular so that it would only lock for the given key instead of taking a global lock. Then we won't need yet another class of cache + a new paradigm to do it.

This is what I've been trying to avoid, and it also has other problems - because if you need to access this value from outside the compute() function, it will be unprotected by the cache lock (if you use get() to retrieve it, no lock is held after you receive it).

Then do it inside compute(). Treat the whole compute() the same as you would with get....unlock block in your example above. If you make it granular then it won't lock the entire cache. It seems to be good enough for ConcurrentHashMap.

So far I haven't seen any need to implement/offer something new. Could you show a different scenario that could not be achieved using compute()?

@Nadahar
Contributor Author

Nadahar commented Oct 9, 2025

This one is from JRuby's own wiki, not AI generated: https://github.com/jruby/jruby/wiki/Concurrency-in-jruby

I've read through it, and some of the things mentioned there I assume are Ruby-specific, since they don't mean a lot to me/I don't know what they are. It seems that "standard Ruby" has some concurrency tools in the "thread" library, which is surprising to me considering the GIL:

JRuby provides the same concurrency primitives as standard Ruby in the 'thread' library (loaded by default in 1.9 mode). Mutex, ConditionVariable, Queue, and friends all work as they do in MRI, but they are often crucial to writing threadsafe code in JRuby.

In addition to that, JRuby seems to have done some extras to improve the situation further (because the JVM under which it runs is multithreaded):

JRuby also provides a non-standard way to convert any object or class into a fully-synchronized data structure: the JRuby::Synchronized module. You can simply require 'jruby/synchronized' and then include JRuby::Synchronized into any class, or obj.extend(JRuby::Synchronized) any object. The result is that all methods on that class or object will be wrapped in simple synchronization; that is, a reentrant lock against the object itself.

JRuby::Synchronized is used within the concurrent-ruby gem, which provides Synchronization::Object, Concurrent::Array, Concurrent::Hash, and many other concurrency abstractions for use in JRuby (and MRI).

So, JRuby isn't entirely without tools to manage this. That said, there were quite a lot of caveats listed as well, like the core classes and the standard library, and even basic comparison operations, which aren't "safe". It's unclear to me exactly what this means though, because it's not like Java's classes or objects are thread-safe either. Java started out (in Java 1.0) seemingly trying to make everything thread-safe, but this was soon abandoned. It's too slow and causes too many problems to do it that way. The key is to only apply protections where they are actually needed, not "everywhere", and as such, I can't quite judge whether the lack of thread-safety in the standard library and core classes is a problem or not. My guess is that it isn't, at least most of the time.

But, I don't quite see that any of this makes a difference for OH's shared cache. If we were to lay the responsibility for synchronizing the use of the shared cache on the scripting languages, they'd all have to support it.

Is this a bit exaggerated? One could say that the get...unlock approach can do the same if the code inside takes some time to execute and a lot of other parallel executions are trying to get...unlock the same key.

It depends entirely. I've "investigated" several user cases recently where OH has effectively ceased to operate. The thread dumps reveal lots of threads that are stuck waiting for locks, and the core of such an "investigation" is to figure out what started the whole cascade. These situations will happen under the right circumstances. If there's a deadlock between a couple of threads, or if some locking operation is operating "so slowly" that other things pile up, you do get these situations. None of the cases I've looked at have had the shared cache as the starting point, but to me, using a global lock on the cache to run "transactions" is a huge red flag. It's just the kind of thing that causes such "pile-ups". All it takes for a complete deadlock to occur is that some of the code in one of the "transactions" does something that blocks while some other part of the system retrieves the result, and that other part of the system must wait for something that in the end ends up waiting for the cache lock to be released. Even without a complete deadlock, you can get such long delays that tasks "pile up" in thread pool queues, because the tasks arrive faster than they can be resolved. To me, it's just plain common sense that you don't want to do that. The chance of the same happening when using per-entry locking is much smaller - although still possible of course.

Implicit locking has another huge downside as I see it - the user probably has no idea that it's going on, and has no way to do long-running or blocking operations outside the lock. With explicit locking, it's at least very clear to the user when the code is inside a lock and when it isn't.

What I'm trying to say is that we could make the existing locking mechanism more granular so that it would only lock for the given key instead of taking a global lock. Then we won't need yet another class of cache + a new paradigm to do it.

The problem is that I don't understand what this means - I don't know how to do that - except for what I have shown. It doesn't matter if it's done to the existing cache or a new one; that is a "design choice", but the "locking concept" remains the same. With the existing cache, there is no locking except for when values are retrieved or stored (and for compute()). The cache lock isn't there to ensure the integrity of the values in the cache, but of the cache itself. So, once you have retrieved the value from the cache, there is no lock. If you modify it, and another script handles the same object, you're out of luck. The "protection" you get from the cache lock in compute() is also limited, because you can't access that value outside of the function you supply to compute(). It's hard for me to imagine exactly how you'd use a value if you only use it inside compute().

Implementing per key/entry locking isn't just "more granular", it's fundamentally a different concept, because the cache lock only protects the cache. The key/entry/value lock has to have a start and an end. What @rkoshak suggested was basically that the lock would start whenever you accessed the value (get or put) and end when the script finishes execution. In addition to potentially being difficult to achieve technically (it must "never" fail, no matter what happens, or that cache value will remain locked forever), it means that all the other things that the script does after using the shared cache value will impact the time the lock is held. There is no option to do long-running or blocking operations "outside the lock", unless you can do them before you access the value. And it's all pretty fuzzy to the author of the script exactly what's going on. It also implicitly has the limitation that one script can only access one shared cache value in practice, because if you acquire a lock while already holding a lock, you must ensure that the two locks will always be acquired in the same order, or you have a potential deadlock.

It sounds much easier to me to handle explicit lock and unlock than to have to deal with a lot of limitations on what you can do after you access a shared cache value, so I'm just not convinced that doing it "the implicit way" is actually simpler for the user - and it certainly isn't better when it comes to the risk of "lock cascades", since it breaches the fundamental principle of holding the lock only for the time it's actually needed.

Then do it inside compute(). Treat the whole compute() the same as you would with get....unlock block in your example above. If you make it granular then it won't lock the entire cache. It seems to be good enough for ConcurrentHashMap.

I'm not sure that I understand what you mean here. ConcurrentHashMap, like our current shared cache, doesn't protect the entries it holds, it only protects the "storage" itself. If you store mutable objects in a ConcurrentHashMap and then mutate them once you have retrieved them, ConcurrentHashMap doesn't do anything to protect you. Also, compute() isn't "a tool" for ConcurrentHashMap. compute() is a fashion statement; it's part of the ongoing "functional programming trend" whose prophets often manage to make so much noise that reason drowns. ConcurrentHashMap was there long before compute(), and since it's a Map, and the people behind Java felt pressured to add some "trendy functional stuff" to their Collections, it had to apply to ConcurrentHashMap as well, even though it doesn't really fit. So, they did the only thing they could: they documented that "you shouldn't really use this unless your operation is very quick". It shouldn't be interpreted as a "blessing" of compute().

If your cache/map only holds immutable objects, it does work in a way. You can retrieve an entry and inspect it, but you can't modify it. You can replace it with another immutable object, but that still won't modify the existing object for other threads that might hold a reference to it. In such a case, compute() can be what you need to avoid another thread replacing the object while you make your "decision", and you can replace it with another immutable value if that is the decision. But this only works with immutable objects. And immutability is, ultimately, an illusion. RAM is never immutable; it's the rules of the programming language in question that enforce the immutability. In reality, the object could be changed by something that didn't adhere to the rules. When we're dealing with cross-language references, I'm not sure that immutability is a thing. There are no uniform rules to ensure immutability. I mean, a Java string is immutable. If you modify a Java string, you always make a new one. The old one won't change, and any reference to the old one will still contain the same characters. In Ruby, a string is mutable. How does this work? I don't know, but it just can't be the same object. My guess is that there is a conversion when you "cross the border" between the languages, but the exact details are unknown to me. The one thing I'm pretty sure about is that if you put a Java string in the shared cache, retrieve it with Ruby and modify it, the cache won't reflect that. Because it can't. So, using LockingCache with strings would be kind of overkill, unless you have some very long-running "transactions" where you operate on these strings, which would be unsuitable to do under a global cache lock. But, for mutable objects, it's an entirely different story.

I've said from pretty early in this discussion that to really assess what's needed and where, we'd need to know much more about what happens when objects/values cross the language boundaries. It seems reasonably clear that Java objects (other than primitives and probably String) are passed through as-is, and can be operated on as Java objects. Those that aren't either thread-safe or immutable are thus "unsafe" to handle together with the shared cache. But when it comes to primitives and "scripting language constructs", it's very much unclear to me what happens. I read that JavaScript objects are turned into Java maps. That implies a copy as far as I can understand, most likely both ways. If so, they can be safely handled without LockingCache, but they won't automatically update the value in the cache if modified either; they must be put() after modification.
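
If that copy semantic holds (an assumption based on my reading, not verified), usage from JS would have to follow a read-modify-write-back pattern:

var settings = cache.shared.get('settings', () => ({ retries: 3 }));
settings.retries = 5;                   // modifies the retrieved copy only
cache.shared.put('settings', settings); // write back so other scripts see it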

So far I haven't seen any need to implement/offer something new. Could you show a different scenario that could not be achieved using compute()?

I have explained time and time again. Take some mutable object, put it in or retrieve it from the cache, and then modify it. What I can't answer is which objects retain their "identity" and which ones are copied/converted on their way back and forth between Java and the scripting language.

compute() can only protect against TOCTOU, and only at a relatively high cost in contention. It does not protect mutable objects at all.

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

I found Nashorn jsr223 engine notes. It seems to me like it might be entirely up to the individual scripting engine whether they convert/copy objects or just pass on the Java objects.

If that's the case, I really don't know how we can know what should be treated how. Objects that are converted/copied can be used with the existing shared cache, except for the TOCTOU issue, which can partially be "solved" using compute(), depending on circumstances.

Objects that are passed through cannot be used with the existing shared cache unless they are thread-safe or immutable (and immutable might not be a thing in this context).

If there's no way to figure out what kind of object we're dealing with, there's no way to figure out the right treatment. The only way to be "safe" would be "the paranoid approach": treat everything as if it were a mutable object, not a copy. But that would complicate and slow down "most operations" (depending on how many objects are actually copied/converted) unnecessarily.

Is there a way, from within the scripting language/helper library, to determine if you're dealing with a Java object or not?

@rkoshak

rkoshak commented Oct 10, 2025

Is there a way, from within the scripting language/helper library, to determine if you're dealing with a Java object or not?

In JS the answer is yes, though the helper library goes to great lengths to always give you JS objects.

Probably the simplest, though not foolproof, way would be to just test if MyJavaObject.class !== undefined. It's more code, but using typeof or something like that would probably be less of a hack.

I don't know how utils.dumpObject() does it but there is example code there which reports whether the passed in Object is Java or JS.
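
Putting those suggestions together, a rough check might look like the sketch below. GraalJS also provides Java.isJavaObject(), which is the more direct test where the Java global is exposed; the fallback is the .class hack mentioned above, and neither is guaranteed foolproof:

function isJavaObject(obj) {
  // GraalJS built-in, the most direct test where available
  if (typeof Java !== 'undefined' && typeof Java.isJavaObject === 'function') {
    return Java.isJavaObject(obj);
  }
  // Fallback heuristic: Java objects expose a .class member, plain JS objects don't
  return obj !== null && obj !== undefined && obj.class !== undefined;
}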

As far as I understand, in JS, String, primitives, and ZonedDateTime are the only things that are converted back and forth by the language (and ZDT is handled by the add-on, not GraalVM).

But you actually need to run some code to do the conversion. For example, when the event Object is not converted to JS, event.itemName will be a Java String (and JS Strings and Java Strings behave differently). But you can convert it to JS using event.itemName.toString(). Just accessing a member of a Java Object isn't enough to do the conversion by itself, but when it's returned by a function, even if that function is Java, the conversion happens. All I can say is it's magic from my side.
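
In code, that conversion behavior would look something like this (a sketch of the description above, assuming a UI rule where event is the raw Java object):

var raw = event.itemName;           // still a java.lang.String at this point
var js = event.itemName.toString(); // returned from a function call, so it
                                    // arrives on the JS side as a JS string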

This weirdness is one of the reasons JS does so much to help users avoid Java as much as possible and why wrapping the event object for UI users finally is such a boon.

@jimtng
Contributor

jimtng commented Oct 10, 2025

OK I see now. You're talking about the stored object itself and not the cache. Sorry I'm slow to get this.

I've just tested storing a map object in JRuby: the map object itself is mutable, and it can be modified in both JRuby and JS. So they both work directly off the map object.

In a jruby-only implementation, locking the object can be done by creating a mutex and storing it in the shared cache. However, this does add an extra level of complexity that your proposed locking cache solution offers to simplify. This is even more so when dealing with cross-language locking, which, although still doable without it, would definitely benefit from the convenience/simplicity.
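
For reference, the same workaround can be sketched in JS by storing a plain Java lock in the existing shared cache; since java.util.concurrent.locks.ReentrantLock is an ordinary Java object, it should survive engine boundaries (untested sketch, key names are arbitrary):

// Get-or-create a shared lock object; the supplier only runs once
var lock = cache.shared.get('counterLock',
    () => new (Java.type('java.util.concurrent.locks.ReentrantLock'))());
lock.lock();
try {
  var n = cache.shared.get('counter') || 0; // read...
  cache.shared.put('counter', n + 1);       // ...and update under the lock
} finally {
  lock.unlock(); // never leave the entry locked
}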

@jimtng
Contributor

jimtng commented Oct 10, 2025

There is a problem we identified with JRuby, not related to threading. Each JRuby (or, for that matter, any language) script, whether it is a file or a UI script, is created inside a separate (JRuby) scripting engine. JRuby objects created in one engine cannot be used in another engine, because each engine maintains a separate context space and each would have a unique hashcode. The JRuby object hashcode from one engine isn't valid/recognised in another engine.

The previous test I mentioned above where Ruby stores a hash object into shared cache and it can be accessed in JS - well, guess what, it can't be accessed by another Ruby script properly!

That's why we warn users not to store complex objects into shared cache. This limitation doesn't apply to Java objects, hence why it's possible to store the timer object, which is a java object.

It sounds like JS has a somewhat similar limitation, albeit not exactly for the same reasons, but I'm not sure - @rkoshak / @florian-h05 can you confirm?

I tried storing a JS Object in the shared cache and got this warning:

It is not recommended to store the JS object with the key 'test1' in the shared cache, as JS objects must not be accessed from multiple threads. Multi-threaded access to JS objects will lead to script execution failure.

So the question is: how often will we encounter the average user storing and working with "mutable objects" in the shared cache? Are there any common use cases for this other than manipulating timers?

@rkoshak

rkoshak commented Oct 10, 2025

sounds like JS has a somewhat similar limitation, albeit not exactly for the same reasons, but I'm not sure - @rkoshak / @florian-h05 can you confirm?

Yes, a similar limitation but not the same.

Each Script is in its own context, which is the same. But a hash issue like you describe isn't the problem. You can access complex JS Objects from two different rules in JS. But the problem is that JS only lets one Java Thread touch an object at a time. If one rule is using the object, a multi-threaded exception will be thrown when a second rule tries to use it. But as long as they don't try to access it at the same time, it's all OK.

But of course there's no way to guarantee that "never" happens.

tried storing a JS Object in the shared cache and got this warning:

That warning was added based on an issue I filed when I asked if it were possible to lock the entries in the cache as is described here. The answer was "no, but we'll add a warning telling users not to put anything but primitives into the cache".

Are there any common use cases for this other than manipulating timers?

I have some classes that do things with timers, which are complicated JS Objects I'd like to be able to share. But I work around the limitation.

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

The JRuby object hashcode from one engine isn't valid / recognised in another engine.

The previous test I mentioned above where Ruby stores a hash object into shared cache and it can be accessed in JS - well, guess what, it can't be accessed by another Ruby script properly!

This was what the serializing cache was meant to address. When serializing objects to JSON before caching, the object retrieved wouldn't be the same instance, so no issues should arise, either from "invalid" hash codes or from Graal's "sharing forbidden" policy. It would allow sharing objects of any complexity, as long as the scripting language in question is able to serialize them to JSON. I think the scripting language/helper library would have to be responsible for serialization/deserialization; I imagine that doing that from Java on behalf of the scripting languages would be a huge mess. So, "behind the scenes", it would be a pure Java string cache holding JSON objects. It doesn't need locking, since the objects inserted and retrieved are always copies. For the same reason, these objects wouldn't be "live", so you would have to put() them after having modified them. But, you could still do that in compute() if you wanted an "atomic transaction" - though it would use the cache lock, with the same requirement for "quickness" as discussed above.
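
Until something like that exists, the same round trip can be done by hand, which is also exactly what the proposed cache would automate (sketch from the JS side):

cache.shared.put('state', JSON.stringify({ count: 1, tags: ['a'] })); // store a snapshot
var state = JSON.parse(cache.shared.get('state')); // a fresh copy, never "live"
state.count += 1;
cache.shared.put('state', JSON.stringify(state));  // must be written back explicitly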

There is always some drawback with every solution. It would be possible to equip the JSON cache with locking of course, but I don't think that would be worth it, given that you only need locks for "transaction control", not for "object protection".

So the question is: how often will we encounter the average user storing and working with "mutable objects" in the shared cache? Are there any common use cases for this other than manipulating timers?

I think those kinds of questions are hard to answer. People tend to "evolve" their use according to what is possible/works, so observation of current practices might have a hefty bias.

The reason I suggested potentially two new caches, instead of modifying the existing one, is partly because I think all these things might be difficult to combine into one in a way that wouldn't be very confusing to use, and partly because it would mean that nothing would change with the existing cache. That way, it wouldn't break anything; it would only be additional opportunities for those that have the initiative and interest to learn how to use them. Everybody else could just keep going as they do, within the limitations that exist there.

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

I've wondered how the script engines could know what objects were "native" and which were Java. I put an object into the cache from JS, and then attached my debugger to the Java cache.

Here is the object creation:

rules.JSRule({
  name: 'Cache Test',
  execute: (event) => {
    let obj = cache.shared.get('obj', () => {
      return {
        name: 'Complex Object',
        rating: 5.2,
        tags: ['tag1', 'tag2'],
        created: new Date(),
        nested: { a: 1, b: 2 },
        level: 15
      };
    });
    utils.dumpObject(obj);
  }
});

..and here is the result:

[screenshot: debugger view of the cached JS object as stored in the Java cache]

It is thus no longer a mystery how it can keep track of that. It's not stored as a traditional object; it holds all kinds of references to the script engine context and whatnot. It's not surprising how JS can track shared access of JS objects, or why Ruby refuses to share these across different rules/scripts. They aren't "data objects", they are more like instances that only have meaning in that particular engine and context.

What I don't understand is how @jimtng could work with the same object from both JS and Ruby. But, that was maybe a Java map?

@jimtng
Contributor

jimtng commented Oct 10, 2025

This was what the serializing cache was meant to address. When serializing objects to JSON before caching, the object retrieved wouldn't be the same instance, so no issues should arise, either from "invalid" hash codes or from Graal's "sharing forbidden" policy. [...]

So what's the difference between the serializing cache and the current shared cache?

What I don't understand is how @jimtng could work with the same object from both JS and Ruby. But, that was maybe a Java map?

Yes, the map was created in JRuby. JRuby's native Hash object is basically a Java map (it is an implementation of java.util.Map), which was then accessed by JS. The JS side could also manipulate the object.
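
That matches what one would expect from host interop; a sketch of the JS side (assuming the map was stored under the key 'java_map'):

var map = cache.shared.get('java_map'); // same java.util.Map instance Ruby created
map.put('key2', 'value2');              // mutates the shared map in place
console.log(map.get('key1'));           // -> value1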

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

So what's the difference between serializing cache vs the current shared cache?

The only technical difference is that the serializing cache only accepts Java String, while the current shared cache accepts Java Object (which means basically anything). Java strings are immutable, so it is impossible to store anything mutable in the serializing cache. So, they are mostly identical.

The serializing cache would need "help" from the scripting language for the actual serialization/deserialization though. I didn't envision that users should do the serialization themselves, but that it was handled automatically when things were stored or retrieved. If that's not possible to do, it's pointless.

The fact that it would do implicit serialization/deserialization would require that you access it in a "different way" than with the existing shared cache, because you still need to handle other types of values/objects, like Java objects or primitives. So, it might as well be a different cache. Also, just adding the serialization/deserialization as "new methods" to the existing shared cache would "introduce complications" in that you'd have to throw an exception (or is there another way?) if an object that wasn't a Java string were requested for retrieval (not to mention that the deserialization of "regular strings" would fail).

So, I think it would be most practical to have it as a separate entity, but it's completely possible to just extend the existing cache.

I need input from you "scripting language experts" to know if it's even possible to do this implicit serialization/deserialization, and how it could best be solved if possible.

@jimtng
Contributor

jimtng commented Oct 10, 2025

I need input from you "scripting language experts" to know if it's even possible to do this implicit serialization/deserialization, and how it could best be solved if possible.

It's pretty easy to do this automatically, or when requested, within jruby helper library, directly onto the existing shared cache. There are several mechanisms we can introduce that would be compatible with existing mechanisms for handling similar things. It won't need a separate cache class for this.

The same could be done for locking mechanisms as well should the need arise, although that would probably limit the cross-language interoperability, but IMO cross-language interoperability is probably not something you'd need often. None of the current jruby helper library maintainers actually use shared cache. We only added the support for feature parity.

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

It's pretty easy to do this automatically, or when requested, within jruby helper library, directly onto the existing shared cache. There are several mechanisms we can introduce that would be compatible with existing mechanisms for handling similar things. It won't need a separate cache class for this.

As I said above, I didn't think that there was a technical requirement to keep them separate. I think it might be more logical to work with, and it would limit the exceptions one might receive. But, these are just suggestions, if people think that it's easier to work with as a part of the existing cache, that's doable as well.

I was just about to try to figure out how to insert an object, like the Ruby hash, and potentially also a more "traditional object" into the shared cache from Ruby, so that I can check out how that looks in the debugger. It takes me quite some time to figure out these syntaxes, so maybe you could just share something I can copy/paste to do that?

@jimtng
Contributor

jimtng commented Oct 10, 2025

shared_cache["ruby_hash"] = { "key1" => "value1" }
shared_cache["java_map"] = java.util.HashMap.new({ "key1" => "value1"})

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

shared_cache["ruby_hash"] = { "key1" => "value1" }
shared_cache["java_map"] = java.util.HashMap.new({ "key1" => "value1"})

Don't I have to do anything to make the shared cache "available" (Import/require etc.)?

Also, is there something like a "Ruby Object" with key/value pairs similar to JS, or isn't that a thing? A record, struct, some similar concept?

@jimtng
Contributor

jimtng commented Oct 10, 2025

Don't I have to do anything to make the shared cache "available" (Import/require etc.)?

no. Just those two lines, nothing else.

Also, is there something like a "Ruby Object" with key/value pairs similar to JS

No there isn't. Ruby has Hash which is the same as java's Map interface. In fact in JRuby it's an implementation of java's Map interface.

@rkoshak

rkoshak commented Oct 10, 2025

It would allow sharing objects of any complexity, as long as the scripting language in question is able to serialize them to JSON.

What about a mix of JS/jRuby/whatever and Java? Those timer management classes I mentioned are written in JS but the timers themselves are Java.

While that would be unsuitable for this serialized cache, I can see other use cases where the complex Object might be a mix of Java and JS, particularly in scripting languages that don't do so much to hide the Java from the end users.

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

No there isn't. Ruby has Hash which is the same as java's Map interface. In fact in JRuby it's an implementation of java's Map interface.

In that case, the serializing cache might not be of much relevance to Ruby.. except perhaps for sharing hashes?

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

What about a mix of JS/jRuby/whatever and Java? Those timer management classes I mentioned are written in JS but the timers themselves are Java.

I can't quite answer that until I can see how the objects look in Java. Could you come up with some code that would let me create such a "mixed object", and I can store it in the cache and inspect it from Java? My guess is that these are still JS objects though, so they might be a hard nut to crack given that Graal won't allow these to be shared.

Wrapping them in a Java map won't help either as far as I can tell, because the JS "object entry" would still be the JS object. I think maybe you'd have to manually serialize the JS object, and then put it in the shared cache, leaving the Timer outside the serialization.

@Nadahar
Contributor Author

Nadahar commented Oct 10, 2025

Here is what the Ruby hash looks like from Java:

[screenshot: debugger view of the Ruby Hash as seen from Java]

So, while it might be based on a Java Map, it's definitely quite custom, and especially the metaClass entry has a lot of nodes, some of which I'm sure hold references to the rule engine/context. So, these probably have pretty much the same restrictions as JS objects. Serialization seems to be the way to share them.

The Ruby-created Java HashMap seems to be a standard Java HashMap, which internally is just a hash table:

[screenshot: debugger view of the Ruby-created java.util.HashMap]

@Nadahar
Contributor Author

Nadahar commented Oct 11, 2025

The whole serializing cache functionality could be implemented using the existing shared cache if handled by the helper libraries on storage, but I don't know how they could handle retrieval.

My thinking is that when asked to put() something, the helper lib could evaluate the object/value, and if it turned out to be a "script native object", it could automatically serialize it to JSON and store it as a Java string. The problem is retrieval - how can the helper library know that it should try to deserialize it? It can't just deserialize every Java string. Maybe some "magic bytes", similar to a BOM, could be added to the start of the string and used as a trigger for deserialization?

If the deserialization trigger could be solved, this could be a neat way to handle scripting language native objects. It doesn't handle what the locking cache addresses, but it does handle what the serializing cache does.
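
A sketch of that magic-prefix idea from the JS side; the prefix value is an arbitrary choice, and cachePut/cacheGet are hypothetical helper names, not an existing API:

const MAGIC = '\u0000json\u0000'; // unlikely to appear in ordinary strings

function cachePut(key, obj) {
  // A real helper library would choose this branch automatically for
  // script-native objects; everything else would be stored as-is
  cache.shared.put(key, MAGIC + JSON.stringify(obj));
}

function cacheGet(key) {
  var raw = cache.shared.get(key);
  if (typeof raw === 'string' && raw.startsWith(MAGIC)) {
    return JSON.parse(raw.substring(MAGIC.length)); // tagged: deserialize
  }
  return raw; // plain strings and other values pass through untouched
}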

@rkoshak

rkoshak commented Oct 11, 2025

Could you come up with some code that would let me create such a "mixed object"

Add to your existing JS Class an entry that is a Java Object.

      return {
        name: 'Complex Object',
        rating: 5.2,
        tags: ['tag1', 'tag2'],
        created: new Date(),
        javaObj: new Java.type('java.time.LocalDatetime').now(),
        nested: { a: 1, b: 2 },
        level: 15
      }});

@Nadahar
Contributor Author

Nadahar commented Oct 11, 2025

Add to your existing JS Class an entry that is a Java Object.

I'm getting an error:

Failed to execute rule Cache-Test-ebea05b6-73f7-4789-9a5f-4803c459dc78: TypeError: function type() { [native code] } is not a constructor function: TypeError: function type() { [native code] } is not a constructor function
        at <program> (test.js:36)
        at get (openhab.js:2)
        at execute (test.js:30)
        at doExecute (openhab.js:2)

LocalDateTime.now() is a static factory method, not a constructor, so maybe it can't be used with new? I tried without new too, but then I get a different error:

Failed to execute rule Cache-Test-5ad539bb-59c7-46ad-bd6e-5be2ca025282: TypeError: Access to host class java.time.LocalDatetime is not allowed or does not exist.: TypeError: Access to host class java.time.LocalDatetime is not allowed or does not exist.
        at <program> (test.js:36)
        at get (openhab.js:2)
        at execute (test.js:30)
        at doExecute (openhab.js:2)

Ah... it was LocalDatetime while it should be LocalDateTime

@Nadahar
Contributor Author

Nadahar commented Oct 11, 2025

I still get an error now if I use "new":

Failed to execute rule Cache-Test-d854fd3b-eda0-4053-88a0-1a42ae97ff49: TypeError: instantiate on java.time.LocalDateTime.now failed due to: Message not supported.: TypeError: instantiate on java.time.LocalDateTime.now failed due to: Message not supported.
        at <program> (test.js:37)
        at get (openhab.js:2)
        at execute (test.js:30)
        at doExecute (openhab.js:2)

If I drop "new" and use this:

rules.JSRule({
  name: 'Cache Test',
  execute: (event) => {
    cache.shared.remove('obj');
    let obj = cache.shared.get('obj', () => {
      let LocalDateTimeClass = Java.type('java.time.LocalDateTime');
      return {
        name: 'Complex Object',
        rating: 5.2,
        tags: ['tag1', 'tag2'],
        created: new Date(),
        javaObj: LocalDateTimeClass.now(),
        nested: { a: 1, b: 2 },
        level: 15
      };
    });
    utils.dumpObject(obj);
  }
});

.. I don't get an error. But I don't see the javaObj entry in the debugger either, so I'm not sure the field/property is actually created.

@Nadahar
Contributor Author

Nadahar commented Oct 11, 2025

I think it actually worked without "new", it just didn't show up where I expected it:

[screenshot: debugger view showing the javaObj entry inside the JS object]

Still, this is very much a "JS Object" with a child Java object, so you can't share it in the cache. I think it's a dead end if you do it the other way too, embed a JS object inside a Java object - it will probably work, but there will most likely be problems as soon as you try to access the JS object.
