- 
                Notifications
    You must be signed in to change notification settings 
- Fork 23
Valkey triggers proposal #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 3 commits
3feff43
              037ec1e
              d3db587
              969c451
              eb0ffe7
              2ee0409
              0ee10e8
              5f6640b
              5aa4d29
              5bab53b
              321b769
              c37e26a
              cd4cfef
              b905acf
              57a0df4
              3f720ce
              4329189
              3273a7f
              efd9a2a
              4eca10c
              cebd676
              5db2bbc
              2ea37a9
              2868333
              3becac9
              0b74502
              c682a75
              059dede
              4768e32
              c9db2e5
              d1af248
              cbc13b6
              bb1bddf
              85d89e7
              1c9181e
              8f1bfe2
              9fb30ae
              0570091
              File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,234 @@ | ||||||
| --- | ||||||
| RFC: <PR number> | ||||||
| Status: <Proposed> | ||||||
| --- | ||||||
|  | ||||||
| # Valkey Triggers RFC | ||||||
|  | ||||||
| ## Abstract | ||||||
| Valkey triggers are persistent procedures which are executed in response to some keyspace event. | ||||||
| Today [Valkey keyspace Notifications](https://valkey.io/topics/notifications/) offer the ability to publish events to dedicated pub/sub channels. | ||||||
| Valkey triggers add the ability to place extending logic on such keyspace events and to localize actions on the server side reducing the need to monitor and react to events on the application side. | ||||||
| We propose to extend the existing [Valkey Functions](https://valkey.io/topics/functions-intro/) with the ability to define trigger function call on specific keyspace event. | ||||||
|         
                  ranshid marked this conversation as resolved.
              Outdated
          
            Show resolved
            Hide resolved | ||||||
| As being integrated in the Valkey Functions infrastructure, Triggers are persisted as part of the user data. | ||||||
|         
                  ranshid marked this conversation as resolved.
              Outdated
          
            Show resolved
            Hide resolved | ||||||
|  | ||||||
| ## Motivation | ||||||
|  | ||||||
| 1. ***Localized actions.*** | ||||||
| Today different application use cases rely on [Valkey keyspace Notifications](https://valkey.io/topics/notifications/) in order to manage the database. For example, consider a case were a key is referenced in different places (like SortedSet and lists). In order to support TTL logic a cleanup needs to be made when the key is evicted to remove it from the different lists or sets. It is possible to have the application listen for eviction events and perform the cleanup, however this would require the application to rely on non-persistent subscribe connections, implement application side logic and extra network hops, which can cause the cleanup to be operated long after the key was evicted. | ||||||
|         
                  ranshid marked this conversation as resolved.
              Outdated
          
            Show resolved
            Hide resolved | ||||||
| For example lets take a commonly used pattern of scheduled tasks. | ||||||
| In such cases user usually register tasks in a ZSET with the matching executing time as the score. | ||||||
|         
                  ranshid marked this conversation as resolved.
              Outdated
          
            Show resolved
            Hide resolved | ||||||
| Usually what is being done is setting a puller job in the application side which periodically reads all items in the set which has score smaller than the current timestamp, and issue an eval/fcall back to redis with the relevant task to run. | ||||||
|         
                  ranshid marked this conversation as resolved.
              Outdated
          
            Show resolved
            Hide resolved          | ||||||
| Usually what is being done is setting a puller job in the application side which periodically reads all items in the set which has score smaller than the current timestamp, and issue an eval/fcall back to redis with the relevant task to run. | |
| Usually what is being done is setting a puller job in the application side which periodically reads all items in the set which has score smaller than the current timestamp, and issue an eval/fcall back to Valkey with the relevant task to run. | 
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This image could easily be replaced by the actual code. Then the code would be searchable. Replace instances of "redis" in the code with "server".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please. The use of images is also inaccessible.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something went wrong with the line wrapping here.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on previous image.
The function is on_event and the trigger is on_set.  Will it be possible for them both to be on_set?
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very important and subtle issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think users who are using ACLs are probably used to to taking such decisions. It's not like the user will define a trigger and will just not consider the access requirements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the logic more restrictive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I used the wrong phrase. Managing modules require much more effort to maintain.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are triggers limited only to keyspace events? Could you also trigger on an event through some arbitrary channel/pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SO I thought of the keyspace events since they are already defined and known to users, so I think it would make it easier for users to adopt. I am open to other suggestions.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments on previous images.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm a little confused here. It appears that you're using something different from the keyspace/keyevent notification format? Why not the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The event description is identical to the channel name used to report the keyspace events.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this deserves a much more detailed discussion. What is atomicity in general? What atomicity promises does Valkey currently make? How will these promises change when triggers are being used?
According to Atomicity, "Atomicity is the guarantee that series of database operations in an atomic transaction will either all occur (a successful operation), or none will occur (an unsuccessful operation)."
My understanding is that Valkey only offers a very limited form of atomicity. For simple commands, Valkey matches the general definition. MULTI-EXEC transactions can fail in the middle, so that changes prior to the failure happen, while changes after the failure do not. Can a trigger cause a MULTI-EXEC transaction to fail in the middle?
Could a trigger cause a simple command to fail in the middle?  For example, could a trigger be used to check that each new element of list L is an integer, so that LPUSH L 1 X 2 would fail?  If so, would 1 be pushed and not X or 2?
For example, would it be possible to define a trigger that causes SET X 9 to fail?  If so, then the command MSET A 1 X 9 B 2 would also fail.  Should this have no effect, set A only, or set A and B?
It seems to me that in order to achieve any kind of atomicity, the trigger must be executed inside the same lock that guards the database update and scheduling of pubsub event notifications. Note: modules provide callbacks that will also be executed before this lock is released while pubsub event notifications are delivered after this lock is released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that Valkey only offers a very limited form of atomicity. For simple commands, Valkey matches the general definition. MULTI-EXEC transactions can fail in the middle, so that changes prior to the failure happen, while changes after the failure do not. Can a trigger cause a MULTI-EXEC transaction to fail in the middle?
It is not just about "real fail" it is about logical failure. MULTI-EXEC can assume nothing changed once they were issued (using the key watch) however the key might still be changed via triggers while the transaction is executed. Maybe this is wanted and maybe not.
It seems to me that in order to achieve any kind of atomicity, the trigger must be executed inside the same lock that guards the database update and scheduling of pubsub event notifications. Note: modules provide callbacks that will also be executed before this lock is released while pubsub event notifications are delivered after this lock is released.
So it is true that the module notifications are "delivered" once the action is performed. the keyspace notifications are also "written" at the same time, however the application receives them only later on and needs to handle the potential lagging in it's knowledge of the data state. In both options I described, the triggers are executed under the same "lock" meaning as part of the same "ExecutionUnit", the only question is if it should be done in the middle of the ExecutionUnit or at the end of it.
IMO keeping the same way application reacted to keyspace operations without having to operate a complete late roundtrip following keyspace notification would greatly simplify the understanding of this feature, while also providing the ability to make sure it will be executed in the same ExecutionUnit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| It means allowing a write call performed in the middle of operating another command. Another (technical) issue is related to nested scripts. Currently there is a [protection](https://github.com/valkey-io/valkey/blame/unstable/src/script_lua.c#L891) in Valkey code preventing to execute lua script from a scripts. | |
| it means allowing a write call performed in the middle of executing another command. Another (technical) issue is related to nested scripts. Currently there is a [protection](https://github.com/valkey-io/valkey/blame/unstable/src/script_lua.c#L891) in Valkey code preventing the execution of lua scripts from inside scripts. | 
You wrote: "keyspace events are triggered immediately after the operation which triggered them". So, "If we allow the triggers to be executed when the keyspace event has been triggered", then the trigger is executed after the operation. That means any writes inside the trigger-function are not "in the middle of executing another command". Perhaps I have misunderstood something. Please provide more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I was not clear. Lets take the msetnx command (for example) the command will issue keyspace event after each key set.
Now lets say I have:
MSETNX a 1 b 2 c 3 and I have a trigger which is set to do setnx c 0 whenever 'a' is set.
so if we allow the trigger code to run while the event is reported we will get that: a=1, b=2 and c=0
If we allow the trigger to run only at the end of the executionUnit we will get: a=1, b=2 and c=3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work with [WATCH]MUTLI/EXEC? Does it interrupt a transaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to discuss this section. IMO we should NOT break atomicity with triggers, although this was the way I implemented the PoC. allowing triggers during scripts and transactions can indeed lead to much confusion but alternatively, queuing them and allow them to be executed after the transaction can be problematic since the trigger context might be lagging behind the data. I agree this part is the most problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want both pre and post triggers? A pre-trigger would be helpful for enforcing constraints.
Should we provide access to the value of the key, not just its name, either before or after the command was executed (or possibly both)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally thought so, but for the first stage we can consider simplifying things and evolve later?
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on you examples, "the way applications are currently reacting to keyspace events" is completely external to the Valkey server. I think we need an in-depth discussion of how closely we want triggers to "follow" that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am uncomfortable with this approach. The idea that a trigger will sometimes not be executed seems very confusing.
I agree that fully recursive triggers offer yet another way to shoot yourself in the foot, but I would like a clearer idea of who we expect to be writing triggers and how they will be used. It would be helpful to have a much richer set of examples that go into the same kind of depth as your scheduling example.
If you simply want to prevent loops, another option (which still makes me uncomfortable) would be to track the call stack and refuse to execute a trigger that is already executing.
No matter what we do, as long as the language used to define triggers is sufficiently powerful, there will be crashes, deadlocks, and other very bad things. It seems to me that we must draw a distinction between triggers in development and triggers in production. We should expect and allow developers to code bad triggers while they are building and testing an application and provide good support (as you describe below) for debugging.
If a bad trigger reaches production (perhaps it triggers itself, causing an infinite loop), what should happen? If it simply aborts the current transaction, will it leave the data in a consistent state? Should it save the data, then terminate the server? When a bad module is loaded in production we make no promises at all - should we do the same for bad triggers? If we are making promises with respect to bad triggers, what exactly are those promises (e.g., they will never crash the server, or they will never enter infinite loops, etc.)?
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| By default the scope of the user will by the superuser which will not be limited by any authentication restrictions. | |
| By default the scope of the user will be the superuser, which will not be limited by any authentication restrictions. | 
This seems exceedingly dangerous. I would be more comfortable with no default, so that the developer is more likely to be aware of the risks. Or we could default to a user without write permissions, so that the developer must do something to explicitly authorize entry into a higher-risk domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should default to least privilege rather than greatest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should happen if the trigger issues a command that is denied by the ACL restrictions?
The register_trigger function looks like this in one of the images:
register_trigger(trigger-name, keyspace-event, function)
(Note: because it is an image, you cannot search for it.
register_trigger could look at the keyspace-event and require that an ACL user is specified when the event does not have an associated ACL user (e.g., eviction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should happen if the trigger issues a command that is denied by the ACL restrictions?
So the trigger will fail in this case.
register_trigger could look at the keyspace-event and require that an ACL user is specified when the event does not have an associated ACL user (e.g., eviction).
I am sorry - you are right, these images were taken from over 2 years ago when I created a PoC which did not have the ACL addition yet. You are right that the register_trigger should be able to accept a username to run in it's context. this does create some dependency on the ACL and function loading though....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply TRIGGERS [ON|OFF]
How about a command to enable or disable specific triggers?  TRIGGER NAME [ON|OFF]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted this as a way to prevent triggering on a specific client which is probably an admin client or something called from a script. in order to generically disable triggers we can have a global config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a command to enable or disable specific triggers? TRIGGER NAME [ON|OFF]
We can extend that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All or nothing? What if someone wanted to disable some triggers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can extend this in the future if we want. the point of this ability is that the developer/admin wants to access the data and have no interference of triggers while doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a "specific client" in this sentence? Is it intended to only disable triggers that would have been fired by the current client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they should be reset at the same time that function-statistics are reset.
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context I take the user to be a developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
        
          
              
                  ranshid marked this conversation as resolved.
              
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't the log sufficient?
Based on how errors are counted, it seems that a triggered function might attempt to report an error that should be delivered to the client. This suggests that the message will not normally be delivered. What would be delivered?
I have not worked with LUA or EVAL in the past. It seems likely that the answers to some of my questions would be obvious if I had. Please point me at any reading you think would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the trigger execution is hidden from the user we cannot just return it as an error.
For the case when the trigger is issued based on internal events (eg evictions) there is no request context to issue the error to.
For case were the event was issued as part of the complex command flow (eg script or transaction),
we have 2 options to consider trigger execution:
- without breaking the action atomicity (in which case the trigger will be executed AFTER the command was executed) in which case the response to the command was already sent or written to the COB, so not much point or ability to return it to the client.
- we can execute the trigger as soon as the event is reported, practically breaking the atomicity and in this case I agree there is a problem we need to consider how to handle trigger failure, since it potentially already changed the data.
Uh oh!
There was an error while loading. Please reload this page.