@rjd15372
Member

@rjd15372 rjd15372 commented Nov 20, 2025

This PR restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This change
makes a backward-compatible Lua engine upgrade possible and also allows
Valkey to be built without the Lua engine.

Important: from a user's point of view, there's no difference in using
EVAL or FUNCTION/FCALL scripts. This PR is fully backward compatible
with respect to the public API.

The main code change is the move and adaptation of the Lua engine source
files from src/lua to src/modules/lua. The original Lua engine code is
adapted to use the module API to compile and execute scripts.

The main difference between the original code and the new code is the
serialization and deserialization of Valkey RESP values to and from
Lua values. In the original implementation, RESP values were parsed
directly from the client buffer; in the new implementation, they are
read from the ValkeyModuleCallReply object through its API.
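
To illustrate the conversion described above, here is a minimal, self-contained sketch in C. It does not use the real module API: `Reply`, `ReplyType`, and `reply_to_lua_literal` are hypothetical stand-ins for a reply tree obtained from a ValkeyModuleCallReply, and the output is Lua literal text rather than values pushed onto a Lua stack. The mapping follows the documented script conventions: integer to number, bulk string to string, array to table, status to a table with an `ok` field, error to a table with an `err` field, and null to `false`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a parsed call reply. The real module would walk
 * a ValkeyModuleCallReply through the module API instead. */
typedef enum {
    REPLY_INTEGER, REPLY_STRING, REPLY_STATUS,
    REPLY_ERROR, REPLY_NULL, REPLY_ARRAY
} ReplyType;

typedef struct Reply {
    ReplyType type;
    long long integer;         /* used by REPLY_INTEGER */
    const char *str;           /* used by REPLY_STRING, REPLY_STATUS, REPLY_ERROR */
    const struct Reply *elems; /* used by REPLY_ARRAY */
    size_t len;                /* element count for REPLY_ARRAY */
} Reply;

/* Appends s at out[*pos], truncating so that out stays NUL-terminated. */
static void append(char *out, size_t cap, size_t *pos, const char *s) {
    assert(cap > 0 && *pos < cap);
    size_t room = cap - *pos - 1; /* keep one byte for the terminator */
    size_t n = strlen(s);
    if (n > room) n = room;
    memcpy(out + *pos, s, n);
    *pos += n;
    out[*pos] = '\0';
}

/* Recursively renders one reply as Lua literal text (strings are not escaped). */
static void render(const Reply *r, char *out, size_t cap, size_t *pos) {
    char num[32];
    switch (r->type) {
    case REPLY_INTEGER: /* RESP integer -> Lua number */
        snprintf(num, sizeof num, "%lld", r->integer);
        append(out, cap, pos, num);
        break;
    case REPLY_STRING: /* bulk string -> Lua string */
        append(out, cap, pos, "\"");
        append(out, cap, pos, r->str);
        append(out, cap, pos, "\"");
        break;
    case REPLY_STATUS: /* status -> table with a single ok field */
        append(out, cap, pos, "{ok=\"");
        append(out, cap, pos, r->str);
        append(out, cap, pos, "\"}");
        break;
    case REPLY_ERROR: /* error -> table with a single err field */
        append(out, cap, pos, "{err=\"");
        append(out, cap, pos, r->str);
        append(out, cap, pos, "\"}");
        break;
    case REPLY_NULL: /* null -> Lua false */
        append(out, cap, pos, "false");
        break;
    case REPLY_ARRAY: /* array -> Lua table, elements in order */
        append(out, cap, pos, "{");
        for (size_t i = 0; i < r->len; i++) {
            if (i > 0) append(out, cap, pos, ", ");
            render(&r->elems[i], out, cap, pos);
        }
        append(out, cap, pos, "}");
        break;
    }
}

/* Entry point: writes the Lua literal for r into out, returns bytes written. */
size_t reply_to_lua_literal(const Reply *r, char *out, size_t cap) {
    size_t pos = 0;
    assert(cap > 0);
    out[0] = '\0';
    render(r, out, cap, &pos);
    return pos;
}
```

The point of the sketch is that the converter only needs a typed tree to walk, which is what a call-reply object provides, instead of re-parsing raw protocol bytes from a client buffer.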

The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and the server loads it automatically on startup.
Running make install installs the Lua engine module in the
default system library directory.
A new build option, BUILD_LUA, builds the Valkey server without the
Lua engine when set to no.

This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.

Fixes: #1627

@rjd15372 rjd15372 self-assigned this Nov 20, 2025
@rjd15372 rjd15372 added the enhancement New feature or request label Nov 20, 2025
@rjd15372
Member Author

This PR is rebased on top of #2836. When #2836 is merged, I'll update this PR.

This commit restructures the Lua scripting functionality by extracting
it from the core Valkey server into a separate Valkey module. This change
makes a backward-compatible Lua engine upgrade possible and also allows
Valkey to be built without the Lua engine.

**Important**: from a user's point of view, there's no difference in using
`EVAL` or `FUNCTION/FCALL` scripts. This PR is fully backward compatible
with respect to the public API.

The main code change is the move and adaptation of the Lua engine source
files from `src/lua` to `src/modules/lua`. The original Lua engine code is
adapted to use the module API to compile and execute scripts.

The main difference between the original code and the new code is the
serialization and deserialization of Valkey RESP values to and from
Lua values. In the original implementation, RESP values were parsed
directly from the client buffer; in the new implementation, they are
read from the `ValkeyModuleCallReply` object through its API.

The Makefile and CMake build systems were also updated to build and
integrate the new Lua engine module within the Valkey server build
workflow.
When the Valkey server is built, the Lua engine module is also built,
and the server loads it automatically on startup.
Running `make install` installs the Lua engine module in the
default system library directory.
A new build option, `WITHOUT_LUA`, builds the Valkey server without the
Lua engine when set.

This modular architecture enables future development of additional Lua
engine modules with newer Lua versions that can be loaded alongside the
current engine, facilitating gradual migration paths for users.

Signed-off-by: Ricardo Dias <[email protected]>
@zuiderkwast
Contributor

Wow!

Some initial comments:

  • For the build option, why not use the name BUILD_LUA (default yes), to align with the build options BUILD_TLS and BUILD_RDMA?
  • For backward compatibility, the lua module needs to load by default. To disable this auto-loading, we need a new config, don't we? I can't see it mentioned here.
  • Can the lua module be unloaded using MODULE UNLOAD?

@rjd15372
Member Author

> • For the build option, why not use the name BUILD_LUA (default yes), to align with the build options BUILD_TLS and BUILD_RDMA?

Sure, that makes more sense. I'll update the PR.

> • For backward compatibility, the lua module needs to load by default. To disable this auto-loading, we need a new config, don't we? I can't see it mentioned here.

I was thinking that if the lua module is built then it would always be loaded by default. But we can add a new config to disable auto-loading even if the lua module is built.

> • Can the lua module be unloaded using MODULE UNLOAD?

Yes

@zuiderkwast
Contributor

zuiderkwast commented Nov 20, 2025

> I was thinking that if the lua module is built then it would always be loaded by default. But we can add a new config to disable auto-loading even if the lua module is built.

Yeah, we can discuss it with the core team.

I believe in many contexts, such as in pre-built containers, the module is already built. Many users don't build their own binaries.

@rjd15372
Member Author

I think it makes sense to have that config option to disable auto-load.

@rjd15372
Member Author

> I think it makes sense to have that config option to disable auto-load.

@zuiderkwast I vote for implementing this option in a follow-up PR. This PR is already huge to review.

Signed-off-by: Ricardo Dias <[email protected]>
@codecov

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 81.03448% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.87%. Comparing base (48e0cbb) to head (283ac27).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/module.c | 74.28% | 9 Missing ⚠️ |
| src/scripting_engine.c | 92.30% | 1 Missing ⚠️ |
| src/server.c | 87.50% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2858      +/-   ##
============================================
+ Coverage     72.47%   73.87%   +1.40%     
============================================
  Files           129      125       -4     
  Lines         70537    68775    -1762     
============================================
- Hits          51121    50810     -311     
+ Misses        19416    17965    -1451     

| Files with missing lines | Coverage Δ |
|---|---|
| src/config.c | 78.76% <ø> (+0.32%) ⬆️ |
| src/eval.c | 87.46% <100.00%> (ø) |
| src/module.h | 0.00% <ø> (ø) |
| src/replication.c | 86.01% <ø> (-0.05%) ⬇️ |
| src/script.c | 89.03% <ø> (+8.89%) ⬆️ |
| src/scripting_engine.c | 56.88% <92.30%> (+3.51%) ⬆️ |
| src/server.c | 88.69% <87.50%> (+0.29%) ⬆️ |
| src/module.c | 25.48% <74.28%> (+15.71%) ⬆️ |

... and 22 files with indirect coverage changes

Signed-off-by: Ricardo Dias <[email protected]>
Contributor

@zuiderkwast zuiderkwast left a comment

Just some initial comments. I'll do another pass later. The change is not as huge as the +/- numbers indicate. Many lines are moved to other files.

The build options should be mentioned in the README.md.

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Contributor

@zuiderkwast zuiderkwast left a comment

This looks pretty solid in general.

My main concerns are about the build-time changes. Is there a risk that it will break compilation or installation for any user?

If Valkey is built with BUILD_LUA=no, then it doesn't automatically load a Lua module even if such a module is installed on the system later. Maybe that's not bad, but I just want to mention it.

@valkey-io/valkey-committers Does anyone else want to take a look, especially on the compile-time changes (Makefiles etc.)?

@rjd15372
Member Author

> My main concerns are about the build-time changes. Is there a risk that it will break compilation or installation for any user?

I tried my best not to break it, but another pair of eyes is needed to improve confidence in the changes.

> If Valkey is built with BUILD_LUA=no, then it doesn't automatically load a Lua module even if such a module is installed on the system later. Maybe that's not bad, but I just want to mention it.

Right. But if someone later installs the module separately, they can add a loadmodule line to valkey.conf to load it automatically.

Signed-off-by: Ricardo Dias <[email protected]>
@zuiderkwast zuiderkwast moved this to In Progress in Valkey 9.1 Nov 28, 2025
@zuiderkwast zuiderkwast added run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) release-notes This issue should get a line item in the release notes labels Dec 9, 2025
@zuiderkwast
Contributor

Do you know why the external tests are failing? I can see this in the logs:

Notice: nested start_server statements in external server mode, test must be aware of that!

When I check tests/unit/type/stream-cgroups.tcl, I see there's a nested start_server on line 1012, but it's not new. Anyway, I don't think nested start_server works on external server tests so we should add the tag external:skip, but in a separate PR. But I wonder why it started failing now and not in unstable.

@rjd15372
Member Author

rjd15372 commented Dec 9, 2025

> Do you know why the external tests are failing? I can see this in the logs:

I'm investigating. This is not the first time it has failed on the same assertion, so it is probably caused by this PR.

Contributor

@zuiderkwast zuiderkwast left a comment

A few random comments and questions. See below.

LUA_MODULE_INSTALL=install-lua-module

current_dir = $(shell pwd)
FINAL_CFLAGS+=-DLUA_ENGINE_ENABLED -DLUA_ENGINE_LIB=libvalkeylua.so

Contributor

What does the "ENGINE" part of these names add? I imagine just LUA_ENABLED and LUA_LIB are more concise and clear.

Suggested change
- FINAL_CFLAGS+=-DLUA_ENGINE_ENABLED -DLUA_ENGINE_LIB=libvalkeylua.so
+ FINAL_CFLAGS+=-DLUA_ENABLED -DLUA_LIB=libvalkeylua.so
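
For context on how such a flag can be consumed: the library name is passed to the compiler without quotes, so C code that needs it as a string has to stringify the macro. The sketch below assumes that approach; the actual server code may do this differently. `lua_engine_module_path` and the directory used in the usage note are hypothetical, and the `#ifndef` fallback only exists so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Fallback so the sketch compiles standalone. In the build, the value comes
 * from -DLUA_ENGINE_LIB=libvalkeylua.so (note: no quotes around the name). */
#ifndef LUA_ENGINE_LIB
#define LUA_ENGINE_LIB libvalkeylua.so
#endif

/* Two-level expansion: XSTR expands its argument first, then STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helper: writes "<dir>/<library name>" into buf.
 * Returns 0 on success, -1 if the result does not fit in cap bytes. */
int lua_engine_module_path(const char *dir, char *buf, size_t cap) {
    assert(dir != NULL && buf != NULL);
    int n = snprintf(buf, cap, "%s/%s", dir, XSTR(LUA_ENGINE_LIB));
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling `lua_engine_module_path("/usr/local/lib", buf, sizeof buf)` with a large enough buffer yields `/usr/local/lib/libvalkeylua.so`. One caveat of stringification is that it breaks if any token in the name happens to be a defined macro, which is an argument for passing a quoted string from the build system instead.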


LIBS= $(DEPS_DIR)/lua/src/liblua.a $(DEPS_DIR)/fpconv/libfpconv.a
SRCS= $(wildcard *.c)
OBJS= $(SRCS:.c=.o) sha1.o rand.o

Contributor

The lua module depends on sha1 and rand from Valkey and fpconv from Valkey deps? (You removed the dependency on sds, adlist and things like ll2string, though.) Is this a problem? An alternative is to provide SHA1, random and float-parsing in the module API.

Let's create a modules/lua/README.md file that explains how this module is independent (or not) of Valkey's source code, along with other relevant information about it. I can imagine future contributors adding dependencies on Valkey internals again unless there is some text describing the ideas here.

How do we handle these dependencies in an external Lua 5.4 module? Do we copy these dependencies to that module? (If the module and the core use different versions of e.g. fpconv, would it work?)

* Based on the following article (that apparently does not provide a
* novel approach but only publicizes an already used technique):
*
* https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920 */

Contributor

This link doesn't seem to work anymore (or it requires login). We can skip it or use the Wayback Machine. I found a link that works:

Suggested change
- * https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920 */
+ * https://web.archive.org/web/20150427221229/https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920 */

We could also just delete the link.

Comment on lines +239 to +240
old_s = s;
s = ValkeyModule_CreateStringPrintf(NULL, "%s %g", prefix, (double)lua_tonumber(lua, idx));

Contributor

In the TNUMBER case, isn't there an extra space added by the format "%s %g"? Before this change, it used sdscatprintf with the format "%g", which doesn't insert a space before the number.

I see old_s is freed at the end of the function. I think the code is easier to follow if we make this variable local to this case and free it locally. Same with the prefix variable: either move it to the local scope, or maybe we don't need it at all?

Suggested change
- old_s = s;
- s = ValkeyModule_CreateStringPrintf(NULL, "%s %g", prefix, (double)lua_tonumber(lua, idx));
+ ValkeyModuleString *old_s = s;
+ s = ValkeyModule_CreateStringPrintf(NULL, "%s%g",
+                                     ValkeyModule_StringPtrLen(s, NULL),
+                                     (double)lua_tonumber(lua, idx));
+ ValkeyModule_FreeString(NULL, old_s);
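
The extra-space concern can be reproduced in isolation with plain `snprintf`. This is only an illustration of the two format strings, not the module code; the function names and the `"x="` prefix in the usage note are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Format used in the PR: a space is emitted between the prefix and the number. */
int format_with_space(char *out, size_t cap, const char *prefix, double value) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%s %g", prefix, value);
}

/* Format matching the old append-style behaviour: the number follows the
 * prefix directly, as appending with "%g" alone would do. */
int format_without_space(char *out, size_t cap, const char *prefix, double value) {
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%s%g", prefix, value);
}
```

With the prefix `"x="` and the value `1.5`, the first function produces `x= 1.5` and the second produces `x=1.5`, so switching from an append with `"%g"` to a rebuild with `"%s %g"` changes the output by one character.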


static void _serverPanic(const char *file, int line, const char *msg, ...) {
fprintf(stderr, "------------------------------------------------");
fprintf(stderr, "!!! Software Failure.");

Contributor

Suggested change
- fprintf(stderr, "!!! Software Failure.");
+ fprintf(stderr, "!!! Software Failure. Press left mouse button to continue.");

It's from Amiga. Left button doesn't make sense anymore but it's part of the original message. Do you think it's silly to keep it?

Alternative:

Should we add a ValkeyModule_Panic wrapper, similar to ValkeyModule_Assert?
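
A rough sketch of what such a wrapper could look like, written in plain C so it stands alone. Nothing here is part of the module API: `panic_message` and `MODULE_PANIC` are hypothetical names, and a real wrapper would more likely route the message through the server's logging and crash reporting rather than `stderr` and `abort()`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Builds "<file>:<line>: <formatted message>" in buf.
 * Returns the untruncated length, or -1 if the location prefix alone does
 * not fit or formatting fails. buf is NUL-terminated whenever cap > 0. */
int panic_message(char *buf, size_t cap, const char *file, int line,
                  const char *fmt, ...) {
    va_list ap;
    int used = snprintf(buf, cap, "%s:%d: ", file, line);
    if (used < 0 || (size_t)used >= cap) return -1;
    va_start(ap, fmt);
    int n = vsnprintf(buf + used, cap - (size_t)used, fmt, ap);
    va_end(ap);
    return n < 0 ? -1 : used + n;
}

/* Hypothetical panic macro: records the call site, prints the message and
 * aborts. The first argument must be a printf-style format string. */
#define MODULE_PANIC(...)                                                    \
    do {                                                                     \
        char msg_[256];                                                      \
        panic_message(msg_, sizeof msg_, __FILE__, __LINE__, __VA_ARGS__);   \
        fprintf(stderr, "%s\n", msg_);                                       \
        abort();                                                             \
    } while (0)
```

A call such as `MODULE_PANIC("unexpected reply type %d", type);` would then replace a hand-written `_serverPanic` in the module. Keeping the message formatting in a separate function makes it testable without terminating the process.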

Labels

enhancement New feature or request release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[NEW] Move Lua scripting engine into an external Valkey module

2 participants