
Programmable HTTP Access Phase and Request Body Reading #1044

Draft
xeioex wants to merge 2 commits into nginx:master from xeioex:nginx_body_routing_done

Conversation

Contributor

@xeioex xeioex commented Apr 3, 2026

Programmable HTTP Access Phase and Request Body Reading

Overview

Two new capabilities let JavaScript handlers participate in request
processing before the content phase begins:

  • js_access -- registers a handler in the HTTP access phase for
    authorization, routing, and request preprocessing.
  • r.readRequestText(), r.readRequestArrayBuffer(),
    r.readRequestJSON()
    -- async methods that read and cache the
    request body, available in any HTTP handler.

Together they enable decisions based on headers, arguments, variables,
and the request body -- all resolved before content generation or
proxying starts.

js_access Directive

js_access <module.function>;

Context: http, server, location

The handler runs in NGX_HTTP_ACCESS_PHASE, before content handlers
(js_content, proxy_pass, fastcgi_pass, etc.) and after built-in
access checkers (allow/deny, auth_basic, auth_request).
Configuration inherits from outer to inner blocks.
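As a configuration sketch, wiring a handler into the access phase might look like this (the module path, module name, and function name are illustrative, not from the patch):

```nginx
# Illustrative wiring: gate.js exports a check() function that is
# registered in the access phase; proxy_pass only runs if the handler
# allows the request.
http {
    js_import gate from conf/gate.js;

    server {
        listen 80;

        location / {
            js_access gate.check;        # runs in NGX_HTTP_ACCESS_PHASE
            proxy_pass http://backend;   # content phase, reached on allow
        }
    }
}
```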

  • Subrequests are skipped -- only the main request invokes the
    handler.
  • A synchronous handler can set variables, call r.return(status)
    to reject the request, or simply return to continue to the next phase.
  • An async handler (returning a Promise) suspends the request until
    the Promise settles, enabling ngx.fetch(), r.subrequest(),
    setTimeout(), and body reading without blocking the event loop.
  • An unhandled exception produces 500 Internal Server Error.
  • If the handler does not call r.return(), processing continues
    normally to the content phase (the handler returns NGX_OK to the
    access phase checker).
  • r.decline() signals "no opinion" (NGX_DECLINED), deferring the
    access decision to other checkers. Useful with the satisfy any
    directive when the handler should not grant or deny by itself.
  • r.return(301|302|303|307|308, url) sends a redirect directly
    to the client, enabling authentication flows (OIDC, SAML) that
    redirect unauthenticated users to an identity provider.
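A minimal synchronous handler following these rules might look like the sketch below (the header check and the function name are illustrative, not part of the patch):

```javascript
// Hypothetical access handler: deny requests without an Authorization
// header, otherwise fall through (a normal return maps to NGX_OK and
// the request continues to the content phase). In an njs module this
// function would be exposed via export default.
function gate(r) {
    if (!r.headersIn['Authorization']) {
        r.return(401);   // reject directly from the access phase
        return;
    }
    // no r.return() call: processing continues to the content phase
}
```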

Interaction with satisfy

js_access participates in the satisfy directive like other access
phase modules:

  • satisfy all (default) -- all access checkers must allow. A
    handler that completes normally returns NGX_OK (allow); r.return(403)
    denies; r.decline() passes through without granting or denying.
  • satisfy any -- at least one checker must allow. NGX_OK from
    js_access grants access even if deny all is configured.
    r.decline() expresses "no opinion", letting other modules decide.
| JS handler action | Phase return | satisfy all | satisfy any |
| --- | --- | --- | --- |
| normal return | NGX_OK | allow, next checker | grant, skip rest |
| r.return(403) | 403 | deny | save error, try next |
| r.decline() | NGX_DECLINED | next checker | next checker |

Request Body Reading

Three async methods read and cache the request body:

| Method | Returns |
| --- | --- |
| r.readRequestText() | Promise&lt;string&gt; |
| r.readRequestArrayBuffer() | Promise&lt;ArrayBuffer&gt; |
| r.readRequestJSON() | Promise&lt;object&gt; |
  • The body is read once and cached; repeated calls return the same data.
  • Concurrent reads from different methods are rejected with an error.
  • The body remains available for downstream phases (proxy_pass
    forwards it unchanged).
  • Works with chunked transfer encoding, large bodies (respects
    client_body_buffer_size), client_body_in_file_only, and
    client_max_body_size enforcement.
  • Available in both js_access and js_content handlers.
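As a hedged sketch of the call pattern in an async handler (the empty-body policy is illustrative; r.readRequestText() is stubbed in the test, in a real njs handler it is the proposed method):

```javascript
// Illustrative async access handler: read (and cache) the request body,
// then reject requests with an empty payload. Downstream phases still
// see the original body.
async function require_body(r) {
    const text = await r.readRequestText();

    if (text.length === 0) {
        r.return(400);   // empty body: reject before the content phase
    }
}
```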

Examples

Authentication with an External Service

async function auth(r) {
    let resp = await ngx.fetch(
        `http://auth-service/check?token=${r.args.token}`);

    if (resp.status !== 200) {
        r.return(resp.status);
        return;
    }

    r.variables.user = await resp.text();
}
location /api/ {
    js_access auth.auth;
    proxy_pass http://backend;
}

The access handler calls an auth service via ngx.fetch(). On failure
the request is rejected immediately; on success a variable is set for
downstream use. The content handler (proxy_pass) only runs after
authentication succeeds.

Authentication with a Subrequest

async function auth(r) {
    let reply = await r.subrequest(
        '/auth_check?token=' + r.args.token);

    if (reply.status !== 200) {
        r.return(reply.status);
        return;
    }

    r.variables.user = reply.responseText;
}

Same pattern using an internal subrequest instead of an outbound fetch.

Dynamic Upstream Routing

function route(r) {
    r.variables.upstream = (r.args.dest === 'one')
        ? '127.0.0.1:8081' : '127.0.0.1:8082';
}
js_var $upstream;

location /route {
    js_access test.route;
    proxy_pass http://$upstream;
}

The access handler computes a routing variable synchronously; proxy_pass
evaluates it after the access phase completes.

Body-Based Access Control

async function body_gate(r) {
    let body = await r.readRequestJSON();

    if (body.role === 'admin') {
        r.return(403);
        return;
    }

    r.variables.foo = body.method + ':' + body.name;
}
js_var $foo;

location /api {
    js_access policy.body_gate;
    proxy_pass http://backend;
}

The request body is parsed as JSON in the access phase. The policy
decision is made before the request reaches the backend. The body is
preserved and forwarded to proxy_pass unchanged.

Body-Driven Routing

const backends = {
    us: '127.0.0.1:8081',
    eu: '127.0.0.1:8082',
};

async function route_by_body(r) {
    let body = await r.readRequestJSON();

    r.variables.upstream = backends[body.region]
                           || '127.0.0.1:8083';
}
js_var $upstream;

location /route {
    js_access routing.route_by_body;
    proxy_pass http://$upstream;
}

Combines body reading with dynamic routing: the request is parsed once
and the upstream is selected based on a field in the payload.

Variable Enrichment

function enrich(r) {
    r.variables.foo = 'client=' + r.remoteAddress
                      + ';method=' + r.method;
}
js_var $foo;

server {
    js_access utils.enrich;

    location / {
        proxy_set_header X-Request-Info  $foo;
        proxy_pass http://backend;
    }
}

A server-level access handler enriches every request with computed
variables. Downstream locations consume them for header injection,
logging, or routing without duplicating the logic.

Access-Phase Redirect

async function auth(r) {
    let resp = await ngx.fetch(
        `http://auth-service/validate?token=${r.args.token}`);

    if (resp.status !== 200) {
        r.return(302, 'https://idp.example.com/login?rd=' + r.uri);
        return;
    }

    r.variables.user = await resp.text();
}
location /protected/ {
    js_access auth.auth;
    proxy_pass http://backend;
}

When authentication fails the handler redirects the client to an
identity provider. On success the request continues to proxy_pass.

Combining with satisfy any

function check_api_key(r) {
    if (r.headersIn['X-API-Key'] === 'secret') {
        return;       /* NGX_OK -- grant access */
    }

    r.decline();      /* no opinion, let other checkers decide */
}
location /api/ {
    satisfy any;

    allow 192.168.1.0/24;
    deny all;

    js_access auth.check_api_key;
    proxy_pass http://backend;
}

Requests with a valid API key are allowed regardless of IP. Without a
key the handler declines and the IP-based allow/deny rules apply.


Questions for Reviewers

This is a draft. We are collecting feedback before finalizing the API.
Please share your thoughts on any of the following:

  1. Naming -- are readRequestText(), readRequestArrayBuffer(), and
    readRequestJSON() clear and consistent? Would you prefer shorter
    names or a different convention?

  2. Use cases -- which of the examples above match problems you are
    solving today? Are there scenarios you would need that are not
    covered?

  3. Body reading in access phase -- reading the request body before
    proxying adds latency and buffering. Is this trade-off acceptable
    for your workloads? Would you want configuration knobs (e.g. max
    body size for access-phase reads)?

  4. Error handling -- the handler can call r.return(status) to
    reject a request or let an unhandled exception produce 500. Is this
    sufficient, or do you need more control over error responses from
    access handlers?

  5. Interaction with other phases -- do you foresee issues combining
    js_access with auth_request, js_content, header/body filters,
    or other modules?

  6. Missing features -- is there anything you would expect from an
    access-phase handler that is not present here?

@xeioex xeioex changed the title Nginx body routing done Programmable HTTP Access Phase and Request Body Reading Apr 3, 2026
xeioex force-pushed the nginx_body_routing_done branch from 930f611 to 3145d7c on April 3, 2026 02:12
xeioex added 2 commits April 6, 2026 16:42
The js_access directive registers a JavaScript handler in the
HTTP ACCESS phase.
Added async methods:

    - r.readRequestText() as string
    - r.readRequestBuffer() as ArrayBuffer
    - r.readRequestJSON() as object

that return Promises resolving with the request body wrapped as a corresponding type.
xeioex force-pushed the nginx_body_routing_done branch from 3145d7c to a6f3608 on April 6, 2026 23:42
Contributor Author

xeioex commented Apr 6, 2026

Hi @lancedockins,

you may find it useful.


lancedockins commented Apr 7, 2026

Thanks, @xeioex. This is awesome. Since I first raised that suggestion, we had to find a workaround to do the routing and protection that we needed, but I'd certainly prefer to do this via an access phase filter than via a workaround.

Off the cuff, the only thing that sounds like it's overtly missing from this sort of implementation is a body "parser". OpenResty has this for url encoded body types:
https://github.com/openresty/lua-nginx-module?tab=readme-ov-file#ngxreqget_post_args

While it's certainly great if you can constrain your use of this function to something like JSON bodies, that's not really up to us. We have to work with POST bodies of varying types.

That might dictate the need for a few other attributes or utility functions. It certainly did for us when we created this type of logic.

In our case, we needed this regardless of form submission type - not because we're concerned about the attachments that come with the uploads but because we care about the standard field types that precede those attachments. Because of that, we had to write our own body parse logic that determines whether a POST is url encoded or multipart and then populates an object with key value pairs based on that non-attachment data. For our purposes, we "discard" the attachment in the sense that we don't try to parse that into a JSON object (obviously we don't strip the attachment). If you were to institute this, you might need to add a configurable threshold to limit the size of POST bodies that you try to parse. That's what we did. We had to construct body size detection functions that return the size or false depending on whether they exceed the threshold or not.

Ultimately we were able to achieve all of our existing implementations through NJS and creative Nginx configuration. But I suspect that it would be better and faster as a proper C implementation.

Functions that I would consider important for this feature to be "complete":

  • Some sort of body parse function for plain text POST bodies - either for url encoded POST types or the url encoded portions of multipart form submissions
  • A body size attribute or function
  • A configurable threshold for body parsing to prevent CPU exhaustion trying to parse excessively large POST bodies
  • A POST body hashing function that can be used to calculate something like an MD5 hash of the POST body. Obviously this would work in tandem with the POST body threshold limit so that it doesn't try to hash excessively large POST bodies. If you're trying to use the access phase filters for security or access gating, that hash can be useful

Beyond that, I don't really see any issues that would prevent us from using this in the way that we intend.

The only question that I have with what you've shared is the sync vs async nature of some elements here. It appears that you're saying that any "body read" would require an async call. That makes sense given the I/O involved, but is it correct to say that it's only async insofar as it needs to be to read or process the backend I/O? In other words, it's still ultimately a blocking filter? It just refrains from blocking other request processing that could occur within the access phase? But hypothetically if the backend I/O was particularly slow, it would eventually net out to a situation where everything that is non-blocking in the access phase completes and the backend I/O call ultimately becomes a blocking event in the access phase rather than passing the request onto the next phase? Given the intent of this sort of filter, that would be my assumption as you obviously wouldn't want a slow backend process to function like an access phase bypass.

I may have some other thoughts here. But those are my initial thoughts and those are coming from heavy use of this type of functionality in our stack (albeit via workarounds and custom NJS code).


xeioex commented Apr 8, 2026

@lancedockins

> it's still ultimately a blocking filter?

No, this is a non-blocking filter. Sync vs async is purely a JS syntax matter: using await r.readRequestText() requires the access handler to be declared as an async function.

@lancedockins

> No, this is a non-blocking filter. Sync vs async is purely a JS syntax matter: using await r.readRequestText() requires the access handler to be declared as an async function.

So just to be clear, when you say that it is non-blocking, does that mean that race conditions are possible? Like hypothetically if your access filter was slow, could the content phase be reached such that the access filter never declines access even if it would ultimately resolve to do so? An access filter that could be bypassed strictly by a race condition would not be an access filter that I would trust sufficiently to use it. So that is a significant detail that I want to understand correctly.


xeioex commented Apr 8, 2026

@lancedockins

Thanks, this is very helpful feedback.

A few points from the current implementation/API surface:

  • Body hashing is also already possible in JS today after reading the body, using either the legacy crypto module (md5, sha1, sha256) or WebCrypto for SHA-family digests.

I am planning to add a FormData()-like API to support both application/x-www-form-urlencoded and multipart/form-data as r.readRequestForm().

It should close the form gap.

> A POST body hashing function that can be used to calculate something like an MD5 hash of the POST body.

How do you imagine it working? Read the body -> hash, or something else?

> A body size attribute or function

Will let body = await r.readRequestBody(); body.length be enough? With Content-Length we know the size already from r.headersIn; with chunked encoding we have to read the body first.

I am planning to add a streamable body API, but not in this scope. Maybe this will help.

@lancedockins

Thanks, @xeioex.

Understood on body hashing. That's what we do now so I guess that's technically a duplicate feature request. This API can certainly live without that. Basically we do read the body and then hash it (excluding attachments).

I don't think that everyone will always need or want the body w/ its attachments. Speaking for our use, we mainly need the key/value data in the form POST rather than the attachments, but there are certainly situations where the attachment part of the client body could be important too so there might need to be a way to retrieve the attachment (optionally) for some users. Since Lua doesn't include that in their get_post_args function either and it seems like more of a niche need, though, I don't know that it strikes me as a requirement so much as a nice eventuality.

Regarding FormData or something similar that sort of parsing functionality is absolutely critical for this from my perspective so I think that that's a great and necessary addition. Admittedly, I'm not a fan of the FormData API. It's overly cumbersome as it forces key/value pair requests through getters and setters rather than just treating the data like an object that inherently contains key value pairs that you can call via objectname.property or object name['property']. I think it's geared more towards form data manipulation rather than basic read operations. A standard JS object would be a lot more natural here unless you're intending to support POST body data writes for things like subrequests (e.g. modify the POST body data before submitting it to a subrequest). I can't personally think of a use case for modifying the POST body before submission upstream. That seems like more of a bad idea or niche case than not to me.

Regarding body size, technically your body.length recommendation would work but it wouldn't actually fulfill our use case to do it that way. Right now we have to use client_body_in_file_only and the file system API to determine actual POST body sizes. Then we choose whether to parse or not based on its size. Essentially, we're trying to avoid expending significant CPU time and memory on body parsing for very large client POST bodies. If I'm following your explanation correctly, it sounds like readRequestBody() would first read the data of the POST into memory and then require a size calculation. So if the end goal is to avoid excessive CPU and memory use from pulling the full POST body into memory and parsing, that method would not achieve that. Given that, I suppose that if there was a configurable attribute for this functionality that you could use to determine whether NJS should try to parse or read the body or not, that is something that you could solve via configuration for this module rather than by exposing the size of the POST body as a property. But I can think of other situations where I would still want to know the POST body size. To your point, though, I suppose Content-Length does sort of answer that as well. Either way, though, some sort of limiting threshold seems appropriate to prevent excessive resource consumption with this.
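The size gate described here could be sketched as follows (MAX_PARSE_BYTES and the Content-Length heuristic are illustrative; chunked requests carry no Content-Length, so they are simply skipped in this sketch):

```javascript
// Hypothetical size gate: only attempt expensive body parsing when a
// Content-Length header is present and under a threshold. Requests
// without a declared length (e.g. chunked) are not parsed here.
const MAX_PARSE_BYTES = 1048576;   // 1 MiB, arbitrary threshold

function shouldParse(r) {
    const len = parseInt(r.headersIn['Content-Length'] || '', 10);
    return Number.isFinite(len) && len > 0 && len <= MAX_PARSE_BYTES;
}
```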

Other questions about how this would all work:

  1. Did you see my question about race condition potential with this as a non-blocking filter? I just want to be sure that I fully understand what we're dealing with here since an access phase filter would need to guarantee a final allow/deny regardless of any potential delays in that phase (rather than proceeding through the content phase and serving content).
  2. Would this js_access filter functionality require the use of client_body_in_file_only or some similar configuration in order to read the POST body args? Or with your intended strategy, would this be able to work independently of that setting?


xeioex commented Apr 8, 2026

@lancedockins

> So just to be clear, when you say that it is non-blocking, does that mean that race conditions are possible? Like hypothetically if your access filter was slow, could the content phase be reached such that the access filter never declines access even if it would ultimately resolve to do so?

no, this is not possible.

> Would this js_access filter functionality require the use of client_body_in_file_only or some similar configuration in order to read the POST body args? Or with your intended strategy, would this be able to work independently of that setting?

no need for client_body_in_file_only; and yes, it works independently of that setting.

@lancedockins

@xeioex thank you.

Given all of that and what you've said you plan to do with this so far, the only remaining thought that I have that I wouldn't be able to solve with what already exists in NJS and what you're adding or planning to add is the defensive limits on POST body parsing. Just to clarify a bit further on that, I realize that you can use client_max_body_size but the concern that I have is independent from that. If we have a high client_max_body_size value to allow for large attachment uploads, you could end up with different mixes of url encoded data vs attachment data. Technically for a 100M POST body, the POST body could be 99M of attachments and 1M of url encoded data or the reverse. It would probably be fine to parse the 1M of url encoded data in a 100M submission. But parsing out 99M of url encoded data definitely wouldn't be good.

Obviously a 99M POST of url encoded data isn't likely a common real world scenario. The intent with the limit is to prevent bad actors from DoS'ing the server by tampering with the submission to exploit the resource capacity needed to process and parse the url encoded data.

Hopefully that makes sense.

Contributor

drsm commented Apr 8, 2026

Hi!

> Naming -- are readRequestText(), readRequestArrayBuffer(), and
> readRequestJSON() clear and consistent? Would you prefer shorter
> names or a different convention?

Maybe consider readRequestBody(options) with default { type: "text" }?
This allows exposing an options.maxBodySize param, so one can get some sort of RangeError if it's exceeded.


xeioex commented Apr 8, 2026

@drsm

The idea behind the current naming is to align it with the existing Fetch API.

We cannot unambiguously use text(), arrayBuffer(), or body.text(), as in njs r stands for both Request and Response, hence the readRequest*() naming.


xeioex commented Apr 8, 2026

@lancedockins

> I realize that you can use client_max_body_size but the concern that I have is independent from that. If we have a high client_max_body_size value to allow for large attachment uploads, you could end up with different mixes of url encoded data vs attachment data

I am not sure we need a dedicated njs max_body_size limit. Rather, we may add an optional argument for readRequestForm(), similar to maxKeys in querystring.parse(); it will have a reasonable limit for the max number of form args. This even looks to me like a more appropriate limit for CPU consumption.


xeioex commented Apr 8, 2026

@lancedockins

> Admittedly, I'm not a fan of the FormData API. It's overly cumbersome as it forces key/value pair requests through getters and setters rather than just treating the data like an object that inherently contains key value pairs that you can call via objectname.property or object name['property']

Yes, but what about forms with identical property names?
For example, with querystring:

qs.parse('foo=bar&abc=xyz')
the output is

{
  foo: 'bar',
  abc: 'xyz',
}

qs.parse('foo=bar&abc=xyz&abc=123')
the output is

{
  foo: 'bar',
  abc: ['xyz', '123']
}

If we do the same, a user needs to check the type first, which is inconvenient.
Whereas getAll() always returns an array, so you do not need the extra check. So, if you expect several duplicate prop names you use getAll(); if you do not care, you use get(). Which is a bit more predictable, I think.
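The get()/getAll() distinction can be seen with the WHATWG FormData API, available as a global in modern Node (the proposed r.readRequestForm() is assumed here to behave similarly):

```javascript
// Duplicate field names: get() returns the first value only, while
// getAll() always returns an array, so callers never have to
// type-check the result.
const fd = new FormData();
fd.append('foo', 'bar');
fd.append('abc', 'xyz');
fd.append('abc', '123');

const first = fd.get('abc');      // first value for 'abc'
const all = fd.getAll('abc');     // every value for 'abc', as an array
const single = fd.getAll('foo');  // still an array, even for one value
```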
