Programmable HTTP Access Phase and Request Body Reading#1044
xeioex wants to merge 2 commits into nginx:master
Conversation
930f611 to 3145d7c
The js_access directive registers a JavaScript handler in the HTTP ACCESS phase.
Added async methods:
- r.readRequestText() as string
- r.readRequestBuffer() as ArrayBuffer
- r.readRequestJSON() as object

These return Promises resolving with the request body wrapped as the corresponding type.
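As a rough sketch, a handler using one of these methods might look like the following (the handler name, messages, and status codes are illustrative; the snippet assumes the njs runtime):

```js
// http.js (sketch): report how large the received body was.
// r.readRequestText() is one of the proposed methods; the rest is illustrative.
async function echoSize(r) {
    try {
        const text = await r.readRequestText();   // Promise<string>
        r.return(200, `received ${text.length} characters\n`);
    } catch (e) {
        r.return(400, `cannot read body: ${e.message}\n`);
    }
}

export default { echoSize };
```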
3145d7c to a6f3608
Hi @lancedockins, you may find it useful.
Thanks, @xeioex. This is awesome. Since I first raised that suggestion, we had to find a workaround to do the routing and protection that we needed, but I'd certainly prefer to do this via an access phase filter than via a workaround.

Off the cuff, the only thing that sounds like it's overtly missing from this sort of implementation is a body "parser". OpenResty has this for url encoded body types: While it's certainly great if you can constrain your use of this function to something like JSON bodies, that's not really up to us. We have to work with POST bodies of varying types. That might dictate the need for a few other attributes or utility functions. It certainly did for us when we created this type of logic.

In our case, we needed this regardless of form submission type - not because we're concerned about the attachments that come with the uploads but because we care about the standard field types that precede those attachments. Because of that, we had to write our own body parse logic that determines whether a POST is url encoded or multipart and then populates an object with key/value pairs based on that non-attachment data. For our purposes, we "discard" the attachment in the sense that we don't try to parse it into a JSON object (obviously we don't strip the attachment).

If you were to institute this, you might need to add a configurable threshold to limit the size of POST bodies that you try to parse. That's what we did. We had to construct body size detection functions that return the size or false depending on whether they exceed the threshold or not.

Ultimately we were able to achieve all of our existing implementations through NJS and creative Nginx configuration. But I suspect that it would be better and faster as a proper C implementation. Functions that I would consider important for this feature to be "complete":
Beyond that, I don't really see any issues that would prevent us from using this in the way that we intend.

The only question that I have with what you've shared is the sync vs async nature of some elements here. It appears that you're saying that any "body read" would require an async call. That makes sense given the I/O involved, but is it correct to say that it's only async insofar as it needs to be to read or process the backend I/O? In other words, it's still ultimately a blocking filter? It just refrains from blocking other request processing that could occur within the access phase? But hypothetically, if the backend I/O were particularly slow, it would eventually net out to a situation where everything that is non-blocking in the access phase completes and the backend I/O call ultimately becomes a blocking event in the access phase rather than passing the request on to the next phase? Given the intent of this sort of filter, that would be my assumption, as you obviously wouldn't want a slow backend process to function like an access phase bypass.

I may have some other thoughts here. But those are my initial thoughts, and they come from heavy use of this type of functionality in our stack (albeit via workarounds and custom NJS code).
No, this is a non-blocking filter. sync vs async is purely a matter of JS syntax.
So just to be clear, when you say that it is non-blocking, does that mean that race conditions are possible? Like hypothetically, if your access filter were slow, could the content phase be reached such that the access filter never declines access even if it would ultimately resolve to do so? An access filter that could be bypassed strictly by a race condition would not be an access filter that I would trust sufficiently to use it. So that is a significant detail that I want to understand correctly.
Thanks, this is very helpful feedback. A few points from the current implementation/API surface:
I am planning to add a FormData()-like API to support both. It should close the Form gap.
How do you imagine it working? Read the body -> hash, or something else?
I am planning to add a streamable body API, but not in this scope. Maybe this will help.
Thanks, @xeioex.

Understood on body hashing. That's what we do now, so I guess that's technically a duplicate feature request. This API can certainly live without that. Basically we read the body and then hash it (excluding attachments). I don't think that everyone will always need or want the body with its attachments. Speaking for our use, we mainly need the key/value data in the form POST rather than the attachments, but there are certainly situations where the attachment part of the client body could be important too, so there might need to be a way to retrieve the attachment (optionally) for some users. Since Lua doesn't include that in its get_post_args function either and it seems like more of a niche need, though, I don't know that it strikes me as a requirement so much as a nice eventuality.

Regarding FormData or something similar: that sort of parsing functionality is absolutely critical for this from my perspective, so I think that's a great and necessary addition. Admittedly, I'm not a fan of the FormData API. It's overly cumbersome, as it forces key/value pair requests through getters and setters rather than just treating the data like an object that inherently contains key/value pairs that you can call via objectname.property or objectname['property']. I think it's geared more towards form data manipulation than basic read operations. A standard JS object would be a lot more natural here unless you're intending to support POST body data writes for things like subrequests (e.g. modify the POST body data before submitting it to a subrequest). I can't personally think of a use case for modifying the POST body before submission upstream. That seems like more of a bad idea or niche case than not to me.

Regarding body size, technically your body.length recommendation would work, but it wouldn't actually fulfill our use case to do it that way.
Right now we have to use client_body_in_file_only and the file system API to determine actual POST body sizes. Then we choose whether to parse or not based on that size. Essentially, we're trying to avoid expending significant CPU time and memory on body parsing for very large client POST bodies. If I'm following your explanation correctly, it sounds like readRequestBody() would first read the data of the POST into memory and then require a size calculation. So if the end goal is to avoid excessive CPU and memory use from pulling the full POST body into memory and parsing it, that method would not achieve that.

Given that, I suppose that if there were a configurable attribute for this functionality that you could use to determine whether NJS should try to parse or read the body or not, that is something that you could solve via configuration for this module rather than by exposing the size of the POST body as a property. But I can think of other situations where I would still want to know the POST body size. To your point, though, I suppose Content-Length does sort of answer that as well. Either way, some sort of limiting threshold seems appropriate to prevent excessive resource consumption with this.

Other questions about how this would all work?
No, this is not possible.
No need for client_body_in_file_only, yes.
@xeioex thank you. Given all of that and what you've said you plan to do with this so far, the only remaining thought that I have that I wouldn't be able to solve with what already exists in NJS and what you're adding or planning to add is the defensive limits on POST body parsing.

Just to clarify a bit further on that: I realize that you can use client_max_body_size, but the concern that I have is independent from that. If we have a high client_max_body_size value to allow for large attachment uploads, you could end up with different mixes of url encoded data vs attachment data. Technically, for a 100M POST body, the POST body could be 99M of attachments and 1M of url encoded data, or the reverse. It would probably be fine to parse the 1M of url encoded data in a 100M submission. But parsing out 99M of url encoded data definitely wouldn't be good.

Obviously a 99M POST of url encoded data isn't likely a common real world scenario. The intent with the limit is to prevent bad actors from DoS'ing the server by tampering with the submission to exploit the resource capacity needed to process and parse the url encoded data. Hopefully that makes sense.
Hi!
Maybe consider
I am not sure we need a dedicated njs max_body_size limit. Rather, we may add an optional argument for
Yes, but what about forms with identical property names?

qs.parse('foo=bar&abc=xyz')
qs.parse('foo=bar&abc=xyz&abc=123')

If we do the same, a user needs to check first. This is inconvenient.
Programmable HTTP Access Phase and Request Body Reading
Overview
Two new capabilities let JavaScript handlers participate in request
processing before the content phase begins:
- js_access -- registers a handler in the HTTP access phase for authorization, routing, and request preprocessing.
- r.readRequestText(), r.readRequestArrayBuffer(), r.readRequestJSON() -- async methods that read and cache the request body, available in any HTTP handler.
Together they enable decisions based on headers, arguments, variables,
and the request body -- all resolved before content generation or
proxying starts.
js_access Directive
Context: http, server, location

The handler runs in NGX_HTTP_ACCESS_PHASE, before content handlers (js_content, proxy_pass, fastcgi_pass, etc.) and after built-in access checkers (allow/deny, auth_basic, auth_request). Configuration inherits from outer to inner blocks.

handler.
Use r.return(status) to reject the request, or simply return to continue to the next phase.
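A minimal configuration sketch (the import name, handler name, location, and upstream are illustrative):

```nginx
http {
    js_import main from http.js;

    upstream backend {
        server 127.0.0.1:8080;
    }

    server {
        listen 80;

        location /secure/ {
            js_access  main.authorize;  # runs in the access phase
            proxy_pass http://backend;  # reached only if the handler
                                        # does not call r.return()
        }
    }
}
```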
The handler may be asynchronous: the access phase is resumed once the Promise settles, enabling ngx.fetch(), r.subrequest(), setTimeout(), and body reading without blocking the event loop. An unhandled exception produces 500 Internal Server Error. Without r.return(), processing continues normally to the content phase (the handler returns NGX_OK to the access phase checker).
r.decline() signals "no opinion" (NGX_DECLINED), deferring the access decision to other checkers. Useful with the satisfy any directive when the handler should not grant or deny by itself.
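For instance, a handler intended for satisfy any might grant access on a valid key and otherwise decline; the header name and comparison variable here are hypothetical:

```js
// Sketch: allow requests carrying a valid API key; otherwise express
// "no opinion" so the remaining allow/deny rules decide.
function byApiKey(r) {
    if (r.headersIn['X-Api-Key'] === r.variables.expected_api_key) {
        return;      // completes normally -> NGX_OK (allow)
    }
    r.decline();     // NGX_DECLINED -> defer to other access checkers
}

export default { byApiKey };
```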
r.return(301|302|303|307|308, url) sends a redirect directly to the client, enabling authentication flows (OIDC, SAML) that redirect unauthenticated users to an identity provider.
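An OIDC-style redirect from the access phase might be sketched like this (the cookie check and identity-provider URL are placeholders, not part of the proposed API):

```js
// Sketch: redirect unauthenticated clients to an identity provider.
function requireLogin(r) {
    const cookie = r.headersIn.Cookie || '';
    if (!cookie.includes('session=')) {
        const target = 'https://idp.example.com/authorize?redirect_uri='
                       + encodeURIComponent('https://' + r.headersIn.Host + r.uri);
        r.return(302, target);
        return;
    }
    // Authenticated: fall through to the content phase.
}

export default { requireLogin };
```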
Interaction with satisfy

js_access participates in the satisfy directive like other access phase modules:

- satisfy all (default) -- all access checkers must allow. A handler that completes normally returns NGX_OK (allow); r.return(403) denies; r.decline() passes through without granting or denying.
- satisfy any -- at least one checker must allow. NGX_OK from js_access grants access even if deny all is configured. r.decline() expresses "no opinion", letting other modules decide.

| Handler outcome | satisfy all | satisfy any |
|---|---|---|
| completes normally | NGX_OK (allow) | NGX_OK (grant) |
| r.return(403) | 403 | 403 |
| r.decline() | NGX_DECLINED (defer) | NGX_DECLINED (no opinion) |

Request Body Reading
Three async methods read and cache the request body:

| Method | Resolves to |
|---|---|
| r.readRequestText() | Promise<string> |
| r.readRequestArrayBuffer() | Promise<ArrayBuffer> |
| r.readRequestJSON() | Promise<object> |

The body is read once and cached (proxy_pass forwards it unchanged). Reading honors client_body_buffer_size, client_body_in_file_only, and client_max_body_size enforcement. The methods are available in js_access and js_content handlers.

Examples
Authentication with an External Service
The access handler calls an auth service via ngx.fetch(). On failure the request is rejected immediately; on success a variable is set for downstream use. The content handler (proxy_pass) only runs after authentication succeeds.
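A sketch of this pattern, assuming a hypothetical internal auth endpoint and an $auth_user variable declared with js_var:

```js
// Sketch: validate the request against an external auth service.
async function authFetch(r) {
    const reply = await ngx.fetch('http://auth.internal/check', {
        headers: { Authorization: r.headersIn.Authorization || '' },
    });

    if (reply.status !== 200) {
        r.return(403);    // reject before proxy_pass ever runs
        return;
    }

    // Expose a value for downstream use (header injection, logging, ...).
    r.variables.auth_user = reply.headers.get('X-User') || 'anonymous';
}

export default { authFetch };
```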
Authentication with a Subrequest
Same pattern using an internal subrequest instead of an outbound fetch.
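The same check sketched with r.subrequest(), assuming an internal /_auth location that proxies to the auth service:

```js
// Sketch: authorize via an internal subrequest instead of ngx.fetch().
async function authSub(r) {
    const reply = await r.subrequest('/_auth', { method: 'GET' });
    if (reply.status !== 200) {
        r.return(403);
        return;
    }
    // Allowed: the request continues to the content phase.
}

export default { authSub };
```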
Dynamic Upstream Routing
The access handler computes a routing variable synchronously; proxy_pass evaluates it after the access phase completes.
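A sketch, assuming $backend is declared (e.g. with js_var) and upstream blocks backend_v1/backend_v2 exist; the version argument is invented for illustration:

```js
// Sketch paired with a config such as:
//   location /api/ {
//       js_access  main.route;          # sets $backend synchronously
//       proxy_pass http://$backend;     # evaluated after the access phase
//   }
function route(r) {
    r.variables.backend = r.args.version === 'v2' ? 'backend_v2'
                                                  : 'backend_v1';
}

export default { route };
```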
Body-Based Access Control
The request body is parsed as JSON in the access phase. The policy
decision is made before the request reaches the backend. The body is
preserved and forwarded to proxy_pass unchanged.

Body-Driven Routing
Combines body reading with dynamic routing: the request is parsed once
and the upstream is selected based on a field in the payload.
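A sketch combining body-based policy and body-driven routing with the proposed r.readRequestJSON(); the tenant field, upstream names, and status codes are invented for illustration:

```js
// Sketch: parse the JSON body once, deny on policy, then route on a field.
async function routeByBody(r) {
    let body;
    try {
        body = await r.readRequestJSON();   // Promise<object>, cached
    } catch (e) {
        r.return(400);   // unparsable body never reaches the backend
        return;
    }

    if (body.tenant === undefined) {
        r.return(403);   // policy decision before content generation
        return;
    }

    // The cached body is forwarded to proxy_pass unchanged.
    r.variables.backend = body.tenant === 'beta' ? 'backend_beta'
                                                 : 'backend_main';
}

export default { routeByBody };
```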
Variable Enrichment
A server-level access handler enriches every request with computed
variables. Downstream locations consume them for header injection,
logging, or routing without duplicating the logic.
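A sketch of such a server-level handler; the variable names are illustrative and would be declared with js_var:

```js
// Sketch: enrich every request with computed variables at server level.
function enrich(r) {
    r.variables.client_tier = r.headersIn['X-Api-Key'] ? 'paid' : 'free';
    r.variables.req_id = r.variables.request_id || '';
}

export default { enrich };
```

Downstream locations can then reference $client_tier and $req_id in proxy_set_header, log_format, or routing maps without repeating the logic.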
Access-Phase Redirect
When authentication fails the handler redirects the client to an
identity provider. On success the request continues to
proxy_pass.

Combining with satisfy any

Requests with a valid API key are allowed regardless of IP. Without a
key the handler declines and the IP-based
allow/deny rules apply.

Questions for Reviewers
This is a draft. We are collecting feedback before finalizing the API.
Please share your thoughts on any of the following:
Naming -- are readRequestText(), readRequestArrayBuffer(), and readRequestJSON() clear and consistent? Would you prefer shorter names or a different convention?
Use cases -- which of the examples above match problems you are
solving today? Are there scenarios you would need that are not
covered?
Body reading in access phase -- reading the request body before
proxying adds latency and buffering. Is this trade-off acceptable
for your workloads? Would you want configuration knobs (e.g. max
body size for access-phase reads)?
Error handling -- the handler can call r.return(status) to reject a request or let an unhandled exception produce 500. Is this
sufficient, or do you need more control over error responses from
access handlers?
Interaction with other phases -- do you foresee issues combining
js_access with auth_request, js_content, header/body filters, or other modules?
Missing features -- is there anything you would expect from an
access-phase handler that is not present here?