- 
                Notifications
    You must be signed in to change notification settings 
- Fork 251
Feature/cldsrv 546 post object #5601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: epic/RING-45960-postObject-api
Are you sure you want to change the base?
Feature/cldsrv 546 post object #5601
Conversation
| Hello kaztoozs,My role is to assist you with the merge of this Available options
 Available commands
 Status report is not available. | 
7948c37    to
    1f132e5      
    Compare
  
    1f132e5    to
    084eb5b      
    Compare
  
    084eb5b    to
    7c42b2f      
    Compare
  
    | Request integration branchesWaiting for integration branch creation to be requested by the user. To request integration branches, please comment on this pull request with the following command: Alternatively, the  | 
7c42b2f    to
    625603a      
    Compare
  
    625603a    to
    d18f9ec      
    Compare
  
    57f34b9    to
    d3b76f0      
    Compare
  
    00752ae    to
    252205a      
    Compare
  
    5325ca1    to
    7ef48e1      
    Compare
  
    7ef48e1    to
    95140e3      
    Compare
  
    c8e4647    to
    92d9273      
    Compare
  
    | /* eslint-disable no-param-reassign */ | ||
| let formDataParser; | ||
| try { | ||
| formDataParser = busboy({ headers: request.headers }); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more details about multipart/form-data, parsing by busboy and aws behavior with some links to documentation:
Content-Type
Documentation for the Content-Type, with the boundary directive and example with body: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
Again, busboy will accept application/x-www-form-urlencoded (https://github.com/fastify/busboy/blob/master/lib/types/urlencoded.js#L9)
That will unnecessarily read the whole body and trigger an error 400 InvalidArgument instead of the 412 Precondition Failed returned by aws and stopping the request early.
Content-Disposition
The documentation about Content-Disposition gives you details for its usage in multipart/form-data: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
In the body, a part will look like
--boundary
Content-Disposition: form-data; name="f1"
value1
--boundary
Here busboy will parse this and emit a field event with fieldname being f1
You can send this with:
- an html form with an <input type="text" name="f1">
- curl (https://curl.se/docs/manpage.html#-F)
- curl -F f1=value1
- curl -F "f1=</etc/hostname"(this put the content of the file but not the filename)
 
- nodejs with form-dataor raw
You can also have
--boundary
Content-Disposition: form-data; name="f2"; filename="example.txt"
value2
--boundary
You can send this with:
- an html form with an <input type="file" name="f2">
- curl (https://curl.se/docs/manpage.html#-F)
- curl -F "[email protected]"(filename will be example.txt)
- `curl -F "f2=value2;filename=example.txt"
 
- nodejs with form-dataor raw
Notice the filename directive, this will make busboy emit a file event with fieldname  being f2.
As described in busboy documentation: https://www.npmjs.com/package/@fastify/busboy#busboy-methods with the default config isPartAFile.
You can notice in AWS documentation about the file form field: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html#RESTObjectPOST-requests-form-fields
The file or text content.
...
Type: File or text content
So you can either send:
- Content-Disposition: form-data; name="file"(with an input of type text)
- Content-Disposition: form-data; name="file"; filename="whatever.txt"
Which means by default the file form field can be emitted either by the field or file busboy event.
Also by testing on AWS, it accepts other form fields like key, acl having a filename directive (meaning coming from an input type="file"), example this should be possible:
- Content-Disposition: form-data; name="key"; filename="whatever.txt"
- Content-Disposition: form-data; name="acl"; filename="whatever.txt"
So any form field can be emitted by either the field or file busboy event.
The processing / validation should be performed based on the fieldname and not the busboy event.
Note that it could be interesting to replace the default behavior of busboy and defining isPartAFile to consider a file only the fieldname === 'file' (with potentially presence of filename).
When doing that there should also be some limits defined on busboy, if a form field other than file is being a big file and now parsed as a string instead of a stream, limit would help busboy to stop reading it. (The default limit.fieldSize is 1MiB but AWS accepts only 20KB for the form-data exluding file).
The form data and boundaries (excluding the contents of the file) cannot exceed 20KB.
Note: it looks like it's not just the field value but potentially the whole boundary withContent-Dispositionand potential headers of the part, but I'm not sure, might need to be tested.
${filename} replacement
About ${filename} replacement, it's not only on the key, but all form fields: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTForms.html
The variable ${filename} is automatically replaced with the name of the file provided by the user and is recognized by all form fields. If the browser or client provides a full or partial path to the file, only the text following the last slash (/) or backslash () is used (for example, C:\Program Files\directory1\file.txt is interpreted as file.txt). If no file or file name is provided, the variable is replaced with an empty string.
Also the documentation defines  how filename should be sanitized (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#filename)
Busboy should sanitize it, but you could add some tests with filename containing ../xxx, a/b/c, etc to ensure the ${filename} works as aws.
And validation on form fields, like key length, should then be performed after ${filename} replacement as it could impact the value.
Additional validation
You can receive a Content-Disposition: form-data without name, that would make an undefined fieldname and crash:
TypeError: Cannot read properties of undefined (reading 'toLowerCase')
Also by testing on AWS, they seem to trim the fieldname of whitespaces: name="  \n    key   \n    " will should match key.
It's also possible to have multiple parts with the same name.
For example with this:
<input type="text" name="key">
<input type="text" name="key">
<input type="text" name="acl">
<input type="text" name="acl">
<input type="file" name="file" multiple>or curl -F f1=value1 -F f1=value2
We should ensure, just like AWS that we don't have multiple times the same form field, whether busboy emits it via field or file.
Maybe the file should not be read if the parsing failed and an error was caught.
And when reading (pipe) the file stream, we need to catch error, otherwise it can crash:
Error: Part terminated early due to unexpected end of multipart data
It seems the finish event is triggered once the file is completely read, and there is a validation logic in there, the logic could happen earlier before writting file to disk.
There is a validation on Content-Length as well:
    <Code>EntityTooLarge</Code>
    <Message>Your proposed upload exceeds the maximum allowed size</Message>
    <ProposedSize>6442453100</ProposedSize>
    <MaxSizeAllowed>5368730624</MaxSizeAllowed>
The key should be validated just like it's done for regular PutObject.
See https://github.com/scality/Arsenal/blob/7eb2701f21aba91d45c2454ee8c5c257fea1c0db/lib/s3routes/routes.ts#L65-L73
Maybe there could be some blacklisted object prefix on cloudserver
You should find attached a javascript script to write the raw body form-data easier than using `nc`.
It allows to test invalid bodies that are usually well formated by lib like `form-data` or by `curl` or `postman`.
Also you can add a delay between writes to test server timeout behavior.
const http = require('http');
const CRLF = '\r\n';
const body = `\
--test
Content-Disposition: form-data; name="key"
some_key
--test
Content-Disposition: form-data; name="file"; filename="some_name"
a
--test--
`.replace(/\n/g, CRLF);
const options = {
    method: 'POST',
    headers: {
        'Host': 'micktestpost.localhost',
        'Content-Type': 'multipart/form-data; boundary=test',
        'Content-Length': Buffer.byteLength(body),
    },
};
const req = http.request('http://localhost:8000', options, (res) => {
    console.log(req)
    console.log('RESPONSE', { statusCode: res.statusCode, message: res.statusMessage, headers: res.headers });
    res.on('data', (chunk) => {
        console.log(`BODY: ${chunk}`);
    });
    res.on('end', () => {
        console.log('END');
    });
});
req.on('error', (e) => {
    console.error(`problem with request: ${e.toString()}`);
});
// Split body in block of 20 chars
const split = body.match(/[\s\S]{1,20}/g);
// Increase delay without writting to test server timeout
const writeDelayMs = 1_000;
function writeBodyPart(i) {
    if (i < split.length) {
        console.log('Writting', i, split[i])
        req.write(split[i]);
        setTimeout(() => writeBodyPart(i + 1), writeDelayMs);
    } else {
        console.log('Done writting body')
        req.end();
    }
}
// setTimeout(() => writeBodyPart(0), writeDelayMs)
req.write(body);
req.end();Generating the body raw without a lib can help you test any kind of invalid body to make sure you use busboy correctly and validate all input before using them to prevent a crash of the server.
Busboy already takes care of all the parsing, we need to configure it correctly (cf busboy readme) to match AWS spec and validate inputs to prevent a crash.
Ideally you could extract the parsing / validation (parseFormData function) with busboy into its own isolated class to make it easier for testing. And if you want to be consistent with callbacks usage instead of promises and async await, you can pass a callback as input of your parsing function to call once parsing /validation is done.
| request.formData[lowerFieldname] = val; | ||
| } | ||
| // add only the recognized fields to the formData object | ||
| if (POST_OBJECT_OPTIONAL_FIELDS.some(field => lowerFieldname.startsWith(field))) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startsWith will allow things like acl abc. Known fields should be checked with equality and the special x-amz-meta- can check checked aside
| const xml2js = require('xml2js'); | ||
| const axios = require('axios'); | ||
| const crypto = require('crypto'); | ||
| const FormData = require('form-data'); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency as well does not appear on package.json.
This is present only because it's a dependency of a dependency of a dependency, and is an old version.
Make sure to install dependencies you want to use, with latest version if possible (in devDependencies for tests)
> npm ls form-data
[email protected] /home/mickael/scality/cloudserver
└─┬ [email protected]
  └─┬ [email protected]
    └── [email protected]
> npm why form-data
[email protected]
node_modules/form-data
  form-data@"~2.3.2" from [email protected]
  node_modules/request
    request@"^2.86.0" from [email protected]
    node_modules/azure-storage
      azure-storage@"^2.1.0" from the root project
      azure-storage@"~2.10.7" from [email protected]
      node_modules/arsenal
        arsenal@"git+https://github.com/scality/arsenal#4ef5748c028619edff10d6d38b21df43c8d63d88" from the root project
      azure-storage@"~2.10.7" from [email protected]
      node_modules/bucketclient/node_modules/arsenal
      azure-storage@"~2.10.7" from [email protected]
      node_modules/utapi/node_modules/arsenal
      azure-storage@"~2.10.7" from [email protected]
      node_modules/vaultclient/node_modules/arsenal
    request@"^2.79.0" from [email protected]
    node_modules/google-auto-auth
      google-auto-auth@"^0.9.1" from the root project
    request@"^2.88.0" from [email protected]
    node_modules/oas-tools
      oas-tools@"^2.2.2" from [email protected]
      node_modules/utapi
        utapi@"git+https://github.com/scality/utapi#7.70.4" from the root project
| method: 'POST', | ||
| hostname: url.hostname, | ||
| port: url.port, | ||
| path: url.pathname + url.search, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review the documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html#RESTObjectPOST-requests
POST / HTTP/1.1 Host: destinationBucket.s3.amazonaws.com
Anything (more than /) in the path should return a 405 (see #5601 (comment) and scality/Arsenal#2248 (comment))
And any querystring (url.search) should return a 400 (see scality/Arsenal#2248 (comment))
Don't put them in tests if you don't want them to fail once url validation is correct
|  | ||
| req.on('response', res => { | ||
| try { | ||
| assert.equal(res.statusCode, 204); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer strictEqual
| const req = http.request(options); | ||
|  | ||
| req.on('response', res => { | ||
| try { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this request handling make the same code snippet duplicated in all tests.
Ideally, you could encapsulate all this request sending and response / error callback.
You could promisify the request to simplify the code snippet in every tests, (some most recent tests with the aws-sdk uses promise).
| formData.getLength((err, length) => { | ||
| if (err) { | ||
| return done(err); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getLengthSync can be interesting to not have a callback
| }); | ||
| }); | ||
|  | ||
| it('should handle error when bucket does not exist', done => { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could include the statusCode in description for quicker troubleshooting when it fails
| }); | ||
| }); | ||
|  | ||
| it('should successfully upload an object using a POST form', done => { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for organization you could have an additionnal sub describe to regroup testing of successful request and testing of errors
| const req = http.request(options); | ||
|  | ||
| req.on('response', res => { | ||
| if (res.statusCode === 404) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally it should look like assert.strictEqual(res.statusCode, 404) so it's easy to see in the logs tha actual value vs expected
|  | ||
| req.on('response', res => { | ||
| if (res.statusCode === 204) { | ||
| s3.listObjectsV2({ Bucket: bucketName }, (err, data) => { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a HeadObject or GetObject would be more appropriate
Description
Motivation and context
This API adds the ability to upload objects via HTML web forms per customer demand. This is a preliminary implementation and does not yet include Authz, POST policy checking or extra parameters.
Related issues
ARSN-422 - Adds support for postObject API
Notes