-
Notifications
You must be signed in to change notification settings - Fork 46
feat(#690): add individual attachment upload for forms with large attachments #724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(#690): add individual attachment upload for forms with large attachments #724
Conversation
dianabarsan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start! I left some recommendations below
src/lib/insert-or-replace.js
Outdated
| .then(() => db.put(doc)); | ||
| const mime = require('mime-types'); | ||
|
|
||
| const upsertDoc = async (db, doc, maxRetries = 3, retryDelay = 100) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry model that I like the most is where there are no iterations inside the funciton:
const MAX_RETRY = 10;
const retryable = (retry = MAX_RETRY) => {
// do stuff
if (failed && retry) {
retryable(--retry);
}
}
Can you convert the function to follow this model?
src/lib/insert-or-replace.js
Outdated
| const existingDoc = await db.get(doc._id); | ||
| // Creating a copy to avoid mutating the original doc | ||
| const docToSave = { ...doc, _rev: existingDoc._rev }; | ||
| const result = await db.put(docToSave); | ||
| return result; | ||
| } catch (e) { | ||
| if (e.status === 404) { | ||
| // Document doesn't exist, try to create it | ||
| const docToSave = { ...doc }; | ||
| delete docToSave._rev; // Remove _rev for new documents | ||
| const result = await db.put(docToSave); | ||
| return result; | ||
| } else { | ||
| throw e; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this block should be its own function.
src/lib/insert-or-replace.js
Outdated
| const docToSave = { ...doc, _rev: existingDoc._rev }; | ||
| const result = await db.put(docToSave); | ||
| return result; | ||
| } catch (e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This catch will end up catching errors from both the get and the put.
can you please separate them?
maybe create a function getDoc that handles the get an errors from the get and a function putDoc that handles conflicts?
src/lib/insert-or-replace.js
Outdated
|
|
||
| console.log(`Conflict detected, retrying attempt ${attempts}/${maxRetries}`); | ||
| // Wait before retrying (exponential backoff) | ||
| await new Promise(resolve => setTimeout(resolve, retryDelay * Math.pow(2, attempts - 1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should add a delay, just retry immediately.
src/lib/insert-or-replace.js
Outdated
| } | ||
| }; | ||
|
|
||
| const handleLargeDocument = async (db, doc, maxRetries = 5, retryDelay = 200) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, can you please rewrite this using the reducing retry parameter?
src/lib/insert-or-replace.js
Outdated
|
|
||
| // Try to save the document without attachments | ||
| console.log(`Attempting to save ${doc._id} (attempt ${attempts + 1}/${maxRetries})`); | ||
| const res = await db.put(docWithoutAttachments); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can reuse the putDoc function from the comment above.
src/lib/insert-or-replace.js
Outdated
| console.log(`Successfully saved ${doc._id}, new rev: ${res.rev}`); | ||
|
|
||
| // Handle attachments one by one with individual error handling | ||
| if (attachments && Object.keys(attachments).length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract this into a separate function
src/lib/insert-or-replace.js
Outdated
| const att = attachments[attachmentName]; | ||
| const contentType = att.content_type || mime.lookup(attachmentName) || 'application/octet-stream'; | ||
| const attachResult = await db.putAttachment(doc._id, attachmentName, currentDocRev, | ||
| att.data, contentType); | ||
| currentDocRev = attachResult.rev; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should also be a retryable addDocAttachment function.
src/lib/insert-or-replace.js
Outdated
|
|
||
| console.log(`Large document conflict detected for ${doc._id}, retrying attempt ${attempts}/${maxRetries}`); | ||
| // Increase delay progressively | ||
| const delay = retryDelay * Math.pow(2, attempts - 1) + Math.random() * 100; // Add jitter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we should use a delay.
src/lib/insert-or-replace.js
Outdated
| } | ||
| }; | ||
|
|
||
| const forceSaveDocument = async (db, doc) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this is not used. What is the purpose of this?
dianabarsan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very very nice! Huge improvements! Thank you!
I had just reviewed Nitu's PR for the serverside and I realised that uploading a document without any attachments - and then uploading only the xml - will always force the server to regenerate the model.xml and form.html attachments. This adds extra load and will generate additional conflicts for your code.
So ... I want to ask you if you could try, instead of removing all attachments, only remove the media attachments and keep the functional ones.
I appreciate it!
src/lib/insert-or-replace.js
Outdated
| if (!latestDoc) { | ||
| throw new Error(`Document ${docId} not found during attachment retry`); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the bloat in this catch, you could move this outside of the try/catch, right after you check for retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright thanks, let me work on these
src/lib/insert-or-replace.js
Outdated
| try { | ||
| const attachments = doc._attachments; | ||
| const docWithoutAttachments = { ...doc }; | ||
| delete docWithoutAttachments._attachments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To save roundabouts on generating attachments on the server, can you please try to preserve the non-media attachments here? These would be the form xml, model.xml and form.html.
Can you please try this and check how it works server-side? We would ideally not see forms getting regenerated when a new media attachment is added.
… update tests for uploading large attachments
src/lib/insert-or-replace.js
Outdated
| try { | ||
| const attachments = doc._attachments || {}; | ||
| // Regex for specific functional files: form.xml, model.xml, form.html | ||
| const functionalRegex = /^(form|model)\.xml$|^form\.html$/i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the attachment that contains the main form xml would just be called xml. Is there a way that we can include that in this capture regex?
src/lib/insert-or-replace.js
Outdated
| const docToSave = { ...doc, _attachments: functionalAttachments }; | ||
| if (Object.keys(docToSave._attachments).length === 0) { | ||
| delete docToSave._attachments; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a little strange to me, where we add this property only to delete it in a validation later. I think it would make more sense to a quick read if we add it conditionally:
| } | |
| const docToSave = { | |
| ...doc, | |
| _attachments: Object.keys(functionalAttachments).length ? functionalAttachments : undefined | |
| }; |
src/lib/insert-or-replace.js
Outdated
| } | ||
|
|
||
| const existingDoc = await getDoc(db, doc._id); | ||
| log.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these logs will get very very noisy if the users upload all forms. I suggest we change most of these to be debug logs.
src/lib/insert-or-replace.js
Outdated
| log.info(`Conflict detected for ${doc._id}, retrying`); | ||
| return upsertDoc(db, doc, retries - 1); | ||
| } else if (err.status === 413) { | ||
| return handleLargeDocument(db, doc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a form with three video attachments, and every time I upload it, I am being told that I need to overwrite changes from the server, even though no changes have been done to the form in config.
However I'm seeing the same behavior when adding media to another form - media that would not trigger the custom code to handle large attachments, and I'm also seeing this on the main branch.
Would you mind creating an issue to track this bug? The warning should only show up when the config and the server are out of sync.
…ex, conditional _attachments assignment, and adjust log levels
dianabarsan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your PR is not passing our Sonar linting rules. Could you please have a look and address these issues? https://sonarcloud.io/project/issues?id=medic_cht-conf&pullRequest=724&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true
Thanks!
…ling in attachment functions
… reduce its cognitive complexity
dianabarsan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!!
|
awesome job - thanks for the review @dianabarsan and congrats to @sookwalinga on contributing the code which will now be in CHT Core \o/ Diana - are we ready to merge this then? Any reason to wait? |
|
Since there are a couple of connected PRs, this one and two in cht-core, I want to make sure than they work well together. It's a manual test. I'll merge after testing the full interaction. |
Closes #690
Is your feature request related to a problem? Please describe.
The CHT-Api has a hard limit of 32MB allowance per received JSON request bodies: https://github.com/medic/cht-core/blob/5c1f9ba4a149041dd2a16e58420546731cd27b25/api/src/routing.js#L91
This means that cht-conf form upload requests: https://github.com/medic/cht-conf/blob/main/src/lib/insert-or-replace.js will also be subjected to this limitation. Which blocks uploading any form that has a total size of attachments larger than 32MB.
Describe the solution you'd like
Add a fallback in the insert-or-replace function to check the response code from API. If the response code is that the request is too large, the code should attempt to upload the doc body and each individual attachment separately. When uploading attachments, the attachment real mime type should be used as content-type in order to overcome the limitation in API.
Describe alternatives you've considered
None.
Additional context