Conversation

@sookwalinga commented Jul 22, 2025

Closes #690

Is your feature request related to a problem? Please describe.
The CHT API enforces a hard limit of 32MB per received JSON request body: https://github.com/medic/cht-core/blob/5c1f9ba4a149041dd2a16e58420546731cd27b25/api/src/routing.js#L91

This means that cht-conf form upload requests (https://github.com/medic/cht-conf/blob/main/src/lib/insert-or-replace.js) are also subject to this limitation, which blocks uploading any form whose attachments total more than 32MB.

Describe the solution you'd like
Add a fallback in the insert-or-replace function that checks the response code from the API. If the response indicates the request is too large (HTTP 413), the code should upload the doc body and each attachment separately. When uploading attachments, each attachment's real MIME type should be used as the Content-Type in order to get around the API limitation.
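
For illustration, a minimal sketch of such a fallback, assuming a PouchDB-style client (as cht-conf uses); the name uploadWithFallback and the exact flow are hypothetical:

```js
const mime = require('mime-types');

// Hypothetical sketch: on HTTP 413, save the doc body without attachments,
// then upload each attachment separately with its real MIME type.
const uploadWithFallback = async (db, doc) => {
  try {
    return await db.put(doc);
  } catch (err) {
    if (err.status !== 413) {
      throw err;
    }
    const attachments = doc._attachments || {};
    const docWithoutAttachments = { ...doc };
    delete docWithoutAttachments._attachments;
    let { rev } = await db.put(docWithoutAttachments);
    for (const [name, att] of Object.entries(attachments)) {
      const contentType = att.content_type || mime.lookup(name) || 'application/octet-stream';
      ({ rev } = await db.putAttachment(doc._id, name, rev, att.data, contentType));
    }
    return { ok: true, id: doc._id, rev };
  }
};
```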

Describe alternatives you've considered
None.

Additional context

@sookwalinga sookwalinga marked this pull request as ready for review July 22, 2025 10:19
@dianabarsan dianabarsan self-requested a review July 23, 2025 05:32
@dianabarsan (Member) left a comment


Nice start! I left some recommendations below

```js
.then(() => db.put(doc));
const mime = require('mime-types');

const upsertDoc = async (db, doc, maxRetries = 3, retryDelay = 100) => {
```

The retry model that I like the most is one where there are no iterations inside the function:

```js
const MAX_RETRY = 10;
const retryable = (retry = MAX_RETRY) => {
  // do stuff
  if (failed && retry) {
    retryable(--retry);
  }
}
```

Can you convert the function to follow this model?
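
Applied to upsertDoc, the conversion might look roughly like this (a sketch only; the 409 conflict check and the rev refresh are assumptions drawn from the surrounding diff):

```js
const MAX_RETRY = 3;

// Sketch: recursive retry with a reducing counter - no loop, no delay.
const upsertDoc = async (db, doc, retries = MAX_RETRY) => {
  try {
    // Refresh _rev so a retry can succeed after a conflict
    // (error handling for the get is elided in this sketch).
    const existing = await db.get(doc._id).catch(() => undefined);
    const docToSave = { ...doc };
    if (existing) {
      docToSave._rev = existing._rev;
    } else {
      delete docToSave._rev;
    }
    return await db.put(docToSave);
  } catch (err) {
    if (err.status === 409 && retries) {
      return upsertDoc(db, doc, retries - 1);
    }
    throw err;
  }
};
```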

Comment on lines 10 to 24
```js
    const existingDoc = await db.get(doc._id);
    // Creating a copy to avoid mutating the original doc
    const docToSave = { ...doc, _rev: existingDoc._rev };
    const result = await db.put(docToSave);
    return result;
  } catch (e) {
    if (e.status === 404) {
      // Document doesn't exist, try to create it
      const docToSave = { ...doc };
      delete docToSave._rev; // Remove _rev for new documents
      const result = await db.put(docToSave);
      return result;
    } else {
      throw e;
    }
```

I think this block should be its own function.

```js
    const docToSave = { ...doc, _rev: existingDoc._rev };
    const result = await db.put(docToSave);
    return result;
  } catch (e) {
```

This catch will end up catching errors from both the get and the put. Can you please separate them?

Maybe create a function getDoc that handles the get and errors from the get, and a function putDoc that handles conflicts?
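
A sketch of that split; the getDoc/putDoc names come from this comment, the bodies are assumptions:

```js
// getDoc: a 404 means the doc does not exist yet; any other error is rethrown.
const getDoc = async (db, docId) => {
  try {
    return await db.get(docId);
  } catch (err) {
    if (err.status === 404) {
      return undefined;
    }
    throw err;
  }
};

// putDoc: takes the _rev from the existing doc (if any); a 409 conflict
// surfaces to the caller's retry logic.
const putDoc = async (db, doc) => {
  const existing = await getDoc(db, doc._id);
  const docToSave = { ...doc };
  if (existing) {
    docToSave._rev = existing._rev;
  } else {
    delete docToSave._rev;
  }
  return db.put(docToSave);
};
```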


```js
      console.log(`Conflict detected, retrying attempt ${attempts}/${maxRetries}`);
      // Wait before retrying (exponential backoff)
      await new Promise(resolve => setTimeout(resolve, retryDelay * Math.pow(2, attempts - 1)));
```

I don't think we should add a delay; just retry immediately.

```js
  }
};

const handleLargeDocument = async (db, doc, maxRetries = 5, retryDelay = 200) => {
```

Same as above, can you please rewrite this using the reducing retry parameter?


// Try to save the document without attachments
console.log(`Attempting to save ${doc._id} (attempt ${attempts + 1}/${maxRetries})`);
const res = await db.put(docWithoutAttachments);

You can reuse the putDoc function from the comment above.

```js
      console.log(`Successfully saved ${doc._id}, new rev: ${res.rev}`);

      // Handle attachments one by one with individual error handling
      if (attachments && Object.keys(attachments).length > 0) {
```

Please extract this into a separate function.

Comment on lines 101 to 105
```js
          const att = attachments[attachmentName];
          const contentType = att.content_type || mime.lookup(attachmentName) || 'application/octet-stream';
          const attachResult = await db.putAttachment(doc._id, attachmentName, currentDocRev,
            att.data, contentType);
          currentDocRev = attachResult.rev;
```

I believe this should also be a retryable addDocAttachment function.
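
For illustration, a retryable addDocAttachment following the reducing-retry model suggested earlier (a hypothetical sketch, not the PR's final code):

```js
const mime = require('mime-types');
const MAX_RETRY = 10;

// Sketch: on a 409 conflict, re-fetch the latest rev and retry.
const addDocAttachment = async (db, docId, attachmentName, att, retries = MAX_RETRY) => {
  const latestDoc = await db.get(docId);
  const contentType = att.content_type || mime.lookup(attachmentName) || 'application/octet-stream';
  try {
    return await db.putAttachment(docId, attachmentName, latestDoc._rev, att.data, contentType);
  } catch (err) {
    if (err.status === 409 && retries) {
      return addDocAttachment(db, docId, attachmentName, att, retries - 1);
    }
    throw err;
  }
};
```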


```js
      console.log(`Large document conflict detected for ${doc._id}, retrying attempt ${attempts}/${maxRetries}`);
      // Increase delay progressively
      const delay = retryDelay * Math.pow(2, attempts - 1) + Math.random() * 100; // Add jitter
```

I don't believe we should use a delay.

```js
  }
};

const forceSaveDocument = async (db, doc) => {
```

As far as I can tell this is not used. What is the purpose of this?

@dianabarsan (Member) left a comment


Very very nice! Huge improvements! Thank you!

I had just reviewed Nitu's PR for the server side and I realised that uploading a document without any attachments - and then uploading only the xml - will always force the server to regenerate the model.xml and form.html attachments. This adds extra load and will generate additional conflicts for your code.
So ... I want to ask if you could try, instead of removing all attachments, removing only the media attachments and keeping the functional ones.
I appreciate it!
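
For illustration, deferring only the media attachments might look like this (splitAttachments is a hypothetical helper; the functional file names follow this discussion):

```js
// Functional attachments (form/model xml, form.html) stay on the initial put;
// everything else is treated as media and uploaded separately.
const FUNCTIONAL_ATTACHMENT_REGEX = /^((form|model)\.xml|form\.html)$/i;

const splitAttachments = (attachments = {}) => {
  const functional = {};
  const media = {};
  for (const [name, att] of Object.entries(attachments)) {
    const bucket = FUNCTIONAL_ATTACHMENT_REGEX.test(name) ? functional : media;
    bucket[name] = att;
  }
  return { functional, media };
};
```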

Comment on lines 41 to 43
```js
      if (!latestDoc) {
        throw new Error(`Document ${docId} not found during attachment retry`);
      }
```

To reduce the bloat in this catch, you could move this outside of the try/catch, right after you check for retries.

@sookwalinga (Author) replied:

Alright, thanks. Let me work on these.

```js
  try {
    const attachments = doc._attachments;
    const docWithoutAttachments = { ...doc };
    delete docWithoutAttachments._attachments;
```

To save round trips regenerating attachments on the server, can you please try to preserve the non-media attachments here? These would be the form xml, model.xml and form.html.
Can you please try this and check how it works server-side? We would ideally not see forms getting regenerated when a new media attachment is added.

```js
  try {
    const attachments = doc._attachments || {};
    // Regex for specific functional files: form.xml, model.xml, form.html
    const functionalRegex = /^(form|model)\.xml$|^form\.html$/i;
```

I believe that the attachment that contains the main form xml would just be called `xml`. Is there a way that we can include that in this capture regex?
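
One way the regex could be extended to also capture an attachment named exactly xml (an assumption based on this comment; worth verifying against a real form doc):

```js
const functionalRegex = /^(xml|(form|model)\.xml|form\.html)$/i;
```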

```js
    const docToSave = { ...doc, _attachments: functionalAttachments };
    if (Object.keys(docToSave._attachments).length === 0) {
      delete docToSave._attachments;
    }
```

This looks a little strange to me: we add this property only to delete it in a validation later. I think it would read more clearly if we add it conditionally:

Suggested change:

```js
const docToSave = {
  ...doc,
  _attachments: Object.keys(functionalAttachments).length ? functionalAttachments : undefined
};
```

```js
  }

  const existingDoc = await getDoc(db, doc._id);
  log.info(
```

I believe these logs will get very very noisy if the users upload all forms. I suggest we change most of these to be debug logs.

```js
      log.info(`Conflict detected for ${doc._id}, retrying`);
      return upsertDoc(db, doc, retries - 1);
    } else if (err.status === 413) {
      return handleLargeDocument(db, doc);
```

I have a form with three video attachments, and every time I upload it, I am told that I need to overwrite changes from the server, even though no changes have been made to the form in config.

However, I'm seeing the same behavior when adding media to another form - media that would not trigger the custom code that handles large attachments - and I'm also seeing this on the main branch.

Would you mind creating an issue to track this bug? The warning should only show up when the config and the server are out of sync.

…ex, conditional _attachments assignment, and adjust log levels
@dianabarsan (Member) left a comment


Your PR is not passing our Sonar linting rules. Could you please have a look and address these issues? https://sonarcloud.io/project/issues?id=medic_cht-conf&pullRequest=724&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true

Thanks!

@dianabarsan (Member) left a comment


Looking good!!

@mrjones-plip (Contributor) commented:

Awesome job - thanks for the review @dianabarsan, and congrats to @sookwalinga on contributing the code, which will now be in CHT Core \o/

Diana - are we ready to merge this then? Any reason to wait?

@dianabarsan (Member) commented:

Since there are a couple of connected PRs - this one and two in cht-core - I want to make sure that they work well together. It's a manual test.

I'll merge after testing the full interaction.
