feat: Implement queue-based async DB export pipeline#188
Conversation
|
Warning Review limit reached
More reviews will be available in 45 minutes and 56 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (14)
📝 WalkthroughWalkthroughAdds an asynchronous database export feature: a Dashboard API endpoint with per‑day limits, shared BullMQ export queue and email plumbing, a consumer application with a streaming worker that uploads exports to S3‑compatible storage, and supporting storage helper and integration changes. ChangesDatabase Export Pipeline
Sequence DiagramsequenceDiagram
participant Client
participant DashboardAPI as Dashboard API
participant ExportQueue as Export Queue (BullMQ)
participant ConsumerWorker as Consumer Worker
participant MongoDB
participant Storage as S3-Compatible Storage
participant EmailQueue as Email Queue
Client->>DashboardAPI: POST /projects/:projectId/export
DashboardAPI->>DashboardAPI: verify owner + check daily limit
alt Limit exceeded
DashboardAPI->>Client: 429 Too Many Requests
else OK
DashboardAPI->>ExportQueue: enqueue export job
DashboardAPI->>Client: 202 Accepted + remaining usage
ExportQueue->>ConsumerWorker: dispatch job
ConsumerWorker->>MongoDB: cursor() on each collection
loop per collection
MongoDB-->>ConsumerWorker: stream documents
ConsumerWorker->>ConsumerWorker: write JSON to PassThrough
end
ConsumerWorker->>Storage: Upload (multipart) via Upload(stream)
Storage-->>ConsumerWorker: file location
ConsumerWorker->>Storage: generate presigned URL (86400s)
ConsumerWorker->>EmailQueue: enqueue send-export-email (downloadUrl, projectName)
EmailQueue->>Client: sendExportReadyEmail with link
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (1)
packages/common/src/utils/emailService.js (1)
316-321: ⚡ Quick winUse configured sender/reply-to values for export emails.
Hardcoding these fields bypasses environment configuration and can cause inconsistent deliverability across environments.
Proposed change
- const { data, error } = await resend.emails.send({ - from: '"urBackend" <onboarding@resend.dev>', + const { data, error } = await resend.emails.send({ + from: defaultFromAddress, to: to, subject: subject, text: textBody, - replyTo: 'urbackend@apps.bitbros.in', + replyTo: replyToAddress, });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/common/src/utils/emailService.js` around lines 316 - 321, The export email send call currently hardcodes from and replyTo values ("urBackend" <onboarding@resend.dev> and 'urbackend@apps.bitbros.in'); update the send routine (the function that builds and calls the email client—look for the send/export email function in emailService.js where the object with from, to, subject, text, replyTo is passed) to use the configured sender and reply-to values instead (read from the existing configuration/environment variables or the module's mail config constants rather than literal strings) so all environments honor configuration for deliverability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/consumer/Dockerfile`:
- Around line 1-20: The Dockerfile is creating and running the container as
root; modify it to create a non-root user, chown the app directory to that user,
and switch to that user before CMD so the container runs unprivileged: add a
user creation step (e.g., addgroup/adduser or useradd) after the workspace and
package copy steps, run chown -R on /app (the WORKDIR used by RUN npm ci and
subsequent COPYs) to give ownership to the new user, and add a USER instruction
near the end so CMD ["npm","run","start"] runs as that non-root user.
In `@apps/consumer/package.json`:
- Around line 1-15: Add a "dependencies" section to apps/consumer/package.json
declaring the runtime packages the consumer imports: add "dotenv": "^17.2.3",
"bullmq": "^5.70.1", and "`@urbackend/common`": "*" (to match the monorepo
protocol/versions used elsewhere), then run the workspace install to update the
lockfile; update any existing script or import usage if package names differ.
Ensure the "dependencies" key is present alongside "scripts" and uses the exact
versions listed so module resolution works when the consumer workspace is
installed/run in isolation.
In `@apps/consumer/src/workers/export.worker.js`:
- Around line 44-47: The export loop currently ignores writeStream.write(...)
return values in the for await (const doc of cursor) loop (using the local
variable first) which can cause memory bloat; change it to check the boolean
result and, when false, await a Promise that resolves on
writeStream.once('drain') before continuing to apply backpressure. Also stop
calling fs.readFileSync(tempFilePath) before upload; instead stream the temp
file to Supabase (e.g., use fs.createReadStream(tempFilePath) or the Supabase
streaming/file API) so upload uses a stream rather than loading the whole file
into RAM (replace the fs.readFileSync usage and the subsequent
supabase.storage...upload(...) call to accept a stream).
- Around line 69-72: The code currently reads the entire temp file into memory
with fs.readFileSync(tempFilePath) before calling
supabase.storage.from(bucket).upload(storagePath, ...); replace the synchronous
read with a streaming upload by creating a ReadableStream
(fs.createReadStream(tempFilePath)) and passing that stream to
supabase.storage.from(bucket).upload so the file is uploaded without blocking
the event loop or loading the entire file into memory; keep the same
storagePath, bucket and contentType/upsert options and ensure any stream errors
are handled/rejected before completing the worker task (consider switching to
resumable/TUS uploads if exports may be very large).
In `@apps/dashboard-api/src/controllers/dbExport.controller.js`:
- Around line 40-50: The current get + incr sequence is race-prone; replace it
with a single atomic Redis operation (e.g. an EVAL Lua script or a Redis
multi/transaction) that increments the key, sets the 24h expiry when the counter
becomes 1, and returns the new count in one call; then check the returned count
against maxExports and call next(new AppError(...)) if it exceeds the limit
before calling exportQueue.add. Target the existing symbols key,
redis.get/redis.incr, maxExports, and exportQueue.add and ensure the atomic
script does: INCR key, if value == 1 then EXPIRE key 86400, return value.
- Around line 52-54: The response currently returns a message-only payload;
update the res.status(202).json call in the db export controller to follow the
standardized shape { success, data, message } — e.g. return
res.status(202).json({ success: true, data: { usageToday:
`${newCount}/${maxExports}`, newCount, maxExports }, message: "Database export
request received. You will receive an email with a download link shortly." });
ensure you modify the existing return in the export handler (the
res.status(202).json(...) statement) so callers receive success, data, and
message fields.
- Around line 57-58: The controller currently logs the full error but constructs
AppError using err.message which can leak internal/MongoDB details; keep the
console.error for internal diagnostics (including err) but change the AppError
instantiation to return a generic client-safe message (e.g., "Failed to initiate
database export.") instead of err.message, optionally attach an internal error
id or err.code to logs only; update the code around console.error and the new
AppError(...) call so next(new AppError(500, ...)) never includes err.message
while still logging err for debugging (referencing AppError, console.error,
req.params.projectId, next in the db export handler).
In `@packages/common/src/queues/emailQueue.js`:
- Around line 11-30: The worker currently treats unknown job.name values as
silent no-ops; update the handler so that after the known branches
(release-email and send-export-email) it explicitly rejects unsupported job
types by throwing an Error (or returning a failed Promise) that includes the
unrecognized job.name and job.data for observability; modify the function that
processes jobs (the code handling job.name in
packages/common/src/queues/emailQueue.js) to add a final else/throw path that
logs and throws a descriptive error like "Unsupported email job type:
<job.name>" so unknown jobs fail loudly.
In `@packages/common/src/utils/storage.manager.js`:
- Around line 145-147: The unknown-provider branch in getStorage
(storage.manager.js) logs the provider after throwing an Error, making
console.error unreachable; move or duplicate the diagnostic call so the provider
is logged before the exception is thrown (i.e., call console.error("[getStorage]
Unknown storage provider: ", provider) prior to throw new Error("Unknown storage
provider: " + provider)) so the provider is recorded when getStorage encounters
an unsupported provider.
---
Nitpick comments:
In `@packages/common/src/utils/emailService.js`:
- Around line 316-321: The export email send call currently hardcodes from and
replyTo values ("urBackend" <onboarding@resend.dev> and
'urbackend@apps.bitbros.in'); update the send routine (the function that builds
and calls the email client—look for the send/export email function in
emailService.js where the object with from, to, subject, text, replyTo is
passed) to use the configured sender and reply-to values instead (read from the
existing configuration/environment variables or the module's mail config
constants rather than literal strings) so all environments honor configuration
for deliverability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5e4b2562-ec50-430a-9cee-645134229570
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (14)
apps/consumer/Dockerfileapps/consumer/package.jsonapps/consumer/src/index.jsapps/consumer/src/workers/export.worker.jsapps/dashboard-api/package.jsonapps/dashboard-api/src/app.jsapps/dashboard-api/src/controllers/dbExport.controller.jsapps/dashboard-api/src/routes/projects.jspackage.jsonpackages/common/src/index.jspackages/common/src/queues/emailQueue.jspackages/common/src/queues/exportQueue.jspackages/common/src/utils/emailService.jspackages/common/src/utils/storage.manager.js
💤 Files with no reviewable changes (1)
- apps/dashboard-api/src/app.js
| { | ||
| "name": "consumer", | ||
| "version": "1.0.0", | ||
| "description": "", | ||
| "main": "src/app.js", | ||
| "scripts": { | ||
| "dev": "node src/index.js", | ||
| "start": "node src/index.js", | ||
| "test": "jest --testPathPatterns=src/" | ||
| }, | ||
| "keywords": [], | ||
| "author": "", | ||
| "license": "ISC", | ||
| "type": "commonjs" | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show consumer manifest
echo "---- apps/consumer/package.json ----"
cat -n apps/consumer/package.json
# Identify import/require usage in consumer src
echo "---- imports in apps/consumer/src ----"
rg -n --no-heading "(require\\(|from\\s+['\"]|import\\s+['\"])\\s*('dotenv'|\"dotenv\"|dotenv)|\\bbullmq\\b|`@urbackend/common`" apps/consumer/src || true
# Show consumer src key files referenced
echo "---- list apps/consumer/src ----"
find apps/consumer/src -maxdepth 2 -type f -print
# Inspect monorepo root package.json for workspace/hoisting rules
echo "---- root package.json ----"
ls -1
if [ -f package.json ]; then cat -n package.json; fi
# If a package manager config exists (pnpm/yarn/npm), show workspace relevant sections
for f in pnpm-workspace.yaml yarn.lock package-lock.json pnpm-lock.yaml .npmrc .yarnrc.yml; do
if [ -e "$f" ]; then
echo "---- $f ----"
head -n 80 "$f" || true
fi
doneRepository: geturbackend/urBackend
Length of output: 4313
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show consumer manifest
echo "---- apps/consumer/package.json ----"
cat -n apps/consumer/package.json
# Identify import/require usage in consumer src
echo "---- imports in apps/consumer/src ----"
rg -n --no-heading "(require\\(|from\\s+['\"]|import\\s+['\"])\\s*('dotenv'|\"dotenv\"|dotenv)|\\bbullmq\\b|`@urbackend/common`" apps/consumer/src || true
# Show consumer src key files referenced
echo "---- list apps/consumer/src ----"
find apps/consumer/src -maxdepth 2 -type f -print
# Inspect monorepo root package.json for workspace/hoisting rules
echo "---- root package.json ----"
ls -1
if [ -f package.json ]; then cat -n package.json; fi
# If a package manager config exists (pnpm/yarn/npm), show workspace relevant sections
for f in pnpm-workspace.yaml yarn.lock package-lock.json pnpm-lock.yaml .npmrc .yarnrc.yml; do
if [ -e "$f" ]; then
echo "---- $f ----"
head -n 80 "$f" || true
fi
doneRepository: geturbackend/urBackend
Length of output: 4313
Declare consumer runtime dependencies in apps/consumer/package.json.
apps/consumer imports dotenv, @urbackend/common, and bullmq, but its package.json declares none (and package-lock.json shows no dependencies recorded for apps/consumer). This can break module resolution when installing/running the consumer workspace in isolation. Match the versions/protocol used by the other workspaces in this monorepo (e.g., dotenv ^17.2.3, bullmq ^5.70.1, @urbackend/common *).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/consumer/package.json` around lines 1 - 15, Add a "dependencies" section
to apps/consumer/package.json declaring the runtime packages the consumer
imports: add "dotenv": "^17.2.3", "bullmq": "^5.70.1", and "`@urbackend/common`":
"*" (to match the monorepo protocol/versions used elsewhere), then run the
workspace install to update the lockfile; update any existing script or import
usage if package names differ. Ensure the "dependencies" key is present
alongside "scripts" and uses the exact versions listed so module resolution
works when the consumer workspace is installed/run in isolation.
| console.error(`[Dashboard API] Error handling export request for project ${req.params.projectId}:`, err); | ||
| return next(new AppError(500, err.message || "Failed to initiate database export.")); |
There was a problem hiding this comment.
Do not expose raw internal error messages to clients.
Propagating err.message can leak MongoDB/internal details in API responses.
Proposed change
- return next(new AppError(500, err.message || "Failed to initiate database export."));
+ return next(new AppError(500, "Failed to initiate database export."));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| console.error(`[Dashboard API] Error handling export request for project ${req.params.projectId}:`, err); | |
| return next(new AppError(500, err.message || "Failed to initiate database export.")); | |
| console.error(`[Dashboard API] Error handling export request for project ${req.params.projectId}:`, err); | |
| return next(new AppError(500, "Failed to initiate database export.")); |
🧰 Tools
🪛 GitHub Check: CodeQL
[failure] 57-57: Use of externally-controlled format string
Format string depends on a user-provided value.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/dashboard-api/src/controllers/dbExport.controller.js` around lines 57 -
58, The controller currently logs the full error but constructs AppError using
err.message which can leak internal/MongoDB details; keep the console.error for
internal diagnostics (including err) but change the AppError instantiation to
return a generic client-safe message (e.g., "Failed to initiate database
export.") instead of err.message, optionally attach an internal error id or
err.code to logs only; update the code around console.error and the new
AppError(...) call so next(new AppError(500, ...)) never includes err.message
while still logging err for debugging (referencing AppError, console.error,
req.params.projectId, next in the db export handler).
|
Ahh, these are quick fixes. Also, @yash-pouranik I wanted to have a discussion on how to proceed further with the storage bucket streaming as supabase doesn't do a good job with streaming. From my homework AWS S3 however has great streaming capabilities. Do I maybe, check for storage provider and accordingly use an appropriate upload pipeline. (local and upload / direct stream) In the current setup, I optimized it by streaming to a temp file which wouldn't use up memory as it doesn't load up the entire file on memory and stream to it. (Streams to file on disk) |
|
really sry for too much delay. Will join at around 4 or 5 pm |
|
Sure, no worries :) |
|
Hi @yash-pouranik. Could we have a quick chat on how to proceed further. There are a few things we should get sorted out. |
|
Really sry, i literally forgot about this. |
|
So u mean current approach is, creating a file on disc and then streaming all the data to that file only?
So for projects who have S3 or R2 Configured? We provide the same streaming capabilities. as we have discussed in the issue that is creating file on cloud and streaming on that only.
What say? |
|
Tip For best results, initiate chat on the files or code changes.
✅ Yes, current approach = stream to disk first, then uploadThat's exactly what 🎯 Recommended: One unified AWS S3-compatible streaming pipeline for all 3 providersHere's the good news — you don't need separate pipelines. All three providers (S3, R2, Supabase) speak the S3 protocol and can use Supabase Storage is now officially an S3-compatible storage provider and supports multipart uploads via the S3 protocol. Multipart uploads split the file into smaller parts and upload them in parallel; this allows retrying individual parts on network issues and is the preferable method for server-side uploads when you want to maximize upload speed. The AWS SDK's managed upload intelligently detects when a stream can be split into multiple parts, provides more robust operations (individual parts can be retried separately), allows parallel uploads, and — most importantly — does not need to know the full size of the stream in advance. How to configure each providerconst { S3Client } = require("`@aws-sdk/client-s3`");
const { Upload } = require("`@aws-sdk/lib-storage`");
function getS3Client(project) {
const { storageProvider, storageConfig } = project;
if (storageProvider === "s3") {
return new S3Client({
region: storageConfig.region,
credentials: {
accessKeyId: storageConfig.accessKeyId,
secretAccessKey: storageConfig.secretAccessKey,
},
});
}
if (storageProvider === "r2") {
return new S3Client({
region: "auto",
endpoint: `https://${storageConfig.accountId}.r2.cloudflarestorage.com`,
credentials: {
accessKeyId: storageConfig.accessKeyId,
secretAccessKey: storageConfig.secretAccessKey,
},
});
}
if (storageProvider === "supabase") {
// Generate S3 access keys from: Supabase Dashboard → Storage → Settings → S3 Access Keys
return new S3Client({
region: storageConfig.region ?? "ap-south-1", // any valid region works
endpoint: `${storageConfig.projectUrl}/storage/v1/s3`,
credentials: {
accessKeyId: storageConfig.s3AccessKeyId,
secretAccessKey: storageConfig.s3SecretAccessKey,
},
forcePathStyle: true,
});
}
throw new Error(`Unsupported storage provider: ${storageProvider}`);
}Streaming export pipeline (no temp file needed)const { PassThrough } = require("stream");
const { Upload } = require("`@aws-sdk/lib-storage`");
async function streamExportToStorage(project, mongooseModel) {
const s3Client = getS3Client(project);
const passThrough = new PassThrough();
// Upload starts immediately, consuming chunks as they arrive
const upload = new Upload({
client: s3Client,
params: {
Bucket: project.storageBucket,
Key: `exports/db_export_${Date.now()}.json`,
Body: passThrough,
ContentType: "application/json",
},
queueSize: 4, // concurrent parts
partSize: 5 * 1024 * 1024, // 5 MB minimum per S3 spec
});
// Stream Mongoose cursor into PassThrough
const cursor = mongooseModel.find().lean().cursor();
passThrough.write("[");
let first = true;
for await (const doc of cursor) {
if (!first) passThrough.write(",");
passThrough.write(JSON.stringify(doc));
first = false;
}
passThrough.write("]");
passThrough.end();
const result = await upload.done();
return result.Key;
}Summary of the approach
To authenticate with Supabase S3, you can use standard This eliminates the temp file entirely, avoids ephemeral disk pressure on Render's free tier, and gives you one unified upload path for all three providers. The only thing to store per-project in config is the S3 credentials variant (Supabase's S3 keys are separate from its API key). 🐰 ✨ |
|
I went through the native Supabase Storage API and found that it wasn't streaming compatible. I'll take a look at Supabase's S3 compatible gateway now. If it provides a way to stream directly, I'll make the changes and update the PR. Sounds good ? |
|
yeah sure, |
|
Awsm, I'll make the changes and get back to you soon :) |
|
bro Waiting... @renganathc |
4b50866 to
0e71cd0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
apps/consumer/src/workers/export.worker.js (1)
62-66:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle backpressure when writing to PassThrough stream.
The cursor loop ignores
write()return values. For large exports (100k+ documents per issue requirements), the PassThrough buffer can grow unbounded if the S3 Upload consumes slower than the cursor produces.Proposed fix with backpressure handling
+const { once } = require('events'); // ... at top of file for await (const doc of cursor) { - if (!first) passThrough.write(',\n'); - passThrough.write(` ${JSON.stringify(doc)}`); + if (!first) { + if (!passThrough.write(',\n')) await once(passThrough, 'drain'); + } + if (!passThrough.write(` ${JSON.stringify(doc)}`)) { + await once(passThrough, 'drain'); + } first = false; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/consumer/src/workers/export.worker.js` around lines 62 - 66, The cursor loop writing to the PassThrough stream ignores write() return values and can overflow the buffer; update the loop that iterates over cursor (the for await (const doc of cursor) block) to check the boolean result of passThrough.write(...) and when it returns false await the 'drain' event before continuing (e.g., await new Promise(resolve => passThrough.once('drain', resolve))). Ensure this backpressure handling is applied for the initial '[' and subsequent comma + JSON writes so the export.worker.js passThrough stream never grows unbounded.packages/common/src/utils/storage.manager.js (1)
145-146:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUnreachable
console.errorafterthrow.The
throwstatement on line 145 executes before theconsole.erroron line 146, making the log unreachable. Swap the order.Proposed fix
} else { + console.error("[getStorage] Unknown storage provider: ", provider); throw new Error("Unknown storage provider: " + provider); - console.error("[getStorage] Unknown storage provider: ", provider); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/common/src/utils/storage.manager.js` around lines 145 - 146, The console.error after the throw is unreachable; in the getStorage flow log the unknown provider before raising the exception by moving the console.error("[getStorage] Unknown storage provider: ", provider) to precede the throw new Error("Unknown storage provider: " + provider) (i.e., log the provider inside the getStorage branch then throw), so the runtime will emit the diagnostic before the exception is thrown.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/common/src/utils/storage.manager.js`:
- Around line 288-296: getS3CompatibleStorage() is setting endpoint to
process.env.SUPABASE_URL (e.g. https://<project>.supabase.co) which is wrong for
Supabase’s S3 API; change the endpoint to the Supabase Storage S3 endpoint
(https://<project-ref>.storage.supabase.co/storage/v1/s3) instead. Update the
logic in getS3CompatibleStorage() to derive or use the correct storage host
(either build from SUPABASE_PROJECT_REF or parse SUPABASE_URL) and ensure the
endpoint includes the /storage/v1/s3 path while keeping region, forcePathStyle
and credentials usage intact so AWS SDK S3 calls hit the internal storage API.
---
Duplicate comments:
In `@apps/consumer/src/workers/export.worker.js`:
- Around line 62-66: The cursor loop writing to the PassThrough stream ignores
write() return values and can overflow the buffer; update the loop that iterates
over cursor (the for await (const doc of cursor) block) to check the boolean
result of passThrough.write(...) and when it returns false await the 'drain'
event before continuing (e.g., await new Promise(resolve =>
passThrough.once('drain', resolve))). Ensure this backpressure handling is
applied for the initial '[' and subsequent comma + JSON writes so the
export.worker.js passThrough stream never grows unbounded.
In `@packages/common/src/utils/storage.manager.js`:
- Around line 145-146: The console.error after the throw is unreachable; in the
getStorage flow log the unknown provider before raising the exception by moving
the console.error("[getStorage] Unknown storage provider: ", provider) to
precede the throw new Error("Unknown storage provider: " + provider) (i.e., log
the provider inside the getStorage branch then throw), so the runtime will emit
the diagnostic before the exception is thrown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 26a627e0-1a6b-4792-9e8c-9d60ea97a3d2
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (4)
apps/consumer/src/workers/export.worker.jspackage.jsonpackages/common/src/index.jspackages/common/src/utils/storage.manager.js
🚧 Files skipped from review as they are similar to previous changes (1)
- package.json
| const s3Client = new S3Client({ | ||
| region: "auto", | ||
| endpoint: process.env.SUPABASE_URL, | ||
| forcePathStyle: true, | ||
| credentials: { | ||
| accessKeyId: process.env.SUPABASE_PUBLIC_KEY, | ||
| secretAccessKey: process.env.SUPABASE_SECRET_KEY | ||
| } | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Supabase S3 compatible API endpoint format storage v1 s3
💡 Result:
Supabase Storage “S3 compatible API endpoint” format (Storage S3 protocol) is: https://.storage.supabase.co/storage/v1/s3 The path portion is exactly “/storage/v1/s3” and is described as the S3 protocol endpoint exposed by Storage [1][2]. Supabase’s S3 authentication docs show the full example endpoint including the “storage.supabase.co” hostname and “/storage/v1/s3” path [3][4]. Note: This is different from the Supabase Storage REST API base URL, which is: https://.supabase.co/storage/v1/ and is used for the REST Storage API [5].
Citations:
- 1: https://supabase.com/docs/guides/self-hosting/self-hosted-s3
- 2: https://github.com/supabase/supabase/blob/master/apps/docs/content/guides/self-hosting/self-hosted-s3.mdx
- 3: https://github.com/supabase/supabase/blob/master/apps/docs/content/guides/storage/s3/authentication.mdx
- 4: https://supabase.com/docs/guides/storage/s3/authentication
- 5: https://supabase-supabase.mintlify.app/api/rest/storage
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the relevant portion of the file around the flagged lines
sed -n '240,330p' packages/common/src/utils/storage.manager.js
# Find any other references to SUPABASE_URL or the s3 endpoint logic
rg -n "SUPABASE_URL|storage/v1/s3|S3Client|getS3CompatibleStorage" packages/common/src/utils/storage.manager.js packages/common/src/utils -S || true
# If there are env var docs elsewhere, try to locate them
rg -n "SUPABASE_PUBLIC_KEY|SUPABASE_SECRET_KEY|SUPABASE_URL" -S . || trueRepository: geturbackend/urBackend
Length of output: 6282
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect top of storage.manager.js to see how SUPABASE_URL is treated
sed -n '1,120p' packages/common/src/utils/storage.manager.js
# Inspect internal S3 gateway block a bit more fully
sed -n '250,430p' packages/common/src/utils/storage.manager.js
# Check where SUPABASE_URL is provided to Supabase clients (apps)
sed -n '1,120p' apps/public-api/src/app.js
sed -n '70,140p' apps/dashboard-api/src/app.js
# Search for any documentation or env validation about SUPABASE_URL
rg -n "SUPABASE_URL" -S . || true
rg -n "SUPABASE_PUBLIC_KEY|SUPABASE_SECRET_KEY" -S . || true
# Look for storage S3 endpoint construction patterns in repo
rg -n "storage/v1/s3|storage\\.supabase\\.co|s3Endpoint" -S . || trueRepository: geturbackend/urBackend
Length of output: 17176
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the env defaults/any constants at the top of storage.manager.js
sed -n '1,80p' packages/common/src/utils/storage.manager.js
# Show internal supabase block fully (around internal S3 client creation)
sed -n '260,360p' packages/common/src/utils/storage.manager.js
# Show the input validation for s3Endpoint (external supabase via s3 gateway config)
sed -n '360,480p' packages/common/src/utils/input.validation.js
# Show the UI hints/placeholder for s3Endpoint (to infer expected format)
sed -n '880,1100p' apps/web-dashboard/src/pages/ProjectSettings.jsxRepository: geturbackend/urBackend
Length of output: 24488
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the BYOS storage config patch route
rg -n "byod-config|resources\\.storage|storageProvider\"|storageUrl|storageKey" apps packages -S
# Inspect the storage routes for where config is created/normalized
fd -e js -e ts routes apps | head -n 50 || true
fd byod-config -t f . || true
fd storage -t f apps packages/common/src/utils || true
# Specifically open the project storage config routes (best-effort by filename)
rg -n "byod-config" apps -SRepository: geturbackend/urBackend
Length of output: 9884
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show how external BYOS storage config is persisted (storageUrl/storageKey -> resources.storage.config)
sed -n '470,610p' apps/dashboard-api/src/controllers/project.controller.js
# Also show the updateExternalConfig handler call site (if it does normalization)
sed -n '600,710p' apps/dashboard-api/src/controllers/project.controller.js
# Locate the validator usage for byos storage config
rg -n "updateExternalConfig|storageUrl|storageKey|storageProvider|s3Endpoint" apps/dashboard-api/src/controllers/project.controller.js apps/dashboard-api/src/middlewares apps/dashboard-api/src/routes -S
# Find env examples / docs for SUPABASE_URL and internal storage
ls -la
fd -a -t f ".env*example" . || true
fd -a -t f "README*|*.md" . | head -n 50
rg -n "SUPABASE_URL|SUPABASE_PUBLIC_KEY|SUPABASE_SECRET_KEY" -S . || trueRepository: geturbackend/urBackend
Length of output: 10869
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect env example for SUPABASE_URL / internal storage expectations
rg -n "SUPABASE_URL|SUPABASE_PUBLIC_KEY|SUPABASE_SECRET_KEY" .env.example
sed -n '1,120p' .env.example
# Search for any mention of /storage/v1/s3 in repo
rg -n "/storage/v1/s3|storage/v1/s3|storage\\.supabase\\.co|\\.storage\\.supabase\\.co" -S .
# Inspect storage.manager.js earlier sections that parse resources.storage.config for upload/presign/head object
sed -n '80,220p' packages/common/src/utils/storage.manager.js
sed -n '220,320p' packages/common/src/utils/storage.manager.js
sed -n '320,430p' packages/common/src/utils/storage.manager.js
# Inspect input.validation storageUrl/s3Endpoint expectations more directly
sed -n '360,470p' packages/common/src/utils/input.validation.js
# Inspect storage controllers tests that might construct endpoints
rg -n "storageUrl|s3Endpoint|SUPABASE_URL|storage/v1/s3" apps -SRepository: geturbackend/urBackend
Length of output: 2407
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Where getS3CompatibleStorage is used
rg -n "getS3CompatibleStorage\\(" packages/common/src/utils/storage.manager.js packages/common/src -S
# Show internal Supabase S3 client creation block with line context
sed -n '260,320p' packages/common/src/utils/storage.manager.js
# Show verifyUploadedFile / head object usage context (where returned s3Client is used)
sed -n '320,420p' packages/common/src/utils/storage.manager.jsRepository: geturbackend/urBackend
Length of output: 3521
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '100,260p' packages/common/src/utils/storage.manager.js
sed -n '260,380p' packages/common/src/utils/storage.manager.jsRepository: geturbackend/urBackend
Length of output: 9051
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "getS3CompatibleStorage\\(" -S apps packagesRepository: geturbackend/urBackend
Length of output: 278
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,160p' apps/consumer/src/workers/export.worker.js
# Also search for any other S3 usage in that worker
rg -n "HeadObjectCommand|ListObjects|GetObject|verifyUploadedFile|getS3CompatibleStorage|SUPABASE_URL|storage/v1/s3" apps/consumer/src/workers/export.worker.js -SRepository: geturbackend/urBackend
Length of output: 4114
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "storage\\.supabase\\.co|storage/v1/s3|/storage/v1/s3|supabase\\.co/storage/v1" -S . || trueRepository: geturbackend/urBackend
Length of output: 615
Fix internal Supabase S3 endpoint for AWS SDK.
getS3CompatibleStorage() sets endpoint: process.env.SUPABASE_URL (e.g., https://<project>.supabase.co), but Supabase’s S3-compatible endpoint must be https://<project-ref>.storage.supabase.co/storage/v1/s3 (including the storage.* host and /storage/v1/s3 path). With the current value, the AWS SDK calls will hit the wrong URL, breaking internal S3 operations like the export worker upload + signed download URL. [packages/common/src/utils/storage.manager.js:288-296]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/common/src/utils/storage.manager.js` around lines 288 - 296,
getS3CompatibleStorage() is setting endpoint to process.env.SUPABASE_URL (e.g.
https://<project>.supabase.co) which is wrong for Supabase’s S3 API; change the
endpoint to the Supabase Storage S3 endpoint
(https://<project-ref>.storage.supabase.co/storage/v1/s3) instead. Update the
logic in getS3CompatibleStorage() to derive or use the correct storage host
(either build from SUPABASE_PROJECT_REF or parse SUPABASE_URL) and ensure the
endpoint includes the /storage/v1/s3 path while keeping region, forcePathStyle
and credentials usage intact so AWS SDK S3 calls hit the internal storage API.
0e71cd0 to
414fe0a
Compare
|
Hello @yash-pouranik. Sorry for the delay. I was travelling. I've made the changes and the pass through works as intended. However, there are a few things I wanted to discuss regarding the supabase S3 gateway I worked on. Currently, the entire project uses the default supabase SDK for uploads. So, I couldn't use the getProject() fn to return a client and instead created an S3 based one to return an S3 client. The problem with this is, the user has to register with the S3 config (url, access key and secret key) instead of the usual config while using BYOS. This means we would have to update our entire infrastructure to use S3. (This is related to CodeRabbit flag as well. It is pointing out the ambiguity with process.env.SUPABASE_URL) My suggestion (for a quick fix) would be to GET both S3 configs and supabase configs while integrating BYOS so that we would have both and can use either the S3 gateway or the default supabase SDK for uploads. In the future we could completely migrate to S3 based uploads to avoid any inconsistencies. What do you think ? |
|
hmm got the issue. Right now whole urBackend storage infra for Supabase is based on the normal Supabase SDK flow, while this export pipeline uses the S3-compatible gateway for proper streaming support. So instead of migrating the entire storage architecture in this PR itself, lets keep the scope limited for now. For this PR:
Later we can properly migrate the complete storage layer towards unified S3-compatible uploads for all providers including Supabase. |
|
And I have already planned to migrate the defualt storage to R2 only, and BYOS to R2, S3 and supabase |
|
just 1 remaining, I mentioned there. |
|
You mean the res structure. That was already fixed @yash-pouranik |
|
Oh yes really sry, thanks for pointing. |
|
please list the env vars needed for consumer service |
|
also |
|
No new env vars were added. I've only worked with the existing ones. Also, this is not serverless. It's a long running process. The worker keeps running as long as the server is turned on. Its connected to our redis server, so any new job in the queue is reflected and performed in the worker. Just deploy it alongside the existing servers @yash-pouranik. Try npm run dev at the root and take a look at the workflow locally if you'd like. |
|
yes just pulling it on local |
|
bro how to trigger export?? |
|
Tip For best results, initiate chat on the files or code changes.
To answer your question: there is no UI button yet — this PR only implements the backend API endpoint. The export is currently API-only and can be triggered manually via curl or Postman: POST /api/projects/:projectId/export
Authorization: Bearer <your-auth-token>Example with curl: curl -X POST https://<your-dashboard-api>/api/projects/<projectId>/export \
-H "Authorization: Bearer <your-token>"On success you'll get a
ʕ •ᴥ•ʔ❓ |
|
I've only setup the api endpoints. UI still needs to be done @yash-pouranik. Testing has to be done with postman or the CURL commands like @coderabbitai mentioned. |
|
Tip For best results, initiate chat on the files or code changes. The first PR description image shows the Supabase Storage bucket ( The second PR description image shows an email titled "Export Ready: dummy lil' project" from
Would you like me to open a follow-up GitHub issue to track adding the "Export Database" trigger button to the web dashboard? I can create it now and link it back to this PR. |
|
@yash-pouranik Would you like me to setup a simple export workflow with a 'collection name' field in the frontend? |
|
What I suggest is, If u have time for next half hour? |
|
Sure, I can get it done in under an hour 👍 |
|
ok |
|
@coderabbitai |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
gh issue create \
--repo geturbackend/urBackend \
--title "feat: Add UI trigger for database export (Export Database button)" \
--assignee "renganathc" \
--body "## Overview
This issue tracks the frontend integration for the database export feature implemented in `#188`.
The backend API endpoint \`POST /api/projects/:projectId/export\` is now live and working (verified end-to-end — exports land in storage and the owner receives a download email). However, there is currently no UI to trigger an export; users must use curl/Postman directly.
## What needs to be done
- Add an **\"Export Database\"** button on the project settings or dashboard page (in \`apps/web-dashboard\`)
- The button should call \`POST /api/projects/:projectId/export\` with the authenticated user's token
- Optionally allow the user to specify a **collection name** to export a single collection (requires backend support for scoped export — coordinate with backend if needed)
- Show appropriate feedback to the user (e.g., \"Export started — you will receive an email when it's ready\")
- Handle error states: rate limit exceeded (429), unauthorized (403), not found (404)
## Acceptance criteria
- [ ] Export button is accessible from the project dashboard/settings page
- [ ] Clicking the button triggers the export API and shows a success/pending message
- [ ] Error responses (429, 403, 404) are surfaced clearly to the user
- [ ] No new lint warnings or console errors
## References
- Backend implementation: `#188`
- Original issue: `#176`
- Requested by: `@yash-pouranik`
"Length of output: 1675
➡️ The issue covers:
(╯°□°)╯ 🐇 ✨ |
|
list env vars used in consumer service |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== Consumer service env vars ==="
rg 'process\.env\.' apps/consumer/ -n
echo ""
echo "=== Common package env vars (redis, storage, email, queues) ==="
rg 'process\.env\.' packages/common/src/ -nLength of output: 3113 🏁 Script executed: #!/bin/bash
echo "=== validateEnv ==="
rg -n 'validateEnv' packages/common/src/ apps/consumer/src/ -A2 -B2
fd 'validateEnv' --exec cat {}
echo ""
echo "=== common index validateEnv export ==="
rg 'validateEnv' packages/common/src/index.js -A2Length of output: 2752
🔴 Required (validated at startup via
|
| Variable | Purpose |
|---|---|
MONGO_URL |
MongoDB connection string |
REDIS_URL |
Redis connection for BullMQ queues |
PORT |
validateEnv() but the consumer doesn't bind an HTTP server — this will cause startup failure if not set. Either add PORT to .env or remove it from validateEnv's required list for the consumer. |
🟡 Required at runtime (used by common modules)
| Variable | Purpose |
|---|---|
SUPABASE_URL |
Internal Supabase project URL — used as the S3 endpoint for the streaming upload pipeline |
SUPABASE_PUBLIC_KEY |
Supabase S3 access key ID (from Storage Settings) |
SUPABASE_SECRET_KEY |
Supabase S3 secret access key (from Storage Settings) |
RESEND_API_KEY |
Resend API key for sending export-ready emails |
RESEND_API_KEY_2 |
Fallback Resend API key (optional, takes priority over RESEND_API_KEY if set) |
ENCRYPTION_KEY |
Used to decrypt BYOS storage configs for S3/R2/Supabase projects |
🟢 Optional
| Variable | Default | Purpose |
|---|---|---|
EMAIL_FROM |
urBackend <urbackend@apps.bitbros.in> |
Sender address for export-ready emails |
EMAIL_REPLY_TO |
urbackend@apps.bitbros.in |
Reply-to address on outgoing emails |
NODE_ENV |
undefined |
Skips validateEnv() and some queue guards when set to test |
Note on
PORT: ThevalidateEnv()function was designed for the API servers. The consumer is a queue worker with no HTTP listener, soPORTisn't meaningful here. Consider either setting a dummy value (PORT=3003) in.envor creating a separatevalidateEnvcall in the consumer that only checksMONGO_URLandREDIS_URL.
|
i dont think public key and secret key required, only the anon key is needed |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== storage.manager.js ==="
cat packages/common/src/utils/storage.manager.js
echo ""
echo "=== All SUPABASE_ env vars referenced in the codebase ==="
rg 'process\.env\.SUPABASE' -n --type=jsLength of output: 15144
To clarify —
From if (!process.env.SUPABASE_URL || !process.env.SUPABASE_PUBLIC_KEY || !process.env.SUPABASE_SECRET_KEY) {
throw new Error("Internal Supabase S3 configuration is incomplete");
}
const s3Client = new S3Client({
endpoint: process.env.SUPABASE_URL,
credentials: {
accessKeyId: process.env.SUPABASE_PUBLIC_KEY, // ← S3-specific key, NOT anon key
secretAccessKey: process.env.SUPABASE_SECRET_KEY
}
});The S3 credentials are generated separately from: Supabase Dashboard → Storage → Settings → S3 Access Keys. The anon key cannot be used here — Supabase's S3 gateway requires its own dedicated access key pair. So |
|
@renganathc |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== export.worker.js ==="
cat apps/consumer/src/workers/export.worker.js
echo ""
echo "=== storage.manager.js (getS3CompatibleStorage section) ==="
grep -n "getS3CompatibleStorage\|supabase\|SUPABASE\|storageProvider" packages/common/src/utils/storage.manager.jsLength of output: 8556
I also owe a correction from my previous env var list —
Apologies for the confusion in the earlier message. 🐰 |
🚀 Pull Request Description
Fixes #176
Implemented asynchronous database exports using BullMQ workers and added a dedicated consumer service for processing export jobs and sending download emails.
One small deviation from the initial plan is that exports are first written to a temporary local file before being uploaded to storage, instead of directly streaming to cloud storage, since Supabase Storage does not properly support the intended streaming approach. Apart from this, the overall queue based export flow follows the planned design.
🛠️ Type of Change
🧪 Testing & Validation
Backend Verification:
npm testin thebackend/directory and all tests passed.Frontend Verification:
npm run lintin thefrontend/directory.📸 Screenshots / Recordings
✅ Checklist
Built with ❤️ for urBackend.
Summary by CodeRabbit
New Features
Chores