[BUG] R2 cache upload fails with 500 Internal Server Error during deploy (2400 prerendered pages) #1088

@isaacrowntree

Describe the bug

When deploying a large Next.js application with ~2,400 prerendered pages, the R2 incremental cache upload fails with a 500 Internal Server
Error. This occurs even after reducing --cacheChunkSize from the default 50 to 10.

The deploy process:

  1. Next.js build completes successfully (2,423 static pages generated)
  2. OpenNext bundling completes
  3. R2 cache population begins uploading 2,400 objects
  4. Upload fails at ~16% (400/2,400 objects) with HTTP 500

Error:
✘ [ERROR] Error uploading ".open-next/cache/.../activities.cache"
Error: Failed to fetch /accounts/.../r2/buckets/opennext-cache/objects/... - 500: Internal Server Error

Additionally, an esbuild deadlock error appears earlier in the build (though the build continues):
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [chan receive]:
github.com/evanw/esbuild/internal/helpers.(*ThreadSafeWaitGroup).Wait(...)

Steps to reproduce

  1. Have a Next.js 16 app with Partial Prerendering that generates ~2,400+ static pages
  2. Configure R2 incremental cache in wrangler.jsonc
  3. Run opennextjs-cloudflare build && opennextjs-cloudflare deploy --cacheChunkSize 10
  4. Observe R2 upload failing partway through with 500 error

Expected behavior

The R2 cache should upload all 2,400 objects successfully, with proper retry logic for transient API errors.

@opennextjs/cloudflare version

1.15.0

Wrangler version

4.59.3

next info output

Operating System:
    Platform: darwin
    Arch: arm64
    Version: Darwin Kernel Version 25.2.0
    Available memory (MB): 36864
    Available CPU cores: 12
  Binaries:
    Node: 24.11.1
    npm: 11.7.0
  Relevant Packages:
    next: 16.1.4
    react: 19.2.3
    react-dom: 19.2.3
    typescript: 5.9.3
  Next.js Config:
    output: standalone

Additional context

Investigation Findings

Root Cause Analysis

The R2 upload uses wrangler's r2 bulk put command, which calls the Cloudflare API (not the S3-compatible API). This has significant
limitations:

  1. Cloudflare API Rate Limit: 1,200 requests per 5 minutes (wrangler uses 1,100 with a buffer)
    • Source: wrangler code at cli.js:270141-270147
    const API_RATE_LIMIT_WINDOWS_MS = 5 * 60 * 1e3;  // 5 minutes
    const API_RATE_LIMIT_REQUESTS = 1200 - 100;      // 1100 requests
  2. No Retry Logic: The fetchR2Objects function in wrangler throws immediately on any non-200/404 response:
    if (response.ok && response.body) {
      return response;
    } else if (response.status === 404) {
      return null;
    } else {
      throw new Error(`Failed to fetch ${resource} - ${response.status}: ${response.statusText}`);
    }
  3. No Retry in opennextjs: The runWrangler function exits immediately on any non-zero status code with no retry mechanism.
  4. Single Batch Upload: Unlike KV cache (which chunks uploads), R2 sends all 2,400 objects in a single r2 bulk put command. A single file
    failure aborts the entire upload.
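
A minimal sketch of the retry behaviour that points 2 and 3 describe as missing, assuming that HTTP 429 and 5xx responses are transient and worth retrying. The names uploadWithRetry, isTransientStatus and UploadResult are hypothetical and exist in neither wrangler nor @opennextjs/cloudflare; the upload callback stands in for whatever performs a single object upload.

```typescript
// Hypothetical result shape: only the HTTP status matters for this sketch.
type UploadResult = { ok: boolean; status: number };

// Assumption: 429 and 5xx are transient; any other status is final.
function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Calls `upload` up to `maxAttempts` times, waiting baseDelayMs, 2x, 4x, ...
// between attempts while the response status looks transient.
async function uploadWithRetry(
  upload: () => Promise<UploadResult>,
  maxAttempts: number,
  baseDelayMs: number,
  // Injectable so callers (and tests) can avoid real waiting.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<UploadResult> {
  let result = await upload();
  for (
    let attempt = 1;
    attempt < maxAttempts && isTransientStatus(result.status);
    attempt++
  ) {
    await sleep(baseDelayMs * 2 ** (attempt - 1));
    result = await upload();
  }
  // Either a non-transient result or the last transient one after giving up.
  return result;
}
```

With a wrapper like this around each object, a single 500 would cost one delayed retry rather than aborting the whole cache population.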

Scale Issue

For 2,400 files at the rate limit:

  • Best case: 2,400 / 1,100 ≈ 2.18 rate-limit windows × 5 minutes ≈ 11 minutes minimum
  • With concurrency 10, the queue still counts toward the same rate limit window
  • 500 errors may be Cloudflare API overload rather than actual rate limiting
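
The best-case estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes exactly one API request per object and that the limit is spread evenly across the window; estimateUploadTime is a hypothetical helper, and the two constants mirror the values quoted from wrangler.

```typescript
// Values quoted from wrangler in the root cause analysis.
const API_RATE_LIMIT_WINDOW_MS = 5 * 60 * 1000; // 5 minutes
const API_RATE_LIMIT_REQUESTS = 1200 - 100; // 1,100 requests per window

// Assumption: each object costs `requestsPerObject` API calls (1 by default).
function estimateUploadTime(
  objectCount: number,
  requestsPerObject: number = 1,
): { windows: number; minutes: number } {
  const windows = (objectCount * requestsPerObject) / API_RATE_LIMIT_REQUESTS;
  const minutes = (windows * API_RATE_LIMIT_WINDOW_MS) / 60000;
  return { windows, minutes };
}

const estimate = estimateUploadTime(2400);
console.log(
  `${estimate.windows.toFixed(2)} windows, about ${Math.round(estimate.minutes)} minutes`,
);
```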

Cloudflare's Recommendation

Per Cloudflare community forums and docs, wrangler is designed for single-object uploads. For bulk uploads, Cloudflare recommends:

  • rclone with S3-compatible API
  • AWS SDK with R2 S3-compatible endpoint

Feature Request

Consider adding S3-compatible API support for R2 bulk uploads in opennextjs-cloudflare:

  • Uses endpoint: <ACCOUNT_ID>.r2.cloudflarestorage.com
  • R2 API tokens (Access Key ID + Secret Access Key)
  • Bypasses Cloudflare API rate limits
  • Can use parallel uploads with AWS SDK's built-in retry logic

OR (even better), remote bindings would probably fix this.
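
For the S3-compatible option, a sketch of the client configuration might look like the following. It only builds a plain object in the shape that S3Client from @aws-sdk/client-s3 accepts (region, endpoint, credentials, maxAttempts); the SDK itself is not imported so that the snippet has no dependencies. buildR2S3Config and R2S3Config are hypothetical names, and the default of 5 attempts is an arbitrary assumption.

```typescript
// Hypothetical type covering the subset of S3 client options used here.
type R2S3Config = {
  region: string;
  endpoint: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
  maxAttempts: number;
};

function buildR2S3Config(
  accountId: string,
  accessKeyId: string,
  secretAccessKey: string,
  maxAttempts: number = 5, // assumption: tune to taste
): R2S3Config {
  return {
    // R2 uses the placeholder region "auto" for its S3-compatible API.
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    // An R2 API token provides the Access Key ID and Secret Access Key.
    credentials: { accessKeyId, secretAccessKey },
    // Lets the SDK's built-in retry logic handle transient failures.
    maxAttempts,
  };
}

// Intended use (not executed here):
//   new S3Client(buildR2S3Config(accountId, accessKeyId, secretAccessKey))
```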

Workarounds Attempted

  • Reduced --cacheChunkSize from 50 to 10 - still failed
  • The error occurs at ~400/2,400 objects (~16%)

Environment

  • Build environment: Cloudflare Pages (Node.js v24.13.0)
  • @opennextjs/aws version: 3.9.11
  • Cache Components and Partial Prerender (PPR) enabled
  • ~2,400 prerendered pages generating ~2,400 cache files
