@wpinho-branch wpinho-branch commented Sep 4, 2025

Reference

EMT-2369 -- Eliminate Thread.sleep() in Network Layer with Coroutine-based Retry Mechanism.

Description

This PR addresses a critical performance issue in the Branch SDK's network layer where Thread.sleep() calls were blocking valuable threads during retry operations, leading to thread pool exhaustion and inefficient resource usage.

Problem Solved:

  • 6 instances of Thread.sleep() in BranchRemoteInterfaceUrlConnection.java (lines 89-95, 124-125) were blocking threads during network retries
  • Linear retry mechanism without exponential backoff
  • No cancellation support for ongoing network operations
  • Thread pool inefficiency due to blocked threads during retry delays

Solution Implemented:

  • New Coroutine-based Network Layer: BranchAsyncNetworkLayer.kt replaces blocking Thread.sleep() with non-blocking delay() coroutines
  • Exponential Backoff with Jitter: Implements proper retry strategy to prevent thundering herd problems
  • Structured Concurrency: Provides cancellation support and proper resource management
  • Backward Compatibility: BranchRemoteInterfaceUrlConnectionAsync.kt adapter maintains 100% API compatibility
  • Feature Flag Integration: Existing BranchRemoteInterfaceUrlConnection.java routes to async implementation by default, with fallback support
  • Enhanced Logging: Comprehensive BranchLogger integration for debugging and monitoring
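
For reviewers unfamiliar with the pattern, here is a minimal sketch of exponential backoff with jitter. It is not the code in BranchAsyncNetworkLayer.kt: the class name, the 1 s base delay, the 30 s cap, and the "full jitter" variant (uniform random delay up to the cap) are all assumptions chosen for illustration.

```java
import java.util.Random;

/** Hypothetical helper; names and constants are illustrative, not taken from the PR. */
final class BackoffSketch {
    static final long BASE_DELAY_MS = 1_000L;  // assumed base delay
    static final long MAX_DELAY_MS = 30_000L;  // assumed upper cap

    /** Upper bound for a zero-based retry number: base * 2^retry, capped at MAX_DELAY_MS. */
    static long cappedExponentialDelay(int retryNumber) {
        // Clamp the shift so the left shift cannot overflow a long.
        int shift = Math.min(Math.max(retryNumber, 0), 20);
        return Math.min(MAX_DELAY_MS, BASE_DELAY_MS << shift);
    }

    /** "Full jitter": a uniformly random delay in [0, cap], so clients do not retry in lockstep. */
    static long delayWithJitter(int retryNumber, Random random) {
        long cap = cappedExponentialDelay(retryNumber);
        return (long) (random.nextDouble() * (cap + 1));
    }
}
```

In a coroutine-based layer the computed value would be passed to delay() rather than Thread.sleep(); the randomness is what spreads retries out and avoids the thundering herd mentioned above.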

Key Improvements:

  • Thread Efficiency: No more blocked threads during retry delays
  • Better Network Behavior: Exponential backoff with jitter reduces server load
  • Cancellation Support: Proper cleanup of ongoing operations
  • Performance Monitoring: Detailed logging for debugging and performance analysis
  • Zero Breaking Changes: Maintains full backward compatibility

Testing Instructions

  1. Log Verification: Look for these key log messages indicating Thread.sleep() elimination:

    I/BranchLogger: BranchRemoteInterfaceUrlConnection: Initialized with async implementation enabled (Thread.sleep eliminated)
    D/BranchLogger: BranchAsyncNetworkLayer: Using coroutine delay of 1247ms (eliminates Thread.sleep!)
    
  2. Retry Behavior Test: Simulate network errors (500+ response codes) and verify:

    • Exponential backoff delays are calculated and logged
    • No thread blocking during retry delays
    • Proper cancellation support
  3. Backward Compatibility Test: Verify existing Branch SDK functionality remains unchanged

    • Link generation
    • Deep link handling
    • Analytics tracking
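
The retry behavior in step 2 could be exercised without a real server by injecting a fake operation and a fake pause. The sketch below is an assumption-laden illustration, not the SDK's test code: RetryLoopSketch, failingThenOk, the doubling delays, and the reading of "retry limit" as the number of retries after the first attempt are all hypothetical.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.LongConsumer;

/** Hypothetical retry loop mirroring the "retry on 5xx, up to a limit" behavior described above. */
final class RetryLoopSketch {
    /**
     * Calls attempt(retryNumber) until it returns a non-5xx status or the retry limit is reached.
     * The pause callback receives each delay in milliseconds instead of the loop sleeping itself.
     */
    static int executeWithRetry(IntUnaryOperator attempt, int retryLimit, LongConsumer pause) {
        int retryNumber = 0;
        while (true) {
            int status = attempt.applyAsInt(retryNumber);
            boolean retryable = status >= 500;
            if (!retryable || retryNumber >= retryLimit) {
                return status; // success, non-retryable error, or retries exhausted
            }
            pause.accept(1_000L << retryNumber); // assumed delays: 1 s, 2 s, 4 s, ...
            retryNumber++;
        }
    }

    /** Fake server: answers 503 for the first `failures` attempts, then 200. */
    static IntUnaryOperator failingThenOk(int failures) {
        return retryNumber -> retryNumber < failures ? 503 : 200;
    }
}
```

A test can pass a pause callback that records delays into a list, then assert on both the final status and the recorded delays, so the check runs instantly and never blocks a thread.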

Risk Assessment LOW

Rationale for LOW risk:

  • Zero Breaking Changes: Solution maintains 100% API compatibility through adapter pattern
  • Feature Flag Protection: Async implementation can be disabled if issues arise
  • Fallback Mechanism: Legacy implementation remains available as backup
  • Extensive Logging: Comprehensive monitoring and debugging capabilities
  • SOLID Principles: Clean architecture following dependency injection and single responsibility
  • Isolated Changes: New functionality is contained in separate Kotlin files

Mitigation Strategies:

  • Feature flag allows instant rollback to legacy implementation

  • Comprehensive logging enables quick issue identification

  • Adapter pattern ensures no changes to existing call sites
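
To make the rollback path concrete, here is a rough sketch of flag-based routing between two implementations behind one interface. The types below are hypothetical stand-ins, not the real BranchRemoteInterface classes, and how the actual flag is stored or read is not shown in this PR description.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the SDK's network interface; not the real class. */
interface NetworkClientSketch {
    String doRestfulGet(String url);
}

/** Routes each call to the async or legacy implementation based on a flag read at call time. */
final class RoutingClientSketch implements NetworkClientSketch {
    private final NetworkClientSketch asyncImpl;
    private final NetworkClientSketch legacyImpl;
    private final BooleanSupplier asyncEnabled; // assumed flag source

    RoutingClientSketch(NetworkClientSketch asyncImpl,
                        NetworkClientSketch legacyImpl,
                        BooleanSupplier asyncEnabled) {
        this.asyncImpl = asyncImpl;
        this.legacyImpl = legacyImpl;
        this.asyncEnabled = asyncEnabled;
    }

    @Override
    public String doRestfulGet(String url) {
        // Call sites see only the interface, so flipping the flag needs no caller changes.
        NetworkClientSketch target = asyncEnabled.getAsBoolean() ? asyncImpl : legacyImpl;
        return target.doRestfulGet(url);
    }
}
```

Reading the flag on every call, as sketched here, is what would make a rollback take effect without re-creating the client; whether the PR reads it per call or once at initialization is not stated.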

  • I, the PR creator, have tested — integration, unit, or otherwise — this code.

Reviewer Checklist (To be checked off by the reviewer only)

  • JIRA Ticket is referenced in PR title.
  • Correctness & Style
    • Conforms to AOSP Style Guides
    • Mission critical pieces are documented in code and out of code as needed.
  • Unit tests reviewed and cover the issue sufficiently.
  • Functionality was reviewed in QA independently by another engineer on the team.

cc @BranchMetrics/saas-sdk-devs for visibility.

Logs:
D BranchRemoteInterfaceUrlConnection: Using async implementation for POST (no Thread.sleep)
D BranchRemoteInterfaceUrlConnectionAsync: Executing POST request using async layer (no Thread.sleep)
D BranchAsyncNetworkLayer: Starting POST request to https://api2.branch.io/v1/open with payload size 1518 chars
D BranchAsyncNetworkLayer: Starting request with retry limit 3
V BranchAsyncNetworkLayer: Executing request attempt #0
D hide(ime(), fromIme=false)
I io.branch.branchandroidtestbed:3a1d9e7a: onCancelled at PHASE_CLIENT_ALREADY_HIDDEN
I Compiler allocated 5111KB to compile void android.view.ViewRootImpl.performTraversals()
V Branch Networking Success
URL: https://api2.branch.io/v1/open
Response Code: 200
Response Message: OK
Retry number: 0
requestId: 1bd6e249-4f3a-4558-920a-d34b13ab6adc-2025081421

- Introduced BranchAsyncNetworkLayer for non-blocking network operations, replacing Thread.sleep() with coroutine delays.
- Added BranchRemoteInterfaceUrlConnectionAsync to maintain API compatibility while leveraging modern async operations.
- Enhanced retry mechanisms with exponential backoff and cancellation support for improved network reliability.
- Updated BranchRemoteInterfaceUrlConnection to support both legacy and modern implementations, allowing for flexible usage.
- Improved logging for better debugging and performance insights during network requests.

Code Quality bug fix new feature

Description

Summary By MatterAI

🔄 What Changed

This pull request introduces a new BranchAsyncNetworkLayer class that leverages Kotlin Coroutines to replace blocking Thread.sleep() calls with non-blocking delay() for network retries. It implements an exponential backoff strategy with jitter, provides proper cancellation support via structured concurrency, and improves thread pool efficiency. The existing BranchRemoteInterfaceUrlConnection has been updated to optionally use this new asynchronous implementation via a BranchRemoteInterfaceUrlConnectionAsync wrapper, maintaining backward compatibility while modernizing the network stack.

🔍 Impact of the Change

The primary impact is that retry delays no longer rely on Thread.sleep(), which should improve the responsiveness and resource utilization of the Branch SDK. This can reduce the risk of ANRs (Application Not Responding errors) and improve application performance under poor network conditions, provided the synchronous bridge is only called from background threads. The exponential backoff with jitter reduces server load during transient failures. The new architecture also provides a clear path for future asynchronous network operations and better error handling.

📁 Total Files Changed

  • Branch-SDK/src/main/java/io/branch/referral/network/BranchAsyncNetworkLayer.kt: New file implementing the core coroutine-based network logic with retry and backoff.
  • Branch-SDK/src/main/java/io/branch/referral/network/BranchRemoteInterface.java: Modified to add getter methods for responseData, responseCode, branchErrorCode, and branchErrorMessage in BranchResponse and BranchRemoteException classes.
  • Branch-SDK/src/main/java/io/branch/referral/network/BranchRemoteInterfaceUrlConnection.java: Modified to integrate and enable the new BranchRemoteInterfaceUrlConnectionAsync for network operations, replacing the legacy blocking implementation.
  • Branch-SDK/src/main/java/io/branch/referral/network/BranchRemoteInterfaceUrlConnectionAsync.kt: New file acting as a bridge, wrapping BranchAsyncNetworkLayer to provide a synchronous interface for backward compatibility using runBlocking.

🧪 Test Added

N/A - No explicit test files were added or modified in this pull request. The changes primarily focus on refactoring the network layer's implementation details.

🔒Security Vulnerabilities

No new security vulnerabilities were introduced. The change improves reliability and performance without compromising existing security measures. The retry mechanism with exponential backoff and jitter is a robust pattern that helps prevent accidental Denial-of-Service (DoS) scenarios from aggressive retries.

Testing Instructions

Risk Assessment [HIGH || MEDIUM || LOW]

MEDIUM
This change involves a fundamental shift in the network layer's concurrency model. While designed for backward compatibility, thorough testing is required to ensure no regressions in network request handling, retry logic, error reporting, and overall SDK stability across various Android versions and network conditions. The use of runBlocking for compatibility needs careful consideration to ensure it does not inadvertently block the main thread if the calling context changes.

Tip

Quality Recommendations

  1. Refactor lastResponseCode, lastResponseMessage, and lastRequestId in BranchAsyncNetworkLayer to be local to each request or managed in a per-request context to prevent race conditions when multiple requests are made concurrently on the same instance. These instance variables are shared state and can lead to incorrect debugging information.

  2. While runBlocking is used for backward compatibility, ensure that BranchRemoteInterfaceUrlConnectionAsync is always invoked from a background thread to prevent blocking the main thread and potential ANRs. Add a clear warning or assertion if possible.

  3. Consider making the CoroutineScope in BranchAsyncNetworkLayer injectable or providing a more explicit lifecycle management mechanism beyond cancelAll(), especially if the SDK's initialization/deinitialization pattern is complex or if multiple instances of the network layer might exist.
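
Recommendation 1 could be approached by returning request metadata from each call instead of storing it in instance fields. The following is only a sketch under that assumption; PerRequestStateSketch, Outcome, and perform are hypothetical names, and the request itself is simulated rather than sent over the network.

```java
/** Hypothetical illustration of per-request state; not the SDK's actual classes. */
final class PerRequestStateSketch {

    /** Immutable snapshot of one request's result, safe to share across threads. */
    static final class Outcome {
        final int responseCode;
        final String responseMessage;
        final String requestId;

        Outcome(int responseCode, String responseMessage, String requestId) {
            this.responseCode = responseCode;
            this.responseMessage = responseMessage;
            this.requestId = requestId;
        }

        String describe() {
            return responseCode + " " + responseMessage + " (requestId=" + requestId + ")";
        }
    }

    /**
     * Simulated request: builds and returns its own Outcome instead of writing
     * lastResponseCode-style fields, so concurrent calls cannot overwrite each other.
     */
    static Outcome perform(String requestId, int simulatedCode) {
        String message = simulatedCode == 200 ? "OK" : "Server Error";
        return new Outcome(simulatedCode, message, requestId);
    }
}
```

Because each caller holds its own Outcome, logging code can report the code and request ID that belong to that specific request even when several requests run on the same instance.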

Tanka Poem ♫

Sleep no more, old thread,
Coroutines dance, light and swift,
Backoff, then try again.
Network's hum, a steady beat,
Science sings, no longer waits. ✨

Sequence Diagram

sequenceDiagram
    participant Client as Client Application
    participant LegacyNet as BranchRemoteInterfaceUrlConnection
    participant AsyncBridge as BranchRemoteInterfaceUrlConnectionAsync
    participant AsyncNet as BranchAsyncNetworkLayer
    participant HttpsConn as HttpsURLConnection
    participant BranchAPI as Branch API
    participant Prefs as PrefHelper
    participant Logger as BranchLogger

    Client->>LegacyNet: doRestfulGet(url)
    activate LegacyNet
    LegacyNet->>AsyncBridge: doRestfulGet(url)
    activate AsyncBridge
    AsyncBridge->>Logger: d("Using async layer")
    AsyncBridge->>AsyncNet: runBlocking { doRestfulGet(url) }
    activate AsyncNet

    loop Retry Attempts (up to retryLimit)
        AsyncNet->>Logger: v("Executing attempt #retryNumber")
        AsyncNet->>AsyncNet: executeWithRetry(operation: performGetRequest)
        activate AsyncNet

        AsyncNet->>Prefs: getTimeout(), getConnectTimeout()
        Prefs-->>AsyncNet: timeout, connectTimeout
        AsyncNet->>HttpsConn: openConnection() as HttpsURLConnection
        activate HttpsConn
        HttpsConn->>HttpsConn: setConnectTimeout(connectTimeout)
        HttpsConn->>HttpsConn: setReadTimeout(timeout)
        HttpsConn->>BranchAPI: GET /api/endpoint?retryNumber=X
        BranchAPI-->>HttpsConn: HTTP Response (code, message, headers)
        deactivate HttpsConn

        AsyncNet->>HttpsConn: getResponseCode(), getHeaderField(RequestId)
        HttpsConn-->>AsyncNet: responseCode, requestId
        AsyncNet->>AsyncNet: update lastResponseCode, lastRequestId

        alt Response Code >= 500 or Network Exception
            AsyncNet->>AsyncNet: shouldRetry(responseCode, retryNumber) / shouldRetryOnException(e, retryNumber)
            alt Should Retry
                AsyncNet->>AsyncNet: calculateRetryDelay(retryNumber)
                AsyncNet->>Logger: w("Retrying after delay")
                AsyncNet->>AsyncNet: delay(calculatedDelay)
                AsyncNet->>AsyncNet: increment retryNumber
            else Max Retries Exceeded or Non-retryable Error
                AsyncNet->>Logger: e("Request failed permanently")
                AsyncNet->>AsyncNet: convertToBranchRemoteException(e)
                AsyncNet--xAsyncBridge: BranchRemoteException
            end
        else Success (responseCode < 500)
            AsyncNet->>HttpsConn: getInputStream() / getErrorStream()
            HttpsConn-->>AsyncNet: InputStream
            AsyncNet->>AsyncNet: getResponseString(inputStream)
            AsyncNet-->>AsyncBridge: BranchResponse(data, code)
        end
        deactivate AsyncNet
    end

    AsyncBridge-->>LegacyNet: BranchResponse
    deactivate AsyncBridge
    LegacyNet-->>Client: BranchResponse
    deactivate LegacyNet

    Client->>LegacyNet: doRestfulPost(url, payload)
    activate LegacyNet
    LegacyNet->>AsyncBridge: doRestfulPost(url, payload)
    activate AsyncBridge
    AsyncBridge->>Logger: d("Using async layer")
    AsyncBridge->>AsyncNet: runBlocking { doRestfulPost(url, payload) }
    activate AsyncNet

    loop Retry Attempts (up to retryLimit)
        AsyncNet->>Logger: v("Executing attempt #retryNumber")
        AsyncNet->>AsyncNet: executeWithRetry(operation: performPostRequest)
        activate AsyncNet

        AsyncNet->>Prefs: getTimeout(), getConnectTimeout()
        Prefs-->>AsyncNet: timeout, connectTimeout
        AsyncNet->>AsyncNet: payload.put(RETRY_NUMBER, retryNumber)
        AsyncNet->>HttpsConn: openConnection() as HttpsURLConnection
        activate HttpsConn
        HttpsConn->>HttpsConn: setConnectTimeout(connectTimeout)
        HttpsConn->>HttpsConn: setReadTimeout(timeout)
        HttpsConn->>HttpsConn: setRequestProperty("Content-Type", "application/json")
        HttpsConn->>HttpsConn: setRequestMethod("POST")
        AsyncNet->>HttpsConn: OutputStreamWriter(outputStream).write(payload.toString())
        HttpsConn->>BranchAPI: POST /api/endpoint (payload)
        BranchAPI-->>HttpsConn: HTTP Response (code, message, headers)
        deactivate HttpsConn

        AsyncNet->>HttpsConn: getResponseCode(), getHeaderField(RequestId)
        HttpsConn-->>AsyncNet: responseCode, requestId
        AsyncNet->>AsyncNet: update lastResponseCode, lastRequestId

        alt Response Code >= 500 or Network Exception
            AsyncNet->>AsyncNet: shouldRetry(responseCode, retryNumber) / shouldRetryOnException(e, retryNumber)
            alt Should Retry
                AsyncNet->>AsyncNet: calculateRetryDelay(retryNumber)
                AsyncNet->>Logger: w("Retrying after delay")
                AsyncNet->>AsyncNet: delay(calculatedDelay)
                AsyncNet->>AsyncNet: increment retryNumber
            else Max Retries Exceeded or Non-retryable Error
                AsyncNet->>Logger: e("Request failed permanently")
                AsyncNet->>AsyncNet: convertToBranchRemoteException(e)
                AsyncNet--xAsyncBridge: BranchRemoteException
            end
        else Success (responseCode < 500)
            AsyncNet->>HttpsConn: getInputStream() / getErrorStream()
            HttpsConn-->>AsyncNet: InputStream
            AsyncNet->>AsyncNet: getResponseString(inputStream) or handle QR code binary
            AsyncNet-->>AsyncBridge: BranchResponse(data, code)
        end
        deactivate AsyncNet
    end

    AsyncBridge-->>LegacyNet: BranchResponse
    deactivate AsyncBridge
    LegacyNet-->>Client: BranchResponse
    deactivate LegacyNet

    Client->>LegacyNet: cancelAsyncOperations()
    activate LegacyNet
    LegacyNet->>AsyncBridge: cancelAllOperations()
    activate AsyncBridge
    AsyncBridge->>AsyncNet: cancelAll()
    activate AsyncNet
    AsyncNet->>AsyncNet: scope.cancel("Network layer cancelled")
    deactivate AsyncNet
    deactivate AsyncBridge
    deactivate LegacyNet