hub/analytics: make analytics cookie optional #8452

Draft
wants to merge 5 commits into base: master


haraldschilly (Contributor) commented Jul 16, 2025

Summary

This PR implements optional analytics tracking in CoCalc, allowing the system to operate without analytics cookies for improved privacy compliance. The latest changes introduce a new analytics_cookie server setting and implement IP address fallback for abuse detection when cookies are not available.

Key Changes

New Analytics Cookie Server Setting

  • Added analytics_cookie server setting: Administrators can now control whether analytics cookies are enabled through the admin settings panel
  • Centralized cookie control: When analytics_cookie is disabled, no analytics cookies are set or read across the application
  • Backward compatibility: Existing functionality remains intact when the setting is enabled

Anonymous User ID System

  • Added getAnonymousID function (packages/next/lib/user-id.ts): Centralized logic to get anonymous user ID with intelligent fallback:
    1. Analytics cookie (when analytics_cookie server setting is enabled)
    2. Cloudflare CF-Connecting-IP header (for users behind Cloudflare)
    3. X-Forwarded-For header (for users behind other proxies)
    4. Socket remote address (direct connections)
  • Validation: All anonymous IDs are validated using isValidAnonymousID function
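The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the code in packages/next/lib/user-id.ts: the `RequestLike` shape, the cookie name, and the name `getAnonymousIDSketch` are assumptions made for the example, and validation via isValidAnonymousID is left out.

```typescript
// Hypothetical minimal request shape; the real code receives a Next.js API request.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  cookies?: Record<string, string | undefined>;
  socket?: { remoteAddress?: string };
}

// Assumed cookie name, for illustration only.
const ANALYTICS_COOKIE_NAME = "analytics_cookie_example";

// Return the first comma-separated entry of a header value, if any.
function firstValue(v: string | string[] | undefined): string | undefined {
  const s = Array.isArray(v) ? v[0] : v;
  const first = s?.split(",")[0]?.trim();
  return first ? first : undefined;
}

function getAnonymousIDSketch(
  req: RequestLike,
  analyticsCookieEnabled: boolean,
): string | undefined {
  // 1. Analytics cookie, only when the server setting is enabled.
  if (analyticsCookieEnabled) {
    const cookie = req.cookies?.[ANALYTICS_COOKIE_NAME];
    if (cookie) return cookie;
  }
  // 2. Cloudflare header, 3. X-Forwarded-For (first entry), 4. socket address.
  return (
    firstValue(req.headers["cf-connecting-ip"]) ??
    firstValue(req.headers["x-forwarded-for"]) ??
    req.socket?.remoteAddress
  );
}
```

With the setting disabled, the cookie is never read, so a request carrying both a cookie and an X-Forwarded-For header resolves to the forwarded address.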

Enhanced Abuse Protection

  • IP address fallback for abuse detection: When analytics cookies are disabled, the system uses IP addresses to track usage for abuse prevention
  • Updated abuse detection (packages/server/jupyter/abuse.ts, packages/server/llm/abuse.ts):
    • Uses anonymous_id parameter (from cookie or IP) instead of analytics_cookie
    • Validates anonymous IDs using isValidAnonymousID
    • Maintains same quota enforcement whether using cookies or IP addresses
  • Consistent tracking: Both Jupyter and LLM APIs now use the same anonymous ID logic
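To illustrate why the quota logic does not need to care where the ID came from, here is a minimal sketch of a per-ID usage counter. The class `AnonymousQuota`, its in-memory `Map`, and the injected validator are all hypothetical; the actual abuse modules presumably track usage in the database and use isValidAnonymousID directly.

```typescript
// Hypothetical in-memory usage counter keyed by anonymous ID (cookie value or IP).
class AnonymousQuota {
  private readonly usage = new Map<string, number>();
  private readonly maxPerId: number;
  private readonly isValid: (id: string) => boolean;

  constructor(maxPerId: number, isValid: (id: string) => boolean) {
    this.maxPerId = maxPerId;
    this.isValid = isValid;
  }

  // Records one use and returns true; returns false for invalid IDs or exhausted quota.
  tryConsume(anonymousId: string): boolean {
    if (!this.isValid(anonymousId)) return false;
    const used = this.usage.get(anonymousId) ?? 0;
    if (used >= this.maxPerId) return false;
    this.usage.set(anonymousId, used + 1);
    return true;
  }
}
```

The same limit applies whether the key is a cookie value or an IP address, which is the property the description claims for the real implementation.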

API Endpoint Updates

  • Jupyter execution API (packages/next/pages/api/v2/jupyter/execute.ts): Uses getAnonymousID for consistent anonymous tracking
  • LLM evaluation API (packages/next/pages/api/v2/llm/evaluate.ts): Uses getAnonymousID for consistent anonymous tracking

Enhanced Validation and Testing

  • Renamed isValidAnonID to isValidAnonymousID (packages/util/misc.ts): Improved naming consistency
  • Added comprehensive tests (packages/util/misc.test.ts): Tests for IPv4, IPv6, UUIDs, and edge cases
  • Updated all imports: Consistent use of isValidAnonymousID across the codebase
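For orientation, a validator covering the cases listed above (IPv4, IPv6, UUIDs) might look like the sketch below. It is not the implementation in packages/util/misc.ts: the name `isValidAnonymousIDSketch` and the UUID pattern are assumptions, and the real function may accept or reject additional forms.

```typescript
import { isIP } from "node:net";

// Assumed UUID shape (8-4-4-4-12 hex digits); the real validator may differ.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidAnonymousIDSketch(id: unknown): boolean {
  if (typeof id !== "string" || id.length === 0) return false;
  // net.isIP returns 4 or 6 for valid addresses and 0 otherwise.
  return UUID_RE.test(id) || isIP(id) !== 0;
}
```

Delegating IP parsing to Node's `net.isIP` avoids maintaining a hand-written IPv6 pattern, at the cost of tying the helper to a Node runtime.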

Backend Analytics System

  • Enhanced analytics script (packages/hub/analytics-script.ts):
    • Only sets analytics cookies when analytics_cookie server setting is enabled
    • Maintains cookieless functionality when analytics is disabled
  • Improved analytics handler (packages/hub/analytics.ts):
    • Rate limiting: Added 10 entries/second rate limit for analytics data
    • Cookieless tracking: Supports anonymous tracking without user association
    • Cookie conditional logic: Only sets cookies when analytics_cookie server setting is enabled
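One simple way to get a "10 entries/second" limit is a fixed-window counter, sketched below. The PR does not say which algorithm packages/hub/analytics.ts uses, so the class `FixedWindowLimiter` and its defaults are assumptions chosen only to match the stated rate.

```typescript
// Fixed-window limiter: allows at most `limit` entries per `windowMs` milliseconds.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit = 10, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the entry may be recorded, false if it should be dropped.
  // `now` is injectable so the behavior can be tested without real timers.
  allow(now: number = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      // A new window begins: reset the counter.
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}
```

A fixed window permits short bursts at window boundaries; a token bucket or sliding window would smooth that out if it matters for the analytics endpoint.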

Privacy and Compliance Benefits

  1. Flexible Anonymous Tracking: System can identify users via cookies or IP addresses as fallback
  2. Admin Control: New analytics_cookie server setting allows administrators to disable analytics cookies entirely
  3. Robust Abuse Protection: Abuse detection works with or without cookies by using IP addresses as fallback
  4. Privacy-First Design: When analytics cookies are disabled, the system uses only IP addresses for essential abuse prevention
  5. Cookieless Mode: When analytics_cookie setting is disabled, no analytics cookies are set or read
  6. Rate Limiting: Prevents analytics data flooding with configurable limits
  7. Proxy-Aware: Properly handles Cloudflare and other proxy headers for accurate IP identification

Implementation Notes

  • The analytics_cookie server setting controls cookie behavior across the entire application
  • All existing analytics functionality remains intact when the setting is enabled
  • The system gracefully handles both cookie-based and IP-based anonymous tracking
  • Rate limiting prevents potential abuse of analytics endpoints
  • CORS validation has been improved for better security
  • Database schema remains backward compatible (anonymous_id values stored in existing analytics_cookie fields)
  • Abuse detection maintains the same quota enforcement regardless of tracking method

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
- Add getAnonymousID function to get anonymous ID from cookie or IP
- Support Cloudflare CF-Connecting-IP and X-Forwarded-For headers
- Fall back to socket remote address if no cookie available
- Update LLM and Jupyter APIs to use new anonymous ID logic
- Rename isValidAnonID to isValidAnonymousID for consistency
- Add comprehensive tests for isValidAnonymousID validation

}

// Fall back to IP address - check headers in order of preference
const connectingIp = (req.headers["cf-connecting-ip"] ||
Contributor

Please refactor this with the code in packages/conat/core/server.ts --

The code could be put in packages/util/get-ip-address.ts

// See https://socket.io/how-to/get-the-ip-address-of-the-client
function getAddress(socket) {
  const header = socket.handshake.headers["forwarded"];
  if (header) {
    for (const directive of header.split(",")[0].split(";")) {
      if (directive.startsWith("for=")) {
        return directive.substring(4);
      }
    }
  }

  let addr = socket.handshake.headers["x-forwarded-for"]?.split(",")?.[0];
  if (addr) {
    return addr;
  }
  for (const other of ["cf-connecting-ip", "fastly-client-ip"]) {
    addr = socket.handshake.headers[other];
    if (addr) {
      return addr;
    }
  }

  return socket.handshake.address;
}

(Also, I have concerns about the order of x-forwarded-for versus cf-connecting-ip...)
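A shared helper of the kind requested here could take plain headers plus a fallback address, so that both the socket.io code and the Next.js API routes can call it. The sketch below is one possible shape, not the code that ended up in the PR: the names `HeaderMap`, `parseForwardedFor`, and `getIpAddressSketch` are hypothetical, and the header priority (CDN headers first) is an assumption, since the right order is exactly what this thread questions.

```typescript
type HeaderMap = Record<string, string | string[] | undefined>;

function headerValue(headers: HeaderMap, name: string): string | undefined {
  const v = headers[name];
  return Array.isArray(v) ? v[0] : v;
}

// Extract the first "for=" value from a Forwarded header (RFC 7239).
function parseForwardedFor(header: string): string | undefined {
  const firstHop = header.split(",")[0] ?? "";
  for (const part of firstHop.split(";")) {
    const directive = part.trim();
    if (directive.toLowerCase().startsWith("for=")) {
      const value = directive.slice(4).trim().replace(/^"|"$/g, "");
      // Bracketed IPv6, optionally with a port: [2001:db8::1]:4711
      const bracketed = value.match(/^\[([^\]]+)\](?::\d+)?$/);
      if (bracketed) return bracketed[1];
      // IPv4 with a port: 192.0.2.60:4711
      const v4 = value.match(/^(\d{1,3}(?:\.\d{1,3}){3}):\d+$/);
      if (v4) return v4[1];
      return value || undefined;
    }
  }
  return undefined;
}

function getIpAddressSketch(
  headers: HeaderMap,
  fallback?: string,
): string | undefined {
  // Assumed order: headers set by a CDN first, then generic proxy headers.
  for (const name of ["cf-connecting-ip", "fastly-client-ip"]) {
    const v = headerValue(headers, name)?.trim();
    if (v) return v;
  }
  const forwarded = headerValue(headers, "forwarded");
  const fromForwarded = forwarded ? parseForwardedFor(forwarded) : undefined;
  if (fromForwarded) return fromForwarded;
  const xff = headerValue(headers, "x-forwarded-for")?.split(",")[0]?.trim();
  if (xff) return xff;
  // Direct connection: socket.handshake.address or req.socket.remoteAddress.
  return fallback;
}
```

All of these headers are client-controllable unless a trusted proxy overwrites them, so whichever order is chosen only makes sense for deployments where that proxy is known to be in front.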

Contributor Author

I spent way too much time on this and this, but I think I got it. The more I looked into this and existing packages, the more edge cases popped up. Also, a widely used package has problems, such as not prioritizing Cloudflare and not understanding how the Forwarded header works. In any case, I wrapped it and wrote tests.

875e790

Contributor

Wow, that was pretty intense!

Contributor

This could in theory be a standalone library, which Andrey could also use in sagecellserver...
