@flamingbear
Member

@flamingbear flamingbear commented Dec 18, 2025

Jira Issue ID

Originally I couldn't fix my other PR, but I have since managed to. This PR is now the alternative, in the off chance you want the two caches kept separate.

None. General improvements for NSIDC (and others') metrics.

NOTE: I was trying to combine the DB lookup for providerID with the collectionIds in #827, but so far my cache there is failing and I am in over my head on that one. In this version I just follow the existing providerId pattern, which adds an extra function call. I don't know whether that causes enough overhead to justify continuing to work out what I'm doing wrong on the other PR, or whether this is an acceptable solution.

Description

Adds an additional query parameter, A-collection-concept-ids, which is a comma-separated list of the collection concept IDs associated with the job ID. In all current cases this is a single value.
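As a rough illustration of the parameter format described above, the sketch below joins a list of collection concept IDs with commas and adds them to a set of signing parameters. The helper name `withCollectionIds` and the `SigningParams` type are hypothetical and are not taken from the Harmony codebase.

```typescript
// Hypothetical helper: names and shapes are illustrative only.
type SigningParams = Record<string, string>;

function withCollectionIds(params: SigningParams, collectionIds: string[]): SigningParams {
  // Leave the parameter out entirely when there are no collection IDs.
  if (collectionIds.length === 0) {
    return { ...params };
  }
  // Multiple IDs become one comma-separated value; a single ID is passed through as-is.
  return { ...params, 'A-collection-concept-ids': collectionIds.join(',') };
}

const exampleParams = withCollectionIds(
  { 'A-provider': 'EEDTEST' },
  ['C1238392622-EEDTEST'],
);
```

With a single collection, as in all current cases, the value is just that one concept ID.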

Local Test Steps

Pull this branch. Run the tests. Build local images and run Harmony-In-A-Box.

Run a sample request. e.g.:

http://localhost:3000/C1238392622-EEDTEST/ogc-api-coverages/1.0.0/collections/parameter_vars/coverage/rangeset?forceAsync=true&subset=lat(0%3A15)&subset=lon(-150%3A-105)&label=harmony-py&granuleId=G1245840464-EEDTEST&variable=atmosphere_cloud_liquid_water_content

Look at the logs in the k8s_harmony_harmony pod when you download the result and find the entry that shows the signing parameters. Confirm that a new value, "A-collection-concept-ids":"C1238392622-EEDTEST", is included:

 Signing s3://local-staging-bucket/public/7fd434f6-326b-4d9f-b25e-2a1748b4cfe7/2/f16_ssmis_20210425v7_atmosphere_cloud_liquid_water_content_subsetted.nc4 with params {"A-userid":"matthewsavoie","A-api-request-uuid":"7fd434f6-326b-4d9f-b25e-2a1748b4cfe7","A-provider":"EEDTEST","A-collection-concept-ids":"C1238392622-EEDTEST"}
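If it helps to check a log line like this one programmatically, a small sketch follows. It assumes the line has the shape shown above, with the JSON object at the end after the text " with params "; the function name `extractSigningParams` is hypothetical.

```typescript
// Hypothetical log-checking helper; assumes the JSON params end the line.
function extractSigningParams(logLine: string): Record<string, string> | undefined {
  const marker = ' with params ';
  const idx = logLine.indexOf(marker);
  if (idx === -1) {
    return undefined; // not a signing log line
  }
  try {
    return JSON.parse(logLine.slice(idx + marker.length));
  } catch {
    return undefined; // params were not valid JSON
  }
}

function hasCollectionIds(logLine: string): boolean {
  const params = extractSigningParams(logLine);
  return params !== undefined && 'A-collection-concept-ids' in params;
}
```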

PR Acceptance Checklist

  • Acceptance criteria met
  • Tests added/updated (if needed) and passing
  • Documentation updated (if needed)
  • Harmony in a Box tested (if changes made to microservices or new dependencies added)

Summary by CodeRabbit

  • New Features

    • Service results now fetch and include collection concept identifiers in request headers when available.
  • Tests

    • Added tests covering collection identifier retrieval and asserting header presence or absence in service result flows.
  • Configuration

    • New collection cache configuration options and defaults added (size and TTL) with validation.


@coderabbitai

coderabbitai bot commented Dec 18, 2025

Walkthrough

Adds a collection ID cache and retrieval flow: new LRU collectionIdCache and fetchCollectionId; Job.getCollectionIdForJobId reads collection IDs for a job; getServiceResult fetches collection IDs and injects an A-collection-concept-ids header when present. Env defaults and tests updated for cache config.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Service frontend & cache: services/harmony/app/frontends/service-results.ts | Exports collectionIdCache (LRU with ttl, maxSize, size heuristic). Adds fetchCollectionId (logs, delegates to Job.getCollectionIdForJobId) and updates getServiceResult to fetch collection IDs and inject an A-collection-concept-ids header (uppercased in signing params) alongside provider header logic. |
| Job model DB access: services/harmony/app/models/job.ts | Adds static async getCollectionIdForJobId(tx, jobID) which queries the jobs table for collection IDs, normalizes stringified JSON or array forms, and returns collection identifier(s) or undefined. |
| Environment defaults & config: services/harmony/env-defaults, services/cron-service/test/resources/test-env-defaults, services/harmony/app/util/env.ts | Adds COLLECTION_CACHE_SIZE and COLLECTION_CACHE_TTL defaults; extends HarmonyServerEnv with collectionCacheSize and collectionCacheTtl validated as positive integers. |
| Tests: services/harmony/test/service-results.ts | Imports exported collectionIdCache, stubs collectionIdCache.fetch to return collection IDs (and a scenario for undefined), and adds assertions verifying presence/absence of A-collection-concept-ids header. |
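The normalization attributed to Job.getCollectionIdForJobId (stringified JSON or array forms in, identifier string or undefined out) could look roughly like the sketch below. This is not the PR's implementation: the function name `normalizeCollectionIds` and the exact rules for empty or malformed input are assumptions.

```typescript
// Hypothetical normalizer for a raw database column value.
function normalizeCollectionIds(raw: unknown): string | undefined {
  let value: unknown = raw;
  if (typeof value === 'string') {
    const trimmed = value.trim();
    if (trimmed === '') {
      return undefined;
    }
    if (trimmed.charAt(0) !== '[') {
      return trimmed; // already a plain identifier string
    }
    try {
      value = JSON.parse(trimmed); // stringified JSON array
    } catch {
      return undefined; // malformed JSON is treated as "no IDs" in this sketch
    }
  }
  if (Array.isArray(value)) {
    // Keep only non-empty strings and join them into a comma-separated list.
    const ids = value.filter((v): v is string => typeof v === 'string' && v.length > 0);
    return ids.length > 0 ? ids.join(',') : undefined;
  }
  return undefined; // null, undefined, numbers, objects, ...
}
```

Returning `string | undefined` explicitly means the caller can guard on the result before calling string methods on it.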

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant ServiceResults as ServiceResults(frontend)
  participant Cache as collectionIdCache
  participant JobModel as Job.getCollectionIdForJobId
  participant Signer as RequestSigner

  Client->>ServiceResults: request service result (jobId)
  alt jobId present
    ServiceResults->>Cache: fetch(jobId)
    alt cache hit
      Cache-->>ServiceResults: collectionIds
    else cache miss
      Cache-->>ServiceResults: (miss)
      ServiceResults->>JobModel: getCollectionIdForJobId(jobId)
      JobModel-->>ServiceResults: collectionIds (or undefined)
      ServiceResults->>Cache: store(jobId, collectionIds)
    end
    ServiceResults->>Signer: include header A-provider, A-collection-concept-ids (if collectionIds)
  else no jobId
    ServiceResults->>Signer: include header A-provider only
  end
  Signer-->>ServiceResults: signed request
  ServiceResults-->>Client: redirect / service result response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Verify LRU cache configuration: ttl units, maxSize and size heuristic.
  • Review SQL and parsing in Job.getCollectionIdForJobId for null/empty, string vs array handling, and unexpected types.
  • Confirm header name/casing and signing behavior (A-collection-concept-ids → A-COLLECTION-CONCEPT-IDS) matches downstream consumers.
  • Ensure exported collectionIdCache is intended and that tests stub/restore it correctly.

Poem

I nibble keys with tiny paws,
A cache that hops across the code,
Collection IDs in tidy rows,
Headers leap in uppercase glow,
The rabbit twitches—deploy is owed! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The pull request description includes all required sections from the template: Jira Issue ID (implicitly referenced by PR context), Description, Local Test Steps, and PR Acceptance Checklist with appropriate items marked complete. |
| Title check | ✅ Passed | The PR title accurately describes the main change: adding collection concept IDs to signed S3 URLs. It directly relates to the changeset's core purpose of including a new A-collection-concept-ids signing parameter. |

@flamingbear flamingbear marked this pull request as ready for review December 18, 2025 21:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
services/harmony/test/service-results.ts (1)

68-84: Consider adding edge case test coverage.

The test correctly validates the happy path where collection IDs exist and are formatted as expected. However, consider adding a test case for when a job has no associated collection IDs (where collectionIdCache.fetch returns undefined). This would ensure the header is omitted gracefully rather than causing errors.

Example test case
describe('when the job has no collection IDs', function () {
  let providerIdCacheStub, collectionIdCacheStub;

  before(function () {
    providerIdCacheStub = sinon.stub(providerIdCache, 'fetch').resolves('eedtest');
    collectionIdCacheStub = sinon.stub(collectionIdCache, 'fetch').resolves(undefined);
  });

  after(function () {
    providerIdCacheStub.restore();
    collectionIdCacheStub.restore();
  });

  hookUrl('/service-results/some-bucket/public/some-job-id/some-work-item-id/some-path.tif', 'jdoe');
  
  it('does not include the A-collection-concept-ids header', function () {
    expect(this.res.headers.location).to.not.include('A-collection-concept-ids');
  });
});
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 322c133 and 40bb0e2.

📒 Files selected for processing (3)
  • services/harmony/app/frontends/service-results.ts (3 hunks)
  • services/harmony/app/models/job.ts (1 hunks)
  • services/harmony/test/service-results.ts (2 hunks)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
services/harmony/app/frontends/service-results.ts (1)

138-140: Add defensive type check to prevent runtime error.

The code calls .toUpperCase() on collectionIds without verifying it's a string. Based on the Job.getCollectionIdForJobId implementation (see relevant snippet), the method can return undefined (when no results are found) or potentially an array, despite its Promise<string> signature. Calling .toUpperCase() on non-string values will cause a runtime error.

🔎 Proposed fix
  if (collectionIds) {
-   customParams['A-collection-concept-ids'] = collectionIds.toUpperCase();
+   customParams['A-collection-concept-ids'] = String(collectionIds).toUpperCase();
  }

Alternatively, add a type guard:

- if (collectionIds) {
+ if (collectionIds && typeof collectionIds === 'string') {
    customParams['A-collection-concept-ids'] = collectionIds.toUpperCase();
  }

Note: The root cause should also be addressed in Job.getCollectionIdForJobId to ensure it always returns a string or explicitly returns string | undefined, and handles the undefined case properly.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40bb0e2 and 0eb9647.

📒 Files selected for processing (4)
  • services/cron-service/test/resources/test-env-defaults (1 hunks)
  • services/harmony/app/frontends/service-results.ts (3 hunks)
  • services/harmony/app/util/env.ts (1 hunks)
  • services/harmony/env-defaults (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
services/harmony/app/frontends/service-results.ts (1)
services/harmony/app/models/job.ts (1)
  • Job (431-1248)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: compare-services
  • GitHub Check: build (22.14.x)
  • GitHub Check: build (22.14.x)
🔇 Additional comments (3)
services/harmony/app/util/env.ts (1)

132-138: LGTM!

The collection cache configuration fields are properly validated with appropriate constraints and follow the established pattern for other cache settings.
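For readers unfamiliar with this kind of check, here is a minimal, framework-free sketch of validating the two settings as positive integers. It does not reproduce how services/harmony/app/util/env.ts performs validation; `parsePositiveInt`, `loadCollectionCacheConfig`, and the sample values in the usage note are hypothetical, and the TTL's units are not stated in this conversation.

```typescript
interface CollectionCacheConfig {
  collectionCacheSize: number;
  collectionCacheTtl: number; // units not stated in the PR discussion
}

// Hypothetical validator: accepts only strings made of digits with a value > 0.
function parsePositiveInt(varName: string, raw: string | undefined): number {
  const text = (raw === undefined ? '' : raw).trim();
  const value = /^[0-9]+$/.test(text) ? parseInt(text, 10) : NaN;
  if (!(value > 0)) {
    throw new Error(`${varName} must be a positive integer, got "${text}"`);
  }
  return value;
}

function loadCollectionCacheConfig(
  env: Record<string, string | undefined>,
): CollectionCacheConfig {
  return {
    collectionCacheSize: parsePositiveInt('COLLECTION_CACHE_SIZE', env['COLLECTION_CACHE_SIZE']),
    collectionCacheTtl: parsePositiveInt('COLLECTION_CACHE_TTL', env['COLLECTION_CACHE_TTL']),
  };
}
```

Calling `loadCollectionCacheConfig({ COLLECTION_CACHE_SIZE: '100', COLLECTION_CACHE_TTL: '60000' })` with these made-up values would return both numbers, while a missing, zero, negative, or non-numeric value would throw.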

services/harmony/app/frontends/service-results.ts (2)

79-94: LGTM!

The fetchCollectionId function is well-structured and follows the established pattern of fetchProviderId. The JSDoc correctly describes the function's behavior.


96-102: LGTM!

The collectionIdCache configuration properly mirrors the providerIdCache pattern with appropriate TTL, size limits, and fetch method.
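To make the cache-with-fetch pattern concrete, the sketch below is a small self-contained stand-in: a size-bounded cache with a TTL whose `fetch` calls a loader on a miss. It is not the LRU library used in the PR and does not mirror its API; `TinyTtlCache`, its constructor arguments, and the choice not to cache `undefined` results are all assumptions for illustration.

```typescript
type Loader<V> = (key: string) => Promise<V | undefined>;

// Hypothetical stand-in for an LRU cache with TTL and a fetch method.
class TinyTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private loader: Loader<V>;
  private now: () => number;

  constructor(maxEntries: number, ttlMs: number, loader: Loader<V>, now?: () => number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.loader = loader;
    this.now = now !== undefined ? now : () => Date.now();
  }

  async fetch(key: string): Promise<V | undefined> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      // Map keeps insertion order, so re-inserting marks the key as most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit.value;
    }
    this.entries.delete(key); // drop an expired entry, if any
    const value = await this.loader(key); // e.g. a database lookup by job ID
    if (value !== undefined) {
      this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
      if (this.entries.size > this.maxEntries) {
        // Evict the least recently used entry (first key in insertion order).
        const oldest = this.entries.keys().next().value;
        if (oldest !== undefined) {
          this.entries.delete(oldest);
        }
      }
    }
    return value;
  }
}
```

In this sketch a job with no collection IDs is looked up again on every request, because `undefined` is never stored; whether that is acceptable depends on how often such jobs occur.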

I don't think this is strictly necessary, but I also don't think it was right before.
@flamingbear flamingbear marked this pull request as draft December 19, 2025 21:07
@flamingbear flamingbear changed the title from "Include Collection Concept Id in the signed s3 urls" to "Alternative Collection Concept Id in the signed s3 urls" Dec 19, 2025
@flamingbear
Member Author

Closing this since I was able to get #827 working properly


2 participants