
feat: add dataset api #622


Open · wants to merge 11 commits into base: main

Conversation


@galzilber galzilber commented Aug 6, 2025

Important

Introduces a comprehensive dataset management API to the Traceloop SDK, including dataset creation, updating, deletion, publishing, versioning, and CSV import, with new classes, sample applications, and integration tests.

  • New Features:
    • Introduced dataset management API in traceloop-sdk, including creation, updating, deletion, publishing, and versioning.
    • Added support for dataset schemas, columns, rows, and CSV import in dataset.ts, column.ts, row.ts.
    • Enhanced TraceloopClient with project-scoped dataset operations.
  • Sample Applications:
    • Added sample_dataset.ts to demonstrate dataset API usage.
    • Added test_dataset_api.ts for testing dataset API functionality.
  • Tests:
    • Added integration tests in datasets-recording.test.ts to cover dataset lifecycle and API usage.
    • Included HTTP interaction recordings for testing in recordings/ directory.
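For orientation, here is a minimal usage sketch of the API summarized above. It is assembled from the feature list, the sample files, and the review comments on this page, so the exact method names and option shapes (initialize options, datasets.create, addRow, fromCSV, publish, getVersions) are assumptions rather than the SDK's confirmed surface.

import * as traceloop from "@traceloop/node-server-sdk";

// Hypothetical end-to-end flow based on this PR's description; signatures are assumed.
async function main() {
  const apiKey = process.env.TRACELOOP_API_KEY;
  if (!apiKey) throw new Error("TRACELOOP_API_KEY is not set");

  traceloop.initialize({ apiKey, disableBatch: true });
  await traceloop.waitForInitialization();
  const client = traceloop.getClient();

  // Create a dataset (POST /datasets, per the sequence diagram below).
  const dataset = await client.datasets.create({
    name: "qa-pairs",
    description: "Prompt/response pairs collected from the sample app",
  });

  // Add a single row, then import a CSV in bulk.
  await dataset.addRow({ prompt: "What is tracing?", response: "Recording spans." });
  await dataset.fromCSV("prompt,response\nWhat is a span?,A single operation.", {
    hasHeader: true,
    delimiter: ",",
  });

  // Publish a new version and inspect the version history.
  await dataset.publish();
  console.log(await dataset.getVersions());
}

main().catch(console.error);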

This description was created by Ellipsis for 85f2547. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features

    • Introduced comprehensive dataset management capabilities, including creating, updating, deleting, publishing, and versioning datasets.
    • Added support for defining dataset schemas, managing columns and rows, importing data from CSV, and performing data analysis.
    • Exposed new classes and interfaces for datasets, columns, and rows, enabling advanced programmatic interaction.
    • Enhanced SDK with project-scoped dataset operations and improved API integration.
    • Added new npm scripts to build and run sample dataset and test dataset API applications demonstrating SDK usage.
  • Tests

    • Added integration and sample application tests covering the full dataset lifecycle and API usage.
  • Chores

    • Included new sample scripts and HTTP interaction recordings to support testing and documentation.


gitguardian bot commented Aug 6, 2025

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. More information about the risks is available here.



@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


coderabbitai bot commented Aug 6, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This change introduces a comprehensive dataset management feature into the Traceloop SDK and its sample application. It adds new core classes, interfaces, and methods for creating, updating, querying, and managing datasets, columns, and rows, with extensive test coverage, sample scripts, and supporting HAR recordings for API interactions.

Changes

  • Sample App Scripts (packages/sample-app/package.json): Added npm scripts for running and testing the dataset sample and API test files.
  • Sample Dataset Demo (packages/sample-app/src/sample_dataset.ts): New sample app demonstrating the dataset lifecycle, schema, LLM data ingestion, CSV import, stats, versioning, publishing, and search using the Traceloop SDK and OpenAI API.
  • Sample Dataset API Test (packages/sample-app/src/test_dataset_api.ts): New script providing comprehensive, sequential tests for the Dataset API, with logging and error handling.
  • SDK Dataset Core Classes (packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts, .../dataset/column.ts, .../dataset/dataset.ts, .../dataset/datasets.ts, .../dataset/row.ts, .../dataset/index.ts): Introduced BaseDataset, Dataset, Datasets, Column, and Row classes for full dataset, column, and row management; added an index re-export.
  • SDK Client Enhancements (packages/traceloop-sdk/src/lib/client/traceloop-client.ts): Enhanced TraceloopClient with project scoping, new HTTP methods, dataset path rewriting, and a datasets property for managing datasets (a sketch follows this table).
  • SDK Dataset Interfaces (packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts, .../interfaces/index.ts): Added comprehensive interfaces for datasets, columns, rows, CSV import, stats, and versioning; exported them from the main interface index; added an optional projectId to client options.
  • SDK Node Exports (packages/traceloop-sdk/src/lib/node-server-sdk.ts): Exported the new dataset-related interfaces and classes from the SDK's public API.
  • SDK Dataset Integration Test (packages/traceloop-sdk/test/datasets-recording.test.ts): Added integration tests for the dataset lifecycle, columns, rows, and listing, using Polly.js for HTTP recording/playback.
  • HAR Recordings – Dataset API (packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/*): Added HAR files capturing API requests/responses for dataset creation, management, column operations, and listing, for test coverage.
  • HAR Recordings – Integration Test (packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/*): Added HAR files for integration test scenarios, including dataset creation, management, and listing.
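To make the TraceloopClient path rewriting above concrete, here is a sketch of the behavior as quoted in the review comments further down this page (a startsWith check followed by a prefix replacement). It is illustrative, not a copy of the merged implementation.

// Sketch of the project-scoped rewrite discussed in the reviews below; names are assumed.
function buildDatasetPath(path: string, projectId: string): string {
  const prefix = "/v2/datasets";
  if (path.startsWith(prefix)) {
    // Replacing only the leading prefix avoids touching a later occurrence of the pattern.
    return `/v2/projects/${projectId}/datasets${path.slice(prefix.length)}`;
  }
  return path;
}

// buildDatasetPath("/v2/datasets/my-slug/rows", "default")
//   -> "/v2/projects/default/datasets/my-slug/rows"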

Sequence Diagram(s)

Dataset Creation and Row Ingestion (Sample App Flow)

sequenceDiagram
    participant App as Sample App
    participant Traceloop as TraceloopClient
    participant API as Traceloop API
    participant OpenAI as OpenAI API

    App->>Traceloop: Initialize client
    App->>Traceloop: Create dataset
    Traceloop->>API: POST /datasets
    API-->>Traceloop: Dataset created
    Traceloop-->>App: Dataset object

    loop For each prompt
        App->>OpenAI: Send prompt
        OpenAI-->>App: LLM response
        App->>Traceloop: Add row to dataset
        Traceloop->>API: POST /datasets/{id}/rows
        API-->>Traceloop: Row added
        Traceloop-->>App: Row confirmation
    end

    App->>Traceloop: Import CSV data
    Traceloop->>API: POST /datasets/{id}/rows (batch)
    API-->>Traceloop: Rows added

    App->>Traceloop: Fetch stats, rows, versions, publish
    Traceloop->>API: GET/PUT /datasets/{id}
    API-->>Traceloop: Stats/versions/confirmation
    Traceloop-->>App: Results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • doronkopit5

Poem

In fields of data, rabbits leap—
New columns sprout, the rows grow deep.
SDKs bloom and tests abound,
With HARs and scripts all hopping 'round.
A dataset garden, neat and bright—
Reviewed by moon and monitor light.
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85f2547 and c442aa8.

📒 Files selected for processing (1)
  • packages/sample-app/package.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/sample-app/package.json


@galzilber galzilber closed this Aug 6, 2025

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to bb5536a in 2 minutes and 41 seconds.
  • Reviewed 2770 lines of code in 19 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_dataset.ts:8
  • Draft comment:
    Consider validating that TRACELOOP_API_KEY is defined to avoid silent failures at runtime.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% 1. The code already has error handling through getClient() check and try/catch blocks. 2. The SDK likely handles invalid API keys gracefully. 3. Environment variables are commonly used without explicit validation. 4. Adding validation wouldn't significantly improve the error handling that's already in place. 5. The comment is suggesting a defensive programming practice that isn't strictly necessary. The code could fail in a slightly more user-friendly way with explicit validation. The SDK's error handling behavior isn't completely clear from the code shown. The existing error handling with getClient() check and try/catch blocks already provides sufficient protection against API key issues. Adding another validation layer would be redundant. The comment should be deleted as the code already has adequate error handling through multiple mechanisms, and additional validation would be redundant.
2. packages/traceloop-sdk/src/lib/client/traceloop-client.ts:48
  • Draft comment:
    Endpoint version mismatch: buildDatasetPath rewrites '/v2/datasets' to '/v2/projects/{projectId}/datasets', yet tests expect '/v1/datasets'. Ensure consistency with API version and update tests/documentation accordingly.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The code clearly handles v2 paths specifically and intentionally, as shown by the comment on line 49. There's no evidence of any v1 path issues - the function explicitly only modifies v2 paths and leaves all other paths unchanged. The comment speculates about test expectations without showing actual test code or failures. I could be wrong about the test expectations - there might be actual test files showing v1 usage that I can't see. The version mismatch could be a real issue in the broader codebase. While there could be test issues, we don't have evidence of them. The code change itself is clear and intentional about handling v2 paths specifically. Without seeing failing tests or other evidence, this comment is speculative. Delete this comment as it makes speculative assumptions about test expectations without evidence, and the code change itself is clear about its v2 path handling intention.
3. packages/traceloop-sdk/test/datasets.test.ts:78
  • Draft comment:
    Test expectation mismatch: The test expects URL 'https://api.traceloop.com/v1/datasets', but client now uses '/v2/projects/default/datasets'. Update test expectations to match the current API endpoint.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a new test file being added, so the test author deliberately wrote it to expect v1. The comment assumes the test is wrong and v2 is correct, but provides no evidence for this. The client is initialized with just the base URL, with no version specified. Without seeing the client implementation or API docs, we can't know if v1 or v2 is correct. The comment could be correct - maybe the API really did change to v2. But we need more context about the API versioning to be sure. Given that this is a new test file being added, and the author explicitly wrote it for v1, we should assume they know which version they're testing against. Without strong evidence that v2 is correct, we should trust the test author. Delete the comment. While it could be correct about v2, we don't have enough evidence to override the test author's explicit choice of v1 in this new test file.
4. packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-list-datasets_1091121199/recording.har:43
  • Draft comment:
    Typographical suggestion: In the JSON string under the 'text' property, several slug values use 'daatset' (e.g., "daatset-11", "daatset-10", etc.). This appears to be a typo, and it might be intended to be 'dataset'. Please verify if these should be corrected.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_ApGRFbgPPJgZrcRO

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

return rows;
}

private parseCSVLine(line: string, delimiter: string): string[] {

CSV parsing logic is simplistic and may not correctly handle edge cases (e.g., embedded delimiters, escaped quotes, newlines). Consider using a robust CSV parsing library.

@galzilber galzilber reopened this Aug 6, 2025

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 6955d14 in 2 minutes and 15 seconds.
  • Reviewed 3095 lines of code in 20 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
1. packages/traceloop-sdk/src/lib/client/traceloop-client.ts:48
  • Draft comment:
The buildDatasetPath method replaces '/v2/datasets' with '/v2/projects/{projectId}/datasets'. However, some tests (e.g. in datasets.test.ts) expect the v1 endpoint. Ensure endpoint versions are consistent across the code and tests.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment is speculative - it mentions tests that may have issues but we can't see them. The code itself handles v2 paths explicitly and leaves other paths unchanged, which seems intentional. The comment asks to "ensure" consistency, which is a request for verification rather than pointing out a concrete issue. I might be missing actual test failures in datasets.test.ts that demonstrate this is a real problem. The version handling could be buggy in ways not visible in this file. Without seeing actual test failures or the test file contents, this remains speculative. The code's version handling appears intentional and clear. Delete this comment as it's speculative and asks for verification rather than pointing out a concrete issue. The code's version handling appears intentional.
2. packages/traceloop-sdk/test/datasets.test.ts:78
  • Draft comment:
    Test expects the URL 'https://api.traceloop.com/v1/datasets' but with the new dataset path logic, the URL will be transformed to v2 endpoints (e.g. '/v2/projects/default/datasets'). Update the test URL expectation to match the new endpoint.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% Looking at the test file, this is a new file being added, not a modification. The test is being written fresh with v1 endpoints. There's no evidence in the diff that shows v2 endpoints are being introduced or that there's any "new dataset path logic". The comment makes claims about changes that aren't visible in this PR. Maybe there are other files in the PR or codebase that show the v2 endpoint changes? Maybe this is part of a larger refactoring effort? Even if v2 endpoints are coming in the future, this comment is speculative - it's commenting on potential future changes rather than actual issues with the current code. The rules say not to make speculative comments. Delete the comment. It's speculative and refers to changes that aren't visible in this PR. If there is a plan to move to v2 endpoints, that should be handled in a separate PR.
3. packages/traceloop-sdk/src/lib/client/dataset/dataset.ts:138
  • Draft comment:
    Good use of batching in CSV import. This improves performance for large CSV files.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-list-datasets_1091121199/recording.har:43
  • Draft comment:
    I noticed several instances of "daatset" (e.g. "daatset-11", "daatset-10", etc.) in the dataset slugs within the JSON text. Is this a typographical error? Possibly it should be "dataset"?
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts:97
  • Draft comment:
    There is no newline at the end of the file. Consider adding a trailing newline for consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While having a newline at end of file is a common convention, this is a very minor issue that would typically be handled automatically by linters or IDE settings. It doesn't affect functionality and is not related to any logic changes. Most modern development environments handle this automatically. The missing newline could cause issues with some Unix tools and version control systems. It's a widely accepted convention in software development. While true, this is exactly the kind of minor issue that should be handled by automated tooling rather than manual code review comments. It's too trivial for a human review comment. Delete this comment as it's too minor and would be better handled by automated tooling like linters or editor settings.

Workflow ID: wflow_TyQhpFoav052cKes


return versionsData.versions.find(v => v.version === version) || null;
}

private parseCSV(csvContent: string, delimiter: string, hasHeader: boolean): RowData[] {

The custom CSV parsing implementation may not handle all edge cases (e.g. fields with embedded newlines or escaped quotes). Consider using a robust CSV parsing library for production use.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

♻️ Duplicate comments (1)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1)

197-217: CSV parsing logic needs improvement

The current CSV parsing implementation doesn't handle all edge cases correctly (e.g., escaped quotes within quoted fields, newlines within quotes).

Consider using a robust CSV parsing library like csv-parse or papaparse:

+import { parse } from 'csv-parse/sync';

 private parseCSV(csvContent: string, delimiter: string, hasHeader: boolean): RowData[] {
-  // Current manual parsing logic
+  const records = parse(csvContent, {
+    delimiter,
+    columns: hasHeader ? true : false,
+    skip_empty_lines: true,
+    trim: true,
+    cast: true,
+    cast_date: false
+  });
+  
+  if (!hasHeader && records.length > 0) {
+    // Generate column names for headerless CSV
+    const firstRow = records[0];
+    const headers = Object.keys(firstRow).map((_, i) => `column_${i + 1}`);
+    return records.map(row => {
+      const newRow: RowData = {};
+      Object.values(row).forEach((value, i) => {
+        newRow[headers[i]] = value;
+      });
+      return newRow;
+    });
+  }
+  
+  return records;
 }
🧹 Nitpick comments (3)
packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts (1)

28-44: Consider consolidating validation methods to reduce duplication.

The three validation methods (validateDatasetId, validateDatasetSlug, validateDatasetName) have identical logic with only the error message differing.

Consider refactoring to a generic validation method:

+  protected validateRequiredString(value: string, fieldName: string): void {
+    if (!value || typeof value !== 'string' || value.trim().length === 0) {
+      throw new Error(`${fieldName} is required and must be a non-empty string`);
+    }
+  }
+
-  protected validateDatasetId(id: string): void {
-    if (!id || typeof id !== 'string' || id.trim().length === 0) {
-      throw new Error('Dataset ID is required and must be a non-empty string');
-    }
-  }
-
-  protected validateDatasetSlug(slug: string): void {
-    if (!slug || typeof slug !== 'string' || slug.trim().length === 0) {
-      throw new Error('Dataset slug is required and must be a non-empty string');
-    }
-  }
-
-  protected validateDatasetName(name: string): void {
-    if (!name || typeof name !== 'string' || name.trim().length === 0) {
-      throw new Error('Dataset name is required and must be a non-empty string');
-    }
-  }
+  protected validateDatasetId(id: string): void {
+    this.validateRequiredString(id, 'Dataset ID');
+  }
+
+  protected validateDatasetSlug(slug: string): void {
+    this.validateRequiredString(slug, 'Dataset slug');
+  }
+
+  protected validateDatasetName(name: string): void {
+    this.validateRequiredString(name, 'Dataset name');
+  }
packages/sample-app/src/test_dataset_api.ts (1)

136-149: Consider using a more realistic CSV import example.

The hardcoded CSV data is functional but could be more representative of real-world usage scenarios.

Consider using a more comprehensive CSV example:

        const csvData = `user_id,score,active,department,join_date
-user202,88,true
-user303,91,false
-user404,76,true`;
+user202,88,true,engineering,2023-01-15
+user303,91,false,marketing,2022-11-22
+user404,76,true,sales,2023-03-08`;
packages/sample-app/src/sample_dataset.ts (1)

293-296: Be cautious with stack trace logging

Logging full stack traces in production could potentially expose sensitive information about the application structure.

Consider conditionally logging stack traces based on environment:

   } catch (error) {
     console.error("❌ Error in dataset operations:", error.message);
-    if (error.stack) {
+    if (error.stack && process.env.NODE_ENV === 'development') {
       console.error("Stack trace:", error.stack);
     }
   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7a78a5 and 6955d14.

📒 Files selected for processing (20)
  • packages/sample-app/package.json (1 hunks)
  • packages/sample-app/src/sample_dataset.ts (1 hunks)
  • packages/sample-app/src/test_dataset_api.ts (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Column-Operations_3207658095/should-add-columns-to-dataset_1128156327/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-create-a-new-dataset_1486295619/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-get-dataset-by-slug_1748151842/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-list-datasets_1091121199/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-update-dataset_4001908675/recording.har (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/column.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/datasets.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/index.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/row.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts (4 hunks)
  • packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/interfaces/index.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/node-server-sdk.ts (1 hunks)
  • packages/traceloop-sdk/test/datasets-recording.test.ts (1 hunks)
  • packages/traceloop-sdk/test/datasets.test.ts (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/traceloop-sdk/src/lib/interfaces/index.ts (1)
packages/traceloop-sdk/src/lib/interfaces/traceloop-client.interface.ts (1)
  • TraceloopClientOptions (1-5)
🪛 GitHub Actions: CI
packages/traceloop-sdk/src/lib/client/dataset/datasets.ts

[error] 46-46: TypeError: Cannot read properties of null (reading 'datasets') in test 'should list datasets' (test/datasets.test.ts).


[error] 60-60: TypeError: Cannot read properties of null (reading 'datasets') in test 'should find dataset by name' (test/datasets.test.ts).

packages/traceloop-sdk/src/lib/client/dataset/column.ts

[error] 109-109: ESLint: Unexpected lexical declaration in case block (no-case-declarations)


[error] 121-121: ESLint: Unexpected lexical declaration in case block (no-case-declarations)

packages/traceloop-sdk/test/datasets.test.ts

[error] 78-78: AssertionError: Expected URL 'https://api.traceloop.com/v1/datasets' but got 'https://api.traceloop.com/v2/projects/default/datasets' in test 'should include correct headers and payload for dataset creation'.

packages/traceloop-sdk/test/datasets-recording.test.ts

[error] 1-1: PollyError: Recording for POST request to https://api-staging.traceloop.com/v2/projects/default/datasets not found and 'recordIfMissing' is false, causing test failure in 'should create a new dataset'.


[error] 87-87: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should get dataset by slug'.


[error] 105-105: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should update dataset'.


[error] 130-130: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should add columns to dataset'.


[error] 159-159: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should get columns from dataset'.


[error] 185-185: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should add single row to dataset'.


[error] 201-201: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should add multiple rows to dataset'.


[error] 219-219: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should get rows from dataset'.


[error] 239-239: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should import CSV data with headers'.


[error] 265-265: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should publish dataset'.


[error] 280-280: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should get dataset versions'.


[error] 291-291: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should get dataset stats'.


[error] 304-304: TypeError: Cannot read properties of undefined (reading 'skip') in test 'should delete the test dataset'.

packages/traceloop-sdk/src/lib/client/dataset/row.ts

[error] 112-112: ESLint: Type string trivially inferred from a string literal, remove type annotation (@typescript-eslint/no-inferrable-types)

packages/traceloop-sdk/src/lib/client/dataset/dataset.ts

[error] 26-26: TypeError: Cannot read properties of null (reading 'id') in test 'should create a new dataset' (test/datasets.test.ts:49).


[error] 26-26: TypeError: Cannot read properties of null (reading 'id') in test 'should get a dataset by ID' (test/datasets.test.ts:119).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should update a dataset' (test/datasets.test.ts:206).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should delete a dataset' (test/datasets.test.ts:216).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should publish a dataset' (test/datasets.test.ts:234).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should add a column to dataset' (test/datasets.test.ts:271).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should get columns from dataset' (test/datasets.test.ts:308).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should add a row to dataset' (test/datasets.test.ts:344).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should add multiple rows to dataset' (test/datasets.test.ts:373).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should import CSV data' (test/datasets.test.ts:416).


[error] 30-30: TypeError: Cannot read properties of null (reading 'slug') in test 'should handle CSV without headers' (test/datasets.test.ts:433).

🪛 Biome (2.1.2)
packages/traceloop-sdk/src/lib/client/dataset/column.ts

[error] 109-109: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 121-121: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)
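For readers unfamiliar with these two lint findings, the fix is mechanical: wrap any case body that declares a const or let in braces so the declaration is scoped to that clause. The sketch below is generic and assumed, since the actual column.ts switch is not shown in this thread.

// Illustrative only; the real switch in column.ts presumably coerces or validates column values.
function coerce(value: string, type: "number" | "date" | "string"): unknown {
  switch (type) {
    case "number": {
      // Braces give the declaration its own block scope, satisfying
      // no-case-declarations / noSwitchDeclarations.
      const parsed = Number(value);
      return Number.isNaN(parsed) ? null : parsed;
    }
    case "date": {
      const date = new Date(value);
      return Number.isNaN(date.getTime()) ? null : date;
    }
    default:
      return value;
  }
}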

🔇 Additional comments (18)
packages/traceloop-sdk/src/lib/interfaces/index.ts (1)

5-5: LGTM!

The new export follows the existing pattern and properly exposes the dataset interfaces.

packages/sample-app/package.json (1)

31-32: LGTM!

The new npm scripts follow the established pattern and provide convenient commands to run the dataset examples. The naming convention is consistent with existing scripts.

packages/traceloop-sdk/src/lib/client/dataset/index.ts (1)

1-5: LGTM!

The barrel export pattern is implemented correctly, providing a clean entry point for dataset-related classes. The naming convention is consistent and follows TypeScript best practices.

packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Column-Operations_3207658095/should-add-columns-to-dataset_1128156327/recording.har (1)

1-134: HAR recording structure is valid.

The HTTP Archive file correctly captures the column addition API interaction. The request structure, headers, and response format appear appropriate for the dataset API. The use of Polly.JS for recording is a standard approach for HTTP interaction testing.

packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-create-a-new-dataset_1486295619/recording.har (1)

1-130: HAR recording structure is valid.

The HTTP Archive file correctly captures the dataset creation API interaction. The 201 Created status code is appropriate for resource creation, and the response includes comprehensive dataset metadata. The recording structure is suitable for test playback.

packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-get-dataset-by-slug_1748151842/recording.har (1)

1-126: LGTM - Well-structured test recording for dataset retrieval by slug.

This HAR recording properly captures the GET request for retrieving a dataset by slug, including the correct API endpoint structure (/v2/projects/default/datasets/{slug}), SDK version header, and expected JSON response format with dataset metadata.

packages/traceloop-sdk/src/lib/node-server-sdk.ts (2)

8-22: LGTM - Comprehensive dataset interface exports.

The new dataset-related interface exports cover all necessary aspects of dataset management including CRUD operations, column/row management, CSV import, statistics, and versioning. The exports follow the established patterns and naming conventions.


25-25: LGTM - Proper dataset class exports.

The export of the core dataset management classes (Dataset, Datasets, Column, Row) provides a clean public API for dataset operations.

packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-list-datasets_1091121199/recording.har (1)

1-135: LGTM - Comprehensive test recording for dataset listing with pagination.

This HAR recording properly captures the GET request for listing datasets with pagination parameters, demonstrating the API's ability to handle larger dataset collections (24 datasets in response). The response structure correctly shows the expected dataset metadata format.

packages/sample-app/src/test_dataset_api.ts (3)

3-21: LGTM - Proper SDK initialization and client setup.

The initialization sequence correctly configures the Traceloop SDK with appropriate settings for testing, including proper async handling with waitForInitialization() and client validation.


22-40: Good error isolation pattern for test execution.

The nested try-catch blocks properly isolate failures per test step, allowing the test suite to continue even when individual tests fail. This provides comprehensive coverage of the API functionality.


247-251: LGTM - Proper error handling and process exit.

The main function execution with error catching and process exit provides appropriate handling for test failures.

packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-update-dataset_4001908675/recording.har (1)

60-60: Future dates in HAR recording file

The HAR recording contains future dates (August 2025) which could cause issues with date-based logic or sorting. Consider using realistic dates or clearly marking this as test data.

Also applies to: 110-110, 175-175, 221-221

packages/traceloop-sdk/src/lib/client/dataset/column.ts (1)

53-53: Project-Scoped API Paths Are Handled by TraceloopClient

I confirmed that all calls using /v2/datasets/... are automatically rewritten to include the project scope:

• In packages/traceloop-sdk/src/lib/client/traceloop-client.ts the client’s path normalization does:

return path.replace(
  '/v2/datasets',
  `/v2/projects/${this.projectId}/datasets`
);

No changes are needed in the dataset client – the existing calls on lines 53, 67, and 73 will be properly scoped at runtime.

packages/traceloop-sdk/test/datasets.test.ts (1)

438-466: Excellent error handling test coverage

Great job on including comprehensive error handling tests for both HTTP errors and network errors. This ensures robust error handling in the SDK.

packages/traceloop-sdk/src/lib/client/traceloop-client.ts (1)

48-54: Potential issue with path replacement logic

The current implementation uses path.replace() which will replace the first occurrence of '/v2/datasets' anywhere in the path. This could lead to incorrect replacements if the pattern appears elsewhere in the path.

Consider using a more precise replacement:

 buildDatasetPath(path: string): string {
   // Replace any path that starts with /v2/datasets with the correct project-based path
   if (path.startsWith('/v2/datasets')) {
-    return path.replace('/v2/datasets', `/v2/projects/${this.projectId}/datasets`);
+    return `/v2/projects/${this.projectId}/datasets${path.slice('/v2/datasets'.length)}`;
   }
   return path;
 }

This ensures we only replace the prefix and not any occurrence within the path.

Likely an incorrect or invalid review comment.

packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1)

25-55: Null safety confirmed
The _data property is declared as a non-nullable DatasetResponse and the constructor enforces this type. Our search shows no tests or code paths passing null or undefined into new Dataset(...). No changes are required.

packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts (1)

1-97: Well-structured interface definitions

The interfaces are comprehensive and well-designed, providing clear type definitions for the dataset API functionality.

Comment on lines 98 to 100
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY
});

🛠️ Refactor suggestion

Add validation for OpenAI API key

The code doesn't validate if the OpenAI API key is present before creating the client. This could lead to runtime errors.

+  const openaiApiKey = process.env.OPENAI_API_KEY;
+  if (!openaiApiKey) {
+    console.warn("⚠️ OPENAI_API_KEY not found. Skipping LLM interaction simulation.");
+    // You could still continue with mock data or skip this section
+    return;
+  }
+
   const openai = new OpenAI({
-    apiKey: process.env.OPENAI_API_KEY
+    apiKey: openaiApiKey
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY
});
const openaiApiKey = process.env.OPENAI_API_KEY;
if (!openaiApiKey) {
console.warn("⚠️ OPENAI_API_KEY not found. Skipping LLM interaction simulation.");
// You could still continue with mock data or skip this section
return;
}
const openai = new OpenAI({
apiKey: openaiApiKey
});
🤖 Prompt for AI Agents
In packages/sample-app/src/sample_dataset.ts around lines 98 to 100, the OpenAI
API key is used directly without validation, which may cause runtime errors if
the key is missing. Add a check before creating the OpenAI client to verify that
process.env.OPENAI_API_KEY is defined and not empty. If the key is missing,
throw an error or handle it gracefully to prevent proceeding with an invalid
configuration.


@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed a1c4ab6 in 1 minute and 17 seconds.
  • Reviewed 77 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
1. packages/traceloop-sdk/src/lib/client/dataset/column.ts:108
  • Draft comment:
    Wrap 'number' case logic in braces for proper scoping and consistency.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/traceloop-sdk/src/lib/client/dataset/column.ts:121
  • Draft comment:
    Wrap 'date' case logic in braces for consistent scoping and improved readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/traceloop-sdk/src/lib/client/dataset/dataset.ts:118
  • Draft comment:
    Removed the unused destructured variable in the .find() callback for cleaner code.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. packages/traceloop-sdk/src/lib/client/dataset/dataset.ts:248
  • Draft comment:
    Default parameters now infer type; explicit type annotations were removed in getRows method.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. packages/traceloop-sdk/src/lib/client/dataset/datasets.ts:33
  • Draft comment:
    Removed redundant explicit type annotations in list() parameters; default values sufficiently infer number types.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. packages/traceloop-sdk/src/lib/client/dataset/row.ts:112
  • Draft comment:
    Omitting the explicit type for the 'delimiter' parameter in toCSVRow leverages type inference and improves brevity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_N9QTwhaZ6Tj1nRok



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (2)

293-328: CSV parsing is simplistic and may not handle all edge cases

The custom CSV parsing implementation may not correctly handle complex CSV scenarios.

The parsing logic doesn't handle:

  • Embedded newlines within quoted fields
  • Escaped quotes (doubled quotes)
  • Different line ending formats (CRLF vs LF)
  • Edge cases with malformed CSV

Consider using a robust CSV parsing library like csv-parse or papaparse for production use.


330-350: CSV line parsing has quote handling issues

The quote detection logic in parseCSVLine is overly simplistic and may not correctly handle edge cases.

Specific issues:

  • Doesn't handle escaped quotes (doubled quotes like "He said ""Hello"")
  • Quote detection condition is too simplistic
  • May incorrectly parse fields with quotes not at boundaries

Consider using a proven CSV library instead of this custom implementation.
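If adding a dependency is not an option, a character-by-character splitter along these lines would at least cover delimiters inside quotes and doubled quotes. This is a sketch of the general technique, not a drop-in replacement for the method under review, and embedded newlines still require parsing at the document level rather than line by line.

// Minimal RFC 4180-style field splitter for a single line.
function parseCSVLineSketch(line: string, delimiter: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const char = line[i];
    if (inQuotes) {
      if (char === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field -> literal quote
        i++;
      } else if (char === '"') {
        inQuotes = false; // closing quote
      } else {
        current += char;
      }
    } else if (char === '"') {
      inQuotes = true; // opening quote
    } else if (char === delimiter) {
      fields.push(current);
      current = "";
    } else {
      current += char;
    }
  }
  fields.push(current);
  return fields;
}

// parseCSVLineSketch('user,"He said ""Hello""",42', ",") -> ['user', 'He said "Hello"', '42']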

🧹 Nitpick comments (1)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1)

49-55: Improve field access pattern for timestamp properties

The getters use fallback chains with type casting to handle API response variations, but this could be fragile and indicates inconsistent API responses.

Consider creating a helper method to handle timestamp field normalization:

+ private getTimestamp(field: 'createdAt' | 'updatedAt'): string {
+   return this._data[field] || 
+          (this._data as any)[field === 'createdAt' ? 'created_at' : 'updated_at'] || 
+          '';
+ }

  get createdAt(): string {
-   return this._data.createdAt || (this._data as any).created_at || '';
+   return this.getTimestamp('createdAt');
  }

  get updatedAt(): string {
-   return this._data.updatedAt || (this._data as any).updated_at || '';
+   return this.getTimestamp('updatedAt');
  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6955d14 and a1c4ab6.

📒 Files selected for processing (12)
  • packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-list-datasets_1091121199/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Column-Operations_3207658095/should-add-columns-to-dataset_1128156327/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-create-a-new-dataset_1486295619/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-get-dataset-by-slug_1748151842/recording.har (1 hunks)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-update-dataset_4001908675/recording.har (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/column.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/datasets.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/row.ts (1 hunks)
  • packages/traceloop-sdk/test/datasets-recording.test.ts (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-list-datasets_1091121199/recording.har
  • packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har
🚧 Files skipped from review as they are similar to previous changes (9)
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-create-a-new-dataset_1486295619/recording.har
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Column-Operations_3207658095/should-add-columns-to-dataset_1128156327/recording.har
  • packages/traceloop-sdk/test/datasets-recording.test.ts
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-get-dataset-by-slug_1748151842/recording.har
  • packages/traceloop-sdk/recordings/Test-Dataset-API-Recording_2413299938/Dataset-Creation-and-Management_3667698088/should-update-dataset_4001908675/recording.har
  • packages/traceloop-sdk/src/lib/client/dataset/row.ts
  • packages/traceloop-sdk/src/lib/client/dataset/column.ts
  • packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts
  • packages/traceloop-sdk/src/lib/client/dataset/datasets.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (10)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (10)

1-16: LGTM: Clean imports and dependencies

The imports are well-organized and include all necessary types and interfaces from the appropriate modules.


17-23: LGTM: Proper class structure and constructor

The class properly extends BaseDataset and maintains internal state through the private _data field.


57-61: LGTM: Proper refresh implementation

The refresh method correctly fetches updated data and handles the response appropriately.


63-73: LGTM: Update method with proper validation and refresh

The method correctly validates the name if provided and refreshes data after update since the API returns empty responses.


140-167: LGTM: Column retrieval with proper data extraction

The method correctly refreshes data and extracts columns from the dataset response format.


169-180: LGTM: Single row addition delegates to batch method

Good design pattern to reuse the batch logic for consistency.
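The delegation pattern praised here is worth spelling out, since it keeps validation and column-name-to-ID mapping in a single code path. A schematic version (names assumed from the surrounding review, not copied from dataset.ts):

type RowData = Record<string, unknown>;

// Schematic: the single-row helper simply reuses the batch path.
class DatasetSketch {
  async addRows(rows: RowData[]): Promise<void> {
    if (rows.length === 0) throw new Error("At least one row is required");
    // ...validate, map column names to IDs, POST the batch...
  }

  async addRow(row: RowData): Promise<void> {
    return this.addRows([row]); // one place for validation and transformation
  }
}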


182-230: LGTM: Proper batch row insertion with column mapping

The method correctly transforms column names to IDs for the API and transforms the response back to names.


257-276: LGTM: CSV import with batching for performance

The method properly validates input and uses batching to improve performance for large datasets.


278-291: LGTM: Statistics and versioning methods

Clean implementation of stats retrieval and version management methods.


352-371: LGTM: Appropriate value type parsing

The parseValue method correctly handles common CSV value types including null, boolean, numeric, and string values.


@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 41abe67 in 45 seconds.
  • Reviewed 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/sample-app/src/sample_dataset.ts:150
  • Draft comment:
    Good improvement using 'error instanceof Error ? error.message : String(error)' for robust error handling. Consider extracting the error message to a variable to avoid duplicating the logic.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/sample-app/src/sample_dataset.ts:156
  • Draft comment:
    Updating the error response similarly is a good change. As with the log statement, consider reusing the computed error message to reduce duplication.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_9nHxrY93SLmcJzAR



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 95ad80b in 1 minute and 14 seconds.
  • Reviewed 1138 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/traceloop-sdk/src/lib/client/dataset/row.ts:37
  • Draft comment:
    getValue() uses '|| null' which may override valid falsy values (e.g. 0 or false). Consider checking property existence (e.g. using 'hasOwnProperty') instead.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
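The point is easy to illustrate: checking key presence preserves legitimate falsy values such as 0, false, or an empty string, which a '|| null' fallback would silently discard. A hedged sketch (the real Row class may store its data differently):

type RowData = Record<string, unknown>;

// '|| null' turns 0, false, and "" into null; an explicit presence check does not.
function getValue(data: RowData, column: string): unknown {
  return Object.prototype.hasOwnProperty.call(data, column) ? data[column] : null;
}

// getValue({ score: 0 }, "score") -> 0, not null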
2. packages/traceloop-sdk/src/lib/client/dataset/dataset.ts:323
  • Draft comment:
    The custom CSV parser may not handle all edge cases (e.g. escaped quotes or embedded newlines). Consider using a well‐tested CSV parsing library.
  • Reason this comment was not posted:
    Comment was on unchanged code.
3. packages/traceloop-sdk/src/lib/client/dataset/dataset.ts:299
  • Draft comment:
    In fromCSV(), rows are added in sequential batches. If API rate limits allow, consider parallelizing batch calls to improve performance.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
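If the API's rate limits permit it, the parallelization suggested above could look roughly like this; the batch size, concurrency, and the addRows callback are assumptions for illustration, not values taken from fromCSV().

// Sketch: send CSV batches with bounded concurrency instead of strictly in sequence.
async function importInParallel(
  rows: Record<string, unknown>[],
  addRows: (batch: Record<string, unknown>[]) => Promise<void>,
  batchSize = 100,
  concurrency = 4,
): Promise<void> {
  const batches: Record<string, unknown>[][] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    batches.push(rows.slice(i, i + batchSize));
  }
  // Process up to `concurrency` batches at a time.
  for (let i = 0; i < batches.length; i += concurrency) {
    await Promise.all(batches.slice(i, i + concurrency).map(addRows));
  }
}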

Workflow ID: wflow_Ri5h2ETtP1acIcsG



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (2)

92-156: Complex response parsing logic needs refactoring.

The addColumn method handles two different API response formats with extensive 'any' casting, making it difficult to maintain and error-prone.

This issue was previously identified and still needs to be addressed. Consider extracting response parsing into separate helper methods for better maintainability.
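One way to act on this is to isolate each response shape behind a small normalizer so addColumn itself stays linear. The field names below are placeholders, since the two response formats are not visible in this thread.

// Hypothetical normalizer; adjust the shapes to the real API payloads.
interface ColumnShape {
  id: string;
  name: string;
  type: string;
}

function normalizeColumnResponse(raw: unknown): ColumnShape {
  const data = raw as Record<string, any>;
  // Format A wraps the column (e.g. { column: {...} }); format B returns it flat.
  const column = data.column ?? data;
  return { id: String(column.id), name: String(column.name), type: String(column.type) };
}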


366-386: Custom CSV parsing may not handle all edge cases.

The parseCSVLine method implements basic CSV parsing but may not correctly handle complex cases like embedded newlines in quoted fields, escaped quotes, or other CSV edge cases.

Consider using a robust CSV parsing library for production use as previously suggested.

🧹 Nitpick comments (1)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1)

49-55: Consider improving timestamp handling and type safety.

The timestamp getters use type casting and fallback to empty strings, which may not be ideal:

  get createdAt(): string {
-   return this._data.createdAt || (this._data as any).created_at || "";
+   return this._data.createdAt || (this._data as any).created_at || new Date().toISOString();
  }

  get updatedAt(): string {
-   return this._data.updatedAt || (this._data as any).updated_at || "";
+   return this._data.updatedAt || (this._data as any).updated_at || new Date().toISOString();
  }

Consider defining proper interfaces to eliminate the need for any casting and provide more meaningful defaults than empty strings.
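As a sketch of the "proper interfaces" idea, the response type could declare both casings explicitly so the getters stay type-safe without any casts; everything here besides the timestamp fields is assumed.

// Declaring both casings removes the need for `as any` in the getters.
interface DatasetTimestampFields {
  createdAt?: string;
  created_at?: string;
  updatedAt?: string;
  updated_at?: string;
}

function resolveCreatedAt(data: DatasetTimestampFields): string {
  return data.createdAt ?? data.created_at ?? "";
}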

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1c4ab6 and 95ad80b.

📒 Files selected for processing (10)
  • packages/sample-app/src/sample_dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/column.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/datasets.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/index.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/dataset/row.ts (1 hunks)
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts (4 hunks)
  • packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts (1 hunks)
  • packages/traceloop-sdk/test/datasets-recording.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • packages/sample-app/src/sample_dataset.ts
  • packages/traceloop-sdk/src/lib/client/dataset/column.ts
  • packages/traceloop-sdk/src/lib/client/dataset/index.ts
  • packages/traceloop-sdk/src/lib/client/traceloop-client.ts
  • packages/traceloop-sdk/src/lib/client/dataset/datasets.ts
  • packages/traceloop-sdk/src/lib/client/dataset/base-dataset.ts
  • packages/traceloop-sdk/test/datasets-recording.test.ts
  • packages/traceloop-sdk/src/lib/client/dataset/row.ts
  • packages/traceloop-sdk/src/lib/interfaces/dataset.interface.ts
🔇 Additional comments (5)
packages/traceloop-sdk/src/lib/client/dataset/dataset.ts (5)

1-23: LGTM! Clean class structure and imports.

The class structure follows good OOP principles with proper inheritance and encapsulation.


57-90: LGTM! Well-structured CRUD operations.

The methods properly handle async operations, include appropriate validation, and the refresh-after-update pattern compensates for API limitations effectively.
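
A minimal sketch of the refresh-after-update pattern mentioned here, with the method names and payload shape assumed purely for illustration:

  // Update the dataset, then re-fetch it so local state reflects
  // server-side fields the update response does not include.
  async function updateAndRefresh(
    api: {
      updateDataset(slug: string, changes: { description?: string }): Promise<void>;
      getDataset(slug: string): Promise<{ slug: string; description: string; updatedAt: string }>;
    },
    slug: string,
    changes: { description?: string },
  ) {
    await api.updateDataset(slug, changes);
    return api.getDataset(slug); // refresh-after-update
  }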


189-280: LGTM! Well-designed row management with good patterns.

The implementation uses good patterns such as the following (a hypothetical sketch of the row patterns appears after this list):

  • Consistent batch API usage for single and multiple rows
  • Proper input validation
  • Column ID transformation for API compatibility
  • Pagination support for efficient data retrieval
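
For illustration only, a sketch of the single-row-through-batch and pagination patterns described above; every name here (addRowsBatch, fetchRowPage, and so on) is hypothetical and stands in for whatever the SDK's transport layer actually exposes:

  interface RowInput { [columnSlug: string]: string | number | boolean | null }
  interface RowRecord { id: string; values: RowInput }
  interface RowPage { rows: RowRecord[]; total: number }

  class DatasetRowsSketch {
    constructor(
      // Hypothetical transport calls injected for the sketch.
      private readonly addRowsBatch: (rows: RowInput[]) => Promise<RowRecord[]>,
      private readonly fetchRowPage: (offset: number, limit: number) => Promise<RowPage>,
    ) {}

    // A single-row add reuses the batch endpoint so there is one code path.
    async addRow(row: RowInput): Promise<RowRecord> {
      if (Object.keys(row).length === 0) throw new Error("Row must have at least one value");
      const [created] = await this.addRowsBatch([row]);
      return created;
    }

    // Pagination: keep fetching pages until the reported total is reached.
    async listAllRows(pageSize = 100): Promise<RowRecord[]> {
      const all: RowRecord[] = [];
      let offset = 0;
      for (;;) {
        const page = await this.fetchRowPage(offset, pageSize);
        all.push(...page.rows);
        offset += page.rows.length;
        if (page.rows.length === 0 || offset >= page.total) break;
      }
      return all;
    }
  }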

306-321: LGTM! Clean statistics and versioning methods.

Simple, effective implementations with proper async handling and error management.


388-407: LGTM! Well-structured value parsing logic.

The parseValue method handles type conversion appropriately with good precedence: null values, booleans, numbers, then strings as fallback.
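
A minimal sketch of that precedence (null, then booleans, then numbers, then strings), assuming the input is a raw CSV cell; the exact signature of the SDK's parseValue may differ:

  type ParsedValue = string | number | boolean | null;

  function parseValue(raw: string): ParsedValue {
    const value = raw.trim();
    if (value === "" || value.toLowerCase() === "null") return null; // null first
    if (value.toLowerCase() === "true") return true;                 // then booleans
    if (value.toLowerCase() === "false") return false;
    const num = Number(value);
    if (!Number.isNaN(num)) return num;                              // then numbers
    return raw;                                                      // strings as the fallback
  }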


@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 55ba20b in 1 minute and 22 seconds. Click for details.
  • Reviewed 493 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/sample-app/src/sample_dataset.ts:177
  • Draft comment:
    CSV rows include leading whitespace; ensure the CSV parser trims these lines or remove indentation.
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. packages/sample-app/src/sample_dataset.ts:312
  • Draft comment:
    Safely extract the error message (check error instanceof Error) before accessing error.message; a helper sketch follows this list.
  • Reason this comment was not posted:
    Comment was on unchanged code.
3. packages/sample-app/src/sample_dataset.ts:113
  • Draft comment:
    Consider parallelizing independent API calls in the simulation loop if execution order isn’t critical.
  • Reason this comment was not posted:
    Confidence changes required: 40% <= threshold 50% None
4. packages/sample-app/src/test_dataset_api.ts:40
  • Draft comment:
    Use a safe error extraction (e.g. checking error instanceof Error) when logging errors to avoid undefined properties.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. packages/sample-app/src/test_dataset_api.ts:138
  • Draft comment:
    Consider using a dedicated testing framework instead of nested try/catch blocks for improved test reporting and isolation.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
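
For the error-handling draft comments above, a small helper of the kind being suggested; the helper name and the wrapped call are illustrative:

  // Narrow unknown errors before touching .message, so logging never
  // reads properties off a non-Error throwable.
  function getErrorMessage(error: unknown): string {
    if (error instanceof Error) return error.message;
    return typeof error === "string" ? error : String(error);
  }

  async function safeRun(fn: () => Promise<void>): Promise<void> {
    try {
      await fn();
    } catch (error) {
      console.error("Dataset API call failed:", getErrorMessage(error));
    }
  }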

Workflow ID: wflow_sR8o6KVHneb6bvZt



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 2e07925 in 55 seconds. Click for details.
  • Reviewed 70 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/test/datasets-recording.test.ts:7
  • Draft comment:
    Ensure the new FetchAdapter import is necessary for tests using fetch requests.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to ensure that the new FetchAdapter import is necessary for tests. This falls under the category of asking the author to double-check or confirm something, which is against the rules.
2. packages/traceloop-sdk/test/datasets-recording.test.ts:20
  • Draft comment:
    Check that including both 'node-http' and 'fetch' adapters in setupPolly is intentional and works as expected.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the author to confirm their intention and ensure that the behavior is as expected, which violates the rules. It doesn't provide a specific suggestion or point out a clear issue.
3. packages/traceloop-sdk/test/datasets-recording.test.ts:23
  • Draft comment:
    Verify that expanding the 'matchRequestsBy' options (method, body, order, url) meets the recording requirements without over-matching or hiding differences; a setup sketch follows this list.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify their changes, which violates the rule against asking for confirmation or verification. It doesn't provide a specific suggestion or point out a specific issue with the code.
4. packages/traceloop-sdk/test/datasets-recording.test.ts:46
  • Draft comment:
    Hardcoding dummy environment values in non-recording mode ensures consistency; confirm these values are isolated to test contexts only.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that the hardcoded dummy environment values are isolated to test contexts only. This falls under asking the author to confirm their intention, which is against the rules.
5. packages/traceloop-sdk/test/datasets-recording.test.ts:75
  • Draft comment:
    Using a fixed dataset name in recorded mode is good for stability; ensure that this constant value won't cause collisions if tests run concurrently.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to ensure that a constant value won't cause collisions, which is a form of asking them to double-check something. This violates the rule against asking the author to ensure behavior is intended or tested.
6. packages/traceloop-sdk/test/datasets-recording.test.ts:104
  • Draft comment:
    The comment noting the recorded response already shows the updated description is helpful; ensure future changes keep test expectations aligned with recorded data.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50% None
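
To ground the adapter and matching discussion above, here is a sketch of a Polly setup with both adapters registered and the expanded matchRequestsBy options; the recording name and recordings directory are assumptions rather than the repository's actual values:

  import { Polly } from "@pollyjs/core";
  import NodeHttpAdapter from "@pollyjs/adapter-node-http";
  import FetchAdapter from "@pollyjs/adapter-fetch";
  import FSPersister from "@pollyjs/persister-fs";

  Polly.register(NodeHttpAdapter);
  Polly.register(FetchAdapter);
  Polly.register(FSPersister);

  const polly = new Polly("Dataset Integration Test", {
    adapters: ["node-http", "fetch"],  // cover both http-based and fetch-based clients
    persister: "fs",
    persisterOptions: { fs: { recordingsDir: "recordings" } },
    recordFailedRequests: true,        // keep 4xx/5xx responses in the HAR
    matchRequestsBy: {
      method: true,
      body: true,
      order: true,                     // replay strictly in recorded order
      url: true,
    },
  });

  // Call polly.stop() in test teardown so recordings are flushed to disk.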

Workflow ID: wflow_QTnc4RsHUz7qZBRg



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 26b57b5 in 29 seconds. Click for details.
  • Reviewed 18 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/test/datasets-recording.test.ts:75
  • Draft comment:
    Formatting update: The ternary assignment for datasetName is now more readable. Ensure consistency across similar patterns.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_fz5Wi0AmW5ZS6ku3



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 85f2547 in 2 minutes and 39 seconds. Click for details.
  • Reviewed 1184 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har:30
  • Draft comment:
    Dataset creation values (ID, slug, timestamps) are updated consistently. Verify these changes align with the API contract and expected test flows.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/traceloop-sdk/test/datasets-recording.test.ts:22
  • Draft comment:
    Added 'recordFailedRequests: true' and changed matchRequestsBy.order to true. Confirm these settings are intentional and that request order remains deterministic.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
3. packages/traceloop-sdk/test/datasets-recording.test.ts:76
  • Draft comment:
    The fixed dataset name has been updated to use the new timestamp (2025-08-07T08-56-39-202Z). Ensure this change is intentional and aligns with the HAR recordings.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
4. packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har:485
  • Draft comment:
    The URL contains a timestamp with lowercase 't' and 'z' ("integration-test-2025-08-07t08-56-39-202z"). For consistency with the ISO format used elsewhere (e.g., in the startedDateTime with uppercase 'T' and 'Z'), consider normalizing the case.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This appears to be a test recording file. The case difference in timestamps is likely not a functional issue since URLs are typically case-insensitive. The inconsistency seems purely cosmetic. The comment doesn't point to any actual problems that need fixing. Additionally, since this is a recording file, it's capturing actual HTTP traffic - changing the URL format might not even be possible if that's how the API expects to receive it. The inconsistency in timestamp formats could potentially indicate an underlying issue with how dates are being formatted in the codebase. Maybe there's a mix of date formatting functions being used? While date formatting consistency is generally good practice, in this case we're looking at URLs in a test recording file. The lowercase format in URLs is likely intentional or required by the API, and changing it could break functionality. The inconsistency between URL format and response format is not a problem that needs fixing. This comment should be deleted as it suggests a purely cosmetic change to a test recording file, doesn't identify a real issue that needs fixing, and could potentially cause problems if implemented.
5. packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har:605
  • Draft comment:
    Similarly, the URL on this line uses lowercase 't' and 'z' in the timestamp. It would be more consistent to match the standard ISO format (uppercase 'T' and 'Z') if applicable.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This appears to be a test recording file that captures HTTP traffic. The timestamps in URLs are likely generated by test code. While consistency in formats is generally good, this is test data and the actual format doesn't impact functionality. The URLs work fine either way. Making this change would be purely cosmetic and not improve the code's behavior. The comment does point out an inconsistency in the codebase. Having consistent timestamp formats could make the code more maintainable and easier to understand. However, this is a test recording file containing historical HTTP traffic data. Changing timestamp formats in URLs here wouldn't affect the actual application code or behavior. It would just be modifying test data. The comment should be deleted as it suggests a purely cosmetic change to test data that wouldn't improve code quality or functionality.
6. packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har:722
  • Draft comment:
    Typo: In the dataset object, the 'slug' field uses lowercase ('integration-test-2025-08-07t08-56-39-202z') while the 'name' field uses uppercase 'T' and 'Z' ('integration-test-2025-08-07T08-56-39-202Z'). Please verify if this inconsistency in letter casing is intended.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This appears to be intentional API behavior where slugs are always lowercase for URL compatibility while names preserve original casing. The comment is asking for verification of intended behavior, which violates our rules. We should not ask authors to verify things that appear to be working as designed. Additionally, this is not suggesting any actual code changes. Could this inconsistency cause bugs or usability issues that should be addressed? Maybe there's a good reason to make the casing consistent? No - slugs being lowercase is a common web practice for URLs, while preserving original name casing is good for display. This is standard practice and working as intended. Delete the comment. This is asking for verification of behavior that appears intentional and follows standard web practices. No actual code changes are being suggested.
7. packages/traceloop-sdk/recordings/Dataset-Integration-Test_2732516487/should-create-and-manage-a-dataset_3398104721/recording.har:941
  • Draft comment:
    Typo: In the POST request payload, the JSON property is labeled as "Rows" (with an uppercase 'R') whereas the response uses "rows" in lowercase. Please check if this casing is correct or should be consistent.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The API appears to be working correctly despite the casing difference. The response is successful with status 201. This suggests the API is case-insensitive for this field. Additionally, this is a test file recording HTTP interactions, not the actual implementation code. The casing difference doesn't affect functionality and pointing it out doesn't add value. The casing inconsistency could indicate an underlying API design issue. If the API documentation specifies a particular casing, this could be violating that contract. Since this is a test recording file and the API works correctly, maintaining strict casing consistency here isn't critical. The actual API implementation is what matters, not the test recordings. Delete the comment. The casing difference is not impacting functionality, and this is just a test recording file, not implementation code.

Workflow ID: wflow_FYd3IbCSwt4tZ7J1

