[Oneshot] Add validation for empty dataset and enhance oneshot function parameters #1957

ArkaSanka · 2025-10-21T19:38:58Z

Fix argument handling in oneshot function #1850

Issue Description

The oneshot function signature in oneshot.py was missing several parameters that exist in the underlying dataclasses (DatasetArguments, ModelArguments, RecipeArguments). This caused issues when users tried to use these parameters directly, particularly with:

sequential_targets: Conflicts occurred between recipe modifiers and direct parameters
preprocessing_func: Returns an error when the dataset is empty
pipeline: Not properly validated against sequential_targets

Changes Made

Parameter Alignment:

Updated the oneshot function signature to include all missing parameters from the argument dataclasses
Ensured type hints and default values match those defined in the dataclasses
Added missing parameters: preprocessing_func, data_collator, raw_kwargs, max_train_samples, pipeline, tracing_ignore, sequential_targets

Validation Logic:

Added validation to detect conflicting sequential_targets between recipe modifiers and direct parameters
Added validation to prevent incompatible pipeline settings with sequential_targets
Fixed error message formatting to comply with style guidelines

Test Improvements:

Updated the test fixture in test_api_inputs.py to handle all parameters correctly
Added detection for potential parameter conflicts to make tests more robust

Impact

These changes ensure that all parameters defined in the argument dataclasses can be used directly with the oneshot function without unexpected behavior. Users can now pass parameters like sequential_targets and preprocessing_func directly to oneshot without running into cryptic errors or unexpected behavior. The API is now more consistent with its underlying implementation, making it more intuitive to use.

github-actions · 2025-10-21T19:39:13Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

gemini-code-assist · 2025-10-21T19:39:14Z

Summary of Changes

Hello @ArkaSanka, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the oneshot function's usability and robustness by aligning its parameter signature with its internal dataclasses and introducing critical validation checks. It ensures that users can pass a wider range of configuration options directly to oneshot without encountering unexpected behavior or cryptic errors, while also preventing operations on empty datasets and incompatible pipeline settings.

Highlights

Enhanced oneshot Function Signature: The oneshot function now includes previously missing parameters like preprocessing_func, pipeline, and sequential_targets, aligning its signature with underlying dataclasses for improved usability.
Improved Parameter Validation: New validation checks prevent conflicting configurations, specifically disallowing sequential_targets when the pipeline is set to 'independent', ensuring more robust function calls.
Empty Dataset Safeguard: A new check has been added to prevent calibration from proceeding with an empty dataset, raising a ValueError for clarity and preventing potential issues.
Robust Test Coverage: The test suite has been updated to correctly handle the expanded oneshot parameters and gracefully manage empty dataset scenarios during testing, making tests more reliable.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request enhances the oneshot function by aligning its parameters with the underlying argument dataclasses, which makes the API more complete and intuitive. It also introduces valuable validation for empty datasets and incompatible parameter combinations, such as sequential_targets with an independent pipeline. The tests have been updated accordingly to cover these new parameters and handle potential data-related issues more gracefully. My review focuses on a potential issue in the test suite where preprocessing_func is wrapped in a tuple, which seems to contradict its type hint and the stated goal of the pull request. I have also included a minor style suggestion to improve code maintainability.

tests/llmcompressor/transformers/oneshot/test_api_inputs.py

ArkaSanka · 2025-10-22T12:58:41Z

Hi @kylesayrs, @dsikka, let me know if there are any additional/missing changes to be made.

src/llmcompressor/entrypoints/oneshot.py

Signed-off-by: Arka Sanka <[email protected]> Refactor oneshot function parameters to use Optional types and enhance documentation Signed-off-by: Arka Sanka <[email protected]>

brian-dellabetta · 2025-11-06T20:28:49Z

tests/llmcompressor/transformers/oneshot/test_api_inputs.py

+    logger.info(f"Dataset type: {type(one_shot_args.get('dataset'))}")
+    if isinstance(one_shot_args.get("dataset"), str):
+        logger.info(f"Dataset name: {one_shot_args.get('dataset')}")
+        logger.info(f"Dataset config: {one_shot_args.get('dataset_config_name')}")
+    try:
+        # Call oneshot with all parameters as flat arguments
+        oneshot(
+            **one_shot_args,
+            output_dir=tmp_path,
+            num_calibration_samples=10,
+            pad_to_max_length=False,
+        )
+
+    except ValueError as e:
+        if "num_samples should be a positive integer value" in str(
+            e
+        ) or "Dataset is empty. Cannot create a calibration dataloader" in str(e):
+            logger.warning(f"Dataset is empty: {one_shot_args.get('dataset')}")
+            pytest.skip(f"Dataset is empty: {one_shot_args.get('dataset')}")
+        else:
+            raise  # Re-raise other ValueError exceptions


can you explain why you needed to add these changes? if you are asserting that a certain pathway raises an error in a test, you can do that with

with pytest.raises(ValueError):

there are examples of this in the code

gemini-code-assist bot reviewed Oct 21, 2025

View reviewed changes

tests/llmcompressor/transformers/oneshot/test_api_inputs.py Outdated Show resolved Hide resolved

tests/llmcompressor/transformers/oneshot/test_api_inputs.py Outdated Show resolved Hide resolved

ArkaSanka force-pushed the oneshot-dataset-params branch from 83ebf56 to b882a49 Compare October 21, 2025 20:12

ArkaSanka changed the title ~~Add validation for empty dataset and enhance oneshot function parameters~~ [Oneshot] Add validation for empty dataset and enhance oneshot function parameters Oct 21, 2025

kylesayrs self-assigned this Oct 22, 2025

kylesayrs self-requested a review October 22, 2025 01:59

dsikka assigned ArkaSanka and unassigned kylesayrs Oct 22, 2025

ArkaSanka force-pushed the oneshot-dataset-params branch from b882a49 to 845aac4 Compare October 22, 2025 12:55

ArkaSanka force-pushed the oneshot-dataset-params branch from 845aac4 to 3d96b12 Compare October 26, 2025 09:40

brian-dellabetta reviewed Oct 27, 2025

View reviewed changes

src/llmcompressor/entrypoints/oneshot.py Show resolved Hide resolved

ArkaSanka force-pushed the oneshot-dataset-params branch from 3d96b12 to 3b25767 Compare October 27, 2025 17:18

ArkaSanka requested a review from brian-dellabetta October 27, 2025 21:34

brian-dellabetta reviewed Oct 27, 2025

View reviewed changes

src/llmcompressor/entrypoints/oneshot.py Outdated Show resolved Hide resolved

ArkaSanka force-pushed the oneshot-dataset-params branch 3 times, most recently from 2737670 to 7a809b6 Compare October 29, 2025 21:30

Add validation for empty dataset and enhance oneshot function parameters

59312d5

Signed-off-by: Arka Sanka <[email protected]> Refactor oneshot function parameters to use Optional types and enhance documentation Signed-off-by: Arka Sanka <[email protected]>

ArkaSanka force-pushed the oneshot-dataset-params branch from 7a809b6 to 59312d5 Compare November 2, 2025 09:56

brian-dellabetta reviewed Nov 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Oneshot] Add validation for empty dataset and enhance oneshot function parameters #1957

[Oneshot] Add validation for empty dataset and enhance oneshot function parameters #1957

ArkaSanka commented Oct 21, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Oct 21, 2025

Uh oh!

gemini-code-assist bot commented Oct 21, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

ArkaSanka commented Oct 22, 2025

Uh oh!

Uh oh!

Uh oh!

brian-dellabetta Nov 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Oneshot] Add validation for empty dataset and enhance oneshot function parameters #1957

Are you sure you want to change the base?

[Oneshot] Add validation for empty dataset and enhance oneshot function parameters #1957

Conversation

ArkaSanka commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fix argument handling in oneshot function #1850

Issue Description

Changes Made

Parameter Alignment:

Validation Logic:

Test Improvements:

Impact

Uh oh!

github-actions bot commented Oct 21, 2025

Uh oh!

gemini-code-assist bot commented Oct 21, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

ArkaSanka commented Oct 22, 2025

Uh oh!

Uh oh!

Uh oh!

brian-dellabetta Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ArkaSanka commented Oct 21, 2025 •

edited

Loading