2 changes: 1 addition & 1 deletion CLAUDE.md
6 changes: 4 additions & 2 deletions CONTRIBUTING.md
@@ -4,11 +4,11 @@ This guide explains how to contribute to the Kubeflow SDK project.
For the Kubeflow SDK documentation, please check [the official Kubeflow documentation](https://www.kubeflow.org/docs/components/).

## Requirements

- [Supported Python version](./pyproject.toml#L4)
- [pre-commit](https://pre-commit.com/)
- [uv](https://docs.astral.sh/uv/getting-started/installation/)


## Development

The Kubeflow SDK project includes a Makefile with several helpful commands to streamline your development workflow.
@@ -20,6 +20,7 @@ make install-dev
```

### Coding Style

Make sure to install [pre-commit](https://pre-commit.com/) (`uv pip install pre-commit`) and run `pre-commit install` from the root of the repository at least once before creating git commits.

The pre-commit hooks ensure code quality and consistency, and they are executed in CI: PRs that fail the hooks will not pass the corresponding CI gate. The hooks run only against staged files unless you run `pre-commit run --all-files`, in which case they run against every file in the repository.
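The setup described above can be sketched as a one-time shell workflow (assuming `uv` is already installed and you are at the repository root):

```shell
# One-time setup: install pre-commit and register the git hooks
uv pip install pre-commit
pre-commit install

# Hooks now run automatically on `git commit` against staged files.
# To check every file in the repository instead:
pre-commit run --all-files
```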
@@ -37,18 +38,19 @@ make verify
The Kubeflow SDK project includes several types of tests to ensure code quality and functionality.

### Unit Testing

To run unit tests locally use the following make command:

```shell
make test-python
```

### E2E Tests

E2E tests run in CI on a kind cluster using [Kubeflow Trainer E2E Scripts](https://github.com/kubeflow/trainer/blob/master/CONTRIBUTING.md#e2e-tests).
Clone the `Kubeflow Trainer` repo and run the provided commands against the `Trainer` Makefile.
For more details, check [the Kubeflow Trainer Contributing Guide](https://github.com/kubeflow/trainer/blob/master/CONTRIBUTING.md#e2e-tests).


## Best Practices

### Pull Request Title Conventions
5 changes: 2 additions & 3 deletions kubeflow/trainer/api/trainer_client.py
@@ -75,7 +75,7 @@ def list_runtimes(self) -> list[types.Runtime]:
return self.backend.list_runtimes()

def get_runtime(self, name: str) -> types.Runtime:
"""Get the runtime object
"""Get the runtime object.

Args:
name: Name of the runtime.
@@ -165,7 +165,7 @@ def list_jobs(self, runtime: types.Runtime | None = None) -> list[types.TrainJob
return self.backend.list_jobs(runtime=runtime)

def get_job(self, name: str) -> types.TrainJob:
"""Get the TrainJob object
"""Get the TrainJob object.

Args:
name: Name of the TrainJob.
@@ -204,7 +204,6 @@ def get_job_logs(
Returns:
Iterator of log lines.


Raises:
TimeoutError: Timeout to get a TrainJob.
RuntimeError: Failed to get a TrainJob.
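A rough usage sketch of the methods whose docstrings are touched above (the import path and job name are illustrative assumptions, and a running Kubernetes cluster with Kubeflow Trainer installed is required, so this is not runnable standalone):

```python
from kubeflow.trainer import TrainerClient  # assumed public import path

client = TrainerClient()

# Fetch a TrainJob by name; per the docstring this raises
# TimeoutError / RuntimeError on failure.
job = client.get_job("my-train-job")

# get_job_logs returns an iterator of log lines, so logs can be streamed:
for line in client.get_job_logs("my-train-job"):
    print(line)
```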
4 changes: 2 additions & 2 deletions kubeflow/trainer/types/types.py
@@ -152,7 +152,7 @@ class TorchTuneInstructDataset:
@dataclass
class LoraConfig:
"""Configuration for the LoRA/QLoRA/DoRA.
-REF: https://meta-pytorch.org/torchtune/main/tutorials/memory_optimizations.html
+REF: https://pytorch.org/torchtune/main/tutorials/lora_finetune.html

Args:
apply_lora_to_mlp (`Optional[bool]`):
@@ -200,7 +200,7 @@ class TorchTuneConfig:
batch_size (`Optional[int]`):
The number of samples processed before updating model weights.
epochs (`Optional[int]`):
-The number of samples processed before updating model weights.
+The number of times the full training dataset is passed through the model.
loss (`Optional[Loss]`): The loss algorithm we use to fine-tune the LLM,
e.g. `torchtune.modules.loss.CEWithChunkedOutputLoss`.
num_nodes (`Optional[int]`): The number of nodes to use for training.
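The corrected distinction between `batch_size` and `epochs` can be illustrated with simple arithmetic (the numbers below are made up for illustration):

```python
dataset_size = 10_000   # total training samples
batch_size = 32         # samples processed before each weight update
epochs = 3              # full passes through the training dataset

# Each epoch performs ceil(dataset_size / batch_size) weight updates.
steps_per_epoch = -(-dataset_size // batch_size)  # ceil division: 313
total_steps = steps_per_epoch * epochs            # 313 * 3 = 939
print(steps_per_epoch, total_steps)
```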