Status: Open
Labels: PEFT, RLOO, SFT, bug
Description
CI fails with dev dependencies: https://github.com/huggingface/trl/actions/runs/18712033739/job/53362520844
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
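The trailing `''` in the message is the rejected value: the repo id reaching the Hub client was an empty string. Below is a minimal sketch of why an empty string fails, using only the rules quoted in the `validate_repo_id` docstring further down in the stacktrace. The helper name `looks_like_valid_repo_id` is hypothetical, and this is an approximation of those rules, not the actual `REPO_ID_REGEX` used by `huggingface_hub`.

```python
import re

# Characters the docstring lists as allowed inside a name segment.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def looks_like_valid_repo_id(repo_id) -> bool:
    """Approximate the docstring rules of validate_repo_id (hypothetical helper)."""
    if not isinstance(repo_id, str):
        return False
    # "Between 1 and 96 characters": an empty string already fails here.
    # Assumption: the limit is applied to the whole string in this sketch.
    if not 1 <= len(repo_id) <= 96:
        return False
    # Either "repo_name" or "namespace/repo_name".
    parts = repo_id.split("/")
    if len(parts) > 2:
        return False
    for part in parts:
        if not _ALLOWED.fullmatch(part):
            return False
        # A name cannot start or end with '-' or '.'.
        if part[0] in "-." or part[-1] in "-.":
            return False
    # "--" and ".." are forbidden, and the docstring lists "foo.git" as invalid.
    if "--" in repo_id or ".." in repo_id or repo_id.endswith(".git"):
        return False
    return True


print(looks_like_valid_repo_id(""))
print(looks_like_valid_repo_id("trl-internal-testing/tiny-Qwen2VLForConditionalGeneration"))
```

The first call models the failing case in this report; the second is one of the model ids the tests actually pass in.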
Multiple tests fail with the same error:
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_beta_non_zero - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_peft - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_and_importance_sampling - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_and_liger - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_grpo_trainer.py::TestGRPOTrainer::test_training_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm[trl-internal-testing/tiny-Qwen2VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_beta_non_zero - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_peft - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_rloo_trainer.py::TestRLOOTrainer::test_training_vlm_multi_image - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_prompt_completion - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  FAILED tests/test_sft_trainer.py::TestSFTTrainer::test_train_vlm_text_only_data - huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
Stacktrace:
_ TestGRPOTrainer.test_training_vlm[trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration] _
  [gw3] linux -- Python 3.12.12 /__w/trl/trl/.venv/bin/python3
  
  path_or_repo_id = '', filenames = ['processor_config.json']
  cache_dir = '/github/home/.cache/huggingface/hub', force_download = False
  proxies = None, token = None, revision = None, local_files_only = False
  subfolder = '', repo_type = None
  user_agent = 'transformers/5.0.0.dev0; python/3.12.12; session_id/2047a935e364492090f99c27947a738f; torch/2.9.0'
  _raise_exceptions_for_gated_repo = False
  _raise_exceptions_for_missing_entries = False
  _raise_exceptions_for_connection_errors = False, _commit_hash = None
  deprecated_kwargs = {}, full_filenames = ['processor_config.json']
  existing_files = [], filename = 'processor_config.json', file_counter = 0
  
      def cached_files(
          path_or_repo_id: str | os.PathLike,
          filenames: list[str],
          cache_dir: str | os.PathLike | None = None,
          force_download: bool = False,
          proxies: dict[str, str] | None = None,
          token: bool | str | None = None,
          revision: str | None = None,
          local_files_only: bool = False,
          subfolder: str = "",
          repo_type: str | None = None,
          user_agent: str | dict[str, str] | None = None,
          _raise_exceptions_for_gated_repo: bool = True,
          _raise_exceptions_for_missing_entries: bool = True,
          _raise_exceptions_for_connection_errors: bool = True,
          _commit_hash: str | None = None,
          **deprecated_kwargs,
      ) -> str | None:
          """
          Tries to locate several files in a local folder and repo, downloads and cache them if necessary.
      
          Args:
              path_or_repo_id (`str` or `os.PathLike`):
                  This can be either:
                  - a string, the *model id* of a model repo on huggingface.co.
                  - a path to a *directory* potentially containing the file.
              filenames (`list[str]`):
                  The name of all the files to locate in `path_or_repo`.
              cache_dir (`str` or `os.PathLike`, *optional*):
                  Path to a directory in which a downloaded pretrained model configuration should be cached if the standard
                  cache should not be used.
              force_download (`bool`, *optional*, defaults to `False`):
                  Whether or not to force to (re-)download the configuration files and override the cached versions if they
                  exist.
              proxies (`dict[str, str]`, *optional*):
                  A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
                  'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
              token (`str` or *bool*, *optional*):
                  The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
                  when running `hf auth login` (stored in `~/.huggingface`).
              revision (`str`, *optional*, defaults to `"main"`):
                  The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
                  git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
                  identifier allowed by git.
              local_files_only (`bool`, *optional*, defaults to `False`):
                  If `True`, will only try to load the tokenizer configuration from local files.
              subfolder (`str`, *optional*, defaults to `""`):
                  In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
                  specify the folder name here.
              repo_type (`str`, *optional*):
                  Specify the repo type (useful when downloading from a space for instance).
      
          Private args:
              _raise_exceptions_for_gated_repo (`bool`):
                  if False, do not raise an exception for gated repo error but return None.
              _raise_exceptions_for_missing_entries (`bool`):
                  if False, do not raise an exception for missing entries but return None.
              _raise_exceptions_for_connection_errors (`bool`):
                  if False, do not raise an exception for connection errors but return None.
              _commit_hash (`str`, *optional*):
                  passed when we are chaining several calls to various files (e.g. when loading a tokenizer or
                  a pipeline). If files are cached for this commit hash, avoid calls to head and get from the cache.
      
          <Tip>
      
          Passing `token=True` is required when you want to use a private model.
      
          </Tip>
      
          Returns:
              `Optional[str]`: Returns the resolved file (to the cache folder if downloaded from a repo).
      
          Examples:
      
          ```python
          # Download a model weight from the Hub and cache it.
          model_weights_file = cached_file("google-bert/bert-base-uncased", "pytorch_model.bin")
          ```
          """
          if is_offline_mode() and not local_files_only:
              logger.info("Offline mode: forcing local_files_only=True")
              local_files_only = True
          if subfolder is None:
              subfolder = ""
      
          # Add folder to filenames
          full_filenames = [os.path.join(subfolder, file) for file in filenames]
      
          path_or_repo_id = str(path_or_repo_id)
          existing_files = []
          for filename in full_filenames:
              if os.path.isdir(path_or_repo_id):
                  resolved_file = os.path.join(path_or_repo_id, filename)
                  if not os.path.isfile(resolved_file):
                      if _raise_exceptions_for_missing_entries and filename != os.path.join(subfolder, "config.json"):
                          revision_ = "main" if revision is None else revision
                          raise OSError(
                              f"{path_or_repo_id} does not appear to have a file named {filename}. Checkout "
                              f"'https://huggingface.co/{path_or_repo_id}/tree/{revision_}' for available files."
                          )
                      else:
                          continue
                  existing_files.append(resolved_file)
      
          if os.path.isdir(path_or_repo_id):
              return existing_files if existing_files else None
      
          if cache_dir is None:
              cache_dir = TRANSFORMERS_CACHE
          if isinstance(cache_dir, Path):
              cache_dir = str(cache_dir)
      
          existing_files = []
          file_counter = 0
          if _commit_hash is not None and not force_download:
              for filename in full_filenames:
                  # If the file is cached under that commit hash, we return it directly.
                  resolved_file = try_to_load_from_cache(
                      path_or_repo_id, filename, cache_dir=cache_dir, revision=_commit_hash, repo_type=repo_type
                  )
                  if resolved_file is not None:
                      if resolved_file is not _CACHED_NO_EXIST:
                          file_counter += 1
                          existing_files.append(resolved_file)
                      elif not _raise_exceptions_for_missing_entries:
                          file_counter += 1
                      else:
                          raise OSError(f"Could not locate {filename} inside {path_or_repo_id}.")
      
          # Either all the files were found, or some were _CACHED_NO_EXIST but we do not raise for missing entries
          if file_counter == len(full_filenames):
              return existing_files if len(existing_files) > 0 else None
      
          user_agent = http_user_agent(user_agent)
          # download the files if needed
          try:
              if len(full_filenames) == 1:
                  # This is slightly better for only 1 file
  >               hf_hub_download(
                      path_or_repo_id,
                      filenames[0],
                      subfolder=None if len(subfolder) == 0 else subfolder,
                      repo_type=repo_type,
                      revision=revision,
                      cache_dir=cache_dir,
                      user_agent=user_agent,
                      force_download=force_download,
                      proxies=proxies,
                      token=token,
                      local_files_only=local_files_only,
                  )
  
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:469: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:85: in _inner_fn
      validate_repo_id(arg_value)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  repo_id = ''
  
      def validate_repo_id(repo_id: str) -> None:
          """Validate `repo_id` is valid.
      
          This is not meant to replace the proper validation made on the Hub but rather to
          avoid local inconsistencies whenever possible (example: passing `repo_type` in the
          `repo_id` is forbidden).
      
          Rules:
          - Between 1 and 96 characters.
          - Either "repo_name" or "namespace/repo_name"
          - [a-zA-Z0-9] or "-", "_", "."
          - "--" and ".." are forbidden
      
          Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"`
      
          Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"`
      
          Example:
          ```py
          >>> from huggingface_hub.utils import validate_repo_id
          >>> validate_repo_id(repo_id="valid_repo_id")
          >>> validate_repo_id(repo_id="other..repo..id")
          huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'.
          ```
      
          Discussed in https://github.com/huggingface/huggingface_hub/issues/1008.
          In moon-landing (internal repository):
          - https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
          - https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
          """
          if not isinstance(repo_id, str):
              # Typically, a Path is not a repo_id
              raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
      
          if repo_id.count("/") > 1:
              raise HFValidationError(
                  "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
                  f" '{repo_id}'. Use `repo_type` argument if needed."
              )
      
          if not REPO_ID_REGEX.match(repo_id):
  >           raise HFValidationError(
                  "Repo id must use alphanumeric chars, '-', '_' or '.'."
                  " The name cannot start or end with '-' or '.' and the maximum length is 96:"
                  f" '{repo_id}'."
              )
  E           huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:135: HFValidationError
  
  During handling of the above exception, another exception occurred:
  
  self = <tests.test_grpo_trainer.TestGRPOTrainer object at 0x7fe900d1c470>
  model_id = 'trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration'
  
      @pytest.mark.parametrize(
          "model_id",
          [
              "trl-internal-testing/tiny-Gemma3ForConditionalGeneration",
              "trl-internal-testing/tiny-LlavaNextForConditionalGeneration",
              "trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration",
              "trl-internal-testing/tiny-Qwen2VLForConditionalGeneration",
              # "trl-internal-testing/tiny-SmolVLMForConditionalGeneration", seems not to support bf16 properly
          ],
      )
      @require_vision
      def test_training_vlm(self, model_id):
          dataset = load_dataset("trl-internal-testing/zen-image", "conversational_prompt_only", split="train")
      
          def reward_func(completions, **kwargs):
              """Reward function that rewards longer completions."""
              return [float(len(completion[0]["content"])) for completion in completions]
      
          training_args = GRPOConfig(
              output_dir=self.tmp_dir,
              learning_rate=0.1,  # increase the learning rate to speed up the test
              per_device_train_batch_size=3,  # reduce the batch size to reduce memory usage
              num_generations=3,  # reduce the number of generations to reduce memory usage
              max_completion_length=8,  # reduce the completion length to reduce memory usage
              max_prompt_length=None,  # disable prompt truncation, because usually, models don't support it
              report_to="none",
          )
  >       trainer = GRPOTrainer(
              model=model_id,
              reward_funcs=reward_func,
              args=training_args,
              train_dataset=dataset,
          )
  
  tests/test_grpo_trainer.py:1279: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  trl/trainer/grpo_trainer.py:281: in __init__
      processing_class = AutoProcessor.from_pretrained(model.config._name_or_path, truncation_side="left")
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .venv/lib/python3.12/site-packages/transformers/models/auto/processing_auto.py:287: in from_pretrained
      processor_config_file = cached_file(pretrained_model_name_or_path, PROCESSOR_NAME, **cached_file_kwargs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:326: in cached_file
      file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:520: in cached_files
      _get_cache_file_to_return(path_or_repo_id, filename, cache_dir, revision, repo_type)
  .venv/lib/python3.12/site-packages/transformers/utils/hub.py:152: in _get_cache_file_to_return
      resolved_file = try_to_load_from_cache(
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:85: in _inner_fn
      validate_repo_id(arg_value)
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  repo_id = ''
  
      def validate_repo_id(repo_id: str) -> None:
          """Validate `repo_id` is valid.
      
          This is not meant to replace the proper validation made on the Hub but rather to
          avoid local inconsistencies whenever possible (example: passing `repo_type` in the
          `repo_id` is forbidden).
      
          Rules:
          - Between 1 and 96 characters.
          - Either "repo_name" or "namespace/repo_name"
          - [a-zA-Z0-9] or "-", "_", "."
          - "--" and ".." are forbidden
      
          Valid: `"foo"`, `"foo/bar"`, `"123"`, `"Foo-BAR_foo.bar123"`
      
          Not valid: `"datasets/foo/bar"`, `".repo_id"`, `"foo--bar"`, `"foo.git"`
      
          Example:
          ```py
          >>> from huggingface_hub.utils import validate_repo_id
          >>> validate_repo_id(repo_id="valid_repo_id")
          >>> validate_repo_id(repo_id="other..repo..id")
          huggingface_hub.utils._validators.HFValidationError: Cannot have -- or .. in repo_id: 'other..repo..id'.
          ```
      
          Discussed in https://github.com/huggingface/huggingface_hub/issues/1008.
          In moon-landing (internal repository):
          - https://github.com/huggingface/moon-landing/blob/main/server/lib/Names.ts#L27
          - https://github.com/huggingface/moon-landing/blob/main/server/views/components/NewRepoForm/NewRepoForm.svelte#L138
          """
          if not isinstance(repo_id, str):
              # Typically, a Path is not a repo_id
              raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
      
          if repo_id.count("/") > 1:
              raise HFValidationError(
                  "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
                  f" '{repo_id}'. Use `repo_type` argument if needed."
              )
      
          if not REPO_ID_REGEX.match(repo_id):
  >           raise HFValidationError(
                  "Repo id must use alphanumeric chars, '-', '_' or '.'."
                  " The name cannot start or end with '-' or '.' and the maximum length is 96:"
                  f" '{repo_id}'."
              )
  E           huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: ''.
  
  .venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:135: HFValidationError
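
The traceback shows `AutoProcessor.from_pretrained(model.config._name_or_path, ...)` being reached with `path_or_repo_id = ''`, even though the test passed `model=model_id` as a non-empty string. That suggests `model.config._name_or_path` is empty with the dev dependencies. One possible guard is to fall back to the original model id when the config value is empty. The sketch below is only an illustration of that idea: `resolve_processor_id` is a hypothetical helper, not part of TRL or transformers, and not necessarily the fix the maintainers would choose.

```python
from typing import Optional


def resolve_processor_id(model_id: Optional[str], name_or_path: Optional[str]) -> str:
    """Return the first non-empty identifier (hypothetical helper).

    `name_or_path` stands in for `model.config._name_or_path`; `model_id` is the
    string originally passed to the trainer, if any.
    """
    for candidate in (name_or_path, model_id):
        if isinstance(candidate, str) and candidate.strip():
            return candidate
    # Fail early with a clearer message than the Hub validation error.
    raise ValueError(
        "Cannot infer a processor id: both the config name and the model id are empty."
    )


# The failing case from this report: the config value is '' but the model id is known.
print(resolve_processor_id("trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration", ""))
```

With a guard like this, the call site would receive the original model id instead of `''`; when no identifier is available at all, it raises a `ValueError` before any Hub lookup is attempted.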