What happened
In intel_owl/settings/storage.py, the S3Boto3StorageWrapper.retrieve() method caches
downloaded S3 files under MEDIA_ROOT/<analyzer>/<filename>. This means the file
from S3 is downloaded and stored separately for each analyzer process.
If 10 analyzers are set to run on the same sample, the file is downloaded from S3, sent over the network, and written to disk 10 times.
Environment
- OS: Any
- IntelOwl version: latest develop
What did you expect to happen
The file should be cached once at MEDIA_ROOT/ and then reused by all analyzers, avoiding redundant S3 HTTP requests and duplicate disk writes.
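A possible fix could look like the sketch below: download once into a shared location keyed only by file name, and let every analyzer reuse the cached copy. Note that `retrieve()`, the `storage.open()` call, and the locking here are hypothetical illustrations, not the actual `S3Boto3StorageWrapper` API; a real deployment with multiple Celery worker processes would need a cross-process file lock rather than `threading.Lock`.

```python
import os
import threading

# Hypothetical sketch: guard against concurrent duplicate downloads
# within one process. Cross-process safety would need a file lock.
_download_lock = threading.Lock()


def retrieve(storage, file_name, media_root):
    """Download file_name from remote storage once, then reuse the
    cached copy at media_root/<file_name> for every analyzer."""
    cached_path = os.path.join(media_root, file_name)
    with _download_lock:
        if not os.path.exists(cached_path):
            os.makedirs(os.path.dirname(cached_path), exist_ok=True)
            # storage.open() stands in for the S3 download call
            with storage.open(file_name, "rb") as remote, \
                    open(cached_path, "wb") as local:
                local.write(remote.read())
    return cached_path
```

With this approach, ten analyzers calling `retrieve()` on the same sample trigger a single S3 fetch; the remaining nine calls resolve to the already-cached path.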
How to reproduce your issue
- Set LOCAL_STORAGE=False (S3 mode in env_file_app)
- Upload a file and run multiple analyzers on it
- Inspect MEDIA_ROOT/ — a separate subdirectory per analyzer is created with identical file content.
Error messages and logs
No error. This is a known FIXME left by maintainers:
https://github.com/intelowlproject/IntelOwl/blob/develop/intel_owl/settings/storage.py#L37