[Performance] S3Boto3StorageWrapper caches file per-analyzer instead of sharing a single cached copy #3459

@mannubaveja007

What happened

In intel_owl/settings/storage.py, the S3Boto3StorageWrapper.retrieve() method caches
downloaded S3 files under MEDIA_ROOT/<analyzer>/<filename>. This means the file
from S3 is downloaded and stored separately for each analyzer.

If 10 analyzers run on the same sample, the file is downloaded from S3 over the network and written to disk 10 times.

Environment

  1. OS: Any
  2. IntelOwl version: latest develop

What did you expect to happen

The file should be cached once at MEDIA_ROOT/ and then reused by all the analyzers,
avoiding redundant S3 HTTP requests and duplicate disk writes.
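
A minimal sketch of the expected behavior, not the actual IntelOwl code: the helper name `retrieve_shared` and the `fetch` callable are hypothetical stand-ins for the S3 download, and the cache key is assumed to be the file name alone, without the analyzer name.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def retrieve_shared(fetch: Callable[[str], bytes], media_root: str, file_name: str) -> str:
    """Return a local path for file_name, reusing the cached copy if present.

    `fetch` is a hypothetical stand-in for the S3 download call.
    """
    target = Path(media_root) / file_name
    if target.exists():
        return str(target)  # cache hit: no download, no disk write

    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it
    # atomically, so another worker never reads a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=target.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fetch(file_name))
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return str(target)
```

With this layout, 10 sequential calls for the same sample trigger a single download. Two workers that start at exactly the same time could still both download the file; the atomic rename keeps the cached copy consistent, but guaranteeing one download under concurrency would need a lock on top of this.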

How to reproduce your issue

  1. Set LOCAL_STORAGE=False (S3 mode) in env_file_app
  2. Upload a file and run multiple analyzers on it
  3. Inspect MEDIA_ROOT/ — a separate subdirectory per analyzer is created with identical file content.
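
The inspection in step 3 could be scripted roughly as follows. This is a sketch, and `find_duplicate_files` is a hypothetical helper, not part of IntelOwl: it groups every file under a directory by SHA-256 digest and reports the groups with more than one member.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root: str) -> dict[str, list[str]]:
    """Map SHA-256 digest -> relative paths, for content stored more than once."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; hash in chunks for large samples.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(str(path.relative_to(root)))
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}
```

Calling it on the MEDIA_ROOT directory after step 2 should list one group per sample, with one path per analyzer subdirectory, if the behavior described above is present.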

Error messages and logs

No error. This is a known FIXME left by maintainers:
https://github.com/intelowlproject/IntelOwl/blob/develop/intel_owl/settings/storage.py#L37

Metadata

Labels

bug (Something isn't working)
