Add streaming preprocess mode for inference with optional disk persistence #29

Draft
vadanamu wants to merge 1 commit into dev from codex/update-inference-preprocess-for-streaming-mode

Motivation

  • The existing inference flow requires preprocessing to write .npz shards to disk before inference can run, which adds I/O overhead and rules out online inference.
  • Provide a stream mode that preprocesses POD5/BAM input on the fly and feeds chunks directly to the inference pipeline, while optionally persisting the same chunks to disk.

Description

  • Add CLI options to deeprm call run: --preprocess-mode (disk|stream), --pod5, --stream-save-dir, and several --prep-* knobs to control streaming preprocess behavior.
  • Extend src/deeprm/inference/inference_preprocess_python.py with _prepare_bam_dataframe, _list_pod5_paths, stream(...), dataframe_to_chunk(...), and make segment_normalize_signal(...) emit in-memory chunk payloads via an emit_chunk_fn while optionally writing .npz via write_to_disk.
  • Refactor inference internals in src/deeprm/inference/inference.py by adding _init_model_for_device(...) and _run_chunk_inference(...), and implement run_inference_stream(...) to consume preprocessed chunks via callback and run inference without intermediate files.
  • Preserve the existing disk-backed behavior (--preprocess-mode disk) and make streamed execution a drop-in alternative that can also persist streamed chunks when --stream-save-dir is provided.
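
The callback-based flow described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: stream_chunks, run_stream, the chunk keys, the z-score normalization, and the placeholder "model" are all hypothetical stand-ins, and the real stream(...) and run_inference_stream(...) signatures may differ. It only shows the general pattern of a producer emitting in-memory chunks through an emit_chunk_fn callback while optionally writing the same payload as .npz.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional

import numpy as np

# Hypothetical chunk payload: a dict of named arrays.
Chunk = Dict[str, np.ndarray]


def stream_chunks(
    signals: np.ndarray,
    chunk_size: int,
    emit_chunk_fn: Callable[[Chunk], None],
    save_dir: Optional[Path] = None,
) -> int:
    """Normalize reads chunk by chunk and hand each chunk to a callback.

    Hypothetical stand-in for a streaming preprocess step. Returns the
    number of chunks emitted.
    """
    if save_dir is not None:
        save_dir.mkdir(parents=True, exist_ok=True)

    n_chunks = 0
    for start in range(0, len(signals), chunk_size):
        batch = signals[start : start + chunk_size].astype(np.float32)
        # Toy per-read z-score normalization; guard against zero variance.
        mean = batch.mean(axis=1, keepdims=True)
        std = batch.std(axis=1, keepdims=True)
        std = np.where(std > 0, std, 1.0)
        chunk: Chunk = {
            "signal": (batch - mean) / std,
            "read_index": np.arange(start, start + len(batch)),
        }
        # Optional persistence: write the same payload as an .npz shard.
        if save_dir is not None:
            np.savez(save_dir / f"chunk_{n_chunks:05d}.npz", **chunk)
        # Hand the in-memory chunk to the consumer; no file is read back.
        emit_chunk_fn(chunk)
        n_chunks += 1
    return n_chunks


def run_stream(
    signals: np.ndarray,
    chunk_size: int = 2,
    save_dir: Optional[Path] = None,
) -> List[float]:
    """Consume streamed chunks via a callback and collect one score per read."""
    predictions: List[float] = []

    def on_chunk(chunk: Chunk) -> None:
        # Placeholder "model": the per-read maximum of the normalized signal.
        predictions.extend(chunk["signal"].max(axis=1).tolist())

    stream_chunks(signals, chunk_size, on_chunk, save_dir=save_dir)
    return predictions


if __name__ == "__main__":
    reads = np.random.default_rng(0).normal(size=(5, 16))
    with tempfile.TemporaryDirectory() as tmp:
        scores = run_stream(reads, chunk_size=2, save_dir=Path(tmp))
        shards = sorted(p.name for p in Path(tmp).glob("*.npz"))
    print(len(scores), shards)
```

Passing save_dir=None in this sketch skips the disk writes entirely, which mirrors the idea of an optional --stream-save-dir; the consumer sees identical chunks either way.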

Testing

  • Ran python -m compileall -q src/deeprm/inference/inference.py src/deeprm/inference/inference_preprocess_python.py, which succeeded.
  • Ran pytest -q tests/test_imports.py, which passed.
  • Ran the full pytest -q suite, which failed in this environment because the CLI help tests invoke the deeprm console script, which is not installed on the PATH (an environment failure unrelated to the diff logic).
  • Ran PYTHONPATH=src python -m deeprm call run --help, which printed a Torch availability message and did not proceed because torch is not installed in the test environment (an environment limitation, not a code error).
