Question about merge_input function - Does it really include different resolutions?

https://github.com/jy0205/Pyramid-Flow/blob/a012faa1dc4d71301a7a153c7f9554c081947ea2/pyramid_dit/flux_modules/modeling_pyramid_flux.py#L242

Hi, I have a question about the `merge_input` function in your code. Its docstring says:

```python
def merge_input(self, sample, encoder_hidden_length, encoder_attention_mask):
    """
        Merge the input video with different resolutions into one sequence
        Sample: From low resolution to high resolution
    """
```
However, looking at the implementation, the function does not seem to handle different resolutions; it appears to incorporate historical frame information instead. Could you clarify whether it actually processes inputs of varying resolutions, or only handles history conditions from past frames?
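
To make the question concrete, here is a minimal sketch of the check I have in mind. It assumes (I have not confirmed this) that `sample` is a list of tensors whose last two dimensions are height and width. The helper `distinct_resolutions` and the shapes below are hypothetical and are not part of the repository.

```python
from types import SimpleNamespace


def distinct_resolutions(sample):
    """Return the sorted set of (H, W) sizes found in `sample`.

    Assumption: `sample` is a list of array-like objects whose `.shape`
    ends in (..., H, W), e.g. latents shaped (B, C, T, H, W).
    """
    return sorted({tuple(x.shape[-2:]) for x in sample})


# Stand-in objects with made-up shapes, only to illustrate the check.
fake_sample = [
    SimpleNamespace(shape=(1, 16, 2, 24, 40)),
    SimpleNamespace(shape=(1, 16, 1, 48, 80)),
]

sizes = distinct_resolutions(fake_sample)
print(sizes, "mixed resolutions" if len(sizes) > 1 else "single resolution")
```

If a check like this, run on the real `sample` passed to `merge_input`, returned more than one size, the docstring would be accurate; a single size would suggest the entries differ only in which frames they cover.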

Thank you for your time and for providing this project!

Question about merge_input function - Does it really include different resolutions? #231
