Skip to content

Conversation

@bentyeh
Copy link

@bentyeh bentyeh commented May 8, 2025

Overview

Problem: An IndexError can be raised on line 384 of uint8_to_bed_parallel.py when trying to access an out-of-bounds index of ar_quant:

out_link.write(str(ar_quant[each_pos]) + "\n")

Fix: this commit both simplifies the logic of the code and fixes the bug.

Details

Relevant variables

  • uniquely_mappable: single read mappability array of length N
  • ar_quant: multi-read mappability array of length N (length of chromosome)
  • unimap_diff: array of length N - 1
  • poses_start: start position of non-zero "runs"; initially (line 367) calculated as a value 1 less than the desired 0-based (start-open, end-closed) index
  • poses_end: end position of non-zero "runs"; initially (line 368) calculated as a value 1 less than the desired 0-based (start-open, end-closed) index

Here, I annotate the original code to clarify its logic.

if len(poses_start) != len(poses_end):
    # The length of poses_start may be different than the length of poses_end if the
    # first and last elements of ar_quant are not both zero or nonzero.
    if len(poses_start) > len(poses_end):
        # - If the last element of ar_quant is nonzero, then
        #   poses_end is missing the end position of the last run, because the last run did not "finish."
        # - Consequently, we append an end position to poses_end. At this point in the code, 
        #   poses_end contains indices of value 1 less than the desired 0-based (start-open, end-closed) index.
        #   So the value appended should be (len(uniquely_mappable) - 1), not len(uniquely_mappable).
        poses_end = np.append(poses_end, [len(uniquely_mappable)])
    else:
        # - If the first element of ar_quant is nonzero, then
        #   poses_start is missing the start position of the first run.
        # - Consequently, we insert a start position of 0 to poses_start.
        poses_start = np.append([0], poses_start)
elif uniquely_mappable[0] == 1:
    # This code is only encountered if the first element of ar_quant is nonzero
    # AND the last element of ar_quant is also nonzero, such that:
    # - poses_start is missing the start position of the first run. Consequently, we insert a start position of 0 to poses_start.
    # - poses_end is missing the end position of the last run. At this point in the code, 
    #   poses_end contains indices of value 1 less than the desired 0-based (start-open, end-closed) index.
    #   So the value appended should be (len(uniquely_mappable) - 1), not len(uniquely_mappable).
    poses_start = np.append([0], poses_start)
    poses_end = np.append(poses_end, [len(uniquely_mappable)])

Note that whether the length of the arrays poses_end and poses_start are the same is ultimately not relevant. We simply need to add a start position of 0 if the first element of ar_quant is non-zero, and add an end position of len(uniquely_mappable) - 1 if the last element of ar_quant is non-zero. This entire code block can therefore be simplifed as follows:

if uniquely_mappable[0] > 0:
    poses_start = np.append([0], poses_start)
if uniquely_mappable[-1] > 0:
    poses_end = np.append(poses_end, [len(uniquely_mappable) - 1])

Example for a kmer size of 1

  • uniquely_mappable = [ 0, 0, 1, 1, 1, 0, 0, 1, 1 ]
  • ar_quant = [ 0, 0, 1, 1, 1, 0, 0, 1, 1 ]
  • unimap_diff = [ 0, 1, 0, 0, -1, 0, 1, 0]
  • poses_start = [1, 6]
  • poses_end = [4] --> extended to poses_end = [4, 8]

Desired wiggle output

fixedStep chrom=chrom start=3 step=1 span=1
ar_quant[2]
ar_quant[3]
ar_quant[4]
fixedStep chrom=chrom start=8 step=1 span=1
ar_quant[7]
ar_quant[8]

Working backwards, we need pos_st = [2, 7] and pos_end = [5, 9] for the following loop (lines 383-384) to work as intended:

for each_pos in range(pos_st, pos_end):
    out_link.write(str(ar_quant[each_pos]) + "\n")

Lines 378 and 379 add a value of 1 to each of the positions from poses_start and poses_end, therefore giving the desired result.

Problem: An IndexError can be raised on line 384 when trying to access an out-of-bounds index of `ar_quant`:

```python
out_link.write(str(ar_quant[each_pos]) + "\n")
```

Fix: this commit both simplifies the logic of the code and fixes the bug.
@EricR86
Copy link
Member

EricR86 commented May 12, 2025

This looks great and seemingly correct. This would be best served not merged with master, but the last tagged release to ensure compatibility however old it is.

As mentioned before though this software will likely be sunsetted soon in favour of a new solution. I'll have to talk internally how we want to support this software along with possibly expediting its intended successor. Worst case we will leave this up as an intended patch for those running into the same problem in the future and to which we are very grateful for your work.

@bentyeh
Copy link
Author

bentyeh commented May 12, 2025

Thank you for reviewing the pull request. What is the best way to get this into the tagged release?

@bentyeh
Copy link
Author

bentyeh commented Jun 19, 2025

Bumping this

@EricR86
Copy link
Member

EricR86 commented Jun 25, 2025

Sorry for the delay, was on vacation. As mentioned earlier support for this software in general is lower priority and a better solution is imminent. I will update this issue after when it's available which ideally should make this issue moot. If it doesn't, I would more than welcome further discussion.

@EricR86
Copy link
Member

EricR86 commented Jul 8, 2025

@bentyeh to follow up, we have done a prelimary release of a new replacement method called Newmap. It should address virtually all issues in this repository and it would be great to get feedback. Hopefully this fix any further issues as well.

Any feedback would also be greatly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants