Description
Hi,
I am trying to do a study similar to the Enformer study for my final thesis, and to do so I have downloaded 4505 ENCODE tracks.
When I ran the basenji_data.py script, I encountered the following warning numerous times:
/mnt2/fscratch/users/ac_aux/vguirado/preprocess/bin/basenji_data_read.py:307: RuntimeWarning: overflow encountered in cast
cov = self.cov_open.values(chrm, start, end, numpy=True).astype('float16')
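For context, the warning means that any coverage value above the float16 maximum (about 65504) becomes inf during that cast. A minimal sketch reproducing it outside of basenji:

import numpy as np

# Coverage from pyBigWig comes back as float32/float64; casting to float16
# overflows for any value above np.finfo(np.float16).max (~65504) and the
# overflowed entries become inf.
cov = np.array([10.0, 500.0, 70000.0, 1e6])
cov16 = cov.astype('float16')          # emits "overflow encountered in cast"
print(np.finfo(np.float16).max)        # largest representable float16 value
print(cov16)                           # the last two entries print as inf
print(np.isinf(cov16).sum())           # 2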
The submission script:
#SBATCH --job-name=preprocess
#SBATCH --time=0-30:0
#SBATCH --mem=50G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
##SBATCH --ntasks-per-node=1
#SBATCH --constraint=cal
...more sbatch things...
time python bin/basenji_data.py -s .9 -g data/hg38_gaps.bed -l 196608 --local -o data/human -p 128 -v .1 -w 128 data/genome.fa data/human_data.txt
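Before re-running, this is the kind of sanity check I can do on the input bigWigs to see which tracks actually exceed the float16 range (a rough sketch; I'm assuming data/human_data.txt is a tab-separated targets file with a 'file' column pointing to the bigWig paths, so adjust the column name if yours is laid out differently):

import numpy as np
import pandas as pd
import pyBigWig

# Flag any bigWig whose header maximum exceeds the float16 range, since those
# values will overflow to inf when basenji_data_read.py casts to float16.
FLOAT16_MAX = float(np.finfo(np.float16).max)  # ~65504

targets = pd.read_csv('data/human_data.txt', sep='\t')
for path in targets['file']:
    bw = pyBigWig.open(path)
    max_val = bw.header()['maxVal']  # whole-file maximum stored in the header
    if max_val > FLOAT16_MAX:
        print(f'{path}: max value {max_val:.1f} overflows float16')
    bw.close()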
I would like to know whether this could affect the generation of the TFRecords, because during training I am seeing extremely strange behavior, as I show below:
This is from restoring a checkpoint at epoch 80 and training 50 more epochs until 130 (first graph), restoring the checkpoint from epoch 130 and training until 180 (second), and from 180 until 230 (third). Here I am using a small subsample, but the same happens with the whole dataset (and a worse loss).
Apparently my training code is fine, because I have tried loading the public Enformer checkpoint and modifying the output head to train on the same subset, and there I do get results. That is, I keep the already-trained trunk and add a single linear layer on top.
But when training from scratch, also including 1019 mouse tracks, the model is not able to learn anything: the R^2 values are 0 or negative no matter how many steps I train.
So it occurs to me that the problem lies in the generation of the TFRecords, but the only warning I found was the one above.
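For reference, this is the kind of check I'm planning to run on the generated TFRecords to see whether any targets ended up as inf or NaN (a rough sketch; the file pattern, feature keys, dtype and compression are assumptions based on the basenji version I'm running, so adjust them if yours differs):

import glob
import numpy as np
import tensorflow as tf

# Count records whose decoded targets contain inf/NaN. I'm assuming each
# record stores 'sequence' and 'target' as serialized bytes with float16
# targets and ZLIB compression.
feature_spec = {
    'sequence': tf.io.FixedLenFeature([], tf.string),
    'target': tf.io.FixedLenFeature([], tf.string),
}

bad_records = 0
for tfr in glob.glob('data/human/tfrecords/*.tfr'):
    for raw in tf.data.TFRecordDataset(tfr, compression_type='ZLIB'):
        parsed = tf.io.parse_single_example(raw, feature_spec)
        targets_arr = tf.io.decode_raw(parsed['target'], tf.float16).numpy()
        if not np.isfinite(targets_arr).all():
            bad_records += 1
print(f'records with inf/NaN targets: {bad_records}')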
Thank you for your time.
