basenji_read overflow #196

@ElArquitectorgo

Hi,

For my final thesis I am trying to carry out a study similar to the Enformer study, and to do so I have downloaded 4,505 ENCODE tracks.
When I ran the basenji_data.py script, I encountered the following warning numerous times:

```
/mnt2/fscratch/users/ac_aux/vguirado/preprocess/bin/basenji_data_read.py:307: RuntimeWarning: overflow encountered in cast
  cov = self.cov_open.values(chrm, start, end, numpy=True).astype('float16')
```
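The warning comes from the `.astype('float16')` cast: float16 can only hold finite values up to 65504, so any per-base coverage value above that becomes `inf`. The sketch below reproduces this with made-up coverage values. The clipping step at the end is a hypothetical workaround for illustration, not something Basenji itself does.

```python
import numpy as np

# Largest finite float16 value (65504.0).
FP16_MAX = float(np.finfo(np.float16).max)

# Made-up per-base coverage values; the last one exceeds FP16_MAX.
coverage = np.array([12.0, 3000.0, 65504.0, 70000.0], dtype=np.float32)

# Same kind of cast as basenji_data_read.py:307. On recent NumPy versions
# this emits "RuntimeWarning: overflow encountered in cast".
cov16 = coverage.astype("float16")
print(np.isinf(cov16))  # only the last entry is inf

# Hypothetical workaround: clip to the float16 range before casting,
# so out-of-range values saturate at 65504 instead of becoming inf.
clipped16 = np.clip(coverage, None, FP16_MAX).astype("float16")
print(np.isfinite(clipped16).all())  # True
```

Clipping hides the overflow rather than explaining it, so it is worth first checking which tracks actually contain such large values.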

The job script I used:

```bash
#!/bin/bash
#SBATCH --job-name=preprocess
#SBATCH --time=0-30:0
#SBATCH --mem=50G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
##SBATCH --ntasks-per-node=1
#SBATCH --constraint=cal

# ...more sbatch things...

time python bin/basenji_data.py -s .9 -g data/hg38_gaps.bed -l 196608 --local -o data/human -p 128 -v .1 -w 128 data/genome.fa data/human_data.txt
```
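To see how a single overflowed base could reach the targets, here is a minimal sketch. It assumes that `-l 196608` is the sequence length and `-w 128` the pooling width, and that each bin is summarised by summing its per-base values. `pool_sum` is a hypothetical helper, and Basenji's actual pooling statistic may differ. Under these assumptions, each example has 196608 / 128 = 1536 bins, and one `inf` base makes its whole bin `inf`.

```python
import numpy as np

SEQ_LENGTH = 196608  # value passed with -l
POOL_WIDTH = 128     # value passed with -w

n_bins = SEQ_LENGTH // POOL_WIDTH
print(n_bins)  # 1536


def pool_sum(per_base: np.ndarray, pool_width: int) -> np.ndarray:
    """Sum per-base values into non-overlapping bins (hypothetical helper)."""
    if per_base.size % pool_width != 0:
        raise ValueError("length must be a multiple of pool_width")
    # Accumulate in float32 so the sum itself does not overflow float16.
    return per_base.reshape(-1, pool_width).sum(axis=1, dtype=np.float32)


per_base = np.ones(SEQ_LENGTH, dtype=np.float16)
per_base[5] = np.inf  # one base that overflowed in the float16 cast

binned = pool_sum(per_base, POOL_WIDTH)
print(binned.shape)  # (1536,)
print(np.isinf(binned[0]), np.isfinite(binned[1:]).all())  # True True
```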

I would like to know whether this can affect the generation of the TFRecords, because during training I am seeing extremely strange behavior, shown below:

[Image: training curves for the three resumed runs]

These curves come from restoring the checkpoint at epoch 80 and training 50 more epochs up to epoch 130 (first graph), then restoring the epoch-130 checkpoint and training up to epoch 180 (second graph), and finally training from epoch 180 to 230 (right graph). Here I am using a small subsample, but the same happens with the whole dataset (with an even worse loss).

My training code appears to be fine: when I load the public Enformer checkpoint and replace the output head to train on the same subset, I do get results. That is, I keep the already-trained trunk and add a single linear layer on top.

But when training from scratch, with 1,019 mouse tracks also included, the model is unable to learn anything. The R² values stay at 0 or below no matter how many steps I train.
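For reference, the sketch below uses the standard definition R² = 1 − SS_res / SS_tot. An R² of 0 means the predictions are no better than predicting the mean, and a negative value means they are worse. A single `inf` target makes the score undefined (`nan`). The metric code used during Basenji/Enformer training may aggregate per track differently; this is only the textbook formula.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score(y, y))                         # 1.0 (perfect fit)
print(r2_score(y, np.full_like(y, y.mean())))  # 0.0 (mean predictor)
print(r2_score(y, np.zeros_like(y)))          # -5.0 (worse than the mean)

# One overflowed (inf) target makes the score undefined.
print(r2_score(np.append(y, np.inf), np.append(y, 1.0)))  # nan
```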

So I suspect the problem lies in the TFRecord generation, but this overflow warning is the only one I found.
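One way to test this suspicion is to scan the written targets for non-finite values. The helper below is a hypothetical sketch. It assumes each record's target feature is a raw float16 buffer of shape `(bins, num_targets)` serialized with `tobytes()`. Verify the feature name, dtype, and compression settings against bin/basenji_data_write.py before relying on it.

```python
import numpy as np


def count_nonfinite_targets(target_bytes: bytes, num_targets: int) -> int:
    """Decode a raw float16 target buffer and count inf/NaN entries.

    Hypothetical: assumes the buffer is a float16 array of shape
    (bins, num_targets) serialized with ndarray.tobytes().
    """
    targets = np.frombuffer(target_bytes, dtype=np.float16)
    targets = targets.reshape(-1, num_targets)
    return int(np.count_nonzero(~np.isfinite(targets)))


# Self-contained demo with a made-up buffer: 4 bins x 3 tracks.
demo = np.ones((4, 3), dtype=np.float16)
demo[0, 0] = np.nan
demo[2, 1] = np.inf
print(count_nonfinite_targets(demo.tobytes(), num_targets=3))  # 2
```

To apply this to real files, each record's target bytes would first have to be extracted with TensorFlow (for example via `tf.train.Example.FromString`). Any nonzero count would point to the overflow as a likely cause of the training behavior.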

Thank you for your time.