memory requirement #3

@jaykumar16

Hey,

What are the memory requirements to train this model? I am providing 187 GB of RAM, and training fails right after this log line:

INFO:tensorflow:Saving checkpoints for 0 into summary/knee_l1/model.ckpt.
At this point memory usage jumps from about 4 GB to more than 187 GB, and the job is killed because it runs out of memory.

I am just running the command from train_all.sh, with the batch size decreased from 2 to 1 and the number of training steps decreased from 10000 to only 10.

python3 recon_train.py \
  --shape_y 320 --shape_z 256 \
  --num_channels 8 --num_maps 1 \
  --batch_size 1 \
  --model_dir summary/knee_l1 \
  --loss_l1 1 \
  --max_steps 10 \
  --device $device
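
For scale, here is a back-of-the-envelope estimate of how large a single input example would be with these flags. It is only a sketch under an assumption the repository does not confirm: that each example is stored as complex64 (8 bytes per value) with shape (shape_y, shape_z, num_channels). The helper name `example_size_bytes` is made up for illustration.

```python
# Rough size of one input example for the flags in the command above.
# Assumption (not confirmed by the repository): data is complex64,
# i.e. 8 bytes per value, with shape (shape_y, shape_z, num_channels).
shape_y, shape_z, num_channels = 320, 256, 8
bytes_per_value = 8  # complex64 = two float32 values


def example_size_bytes(y, z, channels, itemsize):
    """Hypothetical helper: bytes needed for one dense (y, z, channels) array."""
    return y * z * channels * itemsize


size = example_size_bytes(shape_y, shape_z, num_channels, bytes_per_value)
print(f"{size / 2**20:.1f} MiB per example")  # prints "5.0 MiB per example"
```

If that assumption holds, one example at batch size 1 is only a few MiB, so the input tensors alone would not seem to explain growth to more than 187 GB.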

Can you please help? I am unable to train the model. I am providing 1 GPU with 16 GB of memory. Is this model designed to run on multiple nodes and CPUs?

Thank you.
