memory requirement

Hey,

What are the memory requirements to train this model? I am providing 187 GB of RAM, and the job fails after this log line:

INFO:tensorflow:Saving checkpoints for 0 into summary/knee_l1/model.ckpt.
At this point, memory usage jumps from 4 GB to more than 187 GB, and the job is killed when it runs out of memory.

I am running the model with the command from train_all.sh, except that I reduced the batch size from 2 to 1 and the number of iteration steps from 10000 to 10:

python3 recon_train.py \
        --shape_y 320 --shape_z 256 \
        --num_channels 8 --num_maps 1 \
        --batch_size 1 \
        --model_dir summary/knee_l1 \
        --loss_l1 1 \
        --max_steps 10 \
        --device $device
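For scale, here is a rough estimate of the size of one input batch for the settings above. It assumes the k-space is stored as complex64 (8 bytes per value) with shape (batch_size, shape_y, shape_z, num_channels); that dtype and layout are my assumptions, not something I have confirmed in recon_train.py:

```shell
#!/usr/bin/env bash
# Rough size of one input batch.
# ASSUMPTION: complex64 k-space laid out as (batch_size, shape_y, shape_z, num_channels).
# The dimensions are taken from the recon_train.py command above.
batch_size=1
shape_y=320
shape_z=256
num_channels=8
bytes_per_value=8   # complex64 = two float32 values (assumed dtype)

# Total bytes for one batch, then the same value in MiB (integer division).
input_bytes=$((batch_size * shape_y * shape_z * num_channels * bytes_per_value))
input_mib=$((input_bytes / 1024 / 1024))
echo "input batch: ${input_bytes} bytes (${input_mib} MiB)"
```

Under these assumptions a single batch is only about 5 MiB, so the input tensor by itself does not account for the jump to more than 187 GB.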

I am unable to train the model; can you please help? I am providing 1 GPU with 16 GB of memory. Is this model designed to run on multiple nodes and on CPU?

Thank you. 

memory requirement #3