Hello,
I’d like to ask, with my dataset and machine, if it is normal to see out of memory or if I might have some programming issue.
RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)
I using AWS p3.8xlarge(4 Tesla V100s), trying to train a CycleGAN with 5055 images of 1024x1024 resolution.
I checked that resized dataset of 512x512 works with batch size 4, but with 1024x1024, even batch size 1 doesn’t work.
I think we need p4d.24xlarge for this project, but it’s hard to get the instance due to the lack of zone capacity.
possible tries are:
-reduce num of the dataset (but I think 5055 images are still small for training)
-find a memory leak?
any comments or hints are appreciated.
Below is the log for reference
train.py --dataroot database/face2smile \
> --model cycle_gan \
> --log_dir logs/cycle_gan/face2smile/teacher_1080 \
> --netG inception_9blocks \
> --real_stat_A_path real_stat_1080/face2smile_A.npz \
> --real_stat_B_path real_stat_1080/face2smile_B.npz \
> --batch_size 1 \
> --num_threads 1 \
> --gpu_ids 0,1,2,3 \
> --norm_affine \
> --norm_affine_D \
> --channels_reduction_factor 6 \
> --kernel_sizes 1 3 5 \
> --save_latest_freq 10000 --save_epoch_freq 5 \
> --nepochs 1 --nepochs_decay 0 \
> --preprocess none
----------------- Options ---------------
active_fn: nn.ReLU
active_fn_D: nn.LeakyReLU
aspect_ratio: 1.0
batch_size: 4 [default: 1]
beta1: 0.5
channels: None
channels_reduction_factor: 6 [default: 1]
cityscapes_path: database/cityscapes-origin
crop_size: 256, 256
dataroot: database/face2smile [default: None]
dataset_mode: unaligned
direction: AtoB
display_winsize: 256
drn_path: drn-d-105_ms_cityscapes.pth
dropout_rate: 0
epoch_base: 1
eval_batch_size: 1
gan_mode: lsgan
gpu_ids: 0,1,2,3 [default: 0]
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
iter_base: 1
kernel_sizes: [1, 3, 5] [default: [3, 5, 7]]
lambda_A: 10.0
lambda_B: 10.0
lambda_identity: 0.5
load_in_memory: False
load_size: 286
log_dir: logs/cycle_gan/face2smile/teacher_1080 [default: logs]
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: -1
model: cycle_gan [default: pix2pix]
moving_average_decay: 0.0
moving_average_decay_adjust: False
moving_average_decay_base_batch: 32
n_layers_D: 3
ndf: 64
nepochs: 1 [default: 100]
nepochs_decay: 0 [default: 100]
netD: n_layers
netG: inception_9blocks
ngf: 64
no_flip: False
norm: instance
norm_affine: True [default: False]
norm_affine_D: True [default: False]
norm_epsilon: 1e-05
norm_momentum: 0.1
norm_student: instance
norm_track_running_stats: False
num_threads: 32 [default: 4]
output_nc: 3
padding_type: reflect
phase: train
pool_size: 50
preprocess: none [default: resize_and_crop]
print_freq: 100
real_stat_A_path: real_stat_1080/face2smile_A.npz [default: None]
real_stat_B_path: real_stat_1080/face2smile_B.npz [default: None]
restore_D_A_path: None
restore_D_B_path: None
restore_G_A_path: None
restore_G_B_path: None
restore_O_path: None
save_epoch_freq: 5 [default: 20]
save_latest_freq: 10000 [default: 20000]
seed: 233
serial_batches: False
table_path: datasets/table.txt
tensorboard_dir: None
----------------- End -------------------
train.py --dataroot database/face2smile --model cycle_gan --log_dir logs/cycle_gan/face2smile/teacher_1080 --netG inception_9blocks --real_stat_A_path real_stat_1080/face2smile_A.npz --real_stat_B_path real_stat_1080/face2smile_B.npz --batch_size 4 --num_threads 32 --gpu_ids 0,1,2,3 --norm_affine --norm_affine_D --channels_reduction_factor 6 --kernel_sizes 1 3 5 --save_latest_freq 10000 --save_epoch_freq 5 --nepochs 1 --nepochs_decay 0 --preprocess none
dataset [UnalignedDataset] was created
The number of training images = 5055
data shape is: channel=3, height=1024, width=1024.
initialize network with normal
initialize network with normal
initialize network with normal
initialize network with normal
dataset [SingleDataset] was created
dataset [SingleDataset] was created
/home/ubuntu/.local/lib/python3.9/site-packages/torchvision/models/inception.py:80: FutureWarning: The default weight initialization of inception_v3 will be changed in future releases of torchvision. If you wish to keep the old behavior (which leads to long initialization times due to scipy/scipy#11299), please set init_weights=True.
warnings.warn('The default weight initialization of inception_v3 will be changed in future releases of '
model [CycleGANModel] was created
---------- Networks initialized -------------
DataParallel(
(module): InceptionGenerator(
(down_sampling): Sequential(
(0): ReflectionPad2d((3, 3, 3, 3))
(1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
(2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(3): ReLU(inplace=True)
(4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(6): ReLU(inplace=True)
(7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(9): ReLU(inplace=True)
)
(features): Sequential(
(0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
)
(up_sampling): Sequential(
(0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(2): ReLU(inplace=True)
(3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(5): ReLU(inplace=True)
(6): ReflectionPad2d((3, 3, 3, 3))
(7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
(8): Tanh()
)
)
)
[Network G_A] Total number of parameters : 8.154 M
DataParallel(
(module): InceptionGenerator(
(down_sampling): Sequential(
(0): ReflectionPad2d((3, 3, 3, 3))
(1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
(2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(3): ReLU(inplace=True)
(4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(6): ReLU(inplace=True)
(7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(9): ReLU(inplace=True)
)
(features): Sequential(
(0): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(1): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(2): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(3): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(4): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(5): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(6): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(7): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
(8): InvertedResidualChannels(256, 256, res_channels=[42, 42, 42], dw_channels=[42, 42, 42], res_kernel_sizes=[1, 3, 5], dw_kernel_sizes=[1, 3, 5])
)
(up_sampling): Sequential(
(0): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(2): ReLU(inplace=True)
(3): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(5): ReLU(inplace=True)
(6): ReflectionPad2d((3, 3, 3, 3))
(7): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
(8): Tanh()
)
)
)
[Network G_B] Total number of parameters : 8.154 M
DataParallel(
(module): NLayerDiscriminator(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(4): LeakyReLU(negative_slope=0.2, inplace=True)
(5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(7): LeakyReLU(negative_slope=0.2, inplace=True)
(8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
(9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(10): LeakyReLU(negative_slope=0.2, inplace=True)
(11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
)
)
)
[Network D_A] Total number of parameters : 2.767 M
DataParallel(
(module): NLayerDiscriminator(
(model): Sequential(
(0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): LeakyReLU(negative_slope=0.2, inplace=True)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(3): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(4): LeakyReLU(negative_slope=0.2, inplace=True)
(5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(7): LeakyReLU(negative_slope=0.2, inplace=True)
(8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
(9): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(10): LeakyReLU(negative_slope=0.2, inplace=True)
(11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
)
)
)
[Network D_B] Total number of parameters : 2.767 M
-----------------------------------------------
start_epoch: 1
end_epoch: 1
total_iter: 1
current memory allocated: 265.4296875
max memory allocated: 265.4296875
cached memory: 276.0
will set input data
Traceback (most recent call last):
File "/data/CAT/train.py", line 14, in <module>
trainer.start()
File "/data/CAT/trainer.py", line 159, in start
model.optimize_parameters(total_iter)
File "/data/CAT/models/cycle_gan_model.py", line 295, in optimize_parameters
self.forward()
File "/data/CAT/models/cycle_gan_model.py", line 235, in forward
self.rec_A = self.netG_B(self.fake_B)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/CAT/models/modules/inception_architecture/inception_generator.py", line 141, in forward
res = self.up_sampling(res)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/modules/padding.py", line 173, in forward
return F.pad(input, self.padding, 'reflect')
File "/home/ubuntu/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 4014, in _pad
return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: CUDA out of memory. Tried to allocate 260.00 MiB (GPU 0; 15.78 GiB total capacity; 14.02 GiB already allocated; 198.19 MiB free; 14.13 GiB reserved in total by PyTorch)