Update 3: I still get the error
Also, batch_size 6 sometimes triggers it (it trains for a few epochs, I think, and eventually fails — or maybe it really is just intermittent).
This is so weird.
To circle back to my initial suspicion:
TensorFlow has a similar error message, and there it would mean that the graph is disconnected.
Do you think that could be the case here?
How does batch size fit into that?
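One way to get a more informative traceback than the one below (this is a suggestion, not something I've confirmed fixes it): PyTorch's anomaly detection makes autograd record the forward-pass operation that produced each gradient, so if something in the graph is actually broken, the error points at the offending op instead of at `opt.step()`.

```python
import torch

# With anomaly detection on, any NaN produced in backward() (and the
# forward op responsible for a failing gradient) is reported explicitly.
# It slows training down, so it's for debugging only.
torch.autograd.set_detect_anomaly(True)

# ...then run the training loop as usual and read the extended traceback.
```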
This is where it happens:
<ipython-input-21-5f2cf50439c6> in train(save_model)
34 gen_loss = get_gen_loss(gen, disc, mask, image, adv_criterion, recon_criterion, 1000)
35 gen_loss.backward()
---> 36 gen_opt.step()
37
38 # Keep track of the average discriminator loss
/usr/local/lib/python3.7/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
86 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
87 with torch.autograd.profiler.record_function(profile_name):
---> 88 return func(*args, **kwargs)
89 return wrapper
90
/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
26 def decorate_context(*args, **kwargs):
27 with self.__class__():
---> 28 return func(*args, **kwargs)
29 return cast(F, decorate_context)
30
/usr/local/lib/python3.7/dist-packages/torch/optim/adam.py in step(self, closure)
116 lr=group['lr'],
117 weight_decay=group['weight_decay'],
--> 118 eps=group['eps'])
119 return loss
/usr/local/lib/python3.7/dist-packages/torch/optim/_functional.py in adam(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, amsgrad, beta1, beta2, lr, weight_decay, eps)
85 # Decay the first and second moment running average coefficient
86 exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
---> 87 exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
88 if amsgrad:
89 # Maintains the maximum of all 2nd moment running avg. till no
Thanks a lot, guys — it works with batch_size 4, but not knowing what's causing this drives me nuts.
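One possible explanation for the batch-size dependence worth ruling out (an assumption on my part — the actual error message isn't shown above): if the dataset length isn't divisible by the batch size, the *last* batch is smaller, which can break layers or loss code that hard-code the batch dimension (e.g. a `.view(batch_size, -1)`). Different batch sizes then produce different remainder batches, so some "work" and some don't. The sketch below (with a made-up dataset length) just illustrates the effect; `drop_last=True` on the `DataLoader` is the usual way to test this hypothesis.

```python
def batch_sizes(n_samples, batch_size):
    """Sizes of the batches a DataLoader would yield with drop_last=False."""
    full, rem = divmod(n_samples, batch_size)
    return [batch_size] * full + ([rem] if rem else [])

# Hypothetical dataset of 22 samples: batch_size 4 and 6 leave
# different-sized final batches, which could explain the flakiness.
print(batch_sizes(22, 4))  # [4, 4, 4, 4, 4, 2]
print(batch_sizes(22, 6))  # [6, 6, 6, 4]
```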