Thank you. I have now updated to CUDA 11, and most of the repo works. One piece still fails, though. The full error message is the following:
  File "train_continue.py", line 201, in <module>
    fire.Fire()
  File "/home/…/lib/python3.6/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/…/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/home/…/lib/python3.6/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "train_continue.py", line 120, in train
    loss = criterion(score_connect, target)
  File "/home/…/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/…/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1152, in forward
    label_smoothing=self.label_smoothing)
  File "/home/…/lib/python3.6/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Bool'
And the code around line 120 of the script is:
for i, (input, target, existing) in tqdm(enumerate(train_dataloader)):
    input = input.cuda()
    target = target.cuda()
    optimizer.zero_grad()
    score_model = model(input)
    existing = existing.cuda()
    score_model = t.cat([score_model, existing], 1)
    score_connect = connect(score_model)
    loss = criterion(score_connect, target)
    loss.backward()
    optimizer.step()
Do you know why this might happen?
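My own guess, going by the 'Bool' in the last line of the traceback, is that target reaches the loss as a torch.bool tensor, whereas CrossEntropyLoss with class-index targets expects torch.long. I have not confirmed this in the real script. The sketch below uses made-up tensors in place of score_connect and target (the shapes, values, and the bool labels are all assumptions) and runs on CPU; it only shows the dtype check and cast I plan to try:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for score_connect and target from the loop above:
# 4 samples, 3 classes. Shapes and values are made up for illustration.
score_connect = torch.randn(4, 3, requires_grad=True)
target = torch.tensor([True, False, True, True])  # assumed: labels arrive as bool

criterion = nn.CrossEntropyLoss()

# Inspect the label dtype before it reaches the loss.
print(target.dtype)

# With class-index targets, CrossEntropyLoss expects int64 (long) labels,
# so cast anything else before computing the loss.
if target.dtype != torch.long:
    target = target.long()

loss = criterion(score_connect, target)
loss.backward()
```

If that turns out to be the cause, the equivalent change in my loop would be target = target.cuda().long(), although it is probably cleaner to make the dataset return long labels in the first place.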