Hello,
I have a small net running in C++ with LibTorch. The input and target tensors are defined as follows:
auto input = torch::randn({ TRAIN_BATCH_SIZE, 1, IMAGE_SIZE, IMAGE_SIZE }).to(device); // [N, C, H, W] float images
auto target = torch::zeros(TRAIN_BATCH_SIZE, torch::kInt64).to(device);                // class indices for NLLLoss
The data is filled with memcpy on the CPU, or with cudaMemcpy when running on CUDA (a simplified sketch of the fill step follows the loop below). On the CPU the net converges; with CUDA I get an exception in loss.backward():
// fill input/target via memcpy or cudaMemcpy
LoadEpoch(input, target, cvImage, cvResult, imgcount);
optimizer.zero_grad();                 // zero the gradient buffers
auto output = net->forward(input);     // forward pass
auto loss = criterion(output, target); // NLLLoss
loss.backward();                       // <-- throws here when using CUDA
optimizer.step();
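
For reference, this is roughly what LoadEpoch does; the container types of cvImage and cvResult are simplified placeholders here, but the copy logic is the same:

#include <cstring>          // std::memcpy
#include <cuda_runtime.h>   // cudaMemcpy
#include <opencv2/core.hpp> // cv::Mat
#include <torch/torch.h>

// Sketch: cvImage holds one contiguous CV_32F image per sample,
// cvResult the int64 class index per sample.
void LoadEpoch(torch::Tensor& input, torch::Tensor& target,
               const std::vector<cv::Mat>& cvImage,
               const std::vector<int64_t>& cvResult, int imgcount)
{
    const size_t bytes = IMAGE_SIZE * IMAGE_SIZE * sizeof(float);
    for (int i = 0; i < imgcount; ++i) {
        float* dst = input[i][0].data_ptr<float>(); // contiguous [IMAGE_SIZE, IMAGE_SIZE] slice
        if (input.is_cuda())
            cudaMemcpy(dst, cvImage[i].ptr<float>(0), bytes, cudaMemcpyHostToDevice);
        else
            std::memcpy(dst, cvImage[i].ptr<float>(0), bytes);
        target[i] = cvResult[i]; // element write works on CPU and CUDA tensors
    }
}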
The layout of all tensors is identical whether CUDA is used or not. forward() always works, and so does NLLLoss; only backward() throws the exception:

c10::Error address 0x0000005A15D6C770.
The net is moved to the device and the optimizer is built from net->parameters(); the setup boils down to the sketch below.
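
For completeness, the setup is essentially this (Net, the SGD choice, and the learning rate stand in for my actual values):

torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
auto net = std::make_shared<Net>();
net->to(device);              // move all parameters and buffers to the GPU
torch::nn::NLLLoss criterion; // expects log-probabilities and kInt64 class targets
torch::optim::SGD optimizer(net->parameters(), /*lr=*/0.01);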
Many thanks for your help.