Why does the same model give different results on CUDA and CPU?

You should be able to print the values of the tensors to stdout on both the CPU and the GPU. Can you print the inputs, weights, and outputs for each device to see where the difference first appears?
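For what it's worth, here is a minimal sketch of that kind of check, assuming the model is a PyTorch `nn.Module` (the thread does not say which framework is in use). The toy `nn.Sequential` model, the random input, and the helper names `capture_leaf_outputs` and `compare_devices` are all made up for illustration; swap in your own model and input. It falls back to the CPU when no GPU is available so that it still runs, and it assumes every layer returns a single tensor.

```python
import copy

import torch
import torch.nn as nn


def capture_leaf_outputs(model, x):
    """Run model(x) and record the output of every leaf module, moved to CPU."""
    outputs = {}
    hooks = []
    for name, module in model.named_modules():
        if next(module.children(), None) is None:  # leaf module (no submodules)
            # Assumption: each leaf module returns a single tensor.
            def hook(mod, inp, out, name=name):
                outputs[name] = out.detach().cpu()

            hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return outputs


def compare_devices(model, x):
    """Return (label, max abs difference) pairs for input, weights, and layer outputs."""
    # Assumption: use CUDA when present, otherwise compare CPU against CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # eval() disables dropout and freezes batch-norm statistics on both copies.
    cpu_model = copy.deepcopy(model).cpu().eval()
    dev_model = copy.deepcopy(model).to(device).eval()
    x_cpu, x_dev = x.cpu(), x.to(device)

    def max_diff(a, b):
        return (a.cpu() - b.cpu()).abs().max().item()

    report = [("input", max_diff(x_cpu, x_dev))]

    dev_params = dict(dev_model.named_parameters())
    for name, p in cpu_model.named_parameters():
        report.append(("weight " + name, max_diff(p.detach(), dev_params[name].detach())))

    cpu_out = capture_leaf_outputs(cpu_model, x_cpu)
    dev_out = capture_leaf_outputs(dev_model, x_dev)
    for name in cpu_out:
        report.append(("output " + name, max_diff(cpu_out[name], dev_out[name])))
    return report


# Hypothetical toy model and input, only here to make the sketch runnable.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 4)

report = compare_devices(model, x)
for name, diff in report:
    print(f"{name:16s} max abs diff = {diff:.3e}")
```

If the inputs and weights match exactly and the outputs differ only by an amount close to float32 rounding error, that is usually just the CPU and GPU kernels summing in a different order. A mismatch that already shows up in the input or weight rows suggests the two runs are not using the same data or the same checkpoint.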
