net.eval()
net.cuda()
X = Variable(torch.from_numpy(x).float()).cuda()
Y = Variable(torch.from_numpy(y).long()).cuda()

If I remove .cuda() I get an error: input is not contiguous.

I also have tried this, and still get the same error:

# net_new has the same architecture as net
net_new.load_state_dict(net.state_dict())
net_new.eval()
X = Variable(torch.from_numpy(x).float())
Y = Variable(torch.from_numpy(y).long())

Your problem is not related to testing on the CPU, but to the fact that your input is not contiguous. net.cpu() should be sufficient to move the network to the CPU. There are two possibilities:

Your X or Y is not contiguous, yet the first operation of your net expects them to be. .cuda() creates a contiguous CUDA tensor and copies the data from the CPU, so this was masked in training. Try using

X = Variable(torch.from_numpy(x).float().contiguous())
Y = Variable(torch.from_numpy(y).long().contiguous())

Some CPU kernels require a tensor to be contiguous while the corresponding GPU kernel doesn't. If this is the case, then it is a bug and you should report it.
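To tell which case you are in, you can check .is_contiguous() on the tensors right before they enter the net. A small sketch (the transposed NumPy array here is just a hypothetical way to produce a non-contiguous input):

```python
import numpy as np
import torch

# A transposed NumPy array is not C-contiguous, and
# torch.from_numpy preserves that memory layout.
x = np.arange(12, dtype=np.float32).reshape(3, 4).T
t = torch.from_numpy(x)
print(t.is_contiguous())  # False for this transposed input

# .contiguous() copies the data into a contiguous layout.
tc = t.contiguous()
print(tc.is_contiguous())  # True
```

If .is_contiguous() already returns True for your actual X and Y and the error persists, that points to the second possibility (a kernel bug worth reporting).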