Validation not working as intended

I'm trying to get validation working inside my training loop. Currently I have:

def train(epoch):
    model.train()
    correct = 0
    train_loss = 0
    vloss = 0
    vcorrect = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)

        optimizer.zero_grad()
        output = model(data, target)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        train_loss += F.nll_loss(output, target, size_average=False).item() # sum up batch loss
        pred = output.data.max(1, keepdim=True)[1] # get the index of the max log-probability
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

        # validation would most likely go here...

        if batch_idx % args.log_interval == 0:
            print('Time', time.time()-start, 'Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            # extra print for validation
            print('Time', time.time()-start, 'Valid Epoch: {} [{}/{} ({:.0f}%)]\tValid Loss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(validate_loader.dataset),
                100. * batch_idx / len(validate_loader), loss.item()))

I want to add validation; I think it should look like this, evaluating every 10 steps:

        if batch_idx % 10 == 0:
            model.eval()
            for data, target in validate_loader:
                if args.cuda:
                    data, target = data.cuda(), target.cuda()
                data, target = Variable(data), Variable(target)
                voutput = model(data)
                vloss += F.nll_loss(voutput, target, size_average=False).item()
                vpred = voutput.data.max(1, keepdim=True)[1]
                vcorrect += vpred.eq(target.data.view_as(vpred)).cpu().sum()

I have run into several problems. The first is a variable issue: data and target are used for both training and validation, but PyTorch errors out when I try to rename one of the data/target variables. Is this just implemented wrong in PyTorch?

What kind of errors do you get?

Time 0.12518644332885742 Train Epoch: 1 [0/41937 (0%)]	Loss: 2.533277
Time 0.12536191940307617 Valid Epoch: 1 [0/4660 (0%)]	Valid Loss: 2.533277

Time 0.2388451099395752 Train Epoch: 1 [600/41937 (2%)]	Loss: 2.197046
Time 0.2390141487121582 Valid Epoch: 1 [600/4660 (21%)]	Valid Loss: 2.197046

Time 0.3443450927734375 Train Epoch: 1 [1200/41937 (3%)]	Loss: 2.194630
Time 0.34452247619628906 Valid Epoch: 1 [1200/4660 (43%)]	Valid Loss: 2.194630

Time 0.45102596282958984 Train Epoch: 1 [1800/41937 (5%)]	Loss: 2.192434
Time 0.45121240615844727 Valid Epoch: 1 [1800/4660 (64%)]	Valid Loss: 2.192434

Time 0.5626089572906494 Train Epoch: 1 [2400/41937 (6%)]	Loss: 2.190561
Time 0.5628125667572021 Valid Epoch: 1 [2400/4660 (85%)]	Valid Loss: 2.190561

Time 0.6735479831695557 Train Epoch: 1 [3000/41937 (8%)]	Loss: 2.186656
Time 0.6737353801727295 Valid Epoch: 1 [3000/4660 (106%)]	Valid Loss: 2.186656

The loss and the validation loss are the same value.

If I change some variables around:

        if batch_idx % 10 == 0:
            model.eval()
            for data, target in validate_loader:
                if args.cuda:
                    vdata, vtarget = data.cuda(), target.cuda()
                vdata, vtarget = Variable(data), Variable(target)
                voutput = model(data)
                vloss += F.nll_loss(voutput, target, size_average=False).item()
                vpred = voutput.data.max(1, keepdim=True)[1]
                vcorrect += vpred.eq(target.data.view_as(vpred)).cpu().sum()
            print('\nTime', time.time()-start, 'Valid Epoch: {} [{}/{} ({:.0f}%)]\tValid Loss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(validate_loader.dataset),
                100. * batch_idx / len(validate_loader), vloss.data.item()))

When I run this, I get:

  File "custom.py", line 215, in train
    voutput = model(data)
  File "/home//anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "custom.py", line 167, in forward
    x = self.hidden1_bn(self.hidden1(x))
  File "/home//anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home//anaconda3/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home//anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 994, in linear
    output = input.matmul(weight.t())
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'mat2'

In the first case it makes sense that the values are equal, since both prints use the same loss variable: you never compute a separate validation loss, so the "Valid" line just repeats the training loss.
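For example, a minimal sketch of that fix, keeping the counters from your snippet: accumulate vloss inside the validation loop from the summed batch losses, then print its average instead of loss:

    # Sketch: report the accumulated validation loss, not the training `loss`
    avg_vloss = vloss / len(validate_loader.dataset)  # vloss was summed via size_average=False
    print('Time', time.time()-start, 'Valid Epoch: {}\tValid Loss: {:.6f}'.format(epoch, avg_vloss))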

The second approach looks better. The error is thrown because you are passing data to the model instead of vdata during validation: data was never moved to the GPU, so a CPU FloatTensor gets multiplied with the model's CUDA weights, which is exactly what the RuntimeError reports. Note also that the next line overwrites vdata and vtarget with Variable(data) and Variable(target), so even vdata would end up back on the CPU.
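Putting it together, here is a minimal sketch of the validation block, assuming PyTorch 0.4+ (the paths in your traceback suggest it): torch.no_grad() replaces Variable so no gradients are tracked during evaluation, the renamed tensors are the ones actually moved to the GPU and fed to the model, and model.train() restores training mode afterwards:

        if batch_idx % 10 == 0:
            model.eval()
            vloss, vcorrect = 0, 0
            with torch.no_grad():  # no gradient tracking during evaluation
                for vdata, vtarget in validate_loader:
                    if args.cuda:
                        vdata, vtarget = vdata.cuda(), vtarget.cuda()  # move *both* to the GPU
                    voutput = model(vdata)  # forward the renamed tensors, not `data`
                    vloss += F.nll_loss(voutput, vtarget, size_average=False).item()  # sum batch losses
                    vpred = voutput.max(1, keepdim=True)[1]  # index of the max log-probability
                    vcorrect += vpred.eq(vtarget.view_as(vpred)).cpu().sum().item()
            vloss /= len(validate_loader.dataset)
            print('\nTime', time.time()-start, 'Valid Epoch: {}\tValid Loss: {:.6f}\tValid Acc: {}/{}'.format(
                epoch, vloss, vcorrect, len(validate_loader.dataset)))
            model.train()  # switch back to training mode for the next batches

Resetting vloss and vcorrect at the top keeps each report scoped to a single pass over the validation set, instead of a running total across the whole epoch.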