Hi,
I’m porting some code from Keras to PyTorch and trying to get a network to train on CIFAR-100. So far I keep running into errors telling me, in one form or another, that my output and target sizes don’t match or that the input batch size and target batch size don’t match. The only architecture that trains on CIFAR-100 is the one from the “Training a classifier” tutorial (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).
I can change the number of filters and the model still trains, but changing the kernel size or adding or removing layers breaks it: instead of training on CIFAR-100, it throws an error.
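For reference, each convolution and pooling layer shrinks the feature map according to the standard formula `out = (size + 2*padding - kernel) // stride + 1`. The `160 * 5 * 5` that `fc1` expects therefore holds only for one particular stack of layers. The sketch below traces a 32x32 CIFAR image through the conv/pool pairs. It assumes stride 1 and no padding for the convolutions, `MaxPool2d(2, 2)` after each one (as in the posted `Net`), and the helper names `out_size` and `flat_features` are hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Hypothetical helper: output height/width of a Conv2d or MaxPool2d
    layer with dilation 1 and the default floor rounding."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(channels, kernels, input_size=32):
    """Hypothetical helper: trace a square image through conv(k) followed by
    a 2x2 max-pool for each k, mirroring conv1/pool and conv2/pool in Net.
    Returns (number of inputs fc1 needs, final spatial size)."""
    size = input_size
    for k in kernels:
        size = out_size(size, k)            # convolution, stride 1, no padding
        size = out_size(size, 2, stride=2)  # MaxPool2d(2, 2)
    return channels * size * size, size


print(flat_features(160, [5, 5]))  # (4000, 5): matches 160 * 5 * 5
print(flat_features(160, [3, 3]))  # (5760, 6): fc1 would need 160 * 6 * 6
```

So switching to 3x3 kernels means `fc1` has to accept 5760 inputs instead of 4000, and the `view` call has to change to match.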
My code is below. Please let me know of any fixes you know of!
```python
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR100(root='./data', train=True,
                                         download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=50,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR100(root='./data', train=False,
                                        download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=50,
                                         shuffle=False, num_workers=2)

# A tuple rather than a generator, so it can be indexed and reused
classes = tuple(str(i) for i in range(100))


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 60, 5)     # 3x32x32 -> 60x28x28
        self.pool = nn.MaxPool2d(2, 2)       # halves height and width
        self.conv2 = nn.Conv2d(60, 160, 5)   # 60x14x14 -> 160x10x10
        #self.conv3 = nn.Conv2d(160, 160, 5)
        # 160 * 5 * 5 is only correct for the exact layer stack above
        self.fc1 = nn.Linear(160 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 100)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # -> 60x14x14
        x = self.pool(F.relu(self.conv2(x)))  # -> 160x5x5
        #x = F.relu(self.conv3(x))
        x = x.view(-1, 160 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs (plain tensors; Variable wrappers are not needed)
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        prediction = outputs.argmax(1)
        # .item() gives a Python number, so this is not integer division
        accuracy = prediction.eq(labels).sum().item() / labels.size(0) * 100
        if i % 1000 == 0:
            print('Train Step: {}\tLoss: {:.10f}\tAccuracy: {:.10f}'.format(
                i, loss.item(), accuracy))

print('Finished Training')
```
The code above runs successfully for me on both Ubuntu 16.04 and Google Colab. If I uncomment the `self.conv3` lines in both `__init__` and `forward`, it fails with a size-mismatch error, and the same happens if I change the kernel size in any of the convolution lines.
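One way to see what happens when `conv3` is enabled is to push a dummy batch through it. The sketch below assumes a batch of 50 and reuses the `conv3` definition from the commented-out line; the random input only stands in for the real `conv2`/`pool` output. A 5x5 kernel applied to a 5x5 feature map leaves a 1x1 map, so each sample has only 160 values. `view(-1, 160 * 5 * 5)` then packs the 50 x 160 = 8000 values into 2 rows of 4000, which gives 2 outputs for 50 labels, i.e. a batch-size mismatch in the loss.

```python
import torch

batch_size = 50
# Stand-in for the conv2/pool output of the posted Net: 160 channels, 5x5
x = torch.randn(batch_size, 160, 5, 5)

# The commented-out conv3 (kernel 5, no padding) shrinks 5x5 down to 1x1
conv3 = torch.nn.Conv2d(160, 160, 5)
y = conv3(x)
print(y.shape)     # torch.Size([50, 160, 1, 1])

# The hard-coded view regroups values across samples to fill rows of 4000
bad = y.view(-1, 160 * 5 * 5)
print(bad.shape)   # torch.Size([2, 4000]): 2 "samples" against 50 labels

# Flattening from dim 1 keeps one row per sample
good = torch.flatten(y, 1)
print(good.shape)  # torch.Size([50, 160])
```

Flattening per sample keeps the batch dimension intact, but `fc1` would then also need `nn.Linear(160, 120)` instead of `nn.Linear(160 * 5 * 5, 120)`.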
Thanks in advance for your help!
Edit: I can also add or remove Linear layers without any problems.
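That fits the shape arithmetic: once the tensor is flattened, each Linear layer only has to match the previous layer’s `out_features`, so the only fragile link is between the last convolution and `fc1`. A minimal sketch of one way around it, assuming 3x32x32 CIFAR inputs, infers `fc1`’s input size from a dummy forward pass in `__init__`. The `kernel` and `num_classes` arguments and the `_features` method are hypothetical additions, not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, kernel=5, num_classes=100):
        super().__init__()
        # kernel is a hypothetical parameter shared by all three convolutions
        self.conv1 = nn.Conv2d(3, 60, kernel)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(60, 160, kernel)
        self.conv3 = nn.Conv2d(160, 160, kernel)
        # Infer fc1's input size from one dummy CIFAR-sized image
        # instead of hard-coding 160 * 5 * 5.
        with torch.no_grad():
            n_flat = self._features(torch.zeros(1, 3, 32, 32)).shape[1]
        self.fc1 = nn.Linear(n_flat, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def _features(self, x):
        # Hypothetical helper: the convolutional part of the network
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        return torch.flatten(x, 1)  # one row per sample

    def forward(self, x):
        x = self._features(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net(kernel=3)
out = net(torch.randn(50, 3, 32, 32))
print(out.shape)  # torch.Size([50, 100])
```

The kernel size still has to leave a feature map of at least 1x1 after the last layer (with this stack, 3 and 5 work; 7 does not). On PyTorch 1.8 or later, `nn.LazyLinear(120)` is another option, since it infers `in_features` on the first forward pass.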