I am doing regression on an image. I have a fully convolutional CNN (no fully connected layers) and the Adam optimizer. For some reason unknown to me, when I use batch size 1 my result is much better (in testing almost 10 times better, in training more than 10 times) than when I use larger batch sizes (64, 128, 150), which is contrary to what people have apparently found. My loss is MSE. I would like to know if someone has run into this or knows what's going on.
I also use exactly the same initialization for every run. Moreover, I checked the results at every epoch of training, and this holds regardless of the epoch.
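One thing worth checking when comparing losses across batch sizes: `F.mse_loss` defaults to `reduction='mean'`, i.e. it averages over every element in the batch, so the numbers reported at batch size 1 and at batch size 128 are on the same per-element scale and are directly comparable. A toy check (values here are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Toy predictions/targets: every prediction is off by 2, so MSE = 4
pred = torch.full((128, 1), 2.0)
target = torch.zeros(128, 1)

# Loss over the whole batch at once (reduction='mean' is the default)
full_batch = F.mse_loss(pred, target)

# Same data, but averaged over 128 "batch size 1" losses
per_sample = torch.stack(
    [F.mse_loss(pred[i], target[i]) for i in range(128)]
).mean()

print(full_batch.item(), per_sample.item())  # both 4.0
```

So the gap you see is not a units/averaging artifact; the reported losses really are comparable.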
Attached is my code.
This is my data loader:
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class DriveData(Dataset):
    def __init__(self, transform=None):
        # Load the full CSVs once and keep them as float32 tensors
        self.xs = pd.read_csv('data/train_input.csv')
        self.ys = pd.read_csv('data/train_output.csv')
        self.x_data = torch.from_numpy(np.asarray(self.xs, dtype=np.float32))
        self.y_data = torch.from_numpy(np.asarray(self.ys, dtype=np.float32))

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return len(self.xs)

dset_train = DriveData()
train_loader = DataLoader(dset_train, batch_size=1, shuffle=True, num_workers=4)
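A quick way to sanity-check that the loader batches as expected is to look at the shapes of the first batch. Since the CSVs aren't attached, this sketch uses a stand-in `TensorDataset` in place of `DriveData`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for DriveData: 10 samples of 3 features, 1 target each
xs = torch.randn(10, 3)
ys = torch.randn(10, 1)
loader = DataLoader(TensorDataset(xs, ys), batch_size=4, shuffle=True)

# The first dimension of each batch should equal batch_size
for data, target in loader:
    print(data.shape, target.shape)  # torch.Size([4, 3]) torch.Size([4, 1])
    break
```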
My training function:
import torch.nn.functional as F

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Variable is deprecated; just move the tensors to the device
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.mse_loss(output, target)
        loss.backward()
        optimizer.step()
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
optimizer = optim.Adam(model.parameters(), lr=.001)
for epoch in range(1):
    train(model, device, train_loader, optimizer, epoch)
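For the testing numbers, here is a minimal evaluation sketch (assumption: a `test_loader` built like `train_loader` but over the test CSVs, which aren't shown in the post). It accumulates the sum of squared errors and divides by the element count at the end, so the reported test loss does not depend on the evaluation batch size:

```python
import torch
import torch.nn.functional as F

def evaluate(model, device, test_loader):
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Sum (not mean) per batch, then normalize once at the end
            total += F.mse_loss(output, target, reduction='sum').item()
            count += target.numel()
    return total / count
```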
Batch size 1: training loss 0.000812, testing loss 0.002547
Batch size 128: training loss 0.0171, testing loss 0.0226
Thanks