PyTorch to skorch: zero_grad()

I’m looking to migrate an embedding model from pure PyTorch to skorch so I can grid search for the best hyperparameters. In PyTorch, I set the embedding size, learning rate, number of epochs, and batch size, and I set the skorch model up to do the same. Both models run and produce embeddings, but the embeddings they produce are noticeably different.
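
For context, the skorch side is set up roughly like this (a simplified sketch; EmbeddingNet, the loss, and the grid values are placeholders rather than my actual model):

import torch
import torch.nn as nn
from skorch import NeuralNet
from sklearn.model_selection import GridSearchCV

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Placeholder module standing in for my real embedding model
class EmbeddingNet(nn.Module):
    def __init__(self, num_items=1000, embedding_dim=32):
        super().__init__()
        self.embedding = nn.Embedding(num_items, embedding_dim)
        self.out = nn.Linear(embedding_dim, 1)

    def forward(self, x):
        # x: LongTensor of ids, shape (batch, seq_len)
        return self.out(self.embedding(x).mean(dim=1))

net = NeuralNet(
    EmbeddingNet,
    criterion=nn.MSELoss,
    optimizer=torch.optim.Adam,
    lr=0.01,
    max_epochs=10,
    batch_size=64,
    device=device,
)

# module__* prefixes route parameters to EmbeddingNet.__init__ during the search
params = {
    'module__embedding_dim': [16, 32, 64],
    'lr': [0.001, 0.01],
    'max_epochs': [10, 20],
    'batch_size': [32, 64],
}
gs = GridSearchCV(net, params, cv=3, scoring='neg_mean_squared_error')
# gs.fit(X, y)  # X: LongTensor of ids, y: targets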

Going back through the loop versus the skorch model, the only difference I can come up with is that the pure PyTorch version explicitly resets the gradients to zero at the start of each batch. I don’t think skorch is doing this, and I’m wondering if there is a way to implement it.

def fit(iterator, model, optimizer, criterion):
    train_loss = 0.0
    for x, y in iterator:
        optimizer.zero_grad()                     # reset gradients left over from the previous batch
        y_hat = model(x.to(device))
        loss = criterion(y_hat, y.to(device))
        train_loss += loss.item() * x.shape[0]    # accumulate summed loss over the batch
        loss.backward()
        optimizer.step()
    return train_loss / len(iterator.dataset)     # mean loss per sample
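
If skorch really weren’t resetting the gradients, my understanding from the docs is that I could subclass NeuralNet and override train_step to do it explicitly, something like this (an untested sketch; it drops the closure that skorch normally passes to optimizer.step(), so it only suits plain optimizers like SGD or Adam):

from skorch import NeuralNet

class ZeroGradNet(NeuralNet):                      # name is just for illustration
    def train_step(self, batch, **fit_params):
        Xi, yi = batch
        self.module_.train()                       # make sure dropout/batchnorm are in train mode
        self.optimizer_.zero_grad()                # explicit reset, same as the pure PyTorch loop
        y_pred = self.infer(Xi, **fit_params)
        loss = self.get_loss(y_pred, yi, X=Xi, training=True)
        loss.backward()
        self.optimizer_.step()
        return {'loss': loss, 'y_pred': y_pred}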


It seems skorch does zero_grad() as well: looking at its fit loop, every batch goes through train_step, and that is where the gradients get reset.
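
As far as I can tell, the per-batch logic boils down to something like this (my own paraphrase of train_step, not the actual skorch source):

# Rough paraphrase of skorch's per-batch training step (not the real source)
def train_step(net, batch):
    Xi, yi = batch
    def step_fn():
        net.optimizer_.zero_grad()                 # gradients are reset here on every batch
        y_pred = net.infer(Xi)
        loss = net.get_loss(y_pred, yi, X=Xi, training=True)
        loss.backward()
        return loss
    net.optimizer_.step(step_fn)                   # closure form, so closure-based optimizers like LBFGS also work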