Suppose I have an m-by-n matrix X of training examples stored as a NumPy array in memory. Should I convert it directly with X = Variable(torch.FloatTensor(X)), or convert it with X = torch.FloatTensor(X) and then wrap it in a Variable as needed? What are the differences between these two approaches, and what are their pros and cons?
There's no difference between the two approaches; you can do either.
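A minimal sketch of the two approaches side by side, assuming a small random array as hypothetical training data. It uses torch.from_numpy(...).float(), which, like torch.FloatTensor(X), produces a float32 copy of float64 NumPy data. Note that since PyTorch 0.4 the Variable API is deprecated: Variable(t) still works but simply returns a Tensor.

```python
import numpy as np
import torch
from torch.autograd import Variable  # deprecated since PyTorch 0.4; Variable(t) returns a Tensor

m, n = 4, 3
X_np = np.random.rand(m, n)  # hypothetical m-by-n training data (float64)

# Approach 1: wrap right away.
X_a = Variable(torch.from_numpy(X_np).float())

# Approach 2: keep a plain tensor and wrap only where needed.
X_t = torch.from_numpy(X_np).float()
X_b = Variable(X_t)

# Both hold the same float32 values, and neither requires gradients by default.
print(torch.equal(X_a, X_b), X_a.dtype, X_a.requires_grad)
```

Because data converted this way does not require gradients, wrapping early or late makes no difference to what autograd records.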
I guess my question is whether you need to be careful about when you wrap a Tensor in a Variable. For example, you may still want to manipulate the data in it, and once it is in a Variable, autograd may track those operations and do something unexpected during backward. Do I have to be more careful handling Variables than Tensors?
In the following example, memory leaked with each call to train, but the leak stopped when I wrapped the tensors in Variables only when calling model and margin_loss, rather than once at the beginning. Is this a bug, or am I overlooking something?
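```python
# Python 2 (itertools.izip, xrange), pre-0.4 PyTorch, as in the original post.
import itertools

import torch
import torch.nn as nn
from torch.autograd import Variable


def train(model, sents, tags, optimizer, kpred):
    margin_loss = nn.MultiMarginLoss(margin=1)
    model.train()
    for (s, t) in itertools.izip(sents, tags):
        len_t = t.size(0)
        t_pred = torch.LongTensor(len_t)
        t_pred[:kpred] = t[:kpred]
        # Wrapped once, before the inner loop.
        s = Variable(s)
        t = Variable(t)
        t_pred = Variable(t_pred)
        for idx in xrange(kpred, len_t):
            optimizer.zero_grad()
            scores = model(s, t_pred, idx)
            loss = margin_loss(scores, t[idx:idx + 1])
            loss.backward()
            optimizer.step()
            _, ti_pred = torch.max(scores, 1)
            t_pred[idx] = ti_pred[0, 0]
            # Overwrites the prediction just stored with the gold tag.
            t_pred[idx] = t[idx]
```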
A Variable holds a reference to the graph that produced it.
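To illustrate what holding a reference to the graph means, here is a sketch written against current PyTorch (where Variable and Tensor are merged). It carries a tensor across loop iterations and counts the autograd nodes reachable from its grad_fn. The helpers count_graph_nodes and run_loop are hypothetical, for illustration only.

```python
import torch


def count_graph_nodes(t):
    """Count the autograd nodes reachable from t.grad_fn (0 for a tensor with no history)."""
    seen = set()
    stack = [t.grad_fn] if t.grad_fn is not None else []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # next_functions holds (node, input_index) pairs; node is None for inputs without history.
        stack.extend(nxt for nxt, _ in node.next_functions if nxt is not None)
    return len(seen)


def run_loop(steps, detach_each_step=False):
    w = torch.ones(3, requires_grad=True)  # stands in for a model parameter
    state = torch.zeros(3)                 # carried across loop iterations
    for i in range(steps):
        state = state + w * i              # new nodes that reference all earlier ones
        if detach_each_step:
            state = state.detach()         # drop the reference to the graph built so far
    return state


print(count_graph_nodes(run_loop(5)), count_graph_nodes(run_loop(10)))
print(count_graph_nodes(run_loop(10, detach_each_step=True)))
```

The exact counts depend on the PyTorch version, but the graph grows with the number of iterations for as long as the carried tensor keeps its history, and it disappears once that tensor is detached.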
I presume you were keeping Variables alive across iterations of the inner for-loop; in that case, the graph is held across those iterations as well. That's probably why you thought memory was leaking: the Variable was actually remembering everything that was done to it across all of `for idx in xrange(kpred, len_t)`.
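A sketch of the workaround from the question in current PyTorch (0.4+), where there is nothing left to wrap. The point it shows is that t_pred stays a plain LongTensor and only ever receives Python ints, so no graph survives from one inner iteration to the next. ToyTagger is a hypothetical stand-in for the question's model (same call signature, model(s, t_pred, idx)); the gold-tag overwrite from the original loop is omitted.

```python
import torch
import torch.nn as nn


class ToyTagger(nn.Module):
    """Hypothetical stand-in for the question's model: scores the tag at position idx."""

    def __init__(self, vocab_size, num_tags, dim=8):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.tag_emb = nn.Embedding(num_tags, dim)
        self.out = nn.Linear(2 * dim, num_tags)

    def forward(self, s, t_pred, idx):
        # Condition on the word at idx and the previously predicted tag (requires idx >= 1).
        word = self.word_emb(s[idx:idx + 1])              # (1, dim)
        prev = self.tag_emb(t_pred[idx - 1:idx])          # (1, dim)
        return self.out(torch.cat([word, prev], dim=1))   # (1, num_tags)


def train(model, sents, tags, optimizer, kpred):
    margin_loss = nn.MultiMarginLoss(margin=1)
    model.train()
    last_loss = None
    for s, t in zip(sents, tags):
        len_t = t.size(0)
        # A plain LongTensor that never acquires autograd history.
        t_pred = torch.zeros(len_t, dtype=torch.long)
        t_pred[:kpred] = t[:kpred]
        for idx in range(kpred, len_t):
            optimizer.zero_grad()
            scores = model(s, t_pred, idx)
            loss = margin_loss(scores, t[idx:idx + 1])
            loss.backward()  # this step's graph is freed here
            optimizer.step()
            # .item() yields a Python int, so t_pred keeps no link to this step's graph.
            t_pred[idx] = scores.argmax(dim=1).item()
            last_loss = loss.item()
    return last_loss
```

Usage, with random toy data: `train(ToyTagger(10, 4), sents, tags, torch.optim.SGD(model.parameters(), lr=0.1), kpred=2)`, where each sentence and tag sequence is a 1-D LongTensor of the same length.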
@smth As a side (but related) note: if we select a random batch when doing SGD, do we also need to wrap things in Variables, or can we keep them as torch tensors? E.g.:
```python
import numpy as np
import torch
from torch.autograd import Variable


def get_batch2(X, Y, M, dtype):
    # X and Y are Variables; pull their data out as NumPy arrays.
    X, Y = X.data.numpy(), Y.data.numpy()
    N = len(Y)
    # Sample M distinct row indices (without replacement).
    batch_indices = np.random.choice(np.arange(N), size=M, replace=False)
    batch_xs = torch.FloatTensor(X[batch_indices, :]).type(dtype)
    batch_ys = torch.FloatTensor(Y[batch_indices]).type(dtype)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)
```
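In PyTorch 0.4 and later, where Variable and Tensor are merged, the same sampling can stay in torch without the NumPy round trip or any wrapping: indexing a tensor that does not require gradients returns a tensor that does not either. A minimal sketch, using a hypothetical helper name get_batch and torch.randperm in place of np.random.choice(..., replace=False):

```python
import torch


def get_batch(X, Y, M, dtype=torch.float32):
    """Sample M distinct rows of X, with the matching entries of Y, uniformly at random."""
    # First M entries of a random permutation = M distinct indices (no replacement).
    idx = torch.randperm(X.size(0), device=X.device)[:M]
    return X[idx].to(dtype), Y[idx].to(dtype)


X = torch.arange(20.0).reshape(10, 2)  # row i is [2i, 2i + 1]
Y = torch.arange(10.0)                 # Y[i] = i
xb, yb = get_batch(X, Y, 4)
print(xb.shape, yb.shape, xb.requires_grad)
```

Note that Variable already defaults to requires_grad=False, so the explicit flag in get_batch2 was not needed even on older versions.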