Hello, I am working on transfer learning, trying to reduce the number of samples to use in the target training set.
To do so, I am reimplementing the following algorithm, taken from the paper “Instance Based Deep Transfer Learning”, 2018, Wang et Al.:
Basically, for each training sample of the target, you compute the jacobian (gradient) of the loss w.r.t. all the parameters and multiply it by the Hessian of the loss function (averaged for all the samples of the training) and by all the gradients of the loss when the net is fed with a validation sample, accumulating them. If the final result is positive, you discard that sample.
My implementation is as follows:
def instance_selection(model, X_train, y_train, X_valid, y_valid, criterion):
# Computation of Hessian
print('Computing Hessian matrix...', end='')
hessian_matrix = compute_hessian(model, X_train, y_train, criterion)
print('done!')
selected_indices = []
for i, train_sample in enumerate(tqdm(X_train, desc='Instance selection')):
# Computing jacobian of the loss in the training sample
model.zero_grad()
train_sample = np.expand_dims(train_sample, axis=0)
train_sample = torch.Tensor(train_sample)
label = y_train[i]
output = model(train_sample)
label = torch.LongTensor([label])
loss = criterion(output, label)
# This step will populate the parameters of the network with the gradients
loss.backward()
# Iteration on the parameters of the network to get all the gradients
jacobian_i = []
for param in model.parameters():
jacobian_i.append(param.grad.view(-1).cpu().data.numpy())
jacobian_i = np.concatenate(jacobian_i).ravel()
# Computing intermediate product between Hessian and Jacobian of training
sample
intermediate = np.matmul(hessian_matrix, np.transpose(jacobian_i))
j_loss = 0
for j, valid_sample in enumerate(X_valid):
# Computing jacobian of the loss in the validation sample
model.zero_grad()
valid_sample = np.expand_dims(valid_sample, axis=0)
valid_sample = torch.Tensor(valid_sample)
label = y_valid[j]
output = model(valid_sample)
label = torch.LongTensor([label])
loss = criterion(output, label)
loss.backward()
jacobian_j = []
for param in model.parameters():
jacobian_j.append(param.grad.view(-1).cpu().data.numpy())
jacobian_j = np.concatenate(jacobian_j).ravel()
# Computing the multiplication between the jacobian of the validation sample
and the intermediate matrix
j_loss += np.matmul((jacobian_j*(-1)), intermediate)
# If j_loss is negative, I won't keep the sample in the training set
if j_loss <= 0:
selected_indices.append(i)
return selected_indices
Is there anyone who has already done a similar thing or can give me a feedback whether the implementation is correct?
Thanks a lot!