What exactly is the behavior of register forward hook with multiple GPUs?
I want to save the outputs of each layer in my model. For now I have this code:
The problem is that, with multiple GPUs, this does not work; each GPU will receive a fraction of the input, so we need to aggregate the results coming from different GPUs.
This can be done easily, for example by making the outputs_layer a dict and concatenating the outputs with the same key. To make this work, though, we would need to be assured that the order in which the GPUs return the values is always the same, and the same order as the inputs.
So, in general, how can we use forward hooks with multiple GPUs?
I don’t think you can rely on the hooks running deterministically. However, the list of device ids you passed into DataParallel (or the default [0, 1, 2, 3] assuming 4 GPUs) specifies the order in which your data gets split across devices.
For example, for device ids [0, 1, 2, 3], the first quarter of data will get sent to device 0, etc, etc.You can use this information to reconstruct the order of your outputs: each output captured by a hook will be on a specific GPU.
Hi, I try to register hook and run it on multiple GPUs. However, it only return the result on GPU 0.
The data has been successfully split on multiple GPUs.
Anyone know why is it?
The code:
def forward(self, x):
self.activations = []
self.gradients = []
self.grad_index = 0
self.activation_to_layer = {}
activation_index = 0
for layer, module in self.model.named_modules():
if ('conv' in layer) or ('pool' in layer) or ('fc' in layer):
if 'fc6' in layer:
if isinstance(self.model, nn.DataParallel):
x = x.view(-1, self.model.module.fc6.in_features)
else:
x = x.view(-1, self.model.fc6.in_features)
x = module(x)
if isinstance(module, torch.nn.modules.conv.Conv3d):
# hook will registered on the output of the layer
x.register_hook(self.compute_rank)
self.activations.append(x)
self.activation_to_layer[activation_index] = layer
activation_index += 1
x = model.relu(x)
elif isinstance(module, torch.nn.modules.Linear) and layer != 'fc8':
x = model.dropout(model.relu(x))
return x
def compute_rank(self, grad):
"""
Compute the Taylor expansion without abs of each channel
return:
self.activations: feature map before relu in each layer
self.filter_ranks: Taylor value without abs over spatial and batch
"""
activation_index = len(self.activations) - self.grad_index - 1
activation = self.activations[activation_index]
if self.pruning_level == 'channel':
print(self.model)
print('device', grad.device)
values = torch.sum((activation * grad), dim = 4).\
sum(dim=3).sum(dim=2).sum(dim=0)
# Normalize the rank by the filter dimensions
values = values / (activation.size(0) * activation.size(2) \
* activation.size(3) * activation.size(4))
if activation_index not in self.filter_ranks:
self.filter_ranks[activation_index] = \
torch.FloatTensor(activation.size(1)).zero_().cuda()
I meet a similar problem. It seems that data parallel with forward hook cannot guarantee the activation to be on the same device as the model weights. I am not sure why, but perhaps because of using dictionary to store the activation, which is not ordered. Is there any suggestion on how to solve this? Thanks.
solved.
here is the basic idea:
instead of outputs list i have defined dictionary of list.
when i get a hook call i add the output to the right list according to the output.device
then when i return from forward i return the list according to the input.device