I’m creating layers A and B, and both of them store a tensor that is used during the forward pass.
A projects the input onto a new space by taking a dot product with a basis (mel_basis).
B uses the input to index into an identity matrix (one-hot encoding).
My solution for running this code on multiple GPUs was to instantiate these tensors on the CPU and then move them to the respective GPU once the layers are called by DataParallel. The problem with this approach is that the tensors keep getting copied from the CPU to the GPU on every forward call.
Is there a way to circumvent this? If I instantiate these tensors on the GPU at initialization time with .cuda(), they all end up on device[0].
Code is below:
import torch
from librosa.filters import mel  # assuming mel_basis comes from librosa's mel filterbank


class A(torch.nn.Module):
    def __init__(self, n_fft, n_mel_channels=80, sampling_rate=16000):
        super(A, self).__init__()
        # Mel filterbank, created on the CPU at construction time
        self.mel_basis = torch.from_numpy(
            mel(sampling_rate, n_fft, n_mel_channels)).float()

    def linear_to_mel(self, x):
        if torch.cuda.is_available():
            # .cuda() copies the basis to the current GPU on every call
            return torch.matmul(self.mel_basis.cuda(), x)
        else:
            return torch.matmul(self.mel_basis, x)


class B(torch.nn.Module):
    def __init__(self, n_quantization_channels):
        super(B, self).__init__()
        self.n_quantization_channels = n_quantization_channels
        # Identity matrix whose rows act as one-hot codes, created on the CPU
        self.identity_matrix = torch.eye(n_quantization_channels).float()

    def encode(self, x):
        if torch.cuda.is_available():
            # .cuda() copies the identity matrix to the current GPU on every call
            return self.identity_matrix.cuda()[x.view(-1)]
        else:
            return self.identity_matrix[x.view(-1)]