What is the best way to assign bias/weights of a specific layer to be half precision? This layer is a simple linear layer (no dropout or non-linear function).
I wrote the code below and I don't understand why it works even without gradient scaling. However, if I move dtype=torch.float16 to the weight (self.fc1w), it raises an error:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1w = nn.Parameter(torch.rand((100, 100), requires_grad=True) / 100)
        self.fc1b = nn.Parameter(torch.zeros(100, requires_grad=True, dtype=torch.float16))
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        x = F.linear(x, self.fc1w, self.fc1b)
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.rand(100)
y = torch.tensor([0]*9 + [1], dtype=torch.float32)

for _ in range(20):
    out = model(x)
    loss = criterion(out, y)
    print(loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
In general, is it possible to manually assign half precision to a specific linear layer? If so, what is the best way?
Your current code might work because type promotion could be applied internally.
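As a quick, purely illustrative check, you can see the promotion rule directly and then compare it against your model's output dtype:

import torch

a = torch.zeros(3)                       # float32
b = torch.zeros(3, dtype=torch.float16)  # float16
print((a + b).dtype)                     # torch.float32: the float16 tensor is promoted

If the same promotion is applied inside F.linear, the float16 bias would be upcast and out.dtype would stay torch.float32, which could explain why training runs without gradient scaling.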
For a manual approach (assuming you don't want to use the mixed-precision training utilities via torch.amp), you could either cast the tensors and parameters to the desired dtype in your forward method, or use the autocast context manager and specify the dtype there in addition to casting, as explained here.
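As a rough sketch of the first option (the class name ManualCastNet is just illustrative, and note that float16 matmul may not be available on every CPU build, so you might need a GPU or bfloat16 instead):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ManualCastNet(nn.Module):
    def __init__(self):
        super().__init__()
        # parameters are kept in float32, so the optimizer updates full-precision copies
        self.fc1w = nn.Parameter(torch.rand(100, 100) / 100)
        self.fc1b = nn.Parameter(torch.zeros(100))
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        # cast the input and this layer's parameters to float16 just for this op
        x = F.linear(x.half(), self.fc1w.half(), self.fc1b.half())
        # cast the activation back to float32 for the remaining float32 layers
        x = torch.relu(self.fc2(x.float()))
        return self.fc3(x)

The autocast variant would instead wrap the F.linear call in with torch.autocast(device_type="cuda", dtype=torch.float16): (assuming a GPU) and let PyTorch insert the casts for you.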
half() is not an inplace operation on tensors, so either assign the float16 parameters to new tensors in the forward method or initialize them in the desired dtype in the __init__ before wrapping them into nn.Parameters.
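A sketch of that second approach, creating the parameters directly in float16 in the __init__ (HalfInitNet is a hypothetical name, the shapes are copied from your example, and the same float16-support caveat applies):

import torch
import torch.nn as nn
import torch.nn.functional as F

class HalfInitNet(nn.Module):
    def __init__(self):
        super().__init__()
        # create the weight and bias directly in float16 instead of calling .half() afterwards
        self.fc1w = nn.Parameter(torch.rand(100, 100, dtype=torch.float16) / 100)
        self.fc1b = nn.Parameter(torch.zeros(100, dtype=torch.float16))
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, x):
        # the input has to match the float16 parameters
        x = F.linear(x.half(), self.fc1w, self.fc1b)
        # cast back to float32 before the float32 layers
        x = torch.relu(self.fc2(x.float()))
        return self.fc3(x)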