So, I have trained a model and pruned it with unstructured pruning, which set some of the weights to zero. But the inference time hasn't improved much, since the zero weights still participate in the matrix multiplications at inference. Because autograd doesn't support sparse matrix calculations, I retrained the pruned model with its dense layers and only afterwards converted the weights to sparse matrices. But when I use the converted model for evaluation, it gives me the following error:
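For context, this is roughly how I pruned and then converted the weights (simplified here: the pruning amount is illustrative, and fc1 is an nn.Linear from my nn_architecture.py):

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# unstructured L1 (magnitude) pruning on the layer's weight
prune.l1_unstructured(model.fc1, name="weight", amount=0.9)
prune.remove(model.fc1, "weight")   # bake the zeros into the dense weight

# ... retrain with the dense (mostly-zero) weights ...

# convert the dense weight into a sparse COO tensor
model.fc1.weight = nn.Parameter(model.fc1.weight.data.to_sparse())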
y_preds = model(X_batch)
  File "/sys_apps_01/python/python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/jupyterhubenc/e132293/Pruning/utils/nn_architecture.py", line 105, in forward
    out = self.fc1(x)
  File "/sys_apps_01/python/python310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sys_apps_01/python/python310/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: addmm_sparse_dense: expected either 'mat1' to have sparse layout and 'mat2' to have strided layout, got 'mat1' with layout Strided and 'mat2' with layout Sparse
Here is the evaluation code:
import time
import torch

def evaluation(model, test_loader, device, y_test_data):
    model.eval()
    all_predictions = []
    start_time = time.time()
    with torch.no_grad():
        for X_batch in test_loader:
            X_batch = X_batch.to(device)
            y_preds = model(X_batch)
            all_predictions.append(y_preds.view(-1, 2).cpu().numpy())
    end_time = time.time()
    inference_time = end_time - start_time
    print("Inference time:", inference_time, "seconds")
Is there any way I can fix this, or can I instead remove the zero weights from the pruned model in order to reduce the inference time?