TimeSformer Pruning

Hello everyone, I am new to PyTorch, but I am loving the experience so far. Recently I have been trying to prune the TimeSformer model to get better inference times. I prune the model and save the pruned version as follows:

import torch
import torch.nn.utils.prune as prune
from timesformer.models.vit import TimeSformer  # import path from the facebookresearch/TimeSformer repo

ARG = [12, 1, 'model.pyth']
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TimeSformer(img_size=224, num_classes=400, num_frames=8,
                    attention_type='divided_space_time', ARGS=ARG).to(device=device)
# model.head = torch.nn.Linear(in_features=768, out_features=50, bias=True)

# Sparsity before pruning
num_zeros, num_elements, sparsity = measure_global_sparsity(model)
print(num_zeros, num_elements, sparsity)

# Prune 90% of the weights in every Linear layer, ranked by L1 magnitude
amount = 0.9
for module_name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=amount)

# Make the pruning permanent, then measure sparsity again
pruned_model = remove_parameters(model)
num_zeros, num_elements, sparsity = measure_global_sparsity(pruned_model)
print(num_zeros, num_elements, sparsity)

torch.save(pruned_model.state_dict(), "pruned_model_" + str(amount) + ".pyth")
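
For reference, my understanding of the mechanics is that l1_unstructured does not delete anything: it reparameterizes the layer with a weight_orig parameter and a weight_mask buffer, and prune.remove later folds the mask back into an ordinary dense weight tensor. A quick sanity check on a toy layer (not TimeSformer-specific):

import torch
import torch.nn.utils.prune as prune

lin = torch.nn.Linear(4, 4)
prune.l1_unstructured(lin, name="weight", amount=0.5)
print([n for n, _ in lin.named_parameters()])   # ['bias', 'weight_orig']
print([n for n, _ in lin.named_buffers()])      # ['weight_mask']

prune.remove(lin, "weight")                     # fold the mask into a dense tensor
print([n for n, _ in lin.named_parameters()])   # ['bias', 'weight']
print(float((lin.weight == 0).float().mean()))  # 0.5: half the entries are zero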

measure_global_sparsity counts the zeros and total elements and computes the sparsity, which I check before and after pruning. remove_parameters makes the pruning permanent by calling prune.remove, which strips the weight_orig/weight_mask reparameterization from each pruned module:

def remove_parameters(model):
    # Make the pruning permanent: strip the *_orig / *_mask
    # reparameterization from every pruned Conv2d and Linear module.
    for module_name, module in model.named_modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            for param_name in ("weight", "bias"):
                try:
                    prune.remove(module, param_name)
                except ValueError:
                    # Raised when this parameter was never pruned
                    pass

    return model
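
And for completeness, measure_global_sparsity is essentially the following (a minimal sketch; it just counts zero-valued entries over all parameters):

def measure_global_sparsity(model):
    # Minimal sketch: count zero-valued entries across all parameters.
    num_zeros = 0
    num_elements = 0
    for param in model.parameters():
        num_zeros += int(torch.sum(param == 0).item())
        num_elements += param.nelement()
    sparsity = num_zeros / num_elements
    return num_zeros, num_elements, sparsity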

When I prune the model, the size of the saved model does not change and the inference speed is also unaffected, even though the measured sparsity does increase as I increase the pruning amount. Why are the model size and inference speed unaffected? Does anyone know what I am doing wrong here?
Thank you.