Trying to understand the use of depthwise separable convolutions in GANs

I’m currently trying to improve the performance of a CycleGAN model that has a couple of downsampling and upsampling layers combined with 6 ResNet blocks in the bottleneck.

I used the following implementation for the depthwise separable convolution (DSC):

import torch.nn as nn

class DSConv1D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel, stride, padding, bias=False):
        super().__init__()
        # Depthwise: each input channel gets its own filter (groups=in_channels)
        self.d = nn.Conv1d(in_channels, in_channels, kernel_size=kernel,
                           stride=stride, padding=padding,
                           groups=in_channels, bias=bias)
        # Pointwise: 1x1 conv mixes channels; stride stays at 1 so only the
        # depthwise conv controls downsampling (otherwise the layer would
        # shrink the sequence twice)
        self.p = nn.Conv1d(in_channels, out_channels, kernel_size=1, bias=bias)

    def forward(self, x):
        out = self.d(x)
        out = self.p(out)
        return out
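As a quick sanity check, here's a forward pass with placeholder shapes (the batch size, 64→128 channels, and length 256 are arbitrary values for illustration, not from my actual model):

import torch

conv = DSConv1D(64, 128, kernel=3, stride=2, padding=1)
x = torch.randn(8, 64, 256)   # (batch, channels, length)
y = conv(x)
print(y.shape)                # torch.Size([8, 128, 128]) -- stride 2 halves the length once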

So far, I’ve swapped every possible Conv1d and Conv2d layer for a DSC layer, which reduced the parameter count from 22,059,265 to 7,906,710, and I also saw a boost in performance.
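For intuition on where the savings come from, here's a rough count on a single hypothetical 256→256 layer with kernel size 3 (these channel sizes are just for illustration, not the actual model's):

import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

std = nn.Conv1d(256, 256, kernel_size=3, padding=1, bias=False)
dsc = DSConv1D(256, 256, kernel=3, stride=1, padding=1)

print(n_params(std))  # 196608 = C_in * C_out * k
print(n_params(dsc))  # 66304  = C_in * k (depthwise) + C_in * C_out (pointwise)

That's roughly a k-fold reduction per layer when the channel count is large, which lines up with the roughly 3x overall drop I measured.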

So my question is: is it good practice to replace all conv layers with DSCs and cut the parameter count this drastically, or is it better to change only the bottleneck layers? Will the parameter reduction hurt accuracy?