I tracked it down to this:
torch.ao.nn.quantized.modules.conv.Conv2d has an attribute weight that is a method rather than the actual weight tensor. This causes a problem because the forward of CLIPVisionEmbeddings calls
target_dtype = self.patch_embedding.weight.dtype
and this raises the error: function object has no attribute 'dtype'
Is this a bug in the Conv2d implementation? It is definitely different from the non-quantized counterpart, where weight is not a function but the actual tensor.
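
For reference, here is a minimal sketch of the kind of workaround I have in mind: read the dtype through a small helper that calls weight when it is a method and uses it directly when it is a tensor. The classes FakeTensor, FloatConvLike and QuantizedConvLike are hypothetical stand-ins I made up so the snippet runs without PyTorch; they only mimic the two behaviours described above and are not the real modules. The helper name weight_dtype is also my own.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only carries a dtype label."""
    dtype: str


class FloatConvLike:
    """Mimics the non-quantized case: `weight` is the tensor itself."""

    def __init__(self):
        self.weight = FakeTensor("float32")


class QuantizedConvLike:
    """Mimics the quantized case: `weight` is a method returning the tensor."""

    def __init__(self):
        self._packed_weight = FakeTensor("qint8")

    def weight(self):
        return self._packed_weight


def weight_dtype(module):
    """Return the weight dtype whether `weight` is an attribute or a method."""
    w = module.weight
    if callable(w):
        # Quantized-style module: `weight` must be called to get the tensor.
        w = w()
    return w.dtype


print(weight_dtype(FloatConvLike()))
print(weight_dtype(QuantizedConvLike()))
```

With real modules the same helper would be used as weight_dtype(self.patch_embedding). This only avoids the attribute error; I have not checked whether the rest of the forward pass works with the dtype that the quantized module reports.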