Minimize cosine distance between multiple pairs of vectors

I am trying to build an encoder-decoder network that maps a vector u (dim = 300) in one vector space to a vector v (dim = 300) in another. The training data is a collection of vector pairs (u, v) that represent the same object in the two spaces. Typically, we would just require that the reconstructed vector (_v) and the original vector (v) have a cosine similarity of 1:

cos(v, _v) = 1

However, I am trying to extend this criterion so that the reconstructed vector _v also has the same cosine similarity as the original v to other vectors vj in the target vector space.

Essentially, I’m insisting not only that the reconstructed and the original vectors be parallel to each other but also that the reconstructed vector have the same alignment to other vectors in the same space.

I am doing this by calculating the original cosine similarity between v and vj and the cosine similarity between the reconstructed _v and vj, and then driving the difference between the two to zero:

loss_func = cos(v, vj) - cos(_v, vj)

This is repeated for many vectors vj for each vector v in a batch.
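In torch terms the idea is roughly the following (a minimal per-vector sketch using F.cosine_similarity; the tensor names are illustrative, not the actual training code):

import torch
import torch.nn.functional as F

def alignment_loss(v, v_rec, others):
    # v:      [300] original vector in the target space
    # v_rec:  [300] reconstructed vector _v
    # others: [k, 300] other vectors vj in the same space
    orig_sims = F.cosine_similarity(v.unsqueeze(0), others, dim=1)      # cos(v, vj)  -> [k]
    rec_sims  = F.cosine_similarity(v_rec.unsqueeze(0), others, dim=1)  # cos(_v, vj) -> [k]
    return torch.mean(torch.abs(orig_sims - rec_sims))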

Here is the torch implementation of the loss function:

cosine_loss = torch.nn.CosineEmbeddingLoss(reduction='none')

def loss_fn(train_cog, train_y, train_r):
    # target of 1 -> CosineEmbeddingLoss returns 1 - cos(train_y, train_r) per sample
    condition = torch.ones(train_y.shape[0]).to(device)
    # train_cog + (1 - cos) - 1 == train_cog - cos; sum the absolute values and divide by the batch size
    sim_sum = torch.sum(torch.abs(train_cog + cosine_loss(train_y, train_r, condition) - 1)) / train_y.shape[0]
    return sim_sum

where train_y is a batch of 32 reconstructed vectors with shape [32, 300], and train_r holds the vectors to be paired with each vector in the batch, so train_r is also [32, 300].

train_cog is the original cosine similarity between the original vector v and each of the vectors in train_r, with shape [32, 1].
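Since nn.CosineEmbeddingLoss with a target of 1 returns 1 - cos(x1, x2) per sample, the expression above amounts to |train_cog - cos(train_y, train_r)|. An equivalent, more explicit version (just a sketch; note it squeezes train_cog to [32] so it lines up with the [32]-shaped similarities instead of broadcasting):

import torch
import torch.nn.functional as F

def loss_fn_explicit(train_cog, train_y, train_r):
    # train_cog: [32, 1]   original similarities cos(v, vj)
    # train_y:   [32, 300] reconstructed vectors _v
    # train_r:   [32, 300] paired vectors vj
    rec_sim = F.cosine_similarity(train_y, train_r, dim=1)        # cos(_v, vj) -> [32]
    return torch.mean(torch.abs(train_cog.squeeze(1) - rec_sim))  # mean |cos(v, vj) - cos(_v, vj)|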

However, this program keeps running into a CUDA illegal memory access error:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Is there something wrong with what I am doing? Can anyone suggest modifications to the code or a better way of doing this?

Could you rerun the script via export CUDA_LAUNCH_BLOCKING=1 and post the stacktrace here, please?
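(Since the traceback comes from a notebook, the variable can also be set from Python at the very top of the notebook, as long as it happens before the first CUDA call:)

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch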

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-28-5f97a74b79a8> in <module>
     17 
     18         train_latent_x  = encoder(train_X)
---> 19         train_decoded_x = decoder(train_latent_x)
     20 
     21         #dim_0           = train_decoded_x.shape[0]

~/anaconda3/envs/vineeth/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-16-f314176271f8> in forward(self, X)
     43 
     44     def forward(self, X):
---> 45         X= self.decoder(X)
     46         return X

~/anaconda3/envs/vineeth/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/vineeth/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
    137     def forward(self, input):
    138         for module in self:
--> 139             input = module(input)
    140         return input
    141 

~/anaconda3/envs/vineeth/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/vineeth/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py in forward(self, input)
    176             bn_training,
    177             exponential_average_factor,
--> 178             self.eps,
    179         )
    180 

~/anaconda3/envs/vineeth/lib/python3.6/site-packages/torch/nn/functional.py in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
   2280 
   2281     return torch.batch_norm(
-> 2282         input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
   2283     )
   2284 

RuntimeError: CUDA error: an illegal memory access was encountered

It seems to have run for 3 epochs + 27/276 batches before it gave this error.

Thank you! So it seems the batchnorm layer is failing.
Could you post the output of python -m torch.utils.collect_env and describe how to reproduce the issue?

Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Rocky Linux 8.4 (Green Obsidian) (x86_64)
GCC version: (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28

Python version: 3.8 (64-bit runtime)
Python platform: Linux-4.18.0-305.19.1.el8_4.x86_64-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: NVIDIA RTX A5000
GPU 1: NVIDIA RTX A5000
GPU 2: NVIDIA RTX A5000
GPU 3: NVIDIA RTX A5000

Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1 
[pip3] numpydoc==1.1.0 
[pip3] torch==1.9.0+cu111 
[pip3] torchaudio==0.9.0
[pip3] torchvision==0.10.0+cu111 
[conda] blas                      1.0                         mkl
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py38h27cfd23_1
[conda] mkl_fft                   1.3.0            py38h42c9631_2
[conda] mkl_random                1.2.1            py38ha9443f7_2 
[conda] numpy                     1.20.1           py38h93e21f0_0  
[conda] numpy-base                1.20.1           py38h7d8b39e_0 
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1
[conda] torch                     1.9.0+cu111              pypi_0    pypi
[conda] torchaudio                0.9.0                    pypi_0    pypi
[conda] torchvision               0.10.0+cu111             pypi_0    pypi

Sorry, I don’t understand what you mean by “how to reproduce the issue”

Here is the encoder:

class Encoder(nn.Module):
    
    def __init__(self):
        super().__init__()
    
        self.encoder = nn.Sequential(
            nn.Linear(300, 280),
            nn.BatchNorm1d(280),
            nn.LeakyReLU(0.3),
            nn.Linear(280, 260),
            nn.BatchNorm1d(260),
            nn.LeakyReLU(0.3),
            nn.Linear(260, 240),
            nn.BatchNorm1d(240),
            #nn.Dropout(0.25),
            nn.LeakyReLU(0.3),
            
            nn.Linear(240, 220),
            nn.BatchNorm1d(220),
            nn.LeakyReLU(0.2),
            nn.Linear(220, 200),
            nn.BatchNorm1d(200),
            nn.LeakyReLU(0.2),
            nn.Linear(200, 180),
            nn.BatchNorm1d(180),
            nn.LeakyReLU(0.2),
            nn.Linear(180, 160),
            nn.BatchNorm1d(160),
            #nn.Dropout(0.25),
            nn.LeakyReLU(0.1),
            
            nn.Linear(160, 140),
            nn.BatchNorm1d(140),
            nn.LeakyReLU(0.1),
            nn.Linear(140, 120),
            nn.BatchNorm1d(120),
            nn.LeakyReLU(0.1),
            nn.Linear(120, 100),
            nn.BatchNorm1d(100),
            nn.Dropout(0.25),
            nn.LeakyReLU(0.1),
            
        )
        
    def forward(self, X):
        X = self.encoder(X)
        return X

And decoder:

class Decoder(nn.Module):
    
    def __init__(self):
        super().__init__()

        self.decoder = nn.Sequential(
            nn.Linear(100, 120),
            nn.BatchNorm1d(120),
            nn.LeakyReLU(0.3),
            nn.Linear(120, 140),
            nn.BatchNorm1d(140),
            nn.LeakyReLU(0.3),
            nn.Linear(140, 160),
            nn.BatchNorm1d(160),
            #nn.Dropout(0.25),
            nn.LeakyReLU(0.3),
            
            nn.Linear(160, 180),
            nn.BatchNorm1d(180),
            nn.LeakyReLU(0.2),
            nn.Linear(180, 200),
            nn.BatchNorm1d(200),
            nn.LeakyReLU(0.2),
            nn.Linear(200, 220),
            nn.BatchNorm1d(220),
            nn.LeakyReLU(0.2),
            nn.Linear(220, 240),
            nn.BatchNorm1d(240),
            #nn.Dropout(0.25),
            nn.LeakyReLU(0.1),
            
            nn.Linear(240, 260),
            nn.BatchNorm1d(260),
            nn.LeakyReLU(0.1),
            nn.Linear(260, 280),
            nn.BatchNorm1d(280),
            nn.LeakyReLU(0.1),
            nn.Linear(280, 300),
            nn.BatchNorm1d(300),
            nn.Dropout(0.25),
            nn.LeakyReLU(0.1),               
        )
        
    def forward(self, X):
        X = self.decoder(X)
        return X

Thanks for the env information. Could you update to the current nightly release with the CUDA 11.3 or 11.6 runtime and check if you still hit the issue?

I would also need the input shapes and some executable code that I could run to reproduce the error in order to debug it.
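For example, a small self-contained script along these lines, using the Encoder, Decoder, and loss_fn you posted with random tensors standing in for your data (shapes taken from your description), would be enough:

import torch

device = "cuda"
encoder = Encoder().to(device)
decoder = Decoder().to(device)

train_X   = torch.randn(32, 300, device=device)  # batch of source-space vectors u
train_r   = torch.randn(32, 300, device=device)  # paired vectors vj in the target space
train_cog = torch.rand(32, 1, device=device)     # original similarities cos(v, vj)

train_latent_x  = encoder(train_X)
train_decoded_x = decoder(train_latent_x)
loss = loss_fn(train_cog, train_decoded_x, train_r)
loss.backward()
print(loss.item())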