MaskedTensor for a specific PyTorch version

I have PyTorch 1.12.1. How can I use the MaskedTensor functionality?

You would need to update to torch>=1.13; masked tensors were introduced in that release, as described in the docs.

Are you sure this feature is available? The docs say it is a prototype.

Yes, and you can easily check it by installing 1.13+:

>>> import torch
>>> torch.__version__
'1.13.1+cu117'
>>> torch.masked
<module 'torch.masked' from ...
>>> from torch.masked import masked_tensor, as_masked_tensor
>>> data = torch.arange(24).reshape(2, 3, 4)
>>> mask = data % 2 == 0
>>> mt = masked_tensor(data.float(), mask)
>>> print("mt[0]:\n", mt[0])
mt[0]:
 MaskedTensor(
  [
    [  0.0000,       --,   2.0000,       --],
    [  4.0000,       --,   6.0000,       --],
    [  8.0000,       --,  10.0000,       --]
  ]
)
>>> print("mt[:, :, 2:4]:\n", mt[:, :, 2:4])
mt[:, :, 2:4]:
 MaskedTensor(
  [
    [
      [  2.0000,       --],
      [  6.0000,       --],
      [ 10.0000,       --]
    ],
    [
      [ 14.0000,       --],
      [ 18.0000,       --],
      [ 22.0000,       --]
    ]
  ]
)

In torch 1.13.0 I run into a strange error with MaskedTensor,

TypeError: mask must have dtype bool.

during forward(), although the masked tensor itself is created without errors. That is why I asked about the version.

Which dtype are you expecting for the mask if not bool?

Only bool.
If my mask were not bool, the masked tensor would not be created at all.
But the error occurs when forward() runs.

In this case I would recommend checking the forward method and which dtype your mask currently has, as it doesn't seem to be a BoolTensor. Call mask = mask.bool() and it should work, assuming the mask contains valid values that can be converted to bool without issues.
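A minimal sketch of that check and conversion (assuming data and mask are the tensors you pass to masked_tensor):

import torch
from torch.masked import masked_tensor

data = torch.rand(4, 5)
mask = torch.randint(0, 2, (4, 5))   # e.g. an integer 0/1 mask by mistake
print(mask.dtype)                    # torch.int64, not torch.bool
mask = mask.bool()                   # convert before building the MaskedTensor
mt = masked_tensor(data, mask)       # should no longer raise the dtype error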

No, it doesn't work.
The mask is bool, but the error remains.

Could you post a minimal and executable code snippet reproducing the error, please?

import torch
import torch.nn as nn
from torch.masked import masked_tensor

device = 'cuda'
batchsize = 32

x = torch.rand((batchsize, 100)).to(device)
x[:, 80:] = -1  # mark the padded positions with -1
batch = {
    'features': [
        torch.randint(0, 5, (batchsize, 100)).to(device),        # categorical feature
        masked_tensor(x, (x != -1).bool(), requires_grad=True)   # continuous feature as a MaskedTensor
    ],
    'label': torch.randint(0, 2, (batchsize,)).to(device)
}

 
class Rnn(nn.Module):
    def __init__(self, rnn_units=64, top_classifier_units=32):
        super(Rnn, self).__init__()

        self._embeddings = nn.ModuleList(
            [self._create_embedding_projection(*(5, 2))])
        self._spatial_dropout = nn.Dropout2d(0.05)
        self._embeddings_concated_dim = 2  # 2 - dim of the embedding
        self._rnn = nn.GRU(input_size=self._embeddings_concated_dim + 1,  # +1 - number of continuous features
                           hidden_size=128, batch_first=True, num_layers=1,
                           bidirectional=True)

        self._hidden_size = rnn_units
        n_pooling = 2
        in_features_linear = n_pooling * self._hidden_size * 2
        self._top_classifier = nn.Sequential(nn.Linear(in_features=in_features_linear,
                                                       out_features=top_classifier_units),
                                             nn.ReLU(),
                                             nn.Linear(in_features=top_classifier_units, out_features=1)
                                            )

    def forward(self, features):
        batch_size = features[0].shape[0]
        embeddings = [embedding(features[i]) \
                        for i, embedding in enumerate(self._embeddings)]
        numeric = [features[i][:, :, None] for i in range(
                    len(self._embeddings),
                    len(self._embeddings)+1
                    )]


        concated_embeddings = torch.cat(embeddings, dim=-1)
        concated_embeddings = concated_embeddings.permute(0, 2, 1).unsqueeze(3)
        dropout_embeddings = self._spatial_dropout(concated_embeddings)
        dropout_embeddings = dropout_embeddings.squeeze(3).permute(0, 2, 1)
        concated_embeddings = torch.cat([dropout_embeddings] + numeric, dim=-1)
        states, _ = self._rnn(concated_embeddings)
        poolings = []
        poolings.append(states.max(dim=1)[0])
        poolings.append(states.sum(dim=1) / states.shape[1])    
        combined_input = torch.cat(poolings, dim=-1)
        logit = self._top_classifier(combined_input)  
        return logit

    @classmethod
    def _create_embedding_projection(cls, cardinality, embed_size, add_missing=True, padding_idx=0, init=None):
        add_missing = 1 if add_missing else 0
        v = nn.Embedding(num_embeddings=cardinality+add_missing, embedding_dim=embed_size, padding_idx=padding_idx)
        if init is not None:
            v.weight.data.copy_(init)
        return v



model = Rnn(rnn_units=128, top_classifier_units=128).to(device)
optimizer = torch.optim.Adam(lr=1e-3, params=model.parameters())
 
for epoch in range(10):
    print(f'Starting epoch {epoch+1}')
    loss_function = nn.BCEWithLogitsLoss()
    num_batches = 1
    running_loss = 0.0
    model.train()
    output = torch.flatten(model(batch['features']))
    batch_loss = loss_function(output, batch['label'].float())
    batch_loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    running_loss += batch_loss

Starting epoch 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[70], line 8
      5 running_loss = 0.0
      6 model.train()
----> 8 output = torch.flatten(model(batch['features']))
      9 batch_loss = loss_function(output, batch['label'].float())
     11 batch_loss.backward()

 
File ~/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[66], line 41, in Rnn.forward(self, features)
     39 dropout_embeddings = self._spatial_dropout(concated_embeddings)
     40 dropout_embeddings = dropout_embeddings.squeeze(3).permute(0, 2, 1)
---> 41 concated_embeddings = torch.cat([dropout_embeddings] + numeric, dim=-1)
     43 states, _ = self._rnn(concated_embeddings)
     44 poolings = []
...
    208     or data.dtype == torch.int64
    209 ):
    210     raise TypeError(f"{data.dtype} is not supported in MaskedTensor.")

TypeError: mask must have dtype bool.

The problem occurs when I concatenate the embeddings with the masked tensor.

PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: version 3.27.2
Libc version: glibc-2.17
Python version: 3.10.8 (main, Oct 28 2022, 08:43:57) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] (64-bit runtime)
Python platform: Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla V100-PCIE-32GB
Nvidia driver version: 460.106.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping: 4
CPU MHz: 2499.975
CPU max MHz: 3000.0000
CPU min MHz: 800.0000
BogoMIPS: 4400.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 14080K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke md_clear spec_ctrl intel_stibp flush_l1d

Versions of relevant libraries:
[pip3] numpy==1.23.1
[pip3] torch==2.0.1+cu118
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[conda] Could not collect


Hi! Were you able to reproduce the error?

Yes, thanks for the code snippet.
You cannot concatenate a standard tensor with a masked one as seen here:

import torch
from torch.masked import masked_tensor, as_masked_tensor
data = torch.arange(24).reshape(2, 3, 4)
mask = data % 2 == 0
mt = masked_tensor(data.float(), mask)

print(mt)
print(torch.cat([mt, mt]))
# fails
#print(torch.cat([torch.zeros(2, 3, 4), mt]))
print(torch.cat([masked_tensor(torch.zeros(2, 3, 4), torch.ones_like(data).bool()), mt]))

so you would need to wrap the standard tensor into another masked tensor first.
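Applied to your forward(), a minimal sketch of that idea (reusing the dropout_embeddings and numeric variables from your snippet) would be:

# Hypothetical change inside Rnn.forward(): wrap the dense embeddings in a
# MaskedTensor with an all-True mask so torch.cat only sees masked tensors.
dense_mask = torch.ones_like(dropout_embeddings, dtype=torch.bool)
masked_embeddings = masked_tensor(dropout_embeddings, dense_mask)
concated_embeddings = torch.cat([masked_embeddings] + numeric, dim=-1)

Keep in mind this only addresses the torch.cat error; since MaskedTensor is still a prototype feature, later ops such as the GRU may or may not support masked inputs.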

Thank you very much, this thought never even occurred to me :)
My concern is that if I turn the embeddings into a masked_tensor with an all-True mask (e.g. torch.cat([masked_tensor(dropout_embeddings, torch.full_like(dropout_embeddings, True).bool())] + numeric, dim=-1)), the embeddings themselves will learn worse. Is that so?