How to repeat a vector batch-wise?

I have a batch of 100 data vectors, each of length 784, so my batch has size:

>>> images_vec.size() 
torch.Size([100, 784])

How do I copy each vector seq_length=28 times, so that I have a batch of data vectors that are simply repeated seq_length times? The batch should then have size:

torch.Size([100, 784, 28])

Here’s the batch-loading code:

import torch 
import torchvision.datasets as dsets
import torchvision.transforms as transforms

# Hyper Parameters
sequence_length = 28
input_size = 28*28
batch_size = 100

# MNIST Dataset
train_dataset = dsets.MNIST(root='../data_tmp/',
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

for i, (images, labels) in enumerate(train_loader):
    #images_repeated = images.view(-1, sequence_length, input_size) 
    images_vec = images.view(-1, input_size)
    #images_repeated = torch.randn(100,784,28) # pre-allocate
    images_repeated = torch.randn(batch_size, input_size, sequence_length) # pre-allocate
    #for j in range(sequence_length):
    #    images_repeated[:, :, j] = images_vec
    y_true = labels
    if i == 1:
        break

images.size()
images_vec.size() 

Cheers,

Aj

OK, to answer my own question, this seems to work?

>>> images_repeated = images_vec.repeat(1,sequence_length) 
>>> images_repeated = images_repeated.view(-1,input_size, sequence_length)
>>> images_repeated.size()
torch.Size([100, 784, 28])

I guess it could be checked by something like this?

images_repeated[:, :, 0] == images_vec

It should be all True? I expect there must be a more robust or cleaner way to do all this, though.
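
A vectorised version of that check could look like this (a minimal sketch, reusing images_vec, images_repeated and sequence_length from the code above):

all_equal = all(torch.equal(images_repeated[:, :, k], images_vec)
                for k in range(sequence_length))
print(all_equal)  # True only if every slice along the last dim matches images_vec exactly

If any slice differs, the repeat-then-view combination has silently reordered elements, and the unsqueeze-based answers below are the safer route.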

You can use repeat – this will copy each vector 28 times.

X = torch.randn(100, 700)
X = X.unsqueeze(2).repeat(1, 1, 28)

Or you can use expand: this will only create a view, without copying any data.

X = torch.randn(100, 700)
X = X.unsqueeze(2).expand(100, 700, 28)
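
To see how the two differ in memory (a quick sketch; X_rep and X_exp are just illustrative names), repeat allocates and fills new storage, while expand only creates a view with stride 0 along the new dimension:

import torch

X = torch.randn(100, 700)
X_rep = X.unsqueeze(2).repeat(1, 1, 28)      # real copies: 100*700*28 elements stored
X_exp = X.unsqueeze(2).expand(100, 700, 28)  # a view: no data is copied
print(X_rep.size(), X_exp.size())            # both torch.Size([100, 700, 28])
print(X_exp.stride())                        # (700, 1, 0) -- stride 0 on the last dim means shared data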

Ha ha, very nice - thank Q :smile:

How do I check all-true vectorially in PyTorch?

What I mean is,

X = torch.randn(100, 700)
X_tmp = X.unsqueeze(2).repeat(1, 1, 28)

X[1] == X_tmp[1,:,0] # should be all true - want to check it vectorially

np.alltrue(X[1].numpy() == X_tmp[1, :, 0].numpy())?


Ha ha – @elanmart – you’re quick dude – good to have you here :smile:


If you want to stay in pytorch (e.g. for GPU arrays), you can do (X[1] != X_tmp[1,:,0]).sum()==0. The caveat is that NaN != NaN (X != X seemed to be the easiest way to check for NaN in pytorch a while back).
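
For instance, a quick sketch of that caveat (with a made-up tensor containing a NaN):

import torch

a = torch.tensor([1.0, float('nan'), 3.0])
b = a.clone()
print((a != b).sum())  # tensor(1): the NaN entry counts as a mismatch even though b is an exact copy of a
print(a != a)          # tensor([False, True, False]): NaN != NaN, so this flags the NaN positions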

Best regards

Thomas


Thanks a lot @tom – seems you’re mastering PyTorch pretty quick!

Good to hear from you :smile:


@elanmart His code works for me perfectly.
My situation is this: I want to copy 128 times along a dimension:

x=torch.randn(100, 1, 32, 32)
print(x.shape)

which gives [100, 1, 32, 32]

x_hat = x.repeat(1, 128, 1, 1)  # repeat only along the channel dimension
print(x_hat.shape)

which gives [100, 128, 32, 32]. You can also print out the values inside to compare:

print(x[0,0,:20, :20])
print(x_hat[0,33,:20,:20])
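
A vectorised way to confirm that all 128 copies match (a minimal sketch, reusing x and x_hat from above):

print((x_hat == x).all())  # broadcasts x over the channel dimension; prints tensor(True) if every copy is identical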

X = torch.randn(100, 700)
X = X.unsqueeze(2).expand(-1, -1, 28)

Passing -1 as the size for a dimension means not changing the size of that dimension.
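
A minimal sketch showing that -1 is just shorthand for the existing size (a and b are illustrative names):

import torch

X = torch.randn(100, 700)
a = X.unsqueeze(2).expand(100, 700, 28)
b = X.unsqueeze(2).expand(-1, -1, 28)  # -1 keeps the sizes of dims 0 and 1
print(a.size() == b.size())            # True: both are torch.Size([100, 700, 28])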