Create dataset out of x_train and y_train

How do I put x_train and y_train into a model for training?
x_train is a tensor of size (3000, 13).
y_train is of size (3000, 1).
That is, for each element of x_train of shape (1, 13), the respective label is a single digit from y_train.
If I do:

train_data = torch.hstack((train_feat, train_labels))
print(train_feat.shape)
print(train_labels.shape)

torch.Size([3082092, 13])
torch.Size([3082092, 1])

train_loader = data.DataLoader(dataset=train_data,
                               batch_size=7,
                               shuffle=True)

The DataLoader does not return batches of the given batch size, but returns the whole dataset instead.

Are you sure about the sizes of train_feat and train_labels? Your code should work.

As an example, the following works:

import numpy as np
import torch

dummy_features = torch.randn(3000, 13)
dummy_labels   = torch.randint(2, (3000, 1))  # integers in [0, 1]

train_data = torch.hstack((dummy_features, dummy_labels))
print(train_data.shape)  # torch.Size([3000, 14])

train_loader = torch.utils.data.DataLoader(train_data, batch_size=7, shuffle=True)

print(len(train_loader) == np.ceil(3000 / 7))  # True

batch = next(iter(train_loader))
print(batch.size())  # torch.Size([7, 14])

features, labels = batch[:, :-1], batch[:, -1]
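
If you prefer to keep the features and labels as two separate tensors (and keep the labels as integers instead of casting them to float through hstack), a TensorDataset also works. A minimal sketch, reusing the dummy tensors from above:

from torch.utils.data import TensorDataset, DataLoader

# features stay float, labels stay integer class indices
dataset = TensorDataset(dummy_features, dummy_labels.squeeze(1))
loader  = DataLoader(dataset, batch_size=7, shuffle=True)

features, labels = next(iter(loader))
print(features.shape)  # torch.Size([7, 13])
print(labels.shape)    # torch.Size([7])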

Thank you for your reply.

Maybe I'm doing something wrong while iterating, or in the model parameters, because with the model below I get an error about the size. The input should be 7 frames, each of size (1, 13) (or (1, 14) if the label is included), and the output channels are 1024 as an example.

model = nn.Sequential(
    torch.nn.Conv1d(13, 1024, kernel_size=7, stride=1, padding=0, dilation=1, groups=1, bias=True),
    torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(inplace=True),
    torch.nn.Conv1d(1024, 1024, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, bias=True),
    torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(inplace=True),
    torch.nn.Conv1d(1024, 50, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, bias=True))

classifier = nn.Linear(1024, 50)

This is the training script:

class IterMeter(object):
    """keeps track of total iterations"""
    def __init__(self):
        self.val = 0

    def step(self):
        self.val += 1

    def get(self):
        return self.val


def train(model, device, train_loader, criterion, optimizer, scheduler, epoch, iter_meter, experiment):
    model.train()
    data_len = len(train_loader.dataset)
    with experiment.train():
        for batch_idx, _data in enumerate(train_loader):

            optimizer.zero_grad()

            output = model(x_train) 

            loss = criterion(output, y_train)
            loss.backward()

            experiment.log_metric('loss', loss.item(), step=iter_meter.get())
            experiment.log_metric('learning_rate', scheduler.get_lr(), step=iter_meter.get())

            optimizer.step()
            scheduler.step()
            iter_meter.step()
            if batch_idx % 100 == 0 or batch_idx == data_len:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(x_train), data_len,
                    100. * batch_idx / len(train_loader), loss.item()))

iter_meter = IterMeter()
for epoch in range(1, epochs + 1):
    train(model, device, train_loader, criterion, optimizer, scheduler, epoch, iter_meter, experiment)

I'm getting an error:

RuntimeError: Expected 3-dimensional input for 3-dimensional weight [1024, 13, 7], but got 2-dimensional input of size [3082092, 13] instead

I'm not sure what you're trying to do.
You want to apply a convolution with a kernel size of 7 to a 1D vector of size 13, is that correct?
In that case your input should be a tensor with 1 channel containing a 1D vector of size 13.
So your input size should be [7, 1, 13] (batch_size, channels, dim).
And in your model, the first conv must have one input channel instead of 13.

Given your error, you're giving your model an input of size [3082092, 13], so it seems you didn't follow my first comment. Starting from what I wrote, we have:

features, labels = batch[:, :-1], batch[:, -1] # features size is [7, 13]

In order to obtain the needed dimension you simply need to create the channel dim:

features = features.unsqueeze(dim=1) # feature size is now [7, 1, 13]

Then you can apply your model (with the first conv corrected to have 1 input channel).

Then, after this first convolution, your tensor will be of shape [7, 1024, 7] (batch_size, output dim of the first conv, output size as a function of kernel size, padding, dilation, and stride).
As you apply two more convolutions with a kernel size of 1, the output size won't change. So at the end of your model, the size is [7, 50, 7].
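
(To double-check that last dimension, you can evaluate the usual Conv1d output-length formula for the first conv directly:)

import math

# L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
l_in, kernel_size, stride, padding, dilation = 13, 7, 1, 0, 1
l_out = math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1
print(l_out)  # 7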

If you want to feed that to a linear classifier, you can flatten the last two dims and feed the result to your classifier, and correct your classifier input size, which should be 50 * 7.

Here is a complete example:

import torch

model = torch.nn.Sequential(
    torch.nn.Conv1d(1, 1024, kernel_size=7, stride=1, padding=0, dilation=1, groups=1, bias=True),
    torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(inplace=True),
    torch.nn.Conv1d(1024, 1024, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, bias=True),
    torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(inplace=True),
    torch.nn.Conv1d(1024, 50, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, bias=True)
    )

classifier = torch.nn.Linear(50 * 7, 50)


dummy_features = torch.randn(3000, 13)
dummy_labels   = torch.randint(2, (3000, 1))  # integers in [0, 1]

train_data = torch.hstack((dummy_features, dummy_labels))

train_loader = torch.utils.data.DataLoader(train_data, batch_size=7, shuffle=True)

batch = next(iter(train_loader))
features, labels = batch[:, :-1], batch[:, -1]

features = features.unsqueeze(dim=1)  # (7, 13) -> (7, 1, 13)

outputs = model(features)

outputs = outputs.view(outputs.size(0), -1)  # flatten (7, 50, 7) -> (7, 350)

scores = classifier(outputs) # size (7, 50)
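
In your training loop, the loss should then be computed on this per-batch output and the per-batch labels (not on the whole x_train / y_train). A minimal sketch, assuming the criterion is nn.CrossEntropyLoss:

criterion = torch.nn.CrossEntropyLoss()

# the labels were cast to float by the hstack, so turn them back into class indices
loss = criterion(scores, labels.long())  # scores: (7, 50), labels: (7,)
loss.backward()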

Hi Luc… First, let me tell you that if the internet is a good place, it's because of people like you… so thank you.

Actually, I have a dataset as follows:
x_train: of size (3082092, 13) (3082092 audio frames, each frame containing 13 features)
y_train: of size (3082092, 1), where each item is the label of the respective audio frame.
There are 50 classes (the number of different possible labels).

The goal is to do a classification for each frame. So for each single frame there should be 50 probabilities, and the predicted label would be the one with the highest probability (see the small sketch below for what I mean).

That said, the input should have batch_size = 7, i.e. 7 frames each time, each frame with 13 features.
Also, the kernel_size = 7, and that's why the next layer has kernel_size = 1.
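
Just to be explicit about what I mean by "highest probability", something along these lines (with dummy scores):

import torch

scores = torch.randn(7, 50)            # model output for a batch of 7 frames
probs = torch.softmax(scores, dim=1)   # 50 probabilities per frame
preds = probs.argmax(dim=1)            # one predicted label per frame
print(preds.shape)                     # torch.Size([7])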

You're welcome!

OK, then I think the code I wrote above should fit your needs just fine. Did you manage to make it work?

If I may suggest another approach: you could try transforming your 1D audio signals into 2D signals (for instance by computing their spectrograms) and then applying a convolutional neural network to those 2D signals. This lets you use powerful CNN architectures, which, in my experience, lead to better results for audio classification.
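
Just as an illustration of that idea, a minimal sketch, assuming torchaudio is available and that you have the raw waveforms (the parameter values here are only placeholders):

import torch
import torchaudio

waveform = torch.randn(1, 16000)  # dummy 1-second mono signal at 16 kHz

# 1D signal -> 2D time-frequency "image"
spec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)(waveform)
print(spec.shape)  # (1, 64, T), where T depends on the default n_fft / hop_length

# the (channels, n_mels, T) image can then go through a 2D CNN
conv = torch.nn.Conv2d(1, 16, kernel_size=3)
out = conv(spec.unsqueeze(0))  # add the batch dimension first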

Thank you very much for your help!
I'll have to try the code first thing tomorrow morning.
As for the 2D spectrogram transformation, it's a different approach that I'm not currently using.
In fact, those 13 features above are the computed MFCCs (Mel-Frequency Cepstral Coefficients), which I have to use for my project.
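
(For context, the 13 features per frame come from an MFCC extraction along these lines; this is just a sketch with torchaudio, not my exact pipeline:)

import torch
import torchaudio

waveform = torch.randn(1, 16000)  # dummy 1-second mono signal at 16 kHz
mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=13)(waveform)
print(mfcc.shape)  # (1, 13, T): each of the T time steps is one 13-feature frame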