Thank you for the quick reply. Alright, it might be the activation that has been the culprit. If we may take a minute of your time, does the following make sense with regard to predicting using the F.binary_cross_entropy function?
Preamble on our work process:
We normalized the data with the mean and std, and one-hot encoded the pixels of each image to 0s and 1s. To make this fit into WaveNet, we flattened each image into a vector, since we use Conv1D layers with causal dilation. Lastly, when we try to make it predict, it is based on the following snippet of code:
import torch
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm

MNIST_ = load_mnist(path=path, train=True)

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
print("Training using", device)

model = Wavenet(layers=3, blocks=2, output_size=1, output_channels=1).to(device)
model.train()
optimizer = optim.Adam(model.parameters(), lr=0.0003)
epochs = 5

# for epoch in range(epochs):
for i, batch in tqdm(enumerate(MNIST_)):
    Xtrain, ytrain = batch
    batch_size = Xtrain.shape[0]
    Xtrain = Xtrain.reshape(batch_size, 1, 784)
    # Xtrain = Xtrain.transpose(1, 2)
    # encoded_MNIST = encode_MNIST(Xtrain)

    # Next-pixel prediction: inputs are pixels 0..782, targets are pixels 1..783
    inputs = Xtrain[:, :, :-1].to(device)
    target = Xtrain[:, :, 1:].to(device)
    # inputs = encoded_MNIST[:, :, :-1].to(device)
    # target = encoded_MNIST[:, :, 1:].to(device)

    output = model(inputs)
    # binary_cross_entropy expects probabilities in [0, 1], so the model's final
    # layer should apply a sigmoid (or use binary_cross_entropy_with_logits instead)
    loss = F.binary_cross_entropy(output, target)
    print("\nLoss:", loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if i % 1000 == 0:
        print("\nSaving model")
        torch.save(model.state_dict(), "wavenet_MNIST.pt")
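On the activation question: as far as we understand, F.binary_cross_entropy expects the model output to already be probabilities in [0, 1], so if our last layer is a plain convolution without a sigmoid, that would explain the problem. Below is a minimal, self-contained sketch (with a stand-in random tensor rather than our actual Wavenet output) of the two options we believe are available: applying a sigmoid before binary_cross_entropy, or feeding raw logits to binary_cross_entropy_with_logits.

import torch
import torch.nn.functional as F

# Stand-in for the raw (pre-activation) model output; shape (batch, channels, length)
logits = torch.randn(8, 1, 783)
# Stand-in for the shifted, binarized next-pixel targets
target = torch.randint(0, 2, (8, 1, 783)).float()

# Option 1: squash to [0, 1] with a sigmoid, then use binary_cross_entropy
probs = torch.sigmoid(logits)
loss_sigmoid = F.binary_cross_entropy(probs, target)

# Option 2: pass raw logits to the numerically more stable with_logits variant
loss_logits = F.binary_cross_entropy_with_logits(logits, target)

print(loss_sigmoid.item(), loss_logits.item())  # agree up to floating-point error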
EDIT: Our advisor said that the sliding window could be used in this way, even though we might have gotten the implementation wrong. Essentially, he said that applying a sliding window i times over the remainder of the vector would be the same as merely trimming a window_size from each end of the vector (a sketch of how we read this follows below).
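For what it is worth, here is how we currently read that equivalence, as a small sketch (window_size is a hypothetical value, and this is separate from our actual implementation): the next-pixel targets produced by sliding a window across the flattened image are exactly the pixels left after trimming window_size entries off the front, and the matching inputs simply stop window_size pixels before the end.

import torch

window_size = 28            # hypothetical value, purely for illustration
x = torch.rand(1, 1, 784)   # one flattened MNIST image

# Explicit sliding window: each length-window_size window predicts the next pixel
window_targets = torch.stack(
    [x[0, 0, i + window_size] for i in range(784 - window_size)]
)

# "Trimming window_size at the tail end": drop the first window_size pixels
trimmed_targets = x[0, 0, window_size:]

print(torch.equal(window_targets, trimmed_targets))  # True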