Adapting MNIST designed network for larger data

I am attempting to implement this deep clustering algorithm, which was designed to cluster the MNIST dataset (single-channel 28x28 images).

The images I am trying to use are 416x416, 3-channel RGB. The script is initialised with the following functions.

import torch
import torchvision
from torch.utils.data import Dataset
from torchvision import transforms


class CachedMNIST(Dataset):
    def __init__(self, train, cuda, testing_mode=False):
        # Convert each PIL image straight into a scaled float tensor.
        img_transform = transforms.Compose([transforms.Lambda(self._transformation)])
        # img_transform = transforms.Compose([transforms.Resize((28*28)), transforms.ToTensor(), transforms.Grayscale()])
        self.ds = torchvision.datasets.ImageFolder(root=train, transform=img_transform)
        self.cuda = cuda
        self.testing_mode = testing_mode
        self._cache = dict()

    @staticmethod
    def _transformation(img):
        # Flattens the raw image bytes into a 1D float tensor and rescales it.
        return torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes())).float() * 0.02

If the images are left unaltered, the tensor output from the _transformation function has shape torch.Size([256, 519168]), far too large for the AutoEncoder network to handle.

Error 1
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x519168 and 784x500)
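As far as I can tell, the 519168 in the error is just one flattened raw RGB image, so it can never match the 784 inputs the first Linear layer expects; this is my own sanity check of the two numbers, not part of the original script:

# My own check of where the two shapes in Error 1 come from:
expected = 28 * 28        # 784     - input width of the first encoder Linear layer
actual = 416 * 416 * 3    # 519168  - one flattened 416x416 RGB image
print(expected, actual)   # 784 519168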

When I attempted to resize the images, the result is a 4D tensor, torch.Size([256, 1, 784, 748]), and even when I reduce the batch size to minuscule amounts, CUDA crashes because there is not enough memory.

Error 2
RuntimeError: CUDA out of memory.
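My current thinking is that the images need to be reduced to the same 784 features MNIST provides before they reach the network, i.e. grayscale, resize to 28x28, then flatten, but I am not sure this is the right way to adapt it. The transform below (and the name guess_transform) is my own guess, not something from the original repo:

import torch
from torchvision import transforms

# My guess at a transform matching what the network expects:
# 3-channel 416x416 -> 1-channel 28x28 -> flat vector of 784 values.
guess_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),    # drop the RGB channels
    transforms.Resize((28, 28)),                    # shrink to MNIST's spatial size
    transforms.ToTensor(),                          # PIL image -> [1, 28, 28] float tensor
    transforms.Lambda(lambda t: t.reshape(-1)),     # flatten to [784]
])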

I’m hoping someone can point me in the right direction to tackle this problem, as there must be a more efficient way to adapt the network.

AutoEncoder Model

StackedDenoisingAutoEncoder(
  (encoder): Sequential(
    (0): Sequential(
      (linear): Linear(in_features=784, out_features=500, bias=True)
      (activation): ReLU()
    )
    (1): Sequential(
      (linear): Linear(in_features=500, out_features=500, bias=True)
      (activation): ReLU()
    )
    (2): Sequential(
      (linear): Linear(in_features=500, out_features=2000, bias=True)
      (activation): ReLU()
    )
    (3): Sequential(
      (linear): Linear(in_features=2000, out_features=10, bias=True)
    )
  )
  (decoder): Sequential(
    (0): Sequential(
      (linear): Linear(in_features=10, out_features=2000, bias=True)
      (activation): ReLU()
    )
    (1): Sequential(
      (linear): Linear(in_features=2000, out_features=500, bias=True)
      (activation): ReLU()
    )
    (2): Sequential(
      (linear): Linear(in_features=500, out_features=500, bias=True)
      (activation): ReLU()
    )
    (3): Sequential(
      (linear): Linear(in_features=500, out_features=784, bias=True)
    )
  )
)
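To understand why keeping the full 416x416x3 input and widening the first/last layers instead would blow up, I re-created the printed architecture with plain nn.Linear layers and a configurable input width. This is only my own sketch for a parameter count (build_sdae_like is a hypothetical helper, not the library's actual builder):

import torch.nn as nn

def build_sdae_like(input_features):
    # Mirrors the printed encoder/decoder stack with a configurable input width.
    encoder = nn.Sequential(
        nn.Sequential(nn.Linear(input_features, 500), nn.ReLU()),
        nn.Sequential(nn.Linear(500, 500), nn.ReLU()),
        nn.Sequential(nn.Linear(500, 2000), nn.ReLU()),
        nn.Sequential(nn.Linear(2000, 10)),
    )
    decoder = nn.Sequential(
        nn.Sequential(nn.Linear(10, 2000), nn.ReLU()),
        nn.Sequential(nn.Linear(2000, 500), nn.ReLU()),
        nn.Sequential(nn.Linear(500, 500), nn.ReLU()),
        nn.Sequential(nn.Linear(500, input_features)),
    )
    return nn.Sequential(encoder, decoder)

# 784 inputs (MNIST) vs 519168 inputs (flattened 416x416 RGB):
small = build_sdae_like(28 * 28)
large = build_sdae_like(416 * 416 * 3)
print(sum(p.numel() for p in small.parameters()))  # ~3.3M parameters
print(sum(p.numel() for p in large.parameters()))  # ~522M parameters, which I assume is why memory runs out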