The current issue looks like it is coming from the dataset definition: the samples are not being converted to a format that the default collate function can batch. You can likely fix this by adding the to_tensor transformation to the dataset, as in this example.
Something is strange here because your resulting image shape is now 1x1. Can you check the intermediate shapes at each step, e.g., imagetun.shape and image.shape after each operation?
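One way to do that check is a small helper that prints the shape and passes the tensor through unchanged. This is only a sketch: `log_shape` is a hypothetical helper, and the random input and the two operations below are placeholders for whatever your own pipeline does.

```python
import torch


def log_shape(name, tensor):
    """Print the tensor's shape under a label and return the tensor unchanged."""
    print(f"{name}: {tuple(tensor.shape)}")
    return tensor


# Placeholder input standing in for a single 3-channel 28x28 image.
image = log_shape("input", torch.rand(3, 28, 28))

# Placeholder operations; substitute the steps from your own code.
image = log_shape("after unsqueeze", image.unsqueeze(0))
image = log_shape("after channel mean", image.mean(dim=1, keepdim=True))
```

The first step where the printed shape is not what you expect is usually where the bug is.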
Since this looks like a pretty standard vision classification model, please take a look at the MNIST example. It is a good template for a simple classification model, particularly its data loading pipeline, which seems to be the main issue here.
Please take a look at the MNIST and ImageNet examples, as they show how typical classification training pipelines are built. You can likely use the ImageFolder dataset class, as is done in the ImageNet example.