I am trying to follow the approach of a paper whose authors
- use librosa to create a mel spectrogram plot, and
- use this picture as input for their PyTorch pipeline.
Nowadays torchaudio can create the data underlying the plot, and matplotlib.imshow()
generates the picture I would like to use as the input (here you can see my related question with a plot).
I can generate a greyscale image from the data using
def scale_minmax(X, XMIN, XMAX, min=0.0, max=1.0):
    X_std = (X - XMIN) / (XMAX - XMIN)
    X_scaled = X_std * (max - min) + min
    return X_scaled
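For context, this is roughly how I apply that scaling to get 8-bit grey values (a sketch; `mels` is a random stand-in for the actual mel spectrogram in dB):

```python
import numpy as np

def scale_minmax(X, XMIN, XMAX, min=0.0, max=1.0):
    X_std = (X - XMIN) / (XMAX - XMIN)
    X_scaled = X_std * (max - min) + min
    return X_scaled

# hypothetical mel spectrogram in dB (stand-in for librosa/torchaudio output)
mels = np.random.uniform(-80.0, 0.0, size=(128, 256))

# scale to 0..255 and quantize to 8-bit grey values
img = scale_minmax(mels, mels.min(), mels.max(), 0, 255).astype(np.uint8)
img = np.flip(img, axis=0)  # put low frequencies at the bottom of the image
print(img.shape, img.min(), img.max())
```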
But I have the impression that I am losing some detail here, as I cannot reproduce the results of the paper.
So I would like to follow in the paper's footsteps, process the colour image instead, and let PyTorch do its magic.
By this I mean that I would like to get a 3-D array representing the RGB channels, which will later be used as input for a pretrained ResNet model (which expects 3-channel input).
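To make the target shape concrete: as far as I understand, a torchvision-style pretrained ResNet expects a float tensor of shape (batch, 3, H, W), so any H x W x 3 RGB array would need a HWC-to-CHW permute roughly like this (a sketch with dummy data):

```python
import numpy as np
import torch

# hypothetical RGB image as an H x W x 3 uint8 array
rgb = np.zeros((128, 256, 3), dtype=np.uint8)

# ResNet-style models expect float input shaped (N, 3, H, W)
x = torch.from_numpy(rgb).float().div(255.0)  # H x W x 3, values in [0, 1]
x = x.permute(2, 0, 1).unsqueeze(0)           # 1 x 3 x H x W
print(x.shape)  # torch.Size([1, 3, 128, 256])
```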
One very tedious way is something like
import numpy
import torch
from PIL import Image
import matplotlib.pyplot as plt

# Generate a sample plot
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)

# Save the plot as an image
plt.savefig('plot.png')

# Open the image using PIL
img = Image.open('plot.png')

# Convert the image to a PyTorch tensor
tensor_img = torch.from_numpy(numpy.array(img))
but this requires the intermediate save of a picture I don't actually need.
Thus, my question: is there any way to bypass this save?
Either some RGB extraction from my initial array (instead of the greyscale I got), or at least a way to pass the output of matplotlib.imshow() directly to torch.
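One direction I am considering (a sketch, not verified against the paper): apply a matplotlib colormap to the min-max-scaled 2-D array directly, which yields an RGBA array without creating any figure or file; the colormap name `viridis` here is just an assumption, since I don't know which one the authors used:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt
import torch

# stand-in for a min-max-scaled spectrogram, values in [0, 1]
X_scaled = np.random.rand(128, 256)

# a colormap maps each scalar to an RGBA tuple; this skips the figure entirely
cmap = plt.get_cmap("viridis")
rgba = cmap(X_scaled)   # H x W x 4 float array in [0, 1]
rgb = rgba[..., :3]     # drop the alpha channel

tensor_img = torch.from_numpy(rgb).permute(2, 0, 1).float()  # 3 x H x W
print(tensor_img.shape)
```

An alternative that keeps the full imshow rendering would be to savefig into an io.BytesIO buffer instead of a file, which avoids the disk round trip but still rasterizes the whole figure (axes, margins and all).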