Is it possible to work with (5, 5) patches and 8 conv layers?
Yes or no? Please.
I’m not sure what the 5x5 patches mean exactly, but if this is the input shape, then no, it won’t be possible: the max pooling layers reduce the spatial size, so at some point you would either create an empty tensor or fail in a conv layer, since the activation shape would be smaller than the conv kernel.
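To make the shrinking concrete, here is a minimal sketch in plain Python that traces the spatial size of a 5x5 patch through a stack of layers. It uses the standard output-size formula for conv and pooling layers with dilation 1. The `CONV` and `POOL` settings and the helper names are assumptions for illustration only; they are not taken from the posted model.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output size of a conv or pooling layer along one spatial dim (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_sizes(n, layers):
    """Return the spatial size after each layer, or raise if a layer cannot run.

    `layers` is a list of (kernel, stride, padding) tuples.
    """
    sizes = [n]
    for kernel, stride, padding in layers:
        if n + 2 * padding < kernel:
            raise ValueError(
                f"kernel {kernel} is larger than the padded input size {n + 2 * padding}"
            )
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


# Hypothetical layer settings, not taken from the posted model
CONV = (3, 1, 1)  # 3x3 conv, stride 1, padding 1: keeps the size
POOL = (2, 2, 0)  # 2x2 max pooling, stride 2: roughly halves the size

# Eight padded convs without pooling keep the 5x5 patch intact
print(trace_sizes(5, [CONV] * 8))

# Conv + pool blocks shrink 5 -> 2 -> 1; a third pooling layer then fails
print(trace_sizes(5, [CONV, POOL, CONV, POOL]))
try:
    trace_sizes(5, [CONV, POOL] * 3)
except ValueError as err:
    print("fails:", err)
```

Under these assumptions, the number of conv layers is not the limiting factor; the pooling layers (or unpadded convs) are what use up the 5 pixels.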
Is it correct or not, @ptrblck?
The screenshot seems to show another model, as the initial one had 3 pooling layers, while the current one would work and use an activation of a single pixel for the majority of the layers.
If the training, validation, and test accuracy is sufficiently high, I would claim that your model works.
Not necessarily. If the training accuracy is at 100% while the validation accuracy is going down or generally shows a gap, your model would be overfitting. If the train, val, and test accuracies are all at 100%, I would be concerned about a data leak and would try to increase the dataset, since you are not getting a valid signal from the training.
I don’t understand the last question. Could you explain a bit more what the concern about the kernel and activation shapes is as well as what the feature would mean?
The kernel size and number of kernels are defined by the user, so you can change them to whatever works for your use case. The posted shapes are described in a variety of papers, which often claim superior performance using e.g. a kernel size of 3x3.
Hi @ptrblck, when I increase the kernel size, does the performance increase?
And concerning the number of filters: when I increase the number of filters, does that give more precise features?
Changing the hyperparameters such as the kernel size and number of kernels can increase or decrease the performance and depends on the model architecture as well as your use case, so I don’t think you can generalize it via your statements.
You could try to derive some insights about the number of filters, thus number of parameters, and the related “capacity” of the model.
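As a starting point for such insights, the sketch below counts the learnable parameters of a single 2D conv layer for a few filter counts and kernel sizes. It assumes a standard, non-grouped convolution with one bias per output channel; `conv2d_params` is a hypothetical helper name, and the channel and kernel values are arbitrary examples.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a standard 2D conv layer (square kernel, groups=1)."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Arbitrary example values: 3 input channels, varying filters and kernel sizes
for kernel_size in (1, 3, 5):
    for out_channels in (16, 64):
        n = conv2d_params(3, out_channels, kernel_size)
        print(f"kernel={kernel_size}x{kernel_size}, filters={out_channels}: {n} parameters")
```

The count grows linearly with the number of filters and quadratically with the kernel size, which says something about capacity but nothing about whether the model will actually perform better.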
I’m not sure what the exact question is.
As mentioned before, the channels as well as the kernel size can be picked by the user, and picking them is one step of defining the model architecture. I would not claim that one approach is superior to another one, as model architectures change a lot and these kinds of general claims might be bogus after a while.
If you are not sure which setup to use, I would recommend starting with a known architecture such as a ResNet and using its setup.
@ptrblck how do I choose the kernel size and the number of features of the input?
The in_channels of the first conv layer are defined by the number of channels in the input tensor. The out_channels can be picked freely, and the same is true for the following layers, as long as each layer’s in_channels match the out_channels of the previous layer.
The kernel size can be picked by the user as long as it’s not larger than the (padded) spatial size of the input.
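The two constraints can be sketched in a few lines of plain Python. The helper names `chain_channels` and `kernel_fits`, as well as the example values, are hypothetical and only illustrate the rules: the first layer takes its in_channels from the input, every later layer takes them from the previous out_channels, and a kernel has to fit into the padded input.

```python
def chain_channels(input_channels, out_channels_per_layer):
    """Return (in_channels, out_channels) pairs for a stack of conv layers."""
    pairs = []
    in_ch = input_channels
    for out_ch in out_channels_per_layer:
        pairs.append((in_ch, out_ch))
        in_ch = out_ch  # the next layer consumes what this layer produces
    return pairs


def kernel_fits(input_size, kernel_size, padding=0):
    """True if the kernel is not larger than the padded spatial size."""
    return kernel_size <= input_size + 2 * padding


# Example: a 3-channel input and freely chosen out_channels of 16, 32, 64
print(chain_channels(3, [16, 32, 64]))

# Example: kernels on a 5x5 patch, without and with padding
print(kernel_fits(5, 5))             # a 5x5 kernel covers the whole patch
print(kernel_fits(5, 7))             # a 7x7 kernel does not fit
print(kernel_fits(5, 7, padding=1))  # padding 1 makes the input 7x7
```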
thank you @ptrblck
Both can improve or reduce the performance.
As described before, the model performance depends on the general architecture and you, as the researcher, would have to run experiments to see which kernel sizes, number of kernels, and other hyperparameters work well for your use case.
If there were a general rule such as "the larger the filter, the better the model performance", one could simply use the max. size and would get the best model (which is of course not the case).
I would recommend to start with an online course such as FastAI or any other course which gives a good introduction to ML/DL.
But @ptrblck, if the image is hyperspectral with 3 channels, must the number of features be large to represent the data well?
The number of channels and size of the input image don’t necessarily reflect the complexity of the data domain.
E.g. you could create input images in the shape [3, 224, 224], which would be completely black for class0 and white for class1. Even though these images have a high dimensionality, the use case is really simple and you could create a perfect “model” by just using:
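def predict(input_pixel_value):
    # White pixels (value > 0) belong to class1, black pixels (value 0) to class0
    if input_pixel_value > 0:
        return 1
    else:
        return 0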
As you can see, the necessary model architecture depends on the use case and I would be careful in claiming general statements.
@ptrblck thank you very much
@ptrblck just another question: must the kernel size be less than or equal to the size of the patch?
Yes, the patches are created based on the kernel size.
Take a look at CS231n for a general overview of DL operations.
thank you @ptrblck
Hi @ptrblck, is a training split of 70% sufficient?