Constructing a Model

I have a question about how many layers, each consisting of conv-relu-maxpool, I should use if I want to classify between 50 different objects (50 classes).

I would like to know how to calculate this.

What method should I use to predict them?

There is no rule, and there is no hard answer either. With few blocks your accuracy will be worse, but not zero. Even with just one block, your accuracy will be higher than random. In general, the more blocks you add, the higher your accuracy will be, although the gains shrink as the accuracy saturates.

You may find the ResNet paper interesting, as it shows how accuracy increases as more blocks are added.

So suppose I have 50 classes and I am using 3 layers. Is that sufficient?
And is the prediction method given below sufficient?

    with torch.no_grad():
        z="This is "+classes[x]
    return 0

This assumes w is my single input image.
Edit: Each class has 400 images of dimension 3x100x100.

I have no idea what your outp is, so I cannot answer that.
Three layers are probably not enough (depending on what you consider a layer).
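
For reference, here is a minimal sketch of single-image prediction, assuming `outp` is meant to be the raw model output with one score per class. The `predict` function, the stand-in linear `model`, and the `person_i` labels are hypothetical placeholders so the snippet runs on its own; they are not taken from the question.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: replace with your trained model and real class names.
classes = [f"person_{i}" for i in range(50)]
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 100 * 100, 50))

def predict(model, w, classes):
    """Return a label string for a single image tensor w of shape (3, 100, 100)."""
    model.eval()                       # inference mode for dropout / batch norm
    with torch.no_grad():              # no gradients are needed for prediction
        outp = model(w.unsqueeze(0))   # add batch dimension: (1, 3, 100, 100) -> (1, 50)
        x = outp.argmax(dim=1).item()  # index of the highest-scoring class
    return "This is " + classes[x]

w = torch.rand(3, 100, 100)            # random tensor in place of a real image
label = predict(model, w, classes)
print(label)
```

The class with the highest score is the prediction, so no softmax is needed just to pick a label.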

One layer consists of conv-relu-maxpool.
And I am classifying different faces.
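
With that definition of a layer, a three-block network for 3x100x100 inputs and 50 classes could be sketched roughly as follows. The class name `SmallCNN`, the channel widths (16, 32, 64), the 3x3 kernels with padding 1, and the single linear classifier are all assumptions chosen for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One "layer" as defined in this thread: conv -> relu -> maxpool.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # keeps height/width
        nn.ReLU(),
        nn.MaxPool2d(2),                                     # halves height/width (floor)
    )

class SmallCNN(nn.Module):  # hypothetical name
    def __init__(self, num_classes=50, channels=(16, 32, 64), input_size=100):
        super().__init__()
        blocks, in_ch, size = [], 3, input_size
        for out_ch in channels:
            blocks.append(conv_block(in_ch, out_ch))
            in_ch = out_ch
            size //= 2                  # 100 -> 50 -> 25 -> 12 for three blocks
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Linear(in_ch * size * size, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)         # (N, C, H, W) -> (N, C*H*W)
        return self.classifier(x)       # raw scores (logits), one per class

model = SmallCNN()
out = model(torch.rand(2, 3, 100, 100))  # batch of two random "images"
print(out.shape)                         # torch.Size([2, 50])
```

Changing the length of `channels` changes the number of blocks, which makes it easy to compare 3, 4, or 5 blocks on a validation set instead of guessing.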