Searching for a project in Deep Learning

LsTam91 · November 20, 2020, 6:24pm

Hello,

I have to find and carry out a deep learning project for my master’s degree, but I don’t know what to choose. We will have 20 days to complete it. My partner and I are beginners in deep learning; we have already done an image classification project using transfer learning. Do you have any ideas for an interesting beginner project?

Ivan_A · November 20, 2020, 6:50pm

@ptrblck @albanD @Deeply
Hello, I hope they can help us. We are really interested in hearing your opinion on this topic; I think it will be really instructive for all of us.

ptrblck · November 20, 2020, 8:29pm

20 days is not a lot of time, so I would suggest at least using an already available (and clean) dataset.
Given this requirement, I would then look for projects that both of you find interesting from a personal point of view, and make sure the topic meets the requirements of your university.
Do you have any recommended projects, topics, or other information, or are you completely free to pick your topic for the 20 days?

LsTam91 · November 21, 2020, 9:02am

They suggested that we participate in a challenge, or take a dataset and test a new algorithm on it, but we should bring a new/interesting approach to the implementation, and we haven’t found a good or reachable idea to start with.

ptrblck · November 21, 2020, 9:53am

I don’t know if this would fit the criteria (so you should make sure your advisor would be OK with it), but you could take a look at (past) competitions from e.g. DrivenData and see if new approaches could yield any new results/findings.

LsTam91 · November 21, 2020, 9:57am

Thanks, I will take a look

Deeply · November 23, 2020, 10:57am

Here is an idea that occurred to me but that I was not able to try due to lack of time. I’m not sure if it has been tackled before! My guess is not, but you can search to find out.

We report the classification accuracy in any classification problem. This metric, however, can be deceptive for several reasons: data imbalance, the percentage of incorrectly labeled samples (especially in the test set, if any), and the number of classes (the latter would be interesting to tackle).
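
To make the data-imbalance point concrete, here is a minimal sketch in plain Python. The labels are made up for illustration (90 samples of class 0, 10 of class 1), and `balanced_accuracy` is a hypothetical helper that averages per-class recall; it is one possible complement to plain accuracy, not the only one.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Made-up imbalanced test set: 90% class 0, 10% class 1.
y_true = [0] * 90 + [1] * 10
# A "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

The always-majority predictor looks strong on plain accuracy while learning nothing about class 1, which is the kind of deception described above.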

Now, the simplest problems to consider are CIFAR-10 and CIFAR-100. I think the best reported performance is by this work, and the code is available in PyTorch (EffNet-L2 achieved 99.70% and 96.08% on CIFAR-10 and CIFAR-100, respectively).
The research question: Is EffNet-L2 doing a better job on CIFAR-10 or on CIFAR-100?

The problem, hence, is how to statistically quantify model performance as the number of classes/categories increases. This should be an additional metric that complements the classification accuracy. One caveat that might affect our judgement is the 10% of samples (which I consider a bit low) used in the testing phase, as it might affect model generalization.
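
As a possible starting point only, the sketch below tries two simple candidates on the accuracies quoted above: a chance-corrected accuracy, which rescales accuracy so that uniform random guessing over K balanced classes scores 0, and a Wilson confidence interval, which shows how much uncertainty the test-set size leaves. The function names are hypothetical, the balanced-classes baseline of 1/K is an assumption, and `N_TEST = 10_000` assumes the standard CIFAR test split; neither quantity is claimed to be the right answer to the research question.

```python
import math

def chance_corrected_accuracy(acc, num_classes):
    """Rescale accuracy so uniform guessing maps to 0 and perfect to 1.

    Assumes balanced classes, so the chance level is 1 / num_classes.
    """
    chance = 1.0 / num_classes
    return (acc - chance) / (1.0 - chance)

def wilson_interval(acc, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    denom = 1.0 + z * z / n
    center = (acc + z * z / (2 * n)) / denom
    half = z * math.sqrt(acc * (1 - acc) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

N_TEST = 10_000  # assumption: standard CIFAR test-set size
# Accuracies quoted in the post: (accuracy, number of classes).
results = {"CIFAR-10": (0.9970, 10), "CIFAR-100": (0.9608, 100)}

report = {}
for name, (acc, k) in results.items():
    corrected = chance_corrected_accuracy(acc, k)
    low, high = wilson_interval(acc, N_TEST)
    report[name] = (corrected, (low, high))
    print(f"{name}: corrected={corrected:.4f}, 95% CI=[{low:.4f}, {high:.4f}]")
```

Because the chance level is already close to zero for both datasets, this correction barely changes the ranking, which suggests that a metric capturing task difficulty would need more than a chance baseline (for example, per-class error rates or a comparison against a reference model).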