Wasting my time?

First off, I’m relatively new to python in general and extremely new to pytorch, so go easy on me if you check out the modules. (I’m still working through pylint issues after adapting from a notebook… :slight_smile:)

Either way, I’ve written a few modules that allow me to generate an arbitrary number of pytorch CNNs with randomized convolution, pooling, and linear layers. All “random” values (e.g. number of convolution layers, convolution out channels/kernel size, linear layer out features, include/don’t include a max pool layer, etc.) are subject to constraints that can be supplied using the kwargs of the builder class (NetDictionary).
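For readers skimming, the core idea can be sketched in a few lines of plain Python. Everything below is a hypothetical stand-in, not the actual NetDictionary API: the function name, constraint arguments, and ranges are made up to illustrate drawing a layer spec under supplied constraints.

```python
import random

def random_cnn_spec(max_conv_layers=4, max_out_channels=64,
                    kernel_sizes=(3, 5), max_linear_features=256,
                    seed=None):
    """Draw a random CNN layer specification within the given constraints.

    Returns a list of layer dicts rather than real nn.Modules, to keep
    the sketch framework-free.
    """
    rng = random.Random(seed)
    spec = []
    for _ in range(rng.randint(1, max_conv_layers)):
        spec.append({"type": "conv",
                     "out_channels": rng.randint(8, max_out_channels),
                     "kernel_size": rng.choice(kernel_sizes)})
        if rng.random() < 0.5:  # optionally follow with a max pool layer
            spec.append({"type": "maxpool", "kernel_size": 2})
    spec.append({"type": "linear",
                 "out_features": rng.randint(16, max_linear_features)})
    return spec
```

A builder class would then walk such a spec and instantiate the matching `nn.Conv2d`/`nn.MaxPool2d`/`nn.Linear` layers.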

My original use case was analyzing the data in the Mexican Covid19 dataset by converting it into an image and then using a CNN to “label” each case based on actual outcome. The results were ultimately on par with (and slightly better than) models based on traditional linear/PCA techniques. The ‘utilities’ module I created is specific to this use case, but the other two modules could be used in other applications, which leads me to my question:

Could this functionality (i.e. the ability to generate and test random CNN layer structures and parameterizations) be applied to other use cases? The ultimate end game might be setting up a NN that builds and (partially) trains a series of CNNs and then automatically adjusts the constraints (number of layers of each type + layer parameterization) on the CNNs generated in the next pass based on the results.
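That outer loop could look roughly like this toy sketch. Everything here is hypothetical: `score_fn` stands in for partially training a network and measuring validation loss, and the "constraint" is reduced to a single layer-count range that gets re-centered on the best sample each round.

```python
import random

def evolve_constraints(constraints, score_fn, rounds=3, population=5):
    """Toy outer loop: sample layer counts under the current constraints,
    score each one, then re-center the constraints on the best sample."""
    rng = random.Random(0)
    lo, hi = constraints
    for _ in range(rounds):
        samples = [rng.randint(lo, hi) for _ in range(population)]
        best = min(samples, key=score_fn)  # lower score = better loss
        lo = max(1, best - 1)              # tighten the range around the winner
        hi = best + 1
    return lo, hi
```

A real version would of course adjust many constraints at once (channels, kernel sizes, linear features) rather than a single integer range.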

Is this reinventing the wheel? Completely bonkers? Improbable but maybe possibly possible? All thoughts are appreciated. Example basic usage (i.e. I haven’t tried to build an external network to dial in my CNNs… :sweat_smile:):

from modules.network_dictionary_builder import NetDictionary
from modules.network_dictionary_analyzer import NetDictionaryAnalyzer
from modules.utilities import *

# global constants
NET_PATH = './networks.tar'
DATA_PATH = './datasets.tar'
IMAGE_DEPTH = 4
LABELS = ('Hospitalized', 'Intubated', 'Deceased', 'Pneumonia')
COLUMNS = ('Male', 'Pregnant', 'Diabetes', 'Asthma', 'Immunocompromised',
           'Hypertension', 'Other Disease', 'Cardiovascular Disease', 'Obesity', 'Kidney Disease',
           'Tobacco Use', 'COPD')
itc = ImageTensorCreator(IMAGE_DEPTH, COLUMNS)
LOSS_RECORDING_RATE = 1

# main functions
test_tensor = itc.create_fake_data(40, ["Diabetes"])
network_dictionary = NetDictionary(3, test_tensor, len(LABELS), NET_PATH, force_rebuild=False, force_training=True)

ccd = CovidCnnDataset(DATA_PATH, itc, pyodbc_conn_string='DSN=covid;UID=seh;PWD=Welcome2020!;',
                                      query="{CALL getpydatav2}",
                                      #force_rebuild=True,
                                      approx_dataset_size=54000,
                                      validation_ratio=0.4)

network_dictionary.train_validate_networks(ccd.train_data, ccd.validation_images,
                                           ccd.validation_labels, LOSS_RECORDING_RATE)

network_dictionary.export_networks()

network_analysis = NetDictionaryAnalyzer(network_dictionary)
network_analysis.plot_losses()

Hi,

Just a quick note, you might want to double check meta learning and for example the higher library that is geared towards this.
Unfortunately, I’m not familiar enough with the literature to be of much help beyond that :confused:


That seems like a special case of hyperparameter tuning: instead of playing around with learning rates etc., you are adjusting layers. I’ve used tune, and it should be able to do what you are looking for. You would just need to fill out the config with whatever parameters you want to search over (in your case, the layer information you want to vary), and then you can do a grid search over those. You can then use those values in your network instantiation.
For something more advanced than a plain grid search, especially if there are computational constraints, you can use something like hyperopt with tune, which should be better than a blind grid search.
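Stripped of the tune machinery, a config-driven grid search boils down to something like this plain-Python stand-in (the search-space keys and the toy score function are made up for illustration; a real `score_fn` would build the network from the config and return its validation loss):

```python
from itertools import product

# Hypothetical search space over layer hyperparameters, in the spirit of
# a tune-style config (this is a plain-Python stand-in, not the tune API).
search_space = {
    "num_conv_layers": [1, 2, 3],
    "kernel_size": [3, 5],
    "linear_features": [64, 128],
}

def grid_search(space, score_fn):
    """Exhaustively evaluate every combination and return the best config."""
    keys = list(space)
    best_cfg, best_score = None, float("inf")
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = score_fn(cfg)  # e.g. validation loss after partial training
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg

# toy score that happens to prefer deeper nets with small kernels
best = grid_search(search_space,
                   lambda c: c["kernel_size"] - c["num_conv_layers"])
```

Tune adds the useful parts on top of this: parallel trials, early stopping, and smarter search algorithms than exhaustive enumeration.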


Your use case sounds really interesting. :slight_smile:

In addition to the mentioned approaches, you might also want to check out differentiable architecture search approaches, such as DARTS. (I don’t know if it’s the state of the art or what is used nowadays :wink: ).


It sounds a little bit like a task for a genetic algorithm. You start with a population of random genomes, where each genome encodes the architecture of a model, and through selection/mutation/reproduction you create new models. The fitness function has to depend on the test accuracy and probably on the network complexity so your networks don’t blow up.
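A minimal sketch of that loop, with a toy genome (a list of conv-layer widths) and a toy fitness function standing in for real training/validation. All names and numbers are illustrative:

```python
import random

rng = random.Random(42)

def random_genome():
    # genome: a list of conv-layer widths (a stand-in for a full architecture)
    return [rng.choice([8, 16, 32, 64]) for _ in range(rng.randint(1, 4))]

def mutate(genome):
    # change one randomly chosen layer's width
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] = rng.choice([8, 16, 32, 64])
    return child

def evolve(fitness, generations=5, pop_size=8, keep=4):
    """Keep the fittest genomes each generation; refill by mutating survivors."""
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)  # higher fitness is better
        survivors = pop[:keep]
        pop = survivors + [mutate(rng.choice(survivors))
                           for _ in range(pop_size - keep)]
    return max(pop, key=fitness)

# toy fitness: total width as a proxy for accuracy, minus a complexity
# penalty so networks don't blow up
best = evolve(lambda g: sum(g) - 10 * len(g))
```

In a real run, `fitness` would partially train each candidate and return validation accuracy minus a parameter-count penalty, which is exactly the expensive part mentioned below.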

I’m not knowledgeable enough to say if this is truly a viable way to go in principle. Obviously, there are practical issues, particularly that each generation requires training and testing the whole current population of networks.


@vdw Agreed. Amount of compute will be the biggest obstacle to something like this actually being useful, but thanks for breaking down a problem where it might actually have a use case…

Thanks to all for the suggestions! @ptrblck DARTS looks like the mature version of what I’ve been playing around with. Thanks for the link! I’ll be doing some reading for the next few weeks… :sweat_smile:

Well it turns out that there’s actually a ton of literature out there regarding automated CNN and RNN architecture searches (thanks for the DARTS link and corresponding rabbit hole @ptrblck :wink:).

I’m only 4 articles deep at this point, but none of the approaches thus far utilize truly randomized layer structures/parameterizations as a seed. They all seem to require the user to supply a predetermined set of potential architectures and then build networks via ensembles of those candidates (e.g. scaling a simple CIFAR net to a broader use case like ImageNet through repetition of a proven “cell”), or to examine multiple connectivity schemas for a predefined set of cells, as DARTS does.

With that in mind, do you think that randomized approaches were examined and discarded due to performance issues or computational issues? If the latter, do you think the amount of progress in the last couple of years might have loosened computational constraints sufficiently to allow an approach like this to actually work?

Thanks for any feedback!
–SEH


I would guess that providing potential building blocks might reduce the overall complexity of the search space and could thus yield results in a reasonable time frame.
It would also be a bit similar to “manually” manipulating a model, à la: “what if I put the batchnorm layer before the ReLU?”

That being said, I’m not deeply familiar with the latest research in this direction, but a complete generation of a model “from scratch” would be fascinating to see. :slight_smile:
