I am working on implementing a custom function (and its corresponding module) that will be part of the sequential execution of a layer. When running it, training is extremely slow. I even commented out the computations, leaving only the loops and the calls to the custom function, and even then it is still very slow. The looping could be the main cause, but that seems unlikely, since the tensor sizes are small: [20, 1, 28, 28] and [20, 32, 12, 12], corresponding to the two layers. The most relevant parts of the code are the following:
```python
class FrankWolfe_MinCutSum_Canonical(Function):

    @staticmethod
    def in_bounds(ctx, coord):
        (x, y) = coord
        return 0 <= x < ctx.width and 0 <= y < ctx.height

    @staticmethod
    def neighbors(ctx, coord):
        (x, y) = coord
        return list(filter(lambda p: FrankWolfe_MinCutSum_Canonical.in_bounds(ctx, p),
                           [(x+1, y), (x, y-1), (x-1, y), (x, y+1)]))

    @staticmethod
    def relaxed_taylor_closed_form_solution(ctx, w):
        u_star = ctx.beta
        for x in range(ctx.width):
            for y in range(ctx.height):
                i = (x, y)
                u_star[i] -= ctx.alpha * sum([(1 if w[i] > w[j] else -1)
                                              for j in FrankWolfe_MinCutSum_Canonical.neighbors(ctx, i)])
        return u_star

    @staticmethod
    def forward(ctx, beta, alpha=1, tol=1e-6, max_iter=20):
        # print(beta)
        ctx.height, ctx.width = list(beta.shape)
        ctx.alpha, ctx.beta = alpha, beta
        v = beta
        # for t in range(max_iter):
        #     w = v.exp().div(v.exp().add(1))
        #     u_star = FrankWolfe_MinCutSum_Canonical.relaxed_taylor_closed_form_solution(ctx, w)
        #     gamma_t = 2 / (t + 2)
        #     v_1 = v
        #     v = (1 - gamma_t)*v_1 + gamma_t*u_star
        #     if torch.norm(v - v_1) < tol:
        #         break
        ctx.save_for_backward(v.clone())
        return v

    @staticmethod
    def backward(ctx, grad_output):
        v, = ctx.saved_variables
        # return v * grad_output
        return grad_output


class LateralInteractions(nn.Module):
    def __init__(self):
        super(LateralInteractions, self).__init__()

    def forward(self, x):
        out = x
        batch_size, channels, _, _ = list(x.shape)
        for b in range(batch_size):
            for c in range(channels):
                # continue
                # out[b, c] = FrankWolfe_MinCutSum_Canonical.apply(Variable(x[b, c], requires_grad=True))
                out[b, c] = FrankWolfe_MinCutSum_Canonical.apply(x[b, c])
        return out
```
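To put rough numbers on the looping suspicion, I counted the Python-level work per forward pass. The [20, 10, 8, 8] shape for layer2's `LateralInteractions` input is my own inference from the conv kernel/stride/padding, so treat this as a back-of-the-envelope sketch:

```python
# Rough count of Python-level work per forward pass (a sketch; the
# [20, 10, 8, 8] shape for layer2 is inferred from kernel/stride/padding).
batch = 20

# one FrankWolfe_MinCutSum_Canonical.apply per (b, c) slice
apply_calls = batch * 32 + batch * 10
print(apply_calls)  # 840 autograd graph nodes built on every forward pass

# inner x/y loops (one visit per pixel), if the commented-out code were active
pixel_visits = batch * 32 * 12 * 12 + batch * 10 * 8 * 8
max_iter = 20  # Frank-Wolfe iterations
print(pixel_visits * max_iter)  # ~2.1 million Python-level pixel updates
```

So even with the computations commented out, every forward pass still builds 840 separate autograd nodes from pure Python.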
I left the comments in so you can see that, right now, the function basically passes the input through as output without changing anything. The network then looks as follows:
```python
class ConvNetLat(nn.Module):
    def __init__(self, n=10):
        super(ConvNetLat, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, stride=2, padding=1),
            LateralInteractions())
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, n, kernel_size=7, stride=1, padding=1),
            LateralInteractions(),
            nn.AdaptiveAvgPool2d(1))
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):            # [20, 1, 28, 28]
        out = self.layer1(x)         # [20, 32, 12, 12]
        out = self.layer2(out)       # [20, 10, 1, 1]
        out = out.reshape(out.size(0), -1)
        out = self.log_softmax(out)
        return out
```
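Since the function is currently an identity anyway, one rewrite I have been sketching (class names are mine, purely hypothetical) is to call the `Function` once on the whole [B, C, H, W] tensor instead of once per (b, c) slice, which removes the nested Python loops and builds a single autograd node per module call:

```python
import torch
from torch.autograd import Function


class LateralInteractionsBatchedFn(Function):
    """Hypothetical sketch: operate on the full [B, C, H, W] tensor at once,
    instead of one Function.apply per (b, c) slice."""

    @staticmethod
    def forward(ctx, beta):
        v = beta  # identity for now, like the commented-out version
        ctx.save_for_backward(v.clone())
        return v

    @staticmethod
    def backward(ctx, grad_output):
        # identity backward, matching the current behaviour
        return grad_output


class LateralInteractionsBatched(torch.nn.Module):
    def forward(self, x):
        # no b/c loops: one autograd node for the whole batch
        return LateralInteractionsBatchedFn.apply(x)
```

I have not measured it carefully, but collapsing 640 `apply` calls into one per layer should at least isolate whether the loop itself is the bottleneck.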
What could I do to improve the training performance? It is literally taking hours to train as is (with the computations commented out and all).
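One direction I have been considering for when I re-enable the inner loops: the per-pixel neighbour comparison in `relaxed_taylor_closed_form_solution` can be vectorised with shifted array slices instead of the x/y loops. A NumPy sketch of the idea (function name is mine; not wired into autograd, just to show the slicing):

```python
import numpy as np


def relaxed_taylor_closed_form_solution_vec(beta, w, alpha=1.0):
    """Vectorised equivalent of the x/y/neighbour loops: for each pixel i,
    u_star[i] = beta[i] - alpha * sum_j (+1 if w[i] > w[j] else -1),
    summing over the in-bounds 4-neighbours j of i."""
    sign = lambda a, b: np.where(a > b, 1.0, -1.0)
    s = np.zeros_like(w, dtype=float)
    # each shifted slice pairs every pixel with one neighbour direction;
    # slicing keeps only in-bounds pairs, like the original in_bounds check
    s[:-1, :] += sign(w[:-1, :], w[1:, :])   # neighbour one row down
    s[1:, :]  += sign(w[1:, :], w[:-1, :])   # neighbour one row up
    s[:, :-1] += sign(w[:, :-1], w[:, 1:])   # neighbour one column right
    s[:, 1:]  += sign(w[:, 1:], w[:, :-1])   # neighbour one column left
    return beta - alpha * s
```

The same slicing pattern should carry over to torch tensors, which would keep the whole update on the C++ side instead of ~144 Python iterations per slice.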
Any help will be greatly appreciated.