Mini-batches in a PyTorch custom model

Hi All,

I have built a custom autoencoder and have it working reasonably well. In an attempt to improve speed/performance, I have attempted to implement batch training.

Looking at the PyTorch.org site, it appeared that setting the batch size in the dataloader and implementing an extra loop under the epoch loop would be enough for PyTorch to ‘somehow’ figure out that the model was being fed batches and to optimize appropriately.

https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html#full-implementation

I couldn't see any additional advice, so I did just that. On running, I get an error inside the model from tensors not lining up, and on checking the forward method in the model, I find that (as we might expect) the full batch hits the model.

Is there some additional work I need to do inside the model so it knows that a batch is being used rather than a series of single training samples? Should I actually be implementing processes that just ‘do the math’ on tensors with an extra dimension then average out a result? If so, does PyTorch supply methods to efficiently apply custom functions to ‘slices’ of tensors? (they don’t have variable parameters so don’t need gradients)

Keen for advice on how a person would typically implement minibatches using a custom model (based on nn.Module).

Thanks and regards,

Simon

Your description is mostly correct: creating a Dataset and wrapping it in a DataLoader will create batches of samples in each iteration. The typical code would look like:

dataset = MyDataset()
loader = DataLoader(dataset, batch_size=batch_size, ...)

# iterate the DataLoader
for data, target in loader:
    # data and target will contain a batch of samples
    ...

PyTorch’s modules accept inputs in the shape [batch_size, *], where * denotes additional dimensions which depend on the type of layer. This would mean that no additional changes are needed to use batches of data in your model.
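
For example, a toy module with placeholder feature sizes (not your actual model) accepts a single sample (with a batch dimension of 1) and a full batch with exactly the same code:

import torch
import torch.nn as nn

# toy stand-in module; the feature sizes are placeholders, not your model
model = nn.Sequential(nn.Linear(2601, 64), nn.ReLU(), nn.Linear(64, 2601))

single = torch.randn(1, 2601)    # one sample with a batch dimension of 1
batch = torch.randn(1000, 2601)  # a batch of 1000 samples

print(model(single).shape)  # torch.Size([1, 2601])
print(model(batch).shape)   # torch.Size([1000, 2601])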

Could you explain a bit more where exactly you are stuck and post a code snippet, in case that helps clear up the confusion?

Hi @ptrblck,
Thanks so much for your response. Understanding a little about what should be going on helped me get to the heart of the matter.
I think these lines are the problem:

        z = x[0].unsqueeze(0) if len(x[0].shape) == 1 else x[0]
        dz = x[1].unsqueeze(0) if len(x[1].shape) == 1 else x[1]

The inbound tensor was 3x2601, which represents three 51x51 images. I had put in the lines above since, when I split the three 'slices' of the tensor into three separate data objects, I was left with lists rather than tensors.

However, this same sequence is now receiving a 1000x3x2601 tensor and it's unsqueezing the wrong dimension.

Given your earlier comments, I'm thinking I need to refactor to ensure that the submitted object always has an explicit three dimensions. Then, to get those two slices, I'd do something like:

z = x[:, 0, :]
dz = x[:, 1, :]

I imagine there will be this sort of funny stuff all the way through my custom code but I think I get the pattern.

One issue I am going to hit is that at one point I pass dz into a function where a bunch of math gets done, and the loss function gets the result of that computation. As I said previously, the function is a straight map, so no learnable variables are involved and no gradient is needed. However, for that function my first instinct is to use a 'for' loop to iterate over the whole batch, creating a batch of results that I then put into a torch.cat or something. My second instinct is horror at the fact that I'm putting a Python for loop into a process that I want to be fast. If you know any approach I might take to get around that 'for' loop with my function, I'm all ears.
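
To make that concrete, my first-instinct version would look roughly like this sketch (per_sample_math is a made-up stand-in for the real function, not my actual code):

    import torch

    def per_sample_math(dz_b: torch.Tensor) -> torch.Tensor:
        # stand-in for the real per-sample computation (no learnable parameters)
        return torch.sin(dz_b) * dz_b

    dz = torch.randn(1000, 3, 1)  # a batch of dz tensors
    out = torch.stack([per_sample_math(dz[b]) for b in range(dz.shape[0])], dim=0)
    print(out.shape)  # torch.Size([1000, 3, 1])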

I really appreciate your help - you've already made a big difference to what was looking to be a real headache. Out of interest, I am building a version of the system below, which was originally designed in TensorFlow 1. It has been a great learning opportunity and, although the original developers did a great job, I look forward to trying new tricks once it is in a modern framework.

Regards,

Simon

Your concern is reasonable, and a loop can often be replaced with a batched operation; however, it depends on the operation(s) used, of course.
Could you post the loop approach with (random) input tensors so that I can take a look at it and see if a batched reference implementation might be possible?

Hi @ptrblck
Thanks so much for your kind offer.

Here is the loop as it looks for a single input. This is the original (or close to it) as it was implemented in TensorFlow 1. Although I make no claims to being an expert in reading the TF1 code, my implementation in PyTorch did seem to show signs of convergence. It's not quite as good as the TF code and it takes longer to run, but I think that's because, first, the batches seem to just get 'handled' implicitly in TF1 and, secondly, the training set involves patterns with significant geometric symmetry. I can believe that batching across a full period of the experiment would lead to gradients that cancel or reinforce in many cases to provide a really good learning surface.

Bear in mind that at a batch size of 1000, there would be 1000 of these, and x_inputs.shape changes from [3, 1] to [1000, 3, 1].

The original [3,1] test tensor would look something like:

tensor([[-3.1019e-01],
        [-1.5864e-03],
        [ 9.3070e-03]])

For the dictionary values used, we have:

    parameters = {"latent_dim" : 1, "model_order" : 2, "poly_order" : 3, "include_sine" : True}

This is much more than I would expect anyone to deal with - if going into the detail is more than you want to take on, I'd appreciate any pointers to useful torch functions.

Regards and thanks again for the interest,

Simon

    def build_sindy_theta(self, x_inputs: torch.Tensor) -> torch.Tensor:
        """
        This function creates the state for Theta given input latent variable(s). The function is implemented for first
        or second order systems. The second order system includes library terms for the derivatives of the latent
        variable. Note that in the case where a second order system is used, z here is actually a concatenation of z and
        dz to z,dz. It is assumed that z and dz are elements 0 and 1 on the first axis of the tensor x_inputs supplied
        in the first argument. This z and dz are the latent variables produced by the encoder part of the autoencoder
        prior to decoding.
        The parameters dictionary must include:
            * latent_dim   - the number of latent variables needed to describe the system.
            * poly_order   - the maximum number of arguments in the polynomial describing the system.
            * include_sine - Whether to include sine in the arguments used to describe the system.
            * model_order  - the order of the model needed to describe the system (1 or 2)
                             If a second order model is needed, then dx, the derivative of x_inputs will be used as well
                             as x_inputs.
        """

        # TODO: remove the nested iterations. Also replace 'if' with range(poly_order)
        # z_slice: torch.Tensor = x_inputs[0].unsqueeze(0) if len(
        #     x_inputs[0].shape
        # ) == 1 else x_inputs[0]
        z_slice: torch.Tensor = x_inputs[0]
        library = [torch.ones(z_slice.shape[0], device=self.target_device)]

        latent_dim = self.params["latent_dim"]

        # In the case where a non-linear dynamic process is suspected, both z and dz will be modelled. In that case, a
        # set of parameters will be needed for both, so we double the number of parameters needed in the latent
        # dimensions (latent_dim) parameter.The latent variable list used in the model is reconstructed to be a
        # concatenation of both z and dz.
        if self.params["model_order"] == 2:
            # dz_slice = x_inputs[1].unsqueeze(0) if len(x_inputs[1].shape) == 1 else x_inputs[1]
            dz_slice = x_inputs[1]
            z_slice = torch.cat([z_slice, dz_slice], 1)  # concatenate.
            latent_dim = 2 * self.params["latent_dim"]

        for i in range(latent_dim):
            library.append(z_slice[:, i])

        # The parameters for z are now provided for each latent variable.
        # A list is constructed such that each parameter is included with a group of all possible combinations of each
        # other parameters.
        if self.params["poly_order"] > 1:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    library.append(torch.mul(z_slice[:, i], z_slice[:, j]))

        if self.params["poly_order"] > 2:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    for k in range(j, latent_dim):
                        library.append(
                            z_slice[:, i] * z_slice[:, j] * z_slice[:, k]
                        )

        if self.params["poly_order"] > 3:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    for k in range(j, latent_dim):
                        for p in range(k, latent_dim):
                            library.append(
                                z_slice[:, i]
                                * z_slice[:, j]
                                * z_slice[:, k]
                                * z_slice[:, p]
                            )

        if self.params["poly_order"] > 4:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    for k in range(j, latent_dim):
                        for p in range(k, latent_dim):
                            for q in range(p, latent_dim):
                                library.append(
                                    z_slice[:, i]
                                    * z_slice[:, j]
                                    * z_slice[:, k]
                                    * z_slice[:, p]
                                    * z_slice[:, q]
                                )

        if self.params["include_sine"]:
            for i in range(latent_dim):
                library.append(torch.sin(z_slice[:, i]))

        # this is the variable Theta.
        # torch.stack(library, axis=1).float()
        return torch.stack(library).float()

It's hard to tell if a speedup would be expected, as the operations are quite small by themselves.
While a loop would add a certain overhead, the cost of dispatching these small workloads could also be visible.
In any case, here is a draft of code avoiding loops for the first operations (the higher poly ops could most likely also be avoided, but they are unused in your current use case anyway):

def build_sindy_theta(x_inputs: torch.Tensor) -> torch.Tensor:
    """
    This function creates the state for Theta given input latent variable(s). The function is implemented for first
    or second order systems. The second order system includes library terms for the derivatives of the latent
    variable. Note that in the case where a second order system is used, z here is actually a concatenation of z and
    dz to z,dz. It is assumed that z and dz are elements 0 and 1 on the first axis of the tensor x_inputs supplied
    in the first argument. This z and dz are the latent variables produced by the encoder part of the autoencoder
    prior to decoding.
    The parameters dictionary must include:
        * latent_dim   - the number of latent variables needed to describe the system.
        * poly_order   - the maximum number of arguments in the polynomial describing the system.
        * include_sine - Whether to include sine in the arguments used to describe the system.
        * model_order  - the order of the model needed to describe the system (1 or 2)
                         If a second order model is needed, then dx, the derivative of x_inputs will be used as well
                         as x_inputs.
    """

    # TODO: remove the nested iterations. Also replace 'if' with range(poly_order)
    # z_slice: torch.Tensor = x_inputs[0].unsqueeze(0) if len(
    #     x_inputs[0].shape
    # ) == 1 else x_inputs[0]
    z_slice: torch.Tensor = x_inputs[0]
    library = [torch.ones(z_slice.shape[0], device=x_inputs.device)]

    latent_dim = parameters["latent_dim"]

    # In the case where a non-linear dynamic process is suspected, both z and dz will be modelled. In that case, a
    # set of parameters will be needed for both, so we double the number of parameters needed in the latent
    # dimensions (latent_dim) parameter.The latent variable list used in the model is reconstructed to be a
    # concatenation of both z and dz.
    if parameters["model_order"] == 2:
        # dz_slice = x_inputs[1].unsqueeze(0) if len(x_inputs[1].shape) == 1 else x_inputs[1]
        dz_slice = x_inputs[1]
        z_slice = torch.cat([z_slice, dz_slice], 1)  # concatenate.
        latent_dim = 2 * parameters["latent_dim"]

    for i in range(latent_dim):
        library.append(z_slice[:, i])

    # The parameters for z are now provided for each latent variable.
    # A list is constructed such that each parameter is included with a group of all possible combinations of each
    # other parameters.
    if parameters["poly_order"] > 1:
        for i in range(latent_dim):
            for j in range(i, latent_dim):
                library.append(torch.mul(z_slice[:, i], z_slice[:, j]))

    if parameters["poly_order"] > 2:
        for i in range(latent_dim):
            for j in range(i, latent_dim):
                for k in range(j, latent_dim):
                    library.append(
                        z_slice[:, i] * z_slice[:, j] * z_slice[:, k]
                    )

    if parameters["poly_order"] > 3:
        for i in range(latent_dim):
            for j in range(i, latent_dim):
                for k in range(j, latent_dim):
                    for p in range(k, latent_dim):
                        library.append(
                            z_slice[:, i]
                            * z_slice[:, j]
                            * z_slice[:, k]
                            * z_slice[:, p]
                        )

    if parameters["poly_order"] > 4:
        for i in range(latent_dim):
            for j in range(i, latent_dim):
                for k in range(j, latent_dim):
                    for p in range(k, latent_dim):
                        for q in range(p, latent_dim):
                            library.append(
                                z_slice[:, i]
                                * z_slice[:, j]
                                * z_slice[:, k]
                                * z_slice[:, p]
                                * z_slice[:, q]
                            )

    if parameters["include_sine"]:
        for i in range(latent_dim):
            library.append(torch.sin(z_slice[:, i]))

    # this is the variable Theta.
    # torch.stack(library, axis=1).float()
    return torch.stack(library).float()



def my_build_sindy_theta(x_inputs: torch.Tensor) -> torch.Tensor:
    """
    This function creates the state for Theta given input latent variable(s). The function is implemented for first
    or second order systems. The second order system includes library terms for the derivatives of the latent
    variable. Note that in the case where a second order system is used, z here is actually a concatenation of z and
    dz to z,dz. It is assumed that z and dz are elements 0 and 1 on the first axis of the tensor x_inputs supplied
    in the first argument. This z and dz are the latent variables produced by the encoder part of the autoencoder
    prior to decoding.
    The parameters dictionary must include:
        * latent_dim   - the number of latent variables needed to describe the system.
        * poly_order   - the maximum number of arguments in the polynomial describing the system.
        * include_sine - Whether to include sine in the arguments used to describe the system.
        * model_order  - the order of the model needed to describe the system (1 or 2)
                         If a second order model is needed, then dx, the derivative of x_inputs will be used as well
                         as x_inputs.
    """

    # TODO: remove the nested iterations. Also replace 'if' with range(poly_order)
    # z_slice: torch.Tensor = x_inputs[0].unsqueeze(0) if len(
    #     x_inputs[0].shape
    # ) == 1 else x_inputs[0]
    z_slice: torch.Tensor = x_inputs[0]
    library = [torch.ones(1, z_slice.shape[0], device=x_inputs.device)]

    latent_dim = parameters["latent_dim"]

    # In the case where a non-linear dynamic process is suspected, both z and dz will be modelled. In that case, a
    # set of parameters will be needed for both, so we double the number of parameters needed in the latent
    # dimensions (latent_dim) parameter.The latent variable list used in the model is reconstructed to be a
    # concatenation of both z and dz.
    if parameters["model_order"] == 2:
        # dz_slice = x_inputs[1].unsqueeze(0) if len(x_inputs[1].shape) == 1 else x_inputs[1]
        dz_slice = x_inputs[1]
        z_slice = torch.cat([z_slice, dz_slice], 1)  # concatenate.
        latent_dim = 2 * parameters["latent_dim"]
    
    library.append(z_slice[:, torch.arange(latent_dim)].t())

    # The parameters for z are now provided for each latent variable.
    # A list is constructed such that each parameter is included with a group of all possible combinations of each
    # other parameters.   
    if parameters["poly_order"] > 1:
        tmp = z_slice.unsqueeze(1) * z_slice.unsqueeze(2)    
        mask = torch.empty_like(tmp).bool().fill_(True)
        mask = mask.tril()
        library.append(tmp[mask].view(tmp.size(0), torch.prod(torch.tensor(tmp.size()[1:]))-1).t())
        
    if parameters["poly_order"] > 2:
        tmp = z_slice.unsqueeze(1) * z_slice.unsqueeze(2) * z_slice.unsqueeze(2)
        library.append(tmp.view(tmp.size(0), -1).t())

    if parameters["poly_order"] > 3:
        for i in range(latent_dim):
            for j in range(i, latent_dim):
                for k in range(j, latent_dim):
                    for p in range(k, latent_dim):
                        print(z_slice[:, i]
                        * z_slice[:, j]
                        * z_slice[:, k]
                        * z_slice[:, p])
                        library.append(
                            z_slice[:, i]
                            * z_slice[:, j]
                            * z_slice[:, k]
                            * z_slice[:, p]
                        )


    if parameters["poly_order"] > 4:
        for i in range(latent_dim):
            for j in range(i, latent_dim):
                for k in range(j, latent_dim):
                    for p in range(k, latent_dim):
                        for q in range(p, latent_dim):
                            library.append(
                                z_slice[:, i]
                                * z_slice[:, j]
                                * z_slice[:, k]
                                * z_slice[:, p]
                                * z_slice[:, q]
                            )

    if parameters["include_sine"]:
        # sine terms for all latent variables at once
        library.append(torch.sin(z_slice).t())

    # this is the variable Theta.
    return torch.cat(library).float()


parameters = {"latent_dim" : 1, "model_order" : 2, "poly_order" : 3, "include_sine" : True}


for i in range(3, 10):
    x = torch.randn(1000, i, 1)
    ref = build_sindy_theta(x)
    out = my_build_sindy_theta(x)

    print((ref - out).abs().max())

# tensor(2.9802e-08)
# tensor(2.3283e-10)
# tensor(7.4506e-09)
# tensor(2.9802e-08)
# tensor(5.9605e-08)
# tensor(2.3842e-07)
# tensor(9.5367e-07)

As you can see, some unused values are calculated and masked afterwards, which could also kill the performance gains (and would use more memory), so you should definitely profile your use case and see if a speedup is visible.
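
If you want to check, a quick comparison with torch.utils.benchmark (reusing the two functions and a batched input as defined above) would look something like this:

import torch
import torch.utils.benchmark as benchmark

x = torch.randn(1000, 3, 1)

t_loop = benchmark.Timer(
    stmt="build_sindy_theta(x)",
    globals={"build_sindy_theta": build_sindy_theta, "x": x})
t_batched = benchmark.Timer(
    stmt="my_build_sindy_theta(x)",
    globals={"my_build_sindy_theta": my_build_sindy_theta, "x": x})

# compare the measured runtimes of the loop and the batched implementations
print(t_loop.timeit(100))
print(t_batched.timeit(100))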

Hi @ptrblck,
I didn’t think you’d be on the case on the weekends!!
I trimmed a lot of the fat in the code below, but I will revisit it with your hints here.

One more thing - I'm getting killed by my tensors suddenly turning up on the CPU after being sent to the GPU using .to(torch.device("cuda")).

Do you have any 'tricks for young players'? In particular, I wondered if defining torch.Tensor/torch.tensor instead of torch.tensor.cuda was an issue. Not a big deal for now, but I've only got my code to run by peppering it with .to() calls, and I can only imagine what kind of back and forth is happening in the background.

As I say, I will come back with results after looking at your speed ups but this is what I have for now.

Thanks again so much for taking the time with this.

Simon

    def build_sindy_theta(self, x_inputs):
        """
        This function creates the state for Theta given input latent variable(s). The function is implemented for first
        or second order systems. The second order system includes library terms for the derivatives of the latent
        variable. Note that in the case where a second order system is used, z here is actually a concatenation of z and
        dz to z,dz. It is assumed that z and dz are elements 0 and 1 on the first axis of the tensor x_inputs supplied
        in the first argument. This z and dz are the latent variables produced by the encoder part of the autoencoder
        prior to decoding.
        The parameters dictionary must include:
            * latent_dim   - the number of latent variables needed to describe the system.
            * poly_order   - the maximum number of arguments in the polynomial describing the system.
            * include_sine - Whether to include sine in the arguments used to describe the system.
            * model_order  - the order of the model needed to describe the system (1 or 2)
                             If a second order model is needed, then dx, the derivative of x_inputs will be used as well
                             as x_inputs.
        """
        # TODO: generalise to cover the case where the latent variable is >1 (not sure it would work at present)
        # TODO: remove the nested iterations. Also replace 'if' with range(poly_order)
        # z_slice = x_inputs[0].unsqueeze(0) if len(
        #     x_inputs[0].shape
        # ) == 1 else x_inputs[0]

        # OLD:
        # z_slice= x_inputs[0]

        # x_inputs now 1000x3x1 this was 3 in old version, needs to be 1000x3 in new version
        # NEW
        z_slice = x_inputs[:, 0, :]
        # this is 1000x1 in new version and was 1 in old version.
        # OLD:
        # library = [torch.ones(z_slice.shape[0], device=self.target_device)]
        ##########
        # z_slice.shape[0] was 1 (z_slice.shape was 1)
        ##########
        # since z_slice.shape is now 1000x1, we need the library to have an extra dimension (so 1000x1)
        # This is because the library is based off z_slice and this is different for each sample.
        # NEW:
        library = torch.ones(z_slice.shape, device=self.target_device)
        ##########

        # OLD & NEW:
        latent_dim = self.params["latent_dim"]

        # In the case where a non-linear dynamic process is suspected, both z and dz will be modelled. In that case, a
        # set of parameters will be needed for both, so we double the number of parameters needed in the latent
        # dimensions (latent_dim) parameter.The latent variable list used in the model is reconstructed to be a
        # concatenation of both z and dz.
        if self.params["model_order"] == 2:
            # dz_slice = x_inputs[1].unsqueeze(0) if len(x_inputs[1].shape) == 1 else x_inputs[1]
            # OLD:
            # dz_slice = x_inputs[1]
            # NEW:
            dz_slice = x_inputs[:, 1, :]
            # this is 1000x1 in new version and was 1 in old version.

            # OLD & NEW:
            z_slice = torch.cat([z_slice, dz_slice], 1)  # concatenate.
            # z_slice - since z_slice and dz_slice are now 1000x1
            # we concatenate them along dim 1 to get 1000x2 ### SO NOTHING CHANGES FOR THIS LINE ###
        

            # OLD & NEW:
            latent_dim = 2 * self.params["latent_dim"]

        for i in range(latent_dim):
            library = torch.cat([library, z_slice[:, i].unsqueeze(1)], dim=1)
            # The unsqueeze here is to make sure that each axis in the tensor has a dimension.
            # to recap, z_slice is 1000x1 and library (at this point) is 1000x1.
            # this means the library starts with a column of 1s (the constant term), then we add the first, second,
            # and so on up to however many latent variables we want to solve for.

        # The parameters for z are now provided for each latent variable.
        # A list is constructed such that each parameter is included with a group of all possible combinations of each
        # other parameters.
        if self.params["poly_order"] > 1:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    library = torch.cat([library, torch.mul(z_slice[:, i], z_slice[:, j]).unsqueeze(1)], dim=1)

        if self.params["poly_order"] > 2:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    for k in range(j, latent_dim):
                        library = torch.cat([library, (z_slice[:, i]
                                                       * z_slice[:, j]
                                                       * z_slice[:, k]).unsqueeze(1)], dim=1)

        if self.params["poly_order"] > 3:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    for k in range(j, latent_dim):
                        for p in range(k, latent_dim):
                            library = torch.cat([library, (z_slice[:, i]
                                                           * z_slice[:, j]
                                                           * z_slice[:, k]
                                                           * z_slice[:, p]).unsqueeze(1)], dim=1)

        if self.params["poly_order"] > 4:
            for i in range(latent_dim):
                for j in range(i, latent_dim):
                    for k in range(j, latent_dim):
                        for p in range(k, latent_dim):
                            for q in range(p, latent_dim):
                                library = torch.cat([library, (z_slice[:, i]
                                                               * z_slice[:, j]
                                                               * z_slice[:, k]
                                                               * z_slice[:, p]
                                                               * z_slice[:, q]).unsqueeze(1)], dim=1)

        if self.params["include_sine"]:
            for i in range(latent_dim):
                library = torch.cat([library, (torch.sin(z_slice[:, i])).unsqueeze(1)], dim=1)

        return library

Sending the tensor to the GPU via tensor = tensor.to('cuda') is the right approach; make sure you are assigning the return value, since to() is not an inplace operation on tensors. Another common issue is creating a temporary tensor, e.g. in the forward pass:

def forward(self, x):
    # x is on the GPU
    tmp = torch.randn(x.size(0))
    # tmp is on the CPU as the device of x wasn't used!
    tmp = torch.randn(x.size(0), device=x.device)
    # now tmp and x are on the same device
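
The same gotcha applies to the plain to() call itself, e.g. in this small check (the device selection is added here only so it also runs on a CPU-only machine):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.randn(10)
x.to(device)      # no effect on x: to() returns a new tensor, which is discarded here
x = x.to(device)  # correct: reassign the returned tensor
print(x.device)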

Thanks @ptrblck. Yep - I was assuming an in-place assignment. All fixed up once I assigned it back to the variable as you suggest. Using my new model now seems to result in no reduction in loss, though. I'll have a go with your theta build above and see if I've stuffed something up in mine. After that, I'll drop the batch size to 1 and see how the new model diverges from the old.
An opportunity to learn about debugging PyTorch awaits me.
Simon

Does this mean that my posted code snippet breaks the training now or is the “new model” a generally new architecture?

Hi @ptrblck,
Not at all!!! I moved from single-sample training in a Jupyter notebook to a batched method in a Python project. I've changed too much at once. My plan is to save my test data to a pickle (currently it's pseudo-experiment data produced on the fly), then compare the old set-up with the new set-up (with a batch size of 1) using the same inputs, and I'll see if there are any obvious divergences between the models.
I will let you know how I go.
Simon

Hi @ptrblck,
Still working through this - I think your solution didn't account for the extra dimension (at the front) which comes from using a batch rather than individual training elements. So if we had, say, a tensor of shape [3, 4] for one training element and we are now doing batches of a thousand, we have a tensor of shape [1000, 3, 4]. This means where we previously took a slice obj[0], we now take a slice obj[:, 0].
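Just to illustrate with shapes (toy tensors, not the real data):

    import torch

    single = torch.randn(3, 4)       # one training element
    batch = torch.randn(1000, 3, 4)  # a batch of 1000 of them

    print(single[0].shape)    # torch.Size([4])       - first row of one element
    print(batch[0].shape)     # torch.Size([3, 4])    - the whole first element of the batch
    print(batch[:, 0].shape)  # torch.Size([1000, 4]) - first row of every element
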
Also, for some reason, I think your solution comes out with Theta transposed. It's easy enough to fix up with a .T. I'm thinking I'll post a GitHub link or something to the new version and the original, for posterity!
Thanks and regards,
Simon

This is strange, as my posted code snippet also compares your reference results to mine:

for i in range(3, 10):
    x = torch.randn(1000, i, 1)
    ref = build_sindy_theta(x)
    out = my_build_sindy_theta(x)

    print((ref - out).abs().max())

# tensor(2.9802e-08)
# tensor(2.3283e-10)
# tensor(7.4506e-09)
# tensor(2.9802e-08)
# tensor(5.9605e-08)
# tensor(2.3842e-07)
# tensor(9.5367e-07)

using the batched input (unless another dimension is missing).

Hi @ptrblck - it appears to be related to my batch size being 1 in the tests I'm running. What I find quite strange is that I get good agreement on the first forward pass of my old and new models, but then the new model basically flatlines. Broadly, the error is calculated based on the performance of the autoencoder and that theta calculation. I think there is something wrong in there, so I see a small improvement as the autoencoder improves, but then nothing as the theta error comes to dominate and learning essentially stops. The big issue here is that I've moved from a working individual-sample model in Jupyter Notebooks to a batch model written in a Python project (how good is PyCharm???!!!).
What I should have done is replicate the functionality of the existing model first in Python, then move to a batch process flow. I'm doing that now, so when I see what's up, I'll let you know. For what it's worth, my money is on what will (in hindsight, of course) be an obvious error in my linear algebra.
Thanks again for your help.
Simon