Improved WGAN implementation slower than TensorFlow

Hello, I’ve implemented Improved WGAN (WGAN-GP) in PyTorch myself, referring to the code at https://github.com/caogang/wgan-gp,
but it is about two times slower than my TensorFlow implementation, which is also my own code.

Both versions use DCGAN + Improved WGAN,
and 100 training iterations take about 10 seconds in TensorFlow versus about 20 seconds in PyTorch.

I’ve tested a few things: after changing the loss function to LSGAN,
training time drops to about 9 seconds for both versions,
so I’m guessing the problem is in the WGAN part.
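
In case it is useful, here is a rough sketch of how the gradient-penalty step alone could be timed in isolation. This is not part of my training code; the helper name, the 2-D (batch, features) inputs, and the critic returning a (score, category) tuple are assumptions made just for illustration.

    import time

    import torch
    from torch import autograd

    def time_gradient_penalty(disc, real, fake, iters=100):
        """Rough per-iteration timing of the gradient-penalty term (illustrative helper)."""
        if real.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            # Same interpolation scheme as in the training loop below (2-D inputs assumed)
            eps = torch.rand(real.size(0), 1, device=real.device).expand_as(real)
            x_pn = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
            disc_pn, _ = disc(x_pn)
            grad = autograd.grad(disc_pn.sum(), x_pn, create_graph=True)[0]
            penalty = 10 * ((grad.norm(dim=1) - 1) ** 2).mean()
            penalty.backward()  # include the double backward, since training pays for it too
        if real.is_cuda:
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters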

Here’s my code, and I can’t find the problem,
so I would appreciate if anyone could help me find the problem.

                # imports needed by this snippet
                import torch
                from torch import autograd, optim

                lr = 0.0002

                disc = Discriminator()
                gen = Generator()

                opt_gen = optim.Adam(gen.parameters(), lr)
                opt_disc = optim.Adam(disc.parameters(), lr)

                # Discriminator (critic) training: re-enable its gradients first,
                # since they are frozen below for the generator step
                for param in disc.parameters():
                    param.requires_grad_(True)

                for _ in range(3):
                    disc.zero_grad()

                    # Real & fake x
                    batch_data = torch.tensor(dataloader.get_batch(num_batch), dtype=torch.float32)

                    z_val, cat_input = generate_z_val(num_batch, num_z, num_cat)
                    x_gen = gen(z_val)

                    # Disc
                    disc_real, _ = disc(batch_data)
                    disc_fake, cat_output = disc(x_gen)

                    disc_real = disc_real.mean()
                    disc_fake = disc_fake.mean()

                    # Improved WGAN gradient penalty
                    # (per-sample eps, broadcast across the feature dimension of the 2-D batch)
                    eps = torch.rand((num_batch, 1)).expand(batch_data.size())
                    scale_fn = 10  # gradient penalty weight (lambda)

                    x_pn = eps * batch_data + (1 - eps) * x_gen
                    disc_pn, _ = disc(x_pn)

                    # wgan_grad_output is defined in the full code
                    # (presumably a ones tensor with the same shape as disc_pn)
                    grad = autograd.grad(disc_pn, x_pn, grad_outputs=wgan_grad_output,
                                         create_graph=True, retain_graph=True)[0]
                    grad = grad.norm(dim=1)  # per-sample gradient norm

                    ddx = scale_fn * (grad - 1) ** 2
                    ddx = ddx.mean()

                    # Note: this uses the opposite sign convention from caogang/wgan-gp
                    # (disc_real - disc_fake rather than disc_fake - disc_real), but it is
                    # consistent with the generator loss below, which minimizes disc_fake
                    loss_real = disc_real - disc_fake + ddx

                    loss_real.backward()

                    opt_disc.step()

                # Generator training: freeze the critic so it gets no gradients from this step
                for param in disc.parameters():
                    param.requires_grad_(False)

                gen.zero_grad()

                z_val, cat_input = generate_z_val(num_batch, num_z, num_cat)
                x_gen = gen(z_val)
                disc_fake, cat_output = disc(x_gen)

                disc_fake = disc_fake.mean()

                loss_fake = disc_fake  # matches the flipped critic sign convention noted above

                loss_fake.backward()

                opt_gen.step()
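
For comparison, here is a small self-contained sketch of the gradient-penalty computation written the way the linked caogang/wgan-gp code structures it. The function name is made up, and it assumes 2-D (batch, features) inputs and a critic that returns a (score, category) tuple like mine does.

    import torch
    from torch import autograd

    def gradient_penalty(disc, real, fake, lambda_gp=10.0):
        """WGAN-GP penalty on random interpolates (sketch in the style of caogang/wgan-gp)."""
        eps = torch.rand(real.size(0), 1, device=real.device).expand_as(real)
        interpolates = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)

        disc_interpolates, _ = disc(interpolates)

        grads = autograd.grad(
            outputs=disc_interpolates,
            inputs=interpolates,
            grad_outputs=torch.ones_like(disc_interpolates),
            create_graph=True,
            retain_graph=True,
        )[0]

        # Per-sample 2-norm over the flattened feature dimensions
        grads = grads.view(grads.size(0), -1)
        return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()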

If you would like to see the full code, both versions are linked below.

PyTorch version:

TensorFlow version:

Thanks.