Building an Autoencoder in PyTorch

Vipul Vaibhaw
3 min read · Nov 25, 2018


In this story, we will build a simple convolutional autoencoder in PyTorch on the CIFAR-10 dataset.

Quoting Wikipedia: “An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction.”

For building an autoencoder, three things are needed: an encoding function, a decoding function, and a distance function that measures the information loss between the compressed representation of your data and the decompressed representation (i.e. a “loss” function).

To code an autoencoder in PyTorch, we need an Autoencoder class that inherits from nn.Module and calls the parent class’s __init__ using super().

We start writing our convolutional autoencoder by importing the necessary PyTorch modules.

import torch
import torchvision as tv
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torchvision.utils import save_image

Now we are set to download the CIFAR-10 dataset and apply our transformations to it. We apply two transformations to our dataset —

  1. ToTensor() — It converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
  2. Normalize() — It normalizes a tensor image with mean and standard deviation.

After applying the transformations, the pixel values end up roughly in the range (-2, 2), as the short check below illustrates.
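To see where that range comes from, here is a quick sanity check (a sketch in plain Python) using the red-channel mean and standard deviation from the Normalize call below:

# Extreme pixel values after Normalize, red channel
mean, std = 0.4914, 0.247
print((0.0 - mean) / std)   # about -1.99
print((1.0 - mean) / std)   # about  2.06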

# Loading and transforming the data
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4466),
                          (0.247, 0.243, 0.261))])

trainset = tv.datasets.CIFAR10(root='./data', train=True,
                               download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                         shuffle=False, num_workers=4)

testset = tv.datasets.CIFAR10(root='./data', train=False,
                              download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

You can read more about the transformations mentioned above in the torchvision.transforms documentation.

Now the next step is to write the Autoencoder class.

# Writing our model
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # encoder: 3x32x32 -> 16x24x24 (each 5x5 conv shrinks H and W by 4)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.ReLU(True),
            nn.Conv2d(6, 16, kernel_size=5),
            nn.ReLU(True))
        # decoder: mirrors the encoder, 16x24x24 -> 3x32x32
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 6, kernel_size=5),
            nn.ReLU(True),
            nn.ConvTranspose2d(6, 3, kernel_size=5),
            nn.ReLU(True))

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

The encoder stacks Conv2d layers with ReLU activations; the decoder mirrors it with ConvTranspose2d layers so the output has the same shape as the input.
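A quick way to see what the layers do to tensor shapes is to push a random CIFAR-10-sized tensor through the model (a sketch; the variable names here are ours):

net = Autoencoder()
x = torch.randn(1, 3, 32, 32)   # one CIFAR-10-sized input
print(net.encoder(x).shape)     # torch.Size([1, 16, 24, 24]) - the code
print(net(x).shape)             # torch.Size([1, 3, 32, 32]) - reconstruction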

Now we define some parameters —

# Defining some parameters
num_epochs = 5    # you can go for more epochs, I am using a mac
batch_size = 128  # note: the DataLoader above was created with batch_size=32

Then it is time to set up the model for training. We instantiate the model and configure it to run on the CPU. You can use CUDA if you have a GPU; a sketch of that follows the code below.

We use mean squared error as the loss function. For the optimizer, we use Adam.

model = Autoencoder().cpu()
distance = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-5)
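If you do have a GPU, a minimal device-agnostic variant (a sketch, not from the original post) looks like this:

# Pick CUDA when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Autoencoder().to(device)
# ...and inside the training loop, move each batch: img = img.to(device)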

Let’s start the training —

for epoch in range(num_epochs):
    for data in dataloader:
        img, _ = data
        img = Variable(img).cpu()
        # ===================forward=====================
        output = model(img)
        loss = distance(output, img)
        # ===================backward====================
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # ===================log========================
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))
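The save_image helper imported at the top is never used above, but you could use it after training to write a batch of originals and reconstructions to disk for a visual check (a sketch; the unnormalization constants mirror the Normalize transform):

# Save one test batch and its reconstruction as image grids
with torch.no_grad():
    img, _ = next(iter(testloader))
    output = model(img)
# undo the Normalize transform so the images are viewable again
mean = torch.tensor([0.4914, 0.4822, 0.4466]).view(1, 3, 1, 1)
std = torch.tensor([0.247, 0.243, 0.261]).view(1, 3, 1, 1)
save_image(img * std + mean, 'original.png')
save_image(output * std + mean, 'reconstruction.png')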

This was a simple post to show how one can build an autoencoder in PyTorch.

However, if you want to include MaxPool2d() in your model, make sure you set return_indices=True, and then in the decoder you can use a MaxUnpool2d() layer fed with those indices, as sketched below.
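A minimal sketch of how that pairing works (the class name and layer sizes are illustrative, not from the original post):

class PoolingAutoencoder(nn.Module):
    def __init__(self):
        super(PoolingAutoencoder, self).__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(2, return_indices=True)  # keep the indices
        self.unpool = nn.MaxUnpool2d(2)                   # reuses them below
        self.deconv = nn.ConvTranspose2d(16, 3, kernel_size=5, padding=2)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x, indices = self.pool(x)     # pooled activations + argmax indices
        x = self.unpool(x, indices)   # unpool using the saved indices
        return self.deconv(x)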

Keep learning and sharing knowledge. Follow me on GitHub, StackOverflow, LinkedIn or Twitter.

Edit —

Comments —

Choosing CIFAR for an autoencoding example isn’t the best choice, since it provides no way to understand progress or performance on a classification-oriented dataset.

Response —

CIFAR is just used as a demo. I agree with the above comment.
