title: PyTorch Intro I: SSH, Jupyter and Cuda
author: Tom Weber

Preliminaries

Make sure we are only using our reserved GPUs.

import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make device 0 visible

Training a Standard Vision Classifier

Building a Model with Sequential()

Let's do a standard image classification task.

import torch
import torch.nn as nn

Sequential works very similarly to the Keras concept: a container wraps around individual layers in the order they are given.

net = nn.Sequential(nn.Conv2d(3, 6, 5), # 3 input channels, 6 filters each 5x5
                    nn.ReLU(), # non-linearity
                    nn.MaxPool2d((2,2)), # pooling
                    nn.Conv2d(6, 16, 5), # 16 filters this time
                    nn.ReLU(), # non-linearity
                    nn.MaxPool2d((2,2)), # pooling
                    nn.Flatten(), # flatten feature maps
                    nn.Linear(16*5*5, 100), # 16x5x5 input neurons, 100 output neurons
                    nn.Linear(100, 10) # 10 output neurons, one per class
)

net = net.cuda() # put the model on the GPU
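
Where do the 16*5*5 input neurons come from? For a 3x32x32 input (CIFAR-10-sized, as used below), each 5x5 convolution without padding shrinks the spatial size by 4 and each pooling halves it: 32 -> 28 -> 14 -> 10 -> 5, leaving 16 feature maps of size 5x5. A dummy forward pass confirms the shapes (a quick sanity check):

dummy = torch.randn(1, 3, 32, 32).cuda() # one random 32x32 RGB image
print(net(dummy).shape)                  # torch.Size([1, 10]): one logit per class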

Creating dataloaders

For simplicity's sake, I will just take a premade dataset that is shipped with torch. The dataset is part of the torchvision module, which we don't have yet.

!pip install torchvision
import torchvision

Datasets can easily be created from custom data by subclassing torch.utils.data.Dataset; see the next Jupyter notebook. (The datasets and preprocessing options used here are torchvision specific.)
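
As a teaser, a minimal sketch of such a subclass, assuming some indexable inputs x and labels y; only the length and indexing methods have to be implemented:

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, x, y):
        self.x, self.y = x, y # any indexable data, e.g. tensors
    def __len__(self):
        return len(self.x) # number of samples
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx] # one (input, label) pair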

transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)

A dataloader takes a dataset and a handful of other arguments and provides convenient access to the data in batches, ready to feed to the network.

trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)

testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
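
A quick look at a single batch shows what the network will be fed:

imgs, lbls = next(iter(trainloader)) # draw one batch
print(imgs.shape) # torch.Size([32, 3, 32, 32]): batch x channels x height x width
print(lbls.shape) # torch.Size([32]): one integer label per image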

Inspect the model with TensorBoard

TensorBoard, while originally built for TensorFlow, also works well with PyTorch.

!pip install tensorboard
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs') # initialize the writer with folder "./runs"
imgs, _ = next(iter(trainloader)) # get some input to trace the graph
writer.add_graph(net, imgs.cuda()) # trace the graph once and store it

Now we can start TensorBoard in the directory where the notebook is located with tensorboard --logdir=runs and open it in our browser at localhost:6006 (or at whichever port is passed via --port, 6007 below).

!tensorboard --logdir=runs --port=6007

Prepare training function

We still need a loss function and an optimizer.

import numpy as np # for later use
loss = nn.CrossEntropyLoss() # takes raw logits as predictions and integer class labels
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9) # optimizer needs to be supplied with the parameters to optimize
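
To make the expected input format concrete, here is the loss applied to made-up logits and labels (dummy values, purely illustrative):

dummy_logits = torch.randn(4, 10)         # batch of 4, one raw score per class
dummy_labels = torch.tensor([3, 0, 9, 1]) # integer class indices, not one-hot
print(loss(dummy_logits, dummy_labels))   # scalar tensor holding the mean batch loss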

Build a function that trains the model on the data for one epoch.

def train(net, dataloader, optimizer, loss):
    epoch_loss = [] # collect the batch losses
    net.train() # tell the model that it's training time
    for img, lbl in dataloader:
        img, lbl = img.cuda(), lbl.cuda() # put data on GPU
        optimizer.zero_grad() # free the optimizer from previous gradients
        out = net(img) # compute predictions (logits)
        batch_loss = loss(out, lbl) # compute loss
        batch_loss.backward() # compute gradients
        optimizer.step() # update weights
        epoch_loss.append(batch_loss.item()) # record the batch loss
    return np.mean(epoch_loss) # return the epoch loss

Train the model

Train the model for a couple of epochs and save a checkpoint periodically.

for epoch in range(5):
    epoch_loss = train(net, trainloader, optimizer, loss)
    print("Epoch ",epoch+1," finished, Loss: ", epoch_loss)
    writer.add_scalar("epoch loss", epoch_loss, epoch+1)
    if (epoch+1) % 5 == 0:
        torch.save(net.state_dict(), "../saved_models/net_{}_epochs.pth".format(epoch+1))
!tensorboard --logdir=runs --port=6007
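
To restore such a checkpoint later, the saved state dict is loaded back into a model of the same architecture (assuming the checkpoint from the loop above exists):

state = torch.load("../saved_models/net_5_epochs.pth") # checkpoint written after epoch 5
net.load_state_dict(state) # weights are copied into the existing model in place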

Evaluate the Model

Since the images are small, we can run the evaluation just fine on the CPU. The model has to be brought back to the CPU for that purpose.

Each model has .train() and .eval() methods that set flags specifying the behaviour of certain layers, e.g. dropout and batch normalization.
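
A minimal illustration with a dropout layer (not part of our classifier, purely to show the effect of the two modes):

drop = nn.Dropout(p=0.5)
x = torch.ones(8)
drop.train()   # training mode: dropout is active
print(drop(x)) # roughly half the entries zeroed, the rest scaled to 2.0
drop.eval()    # evaluation mode: dropout is disabled
print(drop(x)) # identity, all ones again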

net = net.cpu() # bring the network back from the GPU
net.eval() # tell the network that it's testing time
correct = 0
total = 0
with torch.no_grad(): # no gradients needed for evaluation
    for img, lbl in testloader:
        out = net(img)
        logits, indices = torch.max(out, 1) # index of the largest logit = predicted class
        correct += torch.sum(indices == lbl).item()
        total += len(lbl)
print("The model correctly classified ", correct/total*100, "% of the images.")

Train the model on multiple GPUs

Create the network again, but this time wrap it in nn.DataParallel.

net_parallel = nn.Sequential(nn.Conv2d(3, 6, 5), # 3 input channels, 6 filters each 5x5
                             nn.ReLU(), # non-linearity
                             nn.MaxPool2d((2,2)), # pooling
                             nn.Conv2d(6, 16, 5), # 16 filters this time
                             nn.ReLU(), # non-linearity
                             nn.MaxPool2d((2,2)), # pooling
                             nn.Flatten(), # flatten feature maps
                             nn.Linear(16*5*5, 100), # 16x5x5 input neurons, 100 output neurons
                             nn.Linear(100, 10) # 10 output neurons, one per class
)
net_parallel = torch.nn.DataParallel(net_parallel, device_ids=[0,1])
net_parallel = net_parallel.cuda() # put the model on the first GPU
optimizer_parallel = torch.optim.SGD(net_parallel.parameters(),
                                     lr=0.001, momentum=0.9) # don't forget to hand the optimizer the new parameters

Take it for a test drive, keeping an eye on a terminal running e.g. watch -d nvidia-smi. There will be no speed increase in this case, as the model is relatively small; on the contrary, the overhead of copying the model to the other GPU will probably result in a net loss of training time (a rough timing sketch follows below).

for epoch in range(10):
    epoch_loss = train(net_parallel, trainloader, optimizer_parallel, loss)
    print("Epoch ",epoch+1," finished, Loss: ", epoch_loss)