---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---
## Preliminaries
Make sure we are only using our reserved GPUs.
``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make devices 0 and 2 visible
```
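As a quick sanity check (a small addition of mine, not in the original notebook): torch should now see exactly the two reserved devices, re-indexed as `cuda:0` and `cuda:1`.
``` code
# Sanity check: with CUDA_VISIBLE_DEVICES="0,2", torch should report
# exactly two visible devices, re-indexed from zero.
import torch
print(torch.cuda.device_count())      # expected: 2
print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```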
## Training a Standard Vision Classifier
### Building a Model with Sequential()
Let's do a standard image classification task.
``` code
import torch
import torch.nn as nn
```
Sequential works very similarly to the Keras concept: a container wraps around the individual layers in the order they are given.
``` code
net = nn.Sequential(nn.Conv2d(3, 6, 5),      # 3 input channels, 6 filters each 5x5
                    nn.ReLU(),               # non-linearity
                    nn.MaxPool2d((2,2)),     # pooling
                    nn.Conv2d(6, 16, 5),     # 16 filters this time
                    nn.ReLU(),               # non-linearity
                    nn.MaxPool2d((2,2)),     # pooling
                    nn.Flatten(),            # flatten feature maps
                    nn.Linear(16*5*5, 100),  # 16x5x5 input neurons, 100 output neurons
                    nn.Linear(100, 10)       # 10 output classes
)
net = net.cuda() # put the model on the GPU
```
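To see where the 16\*5\*5 comes from, a dummy forward pass (a small check I am adding here) traces the spatial dimensions: 32x32 shrinks to 28x28 after the first 5x5 convolution, to 14x14 after pooling, to 10x10 after the second convolution, and to 5x5 after the final pooling.
``` code
# Optional shape check: a dummy CIFAR-10-sized batch confirms
# why the first Linear layer expects 16*5*5 = 400 inputs.
dummy = torch.randn(1, 3, 32, 32).cuda()
print(net(dummy).shape)  # expected: torch.Size([1, 10])
```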
### Creating dataloaders
For simplicity's sake, I will just take a premade dataset that is supplied with torch.
The dataset is part of the torchvision module, which we don't have installed yet.
``` code
!pip install torchvision
```
``` code
import torchvision
```
Datasets can easily be created from custom data by subclassing torch.utils.data.Dataset, see the next Jupyter notebook.
(The datasets and preprocessing options used here are torchvision specific.)
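For a first taste of what that looks like (a minimal sketch of my own; the real treatment follows in the next notebook), a custom dataset only needs `__len__` and `__getitem__`:
``` code
# Minimal custom dataset sketch (illustrative only).
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data      # e.g. a tensor of images
        self.labels = labels  # e.g. a tensor of integer labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
```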
``` code
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                                             (0.247, 0.243, 0.261))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
```
A dataloader takes a dataset and a bunch of other arguments and provides convenient batched access to feed to the network.
``` code
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
```
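A quick look at one batch (my own check, assuming the loaders above) shows the shapes the network will receive:
``` code
# Inspect a single batch: 32 images of 3x32x32 plus 32 integer labels.
imgs, lbls = next(iter(trainloader))
print(imgs.shape)  # expected: torch.Size([32, 3, 32, 32])
print(lbls.shape)  # expected: torch.Size([32])
```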
### Inspect the model with tensorboard
Tensorboard, while originally from TensorFlow, also works pretty well with PyTorch.
``` code
!pip install tensorboard
```
``` code
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs') # initialize the writer with folder "./runs"
imgs, _ = next(iter(trainloader)) # get some input to trace the graph
writer.add_graph(net, imgs.cuda()) # trace the graph once and store it
```
Now we can start tensorboard in the same location where the notebook is located with `tensorboard --logdir=runs`
and open it in our browser at [localhost:6006](http://localhost:6006) (the default port; the cell below picks 6007 via `--port` instead).
``` code
!tensorboard --logdir=runs --port=6007
```
### Prepare training function
We still need a loss and an optimizer
``` code
import numpy as np # for later use
loss = nn.CrossEntropyLoss() # takes logits as predictions and integer labels as targets
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9) # optimizer needs to be supplied with the parameters to optimize
```
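To make the input convention concrete (a tiny illustration of my own): CrossEntropyLoss expects raw, unnormalized logits of shape (batch, classes) and integer class indices as targets, which is why the network above has no softmax at the end.
``` code
# Illustration: CrossEntropyLoss applies log-softmax internally,
# so it consumes raw logits and integer class indices.
logits = torch.tensor([[2.0, 0.5, -1.0]])     # one sample, three classes
target = torch.tensor([0])                    # correct class index
print(nn.CrossEntropyLoss()(logits, target))  # small loss, class 0 dominates
```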
Build a function that trains the model on the data for one epoch
``` code
def train(net, dataloader, optimizer, loss):
    epoch_loss = []  # save a running loss
    net.train()      # tell the model that it's training time
    for img, lbl in dataloader:
        img, lbl = img.cuda(), lbl.cuda()  # put data on GPU
        optimizer.zero_grad()              # free the optimizer from previous gradients
        out = net(img)                     # compute predictions
        batch_loss = loss(out, lbl)        # compute loss
        batch_loss.backward()              # compute gradients
        optimizer.step()                   # update weights
        epoch_loss.append(batch_loss.item())  # record the batch loss
    return np.mean(epoch_loss)  # return the mean epoch loss
```
### Train the model
Train the model for a couple of epochs and save checkpoints periodically
``` code
for epoch in range(5):
    epoch_loss = train(net, trainloader, optimizer, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
    writer.add_scalar("epoch loss", epoch_loss, epoch+1)
    if (epoch+1) % 5 == 0:
        torch.save(net.state_dict(), "../saved_models/net_{}_epochs.pth".format(epoch+1))
```
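Restoring a checkpoint later works via `load_state_dict` (a sketch of my own, assuming a model with the same architecture and the path saved above):
``` code
# Sketch: restore the weights saved above into the model.
# Assumes the same nn.Sequential architecture as at save time.
state = torch.load("../saved_models/net_5_epochs.pth")
net.load_state_dict(state)
```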
``` code
!tensorboard --logdir=runs --port=6007
```
### Evaluate the Model
Since the images are small we can run the evaluation just fine on the CPU. The model has to be brought back to the CPU for that purpose.
Each model has .train() and .eval() modes that change the behaviour of certain layers (e.g. dropout, batch norm).
``` code
net = net.cpu()  # bring the network back from the GPU
net.eval()       # tell the network that it's testing time
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for img, lbl in testloader:
        out = net(img)
        logits, indices = torch.max(out, 1)  # max logit and predicted class per sample
        correct += torch.sum(indices == lbl).item()
        total += len(lbl)
print("The model correctly classified ", correct/total*100, "% of the images.")
```
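The .train()/.eval() switch is easiest to see with a dropout layer (a standalone illustration of mine, not part of the classifier above): in training mode it randomly zeroes activations, in eval mode it is the identity.
``` code
# Illustration: dropout behaves differently in train vs. eval mode.
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)
drop.train()
print(drop(x))  # roughly half the entries zeroed, the rest scaled by 2
drop.eval()
print(drop(x))  # identity: all ones
```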
### Train the model on multiple GPUs
Create the network again, but this time wrap the instance in nn.DataParallel.
``` code
net_parallel = nn.Sequential(
    nn.Conv2d(3, 6, 5),      # 3 input channels, 6 filters each 5x5
    nn.ReLU(),               # non-linearity
    nn.MaxPool2d(2,2),       # pooling
    nn.Conv2d(6, 16, 5),     # 16 filters this time
    nn.ReLU(),               # non-linearity
    nn.MaxPool2d(2,2),       # pooling
    nn.Flatten(),            # flatten feature maps
    nn.Linear(16*5*5, 100),  # 16x5x5 input neurons, 100 output neurons
    nn.Linear(100, 10)       # 10 output classes
)
net_parallel = torch.nn.DataParallel(net_parallel, device_ids=[0,1])
net_parallel = net_parallel.cuda()  # put the model on the first visible GPU
optimizer_parallel = torch.optim.SGD(net_parallel.parameters(),
                                     lr=0.001, momentum=0.9)  # don't forget to inform the optimizer
Take it for a test drive. Keep your eyes peeled on a terminal running e.g. `watch -d nvidia-smi`. There will be no speed increase in this case, as the model is relatively small; on the contrary, the overhead of copying the model to the other GPU will probably result in a net loss of training time.
``` code
for epoch in range(10):
    epoch_loss = train(net_parallel, trainloader, optimizer_parallel, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
```
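To put a rough number on that overhead (a small sketch of my own, reusing the `train` function from above), one can simply time a single epoch:
``` code
# Rough timing sketch: seconds per epoch for the DataParallel net.
# Compare against the single-GPU runs above; expect no speedup here.
import time
start = time.perf_counter()
train(net_parallel, trainloader, optimizer_parallel, loss)
print("DataParallel epoch took", time.perf_counter() - start, "seconds")
```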