---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---
## Preliminaries
Make sure we are only using our reserved GPUs.
``` code
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order devices by PCI bus id
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"      # only make devices 0 and 2 visible
```
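To double-check that only the reserved devices are visible to PyTorch, a minimal sanity check (assuming torch is already installed; it is imported properly in the next section):

``` code
import torch

print(torch.cuda.is_available())      # True if at least one visible GPU was found
print(torch.cuda.device_count())      # should be 2, matching "0,2" above
print(torch.cuda.get_device_name(0))  # name of the first visible device
```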
## Training a Standard Vision Classifier
### Building a Model with Sequential()
Let's do a standard image classification task.
``` code
import torch
import torch.nn as nn
```
Sequential works very similarly to the Keras concept: a container wraps around the individual layers and applies them in the order they are given.
``` code
net = nn.Sequential(nn.Conv2d(3, 6, 5),      # 3 input channels, 6 filters, each 5x5
                    nn.ReLU(),               # non-linearity
                    nn.MaxPool2d((2, 2)),    # pooling
                    nn.Conv2d(6, 16, 5),     # 16 filters this time
                    nn.ReLU(),               # non-linearity
                    nn.MaxPool2d((2, 2)),    # pooling
                    nn.Flatten(),            # flatten feature maps
                    nn.Linear(16*5*5, 100),  # 16x5x5 input neurons, 100 output neurons
                    nn.Linear(100, 10)
                    )

net = net.cuda()  # put the model on the GPU
```
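A quick way to check that the layer dimensions line up is to push some random input through the network (the dummy batch below is just an assumption matching the 32x32 RGB images of CIFAR-10, which we load further down):

``` code
dummy = torch.randn(4, 3, 32, 32).cuda()  # a batch of 4 random "images"
print(net(dummy).shape)                   # expected: torch.Size([4, 10])
```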
### Creating dataloaders
For simplicity's sake, I will just use a premade dataset that ships with PyTorch.
The dataset is part of the torchvision module, which we don't have installed yet.
``` code
!pip install torchvision
```
``` code
import torchvision
```
Datasets can easily be created from custom data by subclassing torch.utils.data.Dataset, see the next Jupyter notebook (a minimal sketch follows below).
(The datasets and preprocessing options used here are torchvision specific.)
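As a small preview (a minimal sketch with made-up in-memory tensors, not the actual notebook that follows), a custom dataset only needs to implement `__len__` and `__getitem__`:

``` code
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, images, labels):
        self.images = images    # e.g. a tensor of shape (N, 3, 32, 32)
        self.labels = labels    # e.g. a tensor of shape (N,)

    def __len__(self):
        return len(self.labels)                    # number of samples

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]  # a single (image, label) pair
```

Here, however, we simply use the ready-made CIFAR-10 dataset: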
``` code
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                                             (0.247, 0.243, 0.261))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
```
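Before wrapping the datasets in dataloaders, it can be worth a quick look at what was downloaded (the class names and tensor shapes come straight from the torchvision dataset):

``` code
print(len(trainset), len(testset))  # 50000 training and 10000 test images
img, lbl = trainset[0]              # a single (image, label) pair
print(img.shape, lbl)               # torch.Size([3, 32, 32]) and an integer label
print(trainset.classes)             # the ten CIFAR-10 class names
```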
A dataloader takes a dataset plus a number of other arguments and provides convenient batched access to the data for feeding the network.
``` code
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)

testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
```
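Fetching a single batch confirms that the dataloader yields tensors of shape (batch, channels, height, width):

``` code
imgs, lbls = next(iter(trainloader))  # draw one batch
print(imgs.shape)                     # torch.Size([32, 3, 32, 32])
print(lbls.shape)                     # torch.Size([32])
```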
### Inspect the model with tensorboard
Tensorboard, while originally from TensorFlow, also works with PyTorch pretty well.
``` code
!pip install tensorboard
```
``` code
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs')      # initialize the writer with folder "./runs"
imgs, _ = next(iter(trainloader))   # get some input to trace the graph
writer.add_graph(net, imgs.cuda())  # trace the graph once and store it
```
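The same writer can log more than the graph; for example (just a sketch, reusing the batch pulled above), a grid of sample images:

``` code
grid = torchvision.utils.make_grid(imgs)    # tile the batch into a single image
writer.add_image("CIFAR-10 samples", grid)  # appears under the "Images" tab in tensorboard
```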
Now we can start tensorboard from the directory where the notebook is located with `tensorboard --logdir=runs`
and open it in our browser at [localhost:6006](http://localhost:6006) (or whichever port we pass via `--port`).
``` code
!tensorboard --logdir=runs --port=6007
```
### Prepare training function
We still need a loss function and an optimizer.
``` code
import numpy as np  # for later use

loss = nn.CrossEntropyLoss()  # takes logits as predictions and integer class labels
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # the optimizer needs to be supplied with the parameters to optimize
```
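To make the expected shapes concrete (dummy tensors, purely for illustration): the loss takes raw logits of shape (batch, classes) and integer class indices of shape (batch,), with no one-hot encoding and no softmax needed.

``` code
dummy_logits = torch.randn(8, 10)          # 8 samples, 10 classes, raw scores
dummy_labels = torch.randint(0, 10, (8,))  # integer class indices
print(loss(dummy_logits, dummy_labels))    # a single scalar loss value
```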
Build a function that trains the model on the data for one epoch.
``` code
def train(net, dataloader, optimizer, loss):
    epoch_loss = []  # save a running loss
    net.train()      # tell the model that it's training time
    for img, lbl in dataloader:
        img, lbl = img.cuda(), lbl.cuda()     # put data on GPU
        optimizer.zero_grad()                 # free the optimizer from previous gradients
        out = net(img)                        # compute predictions
        batch_loss = loss(out, lbl)           # compute loss
        batch_loss.backward()                 # compute gradients
        optimizer.step()                      # update weights
        epoch_loss.append(batch_loss.item())  # record the batch loss
    return np.mean(epoch_loss)                # return the mean loss over the epoch
```
### Train the model
Train the model for a couple of epochs and save a checkpoint periodically.
``` code
os.makedirs("../saved_models", exist_ok=True)  # make sure the checkpoint directory exists

for epoch in range(5):
    epoch_loss = train(net, trainloader, optimizer, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
    writer.add_scalar("epoch loss", epoch_loss, epoch+1)
    if (epoch+1) % 5 == 0:
        torch.save(net.state_dict(), "../saved_models/net_{}_epochs.pth".format(epoch+1))
```
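Restoring such a checkpoint later is the mirror image of `torch.save` (the path below just matches the pattern used above):

``` code
state_dict = torch.load("../saved_models/net_5_epochs.pth")  # load the saved parameters
net.load_state_dict(state_dict)                              # copy them into the model
```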
``` code
!tensorboard --logdir=runs --port=6007
```
### Evaluate the Model
Since the images are small, we can run the evaluation just fine on the CPU. The model has to be brought back to the CPU for that purpose.

Each model has `.train()` and `.eval()` modes that specify the behaviour of certain layers (e.g. dropout and batch norm).
``` code
net = net.cpu()  # bring the network back from the GPU
net.eval()       # tell the network that it's testing time

correct = 0
total = 0
with torch.no_grad():  # no gradients needed for evaluation
    for img, lbl in testloader:
        out = net(img)
        logits, indices = torch.max(out, 1)          # index of the largest logit = predicted class
        correct += torch.sum(indices == lbl).item()  # count correct predictions
        total += len(lbl)                            # count all predictions

print("The model correctly classified ", correct/total*100, "% of the images.")
```
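A slightly finer-grained view is per-class accuracy (a sketch building on the same loop, using the class names stored in the torchvision dataset):

``` code
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():
    for img, lbl in testloader:
        _, pred = torch.max(net(img), 1)            # predicted class per image
        for p, l in zip(pred, lbl):
            class_correct[l.item()] += int(p == l)  # count hits per true class
            class_total[l.item()] += 1

for name, c, t in zip(testset.classes, class_correct, class_total):
    print(name, ": ", 100 * c / t, "%")
```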
### Train the model on multiple GPUs
Create the network again, but this time wrap it in an nn.DataParallel instance.
``` code
net_parallel = nn.Sequential(
    nn.Conv2d(3, 6, 5),      # 3 input channels, 6 filters, each 5x5
    nn.ReLU(),               # non-linearity
    nn.MaxPool2d(2, 2),      # pooling
    nn.Conv2d(6, 16, 5),     # 16 filters this time
    nn.ReLU(),               # non-linearity
    nn.MaxPool2d(2, 2),      # pooling
    nn.Flatten(),            # flatten feature maps
    nn.Linear(16*5*5, 100),  # 16x5x5 input neurons, 100 output neurons
    nn.Linear(100, 10)
)

net_parallel = torch.nn.DataParallel(net_parallel, device_ids=[0, 1])
net_parallel = net_parallel.cuda()  # put the model on the first GPU

optimizer_parallel = torch.optim.SGD(net_parallel.parameters(),
                                     lr=0.001, momentum=0.9)  # don't forget to inform the optimizer
```
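The wrapper keeps the original network accessible as `.module`; a quick check of what was just built (this assumes both reserved devices are visible, as set up in the preliminaries):

``` code
print(torch.cuda.device_count())  # should report 2 visible devices
print(net_parallel.device_ids)    # [0, 1]
print(net_parallel.module)        # the underlying Sequential model
```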
Take it for a test drive. Keep an eye on a terminal running e.g. `watch -d nvidia-smi`. There will be no speed increase in this case, as it is a relatively small model; on the contrary, the overhead of copying the model to the other GPU will probably result in a net loss of training time.
``` code
for epoch in range(10):
    epoch_loss = train(net_parallel, trainloader, optimizer_parallel, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
```
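One caveat when checkpointing a parallel model: `nn.DataParallel` prefixes all parameter names with `module.`, so it is common to save the state dict of the wrapped model instead (the path below is just an example):

``` code
torch.save(net_parallel.module.state_dict(), "../saved_models/net_parallel.pth")  # save without the "module." prefix
```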