---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---

## Preliminaries

Make sure we are only using our reserved GPUs.

``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2"       # only make devices 0 and 2 visible
```

## Training a Standard Vision Classifier

### Building a Model with Sequential()

Let's do a standard image classification task.

``` code
import torch
import torch.nn as nn
```

Sequential works very similarly to its Keras counterpart: a container wraps the individual layers in the order they are given.

``` code
net = nn.Sequential(nn.Conv2d(3, 6, 5),        # 3 input channels, 6 filters each 5x5
                    nn.ReLU(),                 # non-linearity
                    nn.MaxPool2d((2,2)),       # pooling
                    nn.Conv2d(6, 16, 5),       # 16 filters this time
                    nn.ReLU(),                 # non-linearity
                    nn.MaxPool2d((2,2)),       # pooling
                    nn.Flatten(),              # flatten feature maps
                    nn.Linear(16*5*5, 100),    # 16x5x5 input neurons, 100 output neurons
                    nn.Linear(100, 10)         # 100 input neurons, 10 output classes
                    )
net = net.cuda()   # put the model on the GPU
```

### Creating dataloaders

For simplicity's sake, I will just use a premade dataset. It is part of the torchvision module, which we don't have installed yet.

``` code
!pip install torchvision
```

``` code
import torchvision
```

Datasets can easily be created from custom data by subclassing torch.utils.data.Dataset; see the next Jupyter notebook. (The datasets and preprocessing options used here are torchvision specific.)
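As a rough sketch of what subclassing `torch.utils.data.Dataset` involves (the details are left to the next notebook), a custom dataset only needs `__len__` and `__getitem__`. The class name `RandomImageDataset` and its random tensors are made up for illustration; they merely mimic the shape of CIFAR-10 samples.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImageDataset(Dataset):
    """Hypothetical dataset of random 3x32x32 'images' with labels 0..9."""

    def __init__(self, n_samples=64, seed=0):
        gen = torch.Generator().manual_seed(seed)   # reproducible random data
        self.images = torch.rand(n_samples, 3, 32, 32, generator=gen)
        self.labels = torch.randint(0, 10, (n_samples,), generator=gen)

    def __len__(self):
        # number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # return one (image, label) pair; the DataLoader stacks these into batches
        return self.images[idx], self.labels[idx]


dataset = RandomImageDataset()
loader = DataLoader(dataset, batch_size=16, shuffle=True)
imgs, lbls = next(iter(loader))
print(imgs.shape, lbls.shape)
```

Any object with these two methods can be handed to a DataLoader in the same way as the torchvision datasets used in this notebook.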
``` code
# convert images to tensors, then normalize each channel with the given mean and std
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
```

A dataloader takes a dataset and a bunch of other arguments and provides convenient access to the data for feeding the network.

``` code
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)
```

### Inspect the model with tensorboard

Tensorboard, while originally from TensorFlow, also works pretty well with PyTorch.

``` code
!pip install tensorboard
```

``` code
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs')       # initialize the writer with folder "./runs"
imgs, _ = next(iter(trainloader))    # get some input to trace the graph
writer.add_graph(net, imgs.cuda())   # trace the graph once and store it
```

Now we can start tensorboard in the directory where the notebook is located with `tensorboard --logdir=runs` and open it in our browser at [localhost:6006](http://localhost:6006). The cell below uses port 6007 instead, so in that case open [localhost:6007](http://localhost:6007).

``` code
!tensorboard --logdir=runs --port=6007
```

### Prepare training function

We still need a loss and an optimizer.

``` code
import numpy as np   # for later use
loss = nn.CrossEntropyLoss()   # takes logits as predictions and int labels
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)   # optimizer needs to be supplied with the parameters to optimize
```

Build a function that trains the model on the data for one epoch.

``` code
def train(net, dataloader, optimizer, loss):
    epoch_loss = []   # collect the loss of every batch
    net.train()       # tell the model that it's training time
    for img, lbl in dataloader:
        img, lbl = img.cuda(), lbl.cuda()      # put data on GPU
        optimizer.zero_grad()                  # reset gradients from the previous step
        out = net(img)                         # forward pass: compute class logits
        batch_loss = loss(out, lbl)            # compute loss
        batch_loss.backward()                  # compute gradients
        optimizer.step()                       # update weights
        epoch_loss.append(batch_loss.item())   # record the batch loss
    return np.mean(epoch_loss)                 # return the mean loss of the epoch
```

### Train the model

Train the model for a couple of epochs and save a checkpoint every 5 epochs.

``` code
os.makedirs("../saved_models", exist_ok=True)   # make sure the checkpoint folder exists
for epoch in range(5):
    epoch_loss = train(net, trainloader, optimizer, loss)
    print("Epoch ",epoch+1," finished, Loss: ", epoch_loss)
    writer.add_scalar("epoch loss", epoch_loss, epoch+1)
    if (epoch+1) % 5 == 0:
        torch.save(net.state_dict(), "../saved_models/net_{}_epochs.pth".format(epoch+1))
```

``` code
!tensorboard --logdir=runs --port=6007
```

### Evaluate the Model

Since the images are small, we can run the evaluation just fine on the CPU. The model has to be brought back to the CPU for that purpose. Each model has .train() and .eval() methods that switch the behaviour of certain layers, such as dropout and batch normalization, between training and inference.

``` code
net = net.cpu()   # bring the network back from the GPU
net.eval()        # tell the network that it's testing time
correct = 0
total = 0
with torch.no_grad():   # no gradients are needed for evaluation
    for img, lbl in testloader:
        out = net(img)
        logits, indices = torch.max(out, 1)   # highest logit and its class index
        correct += torch.sum(indices == lbl).item()
        total += len(lbl)
print("The model correctly classified ", correct/total*100, "% of the images.")
```

### Train the model on multiple GPUs

Create the network again, then wrap it in nn.DataParallel.
``` code
net_parallel = nn.Sequential(
    nn.Conv2d(3, 6, 5),        # 3 input channels, 6 filters each 5x5
    nn.ReLU(),                 # non-linearity
    nn.MaxPool2d(2,2),         # pooling
    nn.Conv2d(6, 16, 5),       # 16 filters this time
    nn.ReLU(),                 # non-linearity
    nn.MaxPool2d(2,2),         # pooling
    nn.Flatten(),              # flatten feature maps
    nn.Linear(16*5*5, 100),    # 16x5x5 input neurons, 100 output neurons
    nn.Linear(100, 10)         # 100 input neurons, 10 output classes
)
net_parallel = torch.nn.DataParallel(net_parallel, device_ids=[0,1])   # the two visible GPUs are numbered 0 and 1
net_parallel = net_parallel.cuda()   # put the model on the first GPU
optimizer_parallel = torch.optim.SGD(net_parallel.parameters(), lr=0.001, momentum=0.9)   # don't forget to give the optimizer the new model's parameters
```

Take it for a test drive. Keep an eye on a terminal running e.g. `watch -d nvidia-smi`. There will be no speed increase in this case, as the model is relatively small. On the contrary, the overhead of copying the model to the other GPUs will probably make training slower overall.

``` code
for epoch in range(10):
    epoch_loss = train(net_parallel, trainloader, optimizer_parallel, loss)
    print("Epoch ",epoch+1," finished, Loss: ", epoch_loss)
```
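To check the claim about speed rather than take it on faith, one could time an epoch for each setup. The sketch below is a minimal helper under stated assumptions: `timed` and `dummy_epoch` are hypothetical names, not part of PyTorch, and the dummy workload only stands in for a call to the `train` function from this notebook.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def dummy_epoch(n):
    # stand-in workload; in the notebook this would be the train() call
    return sum(i * i for i in range(n))


result, seconds = timed(dummy_epoch, 100_000)
print("result:", result, "took", round(seconds, 4), "s")
```

In the notebook, `epoch_loss, seconds = timed(train, net_parallel, trainloader, optimizer_parallel, loss)` would give the wall-clock time of one multi-GPU epoch, and the same call with a single-GPU model and its optimizer would give the baseline. Because `train` calls `.item()` on every batch, the GPU work should already be synchronized with the host when the timer stops.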