initial commit

master
Tom Weber 3 years ago
commit f7dc3ea5ce

.gitignore

@@ -0,0 +1,8 @@
venv
.venv
.ipynb_checkpoints
*.ipynb
*.html
*.pdf
saved_models
*.pth

Makefile
@@ -0,0 +1,38 @@
# Makefile for creating notebooks, pdfs, html out of the tracked markdown files
.PHONY: clean

venv:
	python3 -m venv .venv
	./.venv/bin/python -m pip install wheel jupyter

notebooks: createnotebookdir
	$(foreach file, $(wildcard markdown/*), pandoc $(basename $(file)).md -o notebooks/$(notdir $(basename $(file))).ipynb ;)

pdf: createpdfdir notebooks
	$(foreach file, $(wildcard notebooks/*), jupyter nbconvert --output-dir='./pdf/' --to pdf $(basename $(file)).ipynb ;)

html: createhtmldir notebooks
	$(foreach file, $(wildcard notebooks/*), jupyter nbconvert --output-dir='./html/' --to html $(basename $(file)).ipynb ;)

# helper functions to create folders
createnotebookdir:
	mkdir -p notebooks

createhtmldir:
	mkdir -p html

createpdfdir:
	mkdir -p pdf

# clean up helper functions for the individual formats
cleannotebooks:
	rm -rf notebooks

cleanpdf:
	rm -rf pdf

cleanhtml:
	rm -rf html

# clean directory from files and folders that are not tracked
clean: cleannotebooks cleanpdf cleanhtml

README.md
@@ -0,0 +1,40 @@
# Repository for Introductory Information on PyTorch
* author(s): Tom Weber
* date: June 2020
Due to the nature of Jupyter notebooks, they don't integrate nicely with git: it is not always straightforward to see what changed in the commit history. Therefore, I opted to write the notebooks in markdown and then compile them into Jupyter notebooks with pandoc.
Documentation for pandoc can be found [here](https://pandoc.org/MANUAL.html#creating-jupyter-notebooks-with-pandoc).
I have included an install script for pandoc on debian-based systems, as their repositories tend to ship outdated packages.
## Dependencies
* **pandoc >2.6**: to compile markdown files into jupyter notebooks
* **python3**: to install the environment and run the jupyter notebooks
* **latex**: to convert notebooks into pdf
* setting up a Python virtual environment is advised, see Erik's introduction
* **python packages**:
- jupyter
- torch
- torchvision
## Getting Started
1) Install the virtual environment and jupyter with `make venv`
2) Activate the environment: `source .venv/bin/activate`
3) Create the notebooks: `make notebooks`
4) Open the first notebook: `jupyter notebook notebooks/0_Intro_Jupyter_Cuda.ipynb`
## Make commands
The creation of notebooks and the corresponding html and pdf files is handled by Make.
* `make notebooks`: creates jupyter notebooks in ./notebooks *(needs pandoc)*
* `make pdf`: creates pdf files of the notebooks in ./pdf *(needs jupyter and latex)*
* `make html`: creates html files of the notebooks in ./html *(needs jupyter)*
* `make venv`: creates the environment and installs jupyter *(needs python3)*

@@ -0,0 +1,6 @@
#!/bin/sh
sudo apt remove pandoc pandoc-citeproc
wget https://github.com/jgm/pandoc/releases/download/2.9.2.1/pandoc-2.9.2.1-1-amd64.deb
sudo dpkg -i pandoc-2.9.2.1-1-amd64.deb
rm pandoc-2.9.2.1-1-amd64.deb

@@ -0,0 +1,203 @@
---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---
## Connecting to GWS
The ssh command can be equipped with additional arguments to enable port forwarding.
This way, one can use Jupyter notebooks running on remote servers. *(not recommended for actual projects!)*
E.g. `ssh -L 8000:localhost:8888 tomweber@REMOTESERVER`
This forwards the remote port 8888 (the default Jupyter notebook port) to our local port 8000, so the remote notebook server can be reached in a local browser at `localhost:8000`.
## Setting up the environment
This notebook assumes that it is run in a virtual environment. Using environments is encouraged in order to avoid package conflicts.
**Quick setup**
Close the jupyter server and execute the following shell commands, one after the other:
```shell
python3 -m venv .venv # install the environment
source .venv/bin/activate # activate the environment
pip install jupyter # install jupyter into the environment
```
## Installing PyTorch
In order to see if torch is installed, check the output of the next cell
(prepending an exclamation mark executes shell code inside the jupyter notebook).
``` code
!pip list --format columns | grep torch
```
If there is no output, torch is _not_ installed. In that case, install PyTorch with:
``` code
!pip install torch
```
As long as the environment is activated (and we are hopefully running the notebook from there), pip will install the package and its dependencies into the appropriate venv folder.
Global packages are masked and won't conflict with our local packages.
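To double-check that the install actually landed in the venv, one can print the torch version and the interpreter path (a quick sanity check, not part of the original notebook):
``` code
import sys
import torch
print(torch.__version__) # the version pip just installed
print(sys.executable)    # should point into .venv
```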
## Figuring out CUDA with PyTorch
``` code
import torch
```
Let's begin by checking if CUDA works with PyTorch at all:
``` code
if torch.cuda.is_available():
    print("CUDA available")
else:
    print("Could not find CUDA, possibly encountering problems with current CUDA version")
```
In contrast to most local machines, the servers are usually equipped with multiple GPUs. Let's see how many there are:
``` code
print("GPUs available: ", torch.cuda.device_count()) # show number of cuda devices
```
### Computing with tensors on the GPU
In PyTorch, every tensor is associated with a device on which it lives, i.e. the CPU or a GPU/CUDA device. The same operations can be executed on tensors regardless of which device they are on, as long as all operands share that device.
By default, tensors are created on the CPU.
``` code
x = torch.ones((3,3)) # create 3x3 tensor consisting of ones
print(x.device) # show associated device of x
```
In order to run computations on the GPU, the associated tensors must be explicitly copied there.
``` code
x = x.cuda() # copy tensor to cuda device
print(x.device) # show associated device of x
```
Let's look at an example:
``` code
cpu1 = torch.rand((400,400)) # create a 400x400 tensor of uniform random numbers
cpu2 = torch.rand((400,400))
%timeit torch.matmul(cpu1,cpu2) # time the execution of matrix multiplication
```
``` code
gpu1 = torch.rand((400,400)).cuda() # create a 400x400 tensor of uniform random numbers and copy it to the CUDA device
gpu2 = torch.rand((400,400)).cuda()
%timeit torch.matmul(gpu1,gpu2) # time the execution of matrix multiplication
```
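A caveat worth knowing (an aside, not from the original notebook): CUDA kernels are launched asynchronously, so naive timings may measure little more than the kernel launch. For an honest GPU benchmark, synchronize before timing and as part of the timed statement:
``` code
torch.cuda.synchronize() # wait for pending GPU work before timing
%timeit torch.matmul(gpu1, gpu2); torch.cuda.synchronize()
```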
### Single GPU use case
By default, PyTorch will always use the "first" GPU (i.e. the lowest device number) as the current device.
CAUTION: CUDA's numbering is not necessarily the same as shown in `nvidia-smi`, which orders devices by PCI bus.
We can check the selected device number with:
``` code
print("The currently selected GPU is number:", torch.cuda.current_device(),
", it's a ", torch.cuda.get_device_name(device=None))
```
One should always cross-check that this is actually the device one wants to use, which is easy when the server contains GPUs of different models. In our case, however, there are two GPUs with the same name.
``` code
!nvidia-smi -L # show the GPUs installed on the machine
```
If one wants to change the current device, there are several ways to achieve this.
1.) Best practice is to explicitly whitelist the GPUs your code can see, effectively masking the rest. This avoids any accidental overlap with GPUs that you did not book.
Note: this way we can also make the ordering consistent by telling CUDA to order the GPUs by PCI bus ID.
Due to how Jupyter notebooks work, executing the cell below has no effect at this point, because we have already imported torch and initialized CUDA. Therefore, restart the kernel and then execute the cell.
``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make device 0 visible
```
Now let's check how many devices we can see:
``` code
import torch
print("GPUs available: ", torch.cuda.device_count())
for device in range(torch.cuda.device_count()):
    print("Device", device, ":", torch.cuda.get_device_name(device=device))
```
2.) One can set the cuda device manually.
``` code
torch.cuda.set_device(1) # make cuda device nr. 1 the current device
print(torch.cuda.get_device_name(device=None))
```
3.) Better practice is to wrap your code in a CUDA device context:
``` code
with torch.cuda.device(0): # context manager for a specific cuda device
    # your code here
    print(torch.cuda.get_device_name(device=None))
```
Alternatively, one can also copy tensors to a specific device:
``` code
x = torch.ones((3,3))
x_on_1 = x.to("cuda:0")
x_on_2 = x.to("cuda:1")
print(x_on_1.device)
print(x_on_2.device)
```
Oftentimes, in tutorials on the internet, you will find the following:
``` code
torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)
print(x.device)
```
This way one can write CUDA-agnostic code that works on machines both with and without a GPU. However, only use this if you have made your reserved GPU explicitly visible and hidden the rest; otherwise it will automatically select GPU 0 as your CUDA device.
### Parallelize on multiple GPUs
Parallelizing training on multiple GPUs is in most cases a one-liner.
PyTorch comes with `torch.nn.DataParallel`, which makes it easy to split batches across GPUs.
Essentially, the model gets copied to each GPU, and each copy receives a part of the minibatch to process.
``` code
net = torch.nn.DataParallel(net, device_ids=[0,1]) # wrap an existing model; each listed device receives a share of the batch
```

@@ -0,0 +1,197 @@
---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---
## Preliminaries
Make sure we are only using our reserved GPUs.
``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make device 0 visible
```
## Training a Standard Vision Classifier
### Building a Model with Sequential()
Let's do a standard image classification task.
``` code
import torch
import torch.nn as nn
```
Sequential works very similarly to the Keras concept: a container wraps around individual layers in the order they are given.
``` code
net = nn.Sequential(nn.Conv2d(3, 6, 5),     # 3 input channels, 6 filters each 5x5
                    nn.ReLU(),              # non-linearity
                    nn.MaxPool2d((2,2)),    # pooling
                    nn.Conv2d(6, 16, 5),    # 16 filters this time
                    nn.ReLU(),              # non-linearity
                    nn.MaxPool2d((2,2)),    # pooling
                    nn.Flatten(),           # flatten feature maps
                    nn.Linear(16*5*5, 100), # 16x5x5 input neurons, 100 output neurons
                    nn.Linear(100, 10)
                    )
net = net.cuda() # put the model on the GPU
```
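As a quick sanity check of the `16*5*5` comment (a sketch, not in the original notebook): CIFAR-10 images are 3x32x32, and two rounds of 5x5 convolution plus 2x2 pooling leave 16 feature maps of size 5x5, so a dummy forward pass should yield 10 logits per image.
``` code
dummy = torch.randn(1, 3, 32, 32).cuda() # one fake CIFAR-10-sized image
print(net(dummy).shape)                  # torch.Size([1, 10])
```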
### Creating dataloaders
For simplicity's sake, I will just take a premade dataset that is supplied with torch.
The dataset is part of the torchvision package, which we don't have yet.
``` code
!pip install torchvision
```
``` code
import torchvision
```
Datasets can easily be created from custom data by subclassing `torch.utils.data.Dataset`; see the next Jupyter notebook.
(The datasets and preprocessing options used here are torchvision-specific.)
``` code
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                                             (0.247, 0.243, 0.261))]) # CIFAR-10 channel means/stds
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
```
A dataloader takes a dataset and a bunch of other arguments and provides convenient batched access to the data to feed to the network.
``` code
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
```
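To see what the loaders yield, one can grab a single batch (a small sketch, not in the original notebook):
``` code
imgs, lbls = next(iter(trainloader)) # fetch one batch
print(imgs.shape, lbls.shape)        # torch.Size([32, 3, 32, 32]) torch.Size([32])
```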
### Inspect the model with tensorboard
TensorBoard, while originally built for TensorFlow, also works well with PyTorch.
``` code
!pip install tensorboard
```
``` code
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs') # initialize the writer with folder "./runs"
imgs, _ = next(iter(trainloader)) # get some input to trace the graph
writer.add_graph(net, imgs.cuda()) # trace the graph once and store it
```
Now we can start TensorBoard from the directory where the notebook is located with `tensorboard --logdir=runs`
and open it in our browser at [localhost:6006](http://localhost:6006) (or whichever port is passed via `--port`, as in the cell below).
``` code
!tensorboard --logdir=runs --port=6007
```
### Prepare training function
We still need a loss function and an optimizer:
``` code
import numpy as np # for later use
loss = nn.CrossEntropyLoss() # takes logits as predictions and integer labels
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9) # optimizer needs to be supplied with the parameters to optimize
```
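To illustrate what "logits and integer labels" means, here is a hypothetical example (not part of the original notebook):
``` code
dummy_logits = torch.randn(4, 10)         # batch of 4 samples, 10 classes
dummy_labels = torch.tensor([3, 1, 0, 9]) # one integer class label per sample
print(loss(dummy_logits, dummy_labels))   # a single scalar loss tensor
```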
Build a function that trains the model on the data for one epoch
``` code
def train(net, dataloader, optimizer, loss):
    epoch_loss = [] # keep a running loss
    net.train() # tell the model that it's training time
    for img, lbl in dataloader:
        img, lbl = img.cuda(), lbl.cuda() # put data on the GPU
        optimizer.zero_grad() # free the optimizer from previous gradients
        out = net(img) # compute predictions for the images
        batch_loss = loss(out, lbl) # compute loss
        batch_loss.backward() # compute gradients
        optimizer.step() # update weights
        epoch_loss.append(batch_loss.item()) # record the batch loss
    return np.mean(epoch_loss) # return the epoch loss
```
### Train the model
Train the model for a couple of epochs and save checkpoints periodically
``` code
os.makedirs("../saved_models", exist_ok=True) # make sure the checkpoint directory exists
for epoch in range(5):
    epoch_loss = train(net, trainloader, optimizer, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
    writer.add_scalar("epoch loss", epoch_loss, epoch+1)
    if (epoch+1) % 5 == 0:
        torch.save(net.state_dict(), "../saved_models/net_{}_epochs.pth".format(epoch+1))
```
``` code
!tensorboard --logdir=runs --port=6007
```
### Evaluate the Model
Since the images are small, we can run the evaluation just fine on the CPU; the model has to be brought back there for that purpose.
Each model has `.train()` and `.eval()` modes that change the behaviour of certain layers (e.g. dropout and batch norm).
``` code
net = net.cpu() # bring the network back from the GPU
net.eval() # tell the network that it's testing time
correct = 0
total = 0
with torch.no_grad(): # no gradients needed for evaluation
    for img, lbl in testloader:
        out = net(img)
        logits, indices = torch.max(out, 1) # index of the largest logit = predicted class
        correct += torch.sum(indices == lbl).item()
        total += len(lbl)
print("The model correctly classified ", correct/total*100, "% of the images.")
```
### Train the model on multiple GPUs
Create the network again, but this time wrap an instance of it in `nn.DataParallel`.
``` code
net_parallel = nn.Sequential(
    nn.Conv2d(3, 6, 5),     # 3 input channels, 6 filters each 5x5
    nn.ReLU(),              # non-linearity
    nn.MaxPool2d(2,2),      # pooling
    nn.Conv2d(6, 16, 5),    # 16 filters this time
    nn.ReLU(),              # non-linearity
    nn.MaxPool2d(2,2),      # pooling
    nn.Flatten(),           # flatten feature maps
    nn.Linear(16*5*5, 100), # 16x5x5 input neurons, 100 output neurons
    nn.Linear(100, 10)
)
net_parallel = torch.nn.DataParallel(net_parallel, device_ids=[0,1])
net_parallel = net_parallel.cuda() # put the model on the first GPU
optimizer_parallel = torch.optim.SGD(net_parallel.parameters(),
                                     lr=0.001, momentum=0.9) # don't forget to give the optimizer the parallel model's parameters
```
Take it for a test drive. Keep an eye on a terminal running e.g. `watch -d nvidia-smi`. There will be no speed-up in this case, since the model is relatively small; on the contrary, the overhead of copying the model to the other GPU will probably result in a net increase in training time.
``` code
for epoch in range(10):
    epoch_loss = train(net_parallel, trainloader, optimizer_parallel, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
```

@@ -0,0 +1,186 @@
---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---
## Preliminaries
Make sure we are only using our reserved GPUs.
``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make device 0 visible
```
## Using Torch Modules and Datasets
This part of the PyTorch introduction will focus on creating custom torch modules and datasets, while applying those concepts to a fun character-level text generation task.
### Preparation
``` code
import torch
import numpy as np
from urllib.request import urlopen # for importing the data
```
Let us borrow a nice text dataset from TensorFlow.
``` code
text_source = "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt"
text = urlopen(text_source).read().decode(encoding="utf-8")
```
Do some general NLP preprocessing.
``` code
def preprocess(text):
    alphabet = sorted(set(text))
    letter_to_int = {let: ind for ind, let in enumerate(alphabet)}
    int_to_letter = {ind: let for ind, let in enumerate(alphabet)}
    letter_ints = [letter_to_int[letter] for letter in text]
    alphabet_size = len(alphabet)
    return int_to_letter, letter_to_int, alphabet_size, letter_ints
```
Now we can transform our text into a sequence of integers, where each integer represents a character.
``` code
int_to_letter, letter_to_int, alphabet_size, letter_ints = preprocess(text)
print("Alphabet size:", alphabet_size)
print("Length of letter sequence:", len(text))
```
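A quick look at the mapping (a small check, not in the original notebook):
``` code
print(text[:20])                     # the raw text...
print(letter_ints[:20])              # ...and its integer encoding
print(int_to_letter[letter_ints[0]]) # round-trip back to the first character
```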
## Custom Datasets
Previously, we imported a premade dataset and created a dataloader from it. This time, we want to create our own dataset that can be used to construct a dataloader.
A custom dataset needs to implement at least the `__len__(self)` and the `__getitem__(self, index)` methods.
`__len__(self)` only needs to return the size/length of the dataset, while `__getitem__(self, index)` needs to map an index to a tuple of (sample, label). Batching is handled automatically by the dataloader, so there is no need to think about that for now.
We want our model to predict a probability for every character that could succeed the input character. Hence, our samples will be sequences of a certain length, while the ground truth will be the same sequence shifted forward by one character.
CAUTION: This is not always the fastest method. If the dataset is sufficiently simple and small (as in our case here), manual batching is probably faster.
``` code
class Shakespeare_Dataset(torch.utils.data.Dataset):
    def __init__(self, text, seq_len):
        self.x = torch.LongTensor(text[:-1]) # inputs
        self.y = torch.LongTensor(text[1:])  # targets: the same sequence shifted by one
        self.seq_len = seq_len # set the sequence length
    def __len__(self):
        return len(self.x) - self.seq_len # length of corpus minus sequence length minus shift
    def __getitem__(self, index):
        return (self.x[index:index+self.seq_len],
                self.y[index:index+self.seq_len]) # return tuple of (sample, label)
```
Now, we can easily instantiate our dataset and let a dataloader handle the shuffling, batching etc.
``` code
shakespeare_dset = Shakespeare_Dataset(letter_ints, seq_len=100)
trainloader = torch.utils.data.DataLoader(shakespeare_dset, batch_size=32,
                                          shuffle=True, num_workers=2,
                                          drop_last=True)
```
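A quick sanity check of the shift (a sketch, not in the original notebook): sample and label should be offset by exactly one character.
``` code
x0, y0 = shakespeare_dset[0] # first (sample, label) pair
print(x0[:10]) # the first ten input codes
print(y0[:10]) # the same positions, shifted forward by one character
```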
## Custom Modules (models, layers, operations...)
The majority of high-level computations in PyTorch are modeled as `torch.nn.Module`s, be it whole models or individual layers. An `nn.Module` needs to implement the `forward(self, input)` method, which defines the operations the module computes.
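To make the contract concrete, here is a minimal, hypothetical module (not from the original notebook) that simply scales its input:
``` code
class Scale(torch.nn.Module):
    def __init__(self, factor):
        super(Scale, self).__init__()
        self.factor = factor
    def forward(self, inputs):
        return inputs * self.factor

print(Scale(2.0)(torch.ones(3))) # tensor([2., 2., 2.])
```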
Let us define a recurrent network consisting of an embedding layer, two GRU layers and a dense output layer (called a linear layer in PyTorch terms).
``` code
class RNN(torch.nn.Module):
    def __init__(self, vocab_size, hidden_size, embedding_size, batch=32, layers=2):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size # size of the GRU layers
        self.batch = batch
        self.layers = layers # how many GRU layers
        self.word_embeds = torch.nn.Embedding(vocab_size, embedding_size) # embedding layer
        self.gru = torch.nn.GRU(embedding_size, hidden_size, layers, batch_first=True) # GRU layer(s)
        self.output_layer = torch.nn.Linear(hidden_size, vocab_size)
    def forward(self, inputs, hidden):
        x = self.word_embeds(inputs) # map the input integers to high-dimensional embeddings
        output, hidden = self.gru(x, hidden) # compute the output of the GRU layer(s)
        output = self.output_layer(output) # compute the logits
        return output, hidden
    def initHidden(self):
        return torch.zeros(self.layers, self.batch, self.hidden_size)
```
### Training
Let us set up the model, some hyperparameters and define a training function
``` code
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # let us do it the quick way this time
rnn = RNN(alphabet_size, 1024, 256, layers=2)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.005)
```
``` code
def train(model, optim, loss, device):
    current_loss = [] # record a running loss
    model.to(device) # put the model on the specified device
    hidden = model.initHidden().to(device) # create the initial hidden state
    model.train() # tell the model it's training time
    for X, y in trainloader:
        X, y = X.to(device), y.to(device) # collect data and labels from the dataloader and put them on the device
        optim.zero_grad() # empty the gradients
        output, hidden = model(X, hidden) # compute the output
        hidden = hidden.detach() # take the hidden state out of the graph
        batch_loss = loss(output.transpose(1,2), y) # CrossEntropyLoss expects (batch, classes, seq), hence the transpose
        batch_loss.backward() # compute gradients
        optim.step() # update weights
        current_loss.append(batch_loss.item()) # record loss
    epoch_loss = np.mean(current_loss)
    return epoch_loss
```
Train the model for some epochs.
``` code
epochs = 200
os.makedirs("../saved_models", exist_ok=True) # make sure the checkpoint directory exists
for e in range(epochs):
    l = train(rnn, optimizer, loss, device)
    print("Epoch ", e+1, ", Loss: ", l)
torch.save(rnn.state_dict(), "../saved_models/rnn_{}epochs.pth".format(epochs)) # save once after training
```
### Text generation
Load our previously saved model.
``` code
rnn = RNN(alphabet_size, 1024, 256, layers=2, batch=1) # instantiate the model with batch size 1
rnn.load_state_dict(torch.load("../saved_models/rnn_200epochs.pth")) # load the weights saved above
rnn.eval() # tell the model it's time to evaluate
```
Give the model a starting sequence.
``` code
seq = "NICO: " # starting sequence which we give the model
max_seq_len = 1000 # max sequence length
temp = 0.7 # temperature for sampling, the higher the temperature the more random the sampling, the colder the temperature the more conservative
hidden = rnn.initHidden()
input_idx = torch.LongTensor([[letter_to_int[s] for s in seq]]) # input characters to ints
```
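To see what the temperature does (an illustration, not part of the original notebook), divide some logits by a cold and a hot temperature and compare the resulting distributions:
``` code
demo_logits = torch.tensor([2.0, 1.0, 0.1])
print(torch.softmax(demo_logits / 0.2, dim=0)) # cold: almost deterministic
print(torch.softmax(demo_logits / 2.0, dim=0)) # hot: closer to uniform
```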
``` code
with torch.no_grad(): # no gradients needed for generation
    for i in range(max_seq_len):
        output, hidden = rnn(input_idx, hidden) # predict the logits for the next character
        pred = torch.squeeze(output, 0)[-1] # logits at the last position of the window
        pred = pred / temp # apply temperature
        pred_id = torch.distributions.categorical.Categorical(logits=pred).sample() # sample from the distribution
        input_idx = torch.cat((input_idx[:,1:], pred_id.reshape(1,-1)), 1) # slide the window: append the predicted character
        seq += int_to_letter[pred_id.item()] # add the predicted character to the sequence
print(seq) # show us the sequence
```