commit f7dc3ea5ce
@@ -0,0 +1,8 @@
venv
.venv
.ipynb_checkpoints
*.ipynb
*.html
*.pdf
saved_models
*.pth
@@ -0,0 +1,38 @@
# Makefile for creating notebooks, pdfs and html files out of the tracked markdown files
# declare targets that do not produce a file of the same name as phony
.PHONY: clean cleannotebooks cleanpdf cleanhtml notebooks pdf html createnotebookdir createpdfdir createhtmldir

venv:
	python3 -m venv .venv
	./.venv/bin/python -m pip install wheel jupyter

notebooks: createnotebookdir
	$(foreach file, $(wildcard markdown/*), pandoc $(basename $(file)).md -o notebooks/$(notdir $(basename $(file))).ipynb ;)

pdf: createpdfdir notebooks
	$(foreach file, $(wildcard notebooks/*), jupyter nbconvert --output-dir='./pdf/' --to pdf $(basename $(file)).ipynb ;)

html: createhtmldir notebooks
	$(foreach file, $(wildcard notebooks/*), jupyter nbconvert --output-dir='./html/' --to html $(basename $(file)).ipynb ;)

# helper targets to create folders
createnotebookdir:
	mkdir -p notebooks

createhtmldir:
	mkdir -p html

createpdfdir:
	mkdir -p pdf

# clean-up helper targets for the individual formats
cleannotebooks:
	rm -rf notebooks

cleanpdf:
	rm -rf pdf

cleanhtml:
	rm -rf html

# clean the directory of files and folders that are not tracked
clean: cleannotebooks cleanpdf cleanhtml
@@ -0,0 +1,40 @@
# Repository for Introductory Information on PyTorch

* author(s): Tom Weber
* date: June 2020

Due to the nature of Jupyter notebooks, they don't integrate nicely with git: it is not always straightforward to see what changed in the commit history. Therefore, I opted to write the notebooks in markdown and then compile them into Jupyter notebooks with pandoc.

Documentation for pandoc can be found [here](https://pandoc.org/MANUAL.html#creating-jupyter-notebooks-with-pandoc).
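For reference, the conversion of a single file looks roughly like this (the `notebooks` Make target loops this over all tracked markdown files; the filename here is just an example):

```shell
pandoc markdown/0_Intro_Jupyter_Cuda.md -o notebooks/0_Intro_Jupyter_Cuda.ipynb
```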
I have included an install script for pandoc on debian-based systems, as they tend to ship outdated packages.

## Dependencies

* **pandoc >2.6**: to compile markdown files into jupyter notebooks
* **python3**: to install the environment and run the jupyter notebooks
* **latex**: to convert notebooks into pdf

* setting up a Python virtual environment is advised, see Erik's introduction

* **python packages**:
    - jupyter
    - torch
    - torchvision

## Getting Started

1) Install the virtual environment and jupyter with `make venv`
2) Activate the environment with `source .venv/bin/activate`
3) Create the notebooks with `make notebooks`
4) Open the first notebook with `jupyter notebook notebooks/0_Intro_Jupyter_Cuda.ipynb`

## Make commands

The creation of notebooks and the corresponding html and pdf files is handled by Make.

* `make notebooks`: creates jupyter notebooks in ./notebooks *(needs pandoc)*
* `make pdf`: creates pdf files of the notebooks in ./pdf *(needs jupyter and latex)*
* `make html`: creates html files of the notebooks in ./html *(needs jupyter)*
* `make venv`: creates the environment and installs jupyter *(needs python3)*
@@ -0,0 +1,6 @@
#!/bin/sh
# remove the distribution's (often outdated) pandoc packages
sudo apt remove pandoc pandoc-citeproc
# fetch a recent pandoc release, install it, then remove the package file
wget https://github.com/jgm/pandoc/releases/download/2.9.2.1/pandoc-2.9.2.1-1-amd64.deb
sudo dpkg -i pandoc-2.9.2.1-1-amd64.deb
rm pandoc-2.9.2.1-1-amd64.deb
@@ -0,0 +1,203 @@
---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---

## Connecting to GWS

The ssh command can be equipped with additional options that enable port forwarding.

Thereby, one can use jupyter notebooks running on remote servers. *(not recommended for actual projects!)*

E.g. `ssh -L 8000:localhost:8888 tomweber@REMOTESERVER`

This forwards connections to our local port 8000 to port 8888 on the remote machine (the default jupyter notebook port).
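Putting it together, a typical session might look like this (a sketch; user, hostname and ports are just the examples from above):

```shell
# on the remote server: start jupyter without opening a browser
jupyter notebook --no-browser --port 8888
# on the local machine: forward local port 8000 to remote port 8888
ssh -L 8000:localhost:8888 tomweber@REMOTESERVER
# then browse to http://localhost:8000 locally
```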
## Setting up the environment

This notebook assumes that it is run in a virtual environment. Using environments is encouraged in order to avoid package conflicts.

**Quick setup**

Close the jupyter server and execute the following shell commands, one after the other:

```shell
python3 -m venv .venv     # create the environment
source .venv/bin/activate # activate the environment
pip install jupyter       # install jupyter into the environment
```

## Installing PyTorch

In order to see whether torch is installed, check the output of the next cell
(prepending an exclamation mark executes shell code inside the jupyter notebook).

``` code
!pip list --format columns | grep torch
```

If there is no output, it is _not_ installed. Therefore we want to install it.

Install PyTorch with:

``` code
!pip install torch
```

As long as the environment is activated (and we are hopefully running the notebook from there), pip will install the package and its dependencies into the appropriate venv folder.
Global packages are masked and won't conflict with our local packages.

## Figuring out CUDA with PyTorch

``` code
import torch
```

Let's begin by checking if CUDA works with PyTorch at all:

``` code
if torch.cuda.is_available():
    print("CUDA available")
else:
    print("Could not find CUDA, possibly encountering problems with current CUDA version")
```

In contrast to most local machines, the servers are usually equipped with multiple GPUs. Let's see how many there are:

``` code
print("GPUs available: ", torch.cuda.device_count()) # show number of cuda devices
```

### Computing with tensors on the GPU

In PyTorch, tensors are always associated with a device on which they live, i.e. CPU or GPU/CUDA. Operations can be executed on tensors regardless of which device they are on, but all operands of a single operation must reside on the same device.

By default, tensors are created on the "CPU" device.

``` code
x = torch.ones((3,3)) # create 3x3 tensor consisting of ones
print(x.device)       # show associated device of x
```

In order to run computations on the GPU, the associated tensors must be explicitly copied there.

``` code
x = x.cuda()    # copy tensor to cuda device
print(x.device) # show associated device of x
```
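A minimal illustration of the same-device rule (a sketch; the commented-out line raises a RuntimeError when run):

``` code
y = torch.ones((3,3)) # y lives on the CPU, x is on the GPU
# x + y               # RuntimeError: expected all tensors to be on the same device
x + y.cuda()          # works: both operands are on the GPU
```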

Let's look at an example:

``` code
cpu1 = torch.rand((400,400)) # create 400x400 tensor of uniformly distributed random numbers
cpu2 = torch.rand((400,400))

%timeit torch.matmul(cpu1,cpu2) # time the execution of matrix multiplication
```

``` code
gpu1 = torch.rand((400,400)).cuda() # create 400x400 tensor of uniformly distributed random numbers and copy to CUDA device
gpu2 = torch.rand((400,400)).cuda()

%timeit torch.matmul(gpu1,gpu2) # time the execution of matrix multiplication
```
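CUDA kernels launch asynchronously, so naive timings of GPU code can understate the real cost; a more careful measurement waits for the device (a sketch):

``` code
torch.cuda.synchronize() # wait for pending GPU work before timing
%timeit torch.matmul(gpu1, gpu2); torch.cuda.synchronize() # include kernel completion in the timing
```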

### Single GPU use case

By default, PyTorch will always use the "first" GPU (i.e. the lowest device number) as the current device.
CAUTION: CUDA numbering is not necessarily the same as shown in `nvidia-smi`!
`nvidia-smi` orders by PCI bus.

We can check the selected device number with:

``` code
print("The currently selected GPU is number:", torch.cuda.current_device(),
      ", it's a ", torch.cuda.get_device_name(device=None))
```

One should always cross-reference whether that is actually the device one wants to use, which is easy when the server contains different GPU models. In our case, however, there are two GPUs with the same name.

``` code
!nvidia-smi -L # show the GPUs installed on the machine
```

If one wants to change the current device, there are several possible ways to achieve this.

1.) Best practice is to explicitly whitelist the GPUs your code can see, effectively masking the rest. This avoids any accidental overlap with GPUs that you did not book.

Note: This way we can also bring consistency into the ordering by telling CUDA to order the GPUs by PCI bus id.

Due to how jupyter notebooks work, executing the cell will not have any effect here, because we already imported torch and initialized CUDA.
Therefore, restart the kernel and execute the cell again.

``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2"     # only make devices 0 and 2 visible
```

Now let's check how many devices we can see:

``` code
import torch
print("GPUs available: ", torch.cuda.device_count())
for device in range(torch.cuda.device_count()):
    print("Device", device, ":", torch.cuda.get_device_name(device=device))
```

2.) One can set the cuda device manually.

``` code
torch.cuda.set_device(1) # make cuda device nr. 1 the current device
print(torch.cuda.get_device_name(device=None))
```

3.) A better practice is to embed your code into a cuda device context.

``` code
with torch.cuda.device(0): # context manager for a specific cuda device
    # your code here
    print(torch.cuda.get_device_name(device=None))
```

Alternatively, one can also copy tensors to a specific device directly:

``` code
x = torch.ones((3,3))
x_on_1 = x.to("cuda:0")
x_on_2 = x.to("cuda:1")
print(x_on_1.device)
print(x_on_2.device)
```

Often, in tutorials on the internet, you will find the following:

``` code
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)
print(x.device)
```

This way one can write CUDA-agnostic code that works on machines both with and without a GPU. However, only use this if you have made your reserved GPU explicitly visible and hidden the rest. Otherwise it will automatically select GPU 0 as your CUDA device.

### Parallelize on multiple GPUs

Parallelizing training across multiple GPUs is in most cases a one-liner.

PyTorch comes with torch.nn.DataParallel, which makes it easy to split batches across GPUs.

Essentially, the model gets copied to each GPU and receives part of the minibatch to process.

``` code
net = torch.nn.DataParallel(net, device_ids=[0,1])
```
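A self-contained sketch of the effect, using a throwaway linear model (assumes at least two visible GPUs):

``` code
model = torch.nn.DataParallel(torch.nn.Linear(10, 2), device_ids=[0,1]).cuda()
out = model(torch.rand((8, 10)).cuda()) # the batch of 8 is scattered across the GPUs
print(out.shape)                        # outputs are gathered back on the first device
```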

@@ -0,0 +1,197 @@
---
title: "PyTorch Intro II: Training a Standard Vision Classifier"
author: Tom Weber
---

## Preliminaries

Make sure we are only using our reserved GPUs.

``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2"     # only make devices 0 and 2 visible
```

## Training a Standard Vision Classifier

### Building a Model with Sequential()

Let's do a standard image classification task.

``` code
import torch
import torch.nn as nn
```

Sequential works very similarly to the Keras concept: a container wraps around individual layers in the order they are given.

``` code
net = nn.Sequential(nn.Conv2d(3, 6, 5),     # 3 input channels, 6 filters each 5x5
                    nn.ReLU(),              # non-linearity
                    nn.MaxPool2d((2,2)),    # pooling
                    nn.Conv2d(6, 16, 5),    # 16 filters this time
                    nn.ReLU(),              # non-linearity
                    nn.MaxPool2d((2,2)),    # pooling
                    nn.Flatten(),           # flatten feature maps
                    nn.Linear(16*5*5, 100), # 16x5x5 input neurons, 100 output neurons
                    nn.Linear(100, 10)
                    )

net = net.cuda() # put the model on the GPU
```
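To verify the architecture, we can push a dummy batch through the untrained network (a quick sanity check, assuming 3x32x32 inputs, i.e. CIFAR10-sized images as used below):

``` code
dummy = torch.rand((1, 3, 32, 32)).cuda() # one fake image
print(net(dummy).shape)                   # expected: torch.Size([1, 10])
```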

### Creating dataloaders

For simplicity's sake, I will just take a premade dataset that is supplied with torch.
The dataset is part of the torchvision module, which we don't have yet.

``` code
!pip install torchvision
```

``` code
import torchvision
```

Datasets can easily be created from custom data by subclassing torch.utils.data.Dataset, see the next jupyter notebook.
(The datasets and preprocessing options used here are torchvision specific.)

``` code
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                            torchvision.transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform) # use the composed transform from above
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
```

A dataloader takes a dataset and a bunch of other arguments and provides convenient data access for feeding the network.

``` code
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)

testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
```
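A quick look at one minibatch (a sketch) confirms the expected shapes:

``` code
imgs, lbls = next(iter(trainloader)) # fetch one batch
print(imgs.shape, lbls.shape)        # torch.Size([32, 3, 32, 32]) torch.Size([32])
```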

### Inspect the model with tensorboard

Tensorboard, while originally from TensorFlow, also works pretty well with PyTorch.

``` code
!pip install tensorboard
```

``` code
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs')     # initialize the writer with folder "./runs"
imgs, _ = next(iter(trainloader))  # get some input to trace the graph
writer.add_graph(net, imgs.cuda()) # trace the graph once and store it
```

Now we can start tensorboard in the same location where the notebook is located with `tensorboard --logdir=runs`
and open it in our browser at [localhost:6006](http://localhost:6006) (the cell below uses port 6007 instead).

``` code
!tensorboard --logdir=runs --port=6007
```

### Prepare training function

We still need a loss and an optimizer.

``` code
import numpy as np           # for later use
loss = nn.CrossEntropyLoss() # takes logits as predictions and integer labels
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9) # the optimizer needs to be supplied with the parameters to optimize
```

Build a function that trains the model on the data for one epoch.

``` code
def train(net, dataloader, optimizer, loss):
    epoch_loss = []                          # save a running loss
    net.train()                              # tell the model that it's training time
    for img, lbl in dataloader:
        img, lbl = img.cuda(), lbl.cuda()    # put data on GPU
        optimizer.zero_grad()                # free the optimizer from previous gradients
        out = net(img)                       # compute predictions
        batch_loss = loss(out, lbl)          # compute loss
        batch_loss.backward()                # compute gradients
        optimizer.step()                     # update weights
        epoch_loss.append(batch_loss.item()) # record the batch loss
    return np.mean(epoch_loss)               # return the epoch loss
```

### Train the model

Train the model for a couple of epochs and save checkpoints periodically.

``` code
for epoch in range(5):
    epoch_loss = train(net, trainloader, optimizer, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
    writer.add_scalar("epoch loss", epoch_loss, epoch+1)
    if (epoch+1) % 5 == 0:
        torch.save(net.state_dict(), "../saved_models/net_{}_epochs.pth".format(epoch+1))
```

``` code
!tensorboard --logdir=runs --port=6007
```

### Evaluate the Model

Since the images are small, we can run the evaluation just fine on the CPU. The model has to be brought back to the CPU for that purpose.

Each model has .train() and .eval() methods that toggle the behaviour of certain layers (e.g. dropout or batch norm) between training and inference mode.

``` code
net = net.cpu() # bring the network back from the GPU
net.eval()      # tell the network that it's testing time
correct = 0
total = 0
with torch.no_grad():                       # no gradients needed for evaluation
    for img, lbl in testloader:
        out = net(img)
        logits, indices = torch.max(out, 1) # index of the highest logit = predicted class
        correct += torch.sum(indices == lbl).item()
        total += len(lbl)
print("The model correctly classified ", correct/total*100, "% of the images.")
```

### Train the model on multiple GPUs

Create the network again, but this time wrap it with nn.DataParallel.

``` code
net_parallel = nn.Sequential(
    nn.Conv2d(3, 6, 5),     # 3 input channels, 6 filters each 5x5
    nn.ReLU(),              # non-linearity
    nn.MaxPool2d(2,2),      # pooling
    nn.Conv2d(6, 16, 5),    # 16 filters this time
    nn.ReLU(),              # non-linearity
    nn.MaxPool2d(2,2),      # pooling
    nn.Flatten(),
    nn.Linear(16*5*5, 100), # 16x5x5 input neurons, 100 output neurons
    nn.Linear(100, 10)
    )
net_parallel = torch.nn.DataParallel(net_parallel, device_ids=[0,1])
net_parallel = net_parallel.cuda() # put the model on the first GPU
optimizer_parallel = torch.optim.SGD(net_parallel.parameters(),
                                     lr=0.001, momentum=0.9) # don't forget to inform the optimizer
```

Take it for a test drive. Keep an eye on a terminal running e.g. `watch -d nvidia-smi`. There will be no speed increase in this case, as it is a relatively small model. On the contrary, the overhead of copying the model to the other GPUs will probably result in a net training time loss.

``` code
for epoch in range(10):
    epoch_loss = train(net_parallel, trainloader, optimizer_parallel, loss)
    print("Epoch ", epoch+1, " finished, Loss: ", epoch_loss)
```
@@ -0,0 +1,186 @@
---
title: "PyTorch Intro III: Custom Modules and Datasets"
author: Tom Weber
---

## Preliminaries

Make sure we are only using our reserved GPUs.

``` code
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2"     # only make devices 0 and 2 visible
```

## Using Torch Modules and Datasets

This part of the PyTorch introduction focuses on creating custom torch modules and datasets, while applying those concepts to a fun character-level text generation task.

### Preparation

``` code
import torch
import numpy as np
from urllib.request import urlopen # for importing the data
```

Let us borrow a nice text dataset from TensorFlow.

``` code
text_source = "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt"
text = urlopen(text_source).read().decode(encoding="utf-8")
```

Do some general NLP preprocessing.

``` code
def preprocess(text):
    alphabet = sorted(set(text))                                    # unique characters in the corpus
    letter_to_int = {let: ind for ind, let in enumerate(alphabet)}  # map characters to integers
    int_to_letter = {ind: let for ind, let in enumerate(alphabet)}  # and back
    letter_ints = [letter_to_int[letter] for letter in text]        # encode the corpus as integers
    alphabet_size = len(alphabet)
    return int_to_letter, letter_to_int, alphabet_size, letter_ints
```

Now we can transform our text into a sequence of integers, where each integer represents a character.

``` code
int_to_letter, letter_to_int, alphabet_size, letter_ints = preprocess(text)
print("Alphabet size:", alphabet_size)
print("Length of letter sequence:", len(text))
```

## Custom Datasets

Previously we imported a pre-made dataset and created a dataloader. This time, we want to create our own dataset that can be used to construct a dataloader.

A custom dataset needs to implement at least the `__len__(self)` and the `__getitem__(self, index)` methods.
`__len__(self)` only needs to return the size/length of the dataset, while `__getitem__(self, index)` needs to map an index to a tuple of (sample, label). Batching will be handled automatically by the dataloader, so there is no need to think about that for now.

We want our model to predict, for each input character, the probabilities of all possible characters that can succeed it. Hence, our samples will be sequences of a certain length, while the ground truth will be the same sequence shifted forward by one character.

CAUTION: This is not always the fastest method. If the dataset is sufficiently simple and small (as in our case here), manual batching is probably faster.

``` code
class Shakespeare_Dataset(torch.utils.data.Dataset):
    def __init__(self, text, seq_len):
        self.x = torch.LongTensor(text[:-1]) # get the data
        self.y = torch.LongTensor(text[1:])  # the labels are the data shifted by one character
        self.seq_len = seq_len               # set the sequence length

    def __len__(self):
        return len(self.x) - self.seq_len    # number of full windows in the corpus

    def __getitem__(self, index):
        return (self.x[index:index+self.seq_len],
                self.y[index:index+self.seq_len]) # return tuple of (sample, label)
```

Now we can easily instantiate our dataset and let a dataloader handle the shuffling, batching etc.

``` code
shakespeare_dset = Shakespeare_Dataset(letter_ints, seq_len=100)
trainloader = torch.utils.data.DataLoader(shakespeare_dset, batch_size=32,
                                          shuffle=True, num_workers=2,
                                          drop_last=True)
```
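A quick sanity check (a sketch): each batch should be a pair of (32, 100) integer tensors, with the labels shifted by one character.

``` code
X, y = next(iter(trainloader)) # fetch one batch
print(X.shape, y.shape)        # torch.Size([32, 100]) torch.Size([32, 100])
```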

## Custom Modules (models, layers, operations...)

The majority of high level computations in PyTorch are modeled as torch.nn.Modules, be it whole models or individual layers. A nn.Module needs to implement the `forward(self, input)` method, which defines the operations the Module computes.

Let us define a recurrent network consisting of an embedding, two GRU layers and a dense output layer (called a linear layer in PyTorch terms).

``` code
class RNN(torch.nn.Module):
    def __init__(self, vocab_size, hidden_size, embedding_size, batch=32, layers=2):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size # size of the GRU layers
        self.batch = batch
        self.layers = layers           # how many GRU layers
        self.word_embeds = torch.nn.Embedding(vocab_size, embedding_size)              # embedding layer
        self.gru = torch.nn.GRU(embedding_size, hidden_size, layers, batch_first=True) # GRU layer(s)
        self.output_layer = torch.nn.Linear(hidden_size, vocab_size)

    def forward(self, inputs, hidden):
        x = self.word_embeds(inputs)         # transform the input integers into high dimensional embeddings
        output, hidden = self.gru(x, hidden) # compute the output of the GRU layer(s)
        output = self.output_layer(output)   # compute the logits
        return output, hidden

    def initHidden(self):
        return torch.zeros(self.layers, self.batch, self.hidden_size)
```
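A quick shape check on an untrained instance (a sketch, relying on the defaults batch=32, layers=2 and the seq_len=100 from above):

``` code
rnn_test = RNN(alphabet_size, 1024, 256) # hidden_size=1024, embedding_size=256
out, h = rnn_test(torch.zeros((32, 100), dtype=torch.long), rnn_test.initHidden())
print(out.shape)                         # torch.Size([32, 100, alphabet_size])
```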

### Training

Let us set up the model and some hyperparameters, and define a training function.

``` code
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # let us do it the quick way this time
rnn = RNN(alphabet_size, 1024, 256, layers=2)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.005)
```

``` code
def train(model, optimizer, loss, device):
    current_loss = []                       # record running loss
    model.to(device)                        # put the model on the specified device
    hidden = model.initHidden().to(device)  # create the hidden state
    model.train()                           # tell the model it's training time
    for X, y in trainloader:
        X, y = X.to(device), y.to(device)   # collect the data and labels from the dataloader and put them on the device
        optimizer.zero_grad()               # empty the gradients
        output, hidden = model(X, hidden)   # compute the output
        hidden = hidden.detach()            # take the hidden state out of the graph
        batch_loss = loss(output.transpose(1,2), y) # compute loss; CrossEntropyLoss expects (batch, classes, seq)
        batch_loss.backward()               # compute gradients
        optimizer.step()                    # update weights
        current_loss.append(batch_loss.item()) # record loss
    epoch_loss = np.mean(current_loss)
    return epoch_loss
```

Train the model for some epochs.

``` code
epochs = 200
for e in range(epochs):
    l = train(rnn, optimizer, loss, device)
    print("Epoch ", e+1, ", Loss: ", l)
torch.save(rnn.state_dict(), "../saved_models/rnn_{}epochs.pth".format(epochs))
```

### Text generation

Load our previously saved model.

``` code
rnn = RNN(alphabet_size, 1024, 256, layers=2, batch=1) # instantiate model
rnn.load_state_dict(torch.load("../saved_models/rnn_2epochs.pth")) # load weights
rnn.eval() # tell the model it's time to evaluate
```

Give the model a starting sequence.

``` code
seq = "NICO: "     # starting sequence which we give the model
max_seq_len = 1000 # max sequence length
temp = 0.7         # temperature for sampling: the higher the temperature, the more random the sampling; the colder, the more conservative
hidden = rnn.initHidden()
input_idx = torch.LongTensor([[letter_to_int[s] for s in seq]]) # input characters to ints
```

``` code
for i in range(max_seq_len):
    output, hidden = rnn(input_idx, hidden) # predict the logits for the next character
    pred = torch.squeeze(output, 0)[-1]     # logits of the last position in the sequence
    pred = pred / temp                      # apply temperature
    pred_id = torch.distributions.categorical.Categorical(logits=pred).sample() # sample from the distribution
    input_idx = torch.cat((input_idx[:,1:], pred_id.reshape(1,-1)), 1) # the predicted character is appended to our input
    seq += int_to_letter[pred_id.item()]    # add predicted character to sequence
print(seq) # show us the sequence
```