---
title: "PyTorch Intro I: SSH, Jupyter and Cuda"
author: Tom Weber
---

## Preliminaries

Make sure we are only using our reserved GPUs.

``` code
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"    # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"        # only make devices 0 and 2 visible
```
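
To double-check that the restriction took effect, we can query the devices PyTorch can see. This is a small sanity check added here (not part of the original notebook) and assumes the environment variables were set before the first CUDA call in the process.

``` code
import torch

# should report only the two reserved devices
print("Visible CUDA devices:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```
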
## Using Torch Modules and Datasets

This part of the PyTorch introduction will focus on creating custom torch modules and datasets, while applying those concepts to a fun character-level text generation task.

### Preparation

``` code
import torch
import numpy as np
from urllib.request import urlopen  # for importing the data
```

Let us borrow a nice text dataset from TensorFlow.

``` code
text_source = "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt"
text = urlopen(text_source).read().decode(encoding="utf-8")
```

Do some general NLP preprocessing.

``` code
def preprocess(text):
    alphabet = sorted(set(text))                                    # unique characters in the corpus
    letter_to_int = {let: ind for ind, let in enumerate(alphabet)}  # map characters to integers
    int_to_letter = {ind: let for ind, let in enumerate(alphabet)}  # map integers back to characters
    letter_ints = [letter_to_int[letter] for letter in text]        # encode the whole corpus
    alphabet_size = len(alphabet)
    return int_to_letter, letter_to_int, alphabet_size, letter_ints
```

Now we can transform our text into a sequence of integers, where each integer represents a character.

``` code
int_to_letter, letter_to_int, alphabet_size, letter_ints = preprocess(text)
print("Alphabet size:", alphabet_size)
print("Length of letter sequence:", len(text))
```
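
As a quick sanity check (a small addition, not in the original notebook), we can decode the first few integers back into characters and compare them with the raw text:

``` code
# round-trip check: decode the first 20 integers back to characters
print(text[:20])
print("".join(int_to_letter[i] for i in letter_ints[:20]))
```
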
## Custom Datasets

Previously we imported a pre-made dataset and created a dataloader. This time, we want to create our own dataset that can be used to construct a dataloader.

A custom dataset needs to implement at least the `__len__(self)` and the `__getitem__(self, index)` methods. `__len__(self)` only needs to return the size/length of the dataset, while `__getitem__(self, index)` needs to map an index to a tuple of (sample, label). Batching will be handled automatically by the dataloader, so there is no need to think about that for now.
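
To make this contract concrete, here is a minimal, hypothetical skeleton with dummy data (a sketch only, not the dataset we build below):

``` code
class MinimalDataset(torch.utils.data.Dataset):  # hypothetical toy example
    def __init__(self):
        self.samples = torch.arange(10, dtype=torch.float32).reshape(-1, 1)  # 10 dummy samples
        self.labels = torch.zeros(10, dtype=torch.long)                      # 10 dummy labels

    def __len__(self):
        return len(self.samples)  # how many samples the dataset contains

    def __getitem__(self, index):
        return self.samples[index], self.labels[index]  # map an index to (sample, label)
```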

We want our model to predict the probability of all possible characters that can succeed the input character. Hence, our samples will be sequences of a certain length, while the ground truth will be the same sequence shifted forward by one character.

CAUTION: A dataloader is not always the fastest method. If the dataset is sufficiently simple and small (as in our case here), manual batching is probably faster.

``` code
class Shakespeare_Dataset(torch.utils.data.Dataset):
    def __init__(self, text, seq_len):
        self.x = torch.LongTensor(text[:-1])  # inputs: all characters except the last
        self.y = torch.LongTensor(text[1:])   # targets: the same sequence shifted by one character
        self.seq_len = seq_len                # set the sequence length

    def __len__(self):
        return len(self.x) - self.seq_len  # length of corpus minus sequence length minus shift

    def __getitem__(self, index):
        return (self.x[index:index+self.seq_len],
                self.y[index:index+self.seq_len])  # return a tuple of (sample, label)
```

Now we can easily instantiate our dataset and let a dataloader handle the shuffling, batching etc.

``` code
shakespeare_dset = Shakespeare_Dataset(letter_ints, seq_len=100)
trainloader = torch.utils.data.DataLoader(shakespeare_dset, batch_size=32,
                                          shuffle=True, num_workers=2,
                                          drop_last=True)
```
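
To see what the dataloader produces, we can peek at a single batch (a quick check, not in the original notebook); it should yield two LongTensors of shape (32, 100):

``` code
X, y = next(iter(trainloader))  # fetch one batch
print(X.shape, y.shape)         # expected: torch.Size([32, 100]) for both
```
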
## Custom Modules (models, layers, operations...)

The majority of high-level computations in PyTorch are modeled as torch.nn.Modules, be it whole models or individual layers. An nn.Module needs to implement the `forward(self, input)` method, which defines the operations the module computes.
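
As a minimal, made-up illustration of this contract (before the actual model below), a module that simply squares its input could look like this:

``` code
class Square(torch.nn.Module):  # hypothetical toy module
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x ** 2  # the computation this module performs

print(Square()(torch.tensor([1.0, 2.0, 3.0])))  # calling the module invokes forward()
```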

Let us define a recurrent network consisting of an embedding, two GRU layers and a dense output layer (called a linear layer in PyTorch terms).

``` code
class RNN(torch.nn.Module):
    def __init__(self, vocab_size, hidden_size, embedding_size, batch=32, layers=2):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size  # size of the GRU layers
        self.batch = batch              # batch size
        self.layers = layers            # how many GRU layers
        self.word_embeds = torch.nn.Embedding(vocab_size, embedding_size)  # embedding layer
        self.gru = torch.nn.GRU(embedding_size, hidden_size, layers, batch_first=True)  # GRU layer(s)
        self.output_layer = torch.nn.Linear(hidden_size, vocab_size)  # dense output layer

    def forward(self, inputs, hidden):
        x = self.word_embeds(inputs)          # transform the input integers into high dimensional embeddings
        output, hidden = self.gru(x, hidden)  # compute the output of the GRU layer(s)
        output = self.output_layer(output)    # compute the logits
        return output, hidden

    def initHidden(self):
        return torch.zeros(self.layers, self.batch, self.hidden_size)  # initial hidden state
```

### Training

Let us set up the model, some hyperparameters and define a training function.

``` code
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # let us do it the quick way this time
rnn = RNN(alphabet_size, 1024, 256, layers=2)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.005)
```
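
Optionally (not part of the original notebook), we can count the trainable parameters to get a feeling for the model size:

``` code
n_params = sum(p.numel() for p in rnn.parameters() if p.requires_grad)
print("Trainable parameters:", n_params)
```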

``` code
def train(model, optim, loss, device):
    current_loss = []                       # record running loss
    model.to(device)                        # put the model on the specified device
    hidden = model.initHidden().to(device)  # create the hidden state
    model.train()                           # tell the model it is training time
    for X, y in trainloader:
        X, y = X.to(device), y.to(device)   # collect the data and labels from the dataloader and put them on the device
        optim.zero_grad()                   # empty the gradients
        output, hidden = model(X, hidden)   # compute the output
        hidden = hidden.detach()            # take the hidden state out of the graph
        batch_loss = loss(output.transpose(1, 2), y)  # compute the loss
        batch_loss.backward()               # compute the gradients
        optim.step()                        # update the weights
        current_loss.append(batch_loss.item())  # record the loss
    epoch_loss = np.mean(current_loss)
    return epoch_loss
```
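
A note on the `transpose(1, 2)` above: `CrossEntropyLoss` expects the class dimension at position 1, i.e. logits of shape (batch, classes, sequence) for targets of shape (batch, sequence), while the GRU output comes as (batch, sequence, classes). A small shape sketch with made-up numbers (65 standing in for the alphabet size):

``` code
logits = torch.randn(32, 100, 65)           # model output: (batch, seq, classes)
targets = torch.randint(0, 65, (32, 100))   # targets: (batch, seq)
ce = torch.nn.CrossEntropyLoss()
print(ce(logits.transpose(1, 2), targets))  # class dimension must be at position 1
```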

Train the model for some epochs.

``` code
epochs = 200
for e in range(epochs):
    l = train(rnn, optimizer, loss, device)
    print("Epoch", e + 1, ", Loss:", l)

torch.save(rnn.state_dict(), "../saved_models/rnn_{}epochs.pth".format(epochs))  # save the trained weights
```

### Text generation

Load our previously saved model.

``` code
rnn = RNN(alphabet_size, 1024, 256, layers=2, batch=1)  # instantiate the model with batch size 1
rnn.load_state_dict(torch.load("../saved_models/rnn_2epochs.pth"))  # load the saved weights
rnn.eval()  # tell the model it is time to evaluate
```

Give the model a starting sequence.

``` code
seq = "NICO: "      # starting sequence which we give the model
max_seq_len = 1000  # maximum length of the generated sequence
temp = 0.7          # sampling temperature: the higher the temperature, the more random the sampling; the lower, the more conservative
hidden = rnn.initHidden()
input_idx = torch.LongTensor([[letter_to_int[s] for s in seq]])  # encode the input characters as ints
```

``` code
for i in range(max_seq_len):
    output, hidden = rnn(input_idx, hidden)  # predict the logits for the next character
    pred = torch.squeeze(output, 0)[-1]      # logits of the last time step
    pred = pred / temp                       # apply the temperature
    pred_id = torch.distributions.categorical.Categorical(logits=pred).sample()  # sample from the distribution
    input_idx = torch.cat((input_idx[:, 1:], pred_id.reshape(1, -1)), 1)  # add the predicted character to our input
    seq += int_to_letter[pred_id.item()]     # add the predicted character to the sequence

print(seq)  # show us the generated sequence
```