| title | author |
| --- | --- |
| PyTorch Intro I: SSH, Jupyter and Cuda | Tom Weber |
Preliminaries
Make sure we are only using our reserved GPUs.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id
os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make device 0 visible
Using Torch Modules and Datasets
This part of the PyTorch introduction will focus on creating custom torch modules and datasets, while applying those concepts to a fun character-level text generation task.
Preparation
import torch
import numpy as np
from urllib.request import urlopen # for importing the data
Let us borrow a nice text dataset from TensorFlow.
text_source = "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt"
text = urlopen(text_source).read().decode(encoding="utf-8")
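A quick peek at the raw text is a cheap sanity check on the download (the slice size is arbitrary):
print(len(text), "characters in total")
print(text[:250])  # first few lines of the corpus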
Do some general NLP preprocessing.
def preprocess(text):
    alphabet = sorted(set(text))  # unique characters in the corpus
    letter_to_int = {let: ind for ind, let in enumerate(alphabet)}  # character -> integer
    int_to_letter = {ind: let for ind, let in enumerate(alphabet)}  # integer -> character
    letter_ints = [letter_to_int[letter] for letter in text]        # encode the whole corpus
    alphabet_size = len(alphabet)
    return int_to_letter, letter_to_int, alphabet_size, letter_ints
Now we can transform our text into a sequence of integers, where each integer represents a character.
int_to_letter, letter_to_int, alphabet_size, letter_ints = preprocess(text)
print("Alphabet size:", alphabet_size)
print("Length of letter sequence:", len(text))
Custom Datasets
Previously we imported a pre-made dataset and created a dataloader. This time, we want to create our own dataset that can be used to construct a dataloader.
A custom dataset needs to implement at least the `__len__(self)` and the `__getitem__(self, index)` methods. `__len__(self)` only needs to return the size/length of the dataset, while `__getitem__(self, index)` needs to map an index to a tuple of (sample, label). Batching will be handled automatically by the dataloader, so there is no need to think about it for now.
We want our model to predict, for every possible character, the probability that it succeeds the input character. Hence, our samples will be sequences of a certain length, while the ground truth will be the same sequence shifted forward by one character.
CAUTION: A custom dataset is not always the fastest method. If the dataset is sufficiently simple and small (as in our case here), manual batching is probably faster.
class Shakespeare_Dataset(torch.utils.data.Dataset):
    def __init__(self, text, seq_len):
        self.x = torch.LongTensor(text[:-1])  # inputs: all characters except the last
        self.y = torch.LongTensor(text[1:])   # targets: the same sequence shifted by one
        self.seq_len = seq_len                # set the sequence length

    def __len__(self):
        return len(self.x) - self.seq_len  # length of the corpus minus sequence length minus shift

    def __getitem__(self, index):
        return (self.x[index:index+self.seq_len],
                self.y[index:index+self.seq_len])  # return a tuple of (sample, label)
Now, we can easily instantiate our dataset and let a dataloader handle the shuffling, batching, etc.
shakespeare_dset = Shakespeare_Dataset(letter_ints, seq_len=100)
trainloader = torch.utils.data.DataLoader(shakespeare_dset, batch_size=32,
                                          shuffle=True, num_workers=2,
                                          drop_last=True)
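Pulling a single batch from the dataloader is a cheap way to verify the shapes and the one-character shift; this is purely a sanity check and not part of training:
X, y = next(iter(trainloader))
print(X.shape, y.shape)  # both should be torch.Size([32, 100])
print(X[0, 1:6])
print(y[0, :5])          # same values: the label is the sample shifted by one character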
Custom Modules (models, layers, operations...)
The majority of high-level computations in PyTorch are modeled as torch.nn.Modules, be it whole models or individual layers. An nn.Module needs to implement the forward(self, input) method, which defines the operations the module computes.
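As a minimal illustration of the pattern (a toy module, not part of the Shakespeare model), a module only has to register its parameters and sub-modules in __init__ and describe its computation in forward:
class Scale(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(1))  # a single learnable scalar
    def forward(self, inputs):
        return inputs * self.weight                      # the computation this module performs

print(Scale()(torch.arange(3.0)))  # tensor([0., 1., 2.], grad_fn=...)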
Let us define a Recurrent Network consisting of an embedding, two GRU layers and a dense output layer (called a linear layer in PyTorch terms).
class RNN(torch.nn.Module):
    def __init__(self, vocab_size, hidden_size, embedding_size, batch=32, layers=2):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size  # size of the GRU layers
        self.batch = batch
        self.layers = layers            # how many GRU layers
        self.word_embeds = torch.nn.Embedding(vocab_size, embedding_size)  # Embedding layer
        self.gru = torch.nn.GRU(embedding_size, hidden_size, layers, batch_first=True)  # GRU layer(s)
        self.output_layer = torch.nn.Linear(hidden_size, vocab_size)

    def forward(self, inputs, hidden):
        x = self.word_embeds(inputs)          # transform the input integers into high-dimensional embeddings
        output, hidden = self.gru(x, hidden)  # compute the output of the GRU layer(s)
        output = self.output_layer(output)    # compute the logits
        return output, hidden

    def initHidden(self):
        return torch.zeros(self.layers, self.batch, self.hidden_size)
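Before training, a dry run with a dummy batch is a cheap way to confirm that the shapes line up. The sizes below are small placeholder values chosen only for this check:
test_rnn = RNN(alphabet_size, hidden_size=64, embedding_size=16, batch=4)
dummy = torch.randint(0, alphabet_size, (4, 10))  # batch of 4 sequences of length 10
out, h = test_rnn(dummy, test_rnn.initHidden())
print(out.shape)  # torch.Size([4, 10, alphabet_size]) -- one logit vector per position
print(h.shape)    # torch.Size([2, 4, 64]) -- (layers, batch, hidden_size)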
Training
Let us set up the model and some hyperparameters, and define a training function.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # let us do the quick way this time
rnn = RNN(alphabet_size, 1024, 256, layers=2)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.005)
def train(model, optim, loss, device):
    current_loss = []                       # record the running loss
    model.to(device)                        # put the model on the specified device
    hidden = model.initHidden().to(device)  # create the hidden state
    model.train()                           # tell the model it's training time
    for X, y in trainloader:
        X, y = X.to(device), y.to(device)   # collect the data and labels from the dataloader and put them on the device
        optim.zero_grad()                   # empty the gradients
        output, hidden = model(X, hidden)   # compute the output
        hidden = hidden.detach()            # take the hidden state out of the graph
        batch_loss = loss(output.transpose(1, 2), y)  # compute the loss (CrossEntropyLoss expects (batch, classes, seq))
        batch_loss.backward()               # compute gradients
        optim.step()                        # update weights
        current_loss.append(batch_loss.item())  # record the loss
    epoch_loss = np.mean(current_loss)
    return epoch_loss
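Since the returned loss is an average cross-entropy in nats per character, its exponential is the per-character perplexity, which is often easier to interpret. This is just a convenience metric, not something the training function needs:
def perplexity(epoch_loss):
    return float(np.exp(epoch_loss))  # e.g. a loss of ~1.5 corresponds to a perplexity of ~4.5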
Train the model for some epochs.
epochs = 200
for e in range(epochs):
    l = train(rnn, optimizer, loss, device)
    print("Epoch", e + 1, ", Loss:", l)

torch.save(rnn.state_dict(), "../saved_models/rnn_{}epochs.pth".format(epochs))
Text Generation
Load our previously saved model.
rnn = RNN(alphabet_size, 1024, 256, layers=2, batch=1) # instantiate the model with batch size 1
rnn.load_state_dict(torch.load("../saved_models/rnn_{}epochs.pth".format(epochs))) # load weights
rnn.eval() # tell the model it's time to evaluate
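If the saved weights are later loaded on a machine without a GPU (or on a different device), passing map_location to torch.load avoids device mismatches; a variant of the cell above, assuming the same file path:
state = torch.load("../saved_models/rnn_{}epochs.pth".format(epochs), map_location="cpu")
rnn.load_state_dict(state)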
Give the model a starting sequence.
seq = "NICO: " # starting sequence which we give the model
max_seq_len = 1000 # max sequence length
temp = 0.7 # temperature for sampling: the higher the temperature, the more random the sampling; the lower the temperature, the more conservative
hidden = rnn.initHidden()
input_idx = torch.LongTensor([[letter_to_int[s] for s in seq]]) # input characters to ints
for i in range(max_seq_len):
    output, hidden = rnn(input_idx, hidden)  # predict the logits for the next character
    pred = torch.squeeze(output, 0)[-1]      # logits at the last position of the window
    pred = pred / temp                       # apply temperature
    pred_id = torch.distributions.categorical.Categorical(logits=pred).sample()  # sample from the distribution
    input_idx = torch.cat((input_idx[:, 1:], pred_id.reshape(1, -1)), 1)  # the predicted character is appended to our input window
    seq += int_to_letter[pred_id.item()]     # add the predicted character to the sequence

print(seq)  # show us the sequence
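To see what the temperature actually does, one can compare the softmax of a small toy logits vector at different temperatures (the numbers here are arbitrary and purely illustrative):
logits = torch.tensor([2.0, 1.0, 0.1])
for t in (0.5, 1.0, 2.0):
    print(t, torch.softmax(logits / t, dim=0))
# low temperature  -> probability mass concentrates on the most likely character
# high temperature -> the distribution flattens and sampling becomes more random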