--- title: "PyTorch Intro I: SSH, Jupyter and Cuda" author: Tom Weber --- ## Preliminaries Make sure we are only using our reserved GPUs. ``` code import os os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # order devices by bus id os.environ["CUDA_VISIBLE_DEVICES"]="0,2" # only make device 0 visible ``` ## Using Torch Modules and Datasets This part of the PyTorch introduction will focus on creating custom torch modules and datasets, while applying those concepts to a fun character-level text generation task. ### Preparation ``` code import torch import numpy as np from urllib.request import urlopen # for importing the data ``` Let us borrow a nice text dataset from TensorFlow. ``` code text_source = "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt" text = urlopen(text_source).read().decode(encoding="utf-8") ``` Do some general NLP preprocessing. ``` code def preprocess(text): alphabet = sorted(set(text)) letter_to_int = {let: ind for ind, let in enumerate(alphabet)} int_to_letter = {ind: let for ind, let in enumerate(alphabet)} letter_ints = [letter_to_int[letter] for letter in text] alphabet_size = len(alphabet) return int_to_letter, letter_to_int, alphabet_size, letter_ints ``` Now we can transform our text into a sequence of integers, where each integer represents are character. ``` code int_to_letter, letter_to_int, alphabet_size, letter_ints = preprocess(text) print("Alphabet size:", alphabet_size) print("Length of letter sequence:", len(text)) ``` ## Custom Datasets Previously we imported a pre-made dataset and created a dataloader. This time, we want to create our own dataset that can be used to construct a dataloader. A custom dataset needs to at least implement the `__len__(self)` and the `__getitem(self, index)___` method. `__len__(self)` only needs to return the size/length of the dataset, while `__getitem(self, index)___` needs to map an index to a tuple of (sample, label). Batching will be automatically handled by the dataloader, so there is no need to think about that for now. We want our model to predict the probability of all possible characters, that can succeed the input character. Hence, our samples will be sequences of a certain length, while the ground truth will be the same sequence but shifted forward by one character. CAUTION: Not always the fastest method. If dataset is sufficiently simple and small, (as in our case here), manual batching is probably faster. ``` code class Shakespeare_Dataset(torch.utils.data.Dataset): def __init__(self, text, seq_len): self.x = torch.LongTensor(text[:-1]) # get the data self.y = torch.LongTensor(text[1:]) self.seq_len = seq_len # set the sequence length def __len__(self): return len(text) - self.seq_len - 1# length of corpora minus sequence length minus shift def __getitem__(self, index): return (self.x[index:index+self.seq_len], self.x[index:index+self.seq_len]) # return tuple of (sample, label) ``` Now, we can easily instatiate our dataset and let a dataloader handle the shuffling, batching etc. ``` code shakespeare_dset = Shakespeare_Dataset(letter_ints, seq_len=100) trainloader = torch.utils.data.DataLoader(shakespeare_dset, batch_size=32, shuffle=True, num_workers=2, drop_last=True) ``` ## Custom Modules (models, layers, operations...) The majority of high level computations in PyTorch are modeled as torch.nn.Modules, be it whole models or individual layers. A nn.Module needs to implement the `forward(self, input)` method which defines the operations the Module computes. 
Let us define a recurrent network consisting of an embedding, two GRU layers and a dense output layer (called a linear layer in PyTorch terms).

``` code
class RNN(torch.nn.Module):
    def __init__(self, vocab_size, hidden_size, embedding_size, batch=32, layers=2):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size  # size of the GRU layers
        self.batch = batch
        self.layers = layers            # how many GRU layers
        self.word_embeds = torch.nn.Embedding(vocab_size, embedding_size)               # embedding layer
        self.gru = torch.nn.GRU(embedding_size, hidden_size, layers, batch_first=True)  # GRU layer(s)
        self.output_layer = torch.nn.Linear(hidden_size, vocab_size)                    # dense output layer

    def forward(self, inputs, hidden):
        x = self.word_embeds(inputs)          # transform the input integers into high-dimensional embeddings
        output, hidden = self.gru(x, hidden)  # compute the output of the GRU layer(s)
        output = self.output_layer(output)    # compute the logits
        return output, hidden

    def initHidden(self):
        return torch.zeros(self.layers, self.batch, self.hidden_size)
```

### Training

Let us set up the model, some hyperparameters and define a training function.

``` code
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # let us do it the quick way this time
rnn = RNN(alphabet_size, 1024, 256, layers=2)
loss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.005)
```

``` code
def train(model, optim, loss, device):
    current_loss = []                       # record running loss
    model.to(device)                        # put the model on the specified device
    hidden = model.initHidden().to(device)  # create the hidden state
    model.train()                           # tell the model it is training time
    for X, y in trainloader:
        X, y = X.to(device), y.to(device)   # collect data and labels from the dataloader and put them on the device
        optim.zero_grad()                   # empty the gradients
        output, hidden = model(X, hidden)   # compute the output
        hidden = hidden.detach()            # take the hidden state out of the graph
        batch_loss = loss(output.transpose(1, 2), y)  # compute loss
        batch_loss.backward()               # compute gradients
        optim.step()                        # update weights
        current_loss.append(batch_loss.item())  # record loss
    epoch_loss = np.mean(current_loss)
    return epoch_loss
```

Train the model for some epochs.

``` code
epochs = 200
for e in range(epochs):
    l = train(rnn, optimizer, loss, device)
    print("Epoch", e + 1, ", Loss:", l)
torch.save(rnn.state_dict(), "../saved_models/rnn_{}epochs.pth".format(epochs))
```
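If you plan to continue training later, it can also be worth storing the optimizer state alongside the model weights. The following is only a sketch; the checkpoint path `../saved_models/rnn_checkpoint.pth` is an arbitrary example, and it reuses the `rnn`, `optimizer` and `epochs` objects from above.

``` code
# Sketch: save a checkpoint that also contains the optimizer state, so training can be resumed.
checkpoint = {
    "model_state": rnn.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "epochs": epochs,
}
torch.save(checkpoint, "../saved_models/rnn_checkpoint.pth")  # example path

# Resuming would then look roughly like this:
# checkpoint = torch.load("../saved_models/rnn_checkpoint.pth")
# rnn.load_state_dict(checkpoint["model_state"])
# optimizer.load_state_dict(checkpoint["optimizer_state"])
```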
### Text generation

Load our previously saved model.

``` code
rnn = RNN(alphabet_size, 1024, 256, layers=2, batch=1)              # instantiate the model with batch size 1
rnn.load_state_dict(torch.load("../saved_models/rnn_2epochs.pth"))  # load weights
rnn.eval()                                                          # tell the model it is time to evaluate
```

Give the model a starting sequence.

``` code
seq = "NICO: "      # starting sequence which we give the model
max_seq_len = 1000  # maximum length of the generated sequence
temp = 0.7          # temperature for sampling: the higher the temperature, the more random the sampling; the lower the temperature, the more conservative it is
hidden = rnn.initHidden()
input_idx = torch.LongTensor([[letter_to_int[s] for s in seq]])  # convert the input characters to ints
```

``` code
for i in range(max_seq_len):
    output, hidden = rnn(input_idx, hidden)  # predict the logits for the next character
    pred = torch.squeeze(output, 0)[-1]      # take the logits of the last position
    pred = pred / temp                       # apply temperature
    pred_id = torch.distributions.categorical.Categorical(logits=pred).sample()  # sample from the distribution
    input_idx = torch.cat((input_idx[:, 1:], pred_id.reshape(1, -1)), 1)  # append the predicted character to our input
    seq += int_to_letter[pred_id.item()]     # add the predicted character to the sequence
print(seq)  # show us the sequence
```
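As a small follow-up, the sampling loop above can be wrapped in a helper function so that different prompts and temperatures are easy to try. This is just a sketch that reuses `rnn`, `letter_to_int` and `int_to_letter` from above; the function name `generate` is made up, and `torch.no_grad()` is used so that no computation graph is built during generation.

``` code
def generate(model, prompt, max_seq_len=1000, temp=0.7):
    seq = prompt
    hidden = model.initHidden()
    input_idx = torch.LongTensor([[letter_to_int[s] for s in prompt]])
    with torch.no_grad():  # no gradients are needed for sampling
        for _ in range(max_seq_len):
            output, hidden = model(input_idx, hidden)
            pred = torch.squeeze(output, 0)[-1] / temp  # temperature-scaled logits of the last position
            pred_id = torch.distributions.categorical.Categorical(logits=pred).sample()
            input_idx = torch.cat((input_idx[:, 1:], pred_id.reshape(1, -1)), 1)
            seq += int_to_letter[pred_id.item()]
    return seq

print(generate(rnn, "NICO: ", max_seq_len=200, temp=0.7))  # example call
```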