--- title: "PyTorch Intro I: SSH, Jupyter and Cuda" author: Tom Weber --- ## Connecting to remote servers for heavy computing The ssh command can be equipped with additional statements to allow port forwarding. Thereby, one can use jupyter notebooks on remote servers. *(not recommended for actual projects!)* E.g. `ssh -L 8000:localhost:8888 tomweber@REMOTESERVER` This enables our machine to listen on the remote port 8888 (jupyter notebook port) and foward it to our local port 8000. ## Setting up the environment This notebook assumes that it is run in a virtual environment. Using environments is encouraged in order to avoid package conflicts. **Quick setup** Close the jupyter server and execute the following shell commands, one after the other: ```shell python3 -m venv .venv # install the environment source .venv/bin/activate # activate the environment pip install jupyter # install jupyter into the environment ``` ## Installing PyTorch In order to see if torch is installed, check the output of the next cell (prepending an exclamation mark executes shell code inside the jupyter notebook). ``` code !pip list --format columns | grep torch ``` If there is no output, it is _not_ installed. Therefore we want to install it. Install PyTorch with: ``` code !pip install torch ``` As long as the environment is activated (and we are hopefully running the notebook from there), pip will install the package and dependencies into the appropriate venv folder. Global packages are masked and won't be conflicting with our local packages. ## Figuring out CUDA with PyTorch ``` code import torch ``` Let's begin by checking if CUDA works with PyTorch at all: ``` code if torch.cuda.is_available(): print("CUDA available") else: print("Could not find CUDA, possibly encountering problems with current CUDA version") ``` In contrast to most local machines, the servers are usually equipped with multiple GPUs. Let's see how many there are: ``` code print("GPUs available: ", torch.cuda.device_count()) # show number of cuda devices ``` ### Computing with tensors on the GPU In PyTorch, tensors are always associated with a device on which they are running, i.e. CPU or GPU/CUDA. Operations can be arbitrarily executed on tensors no matter which device they are on. By default, tensors are created on the "CPU" device. ``` code x = torch.ones((3,3)) # create 3x3 tensor consisting of ones print(x.device) # show associated device of x ``` In order to run computations on the GPU, the associated tensors must be explicitly copied there. ``` code x = x.cuda() # copy tensor to cuda device print(x.device) # show associated device of x ``` Let's look at an example: ``` code cpu1 = torch.rand((400,400)) # create 400x400 tensor consisting of random (normal) numbers cpu2 = torch.rand((400,400)) %timeit torch.matmul(cpu1,cpu2) # time the execution of matrix multiplication ``` ``` code gpu1 = torch.rand((400,400)).cuda() # create 400x400 tensor consisting of random (normal) numbers and copy to CUDA device gpu2 = torch.rand((400,400)).cuda() %timeit torch.matmul(gpu1,gpu2) # time the execution of matrix multiplication ``` ### Single GPU use case By default, PyTorch will always use the "first" GPU (i.e. lowest device number) as the current device. CAUTION: CUDA numbering is not necessarily the same as it is shown in `nvidia-smi`! `nvidia-smi` orders by PCI-Bus. 
We can check the selected device number with:

``` code
print("The currently selected GPU is number:", torch.cuda.current_device(),
      ", it is a", torch.cuda.get_device_name(device=None))
```

One should always cross-reference whether that is actually the device one wants to use. This is easy when the server hosts different GPU models; in our case, however, there are two GPUs with the same name.

``` code
!nvidia-smi -L  # show the GPUs installed on the machine
```

If one wants to change the current device, there are several ways to achieve this.

1.) Best practice is to explicitly whitelist the GPUs your code can see, effectively masking the rest. This avoids any accidental overlap with GPUs that you did not book. Note: this way we can also make the ordering consistent by telling CUDA to order the GPUs by PCI bus ID. Due to how Jupyter notebooks work, executing the cell will not have any effect, because we already imported torch and initialized CUDA. Therefore, restart the kernel and execute the cell again.

``` code
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order devices by PCI bus id
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"      # only make devices 0 and 2 visible
```

Now let's check how many devices we can see:

``` code
import torch
print("GPUs available: ", torch.cuda.device_count())
for device in range(torch.cuda.device_count()):
    print("Device", device, ":", torch.cuda.get_device_name(device=device))
```

2.) One can set the CUDA device manually.

``` code
torch.cuda.set_device(1)  # make cuda device nr. 1 the current device
print(torch.cuda.get_device_name(device=None))
```

3.) Better practice, however, is to embed your code into a CUDA device context:

``` code
with torch.cuda.device(0):  # context manager for a specific cuda device
    # your code here
    print(torch.cuda.get_device_name(device=None))
```

Alternatively, one can also copy tensors to a specific device:

``` code
x = torch.ones((3, 3))
x_on_1 = x.to("cuda:0")
x_on_2 = x.to("cuda:1")
print(x_on_1.device)
print(x_on_2.device)
```

Oftentimes, in various tutorials on the internet, you can find the following:

``` code
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)
print(x.device)
```

This way one can write CUDA-agnostic code that works on machines both with and without a GPU. However, only use this if you have made your reserved GPU explicitly visible and hidden the rest. Otherwise it will automatically select GPU 0 as your CUDA device.

### Parallelize on multiple GPUs

Parallelizing training on multiple GPUs is in most cases a one-liner. PyTorch comes with `torch.nn.DataParallel`, which makes it easy to split batches across GPUs. Essentially, the model gets copied to each GPU and receives part of the minibatch to process.

``` code
net = torch.nn.DataParallel(net, device_ids=[0,1])  # wrap an existing model so batches are split across GPU 0 and 1
```
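For completeness, here is a minimal sketch of how the wrapped model might be used; the `torch.nn.Linear` toy model, the batch size and the tensor names are illustrative assumptions, not taken from the text above.

``` code
# minimal sketch: the toy model and batch are illustrative assumptions
net = torch.nn.Linear(10, 2)                             # tiny example model
if torch.cuda.device_count() > 1:
    net = torch.nn.DataParallel(net, device_ids=[0, 1])  # replicate the model on GPU 0 and 1
net = net.cuda()                                         # parameters live on the first listed device

batch = torch.rand((64, 10)).cuda()  # DataParallel splits this batch across the replicas
out = net(batch)                     # forward pass runs on both GPUs in parallel
print(out.shape)                     # outputs are gathered back on the first device
```

During training, the loss and backward pass are written exactly as in the single-GPU case; DataParallel handles the scatter and gather of the minibatch internally.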