PyTorch
Datasets: —need __len__ and __getitem__
Dataloaders —collate_fn defines how a list of individual examples gets turned into a single batch
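A minimal sketch of a map-style Dataset plus a custom collate_fn (toy integer sequences; the padding collate is just one illustrative choice):

    import torch
    from torch.nn.utils.rnn import pad_sequence
    from torch.utils.data import Dataset, DataLoader

    class ToySeqDataset(Dataset):
        # map-style dataset: needs __len__ and __getitem__
        def __init__(self):
            self.seqs = [torch.arange(n) for n in (3, 5, 2, 4)]
        def __len__(self):
            return len(self.seqs)
        def __getitem__(self, idx):
            return self.seqs[idx], len(self.seqs[idx])   # tuple per example

    def collate(batch):
        # turn a list of (seq, length) examples into one padded batch
        seqs, lengths = zip(*batch)
        return pad_sequence(seqs, batch_first=True), torch.tensor(lengths)

    loader = DataLoader(ToySeqDataset(), batch_size=2, collate_fn=collate)
    padded, lengths = next(iter(loader))   # padded: (2, max_len), lengths: (2,)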
backward() —fills the .grad field of every leaf tensor with requires_grad=True (non-leaf tensors need retain_grad())
zero_grad() —resets the .grad field of every parameter it tracks to 0 (or clears it to None with set_to_none=True)
optimizer.step() —the optimizer stores references to the parameters it manages; it reads each parameter's .grad and applies an update step
Use register_buffer to add a desired tensor to the module, so it gets moved to the right device along with the model. persistent=False keeps it out of the state_dict.
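A minimal training-step sketch tying these together (toy model and data; the input_mean buffer is only there to illustrate register_buffer):

    import torch
    import torch.nn as nn

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 1)
            # moves with .to(device); persistent=False keeps it out of the state_dict
            self.register_buffer("input_mean", torch.zeros(4), persistent=False)
        def forward(self, x):
            return self.fc(x - self.input_mean)

    model = TinyModel()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x, y = torch.randn(8, 4), torch.randn(8, 1)
    opt.zero_grad()                     # reset .grad on the tracked parameters
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                     # fills .grad on every leaf tensor that requires grad
    opt.step()                          # updates the parameters using their .grad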
Memory
- Pre-allocating a tensor saves memory compared to appending tensors to a list and then concatenating. With append-then-cat, all of the accumulated tensors stay alive while torch.cat allocates the full contiguous result on top of them, so peak memory is roughly doubled; pre-allocating only requires the single contiguous allocation.
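A sketch of the two patterns (sizes are arbitrary):

    import torch

    n_steps, dim = 1000, 512

    # Append-then-cat: every chunk stays alive, then torch.cat allocates the
    # full contiguous result on top of them.
    chunks = []
    for _ in range(n_steps):
        chunks.append(torch.randn(1, dim))
    out = torch.cat(chunks, dim=0)

    # Pre-allocate: one contiguous allocation up front, rows filled in place.
    out = torch.empty(n_steps, dim)
    for i in range(n_steps):
        out[i] = torch.randn(dim)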
Learning Rate Schedulers
Learning rate schedulers are sometimes recursive, like CosineAnnealingLR: each step's LR is computed from the current LR rather than from a closed-form function of the step count.
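Sketch of the usual optimizer/scheduler step order, assuming CosineAnnealingLR with placeholder lr and T_max:

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

    for epoch in range(100):
        # ... forward / backward over the epoch ...
        opt.step()       # optimizer step first, then scheduler step
        sched.step()     # next LR is derived from the current LR (the recursive form)
        current_lr = sched.get_last_lr()[0]

Because each LR builds on the previous one, calling sched.step() a different number of times than the schedule expects skews all later values.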
In Place Operations
- Be careful with in-place operations, for instance x = x + F.relu(x, inplace=True)
- This will zero out the negative parts of both terms, since the in-place relu mutates x before the addition runs (sketch below).
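A quick sketch of the pitfall, assuming relu here means torch.nn.functional.relu:

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-1.0, 2.0])
    # the in-place relu overwrites x before the addition runs, so both
    # operands are the already-rectified tensor -> tensor([0., 4.])
    y_bad = x + F.relu(x, inplace=True)

    x = torch.tensor([-1.0, 2.0])
    y_good = x + F.relu(x)    # tensor([-1., 4.]) -- the intended result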
Useful Operations
- torch.split splits a tensor into pieces of a specified size along a dimension.
- torch.chunk splits tensors into a desired number of chunks
- torch.unbind removes a tensor dimension and returns a tuple of slices along it.
- To generate a boolean mask for an operation that varies per batch element, create a tensor of scores (e.g. lengths or positions) and compare it with >= / <= to mark which elements are unmasked (see the sketch after this list).
- torch.expand - you can actually expand to a larger number of dimensions, with new dimensions being added to the front.
For instance, you can do: A = torch.arange(80).reshape(2, 2, 2, 10, 1, 1)
A = A.expand(69, 49, 27, -1, -1, -1, -1, -1, -1)
BE CAREFUL - expand returns a view, so if you overwrite part of the expanded tensor in place, you overwrite every position it was broadcast to, including the original.
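A combined sketch of the operations above (shapes, lengths, and the padding-mask pattern are illustrative assumptions):

    import torch

    x = torch.arange(12).reshape(3, 4)
    parts = torch.split(x, 2, dim=1)      # two (3, 2) pieces of size 2 along dim 1
    pieces = torch.chunk(x, 2, dim=0)     # 2 chunks: shapes (2, 4) and (1, 4)
    rows = torch.unbind(x, dim=0)         # tuple of three (4,) slices; dim 0 removed

    # per-batch boolean mask by comparison, e.g. a padding mask from lengths
    lengths = torch.tensor([2, 4, 3])
    mask = torch.arange(4) < lengths.unsqueeze(1)   # (3, 4) bool, True = keep

    # expand adds new leading dims as a view; in-place writes go through it
    a = torch.zeros(1, 3)
    b = a.expand(4, 3)      # all 4 "rows" alias the same memory
    b[0, 0] = 1.0           # also changes a[0, 0] and b[1:, 0]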
No Grad
- torch.inference_mode() is faster than no_grad, but tensors created inside it can't be mutated in place afterwards or used in autograd later
- can also use with torch.set_grad_enabled(True/False)
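Sketch of the three grad contexts:

    import torch

    model = torch.nn.Linear(4, 1)
    x = torch.randn(2, 4)

    with torch.no_grad():                 # no autograd graph is recorded
        y = model(x)

    with torch.inference_mode():          # faster; outputs can't enter autograd later
        y = model(x)

    with torch.set_grad_enabled(False):   # same as no_grad, toggled by a flag
        y = model(x)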
Datasets
- Map-style datasets index the whole dataset up front (random access via __getitem__ and __len__)
- Iterable-style datasets just define a way to iterate through the data (__iter__)
- convention is to return a tuple of things per example, so each batch comes out as a tuple of batched tensors
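A minimal iterable-style sketch for contrast (toy random features):

    import torch
    from torch.utils.data import IterableDataset, DataLoader

    class Streaming(IterableDataset):
        # iterable-style: only __iter__ is required, no random access or __len__
        def __iter__(self):
            for step in range(100):
                yield torch.randn(4), step    # tuple per example, by convention

    loader = DataLoader(Streaming(), batch_size=8)
    feats, steps = next(iter(loader))          # feats: (8, 4), steps: (8,)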
Dataloaders
- pin_memory=True copies each batch into pinned (page-locked) host memory; the allocation only happens as the dataloader is iterated, not when it is constructed
- persistent_workers=True keeps dataloader worker processes alive between epochs; use it for the training loader, not the validation loader
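Sketch of a typical training DataLoader using these flags (dataset and sizes are placeholders):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    train_dataset = TensorDataset(torch.randn(1000, 4), torch.randn(1000, 1))
    train_loader = DataLoader(
        train_dataset,
        batch_size=64,
        shuffle=True,
        num_workers=4,
        pin_memory=True,          # batches land in page-locked memory as they are produced
        persistent_workers=True,  # keep worker processes alive between epochs
    )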
Last Reviewed: 4/30/25