Mason Wang

PyTorch

Datasets —need __len__ and __getitem__
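A minimal sketch of a map-style dataset (the class name and toy data are made up for illustration):

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Toy dataset: returns (x, x**2) pairs."""
    def __init__(self, n=100):
        self.xs = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        # Number of examples; the DataLoader samples indices in this range.
        return len(self.xs)

    def __getitem__(self, idx):
        # Return a single example by index.
        x = self.xs[idx]
        return x, x ** 2
```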

Dataloaders —collate_fn defines how the different data examples should be turned into a batch
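A sketch of a custom collate_fn, assuming a dataset that yields (variable-length tensor, int label) pairs; it pads each batch to its longest sequence:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def collate_pad(batch):
    # batch is a list of (sequence, label) tuples straight from the Dataset.
    seqs, labels = zip(*batch)
    # Stack variable-length sequences into one (batch, max_len) tensor.
    padded = pad_sequence(seqs, batch_first=True, padding_value=0)
    lengths = torch.tensor([len(s) for s in seqs])
    return padded, lengths, torch.tensor(labels)

# loader = DataLoader(dataset, batch_size=32, collate_fn=collate_pad)
```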

backward() —fills the .grad field of every tensor that requires grad; note that gradients accumulate into .grad across calls rather than replacing it

zero_grad() —resets the .grad field of every parameter the optimizer manages to 0, so the previous step's gradients don't accumulate into the next one

optimizer.step() —the optimizer holds references to the parameters it was constructed with; it reads each parameter's .grad and applies one update step (backward() computes the gradients, step() applies them; see the loop sketch below)
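A sketch of how the three calls fit together in a standard training step (the model, data, and hyperparameters here are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=16)

for x, y in loader:
    optimizer.zero_grad()        # reset .grad so old gradients don't accumulate
    loss = loss_fn(model(x), y)
    loss.backward()              # populate .grad on every parameter
    optimizer.step()             # update each parameter from its .grad
```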

Use register_buffer to attach a non-parameter tensor to a module, so it gets moved to the right device along with the model. persistent=False keeps it out of the state_dict.
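A short sketch of register_buffer (the module and buffer names are made up):

```python
import torch
import torch.nn as nn

class RunningMean(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Moves with .to(device) / .cuda() and is saved in the state_dict.
        self.register_buffer("mean", torch.zeros(dim))
        # Also moves with the module, but is excluded from the state_dict.
        self.register_buffer("scratch", torch.zeros(dim), persistent=False)

m = RunningMean(4)
print(list(m.state_dict().keys()))  # ['mean'] — 'scratch' is not saved
```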

Memory

Learning Rate Schedulers

Learning rate schedulers are sometimes defined recursively (the next LR is computed from the current LR rather than from the step count alone), e.g. CosineAnnealingLR.
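A sketch of typical scheduler usage (placeholder model; T_max chosen arbitrarily):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one epoch of training: zero_grad / backward / optimizer.step() ...
    scheduler.step()              # advance the schedule, after optimizer.step()
print(scheduler.get_last_lr())    # LR after annealing
```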

In Place Operations

Useful Operations

expand broadcasts size-1 dimensions (and can prepend new leading dimensions) without allocating new memory. For instance:

A = torch.arange(80).reshape(2, 2, 2, 10, 1, 1)

A = A.expand(69, 49, 27, -1, -1, -1, -1, -1, -1)  # shape (69, 49, 27, 2, 2, 2, 10, 1, 1)

BE CAREFUL —expand returns a view, not a copy: an in-place write to one part of the expanded tensor hits every position that aliases the same memory.
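A tiny demonstration of the aliasing pitfall (a minimal sketch):

```python
import torch

a = torch.zeros(1, 3)
b = a.expand(4, 3)   # view: all 4 rows share the same 3 storage elements
b[0, 0] = 7.0        # in-place write through the view
print(a)             # tensor([[7., 0., 0.]]) — the original changed
print(b[3, 0])       # tensor(7.) — every "copy" of the row changed too
```

Use .expand(...).clone() (or .repeat(...)) when you need memory that is safe to write to.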

No Grad

Datasets

Dataloaders

Last Reviewed: 4/30/25