PyTorch Learning
The content of this blog post is based on the materials from UMich EECS 498-007
Tensor Basics
Creating Tensors
using lists
```python
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)  # 32-bit integer tensor
```
using constructors
```python
a = torch.zeros(2, 3)                    # tensor of all zeros
b = torch.ones(3, 4, dtype=torch.uint8)  # tensor of all ones
c = torch.eye(3)                         # 3x3 identity matrix
d = torch.rand(5, 6)                     # uniform random values in [0, 1)
e = torch.full((4, 5), 3.14)             # tensor filled with 3.14
f = torch.arange(start=1, end=10, step=2, dtype=torch.float64)  # values in [1, 10)
```
Indexing
slice indexing
1-D
```python
a = torch.tensor([1, 2, 3, 4, 5, 6])
a[1:3]    # [2, 3]
a[1:]     # [2, 3, 4, 5, 6]
a[:-1]    # [1, 2, 3, 4, 5]
a[0:5:2]  # [1, 3, 5]   (start:stop:step)
```
2-D
```python
a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
a[1:2, :]  # [[5, 6, 7, 8]]  (shape (1, 4): rank preserved)
a[1, :]    # [5, 6, 7, 8]    (shape (4,): rank reduced)
```
[!note]
There are two common ways to access a single row or column of a tensor: using an integer will reduce the rank by one, and using a length-one slice will keep the same rank.
Slicing a tensor returns a view into the same data, so modifying it will also modify the original tensor. To avoid this, you can use the `clone()` method to make a copy of the tensor.
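As a quick sketch of both points (the tensor here is just an example):

```python
import torch

a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

print(a[1].shape)    # torch.Size([4])    -- integer index reduces the rank
print(a[1:2].shape)  # torch.Size([1, 4]) -- length-one slice keeps the rank

view = a[0, :]          # a view into a's data
view[0] = 100
print(a[0, 0])          # tensor(100) -- the original tensor changed too

copy = a[0, :].clone()  # an independent copy
copy[1] = -1
print(a[0, 1])          # tensor(2) -- the original is untouched
```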
integer tensor indexing
```python
a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
idx = [0, 0, 2, 1, 1]  # index arrays can be Python lists of integers
print(a[idx])
# tensor([[ 1,  2,  3,  4],
#         [ 1,  2,  3,  4],
#         [ 9, 10, 11, 12],
#         [ 5,  6,  7,  8],
#         [ 5,  6,  7,  8]])
```
[!note]
More generally, given index arrays `idx0` and `idx1` with `N` elements each, `a[idx0, idx1]` is equivalent to:

```python
torch.tensor([
    a[idx0[0], idx1[0]],
    a[idx0[1], idx1[1]],
    ...,
    a[idx0[N - 1], idx1[N - 1]]
])
```
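A minimal sketch of this equivalence (the tensor and the index lists below are illustrative):

```python
import torch

a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
idx0 = [0, 1, 2]
idx1 = [3, 2, 1]

print(a[idx0, idx1])
# tensor([ 4,  7, 10])
print(torch.stack([a[0, 3], a[1, 2], a[2, 1]]))  # same result, built element by element
# tensor([ 4,  7, 10])
```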
boolean tensor indexing
In PyTorch, we use tensors of dtype `torch.bool` to hold boolean masks.

```python
a = torch.tensor([[1, 2], [3, 4], [5, 6]])
mask = (a > 3)
print('\nMask tensor:')
print(mask)
# tensor([[False, False],
#         [False,  True],
#         [ True,  True]])
print(a[mask])
# tensor([4, 5, 6])  -- the selected elements, returned as a 1-D tensor
```
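Boolean masks are also handy for assignment. For example, a common pattern (a small sketch) is zeroing out all elements that satisfy a condition:

```python
import torch

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
a[a > 3] = 0   # set every element greater than 3 to zero
print(a)
# tensor([[1, 2],
#         [3, 0],
#         [0, 0]])
```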
Reshaping Operations
View
`.view()`: returns a new tensor with the same number of elements as its input, but with a different shape.
```python
x0 = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])

# Flatten x0 into a rank 1 vector of shape (8,)
x1 = x0.view(8)

# Convert x1 to a rank 2 "row vector" of shape (1, 8)
x2 = x1.view(1, 8)

# Convert x1 to a rank 2 "column vector" of shape (8, 1)
x3 = x1.view(8, 1)

# Convert x1 to a rank 3 tensor of shape (2, 2, 2)
x4 = x1.view(2, 2, 2)
```
[!note]
Calls to `.view()` may include a single -1 argument; PyTorch infers the size of that dimension so that the output has the same number of elements as the input.

```python
x5 = x1.view(2, -1, 1)  # the -1 is inferred as 4, giving shape (2, 4, 1)
```

A tensor returned by `.view()` shares the same data as the input, so changes to one will affect the other.
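A quick check of the data-sharing behaviour (a sketch reusing the tensors above):

```python
import torch

x0 = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])
x1 = x0.view(8)
x1[0] = 100
print(x0[0, 0])  # tensor(100) -- x0 and x1 share the same underlying data
```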
Swapping axes
[!warning]
You cannot transpose matrices with `.view()`.

For other types of reshape operations, you usually need to use a function that can swap axes of a tensor. The simplest such function is `.t()`:

```python
# Transpose a 2-D tensor
x.t()
torch.t(x)
```
For tensors with more than two dimensions, we can use the function `torch.transpose`:

```python
# Swap axes 1 and 2
x1 = x0.transpose(1, 2)
```
If you want to swap multiple axes at the same time, you can use `torch.permute`:

```python
# Permute axes; the argument (1, 2, 0) means:
# - Make the old dimension 1 appear at dimension 0;
# - Make the old dimension 2 appear at dimension 1;
# - Make the old dimension 0 appear at dimension 2.
# This results in a tensor of shape (3, 4, 2)
x2 = x0.permute(1, 2, 0)
```
[!tip]
Contiguous tensors

Some combinations of reshaping operations will fail with cryptic errors. You can typically overcome these sorts of errors by calling `.contiguous()` before `.view()`, or by using `.reshape()` instead of `.view()`.
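For example (a sketch; the exact error text may vary across PyTorch versions), calling `.view()` on a transposed tensor fails, while `.contiguous()` or `.reshape()` works:

```python
import torch

x = torch.rand(2, 3, 4)
xt = x.transpose(1, 2)             # shape (2, 4, 3), no longer contiguous in memory

# xt.view(2, 12)                   # would raise a RuntimeError
y1 = xt.contiguous().view(2, 12)   # make a contiguous copy first, then view
y2 = xt.reshape(2, 12)             # reshape copies the data only when it has to
print(y1.shape, y2.shape)          # torch.Size([2, 12]) torch.Size([2, 12])
```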
Tensor Operations
Elementwise operations
```python
# Elementwise sum; all give the same result
x + y
torch.add(x, y)
x.add(y)

# Elementwise difference
x - y
torch.sub(x, y)
x.sub(y)

# Elementwise product
x * y
torch.mul(x, y)
x.mul(y)

# Elementwise division
x / y
torch.div(x, y)
x.div(y)

# Elementwise power
x ** y
torch.pow(x, y)
x.pow(y)

# Other standard math functions
torch.sqrt(x)
x.sqrt()
torch.sin(x)
x.sin()
```
Reduction operations
sum
```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
x.sum()        # same as torch.sum(x): 21
x.sum(dim=0)   # tensor([5, 7, 9])
x.sum(dim=1)   # tensor([6, 15])
```
[!tip]
After summing with `dim=d`, the dimension at index `d` of the input is eliminated from the shape of the output tensor.

Other useful reduction operations include `mean`, `min`, and `max`.
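A shape-focused sketch of this behaviour:

```python
import torch

x = torch.randn(3, 4, 5)
print(x.sum(dim=0).shape)  # torch.Size([4, 5]) -- dim 0 eliminated
print(x.sum(dim=1).shape)  # torch.Size([3, 5]) -- dim 1 eliminated
print(x.sum(dim=2).shape)  # torch.Size([3, 4]) -- dim 2 eliminated
```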
[!note]
If you pass `keepdim=True` to a reduction operation, the specified dimension will not be removed; the output tensor will instead have size 1 in that dimension.
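For example:

```python
import torch

x = torch.randn(3, 4)
print(x.sum(dim=1).shape)                # torch.Size([3])
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([3, 1]) -- dim kept with size 1
```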
Matrix operations
- `torch.dot`: Computes inner product of vectors
- `torch.mm`: Computes matrix-matrix products
- `torch.mv`: Computes matrix-vector products
- `torch.addmm` / `torch.addmv`: Computes matrix-matrix and matrix-vector multiplications plus a bias
- `torch.bmm` / `torch.baddbmm`: Batched versions of `torch.mm` and `torch.addmm`, respectively

[!note]
`torch.bmm(x, y) -> Tensor`

If `x` is a (b×n×m) tensor and `y` is a (b×m×p) tensor, then `out` will be a (b×n×p) tensor, with $out_i = x_i \, y_i$ for each batch index $i$.
- `torch.matmul`: General matrix product that performs different operations depending on the rank of the inputs; broadcasting is supported.
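A short sketch of the shapes these functions expect (the tensors below are illustrative):

```python
import torch

v = torch.randn(3)
w = torch.randn(3)
A = torch.randn(2, 3)
B = torch.randn(3, 4)
X = torch.randn(5, 2, 3)   # batch of 5 (2x3) matrices
Y = torch.randn(5, 3, 4)   # batch of 5 (3x4) matrices

print(torch.dot(v, w).shape)     # torch.Size([])   -- scalar (0-dim tensor)
print(torch.mm(A, B).shape)      # torch.Size([2, 4])
print(torch.mv(A, v).shape)      # torch.Size([2])
print(torch.bmm(X, Y).shape)     # torch.Size([5, 2, 4])
print(torch.matmul(v, w).shape)  # torch.Size([])   -- acts like dot for two vectors
```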
Broadcasting
Suppose that we want to add a constant vector to each row of a tensor:
```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])
v = torch.tensor([1, 0, 1])
y = torch.zeros_like(x)

# Add the vector v to each row of x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
```
This works; however, when the tensor `x` is very large, an explicit loop in Python could be slow. Note that adding the vector `v` to each row of `x` is equivalent to stacking copies of `v` vertically to form a matrix `vv`, and then summing `x` and `vv` elementwise. We could implement this approach like this:
```python
vv = v.repeat((4, 1))  # Stack 4 copies of v on top of each other
print(vv)
# tensor([[1, 0, 1],
#         [1, 0, 1],
#         [1, 0, 1],
#         [1, 0, 1]])
y = x + vv             # Add x and vv elementwise
```
PyTorch broadcasting allows us to perform this computation without actually creating multiple copies of `v`:
```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])
v = torch.tensor([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
```
[!note]
Broadcasting two tensors together follows these rules:
- If the tensors do not have the same rank, prepend the shape of the lower rank tensor with 1s until both shapes have the same length.
- The two tensors are said to be compatible in a dimension if they have the same size in the dimension, or if one of the tensors has size 1 in that dimension.
- The tensors can be broadcast together if they are compatible in all dimensions.
- After broadcasting, each tensor behaves as if it had shape equal to the elementwise maximum of shapes of the two input tensors.
- In any dimension where one tensor had size 1 and the other tensor had size greater than 1, the first tensor behaves as if it were copied along that dimension
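Applying these rules to a few shapes (a small sketch):

```python
import torch

x = torch.randn(4, 3)      # shape (4, 3)
v = torch.randn(3)         # shape (3,)  -> treated as (1, 3)
col = torch.randn(4, 1)    # shape (4, 1)

print((x + v).shape)       # torch.Size([4, 3]) -- v is broadcast along dim 0
print((x + col).shape)     # torch.Size([4, 3]) -- col is broadcast along dim 1
print((col + v).shape)     # torch.Size([4, 3]) -- both inputs are broadcast
```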
All elementwise functions support broadcasting.
Some non-elementwise functions (such as linear algebra routines) also support broadcasting; `torch.mm` does not support broadcasting, but `torch.matmul` does.
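For instance (a sketch):

```python
import torch

A = torch.randn(10, 3, 4)   # batch of 10 (3x4) matrices
B = torch.randn(4, 5)       # a single (4x5) matrix

# torch.mm(A, B)            # error: torch.mm only accepts 2-D inputs
C = torch.matmul(A, B)      # B is broadcast across the batch dimension
print(C.shape)              # torch.Size([10, 3, 5])
```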
Out-of-place vs in-place operators
Out-of-place operators: return a new tensor.
In-place operators: modify and return the input tensor.
For example:
```python
z = x.add(y)  # Same as z = x + y or z = torch.add(x, y)
x.add_(y)     # Same as x += y or torch.add(x, y, out=x)
```