PyTorch Learning
The content of this blog post is based on the materials from UMich EECS 498-007
Tensor Basics
Creating Tensors
using lists
```python
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)  # 32-bit integer tensor
```
using constructors
```python
a = torch.zeros(2, 3)                    # tensor of all zeros
b = torch.ones(3, 4, dtype=torch.uint8)  # tensor of all ones
c = torch.eye(3)                         # 3x3 identity matrix
d = torch.rand(5, 6)                     # uniform random values in [0, 1)
e = torch.full((4, 5), 3.14)             # tensor filled with 3.14
f = torch.arange(start=1, end=10, step=2, dtype=torch.float64)  # values in [1, 10)
```
Indexing
slice indexing
1-D
```python
a = torch.tensor([1, 2, 3, 4, 5, 6])
a[1:3]    # [2, 3]
a[1:]     # [2, 3, 4, 5, 6]
a[:-1]    # [1, 2, 3, 4, 5]
a[0:5:2]  # [1, 3, 5]   (start:stop:step)
```
2-D
```python
a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
a[1:2, :]  # [[5, 6, 7, 8]]  (shape (1, 4): rank preserved)
a[1, :]    # [5, 6, 7, 8]    (shape (4,): rank reduced)
```
[!note]
There are two common ways to access a single row or column of a tensor: using an integer will reduce the rank by one, and using a length-one slice will keep the same rank.
Slicing a tensor returns a view into the same data, so modifying it will also modify the original tensor. To avoid this, you can use the `clone()` method to make a copy of the tensor.
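As a quick sketch of both points (the tensor here is just an example):

```python
import torch

a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

print(a[1].shape)    # torch.Size([4])    -- integer index reduces the rank
print(a[1:2].shape)  # torch.Size([1, 4]) -- length-one slice keeps the rank

view = a[0, :]          # a view into a's data
view[0] = 100
print(a[0, 0])          # tensor(100) -- the original tensor changed too

copy = a[0, :].clone()  # an independent copy
copy[1] = -1
print(a[0, 1])          # tensor(2) -- the original is untouched
```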
integer tensor indexing
```python
a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
idx = [0, 0, 2, 1, 1]  # index arrays can be Python lists of integers
print(a[idx])
# tensor([[ 1,  2,  3,  4],
#         [ 1,  2,  3,  4],
#         [ 9, 10, 11, 12],
#         [ 5,  6,  7,  8],
#         [ 5,  6,  7,  8]])
```
[!note]
More generally, given index arrays `idx0` and `idx1` with `N` elements each, `a[idx0, idx1]` is equivalent to:

```python
torch.tensor([
    a[idx0[0], idx1[0]],
    a[idx0[1], idx1[1]],
    ...,
    a[idx0[N - 1], idx1[N - 1]]
])
```
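A minimal sketch of this equivalence (the tensor and the index lists below are illustrative):

```python
import torch

a = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
idx0 = [0, 1, 2]
idx1 = [3, 2, 1]

print(a[idx0, idx1])
# tensor([ 4,  7, 10])
print(torch.stack([a[0, 3], a[1, 2], a[2, 1]]))  # same result, built element by element
# tensor([ 4,  7, 10])
```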
boolean tensor indexing
In PyTorch, we use tensors of dtype `torch.bool` to hold boolean masks.

```python
a = torch.tensor([[1, 2], [3, 4], [5, 6]])
mask = (a > 3)
print('\nMask tensor:')
print(mask)
# tensor([[False, False],
#         [False,  True],
#         [ True,  True]])
print(a[mask])
# tensor([4, 5, 6])  -- the selected elements, returned as a 1-D tensor
```
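Boolean masks are also handy for assignment. For example, a common pattern (a small sketch) is zeroing out all elements that satisfy a condition:

```python
import torch

a = torch.tensor([[1, 2], [3, 4], [5, 6]])
a[a > 3] = 0   # set every element greater than 3 to zero
print(a)
# tensor([[1, 2],
#         [3, 0],
#         [0, 0]])
```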
Reshaping Operations
View
`.view()`: returns a new tensor with the same number of elements as its input, but with a different shape.
```python
x0 = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])

# Flatten x0 into a rank 1 vector of shape (8,)
x1 = x0.view(8)

# Convert x1 to a rank 2 "row vector" of shape (1, 8)
x2 = x1.view(1, 8)

# Convert x1 to a rank 2 "column vector" of shape (8, 1)
x3 = x1.view(8, 1)

# Convert x1 to a rank 3 tensor of shape (2, 2, 2)
x4 = x1.view(2, 2, 2)
```
[!note]
Calls to `.view()` may include a single -1 argument; PyTorch infers the size of that dimension so that the output has the same number of elements as the input.

```python
x5 = x1.view(2, -1, 1)  # the -1 is inferred as 4, giving shape (2, 4, 1)
```

A tensor returned by `.view()` shares the same data as the input, so changes to one will affect the other.
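A quick check of the data-sharing behaviour (a sketch reusing the tensors above):

```python
import torch

x0 = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])
x1 = x0.view(8)
x1[0] = 100
print(x0[0, 0])  # tensor(100) -- x0 and x1 share the same underlying data
```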
Swapping axes
[!warning]
You cannot transpose matrices with `.view()`.

For other types of reshape operations, you usually need to use a function that can swap axes of a tensor. The simplest such function is `.t()`:

```python
# Transpose a 2-D tensor
x.t()
torch.t(x)
```
For tensors with more than two dimensions, we can use the function `torch.transpose`:

```python
# Swap axes 1 and 2
x1 = x0.transpose(1, 2)
```
If you want to swap multiple axes at the same time, you can use `torch.permute`:

```python
# Permute axes; the argument (1, 2, 0) means:
# - Make the old dimension 1 appear at dimension 0;
# - Make the old dimension 2 appear at dimension 1;
# - Make the old dimension 0 appear at dimension 2.
# This results in a tensor of shape (3, 4, 2)
x2 = x0.permute(1, 2, 0)
```
[!tip]
Contiguous tensors

Some combinations of reshaping operations will fail with cryptic errors. You can typically overcome these sorts of errors by calling `.contiguous()` before `.view()`, or by using `.reshape()` instead of `.view()`.
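For example (a sketch; the exact error text may vary across PyTorch versions), calling `.view()` on a transposed tensor fails, while `.contiguous()` or `.reshape()` works:

```python
import torch

x = torch.rand(2, 3, 4)
xt = x.transpose(1, 2)             # shape (2, 4, 3), no longer contiguous in memory

# xt.view(2, 12)                   # would raise a RuntimeError
y1 = xt.contiguous().view(2, 12)   # make a contiguous copy first, then view
y2 = xt.reshape(2, 12)             # reshape copies the data only when it has to
print(y1.shape, y2.shape)          # torch.Size([2, 12]) torch.Size([2, 12])
```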
Tensor Operations
Elementwise operations
```python
# Elementwise sum; all give the same result
x + y
torch.add(x, y)
x.add(y)

# Elementwise difference
x - y
torch.sub(x, y)
x.sub(y)

# Elementwise product
x * y
torch.mul(x, y)
x.mul(y)

# Elementwise division
x / y
torch.div(x, y)
x.div(y)

# Elementwise power
x ** y
torch.pow(x, y)
x.pow(y)

# Other standard math functions
torch.sqrt(x)
x.sqrt()
torch.sin(x)
x.sin()
```
Reduction operations
sum
```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
x.sum()        # same as torch.sum(x): 21
x.sum(dim=0)   # tensor([5, 7, 9])
x.sum(dim=1)   # tensor([6, 15])
```
[!tip]
After summing with `dim=d`, the dimension at index `d` of the input is eliminated from the shape of the output tensor.

Other useful reduction operations include `mean`, `min`, and `max`.
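A shape-focused sketch of this behaviour:

```python
import torch

x = torch.randn(3, 4, 5)
print(x.sum(dim=0).shape)  # torch.Size([4, 5]) -- dim 0 eliminated
print(x.sum(dim=1).shape)  # torch.Size([3, 5]) -- dim 1 eliminated
print(x.sum(dim=2).shape)  # torch.Size([3, 4]) -- dim 2 eliminated
```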
[!note]
If you pass `keepdim=True` to a reduction operation, the specified dimension will not be removed; the output tensor will instead have size 1 in that dimension.
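For example:

```python
import torch

x = torch.randn(3, 4)
print(x.sum(dim=1).shape)                # torch.Size([3])
print(x.sum(dim=1, keepdim=True).shape)  # torch.Size([3, 1]) -- dim kept with size 1
```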
Matrix operations
- `torch.dot`: Computes inner product of vectors
- `torch.mm`: Computes matrix-matrix products
- `torch.mv`: Computes matrix-vector products
- `torch.addmm` / `torch.addmv`: Computes matrix-matrix and matrix-vector multiplications plus a bias
- `torch.bmm` / `torch.baddbmm`: Batched versions of `torch.mm` and `torch.addmm`, respectively

[!note]
`torch.bmm(x, y) -> Tensor`

If `x` is a (b×n×m) tensor and `y` is a (b×m×p) tensor, then `out` will be a (b×n×p) tensor, with $out_i = x_i \, y_i$ for each batch index $i$.
- `torch.matmul`: General matrix product that performs different operations depending on the rank of the inputs; broadcasting is supported.
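A short sketch of the shapes these functions expect (the tensors below are illustrative):

```python
import torch

v = torch.randn(3)
w = torch.randn(3)
A = torch.randn(2, 3)
B = torch.randn(3, 4)
X = torch.randn(5, 2, 3)   # batch of 5 (2x3) matrices
Y = torch.randn(5, 3, 4)   # batch of 5 (3x4) matrices

print(torch.dot(v, w).shape)     # torch.Size([])   -- scalar (0-dim tensor)
print(torch.mm(A, B).shape)      # torch.Size([2, 4])
print(torch.mv(A, v).shape)      # torch.Size([2])
print(torch.bmm(X, Y).shape)     # torch.Size([5, 2, 4])
print(torch.matmul(v, w).shape)  # torch.Size([])   -- acts like dot for two vectors
```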
Broadcasting
Suppose that we want to add a constant vector to each row of a tensor:
```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])
v = torch.tensor([1, 0, 1])
y = torch.zeros_like(x)

# Add the vector v to each row of x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
```
This works; however, when the tensor `x` is very large, an explicit loop in Python could be slow. Note that adding the vector `v` to each row of `x` is equivalent to stacking copies of `v` vertically to form a matrix `vv`, and then summing `x` and `vv` elementwise. We could implement this approach like this:
```python
vv = v.repeat((4, 1))  # Stack 4 copies of v on top of each other
print(vv)
# tensor([[1, 0, 1],
#         [1, 0, 1],
#         [1, 0, 1],
#         [1, 0, 1]])
y = x + vv             # Add x and vv elementwise
```
PyTorch broadcasting allows us to perform this computation without actually creating multiple copies of `v`:
```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])
v = torch.tensor([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
```
[!note]
Broadcasting two tensors together follows these rules:
- If the tensors do not have the same rank, prepend the shape of the lower rank tensor with 1s until both shapes have the same length.
- The two tensors are said to be compatible in a dimension if they have the same size in the dimension, or if one of the tensors has size 1 in that dimension.
- The tensors can be broadcast together if they are compatible in all dimensions.
- After broadcasting, each tensor behaves as if it had shape equal to the elementwise maximum of shapes of the two input tensors.
- In any dimension where one tensor had size 1 and the other tensor had size greater than 1, the first tensor behaves as if it were copied along that dimension
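Applying these rules to a few shapes (a small sketch):

```python
import torch

x = torch.randn(4, 3)      # shape (4, 3)
v = torch.randn(3)         # shape (3,)  -> treated as (1, 3)
col = torch.randn(4, 1)    # shape (4, 1)

print((x + v).shape)       # torch.Size([4, 3]) -- v is broadcast along dim 0
print((x + col).shape)     # torch.Size([4, 3]) -- col is broadcast along dim 1
print((col + v).shape)     # torch.Size([4, 3]) -- both inputs are broadcast
```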
All elementwise functions support broadcasting.
Some non-elementwise functions (such as linear algebra routines) also support broadcasting; `torch.mm` does not support broadcasting, but `torch.matmul` does.
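For instance (a sketch):

```python
import torch

A = torch.randn(10, 3, 4)   # batch of 10 (3x4) matrices
B = torch.randn(4, 5)       # a single (4x5) matrix

# torch.mm(A, B)            # error: torch.mm only accepts 2-D inputs
C = torch.matmul(A, B)      # B is broadcast across the batch dimension
print(C.shape)              # torch.Size([10, 3, 5])
```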
Out-of-place vs in-place operators
Out-of-place operators: return a new tensor.
In-place operators: modify and return the input tensor.
For example:
```python
z = x.add(y)  # Same as z = x + y or z = torch.add(x, y)
x.add_(y)     # Same as x += y or torch.add(x, y, out=x)
```