Other Topics Relevant to Indexing

There is a great deal of functionality in NumPy, and most of it is out-of-scope for this guide. However, a few additional features are worth understanding in order to use indexing to its fullest potential, or as important caveats to be aware of when performing indexing operations.

Broadcasting

Broadcasting is a powerful abstraction that applies to all operations in NumPy. It allows arrays with mismatched shapes to be combined together as if one or more of their dimensions were simply repeated the appropriate number of times.

Normally, when we perform an operation on two arrays with the same shape, it does what we’d expect, i.e., the operation is applied to each pair of corresponding elements. For example, if a and b are both (2, 2) arrays, a + b results in a (2, 2) array containing the sums of the corresponding elements.

>>> import numpy as np
>>> a = np.array([[1, 2],
...               [3, 4]])
>>> b = np.array([[101, 102],
...               [103, 104]])
>>> a + b
array([[102, 104],
       [106, 108]])

However, you may have noticed that the two operands don’t always have to be arrays with the same shape. For example, you can add a single scalar to an array, and it will be added to each element.

>>> a + 1
array([[2, 3],
       [4, 5]])

In the above example, we can think of the scalar 1 as a shape () array, whereas a is a shape (2, 2) array. Thus, a and 1 do not have the same shape, but a + 1 is allowed by repeating 1 across every element of a. That is, NumPy takes 1 and treats it as if it were the shape (2, 2) array [[1, 1], [1, 1]].

Broadcasting is a generalization of this behavior. Specifically, instead of repeating just a single number into an array, we can repeat some dimensions of an array into a bigger array. For example, here we multiply x, a shape (3, 2) array, with y, a shape (2,) array. y is virtually repeated 3 times along a new first dimension so that it becomes a shape (3, 2) array.

>>> x = np.array([[1, 2],
...               [3, 4],
...               [5, 6]])
>>> x.shape
(3, 2)
>>> y = np.array([0, 2])
>>> x*y
array([[ 0,  4],
       [ 0,  8],
       [ 0, 12]])

We can see how broadcasting works using np.broadcast_to:

>>> np.broadcast_to(y, x.shape)
array([[0, 2],
       [0, 2],
       [0, 2]])

This is what the array y effectively looks like when it is combined with x. Note, however, that broadcasting is implemented efficiently: the “repeated” elements actually refer to the same memory. See Views vs. Copies and Strides below.

Broadcasting always happens automatically in NumPy whenever two arrays with different shapes are combined using any function or operator, assuming those shapes are broadcast compatible. The rule for broadcast compatibility is that the shorter of the two shapes is prepended with length 1 dimensions so that both have the same number of dimensions. Then any dimension that is size 1 in one shape is replaced with the corresponding size from the other shape. Any remaining sizes must be equal, or broadcasting is not allowed.

In the above example, we broadcast (3, 2) with (2,) by first extending (2,) to (1, 2), then broadcasting the size 1 dimension to the corresponding size in the other shape, 3, giving a broadcasted shape of (3, 2). In more advanced examples, both shapes may have broadcasted dimensions. For instance, (3, 1) can broadcast with (2,) to create (3, 2). The first shape’s size 1 second axis is repeated 2 times, while the second shape gains a new first axis along which it is repeated 3 times.
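If you want to check these rules without creating any arrays, np.broadcast_shapes applies them directly (a quick illustration; incompatible shapes raise a ValueError):

>>> np.broadcast_shapes((3, 2), (2,))
(3, 2)
>>> np.broadcast_shapes((3, 1), (2,))
(3, 2)
>>> # np.broadcast_shapes((3, 2), (3,)) would raise a ValueError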

We can think of shape (3, 2) as a “stack” of 3 shape (2,) arrays. Just as the scalar 1 was repeated to match the full shape of a above, the shape (2,) array y is repeated into a stack of 3 so that it matches the shape of x.

Additionally, a size 1 dimension is a signal to NumPy that the dimension is allowed to be repeated, or broadcasted, when used with another array whose other dimensions match.

It can be useful to think of broadcasting as repeating “stacks” of smaller arrays in this way. The automatic prepending rule lets you treat an array as a “stack” of subarrays along its last dimensions, which is the most common case, but the size 1 dimension rule allows these stacks to be along any dimensions of the array, not just the last ones.
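For example, an explicit size 1 dimension lets the smaller array line up with an axis other than the last one. Here z is a new shape (3, 1) example array; its size 1 second axis is repeated so each row of x from above is scaled by the corresponding element of z:

>>> z = np.array([[10],
...               [20],
...               [30]])
>>> x * z
array([[ 10,  20],
       [ 60,  80],
       [150, 180]])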

When it comes to indexing, one of the most useful types of index for use with broadcasting is newaxis, which lets you easily insert size 1 dimensions into an array to make it broadcastable in a specific way. See Where newaxis is Used.
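For instance, inserting a new axis is a common way to turn an elementwise product into an outer product (a small sketch with new example arrays p and q):

>>> p = np.array([1, 2, 3])
>>> q = np.array([10, 20])
>>> p[:, np.newaxis] * q # shapes (3, 1) and (2,) broadcast to (3, 2)
array([[10, 20],
       [20, 40],
       [30, 60]])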

See the NumPy documentation for more examples of broadcasting.

Views vs. Copies

There is an important distinction between basic indices (i.e., integers, slices, ellipses, newaxis) and advanced indices (i.e., integer array indices and boolean array indices) in NumPy that must be noted in some situations. Specifically, basic indices always create a view into an array[1], while advanced indices always create a copy of the underlying array. Tuple indices (i.e., multidimensional indices) will create a view if they do not contain an advanced index and a copy if they do.

A view is a special type of array whose data points to another array. This means that if you mutate the data in one array, the data in the other array is mutated as well. For example,

>>> a = np.arange(24).reshape((3, 2, 4))
>>> b = a[:, 0] # b is a view of a
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> b[:] = 0 # Mutating b also mutates a
>>> a
array([[[ 0,  0,  0,  0],
        [ 4,  5,  6,  7]],

       [[ 0,  0,  0,  0],
        [12, 13, 14, 15]],

       [[ 0,  0,  0,  0],
        [20, 21, 22, 23]]])

Mutating b also changed a, because both arrays point to the same memory.

Note that this behavior is exactly the opposite of the way Python lists work. With Python lists, a[:] is a shorthand to copy a. But with NumPy, a[:] creates a view into a (to copy an array with NumPy, use a.copy()). Python lists do not have a notion of views.

>>> a = [1, 2, 3] # list
>>> b = a[:] # a copy of a
>>> b[0] = 0 # Modifies b but not a
>>> b
[0, 2, 3]
>>> a
[1, 2, 3]
>>> a = np.array([1, 2, 3]) # NumPy array
>>> b = a[:] # A view of a
>>> b[0] = 0 # Modifies both b and a
>>> b
array([0, 2, 3])
>>> a
array([0, 2, 3])
>>> c = a.copy() # A copy of a
>>> c[0] = -1 # Only modifies c
>>> c
array([-1, 2, 3])
>>> a
array([0, 2, 3])

Views don’t just come from indexing. For instance, when you reshape an array, that also creates a view.

>>> a = np.arange(24)
>>> b = a.reshape((3, 2, 4))
>>> b[0] = 0
>>> b
array([[[ 0,  0,  0,  0],
        [ 0,  0,  0,  0]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> a
array([ 0,  0,  0,  0,  0,  0,  0,  0,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

Many other operations also create views, for example np.transpose, a.T, np.ravel, np.broadcast_to, and a.view.[2]

To check if an array is a view, check a.base. It will be None if it is not a view and will point to the base array otherwise. A view of a view will have the same base as the original array.

>>> a = np.arange(24)
>>> b = a[:]
>>> print(a.base) # a is not a view
None
>>> b.base is a # b is a view of a
True
>>> c = b[::2]
>>> c.base is a # c is a further view of a
True
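
The same check works for views created by other operations, such as the transpose (a quick check using a fresh array m that owns its data):

>>> m = np.array([[0, 1, 2],
...               [3, 4, 5]])
>>> m.T.base is m # the transpose is a view of m
True
>>> m.view().base is m
True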

In contrast, an advanced index will always create a copy (even if it would be possible to represent it as a view). This includes any tuple index (i.e., multidimensional index) that contains at least one array index.

>>> a = np.arange(10)
>>> b = a[::4]
>>> c = a[[0, 4, 8]]
>>> b
array([0, 4, 8])
>>> c
array([0, 4, 8])
>>> b.base is a # b is a view of a
True
>>> print(c.base) # c is not (it is a copy)
None

Whether an array is a view or a copy matters for two reasons:

  1. If you mutate an array that is a view, it will also mutate the data in the base array, as shown above. Be aware that views affect mutations in both directions: if a is a view, mutating it will also mutate whichever array it is a view on; conversely, even if a is not a view, mutating it will modify any other arrays that are views into a (see the short example after this list). And while you can check whether a is a view by looking at a.base, there is no easy way to check whether a has other views pointing at it. The only way to know is to analyze the program and check every array that is created from a.

    It’s best to minimize mutations in the presence of views, or to restrict them to a controlled part of the code, to avoid unexpected “action at a distance” bugs.

    Note that you can always ensure that a is a new array that isn’t a view and doesn’t have any views pointing to it by copying it, using a.copy().

  2. Even if you don’t mutate data, views are important because they are more efficient. A view is a relatively cheap thing to make, even if the array is large. It also saves on memory usage.

    For example, here we have an array with about 800 kB of data, and it takes over 200x longer to copy it than to create a view (using IPython’s %timeit):

    In [1]: import numpy as np
    
    In [2]: a = np.arange(100000, dtype=np.float64)
    
    In [3]: %timeit a.copy()
    25.9 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
    In [4]: %timeit a[:]
    127 ns ± 1.62 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
    
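Returning to the first point, mutation through a view goes in both directions: mutating the base array also changes every view into it. Here is a small example:

>>> a = np.arange(5)
>>> b = a[2:] # b is a view of a
>>> a[:] = 0 # mutating a...
>>> b # ...also changes b
array([0, 0, 0])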

Strides

The reason so many types of indexing into arrays can be represented as a view without creating a copy is that NumPy arrays aren’t merely a pointer to a blob of memory. They are a pointer along with something called strides. The strides tell NumPy how many bytes to skip in memory along each axis to get to the next element of the array along that axis. This, together with the memory offset (the address in memory of the first byte of data), the shape, and the itemsize (the number of bytes each element takes up, which depends on the dtype), exactly determines how the corresponding memory is organized into an array. For example, let’s start with a flat 1-dimensional array with 24 elements whose itemsize is 8 (an int64 takes up 8 bytes). Its strides are (8,):

>>> a = np.arange(24)
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(24,)

Now let’s create a view b, which is a reshaped to shape (3, 2, 4). b uses the exact same memory as a (which is just 0 1 2 ... 23). Its itemsize is the same because it has the same dtype, but its strides and shape are different.

>>> b = a.reshape((3, 2, 4))
>>> b
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> b.itemsize
8
>>> b.strides
(64, 32, 8)
>>> b.shape
(3, 2, 4)

This tells NumPy that to get to the next element along the first dimension of b, it needs to skip 64 bytes. That’s because each element along the first dimension contains 2*4 = 8 items (the sizes of the second and third dimensions), and each item is 8 bytes, so 8*8 = 64. For example, the next element along the first dimension after 0 (index (0, 0, 0)) is 8 (index (1, 0, 0)), which sits exactly 64 bytes after it in memory. Similarly, to get to the next element along the second dimension, NumPy skips 32 bytes (4 elements).
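We can verify this arithmetic directly: the byte offset of the element at index (i, j, k) is i*64 + j*32 + k*8, and dividing by the itemsize gives its position in the flat memory of a (a small check using a and b from above):

>>> i, j, k = 1, 1, 2
>>> offset = i*64 + j*32 + k*8 # byte offset computed from the strides
>>> offset
112
>>> b[i, j, k]
14
>>> a[offset // b.itemsize] # the same element, 112 // 8 = 14 steps into a
14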

The memory offset of an array can be accessed using a.ctypes.data. This is the address in memory where the data (0 1 2 ... 23) lives. a and b have the same memory offset because they both start with the same first element:

>>> a.ctypes.data 
105553170825216
>>> b.ctypes.data 
105553170825216

When we slice off the beginning of the array, we can see that all this does is move the memory offset forward, and adjust the shape correspondingly.

>>> a[2:].ctypes.data 
105553170825232
>>> a[2:].shape
(22,)
>>> a[2:].strides # the strides are the same
(8,)

Specifically, it moves the offset forward by 2*8 = 16 bytes (recall that 8 is a.itemsize):

>>> a[2:].ctypes.data - a.ctypes.data
16

The strides, meanwhile, are unchanged. Similarly, if we slice off the end of the array, all NumPy needs to do is adjust the shape. The memory offset is the same, because the slice still starts at the same place in memory.

>>> a[:2].ctypes.data # the memory offset is the same 
105553170825216
>>> a[:2].shape
(2,)
>>> a[:2].strides # the strides are the same
(8,)

If we instead slice with a step, this adjusts the strides. For instance, a[::2] doubles the strides, making it skip every other element (the offset is again unchanged because the slice still starts at the first element of a):

>>> a[::2].strides
(16,)
>>> a[::2].ctypes.data # the memory offset is the same 
105553170825216
>>> a[::2].shape
(12,)

If we use a negative step, the strides will become negative. This will cause NumPy to work backwards in memory as it accesses the elements of the array. The memory offset also changes here so that it starts with the last element of a:

>>> a[::-2].strides
(-16,)
>>> a[::-2].ctypes.data 
105553170825400

From this, you are hopefully convinced that every possible slice is just a manipulation of the memory offset, shape, and strides. It’s not hard to see that this also applies to integer indices (which remove the stride for the corresponding axis, adjusting the shape and memory offset accordingly) and newaxis (which inserts a 0 into the strides):

>>> b.strides
(64, 32, 8)
>>> b[2].strides
(32, 8)
>>> b[np.newaxis].strides
(0, 64, 32, 8)

This is why basic indexing always produces a view: because it can always be represented as a manipulation of the strides (plus shape and offset).

Another important fact about strides is that broadcasting can be achieved by manipulating the strides, namely by using a 0 stride to repeat the same data along a given axis.

>>> c = a.reshape((1, 12, 2))
>>> d = np.broadcast_to(c, (5, 12, 2))
>>> c.shape
(1, 12, 2)
>>> c.strides
(192, 16, 8)
>>> d.shape
(5, 12, 2)
>>> d.strides
(0, 16, 8)
>>> c.base is a
True
>>> d.base is a
True

If you’ve ever used broadcast_to(), you might have noticed that it returns a read-only array. That’s because writing into it would not do what you’d expect, since the repeated elements literally refer to the same memory.
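We can check this with d from above (a quick demonstration; the exact error text may differ between NumPy versions):

>>> d.flags.writeable
False
>>> d[0, 0, 0] = 99
Traceback (most recent call last):
    ...
ValueError: assignment destination is read-only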

All of this shows why broadcasting is so powerful: it can be done without any actual copy of the data. When you perform an operation on two arrays, the broadcasting is implicit, but even explicitly creating a broadcasted array is cheap, because all it does is create a view with different strides.

You can also manually create a view with any strides you want using stride tricks. Most views that you would want to create can be made with a combination of indexing, reshape, broadcast_to, and transpose, but it’s possible to use strides to represent some things which are not so easy to do with just these functions, for example, sliding windows and convolutions. However, if you do use stride tricks, be careful of the caveats (see the notes section of the as_strided docs).
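For example, sliding_window_view (which is implemented with as_strided) builds an array of overlapping windows that all share the original array’s memory; a small sketch:

>>> from numpy.lib.stride_tricks import sliding_window_view
>>> a = np.arange(8)
>>> w = sliding_window_view(a, 3)
>>> w
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7]])
>>> w.strides # consecutive windows start just one element (8 bytes) apart
(8, 8)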

C vs. Fortran Ordering

NumPy has an internal distinction between C order and Fortran order.[3] C-ordered arrays are stored in memory such that the last axis varies the fastest. For example, if a has 3 dimensions, then its elements are stored in memory like a[0, 0, 0], a[0, 0, 1], a[0, 0, 2], ..., a[0, 1, 0], a[0, 1, 1], a[0, 1, 2], .... Fortran ordering is the opposite: the elements are stored in memory so that the first axis varies fastest, like a[0, 0, 0], a[1, 0, 0], a[2, 0, 0], ..., a[0, 1, 0], a[1, 1, 0], a[2, 1, 0] ....
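One way to see the difference is to read the elements in the order they sit in memory, which ravel(order='K') does (a quick sketch with a small example array):

>>> a = np.array([[0, 1, 2],
...               [3, 4, 5]])
>>> a_f = np.asarray(a, order='F')
>>> a.ravel(order='K') # C order: rows are contiguous in memory
array([0, 1, 2, 3, 4, 5])
>>> a_f.ravel(order='K') # Fortran order: columns are contiguous in memory
array([0, 3, 1, 4, 2, 5])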

The most important thing to note about C and Fortran ordering for our purposes is that

the internal ordering of an array does not change any indexing semantics.

The same index will select the same elements of an array regardless of whether it uses C or Fortran ordering internally.

More generally, the actual memory layout of an array has no bearing on indexing semantics. Indexing operates on the logical abstraction of the array as presented to the user, even if the true memory doesn’t look anything like that because the array is a view or has some other layout due to stride tricks.

In particular, this also applies to boolean array indices. A boolean mask always selects the elements in C order, even if the underlying arrays use Fortran ordering.

>>> a = np.arange(9).reshape((3, 3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a_f = np.asarray(a, order='F')
>>> a_f # a_f looks the same as a, but the internal memory is ordered differently
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> idx = np.array([
...     [False,  True, False],
...     [ True,  True, False],
...     [False, False, False]])
>>> idx_f = np.asarray(idx, order='F') # It doesn't matter if the index array is Fortran-ordered either
>>> a[idx] # These are all the same
array([1, 3, 4])
>>> a_f[idx]
array([1, 3, 4])
>>> a[idx_f]
array([1, 3, 4])
>>> a_f[idx_f]
array([1, 3, 4])

Aside

If you read the previous section on strides, you may have guessed that the difference between C-ordered and Fortran-ordered arrays is a difference of…strides!

>>> a.strides
(24, 8)
>>> a_f.strides
(8, 24)

In a C-ordered array the strides decrease and in a Fortran-ordered array they increase, because a smaller stride corresponds to “faster varying”.

What ordering does affect is the performance of certain operations. In particular, the ordering determines whether it is more efficient to index along the first or the last axes of an array. For example, a[0] selects the first subarray along the first axis (recall that a[0] is a view into a, so it references the exact same memory as a). For a C-ordered array, which is the default, this subarray is contiguous in memory. This is because the indices on the last axes vary the fastest (i.e., neighboring indices there sit next to each other in memory), so selecting a subarray along the first axis picks elements that are still contiguous. Conversely, for a Fortran-ordered array, a[0] is not contiguous, but a[..., 0] is.

>>> a[0].data.contiguous
True
>>> a_f[0].data.contiguous
False
>>> a_f[..., 0].data.contiguous
True

Operating on contiguous memory allows the CPU to place the entire memory block in the cache at once, and is more performant as a result. The performance difference won’t be noticeable for our small example a above, as it is small enough to fit entirely in the cache, but it becomes significant for larger arrays. Compare the time to sum along a[0] or a[..., 0] for a 3-dimensional array a with a million elements using C and Fortran ordering (using IPython’s %timeit):

In [1]: import numpy as np

In [2]: a = np.ones((100, 100, 100)) # a has C order (the default)

In [3]: %timeit np.sum(a[0])
8.57 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit np.sum(a[..., 0])
24.2 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: a_f = np.asarray(a, order='F')

In [6]: %timeit np.sum(a_f[0])
26.3 µs ± 952 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit np.sum(a_f[..., 0])
8.6 µs ± 130 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Summing along contiguous memory (a[0] for C ordering and a[..., 0] for Fortran ordering) is about 3x faster.

NumPy indexing semantics generally favor using C ordering, as it does not require an ellipsis to select contiguous subarrays. C ordering also matches the list-of-lists intuition of an array, since an array like [[0, 1], [2, 3]] is stored in memory as literally 0, 1, 2, 3 with C ordering. It also aligns well with NumPy’s broadcasting rules, where broadcasted dimensions are prepended by default, allowing one to think of an array as a “stack” of contiguous subarrays.

C ordering is the default in NumPy when creating arrays with functions like asarray, ones, arange, and so on. One typically only switches to Fortran ordering when calling certain Fortran codes, or when creating an array from another memory source that produces Fortran-ordered data.

Regardless of which ordering you are using, it is worth structuring your data so that operations are done on contiguous memory when possible.
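If an array’s layout doesn’t match your access pattern, np.ascontiguousarray (or np.asfortranarray) will copy the data into the desired order, at the cost of that one copy:

>>> a_f = np.asarray(np.arange(12).reshape((3, 4)), order='F')
>>> a_f.flags['C_CONTIGUOUS']
False
>>> a_c = np.ascontiguousarray(a_f) # copies the data into C order
>>> a_c.flags['C_CONTIGUOUS']
True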

Footnotes