Integer Array Indices
Note
In this section and the next, do not confuse the array being indexed with the array that is the index. The former can be anything and have any dtype. It is only the latter that is restricted to being integer or boolean.
Integer array indices are very powerful. Using them, you can effectively construct arbitrary new arrays consisting of elements from the original indexed array.
To start, let’s consider a simple one-dimensional array:
>>> import numpy as np
>>> a = np.array([100, 101, 102, 103])
Now suppose we wish to construct the following 2-D array from this array using only indexing operations:
[[ 100, 102, 100 ],
[ 103, 100, 102 ]]
It should hopefully be clear that there’s no way we could possibly construct this array as a[idx] using only the index types we’ve discussed so far. For one thing, integer indices, slices, ellipses, and newaxes all only select elements of the array in order (or possibly reversed order for slices), whereas this array has elements completely shuffled from a, and some are even repeated.
However, we could “cheat” a bit here and do something like
>>> new_array = np.array([[a[0], a[2], a[0]],
... [a[3], a[0], a[2]]])
>>> new_array
array([[100, 102, 100],
[103, 100, 102]])
This is the array we want. We sort of constructed it using only indexing operations, but we didn’t actually do a[idx] for some index idx. Instead, we just listed the indices of each individual element.
An integer array index is essentially this “cheating” method, but as a single index. Instead of listing out a[0], a[2], and so on, we just create a single integer array with those integer indices:
>>> idx = np.array([[0, 2, 0],
... [3, 0, 2]])
If we then index a with this array, it works just like new_array above:
>>> a[idx]
array([[100, 102, 100],
[103, 100, 102]])
This is how integer array indices work:
An integer array index can construct arbitrary new arrays with elements from a, with the elements in any order and even repeated, simply by enumerating the integer index positions where each element of the new array comes from.
Note that a[idx] above is not the same size as a at all. a has 4 elements and is 1-dimensional, whereas a[idx] has 6 elements and is 2-dimensional. a[idx] also contains some duplicate elements from a, and there are some elements which aren’t selected at all. Indeed, we could take any integer array idx of any shape, and as long as the elements are between 0 and 3, a[idx] would create a new array with the same shape as idx with corresponding elements selected from a.
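As a small sketch of this rule (the names idx3d and by_hand are only illustrative), indexing a with an integer array of a different shape gives a result of that shape, and it matches looking up each position one at a time:

```python
import numpy as np

a = np.array([100, 101, 102, 103])

# Any shape works, as long as every entry is a valid index into a.
idx3d = np.array([[[3, 3], [0, 1]],
                  [[2, 2], [1, 0]]])

result = a[idx3d]

# The same thing, spelled out element by element.
by_hand = np.array([a[i] for i in idx3d.ravel()]).reshape(idx3d.shape)

print(result.shape)  # (2, 2, 2)
```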
The shape of a is (4,) and the shape of a[idx] is (2, 3), the same as the shape of idx. In general:
An integer array index a[idx] selects elements from the specified axis and replaces the selected dimension in the shape of a with the shape of the index array idx.
For example, in a[idx].shape, 4 is replaced with (2, 3). Consider what happens when a has more than one dimension:
>>> a = np.empty((3, 4))
>>> idx = np.zeros((2, 2), dtype=int)
>>> a[idx].shape # (3,) is replaced with (2, 2)
(2, 2, 4)
>>> a[:, idx].shape # Indexing the second dimension, (4,) is replaced with (2, 2)
(3, 2, 2)
Here a.shape is (3, 4) and idx.shape is (2, 2). In a[idx].shape, the 3 is replaced with (2, 2), giving (2, 2, 4), and in a[:, idx].shape, the 4 is replaced with (2, 2), giving (3, 2, 2).
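The shape rule can be checked mechanically. The sketch below uses a hypothetical helper, expected_shape, which is not part of NumPy or ndindex; it simply splices the index shape into the array shape at the indexed axis:

```python
import numpy as np

def expected_shape(shape, axis, idx_shape):
    # Hypothetical helper: replace shape[axis] with idx_shape.
    return shape[:axis] + idx_shape + shape[axis + 1:]

a = np.empty((3, 4, 5))
idx = np.zeros((2, 6), dtype=int)

shape_axis0 = a[idx].shape        # 3 is replaced with (2, 6)
shape_axis1 = a[:, idx].shape     # 4 is replaced with (2, 6)
shape_axis2 = a[..., idx].shape   # 5 is replaced with (2, 6)

print(shape_axis0, shape_axis1, shape_axis2)
# (2, 6, 4, 5) (3, 2, 6, 5) (3, 4, 2, 6)
```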
A useful way to think about integer array indexing is that it generalizes integer indexing. With integer indexing, we are effectively indexing using a 0-dimensional integer array, that is, a single integer. This always selects the corresponding element from the given axis and removes the dimension. That is, it replaces that dimension in the shape with () (i.e., nothing), the “shape” of the integer index. The result of indexing with an int and a corresponding 0-D array is exactly the same.[1]
>>> idx = np.asarray(0) # 0-D array
>>> idx.shape
()
>>> a = np.arange(12).reshape((3, 4))
>>> a[idx].shape # replaces (3,) with ()
(4,)
>>> a[:, idx].shape # replaces (4,) with ()
(3,)
>>> a[idx] # a[asarray(0)] is the exact same as a[0]
array([0, 1, 2, 3])
>>> a[0]
array([0, 1, 2, 3])
Note that even when the index array idx has more than one dimension, an integer array index still only selects elements from a single axis of a. It would appear that this limits the ability to arbitrarily shuffle elements of a using integer indexing. For instance, suppose we have the 2-D array
>>> a = np.array([[100, 101, 102],
...               [103, 104, 105]])
and we wanted to use indexing to create the array [105, 100]. Based on the above examples, this might not seem possible, since the elements 105 and 100 are not in the same row or column of a.
However, this is doable by providing multiple integer array indices:
When multiple integer array indices are provided, the elements of each index are selected correspondingly for that axis.
It’s perhaps most illustrative to show this as an example. Given the above a, we can produce the array [105, 100] using
>>> a = np.array([[100, 101, 102],
... [103, 104, 105]])
>>> idx = (np.array([1, 0]), np.array([2, 0]))
>>> a[idx]
array([105, 100])
Let’s break this down. idx is a tuple index with two arrays, which are both the same shape. The first element of our desired result, 105, corresponds to index (1, 2) in a:
>>> a[1, 2]
np.int64(105)
So we write 1 in the first array and 2 in the second array. Similarly, the next element, 100, corresponds to index (0, 0), so we write 0 in the first array and 0 in the second. In general, the first array contains the indices for the first axis, the second array contains the indices for the second axis, and so on. If we were to zip up our two index arrays, we would get the set of indices for each corresponding element, (1, 2) and (0, 0).
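The “zip” reading can be written out directly. This is only an illustrative sketch (rows, cols, pairs, and by_hand are names chosen here), not how NumPy implements the lookup:

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])
rows = np.array([1, 0])   # indices along the first axis
cols = np.array([2, 0])   # indices along the second axis

# Pair up the two index arrays element by element.
pairs = list(zip(rows.tolist(), cols.tolist()))   # [(1, 2), (0, 0)]

# Look up each pair individually...
by_hand = np.array([a[i, j] for i, j in pairs])

# ...which matches indexing with both arrays at once.
paired = a[rows, cols]
print(paired)  # [105 100]
```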
The resulting array has the same shape as our two index arrays. As before, this shape can be arbitrary. Suppose we want to create the array
[[[ 102, 103],
[ 102, 101]],
[[ 100, 105],
[ 102, 102]]]
Recall our array a:
>>> a
array([[100, 101, 102],
[103, 104, 105]])
Noting the index for each element in our desired array, we get
>>> idx0 = np.array([[[0, 1], [0, 0]], [[0, 1], [0, 0]]])
>>> idx1 = np.array([[[2, 0], [2, 1]], [[0, 2], [2, 2]]])
>>> a[idx0, idx1]
array([[[102, 103],
[102, 101]],
[[100, 105],
[102, 102]]])
Again, reading across, the first element, 102, corresponds to index (0, 2), the next element, 103, corresponds to index (1, 0), and so on.
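For index arrays of any shape, the same correspondence can be sketched as an explicit loop over positions. Note that np.ndindex here is NumPy's position iterator, unrelated to the ndindex library, and by_hand is just an illustrative name:

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])
idx0 = np.array([[[0, 1], [0, 0]], [[0, 1], [0, 0]]])
idx1 = np.array([[[2, 0], [2, 1]], [[0, 2], [2, 2]]])

# Fill the result one position at a time: position pos of the result
# comes from a[idx0[pos], idx1[pos]].
by_hand = np.empty(idx0.shape, dtype=a.dtype)
for pos in np.ndindex(idx0.shape):
    by_hand[pos] = a[idx0[pos], idx1[pos]]

result = a[idx0, idx1]
print(np.array_equal(result, by_hand))  # True
```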
Use Cases
A common use case for integer array indexing is sampling. For example, to sample \(k\) elements from a 1-D array of size \(n\) with replacement, we can simply construct a random integer index in the range \([0, n)\) with \(k\) elements (see the numpy.random.Generator.integers() documentation):[2]
>>> k = 10
>>> a = np.array([100, 101, 102, 103]) # as above
>>> rng = np.random.default_rng(11) # Seeded so this example reproduces
>>> idx = rng.integers(0, a.size, k) # rng.integers() excludes the upper bound
>>> idx
array([0, 0, 3, 1, 2, 2, 2, 0, 1, 0])
>>> a[idx]
array([100, 100, 103, 101, 102, 102, 102, 100, 101, 100])
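The same idea extends to sampling whole rows of a 2-D array, since an integer array index on the first axis selects entire rows. The following is a minimal sketch; data, rows, and the seed are arbitrary choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
data = np.arange(12).reshape((4, 3))
k = 6

# k row numbers in [0, 4), possibly repeated.
rows = rng.integers(0, data.shape[0], size=k)

# Each entry of rows selects a full row of data.
sample = data[rows]
print(sample.shape)  # (6, 3)
```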
Another common use case of integer array indexing is to permute an array. An array can be randomly permuted with numpy.random.Generator.permutation(). But what if we want to permute two arrays with the same permutation? We can compute a permutation index and apply it to both arrays. For a 1-D array a of size \(n\), a permutation index is just a permutation of the integer array index np.arange(n), which itself is the identity permutation on a:
>>> a = np.array([100, 101, 102, 103]) # as above
>>> b = np.array([200, 201, 202, 203]) # another array
>>> identity = np.arange(a.size)
>>> a[identity] # arange by itself is the identity permutation index
array([100, 101, 102, 103])
>>> rng = np.random.default_rng(11) # Seeded so this example reproduces
>>> random_permutation = rng.permutation(identity)
>>> a[random_permutation]
array([103, 101, 100, 102])
>>> b[random_permutation] # The same permutation on b
array([203, 201, 200, 202])
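A permutation index can also be undone. One possible sketch, using assignment through the permutation index to build its inverse (the names perm and inverse are illustrative, and the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
a = np.array([100, 101, 102, 103])

perm = rng.permutation(a.size)   # a permutation of [0, 1, 2, 3]

# inverse[perm[j]] = j, so indexing by inverse undoes indexing by perm.
inverse = np.empty_like(perm)
inverse[perm] = np.arange(a.size)

shuffled = a[perm]
restored = shuffled[inverse]
print(restored)  # [100 101 102 103]
```

np.argsort(perm) produces the same inverse index; the assignment form is shown because it uses only indexing.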
Advanced Notes
The information above provides the basic gist of integer array indexing, but there are also many subtleties and advanced behaviors involved. The subsections here are included for completeness; however, if you are a beginner NumPy user, you may wish to skip them.
Negative Indices
Indices in the integer array can also be negative. Negative indices work the same as they do with integer indices.
Negative and nonnegative indices can be mixed arbitrarily.
>>> a = np.array([100, 101, 102, 103]) # as above
>>> idx = np.array([0, 1, -1])
>>> a[idx]
array([100, 101, 103])
If you want to convert an index containing negative indices into an index without any negative indices, you can use the ndindex reduce() method with a shape argument.
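For comparison, here is a NumPy-only sketch of the same normalization for a 1-D array. It is not the ndindex method, just the underlying arithmetic (add the axis size to each negative entry), and it assumes the index is already in bounds:

```python
import numpy as np

a = np.array([100, 101, 102, 103])
idx = np.array([0, 1, -1, -4])

# Negative entries count from the end, so -1 means size - 1, and so on.
size = a.shape[0]
nonnegative = np.where(idx < 0, idx + size, idx)

print(nonnegative)  # [0 1 3 0]
```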
Python Lists
You can use a list of integers instead of an array to represent an integer array index.[3]
Using a list is useful when writing an array index by hand; however, in all other cases, using an actual array is preferable. In most real-world scenarios, an array index is constructed from some other array methods.
>>> a = np.array([100, 101, 102, 103]) # as above
>>> a[[0, 1, -1]]
array([100, 101, 103])
>>> idx = np.array([0, 1, -1])
>>> a[idx] # this is the same
array([100, 101, 103])
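One thing to keep in mind when writing indices by hand is that a list and a tuple are not interchangeable: a list is treated as an integer array index, while a tuple is a tuple index. A short sketch on a 2-D array (a2 is a name chosen here) shows the difference:

```python
import numpy as np

a2 = np.array([[100, 101, 102],
               [103, 104, 105]])

with_list = a2[[0, 1]]    # integer array index: selects rows 0 and 1
with_tuple = a2[(0, 1)]   # tuple index: the same as a2[0, 1]

print(with_list.shape)  # (2, 3)
print(with_tuple)       # 101
```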
Bounds Checking
As with integer indices, integer array indexing uses bounds checking, with the same rule as integer indices.
If any entry in an integer array index is greater than size - 1 or less than -size, where size is the size of the dimension being indexed, an IndexError is raised.
>>> a = np.array([100, 101, 102, 103]) # as above
>>> a[[2, 3, 4]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 4 is out of bounds for axis 0 with size 4
>>> a[[-5, -4, -3]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index -5 is out of bounds for axis 0 with size 4
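If you would rather test an index before using it, the rule can be sketched as a small predicate. in_bounds is a hypothetical helper written for this example, not a NumPy or ndindex function:

```python
import numpy as np

def in_bounds(idx, size):
    # Hypothetical helper: True if every entry satisfies -size <= i < size.
    idx = np.asarray(idx)
    return bool(np.all((-size <= idx) & (idx < size)))

a = np.array([100, 101, 102, 103])

ok = in_bounds([-4, 0, 3], a.shape[0])      # True
too_big = in_bounds([2, 3, 4], a.shape[0])  # False

# The out-of-bounds index raises IndexError when actually used.
try:
    a[[2, 3, 4]]
    raised = False
except IndexError:
    raised = True
```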
Broadcasting
The integer arrays in an index must either be the same shape or be able to be broadcast together to the same shape.
If the arrays are not the same shape, they are first broadcast together, and those broadcasted arrays are used as the indices. This broadcasting behavior is useful if the index array would otherwise be repeated in a given dimension, and provides a convenient way to do outer indexing (see the next section).
This also means that mixing an integer array index with a single integer index is the same as replacing the single integer index with an array of the same shape filled with that integer (because remember, a single integer index is the same thing as an integer array index of shape ()).
For example:
>>> a = np.array([[100, 101, 102], # as above
... [103, 104, 105]])
>>> idx0 = np.array([1, 0])
>>> idx0.shape
(2,)
>>> idx1 = np.array([[0], [1], [2]])
>>> idx1.shape
(3, 1)
>>> # idx0 and idx1 broadcast to shape (3, 2), which will
>>> # be the shape of a[idx0, idx1]
>>> a[idx0, idx1]
array([[103, 100],
[104, 101],
[105, 102]])
>>> a[idx0, idx1].shape
(3, 2)
>>> idx0_broadcasted = np.array([[1, 0], [1, 0], [1, 0]])
>>> idx1_broadcasted = np.array([[0, 0], [1, 1], [2, 2]])
>>> idx0_broadcasted.shape
(3, 2)
>>> idx1_broadcasted.shape
(3, 2)
>>> a[idx0_broadcasted, idx1_broadcasted] # The same thing as a[idx0, idx1]
array([[103, 100],
[104, 101],
[105, 102]])
And mixing an array and an integer index:
>>> a
array([[100, 101, 102],
[103, 104, 105]])
>>> idx0 = np.array([1, 0, 0])
>>> a[idx0, 2]
array([105, 102, 102])
>>> idx1_broadcasted = np.array([2, 2, 2]) # The 0-D array '2' broadcasted to shape (3,)
>>> a[idx0, idx1_broadcasted] # The same thing as a[idx0, 2]
array([105, 102, 102])
Here the idx0 array specifies the indices along the first dimension, 1, 0, and 0, and the 2 specifies to always use index 2 along the second dimension. This is the same as using the array [2, 2, 2] for the second dimension, since this is the scalar 2 broadcasted to the shape of [1, 0, 0].
The ndindex methods Tuple.broadcast_arrays() and expand() will broadcast array indices together into a canonical form.
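In plain NumPy, the broadcasted index arrays can be inspected with numpy.broadcast_arrays(). The sketch below reuses the index arrays from the example above; b0 and b1 are names chosen here:

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])
idx0 = np.array([1, 0])              # shape (2,)
idx1 = np.array([[0], [1], [2]])     # shape (3, 1)

# Broadcast both index arrays to their common shape, (3, 2).
b0, b1 = np.broadcast_arrays(idx0, idx1)

print(b0.tolist())  # [[1, 0], [1, 0], [1, 0]]
print(b1.tolist())  # [[0, 0], [1, 1], [2, 2]]

# Indexing with the explicit broadcasted arrays gives the same result.
same = np.array_equal(a[idx0, idx1], a[b0, b1])
```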
Outer Indexing
The broadcasting behavior for multiple integer indices may seem odd, but it serves a useful purpose. As we saw above, multiple integer array indices are required to select elements from higher dimensional arrays, one array for each dimension. These integer arrays enumerate the indices of the selected elements along these dimensions. For example, as above:
>>> a = np.array([[100, 101, 102],
... [103, 104, 105]])
>>> a[[1, 0], [2, 0]] # selects elements (1, 2) and (0, 0)
array([105, 100])
However, you might have noticed that this behavior is somewhat unusual compared to other index types. For all other index types we’ve discussed so far, such as slices and integer indices, each index applies “independently” along each dimension. For example, x[0:2, 0:3] applies the slice 0:2 to the first dimension of x and 0:3 to the second dimension. The resulting array has 2*3 = 6 elements, because there are 2 subarrays selected from the first dimension with 3 elements each. But in the above example, a[[1, 0], [2, 0]] only has 2 elements, not 4. And something like a[[1, 0], [2, 0, 1]] is an error.
The integer array equivalent of the way slices work is called “outer indexing”.[4] An outer index “a[[1, 0], [2, 0, 1]]” would have 6 elements: rows 1 and 0, with elements from columns 2, 0, and 1, in that order. However, the index a[[1, 0], [2, 0, 1]] doesn’t actually work like this.[5]
Strictly speaking, though, NumPy’s integer array indexing rules do allow for outer indexing. This is because, as we saw above, integer array indexing allows for creating arbitrary new arrays from a given input array. And as it turns out, the integer arrays required to represent an outer array index are quite simple to construct. They are simply the outer index arrays broadcasted together.
To see why this is, consider the above example, a[[1, 0], [2, 0, 1]]. We want our end result to be
[[105, 103, 104],
[102, 100, 101]]
That is, the rows of a should be in the order [1, 0], and the columns should be in the order [2, 0, 1]. The end result should be an array of shape (2, 3) (which happens to be the same shape as a, but that’s just a coincidence; an outer-indexed array constructed from a could have any 2-D shape). So using the integer array indexing rules above, we need to index a by integer arrays of shape (2, 3). Since a has two dimensions, we will need two index arrays, one for each dimension. Let’s consider what these arrays should be. For the first dimension, we want to select row 1 three times and then row 0 three times:
[[1, 1, 1],
[0, 0, 0]]
And for the second dimension, we want to select the columns 2, 0, and 1, in that order, regardless of which row we are in:
[[2, 0, 1],
[2, 0, 1]]
In general, we want to repeat each outer selection array along the corresponding dimension so as to fill an array with the final desired shape. This is exactly what broadcasting does! If we reshape our first array to have shape (2, 1) and the second array to have shape (1, 3), then broadcasting them together will repeat the first dimension of the first array along the second axis, and the second dimension of the second array along the first axis, i.e., exactly the arrays we want.
This is why NumPy automatically broadcasts integer array indices together.
Outer indexing arrays can be constructed by inserting size 1 dimensions into the desired “outer” integer array indices so that the non-size 1 dimension for each is in the indexing dimension.
For example,
>>> idx0 = np.array([1, 0])
>>> idx1 = np.array([2, 0, 1])
>>> a[idx0[:, np.newaxis], idx1[np.newaxis, :]]
array([[105, 103, 104],
[102, 100, 101]])
Here, we use newaxis along with : to turn idx0 and idx1 into shape (2, 1) and (1, 3) arrays, respectively. These then automatically broadcast together to give the desired outer index.
This “insert size 1 dimensions” operation can also be performed automatically with the numpy.ix_() function.[6]
>>> np.ix_(idx0, idx1)
(array([[1],
[0]]), array([[2, 0, 1]]))
>>> a[np.ix_(idx0, idx1)]
array([[105, 103, 104],
[102, 100, 101]])
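Another way to convince yourself that this is “outer” behavior is to apply the two index arrays one axis at a time. The sketch below compares the single outer index against two sequential indexing steps (two_steps is a name chosen here); the two-step form builds an intermediate array, so it is shown only as a cross-check:

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])
idx0 = np.array([1, 0])
idx1 = np.array([2, 0, 1])

outer = a[np.ix_(idx0, idx1)]

# Select rows first, then select columns from that result.
two_steps = a[idx0][:, idx1]

print(np.array_equal(outer, two_steps))  # True
```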
Outer indexing can be thought of as a generalization of slicing. With a slice, you can really only select a “regular” sequence of elements from a dimension, that is, either a contiguous chunk, or a contiguous chunk split by a regular step value. It’s impossible, for instance, to use a slice to select the indices [0, 1, 2, 3, 5, 6, 7], because 4 is omitted.
For instance, say the first dimension of your array represents time steps and you want to select time steps 0–7, but time step 4 is invalid for some reason and you want to ignore it for your analysis. If you just care about the first dimension, you can just use the integer index [0, 1, 2, 3, 5, 6, 7]. But suppose you also wanted to select some other non-contiguous “slice” from the second dimension. Using just basic indices, you’d have to index the array with normal slices then either remove or ignore the non-desired indices, neither of which is ideal. And it would be even more complicated if you also wanted the indices out-of-order or repeated for some reason.
With outer indexing, you would just construct your “slice” of non-contiguous indices as integer arrays, turn them into “outer” indices using ix_ or manual reshaping, then use that outer index to construct the desired array directly.
Conversely, a slice like 2:9 is equivalent to the outer index [2, 3, 4, 5, 6, 7, 8].[7]
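Here is a sketch of the time-step scenario with made-up data (the array data, its shape, and the chosen columns are assumptions for illustration), followed by a check that an outer index built from np.arange selects the same elements as the slice 2:9:

```python
import numpy as np

# Hypothetical data: 10 time steps (rows) by 5 measurements (columns).
data = np.arange(50).reshape((10, 5))

time_steps = np.array([0, 1, 2, 3, 5, 6, 7])   # time step 4 is skipped
columns = np.array([4, 0, 2])                  # out of order on purpose

selected = data[np.ix_(time_steps, columns)]
print(selected.shape)  # (7, 3)

# A slice selects the same elements as the corresponding outer index.
sliced = data[2:9, 1:4]
outer_equiv = data[np.ix_(np.arange(2, 9), np.arange(1, 4))]
print(np.array_equal(sliced, outer_equiv))  # True
```

The selected elements are the same, although a slice returns a view of data whereas an integer array index returns a new array.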
Assigning to an Integer Array Index
As with all index types discussed in this guide, an integer array index can be used on the left-hand side of an assignment. This is useful because it allows you to surgically inject new elements into existing positions in your array.
>>> a = np.array([100, 101, 102, 103]) # as above
>>> idx = np.array([0, 3])
>>> a[idx] = np.array([200, 203])
>>> a
array([200, 101, 102, 203])
However, exercise caution, as this is inherently ambiguous if the index array contains duplicate elements. For example, suppose we attempted to set index 0 to both 1 and 3:
>>> a = np.array([100, 101, 102, 103]) # as above
>>> idx = np.array([0, 1, 0])
>>> a[idx] = np.array([1, 2, 3])
>>> a
array([ 3, 2, 102, 103])
The end result was 3. This happened because 3 corresponded to the last 0 in the index array. But importantly, this is just an implementation detail. NumPy makes no guarantees regarding the order in which index elements are assigned.[8] If you are using an integer array as an assignment index, be careful to avoid duplicate entries in the index or, at the very least, ensure that duplicate entries are always assigned the same value.
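Duplicate entries matter for in-place arithmetic as well. In the sketch below, += through an integer array index counts a repeated position only once, whereas the unbuffered ufunc method np.add.at() accumulates every occurrence. The array names are chosen for illustration:

```python
import numpy as np

idx = np.array([0, 1, 0])   # position 0 appears twice

# In-place addition through the index: position 0 is incremented once.
buffered = np.zeros(4, dtype=int)
buffered[idx] += 1
print(buffered)      # [1 1 0 0]

# np.add.at applies the addition once per occurrence in idx.
accumulated = np.zeros(4, dtype=int)
np.add.at(accumulated, idx, 1)
print(accumulated)   # [2 1 0 0]
```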
Combining Integer Array Indices with Basic Indices
If any slice, ellipsis, or newaxis indices precede or follow all the integer and integer array indices in an index, the two sets of indices operate independently. Slices and ellipses select the corresponding axes, newaxes add new axes to these locations, and the integer array indices select the elements on their respective axes, as previously described.
For example, consider:
>>> a = np.array([[[100, 101, 102], # Like above, but with an extra dimension
... [103, 104, 105]]])
This is the same a as in the above examples, except it has an extra size 1 dimension:
>>> a.shape
(1, 2, 3)
We can select this first dimension with a slice :, then use the exact same index as in the example shown previously:
>>> idx0
array([1, 0])
>>> a[:, idx0, 2]
array([[105, 102]])
>>> a[:, idx0, 2].shape
(1, 2)
The primary point of this behavior is that you can use ... at the beginning of an index to select the last axes of an array using integer array indices, or several :s to select some middle axes. This lets you do with indexing what you can also do with the numpy.take() function.
To be sure though, this index could use any slice, not just :, and could also include newaxes. This behavior is mainly implemented for the sake of semantic completeness, although it could potentially allow combining two sequential indexing operations into a single step.
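The correspondence with numpy.take() can be sketched as follows; the array and index values are arbitrary examples:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

idx_last = np.array([[3, 0], [1, 1]])   # indices into the last axis (size 4)
idx_mid = np.array([2, 0, 2])           # indices into the middle axis (size 3)

# Leading ... (or :) leaves the earlier axes alone.
last = a[..., idx_last]
middle = a[:, idx_mid]

# numpy.take with an explicit axis selects the same elements.
last_take = np.take(a, idx_last, axis=-1)
middle_take = np.take(a, idx_mid, axis=1)

print(last.shape, middle.shape)  # (2, 3, 2, 2) (2, 3, 4)
```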
Integer Array Indices Separated by Basic Indices
Finally, if the slices, ellipses, or newaxes are in between the integer array indices, then something stranger happens. The two index types still operate “independently”; however, unlike the previous case, the shape derived from the array indices is prepended to the shape derived from the non-array indices. This is because in these cases there is inherent ambiguity in where these dimensions should be placed in the final shape.
An example demonstrates this most clearly:
>>> a = np.empty((2, 3, 4, 5))
>>> a.shape
(2, 3, 4, 5)
>>> idx = np.zeros((10, 20), dtype=int)
>>> idx.shape
(10, 20)
>>> a[idx, :, :, idx].shape
(10, 20, 3, 4)
Here the integer array index shape (10, 20) comes first in the result array and the shape corresponding to the rest of the index, (3, 4), comes last.
If you find yourself running into this behavior, chances are you would be better off rewriting the indexing operation to be simpler, for instance, by first reshaping the array so that the integer array indices are together in the index. This is considered a design flaw in NumPy[9], and no other Python array library has replicated it. ndindex will raise a NotImplementedError exception on indices like these, because I don’t want to deal with implementing this obscure logic.[10]
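One way to sidestep the separated form, sketched here under the assumption that moving an axis is acceptable for your use case, is to bring the indexed axes next to each other with numpy.moveaxis() before indexing. The names i0, i3, and moved are illustrative:

```python
import numpy as np

a = np.arange(120).reshape((2, 3, 4, 5))
i0 = np.array([[0, 1, 1], [1, 0, 0]])   # indices into axis 0 (size 2)
i3 = np.array([[4, 0, 2], [1, 3, 3]])   # indices into axis 3 (size 5)

# Separated array indices: the (2, 3) index shape goes first.
separated = a[i0, :, :, i3]
print(separated.shape)   # (2, 3, 3, 4)

# Move axis 3 next to axis 0, so the array indices are adjacent.
moved = np.moveaxis(a, 3, 1)     # shape (2, 5, 3, 4)
adjacent = moved[i0, i3]
print(adjacent.shape)    # (2, 3, 3, 4)
```

Both forms select the same elements here; the second simply makes the placement of the index dimensions explicit.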
Exercise
Based on the above sections, you should be able to complete the following exercise: How might you randomly permute a 2-D array using numpy.random.Generator.permutation() and indexing, so that each axis is permuted independently? This operation might correspond to multiplying the array by random permutation matrices on the left and right, like \(P_1AP_2\).
For example, the array
a = array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
might be permuted to
a_perm = array([[ 5, 4, 6, 7],
[ 1, 0, 2, 3],
[ 9, 8, 10, 11]])
(Note that this is not a full permutation of the array. For instance, the first row [5, 4, 6, 7] contains only elements from the second row of a.)
Solution
Suppose we start with the following 2-D array a:
>>> a = np.arange(12).reshape((3, 4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
We can generate permutations for the two axes using numpy.random.Generator.permutation() as above:
>>> rng = np.random.default_rng(11) # Seeded so this example reproduces
>>> idx0 = rng.permutation(np.arange(3))
>>> idx1 = rng.permutation(np.arange(4))
However, we cannot do a[idx0, idx1] as this will fail.
>>> a[idx0, idx1]
Traceback (most recent call last):
...
IndexError: shape mismatch: indexing arrays could not be broadcast together
with shapes (3,) (4,)
Remember that we want a permutation of a, so the result array should have the same shape as a ((3, 4)). This should therefore be the broadcasted shape of idx0 and idx1, which are currently shapes (3,) and (4,). We can use newaxis to insert dimensions so that they are shape (3, 1) and (1, 4) so that they broadcast together to this shape.
>>> a[idx0[:, np.newaxis], idx1[np.newaxis]]
array([[ 5, 4, 6, 7],
[ 1, 0, 2, 3],
[ 9, 8, 10, 11]])
You can check that this is a permutation of a where each axis is permuted independently.
We can also interpret this as an outer indexing operation. In this case, our non-contiguous “slices” that we are outer indexing by are a full slice along each axis, just permuted. We can use the ix_() helper to construct the same index as above:
>>> a[np.ix_(idx0, idx1)]
array([[ 5, 4, 6, 7],
[ 1, 0, 2, 3],
[ 9, 8, 10, 11]])
As an extra bonus, here’s how we can interpret this as a multiplication by permutation matrices, using the same indices (but of course, simply permuting a directly with the indices is more efficient):
>>> P1 = np.eye(3, dtype=int)[idx0]
>>> P2 = np.eye(4, dtype=int)[idx1]
>>> P1 @ a @ P2.T
array([[ 5, 4, 6, 7],
[ 1, 0, 2, 3],
[ 9, 8, 10, 11]])
Can you see why this works?
Footnotes