Introduction: What is an Index?

Nominally, an index is any object that can be placed between the square brackets after an array. That is, if a is a NumPy array, then in a[x], x is an index of a.[1] The same applies to Python's built-in sequence types, such as list, tuple, and str. Be careful, however, not to confuse this with the similar notation used for Python dictionaries: if d is a dictionary, d[x] uses the same syntax, but x is a key rather than a position, and its meaning is completely different from what is discussed in this document. This document also does not apply to indexing Pandas DataFrame or Series objects, except insofar as they reuse the same semantics as NumPy. Finally, note that some other Python array libraries (e.g., PyTorch or JAX) have similar indexing rules, but they generally implement only a subset of the full NumPy semantics outlined here.
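The distinction between sequence indexing and dictionary lookup can be made concrete with a small sketch. The names used here (arr, lst, d, and the result variables) are illustrative only and do not appear elsewhere in this document.

```python
import numpy as np

arr = np.array([10, 20, 30])
lst = [10, 20, 30]
d = {0: "zero", "key": "value"}

# Arrays and built-in sequences: the object in the brackets refers to a position.
first_of_array = arr[0]   # element at position 0
last_of_list = lst[-1]    # element at the last position

# Dictionaries: the object in the brackets is a key, not a position.
by_key = d["key"]

# A position that does not exist raises IndexError;
# a key that does not exist raises KeyError.
try:
    lst[5]
except IndexError as exc:
    sequence_error = type(exc).__name__

try:
    d[5]
except KeyError as exc:
    dict_error = type(exc).__name__
```

Note that d[0] succeeds here only because 0 happens to be one of the keys of d, not because it is the first entry.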

Semantically, an index x selects, or indexes[2], some subset of the elements of a. The expression a[x] always either returns a new array containing a subset of the elements of a or raises an IndexError. When it comes to indexing, the most important rule, which applies to all types of indices, is this:

Indices do not in any way depend on the values of the elements they select. They only depend on their positions in the array.

For example, consider a, an array of integers with the shape (2, 3, 2):

>>> import numpy as np
>>> a = np.array([[[0, 1], [2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]]])
>>> a.shape
(2, 3, 2)

Let’s take as an example the index 0, ..., 1:. We’ll investigate how exactly this index works later. For now, just notice that a[0, ..., 1:] returns a new array with some of the elements of a.

>>> a[0, ..., 1:]
array([[1],
       [3],
       [5]])

Now consider another array, b, with the exact same shape (2, 3, 2), but containing completely different entries, such as strings. If we apply the same index 0, ..., 1: to b, it will choose the exact same corresponding elements.

>>> b = np.array([[['A', 'B'], ['C', 'D'], ['E', 'F']], [['G', 'H'], ['I', 'J'], ['K', 'L']]])
>>> b[0, ..., 1:]
array([['B'],
       ['D'],
       ['F']], dtype='<U1')

Notice that 'B' is in the same place in b as 1 was in a, 'D' as 3, and 'F' as 5. Furthermore, the shapes of the resulting arrays are the same:

>>> a[0, ..., 1:].shape
(3, 1)
>>> b[0, ..., 1:].shape
(3, 1)

Therefore, the following statements are always true about any index:

  • An index on an array always produces a new array with the same dtype (unless it raises IndexError).

  • Each element of the new array corresponds to some element of the original array.

  • These elements are chosen by their position in the original array only. The values of these elements are irrelevant.

  • As such, the same index applied to any other array with the same shape will produce an array with the exact same resulting shape with elements in the exact same corresponding places.
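One way to check these statements is to index a helper array in which each element stores its own flat position. The sketch below is only an illustration: the helper name positions and the tuple idx (the explicit tuple form of the index 0, ..., 1:) are assumptions introduced here, and a and b are rebuilt with the same contents as the arrays above.

```python
import numpy as np

shape = (2, 3, 2)

# Each element of `positions` holds its own flat position within `shape`.
positions = np.arange(np.prod(shape)).reshape(shape)

a = np.arange(12).reshape(shape)                   # integers 0..11
b = np.array(list("ABCDEFGHIJKL")).reshape(shape)  # strings 'A'..'L'

# Tuple form of the index 0, ..., 1:
idx = (0, ..., slice(1, None))

# The flat positions that the index selects, independent of any values.
selected = positions[idx]

# Looking up those positions in the flattened arrays gives the same
# result as applying the index directly.
assert np.array_equal(a[idx], a.ravel()[selected])
assert np.array_equal(b[idx], b.ravel()[selected])

# The result shape and dtype depend only on the input shape and dtype.
assert a[idx].shape == b[idx].shape == selected.shape
assert a[idx].dtype == a.dtype and b[idx].dtype == b.dtype

# An index that is out of bounds for this shape fails for both arrays.
out_of_bounds = (2, ..., slice(1, None))
raised = []
for arr in (a, b):
    try:
        arr[out_of_bounds]
        raised.append(False)
    except IndexError:
        raised.append(True)
```

Because selected is computed without looking at the contents of a or b, it would be the same for any array of shape (2, 3, 2).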

The full range of valid indices allows the generation of arbitrary new arrays whose elements come from the indexed array a. In practice, the most commonly desired indexing operations are represented by basic indices such as integer indices, slices, and ellipses.
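As a brief preview of the three basic index types just mentioned, the sketch below applies one of each to the array a from above. The result variable names are illustrative, and the exact rules behind each index are left to the later discussion.

```python
import numpy as np

a = np.array([[[0, 1], [2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]]])

# Integer index: selects position 1 along the first axis and removes that axis.
by_integer = a[1]

# Slice: keeps positions 1 onward along the second axis; the axis is retained.
by_slice = a[:, 1:]

# Ellipsis: stands in for all the leading axes, so 0 applies to the last axis.
by_ellipsis = a[..., 0]

integer_shape = by_integer.shape
slice_shape = by_slice.shape
ellipsis_shape = by_ellipsis.shape
```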

Footnotes