Chunking

The ChunkSize class represents the shape of a chunk for a chunked array. It contains methods for manipulating these chunks and indices on a chunked array.

class ndindex.ChunkSize(chunk_size)[source]

Represents a chunk size tuple.

A chunk size is a tuple of length n where each element is either a positive integer or None. It represents a chunking of an array with n dimensions, where each dimension is chunked by the corresponding chunk size, or not chunked for None (note that None chunks are not yet implemented).

For example, given a 3-dimensional chunk size of (20, 20, None) and an array of shape (40, 30, 10), the array would be split into four chunks, corresponding to the indices (0:20, 0:20, :), (0:20, 20:30, :), (20:40, 0:20, :), and (20:40, 20:30, :). Note that the size of a chunk may be less than the total chunk size if the array shape is not a multiple of the chunk size in a given dimension.

ChunkSize behaves like a tuple. For example, chunk_size[0] gives the first chunk dimension, and len(chunk_size) gives the number of dimensions of a chunk. Also, the input to ChunkSize should be a tuple, just as with the tuple constructor, even for single dimensional chunk sizes.

>>> from ndindex import ChunkSize
>>> ChunkSize((20, 30, 40))
ChunkSize((20, 30, 40))
>>> ChunkSize((2**12,))
ChunkSize((4096,))

args

idx.args contains the arguments needed to create idx.

For an ndindex object idx, idx.args is always a tuple such that

type(idx)(*idx.args) == idx

For Tuple indices, the elements of .args are themselves ndindex types. For other types, .args contains raw Python types. Note that .args contains NumPy arrays for IntegerArray and BooleanArray types, so one should always do equality testing or hashing on the ndindex type itself, not its .args.

as_subchunks(idx, shape, *, _force_slow=None)[source]

Split an index idx on an array of shape shape into subchunk indices.

Yields indices c, where c is an index for the chunk that should be sliced. Only those c for which idx includes at least one element are yielded.

That is to say, for each c index yielded, a[c][idx.as_subindex(c)] will give those elements of a[idx] that are part of the c chunk, and together they give all the elements of a[idx]. See also the docstring of as_subindex().

This method is roughly equivalent to

def as_subchunks(self, idx, shape):
    for c in self.indices(shape):
        try:
            index = idx.as_subindex(c)
        except ValueError:
            # as_subindex raises ValueError in some cases when the
            # indices do not intersect (see the docstring of
            # as_subindex())
            continue

        if not index.isempty(self):
            # Yield those c for which idx.as_subindex(c) is nonempty
            yield c

except it is more efficient.

>>> from ndindex import ChunkSize, Tuple
>>> idx = Tuple(slice(5, 15), 0)
>>> shape = (20, 20)
>>> chunk_size = ChunkSize((10, 10))
>>> for c in chunk_size.as_subchunks(idx, shape):
...     print(c)
...     print('    ', idx.as_subindex(c))
Tuple(slice(0, 10, 1), slice(0, 10, 1))
    Tuple(slice(5, 10, 1), 0)
Tuple(slice(10, 20, 1), slice(0, 10, 1))
    Tuple(slice(0, 5, 1), 0)
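
The selection logic can be illustrated without ndindex for a restricted case. The following is a minimal pure-Python sketch, not the library's implementation: touched_chunks is a hypothetical helper, and it assumes every axis is indexed by either a non-negative integer or a slice with step 1. It yields the (start, stop) bounds of each chunk that the index touches, in the same order as the output above.

```python
from itertools import product


def touched_chunks(index, shape, chunk_size):
    """Yield per-axis (start, stop) bounds of each chunk that `index` touches.

    Hypothetical helper, not part of ndindex. Assumes one entry per axis,
    each a non-negative int or a slice with step 1.
    """
    per_axis = []
    for i, n, c in zip(index, shape, chunk_size):
        if isinstance(i, int):
            start, stop = i, i + 1
        else:
            start, stop, _ = i.indices(n)  # clip the slice to the axis length
        if start >= stop:
            return  # empty selection, so no chunk is touched
        first, last = start // c, (stop - 1) // c  # first and last chunk numbers
        per_axis.append(
            [(k * c, min((k + 1) * c, n)) for k in range(first, last + 1)]
        )
    yield from product(*per_axis)


chunks = list(touched_chunks((slice(5, 15), 0), (20, 20), (10, 10)))
print(chunks)  # [((0, 10), (0, 10)), ((10, 20), (0, 10))]
```
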

containing_block(idx, shape)[source]

Compute the index for the smallest contiguous block of chunks that contains idx on an array of shape shape.

A block is a subset of an array that is contiguous in all dimensions and is aligned along the chunk size. A block index is always of the form (Slice(k1, m1), Slice(k2, m2), …, Slice(kn, mn)) where n is the number of dimensions in the chunk size, and the ki and mi are multiples of the corresponding chunk dimension (the mi may be truncated to the shape).

For example, given a chunk size of (10, 15), an example block might be (Slice(0, 20), Slice(30, 45)). Such a block would be the smallest block that contains the index (Slice(0, 12), 40), for example.

>>> from ndindex import ChunkSize
>>> chunk_size = ChunkSize((10, 15))
>>> idx = (slice(0, 12), 40)
>>> shape = (100, 100)
>>> block = chunk_size.containing_block(idx, shape)
>>> block
Tuple(slice(0, 20, 1), slice(30, 45, 1))

The method as_subchunks() can be used on the block to determine which chunks are contained in it, and num_subchunks() to determine how many:

>>> chunk_size.num_subchunks(block, shape)
2
>>> for c in chunk_size.as_subchunks(block, shape):
...     print(c)
Tuple(slice(0, 10, 1), slice(30, 45, 1))
Tuple(slice(10, 20, 1), slice(30, 45, 1))

In this example, chunk_size.as_subchunks(block, shape) and chunk_size.as_subchunks(idx, shape) are the same, but in general, a block may overlap with more chunks than the original index because the block is contiguous.
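
The alignment rule described above (round each start down to a chunk boundary, round each stop up, then truncate to the shape) can be sketched in pure Python. This is only an illustration under stated assumptions, not how ndindex computes the block: block_bounds is a hypothetical helper, and it handles only nonempty indices made of non-negative integers and step-1 slices.

```python
def block_bounds(index, shape, chunk_size):
    """Return per-axis (start, stop) bounds of the smallest chunk-aligned block.

    Hypothetical helper, not part of ndindex. Assumes one entry per axis,
    each a non-negative int or a nonempty slice with step 1.
    """
    block = []
    for i, n, c in zip(index, shape, chunk_size):
        if isinstance(i, int):
            start, stop = i, i + 1
        else:
            start, stop, _ = i.indices(n)  # clip the slice to the axis length
        lo = (start // c) * c           # round the start down to a chunk boundary
        hi = min(-(-stop // c) * c, n)  # round the stop up, then truncate to the shape
        block.append((lo, hi))
    return tuple(block)


block = block_bounds((slice(0, 12), 40), (100, 100), (10, 15))
print(block)  # ((0, 20), (30, 45))
```

The result matches the bounds of the Tuple returned by containing_block() in the example above.
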

indices(shape)[source]

Yield a set of ndindex indices for the chunks on an array of shape shape.

shape should have the same number of dimensions as self. If the shape is not a multiple of the chunk size, some chunks will be truncated, so that len(idx.args[i]) can be used to get the size of an indexed axis.

For example, if a has shape (10, 19) and is chunked into chunks of shape (5, 5):

>>> from ndindex import ChunkSize
>>> chunk_size = ChunkSize((5, 5))
>>> for idx in chunk_size.indices((10, 19)):
...     print(idx)
Tuple(slice(0, 5, 1), slice(0, 5, 1))
Tuple(slice(0, 5, 1), slice(5, 10, 1))
Tuple(slice(0, 5, 1), slice(10, 15, 1))
Tuple(slice(0, 5, 1), slice(15, 19, 1))
Tuple(slice(5, 10, 1), slice(0, 5, 1))
Tuple(slice(5, 10, 1), slice(5, 10, 1))
Tuple(slice(5, 10, 1), slice(10, 15, 1))
Tuple(slice(5, 10, 1), slice(15, 19, 1))
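
The truncation behaviour can be reproduced with a short pure-Python sketch. This is an illustration only, not the library's code: chunk_bounds is a hypothetical helper that yields (start, stop) pairs rather than ndindex objects, and it assumes every chunk dimension is a positive integer.

```python
from itertools import product


def chunk_bounds(shape, chunk_size):
    """Yield per-axis (start, stop) bounds for every chunk, last axis fastest.

    Hypothetical helper, not part of ndindex. Assumes every chunk
    dimension is a positive integer (no None entries).
    """
    per_axis = [
        # The final chunk on an axis is truncated to the axis length.
        [(start, min(start + c, n)) for start in range(0, n, c)]
        for n, c in zip(shape, chunk_size)
    ]
    yield from product(*per_axis)


bounds = list(chunk_bounds((10, 19), (5, 5)))
sizes = [tuple(stop - start for start, stop in chunk) for chunk in bounds]
print(bounds[3])  # ((0, 5), (15, 19))
print(sizes[3])   # (5, 4)
```
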

num_chunks(shape)[source]

Give the number of chunks for the given shape.

This is the same as len(list(self.indices(shape))), but much faster. shape must have the same number of dimensions as self.

>>> from ndindex import ChunkSize
>>> chunk_size = ChunkSize((10, 10, 10))
>>> shape = (10000, 10000, 10000)
>>> # len(list(chunk_size.indices(shape))) would be very slow, as
>>> # it would have to iterate over all 1 billion chunks
>>> chunk_size.num_chunks(shape)
1000000000
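
One way to see why no iteration is needed: the number of chunks along each axis is the ceiling of the axis length divided by the chunk dimension, and the total is the product over all axes. A minimal sketch of that arithmetic follows. It is not necessarily how ndindex computes the value, and count_chunks is a hypothetical name.

```python
import math


def count_chunks(shape, chunk_size):
    """Count chunks as the product of per-axis ceiling divisions.

    Hypothetical helper, not part of ndindex. Assumes every chunk
    dimension is a positive integer.
    """
    # -(-n // c) is ceiling division using only integer arithmetic.
    return math.prod(-(-n // c) for n, c in zip(shape, chunk_size))


big = count_chunks((10000, 10000, 10000), (10, 10, 10))
small = count_chunks((10, 19), (5, 5))
print(big)    # 1000000000
print(small)  # 8
```
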

num_subchunks(idx, shape)[source]

Give the number of chunks indexed by idx on an array of shape shape.

This is equivalent to len(list(self.as_subchunks(idx, shape))), but more efficient.

>>> from ndindex import ChunkSize, Tuple
>>> idx = Tuple(slice(5, 15), 0)
>>> shape = (20, 20)
>>> chunk_size = ChunkSize((10, 10))
>>> chunk_size.num_subchunks(idx, shape)
2
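
For simple indices the count can be obtained without enumerating any chunks, which suggests where the efficiency gain comes from. The sketch below is an illustration under stated assumptions, not the library's implementation: count_touched_chunks is a hypothetical helper that handles only non-negative integers and step-1 slices, and it multiplies the number of chunks spanned on each axis.

```python
import math


def count_touched_chunks(index, shape, chunk_size):
    """Count the chunks touched by `index` without listing them.

    Hypothetical helper, not part of ndindex. Assumes one entry per axis,
    each a non-negative int or a slice with step 1.
    """
    counts = []
    for i, n, c in zip(index, shape, chunk_size):
        if isinstance(i, int):
            start, stop = i, i + 1
        else:
            start, stop, _ = i.indices(n)  # clip the slice to the axis length
        if start >= stop:
            return 0  # empty selection touches no chunk
        # Number of chunks between the first and last selected element.
        counts.append((stop - 1) // c - start // c + 1)
    return math.prod(counts)


n_touched = count_touched_chunks((slice(5, 15), 0), (20, 20), (10, 10))
print(n_touched)  # 2
```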