Chunking
The ChunkSize class represents the shape of a chunk for a chunked array. It contains methods for manipulating these chunks and indices on a chunked array.
- class ndindex.ChunkSize(chunk_size)
Represents a chunk size tuple.

A chunk size is a tuple of length n where each element is either a positive integer or None. It represents a chunking of an array with n dimensions, where each dimension is chunked by the corresponding chunk size, or not chunked for None (note that None chunks are not yet implemented).

For example, given a 3-dimensional chunk size of (20, 20, None) and an array of shape (40, 30, 10), the array would be split into four chunks, corresponding to the indices 0:20,0:20,:, 0:20,20:30,:, 20:40,0:20,:, and 20:40,20:30,:. Note that a chunk may be smaller than the chunk size if the array shape is not a multiple of the chunk size in a given dimension.

ChunkSize behaves like a tuple. For example, chunk_size[0] gives the first chunk dimension, and len(chunk_size) gives the number of dimensions of a chunk. Also, the input to ChunkSize should be a tuple, just as with the tuple constructor, even for single-dimensional chunk sizes.

    >>> from ndindex import ChunkSize
    >>> ChunkSize((20, 30, 40))
    ChunkSize((20, 30, 40))
    >>> ChunkSize((2**12,))
    ChunkSize((4096,))
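The four-chunk example above can be reproduced with a short pure-Python sketch. It does not use ndindex: the helper name chunk_slices is hypothetical, and treating None as "the whole axis" is an assumption made only for illustration, since None chunks are not implemented in ChunkSize. The unchunked axis therefore appears as an explicit 0:10 slice rather than a bare colon.

```python
from itertools import product


def chunk_slices(chunk_size, shape):
    """Yield one tuple of slices per chunk (illustrative helper, not part of ndindex)."""
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size and shape must have the same length")
    per_axis = []
    for c, n in zip(chunk_size, shape):
        # Assumption: None means the axis is not chunked at all.
        step = n if c is None else c
        # max(step, 1) avoids a zero range step when an unchunked axis has length 0.
        per_axis.append(
            [slice(start, min(start + step, n)) for start in range(0, n, max(step, 1))]
        )
    # The Cartesian product of the per-axis slices enumerates every chunk.
    yield from product(*per_axis)


chunks = list(chunk_slices((20, 20, None), (40, 30, 10)))
for chunk in chunks:
    print(chunk)
```

The second axis shows the truncation described above: its last slice stops at 30 rather than 40 because the shape is not a multiple of the chunk size.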
- args
idx.args contains the arguments needed to create idx.

For an ndindex object idx, idx.args is always a tuple such that

    type(idx)(*idx.args) == idx

For Tuple indices, the elements of .args are themselves ndindex types. For other types, .args contains raw Python types. Note that .args contains NumPy arrays for IntegerArray and BooleanArray types, so one should always do equality testing or hashing on the ndindex type itself, not its .args.
- as_subchunks(idx, shape, *, _force_slow=None)
Split an index idx on an array of shape shape into subchunk indices.

Yields indices c, where c is an index for the chunk that should be sliced. Only those c for which idx includes at least one element are yielded.

That is to say, for each c index yielded, a[c][idx.as_subindex(c)] will give those elements of a[idx] that are part of the c chunk, and together they give all the elements of a[idx]. See also the docstring of as_subindex().

This method is roughly equivalent to

    def as_subchunks(self, idx, shape):
        for c in self.indices(shape):
            try:
                index = idx.as_subindex(c)
            except ValueError:
                # as_subindex raises ValueError in some cases when the
                # indices do not intersect (see the docstring of
                # as_subindex())
                continue
            if not index.isempty(self):
                # Yield those c for which idx.as_subindex(c) is nonempty
                yield c

except it is more efficient.

    >>> from ndindex import ChunkSize, Tuple
    >>> idx = Tuple(slice(5, 15), 0)
    >>> shape = (20, 20)
    >>> chunk_size = ChunkSize((10, 10))
    >>> for c in chunk_size.as_subchunks(idx, shape):
    ...     print(c)
    ...     print(' ', idx.as_subindex(c))
    Tuple(slice(0, 10, 1), slice(0, 10, 1))
      Tuple(slice(5, 10, 1), 0)
    Tuple(slice(10, 20, 1), slice(0, 10, 1))
      Tuple(slice(0, 5, 1), 0)
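To see where the sub-indices in this example come from, here is a minimal one-dimensional sketch in plain Python. It is not how ndindex implements as_subchunks(): the helper slice_subchunks is hypothetical and only handles a step-1 slice on a single axis, under the assumptions stated in its docstring.

```python
def slice_subchunks(start, stop, chunk, length):
    """Yield (chunk_slice, sub_slice) pairs for the 1-D slice start:stop.

    Illustrative helper, not part of ndindex. Assumes a step of 1,
    chunk >= 1, and 0 <= start <= stop <= length.
    """
    if start == stop:
        return  # an empty slice touches no chunks
    first = start // chunk       # number of the first chunk touched
    last = (stop - 1) // chunk   # number of the last chunk touched
    for k in range(first, last + 1):
        c_start = k * chunk
        c_stop = min(c_start + chunk, length)
        # Express the overlap relative to the start of the chunk.
        sub = slice(max(start, c_start) - c_start, min(stop, c_stop) - c_start)
        yield slice(c_start, c_stop), sub


pairs = list(slice_subchunks(5, 15, chunk=10, length=20))
for chunk_slice, sub_slice in pairs:
    print(chunk_slice, sub_slice)

# Indexing each chunk with its sub-slice recovers exactly the selected elements.
a = list(range(20))
recovered = [x for c, s in pairs for x in a[c][s]]
```

For the slice 5:15 with chunks of 10, this gives the sub-slices 5:10 in the first chunk and 0:5 in the second, matching the first axis of the output above.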
- containing_block(idx, shape)
Compute the index for the smallest contiguous block of chunks that contains idx on an array of shape shape.

A block is a subset of an array that is contiguous in all dimensions and is aligned along the chunk size. A block index is always of the form (Slice(k1, m1), Slice(k2, m2), …, Slice(kn, mn)), where n is the number of dimensions in the chunk size, and the ki and mi are multiples of the corresponding chunk dimension (the mi may be truncated to the shape).

For example, given a chunk size of (10, 15), an example block might be (Slice(0, 20), Slice(30, 45)). Such a block would be the smallest block that contains the index (Slice(0, 12), 40), for example.

    >>> from ndindex import ChunkSize
    >>> chunk_size = ChunkSize((10, 15))
    >>> idx = (slice(0, 12), 40)
    >>> shape = (100, 100)
    >>> block = chunk_size.containing_block(idx, shape)
    >>> block
    Tuple(slice(0, 20, 1), slice(30, 45, 1))
The method as_subchunks() can be used on the block to determine which chunks are contained in it, and num_subchunks() to determine how many:

    >>> chunk_size.num_subchunks(block, shape)
    2
    >>> for c in chunk_size.as_subchunks(block, shape):
    ...     print(c)
    Tuple(slice(0, 10, 1), slice(30, 45, 1))
    Tuple(slice(10, 20, 1), slice(30, 45, 1))

In this example, chunk_size.as_subchunks(block, shape) and chunk_size.as_subchunks(idx, shape) are the same, but in general, a block may overlap with more chunks than the original index because the block is contiguous.
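The alignment rule for a block (start rounded down to a chunk multiple, stop rounded up and truncated to the shape) can be sketched per axis in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the helper containing_extent is hypothetical, and the index is reduced by hand to half-open extents before calling it.

```python
def containing_extent(lo, hi, chunk, length):
    """Round the half-open extent [lo, hi) outward to chunk boundaries.

    Illustrative helper, not part of ndindex. Assumes chunk >= 1 and
    0 <= lo < hi <= length.
    """
    start = (lo // chunk) * chunk
    stop = -(-hi // chunk) * chunk          # ceiling division, then scale back up
    return slice(start, min(stop, length))  # the stop may be truncated to the shape


chunk_size = (10, 15)
shape = (100, 100)
# The index (slice(0, 12), 40) covers [0, 12) on axis 0 and [40, 41) on axis 1.
extents = [(0, 12), (40, 41)]
block = tuple(
    containing_extent(lo, hi, c, n)
    for (lo, hi), c, n in zip(extents, chunk_size, shape)
)
print(block)
```

The result covers 0:20 on the first axis and 30:45 on the second, the same block as in the doctest above.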
- indices(shape)
Yield a set of ndindex indices for the chunks on an array of shape shape.

shape should have the same number of dimensions as self. If the shape is not a multiple of the chunk size, some chunks will be truncated, so that len(idx.args[i]) can be used to get the size of an indexed axis.

For example, if a has shape (10, 19) and is chunked into chunks of shape (5, 5):

    >>> from ndindex import ChunkSize
    >>> chunk_size = ChunkSize((5, 5))
    >>> for idx in chunk_size.indices((10, 19)):
    ...     print(idx)
    Tuple(slice(0, 5, 1), slice(0, 5, 1))
    Tuple(slice(0, 5, 1), slice(5, 10, 1))
    Tuple(slice(0, 5, 1), slice(10, 15, 1))
    Tuple(slice(0, 5, 1), slice(15, 19, 1))
    Tuple(slice(5, 10, 1), slice(0, 5, 1))
    Tuple(slice(5, 10, 1), slice(5, 10, 1))
    Tuple(slice(5, 10, 1), slice(10, 15, 1))
    Tuple(slice(5, 10, 1), slice(15, 19, 1))
- num_chunks(shape)
Give the number of chunks for the given shape.

This is the same as len(list(self.indices(shape))), but much faster. shape must have the same number of dimensions as self.

    >>> from ndindex import ChunkSize
    >>> chunk_size = ChunkSize((10, 10, 10))
    >>> shape = (10000, 10000, 10000)
    >>> # len(list(chunk_size.indices(shape))) would be very slow, as
    >>> # it would have to iterate all 1 billion chunks
    >>> chunk_size.num_chunks(shape)
    1000000000
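One way to see why no iteration is needed: the count is the product, over all axes, of the axis length divided by the chunk length and rounded up. The sketch below shows that arithmetic in plain Python; count_chunks is a hypothetical helper, and this is not necessarily how num_chunks() is implemented.

```python
from math import prod


def count_chunks(chunk_size, shape):
    """Product over axes of ceil(n / c). Illustrative helper, not part of ndindex."""
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size and shape must have the same length")
    # -(-n // c) is integer ceiling division for positive c.
    return prod(-(-n // c) for c, n in zip(chunk_size, shape))


big = count_chunks((10, 10, 10), (10000, 10000, 10000))
small = count_chunks((5, 5), (10, 19))
print(big, small)
```

The first call gives the one billion chunks of the example above, and the second gives 8, which is the number of indices yielded for a (10, 19) array with (5, 5) chunks.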
- num_subchunks(idx, shape)
Give the number of chunks indexed by idx on an array of shape shape.

This is equivalent to len(list(self.as_subchunks(idx, shape))), but more efficient.

    >>> from ndindex import ChunkSize, Tuple
    >>> idx = Tuple(slice(5, 15), 0)
    >>> shape = (20, 20)
    >>> chunk_size = ChunkSize((10, 10))
    >>> chunk_size.num_subchunks(idx, shape)
    2