Chunking
The ChunkSize class represents the shape of a chunk for a chunked
array. It contains methods for manipulating these chunks and indices on a
chunked array.
- class ndindex.ChunkSize(chunk_size)
Represents a chunk size tuple.

A chunk size is a tuple of length n where each element is either a positive integer or None. It represents a chunking of an array with n dimensions, where each corresponding dimension is chunked by the corresponding chunk size, or not chunked for None (note, None chunks are currently not yet implemented).

For example, given a 3 dimensional chunk size of (20, 20, None) and an array of shape (40, 30, 10), the array would be split into four chunks, corresponding to the indices 0:20,0:20,:; 0:20,20:30,:; 20:40,0:20,:; and 20:40,20:30,:. Note that the size of a chunk may be less than the total chunk size if the array shape is not a multiple of the chunk size in a given dimension.

ChunkSize behaves like a tuple. For example, chunk_size[0] gives the first chunk dimension, and len(chunk_size) gives the number of dimensions of a chunk. Also, the input to ChunkSize should be a tuple, just as with the tuple constructor, even for single dimensional chunk sizes.

    >>> from ndindex import ChunkSize
    >>> ChunkSize((20, 30, 40))
    ChunkSize((20, 30, 40))
    >>> ChunkSize((2**12,))
    ChunkSize((4096,))
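The splitting rule described above can be sketched in plain Python. The helper chunk_slices below is hypothetical and not part of ndindex. It treats None as an unchunked axis, which ndindex itself does not yet implement, so it is an illustration of the rule rather than of the library's behavior.

```python
from itertools import product


def chunk_slices(chunk_size, shape):
    """Hypothetical helper: list the index tuples that tile `shape` in chunks."""
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size and shape must have the same length")
    per_axis = []
    for size, extent in zip(chunk_size, shape):
        if size is None:
            # Unchunked axis: one slice covering the whole axis
            per_axis.append([slice(None)])
        else:
            # min() truncates the last chunk when extent is not a
            # multiple of size
            per_axis.append([slice(start, min(start + size, extent))
                             for start in range(0, extent, size)])
    # Cartesian product of the per-axis slices, last axis varying fastest
    return list(product(*per_axis))


chunks = chunk_slices((20, 20, None), (40, 30, 10))
for c in chunks:
    print(c)
```

For the (20, 20, None) chunk size and (40, 30, 10) shape this produces the four chunks listed in the text, with the second axis truncated at 30.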
- args
idx.args contains the arguments needed to create idx.

For an ndindex object idx, idx.args is always a tuple such that

    type(idx)(*idx.args) == idx

For Tuple indices, the elements of .args are themselves ndindex types. For other types, .args contains raw Python types. Note that .args contains NumPy arrays for IntegerArray and BooleanArray types, so one should always do equality testing or hashing on the ndindex type itself, not its .args.
- as_subchunks(idx, shape, *, _force_slow=None)
Split an index idx on an array of shape shape into subchunk indices.

Yields indices c, where c is an index for the chunk that should be sliced. Only those c for which idx includes at least one element are yielded.

That is to say, for each c index yielded, a[c][idx.as_subindex(c)] will give those elements of a[idx] that are part of the c chunk, and together they give all the elements of a[idx]. See also the docstring of as_subindex().

This method is roughly equivalent to

    def as_subchunks(self, idx, shape):
        for c in self.indices(shape):
            try:
                index = idx.as_subindex(c)
            except ValueError:
                # as_subindex raises ValueError in some cases when the
                # indices do not intersect (see the docstring of
                # as_subindex())
                continue
            if not index.isempty(self):
                # Yield those c for which idx.as_subindex(c) is nonempty
                yield c

except it is more efficient.

    >>> from ndindex import ChunkSize, Tuple
    >>> idx = Tuple(slice(5, 15), 0)
    >>> shape = (20, 20)
    >>> chunk_size = ChunkSize((10, 10))
    >>> for c in chunk_size.as_subchunks(idx, shape):
    ...     print(c)
    ...     print(' ', idx.as_subindex(c))
    Tuple(slice(0, 10, 1), slice(0, 10, 1))
      Tuple(slice(5, 10, 1), 0)
    Tuple(slice(10, 20, 1), slice(0, 10, 1))
      Tuple(slice(0, 5, 1), 0)
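One way to find the intersecting chunks without testing every chunk is to compute, per axis, the range of chunk numbers the index touches. The sketch below is a plain-Python illustration under narrow assumptions: the helper touched_chunks is hypothetical, and it only handles in-bounds non-negative integers and step-1 slices with explicit non-negative bounds, one entry per axis. It is not how ndindex implements as_subchunks().

```python
from itertools import product


def touched_chunks(idx, chunk_size, shape):
    """Hypothetical helper: slices of the chunks that `idx` intersects."""
    per_axis = []
    for i, size, extent in zip(idx, chunk_size, shape):
        if isinstance(i, int):
            # An integer index selects the single element i
            start, stop = i, i + 1
        else:
            start, stop = i.start, min(i.stop, extent)
        if start >= stop:
            # Empty selection on this axis: no chunk is touched at all
            per_axis.append([])
            continue
        # Chunk numbers holding the first and last selected elements
        first, last = start // size, (stop - 1) // size
        per_axis.append([slice(k * size, min((k + 1) * size, extent))
                         for k in range(first, last + 1)])
    return list(product(*per_axis))


result = touched_chunks((slice(5, 15), 0), (10, 10), (20, 20))
for c in result:
    print(c)
```

With the index, shape, and chunk size from the doctest above, this selects the same two chunks, covering rows 0:10 and 10:20 of the first column of chunks.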
- containing_block(idx, shape)
Compute the index for the smallest contiguous block of chunks that contains idx on an array of shape shape.

A block is a subset of an array that is contiguous in all dimensions and is aligned along the chunk size. A block index is always of the form (Slice(k1, m1), Slice(k2, m2), …, Slice(kn, mn)) where n is the number of dimensions in the chunk size, and the ki and mi are multiples of the corresponding chunk dimension (the mi may be truncated to the shape).

For example, given a chunk size of (10, 15), an example block might be (Slice(0, 20), Slice(30, 45)). Such a block would be the smallest block that contains the index (Slice(0, 12), 40), for example.

    >>> from ndindex import ChunkSize
    >>> chunk_size = ChunkSize((10, 15))
    >>> idx = (slice(0, 12), 40)
    >>> shape = (100, 100)
    >>> block = chunk_size.containing_block(idx, shape)
    >>> block
    Tuple(slice(0, 20, 1), slice(30, 45, 1))

The method as_subchunks() can be used on the block to determine which chunks are contained in it, and num_subchunks() to determine how many:

    >>> chunk_size.num_subchunks(block, shape)
    2
    >>> for c in chunk_size.as_subchunks(block, shape):
    ...     print(c)
    Tuple(slice(0, 10, 1), slice(30, 45, 1))
    Tuple(slice(10, 20, 1), slice(30, 45, 1))
In this example, chunk_size.as_subchunks(block, shape) and chunk_size.as_subchunks(idx, shape) are the same, but in general, a block may overlap with more chunks than the original index because the block is contiguous.
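The alignment rule for a block (lower bounds rounded down to a chunk boundary, upper bounds rounded up and truncated to the shape) can be sketched as follows. The helper block_for is hypothetical and assumes a non-empty index made of in-bounds non-negative integers and step-1 slices with explicit bounds. It returns built-in slice objects rather than ndindex types.

```python
def block_for(idx, chunk_size, shape):
    """Hypothetical helper: smallest chunk-aligned block containing `idx`."""
    block = []
    for i, size, extent in zip(idx, chunk_size, shape):
        if isinstance(i, int):
            start, stop = i, i + 1
        else:
            start, stop = i.start, i.stop
        # Round the lower bound down to a multiple of the chunk size
        lower = (start // size) * size
        # Round the upper bound up (ceiling division), then truncate
        # to the array shape
        upper = min(-(-stop // size) * size, extent)
        block.append(slice(lower, upper))
    return tuple(block)


block = block_for((slice(0, 12), 40), (10, 15), (100, 100))
print(block)
```

For the example above this gives the bounds 0:20 and 30:45, matching the block reported by containing_block().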
- indices(shape)
Yield a set of ndindex indices for the chunks on an array of shape shape.

shape should have the same number of dimensions as self. If the shape is not a multiple of the chunk size, some chunks will be truncated, so that len(idx.args[i]) can be used to get the size of an indexed axis.

For example, if a has shape (10, 19) and is chunked into chunks of shape (5, 5):

    >>> from ndindex import ChunkSize
    >>> chunk_size = ChunkSize((5, 5))
    >>> for idx in chunk_size.indices((10, 19)):
    ...     print(idx)
    Tuple(slice(0, 5, 1), slice(0, 5, 1))
    Tuple(slice(0, 5, 1), slice(5, 10, 1))
    Tuple(slice(0, 5, 1), slice(10, 15, 1))
    Tuple(slice(0, 5, 1), slice(15, 19, 1))
    Tuple(slice(5, 10, 1), slice(0, 5, 1))
    Tuple(slice(5, 10, 1), slice(5, 10, 1))
    Tuple(slice(5, 10, 1), slice(10, 15, 1))
    Tuple(slice(5, 10, 1), slice(15, 19, 1))
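The truncation of the trailing chunk can be seen by computing the chunk lengths along a single axis. This is a minimal plain-Python sketch; axis_chunk_lengths is a hypothetical helper, not an ndindex function.

```python
def axis_chunk_lengths(size, extent):
    """Hypothetical helper: lengths of the chunks along one axis."""
    # Every chunk has length `size` except possibly the last one,
    # which is cut off at the end of the axis.
    return [min(size, extent - start) for start in range(0, extent, size)]


lengths = axis_chunk_lengths(5, 19)
print(lengths)
```

For the second axis of the example above (extent 19, chunk size 5), the last chunk has length 4, which corresponds to the slice(15, 19, 1) entries in the output.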
- num_chunks(shape)
Give the number of chunks for the given shape.

This is the same as len(list(self.indices(shape))), but much faster. shape must have the same number of dimensions as self.

    >>> from ndindex import ChunkSize
    >>> chunk_size = ChunkSize((10, 10, 10))
    >>> shape = (10000, 10000, 10000)
    >>> # len(list(chunk_size.indices(shape))) would be very slow, as
    >>> # it would have to iterate over all 1 billion chunks
    >>> chunk_size.num_chunks(shape)
    1000000000
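The count can be obtained without enumerating chunks because each axis contributes the ceiling of its extent divided by its chunk size, and the total is the product over axes. The sketch below shows that arithmetic in plain Python; count_chunks is a hypothetical helper, and whether ndindex computes it exactly this way is an assumption.

```python
import math


def count_chunks(chunk_size, shape):
    """Hypothetical helper: number of chunks, computed arithmetically."""
    if len(chunk_size) != len(shape):
        raise ValueError("chunk_size and shape must have the same length")
    # -(-n // c) is ceiling division: a partial trailing chunk still counts
    return math.prod(-(-n // c) for n, c in zip(shape, chunk_size))


total = count_chunks((10, 10, 10), (10000, 10000, 10000))
print(total)
```

Each axis here contributes 1000 chunks, so the total is 1000 cubed. The same function gives 8 for a (5, 5) chunk size on a (10, 19) shape, since the second axis has a truncated fourth chunk.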
- num_subchunks(idx, shape)
Give the number of chunks indexed by idx on an array of shape shape.

This is equivalent to len(list(self.as_subchunks(idx, shape))), but more efficient.

    >>> from ndindex import ChunkSize, Tuple
    >>> idx = Tuple(slice(5, 15), 0)
    >>> shape = (20, 20)
    >>> chunk_size = ChunkSize((10, 10))
    >>> chunk_size.num_subchunks(idx, shape)
    2
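For simple indices the count can be computed per axis and multiplied, without generating the chunks. The following plain-Python sketch assumes in-bounds non-negative integers and step-1 slices with explicit non-negative bounds, one entry per axis; count_touched_chunks is a hypothetical helper and does not reflect how ndindex handles general indices.

```python
import math


def count_touched_chunks(idx, chunk_size, shape):
    """Hypothetical helper: number of chunks that `idx` intersects."""
    counts = []
    for i, size, extent in zip(idx, chunk_size, shape):
        if isinstance(i, int):
            start, stop = i, i + 1
        else:
            start, stop = i.start, min(i.stop, extent)
        if start >= stop:
            # An empty selection on any axis touches no chunks
            return 0
        # Chunks from the one holding `start` to the one holding `stop - 1`
        counts.append((stop - 1) // size - start // size + 1)
    return math.prod(counts)


n = count_touched_chunks((slice(5, 15), 0), (10, 10), (20, 20))
print(n)
```

The slice 5:15 spans two chunks of size 10 and the integer 0 falls in one, so the product is 2, in agreement with the doctest above.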