[Git][debian-gis-team/flox][master] 4 commits: New upstream version 0.10.3

Antonio Valentino (@antonio.valentino) gitlab at salsa.debian.org
Sun Apr 13 10:24:16 BST 2025



Antonio Valentino pushed to branch master at Debian GIS Project / flox


Commits:
4eceb92b by Antonio Valentino at 2025-04-13T09:15:48+00:00
New upstream version 0.10.3
- - - - -
236a6e07 by Antonio Valentino at 2025-04-13T09:15:51+00:00
Update upstream source from tag 'upstream/0.10.3'

Update to upstream version '0.10.3'
with Debian dir 24ccd741e5dc6cfda844c21b044a710d9b07ce20
- - - - -
0cd81028 by Antonio Valentino at 2025-04-13T09:16:28+00:00
New upstream release

- - - - -
64637f1c by Antonio Valentino at 2025-04-13T09:17:35+00:00
Set distribution to unstable

- - - - -


9 changed files:

- .pre-commit-config.yaml
- asv_bench/benchmarks/combine.py
- debian/changelog
- docs/source/implementation.md
- docs/source/user-stories/large-zonal-stats.ipynb
- flox/core.py
- flox/xarray.py
- flox/xrutils.py
- tests/test_core.py


Changes:

=====================================
.pre-commit-config.yaml
=====================================
@@ -4,7 +4,7 @@ ci:
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
-    rev: "v0.9.1"
+    rev: "v0.11.4"
     hooks:
       - id: ruff
         args: ["--fix", "--show-fixes"]
@@ -24,7 +24,7 @@ repos:
       - id: check-docstring-first
 
   - repo: https://github.com/executablebooks/mdformat
-    rev: 0.7.21
+    rev: 0.7.22
     hooks:
       - id: mdformat
         additional_dependencies:
@@ -38,19 +38,19 @@ repos:
         args: [--extra-keys=metadata.kernelspec metadata.language_info.version]
 
   - repo: https://github.com/codespell-project/codespell
-    rev: v2.3.0
+    rev: v2.4.1
     hooks:
       - id: codespell
         additional_dependencies:
           - tomli
 
   - repo: https://github.com/abravalheri/validate-pyproject
-    rev: v0.23
+    rev: v0.24.1
     hooks:
       - id: validate-pyproject
 
   - repo: https://github.com/rhysd/actionlint
-    rev: v1.7.6
+    rev: v1.7.7
     hooks:
       - id: actionlint
         files: ".github/workflows/"


=====================================
asv_bench/benchmarks/combine.py
=====================================
@@ -45,7 +45,7 @@ class Combine:
 class Combine1d(Combine):
     """
     Time the combine step for dask reductions,
-    this is for reducting along a single dimension
+    this is for reducing along a single dimension
     """
 
     def setup(self, *args, **kwargs) -> None:


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+flox (0.10.3-1) unstable; urgency=medium
+
+  * New upstream release.
+
+ -- Antonio Valentino <antonio.valentino at tiscali.it>  Sun, 13 Apr 2025 09:17:14 +0000
+
 flox (0.10.2-1) unstable; urgency=medium
 
   * New upstream release.


=====================================
docs/source/implementation.md
=====================================
@@ -110,6 +110,26 @@ width: 100%
 
 This approach allows grouping by a dask array so group labels can be discovered at compute time, similar to `dask.dataframe.groupby`.
 
+### reindexing to a sparse array
+
+For large numbers of groups, we might be reducing to a very sparse array (e.g. [this issue](https://github.com/xarray-contrib/flox/issues/428)).
+
+To control memory, we can instruct flox to reindex the intermediate results to a `sparse.COO` array using:
+
+```python
+from flox import ReindexArrayType, ReindexStrategy
+
+ReindexStrategy(
+    # do not reindex to the full output grid at the blockwise aggregation stage
+    blockwise=False,
+    # when combining intermediate results after blockwise aggregation, reindex to the
+    # common grid using a sparse.COO array type
+    array_type=ReindexArrayType.SPARSE_COO,
+)
+```
+
+See [this user story](user-stories/large-zonal-stats) for more discussion.
+
 ### Example
 
 For example, consider `groupby("time.month")` with monthly frequency data and chunksize of 4 along `time`.


=====================================
docs/source/user-stories/large-zonal-stats.ipynb
=====================================
@@ -9,7 +9,7 @@
     "\n",
     "\"Zonal statistics\" spans a large range of problems. \n",
     "\n",
-    "This one is inspired by [this issue](https://github.com/xarray-contrib/flox/issues/428), where a cell areas raster is aggregated over 6 different groupers and summed. Each array involved has shape 560_000 x 1440_000 and chunk size 10_000 x 10_000. Three of the groupers `tcl_year`, `drivers`, and `tcd_thresholds` have a small number of group labels (23, 5, and 7). \n",
+    "This one is inspired by [this issue](https://github.com/xarray-contrib/flox/issues/428), where a cell areas raster is aggregated over 6 different groupers and summed. Each array involved has a global extent on a 30m grid with shape 560_000 x 1440_000 and chunk size 10_000 x 10_000. Three of the groupers `tcl_year`, `drivers`, and `tcd_thresholds` have a small number of group labels (23, 5, and 7). \n",
     "\n",
     "The last 3 groupers are [GADM](https://gadm.org/) level 0, 1, 2 administrative area polygons rasterized to this grid; with 248, 86, and 854 unique labels respectively (arrays `adm0`, `adm1`, and `adm2`). These correspond to country-level, state-level, and county-level administrative boundaries. "
    ]
@@ -44,7 +44,7 @@
     "from flox.xarray import xarray_reduce\n",
     "\n",
     "sizes = {\"y\": 560_000, \"x\": 1440_000}\n",
-    "chunksizes = {\"y\": 2_000, \"x\": 2_000}\n",
+    "chunksizes = {\"y\": 10_000, \"x\": 10_000}\n",
     "dims = (\"y\", \"x\")\n",
     "shape = tuple(sizes[d] for d in dims)\n",
     "chunks = tuple(chunksizes[d] for d in dims)\n",
@@ -124,13 +124,13 @@
    "id": "8",
    "metadata": {},
    "source": [
-    "Formulating the three admin levels as orthogonal dimensions is quite wasteful --- not all countries have 86 states or 854 counties per state. \n",
+    "Formulating the three admin levels as orthogonal dimensions is quite wasteful --- not all countries have 86 states or 854 counties per state. The total number of GADM geometries for levels 0, 1, and 2 is ~48,000 which is much smaller than 23 x 5 x 7 x 248 x 86 x 854 = 14_662_360_160.\n",
     "\n",
-    "We end up with one humoungous 56GB chunk, that is mostly empty.\n",
+    "We end up with one humoungous 56GB chunk, that is mostly empty (sparsity ~ 48,000/14_662_360_160 ~ 0.2%).\n",
     "\n",
     "## We can do better using a sparse array\n",
     "\n",
-    "Since the results are very sparse, we can instruct flox to constructing dense arrays of intermediate results on the full 23 x 5 x 7 x 248 x 86 x 854 output grid.\n",
+    "Since the results are very sparse, we can instruct flox to construct dense arrays of intermediate results on the full 23 x 5 x 7 x 248 x 86 x 854 output grid.\n",
     "\n",
     "```python\n",
     "ReindexStrategy(\n",
@@ -161,6 +161,7 @@
     "        blockwise=False,\n",
     "        array_type=ReindexArrayType.SPARSE_COO,\n",
     "    ),\n",
+    "    fill_value=0,\n",
     ")\n",
     "result"
    ]
@@ -174,6 +175,42 @@
     "\n",
     "The computation runs smoothly with low memory."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11",
+   "metadata": {},
+   "source": [
+    "## Why\n",
+    "\n",
+    "To understand why you might do this, here is how flox runs reductions. In the images below, the `areas` array on the left has 5 2D chunks. Each color represents a group, each square represents a value of the array; clearly there are different groups in each chunk. \n",
+    "\n",
+    "\n",
+    "### reindex = True\n",
+    "\n",
+    "<img src=\"../_images/new-map-reduce-reindex-True-annotated.svg\" width=100%>\n",
+    "\n",
+    "First, the grouped-reduction is run on each chunk independently, and the results are constructed as _dense_ arrays on the full 23 x 5 x 7 x 248 x 86 x 854 output grid. This means that every chunk balloons to ~50GB. This method cannot work well.\n",
+    "\n",
+    "### reindex = False with sparse intermediates\n",
+    "\n",
+    "<img src=\"../_images/new-map-reduce-reindex-False-annotated.svg\" width=100%>\n",
+    "\n",
+    "First, the grouped-reduction is run on each chunk independently. Conceptually the result after this step is an array with differently sized chunks. \n",
+    "\n",
+    "Next results from neighbouring blocks are concatenated and a reduction is run again. These results are first aligned or reindexed to a common grid of group labels, termed \"reindexing\". At this stage, we instruct flox to construct a _sparse array_ during reindexing, otherwise we will eventually end up constructing _dense_ reindexed arrays of shape 23 x 5 x 7 x 248 x 86 x 854.\n",
+    "\n",
+    "\n",
+    "## Can we do better?\n",
+    "\n",
+    "Yes. \n",
+    "\n",
+    "1. Using the reindexing machinery to convert intermediates to sparse is a little bit hacky. A better option would be to aggregate directly to sparse arrays, potentially using a new `engine=\"sparse\"` ([issue](https://github.com/xarray-contrib/flox/issues/346)).\n",
+    "2. The total number of GADM geometries for levels 0, 1, and 2 is ~48,000. A much more sensible solution would be to allow grouping by these _geometries_ directly. This would allow us to be smart about the reduction, by exploiting the ideas underlying the [`method=\"cohorts\"` strategy](../implementation.md#method-cohorts).\n",
+    "\n",
+    "Regardless, the ability to do such reindexing allows flox to scale to much larger grouper arrays than previously possible.\n",
+    "\n"
+   ]
   }
  ],
  "metadata": {


=====================================
flox/core.py
=====================================
@@ -68,6 +68,7 @@ else:
     from numpy.core.numeric import normalize_axis_tuple  # type: ignore[no-redef]
 
 HAS_NUMBAGG = module_available("numbagg", minversion="0.3.0")
+HAS_SPARSE = module_available("sparse")
 
 if TYPE_CHECKING:
     try:
@@ -255,6 +256,12 @@ def _is_bool_supported_reduction(func: T_Agg) -> bool:
     )
 
 
+def _is_sparse_supported_reduction(func: T_Agg) -> bool:
+    if isinstance(func, Aggregation):
+        func = func.name
+    return HAS_SPARSE and all(f not in func for f in ["first", "last", "prod", "var", "std"])
+
+
 def _get_expected_groups(by: T_By, sort: bool) -> T_ExpectIndex:
     if is_duck_dask_array(by):
         raise ValueError("Please provide expected_groups if not grouping by a numpy array.")
@@ -736,12 +743,12 @@ def rechunk_for_blockwise(array: DaskArray, axis: T_Axis, labels: np.ndarray) ->
         return array.rechunk({axis: newchunks})
 
 
-def reindex_numpy(array, from_, to, fill_value, dtype, axis):
+def reindex_numpy(array, from_: pd.Index, to: pd.Index, fill_value, dtype, axis: int):
     idx = from_.get_indexer(to)
     indexer = [slice(None, None)] * array.ndim
     indexer[axis] = idx
     reindexed = array[tuple(indexer)]
-    if any(idx == -1):
+    if (idx == -1).any():
         if fill_value is None:
             raise ValueError("Filling is required. fill_value cannot be None.")
         indexer[axis] = idx == -1
@@ -750,25 +757,55 @@ def reindex_numpy(array, from_, to, fill_value, dtype, axis):
     return reindexed
 
 
-def reindex_pydata_sparse_coo(array, from_, to, fill_value, dtype, axis):
+def reindex_pydata_sparse_coo(array, from_: pd.Index, to: pd.Index, fill_value, dtype, axis: int):
     import sparse
 
     assert axis == -1
 
-    if fill_value is None:
+    # Are there any elements in `to` that are not in `from_`.
+    if isinstance(to, pd.RangeIndex) and len(to) > len(from_):
+        # 1. pandas optimizes set difference between two RangeIndexes only
+        # 2. We want to avoid realizing a very large numpy array in to memory.
+        #    This happens in the `else` clause.
+        #    There are potentially other tricks we can play, but this is a simple
+        #    and effective one. If a user is reindexing to sparse, then len(to) is
+        #    almost guaranteed to be > len(from_). If len(to) <= len(from_), then realizing
+        #    another array of the same shape should be fine.
+        needs_reindex = True
+    else:
+        needs_reindex = (from_.get_indexer(to) == -1).any()
+
+    if needs_reindex and fill_value is None:
         raise ValueError("Filling is required. fill_value cannot be None.")
+
     idx = to.get_indexer(from_)
-    assert (idx != -1).all()  # FIXME
+    mask = idx != -1  # indices along last axis to keep
+    if mask.all():
+        mask = slice(None)
     shape = array.shape
-    ranges = np.broadcast_arrays(*np.ix_(*(tuple(np.arange(size) for size in shape[:axis]) + (idx,))))
-    coords = np.stack(ranges, axis=0).reshape(array.ndim, -1)
 
-    data = array.data if isinstance(array, sparse.COO) else array.reshape(-1)
+    if isinstance(array, sparse.COO):
+        subset = array[..., mask]
+        data = subset.data
+        coords = subset.coords
+        if subset.nnz > 0:
+            coords[-1, :] = idx[mask][coords[-1, :]]
+        if fill_value is None:
+            # no reindexing is actually needed (dense case)
+            # preserve the fill_value
+            fill_value = array.fill_value
+    else:
+        ranges = np.broadcast_arrays(
+            *np.ix_(*(tuple(np.arange(size) for size in shape[:axis]) + (idx[mask],)))
+        )
+        coords = np.stack(ranges, axis=0).reshape(array.ndim, -1)
+        data = array[..., mask].reshape(-1)
 
     reindexed = sparse.COO(
         coords=coords,
         data=data.astype(dtype, copy=False),
         shape=(*array.shape[:axis], to.size),
+        fill_value=fill_value,
     )
 
     return reindexed
@@ -795,7 +832,11 @@ def reindex_(
 
     if array.shape[axis] == 0:
         # all groups were NaN
-        reindexed = np.full(array.shape[:-1] + (len(to),), fill_value, dtype=array.dtype)
+        shape = array.shape[:-1] + (len(to),)
+        if array_type in (ReindexArrayType.AUTO, ReindexArrayType.NUMPY):
+            reindexed = np.full(shape, fill_value, dtype=array.dtype)
+        else:
+            raise NotImplementedError
         return reindexed
 
     from_ = pd.Index(from_)
@@ -1044,7 +1085,7 @@ def chunk_argreduce(
         sort=sort,
         user_dtype=user_dtype,
     )
-    if not isnull(results["groups"]).all():
+    if not all(isnull(results["groups"])):
         idx = np.broadcast_to(idx, array.shape)
 
         # array, by get flattened to 1D before passing to npg
@@ -1288,7 +1329,7 @@ def _finalize_results(
     fill_value = agg.fill_value["user"]
     if min_count > 0:
         count_mask = counts < min_count
-        if count_mask.any():
+        if count_mask.any() or reindex.array_type is ReindexArrayType.SPARSE_COO:
             # For one count_mask.any() prevents promoting bool to dtype(fill_value) unless
             # necessary
             if fill_value is None:
@@ -2286,6 +2327,8 @@ def _factorize_multiple(
     if any_by_dask:
         import dask.array
 
+        from . import dask_array_ops  # noqa
+
         # unifying chunks will make sure all arrays in `by` are dask arrays
         # with compatible chunks, even if there was originally a numpy array
         inds = tuple(range(by[0].ndim))
@@ -2478,7 +2521,7 @@ def groupby_reduce(
         array's dtype.
     method : {"map-reduce", "blockwise", "cohorts"}, optional
         Note that this arg is chosen by default using heuristics.
-        Strategy for reduction of dask arrays only:
+        Strategy for reduction of dask arrays only.
           * ``"map-reduce"``:
             First apply the reduction blockwise on ``array``, then
             combine a few newighbouring blocks, apply the reduction.
@@ -2815,6 +2858,15 @@ def groupby_reduce(
             array.dtype,
         )
 
+        if reindex.array_type is ReindexArrayType.SPARSE_COO:
+            if not HAS_SPARSE:
+                raise ImportError("Package 'sparse' must be installed to reindex to a sparse.COO array.")
+            if not _is_sparse_supported_reduction(func):
+                raise NotImplementedError(
+                    f"Aggregation {func=!r} is not supported when reindexing to a sparse array. "
+                    "Please raise an issue"
+                )
+
         if TYPE_CHECKING:
             assert isinstance(reindex, ReindexStrategy)
             assert method is not None


=====================================
flox/xarray.py
=====================================
@@ -113,7 +113,7 @@ def xarray_reduce(
         DType for the output. Can be anything that is accepted by ``np.dtype``.
     method : {"map-reduce", "blockwise", "cohorts"}, optional
         Note that this arg is chosen by default using heuristics.
-        Strategy for reduction of dask arrays only:
+        Strategy for reduction of dask arrays only.
           * ``"map-reduce"``:
             First apply the reduction blockwise on ``array``, then
             combine a few newighbouring blocks, apply the reduction.


=====================================
flox/xrutils.py
=====================================
@@ -159,7 +159,9 @@ def notnull(data):
         return out
 
 
-def isnull(data):
+def isnull(data: Any):
+    if data is None:
+        return False
     if not is_duck_array(data):
         data = np.asarray(data)
     scalar_type = data.dtype.type
@@ -177,7 +179,7 @@ def isnull(data):
     else:
         # at this point, array should have dtype=object
         if isinstance(data, (np.ndarray, dask_array_type)):  # noqa
-            return pd.isnull(data)
+            return pd.isnull(data)  # type: ignore[arg-type]
         else:
             # Not reachable yet, but intended for use with other duck array
             # types. For full consistency with pandas, we should accept None as
@@ -374,9 +376,10 @@ def _select_along_axis(values, idx, axis):
 def nanfirst(values, axis, keepdims=False):
     if isinstance(axis, tuple):
         (axis,) = axis
-    values = np.asarray(values)
+    if not is_duck_array(values):
+        values = np.asarray(values)
     axis = normalize_axis_index(axis, values.ndim)
-    idx_first = np.argmax(~pd.isnull(values), axis=axis)
+    idx_first = np.argmax(~isnull(values), axis=axis)
     result = _select_along_axis(values, idx_first, axis)
     if keepdims:
         return np.expand_dims(result, axis=axis)
@@ -387,10 +390,11 @@ def nanfirst(values, axis, keepdims=False):
 def nanlast(values, axis, keepdims=False):
     if isinstance(axis, tuple):
         (axis,) = axis
-    values = np.asarray(values)
+    if not is_duck_array(values):
+        values = np.asarray(values)
     axis = normalize_axis_index(axis, values.ndim)
     rev = (slice(None),) * axis + (slice(None, None, -1),)
-    idx_last = -1 - np.argmax(~pd.isnull(values)[rev], axis=axis)
+    idx_last = -1 - np.argmax(~isnull(values)[rev], axis=axis)
     result = _select_along_axis(values, idx_last, axis)
     if keepdims:
         return np.expand_dims(result, axis=axis)


=====================================
tests/test_core.py
=====================================
@@ -24,6 +24,7 @@ from flox.core import (
     _choose_engine,
     _convert_expected_groups_to_index,
     _get_optimal_chunks_for_groups,
+    _is_sparse_supported_reduction,
     _normalize_indexes,
     _validate_reindex,
     factorize_,
@@ -43,6 +44,7 @@ from . import (
     assert_equal_tuple,
     has_cubed,
     has_dask,
+    has_sparse,
     raise_if_dask_computes,
     requires_cubed,
     requires_dask,
@@ -74,6 +76,10 @@ if has_cubed:
 
 
 DEFAULT_QUANTILE = 0.9
+REINDEX_SPARSE_STRAT = ReindexStrategy(blockwise=False, array_type=ReindexArrayType.SPARSE_COO)
+REINDEX_SPARSE_PARAM = pytest.param(
+    REINDEX_SPARSE_STRAT, marks=(requires_dask, pytest.mark.skipif(not has_sparse, reason="no sparse"))
+)
 
 if TYPE_CHECKING:
     from flox.core import T_Agg, T_Engine, T_ExpectedGroupsOpt, T_Method
@@ -320,13 +326,20 @@ def test_groupby_reduce_all(nby, size, chunks, func, add_nan_by, engine):
         if not has_dask or chunks is None or func in BLOCKWISE_FUNCS:
             continue
 
-        params = list(itertools.product(["map-reduce"], [True, False, None]))
+        params = list(
+            itertools.product(
+                ["map-reduce"],
+                [True, False, None, REINDEX_SPARSE_STRAT],
+            )
+        )
         params.extend(itertools.product(["cohorts"], [False, None]))
         if chunks == -1:
             params.extend([("blockwise", None)])
 
         combine_error = RuntimeError("This combine should not have been called.")
         for method, reindex in params:
+            if isinstance(reindex, ReindexStrategy) and not _is_sparse_supported_reduction(func):
+                continue
             call = partial(
                 groupby_reduce,
                 array,
@@ -360,6 +373,10 @@ def test_groupby_reduce_all(nby, size, chunks, func, add_nan_by, engine):
                 assert_equal(actual_group, expect, tolerance)
             if "arg" in func:
                 assert actual.dtype.kind == "i"
+            if isinstance(reindex, ReindexStrategy):
+                import sparse
+
+                expected = sparse.COO.from_numpy(expected)
             assert_equal(actual, expected, tolerance)
 
 
@@ -447,7 +464,7 @@ def test_numpy_reduce_nd_md():
 
 
 @requires_dask
-@pytest.mark.parametrize("reindex", [None, False, True])
+@pytest.mark.parametrize("reindex", [None, False, True, REINDEX_SPARSE_PARAM])
 @pytest.mark.parametrize("func", ALL_FUNCS)
 @pytest.mark.parametrize("add_nan", [False, True])
 @pytest.mark.parametrize("dtype", (float,))
@@ -470,6 +487,9 @@ def test_groupby_agg_dask(func, shape, array_chunks, group_chunks, add_nan, dtyp
     if "arg" in func and (engine in ["flox", "numbagg"] or reindex):
         pytest.skip()
 
+    if isinstance(reindex, ReindexStrategy) and not _is_sparse_supported_reduction(func):
+        pytest.skip()
+
     rng = np.random.default_rng(12345)
     array = dask.array.from_array(rng.random(shape), chunks=array_chunks).astype(dtype)
     array = dask.array.ones(shape, chunks=array_chunks)
@@ -775,6 +795,7 @@ def test_groupby_reduce_axis_subset_against_numpy(func, axis, engine):
         (None, None),
         pytest.param(False, (2, 2, 3), marks=requires_dask),
         pytest.param(True, (2, 2, 3), marks=requires_dask),
+        pytest.param(REINDEX_SPARSE_PARAM, (2, 2, 3), marks=requires_dask),
     ],
 )
 @pytest.mark.parametrize(
@@ -821,7 +842,13 @@ def test_groupby_reduce_nans(reindex, chunks, axis, groups, expected_shape, engi
 @requires_dask
 @pytest.mark.parametrize(
     "expected_groups, reindex",
-    [(None, None), (None, False), ([0, 1, 2], True), ([0, 1, 2], False)],
+    [
+        (None, None),
+        (None, False),
+        ([0, 1, 2], True),
+        ([0, 1, 2], False),
+        pytest.param([0, 1, 2], REINDEX_SPARSE_PARAM),
+    ],
 )
 def test_groupby_all_nan_blocks_dask(expected_groups, reindex, engine):
     labels = np.array([0, 0, 2, 2, 2, 1, 1, 2, 2, 1, 1, 0])
@@ -2058,12 +2085,14 @@ def test_datetime_timedelta_first_last(engine, func) -> None:
 
 @requires_dask
 @requires_sparse
-def test_reindex_sparse():
+@pytest.mark.xdist_group(name="sparse-group")
+@pytest.mark.parametrize("size", [2**62 - 1, 11])
+def test_reindex_sparse(size):
     import sparse
 
     array = dask.array.ones((2, 12), chunks=(-1, 3))
     func = "sum"
-    expected_groups = pd.Index(np.arange(11))
+    expected_groups = pd.RangeIndex(size)
     by = dask.array.from_array(np.repeat(np.arange(6) * 2, 2), chunks=(3,))
     dense = np.zeros((2, 11))
     dense[..., np.arange(6) * 2] = 2
@@ -2083,9 +2112,39 @@ def test_reindex_sparse():
             assert isinstance(res, sparse.COO)
         return res
 
-    with patch("flox.core.reindex_") as mocked_func:
-        mocked_func.side_effect = mocked_reindex
-        actual, *_ = groupby_reduce(array, by, func=func, reindex=reindex, expected_groups=expected_groups)
-        assert_equal(actual, expected)
-        # once during graph construction, 10 times afterward
-        assert mocked_func.call_count > 1
+    # Define the error-raising property
+    def raise_error(self):
+        raise AttributeError("Access to '_data' is not allowed.")
+
+    with patch("flox.core.reindex_") as mocked_reindex_func:
+        with patch.object(pd.RangeIndex, "_data", property(raise_error)):
+            mocked_reindex_func.side_effect = mocked_reindex
+            actual, *_ = groupby_reduce(
+                array, by, func=func, reindex=reindex, expected_groups=expected_groups, fill_value=0
+            )
+            if size == 11:
+                assert_equal(actual, expected)
+            else:
+                actual.compute()  # just compute
+
+            # once during graph construction, 10 times afterward
+            assert mocked_reindex_func.call_count > 1
+
+
+def test_sparse_errors():
+    call = partial(
+        groupby_reduce,
+        [1, 2, 3],
+        [0, 1, 1],
+        reindex=REINDEX_SPARSE_STRAT,
+        fill_value=0,
+        expected_groups=[0, 1, 2],
+    )
+
+    if not has_sparse:
+        with pytest.raises(ImportError):
+            call(func="sum")
+
+    else:
+        with pytest.raises(ValueError):
+            call(func="first")



View it on GitLab: https://salsa.debian.org/debian-gis-team/flox/-/compare/7b038c3903ec3ec6e512ad500aa9d4ca5d5708a5...64637f1c7675b56b0758a04b2ae1c0bd7d0fe9c3
