Bug#923707: statsmodels FTBFS: array |= frame now returns frame

Rebecca N. Palmer rebecca_palmer at zoho.com
Mon Mar 4 07:50:47 GMT 2019


Source: pandas
Version: 0.23+dfsg-2
Severity: serious
Control: affects -1 src:statsmodels
Control: tags -1 patch

The fix for #918206 (setting __array_priority__ to make np.array @ 
DataFrame work) is technically API breaking: in-place operators

arr = np.array(...)
df = pd.DataFrame(...)
arr += df

used to leave arr as an np.array (though this does not appear to have 
been documented), but now turn it into a DataFrame.
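
The mechanism can be sketched without pandas. The following is a minimal illustration, not pandas code: LegacyPriorityFrame and NoPriorityFrame are hypothetical stand-ins, and it assumes NumPy's legacy __array_priority__ deferral applies to in-place operators as described above. When the right-hand operand has a higher priority and no __array_ufunc__, ndarray.__iadd__ gives up, Python falls back to the operand's __radd__, and the name arr is rebound to whatever that returns.

```python
import numpy as np


class LegacyPriorityFrame:
    """Hypothetical stand-in for a DataFrame that sets __array_priority__."""

    __array_priority__ = 1000  # higher than ndarray's default priority of 0

    def __init__(self, values):
        self.values = np.asarray(values)

    def __radd__(self, other):
        # Reached only after ndarray.__iadd__ and ndarray.__add__ both defer.
        return LegacyPriorityFrame(np.asarray(other) + self.values)


class NoPriorityFrame:
    """Hypothetical stand-in for the old case: convertible, no priority set."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # Lets NumPy convert this object to an ndarray for the ufunc call.
        return self.values if dtype is None else self.values.astype(dtype)


arr_old = np.array([1, 2, 3])
arr_old += NoPriorityFrame([10, 20, 30])       # NumPy adds in place

arr_new = np.array([1, 2, 3])
arr_new += LegacyPriorityFrame([10, 20, 30])   # NumPy defers; name is rebound

print(type(arr_old).__name__, type(arr_new).__name__)
```

In this sketch arr_old should remain an ndarray that was updated in place, while arr_new ends up as an instance of the stand-in class, which mirrors the change in behaviour reported here.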

The only known [0] place where this fails a test is in statsmodels. 
Because [1] now returns a DataFrame rather than a Series, 
statsmodels.formula.api.OLS now raises an exception when the data passed 
to it contains both missing (NaN) values and duplicate index labels.  As 
statsmodels.regression.tests.test_regression.test_missing_formula_predict 
contains such data, statsmodels 0.8.0-9 failed to build on the 
architectures where pandas was built first [2].

pandas upstream don't appear to have noticed this: they don't mention it 
in the discussion or release notes of the fix [3].  statsmodels 
upstream's response was to make this test stop using duplicate index 
names (without explicit comment) [4].

An alternative fix for #918206 that avoids this change to in-place operators is

pandas/core/generic.py
      def __array_wrap__(self, result, context=None):
          d = self._construct_axes_dict(self._AXIS_ORDERS, copy=False)
+        if (context is not None and context[0] == np.matmul
+                and not hasattr(context[1][0], 'index')):
+            del d['index']
          return self._constructor(result, **d).__finalize__(self)

      # ideally we would define this to avoid the getattr checks, but

but this has not yet been tested in a full build.

[0] autopkgtest results - https://release.debian.org/britney/excuses.yaml
[1] https://sources.debian.org/src/patsy/0.5.0+git13-g54dcf7b-1/patsy/missing.py/#L136
[2] https://buildd.debian.org/status/fetch.php?pkg=statsmodels&arch=amd64&ver=0.8.0-9&stamp=1551564937&raw=0
[3] https://github.com/pandas-dev/pandas/pull/23114 
https://github.com/pandas-dev/pandas/commit/ad2a14f4bec8a004b2972c12f12ed3e4ce37ff52
[4] (please *do not upload* this without discussion, or we may lose my 
other statsmodels work to the freeze) 
https://github.com/statsmodels/statsmodels/commit/30c9ddbff8a072cbc1bebc7550b667e760cb386a#diff-2708847815406a7890933a960465d8e8


