
Scipy’s mannwhitneyu function

Without looking it up, can you say what the following code does:

import numpy as np
from scipy import stats
a = np.arange(25)
b = np.arange(25) + 4
print(stats.mannwhitneyu(a, b))

You probably guessed that it computes the Mann-Whitney test between two samples, but exactly which test? The two-sided or the one-sided test?

You can’t tell from the code, because it depends on which version of scipy you are running, and it has gone back and forth between the two! Before 0.17.0, it used the one-sided test, with the side decided from the input data. This was obviously the wrong thing to do. Then the API was fixed in 0.17.0 to do the two-sided test. This was considered a bad thing because it broke backwards compatibility, and now it’s back to performing the one-sided test! I wish I was making this up.

Reading through the GitHub issues (#4933, #6034, #6062, #6100) is an example of how open source projects can stagnate. There is a basic, simple solution to the issue: create a corrected version of the function with a new name and deprecate the old one. This keeps backwards compatibility while allowing the project to fix its API. Once the issue had been identified, this should have been a 20-minute job. Reading through the issues, this simple solution is proposed, discussed, and seemingly agreed to. Instead, something else happens, and at this point it’d take me longer than 20 minutes just to read through the whole discussion.
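To make the "new name plus deprecation" idea concrete, here is a minimal sketch of what it could look like. The function names mannwhitneyu_two_sided and mannwhitneyu_legacy are hypothetical and do not exist in scipy, and the legacy body is only a rough stand-in for "one-sided, with the side picked from the data", not scipy's actual old code. It assumes a scipy version that accepts the alternative argument.

```python
import warnings

import numpy as np
from scipy import stats


def mannwhitneyu_two_sided(x, y):
    """Hypothetical new-name function: always runs the two-sided test."""
    return stats.mannwhitneyu(x, y, alternative='two-sided')


def mannwhitneyu_legacy(x, y):
    """Hypothetical old-name function: behaviour kept, but it warns.

    Rough emulation of 'one-sided, side decided by the data': run both
    one-sided tests and return whichever has the smaller p-value.
    """
    warnings.warn(
        "mannwhitneyu_legacy is deprecated; use mannwhitneyu_two_sided",
        DeprecationWarning,
        stacklevel=2,
    )
    less = stats.mannwhitneyu(x, y, alternative='less')
    greater = stats.mannwhitneyu(x, y, alternative='greater')
    return less if less.pvalue <= greater.pvalue else greater


a = np.arange(25)
b = np.arange(25) + 4
new_result = mannwhitneyu_two_sided(a, b)
```

Old callers keep getting the same kind of answer (plus a warning), while new code gets an unambiguous name.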

This is not the first time I have run into numpy/scipy’s lack of respect for backwards compatibility either. Fortunately, there is a workaround in this case, which is to spell out the call in full and pass the alternative argument explicitly:

stats.mannwhitneyu(a, b, alternative='two-sided')
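As a small sketch of why being explicit matters, the snippet below runs the same two samples through all three values of alternative and prints the p-values side by side. It assumes a scipy version that accepts the alternative argument and returns a result with a pvalue attribute; the variable names are only for illustration.

```python
import numpy as np
from scipy import stats

a = np.arange(25)
b = np.arange(25) + 4

# Same data, three explicitly named tests.
p_two_sided = stats.mannwhitneyu(a, b, alternative='two-sided').pvalue
p_less = stats.mannwhitneyu(a, b, alternative='less').pvalue        # a tends to be smaller than b
p_greater = stats.mannwhitneyu(a, b, alternative='greater').pvalue  # a tends to be larger than b

print('two-sided:', p_two_sided)
print('less:     ', p_less)
print('greater:  ', p_greater)
```

For these samples, the two-sided p-value is about twice the smaller one-sided one, so a silent change of default can move a result across a significance threshold.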

