
Data manipulation with numpy: tips and tricks, part 1


Here are some non-obvious examples of what you can do with numpy.

The examples mostly come from machine learning, but they will be useful whenever you do number crunching in Python.

In[1]:

    from __future__ import print_function  # for python 2 & python 3 compatibility
    %matplotlib inline
    import numpy as np

Sorting values of one array according to the other

Say we want to order people by age and reorder their heights accordingly.

In[2]:

    ages = np.random.randint(low=30, high=60, size=10)
    heights = np.random.randint(low=150, high=210, size=10)
    print(ages)
    print(heights)

    [49 45 44 52 44 57 46 49 31 50]
    [209 183 202 188 205 179 209 187 156 209]

In[3]:

    sorter = np.argsort(ages)
    print(ages[sorter])
    print(heights[sorter])

    [31 44 44 45 46 49 49 50 52 57]
    [156 202 205 183 209 209 187 209 188 179]

Once you have computed the permutation, you can apply it as many times as you like (for example, to every column of a dataset), and each application is fast.
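A small sketch of reusing one permutation for several columns. The names array and all values are made-up illustration data, and kind='stable' is an optional argsort argument that keeps tied ages in their original order:

```python
import numpy as np

# Made-up toy dataset: several columns describing the same people
ages = np.array([49, 45, 44, 52, 31])
heights = np.array([209, 183, 202, 188, 156])
names = np.array(['Ann', 'Bob', 'Cid', 'Dan', 'Eve'])

# Compute the sorting permutation once...
sorter = np.argsort(ages, kind='stable')

# ...and reuse it to reorder every column consistently
ages_sorted = ages[sorter]
heights_sorted = heights[sorter]
names_sorted = names[sorter]

print(ages_sorted)     # [31 44 45 49 52]
print(names_sorted)    # ['Eve' 'Cid' 'Bob' 'Ann' 'Dan']
print(heights_sorted)  # [156 202 183 209 188]
```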

People frequently solve this problem with sorted(zip(ages, heights)), which is much slower (10-20 times slower on large arrays) and, unlike argsort, also breaks ties in age by height.
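If you do want the sorted(zip(...)) ordering (by age, then by height among equal ages) without leaving numpy, np.lexsort is one option. A minimal sketch using the ages and heights printed above; note that lexsort treats the last key as the primary one:

```python
import numpy as np

ages = np.array([49, 45, 44, 52, 44, 57, 46, 49, 31, 50])
heights = np.array([209, 183, 202, 188, 205, 179, 209, 187, 156, 209])

# Pure-Python version: sorts by age, breaking ties by height
pairs = sorted(zip(ages.tolist(), heights.tolist()))

# np.lexsort sorts by the LAST key first, so ages go last
order = np.lexsort((heights, ages))
numpy_pairs = list(zip(ages[order].tolist(), heights[order].tolist()))

print(numpy_pairs == pairs)  # True
```

Like argsort, lexsort returns a permutation, so it can be reused on other columns as well.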

Computing the inverse of a permutation

Permutations in numpy are simply arrays of indices:

In[4]:

    permutation = np.random.permutation(10)
    original = np.array(list('abcdefghij'))
    print(permutation)
    print(original)
    print(original[permutation])

    [1 7 4 9 2 8 0 5 6 3]
    ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j']
    ['b' 'h' 'e' 'j' 'c' 'i' 'a' 'f' 'g' 'd']

The inverse permutation is computed using numpy.argsort (again!):

In[5]:

    inverse_permutation = np.argsort(permutation)
    print(original[permutation][inverse_permutation])

    ['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j']

This is true because of two facts:

First, indexing is associative (a remarkably simple and interesting fact):

a[b][c] = a[b[c]]

provided that a, b, c are 1-dimensional arrays.
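A quick numerical check of this associativity on random arrays (the seed and sizes are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array(list('abcdefghij'))
b = rng.permutation(10)
c = rng.integers(0, 10, size=6)  # c may repeat indices; that is fine

# Indexing twice is the same as indexing once with the composed index array
left = a[b][c]
right = a[b[c]]
print(np.array_equal(left, right))  # True
```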

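Second, permutation[inverse_permutation] is the identity permutation:

In[6]:

    permutation[inverse_permutation]

Out[6]:

    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Combining the two, original[permutation][inverse_permutation] = original[permutation[inverse_permutation]] = original[np.arange(10)] = original.

Even faster inverse permutation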

As shown above, numpy.argsort returns the inverse permutation, but it takes $O(n \log n)$ time, while an inverse permutation can be computed in $O(n)$.
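The $O(n)$ version rests on the defining property of the inverse. If $p$ is a permutation of $0, \dots, n-1$, its inverse $q$ satisfies

$$q[p[i]] = i, \qquad i = 0, \dots, n-1.$$

Because $p$ hits every position exactly once, a single pass that writes $i$ into $q[p[i]]$ fills every entry of $q$. With the permutation from In[4], $p[0] = 1$ gives $q[1] = 0$ and $p[6] = 0$ gives $q[0] = 6$, matching the first two entries of np.argsort(permutation), [6 0 4 9 2 7 8 1 5 3].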

This optimal approach is easy to write in numpy: assign 0, 1, ..., n-1 to the positions listed in the permutation.

In[7]:

    print(np.argsort(permutation))
    # builtin int: the old np.int alias was removed in NumPy 1.24
    inverse_permutation = np.empty(len(permutation), dtype=int)
    inverse_permutation[permutation] = np.arange(len(permutation))
    print(inverse_permutation)

    [6 0 4 9 2 7 8 1 5 3]
    [6 0 4 9 2 7 8 1 5 3]

Computing the order of elements in an array

Frequently it is important to compute the order (rank) of each value in an array.

In other words, for each element we want to find the number of elements smaller than it.

In[8]:

    data = np.random.random(10)
    print(data)
    print(np.argsort(np.argsort(data)))

    [ 0.69378073 0.47532658 0.11610735 0.89700047 0.04700985 0.24701576 0.29017543 0.27857828 0.62900998 0.08187479]
    [8 6 2 9 0 3 5 4 7 1]
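Since np.argsort(np.argsort(data)) is just the inverse of the sorting permutation, the $O(n)$ scatter trick for inverse permutations applies here too and saves one sort. The sketch below uses toy data with a tie (chosen for illustration) and also shows a literal count of strictly smaller elements via np.searchsorted, because the two approaches disagree on ties:

```python
import numpy as np

data = np.array([3.0, 1.0, 3.0, 2.0])  # toy data: the two 3.0 values are tied

# Double argsort: ranks are the inverse of the sorting permutation
order = np.argsort(data, kind='stable')
ranks_double = np.argsort(order, kind='stable')

# Same ranks with the O(n) inverse-permutation trick (one sort instead of two)
ranks_scatter = np.empty(len(data), dtype=int)
ranks_scatter[order] = np.arange(len(data))

# Literal "number of elements strictly smaller": tied values share a count
n_smaller = np.searchsorted(np.sort(data), data, side='left')

print(ranks_double)   # [2 0 3 1]
print(ranks_scatter)  # [2 0 3 1]
print(n_smaller)      # [2 0 2 1]
```

With a stable sort, tied values get distinct ranks in order of appearance; the searchsorted count gives them the same (lowest) rank instead.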

NB: there is a scipy function, scipy.stats.rankdata, which does the same; it is more general (for example, by default it gives tied values their average rank) and faster, so prefer it:

In[9]:

    from scipy.stats import rankdata
    rankdata(data) - 1  # rankdata counts from 1; shift to 0-based ranks

Out[9]:

    array([ 8., 6., 2., 9., 0., 3., 5., 4., 7., 1.])

IronTransform (distribution flattener)

Sometimes you need a monotonic transformation that turns a given distribution into a uniform one.

This is useful for comparing distributions, or for working with distributions that have heavy tails or an unusual shape.

In[10]:

    class IronTransform:
        def fit(self, data, weights):
            weights = weights / weights.sum()
            sorter = np.argsort(data)
            self.x = data
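As a rough illustration of the idea only (this is not the author's IronTransform, and the class name CdfFlattener is hypothetical), a flattener can map each value to the weighted fraction of fitted points at or below it, i.e. the empirical CDF, which is monotonic by construction:

```python
import numpy as np


class CdfFlattener:
    """Hypothetical sketch: maps values to their weighted empirical CDF."""

    def fit(self, data, weights=None):
        data = np.asarray(data, dtype=float)
        if weights is None:
            weights = np.ones_like(data)
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        sorter = np.argsort(data)
        self.x = data[sorter]
        # Cumulative share of the total weight up to each sorted value
        self.cdf = np.cumsum(weights[sorter])
        return self

    def transform(self, data):
        # How many fitted points are <= each value (binary search)
        idx = np.searchsorted(self.x, np.asarray(data, dtype=float), side='right')
        # Prepend 0 so values below the fitted minimum map to 0
        return np.concatenate([[0.0], self.cdf])[idx]


rng = np.random.default_rng(42)
heavy = rng.standard_cauchy(10000)  # heavy-tailed sample
flat = CdfFlattener().fit(heavy).transform(heavy)
# Counts per quarter of [0, 1]; they should come out roughly equal
print(np.histogram(flat, bins=4, range=(0, 1))[0])
```

This step-function version handles repeated values exactly; values above the fitted maximum map to 1. A smoother variant could interpolate between fitted points with np.interp instead.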
