Data manipulation with numpy: tips and tricks, part 1

Some non-obvious examples of what you can do with numpy are collected here.

The examples mostly come from the area of machine learning, but they will be useful to anyone doing number crunching in python.

In[1]:

from __future__ import print_function  # for python 2 & python 3 compatibility
%matplotlib inline
import numpy as np

Sorting values of one array according to the other

Say we want to order people by age and rearrange their heights in the same order.

In[2]:

ages = np.random.randint(low=30, high=60, size=10)
heights = np.random.randint(low=150, high=210, size=10)
print(ages)
print(heights)

[49 45 44 52 44 57 46 49 31 50]
[209 183 202 188 205 179 209 187 156 209]

In[3]:

sorter = np.argsort(ages)
print(ages[sorter])
print(heights[sorter])

[31 44 44 45 46 49 49 50 52 57]
[156 202 205 183 209 209 187 209 188 179]

Once you have computed the permutation, you can apply it to as many arrays as you like, and each application is fast.

People frequently solve this problem with sorted(zip(ages, heights)), which is much slower (10-20 times slower on large arrays).
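One difference between the two approaches is worth keeping in mind: sorted(zip(ages, heights)) breaks ties in age by height, while a plain np.argsort(ages) makes no promise about the order of tied entries. The sketch below shows one way to reproduce the tuple ordering with np.lexsort; the sample values are made up for illustration.

```python
import numpy as np

# Small made-up sample with a tie in age (two people aged 44).
ages = np.array([49, 45, 44, 52, 44])
heights = np.array([209, 183, 205, 188, 202])

# np.lexsort treats the LAST key as the primary one,
# so ages is sorted first and heights only breaks ties.
sorter = np.lexsort((heights, ages))
sorted_ages = ages[sorter]
sorted_heights = heights[sorter]

# Pure-Python reference: tuples compare by age, then by height.
reference = sorted(zip(ages.tolist(), heights.tolist()))

print(sorted_ages)     # [44 44 45 49 52]
print(sorted_heights)  # [202 205 183 209 188]
```

If the relative order of equal ages does not matter, np.argsort(ages) alone is enough.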

Computing inverse of permutation

Permutations in numpy are simply arrays:

In[4]:

permutation = np.random.permutation(10)
original = np.array(list('abcdefghij'))
print(permutation)
print(original)
print(original[permutation])

[1 7 4 9 2 8 0 5 6 3]
['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j']
['b' 'h' 'e' 'j' 'c' 'i' 'a' 'f' 'g' 'd']

The inverse permutation is computed using numpy.argsort (again!):

In[5]:

inverse_permutation = np.argsort(permutation)
print(original[permutation][inverse_permutation])

['a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j']

This is true because of two facts:

The indexing operation is associative (a dramatically simple and interesting fact):

a[b][c] = a[b[c]]

provided that a, b, c are 1-dimensional arrays (b and c being integer index arrays).

permutation[inverse_permutation] is the identity permutation:

In[6]:

permutation[inverse_permutation]

Out[6]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Even faster inverse permutation

As noted above, numpy.argsort applied to a permutation returns its inverse, but it takes $O(n \log(n))$ time, while computing an inverse permutation should take only $O(n)$.
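The linear bound follows from the fact that inversion is a single scatter pass: if permutation[i] == j, then the inverse must hold i at position j. Here is a minimal sketch of that idea as an explicit loop. The function name invert_permutation is hypothetical, and the loop is written for clarity rather than speed.

```python
import numpy as np

def invert_permutation(permutation):
    """Return the inverse of a permutation of 0..n-1 in one O(n) pass."""
    inverse = np.empty(len(permutation), dtype=int)
    for i, j in enumerate(permutation):
        # The permutation holds value j at position i,
        # so the inverse must hold value i at position j.
        inverse[j] = i
    return inverse

permutation = np.array([1, 7, 4, 9, 2, 8, 0, 5, 6, 3])
inverse = invert_permutation(permutation)
print(inverse)  # [6 0 4 9 2 7 8 1 5 3]
```

Each position is written exactly once, so no sorting is involved.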

This optimal approach can be written in numpy:

In[7]:

print(np.argsort(permutation))
inverse_permutation = np.empty(len(permutation), dtype=int)  # plain int: the np.int alias was removed in NumPy 1.24
inverse_permutation[permutation] = np.arange(len(permutation))
print(inverse_permutation)

[6 0 4 9 2 7 8 1 5 3]
[6 0 4 9 2 7 8 1 5 3]

Computing order of elements in array

Frequently it is important to compute the order (rank) of each value in an array.

In other words, for each element of the array we want to find the number of elements smaller than it.

In[8]:

data = np.random.random(10)
print(data)
print(np.argsort(np.argsort(data)))

[ 0.69378073 0.47532658 0.11610735 0.89700047 0.04700985 0.24701576 0.29017543 0.27857828 0.62900998 0.08187479]
[8 6 2 9 0 3 5 4 7 1]
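The double argsort can be checked directly against the definition. The sketch below counts, for each element, how many elements are strictly smaller, using a quadratic broadcast comparison that is only suitable for small arrays. The sample values are arbitrary and assumed to be distinct.

```python
import numpy as np

data = np.array([0.69, 0.47, 0.11, 0.89, 0.04])

# Definition: order[i] = number of elements strictly smaller than data[i].
# Entry [i, j] of the comparison matrix is data[j] < data[i]; it needs O(n^2)
# memory, so this is a sanity check rather than a practical method.
order_by_definition = (data[None, :] < data[:, None]).sum(axis=1)

# The inner argsort gives the sorting permutation; the outer argsort inverts it.
order_by_argsort = np.argsort(np.argsort(data))

print(order_by_definition)  # [3 2 1 4 0]
print(order_by_argsort)     # [3 2 1 4 0]
```

With repeated values the two results can differ: the definition assigns tied elements the same number, while the double argsort gives them distinct consecutive positions.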

NB: there is a scipy function which does the same, but it's more general and faster, so prefer using it:

In[9]:

from scipy.stats import rankdata
rankdata(data) - 1

Out[9]:

array([ 8., 6., 2., 9., 0., 3., 5., 4., 7., 1.])

IronTransform (flattener of distribution)

Sometimes you need to write a monotonic transformation that turns a given distribution into a uniform one.

This method is useful for comparing distributions or for working with distributions that have heavy tails or a strange shape.

In[10]:

class IronTransform:
    def fit(self, data, weights):
        weights = weights / weights.sum()
        sorter = np.argsort(data)
        self.x = data
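As a rough illustration of the flattening idea described in this section, here is a minimal sketch that maps values through their weighted empirical CDF. It is not the author's IronTransform: the class name UniformFlattener, the optional weights argument, and the use of np.interp are all assumptions made for this example.

```python
import numpy as np

class UniformFlattener:
    """Hypothetical sketch: map data to (0, 1] through its weighted empirical CDF."""

    def fit(self, data, weights=None):
        data = np.asarray(data, dtype=float)
        if weights is None:
            # Assumption: unweighted data means equal weights.
            weights = np.ones(len(data))
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        sorter = np.argsort(data)
        self.x = data[sorter]
        # Cumulative weight up to and including each sorted point.
        self.y = np.cumsum(weights[sorter])
        return self

    def transform(self, data):
        # Piecewise-linear interpolation of the empirical CDF; np.interp clamps
        # values outside the fitted range to the first and last CDF values.
        return np.interp(data, self.x, self.y)

rng = np.random.default_rng(0)
sample = rng.exponential(size=10000)  # a skewed, long-tailed sample
flat = UniformFlattener().fit(sample).transform(sample)
counts = np.histogram(flat, bins=5, range=(0, 1))[0]
print(counts)  # roughly equal counts per bin
```

Because the mapping is a non-decreasing function of the input, the ordering of the values is preserved while the histogram becomes approximately flat.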
