Data manipulation with numpy: tips and tricks, part 2

January 7, 2017, 5:23 am

More examples of fast data manipulation with numpy.

The first part may be found here.

In[1]:

from __future__ import print_function
import numpy as np

Rolling window, strided tricks

When working with time series or images, you frequently need to apply some operation over sliding windows.

Simplest case: taking the mean over a running window:

In[2]:

sequence = np.random.normal(size=10000) + np.arange(10000)

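A very bad idea is to do this with a pure Python loop:

In[3]:

def running_average_simple(seq, window=100):
    # Slow: one Python-level iteration and one np.mean call per window.
    # Note: this allocates len(seq) - window results, one fewer than the
    # len(seq) - window + 1 full windows returned by the versions below.
    result = np.zeros(len(seq) - window)
    for i in range(len(result)):
        result[i] = np.mean(seq[i:i + window])
    return result

running_average_simple(sequence)

Out[3]: array([ 49.43051858, 50.42845047, 51.43946518, ..., 9946.35091814, 9947.34962938, 9948.35901262])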

A somewhat better option is to use as_strided:

In[4]:

from numpy.lib.stride_tricks import as_strided

def running_average_strides(seq, window=100):
    stride = seq.strides[0]  # byte step between consecutive elements
    # 2D view of the same buffer: row i is seq[i:i + window]
    sequence_strides = as_strided(seq, shape=[len(seq) - window + 1, window], strides=[stride, stride])
    return sequence_strides.mean(axis=1)

In[5]:

running_average_strides(sequence)

Out[5]: array([ 49.43051858, 50.42845047, 51.43946518, ..., 9947.34962938, 9948.35901262, 9949.35015443])

Computationally, as_strided does nothing: no copies and no computations. It only creates a new view of the data, which is two-dimensional this time.
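To make the "view, not copy" point concrete, here is a minimal sketch on a toy array. The array contents and the window size are made up for illustration; `writeable=False` is an optional safety flag added here, not something the original code uses.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

seq = np.arange(6, dtype=np.float64)
window = 3
stride = seq.strides[0]  # bytes between consecutive elements (8 for float64)

# Overlapping rows share the same underlying buffer as seq.
# writeable=False guards against accidental writes through the view.
windows = as_strided(seq, shape=(len(seq) - window + 1, window),
                     strides=(stride, stride), writeable=False)

print(windows.shape)                    # (4, 3)
print(np.shares_memory(seq, windows))   # True
print(windows[1])                       # [1. 2. 3.]
```

Note that as_strided performs no bounds checking, so a wrong shape or stride can read memory outside the array. In NumPy 1.20 and later, numpy.lib.stride_tricks.sliding_window_view builds the same kind of view with the checks included.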

However, the right way to compute the mean over a rolling window is to use numpy.cumsum:

(this approach is unbeatable in speed unless the window is small)

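In[6]:

def running_average_cumsum(seq, window=100):
    # s[k] holds the sum of the first k elements, with s[0] = 0
    s = np.insert(np.cumsum(seq), 0, [0])
    # sum over seq[i:i + window] equals s[i + window] - s[i]
    return (s[window:] - s[:-window]) * (1. / window)

In[7]: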

running_average_cumsum(sequence)

Out[7]: array([ 49.43051858, 50.42845047, 51.43946518, ..., 9947.34962938, 9948.35901262, 9949.35015443])
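The cumsum trick is easier to follow on a tiny input. The sketch below uses a made-up five-element array and a window of 2, and spells out the prefix sums that the subtraction relies on.

```python
import numpy as np

seq = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
window = 2

# Prefix sums with a leading zero: s[k] is the sum of the first k elements.
s = np.concatenate(([0.0], np.cumsum(seq)))   # [0, 1, 3, 6, 10, 15]

# The sum over seq[i:i + window] equals s[i + window] - s[i].
means = (s[window:] - s[:-window]) / window
print(means)   # [1.5 2.5 3.5 4.5]

# Cross-check against the direct definition.
direct = np.array([seq[i:i + window].mean() for i in range(len(seq) - window + 1)])
print(np.allclose(means, direct))   # True
```

One caveat: the prefix sums grow with the length of the sequence, so for very long series with large values the subtraction can lose some floating-point precision compared with averaging each window directly.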

See also for this purpose:

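scipy.ndimage.uniform_filter1d, or numpy.convolve with a uniform kernel
pandas.Series.rolling(window).mean() (pandas.rolling_mean in older pandas versions) and similar functions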

Remark: numpy.cumsum is equivalent to numpy.add.accumulate, but there are also:

numpy.maximum.accumulate, numpy.minimum.accumulate - running max and min
numpy.multiply.accumulate, which is equivalent to numpy.cumprod
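A short illustration of these accumulate variants, on an arbitrary example array:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Running maximum and minimum over the prefix x[:k + 1]
print(np.maximum.accumulate(x))   # [3 3 4 4 5 9 9 9]
print(np.minimum.accumulate(x))   # [3 1 1 1 1 1 1 1]

# add.accumulate and multiply.accumulate match cumsum and cumprod
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))         # True
print(np.array_equal(np.multiply.accumulate(x), np.cumprod(x)))   # True
```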

Remark: for computing a rolling mean, numpy.cumsum is best; for other window statistics such as min, max or percentiles, use the strides trick.
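As a rough sketch of that second case, the snippet below computes a rolling max and a rolling median on a toy array. It assumes NumPy 1.20 or later, where sliding_window_view is available as a bounds-checked wrapper around the same strides trick; with older versions the as_strided call shown earlier produces an equivalent view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

seq = np.array([2.0, 7.0, 1.0, 8.0, 2.0, 8.0])
window = 3

# Read-only 2D view: row i is seq[i:i + window]
windows = sliding_window_view(seq, window)

print(windows.shape)                 # (4, 3)
print(windows.max(axis=1))           # [7. 8. 8. 8.]
print(np.median(windows, axis=1))    # [2. 7. 2. 8.]
```

The view itself costs no extra memory, but each reduction still touches every window element, so the work grows with the sequence length times the window size.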

Strides and training on sequences

ML algorithms in Python usually take numpy arrays as input. When working with sequences, you often need to pass the same data many times as part of different chunks.

Example: you have exchange rates for a year, and you want GBDT to predict the next exchange rate based on the previous 10.

In[8]:

window = 10
rates = np.random.normal(size=1000)
# target in training
y = rates[window:]

Typically the solution used is:

In[9]:

X1 = np.zeros([len(rates) - window, window])
for day in range(len(X1)):
    X1[day, :] = rates[day:day + window]

But strided tricks are a better way, since they need no additional space:

In[10]:

stride, = rates.strides
X2 = as_strided(
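The listing above breaks off mid-call. As a hedged sketch only (this is not the author's code), a strided view with the same shape and contents as X1 could be built as follows; the shape and strides arguments are assumptions chosen to match the loop version, and `writeable=False` is an added safety flag.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

window = 10
rates = np.random.normal(size=1000)
y = rates[window:]            # target: the rate following each window

# Loop-based reference, as in the text
X1 = np.zeros([len(rates) - window, window])
for day in range(len(X1)):
    X1[day, :] = rates[day:day + window]

# Assumed completion: row i is a view of rates[i:i + window]
stride, = rates.strides
X2 = as_strided(rates, shape=(len(rates) - window, window),
                strides=(stride, stride), writeable=False)

print(np.array_equal(X1, X2))          # True
print(X2.shape, y.shape)               # (990, 10) (990,)
print(np.shares_memory(rates, X2))     # True
```

Here X2 occupies no memory beyond rates itself, whereas X1 stores every value up to `window` times.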
