
Why Is Python the First Choice for Artificial Intelligence (AI), and How Do You Switch Careers to Python?

1. Why Python is the first choice for artificial intelligence (AI)

Read on and you will see why. Look at Google's TensorFlow: essentially all of the code is C++ and Python, while other languages typically account for only a few thousand lines. Where execution speed matters, C++ is used; where development efficiency matters, Python is used. Who would choose a neither-here-nor-there language like Java for AI work?

Although Python is a scripting language, it is easy to learn and quickly became a tool for scientists (MATLAB can also do scientific computing, but it costs money, and a lot of it), so it accumulated a large collection of libraries and frameworks. AI involves heavy data computation, so Python is a natural fit: simple and efficient.

Python has a great many excellent deep learning libraries, and most deep learning frameworks now support Python. If not Python, then what? Life is short, so use Python.

2. The current state and trajectory of Python

Python is undeniably hot right now; that is no longer up for debate. Three years ago, when MATLAB, Scala, R, Java and others each still had a chance, the picture was unclear. Three years on, the trend is obvious, especially after Facebook open-sourced PyTorch: Python's position as the flagship language of the AI era is essentially settled, and the only remaining question is who will hold second place.

Python is already the number one language for data analysis and AI, the number one language for offensive and defensive security work, and it is becoming the number one language for introductory programming education and for cloud computing system administration.

Python has also long been one of the mainstream languages for web development, game scripting, computer vision, IoT management and robotics, and given the expected growth in Python users it has a chance to reach the top in several more fields.

3. Python and artificial intelligence

If you had to pick the biggest change and innovation in technology, it would be hard not to mention the phrase "artificial intelligence". AI has spawned a wave of new technologies, companies and business models, providing new engines of economic growth for individuals, companies, countries and the world at large. From giants such as Google, Apple and Baidu down to startups of every kind, AI has become a phenomenon-level opportunity. In just a few years, automatic photo categorization and face recognition have become commonplace features, natural language is widely used as an interface by voice assistants, driverless cars are advancing rapidly, AlphaGo has beaten a Go champion, and bionic robots keep iterating. Over the next few decades, urban transportation and the way people live will be transformed by AI.

Python is the preferred programming language for AI as that era arrives. Development in Python is extremely efficient, and Python has very powerful third-party libraries: for almost any functionality you want a computer to provide, there is a module in the Python ecosystem that supports it. You can download it, call it, and build on top of it, which greatly shortens the development cycle and avoids reinventing the wheel. Add Python's portability, extensibility, embeddability and the ability to do a great deal with a small amount of code, and that is why AI work favors Python.

4. Programmers moving into AI

The advantage a programmer has in moving into AI today is an existing technical foundation. If you do not dare switch to an entirely different industry, why not press forward and continue down the IT path? For students who have not yet graduated, or who have just graduated, the timing could not be better: young people absorb and understand new things quickly and learn faster, and being both young and interested is the best possible combination. China's AI industry is in a period of innovation and growth, and demand for talent is rising sharply along with it. Grasping this is like grasping the "business opportunity" of the era, and the future rewards will be very generous.

Once you have made the decision, act on it; time waits for no one.

In the near future, the age of pervasive intelligence will thoroughly enter our lives. If you are interested in joining this frontier industry, you can follow 多智时代 (Duozhi Shidai) for timely news and fundamentals on artificial intelligence, big data, cloud computing and the Internet of Things, and together we can help shape the future of AI.


An introduction to pip, Python's package management tool

Install a Python package: pip install $package_name
List installed packages: pip list
Show information about a package: pip show $package_name
List all the files a package contains (along with its information): pip show -f $package_name
Uninstall a package: pip uninstall $package_name
Search for a package: pip search $package_name
Show help: pip -h, or pip sub-cmd -h
Enable command-line completion: . <(pip completion --bash)
List installed packages with their versions: pip freeze
Check the integrity of installed packages: pip check

Done

Related problem:

After running pip install aliyuncli, the aliyuncli command was still not available. First, find out where aliyuncli was installed:
# pip show -f aliyuncli
Name: aliyuncli
Version: 2.1.9
Summary: Universal Command Line Environment for aliyun
Home-page: http://docs.aliyun.com/?spm=5176.1829009.1002.1.LxlLfS#/pub/aliyun-command-line-interface
Author: aliyun-developers-efficiency
Author-email: aliyun-developers-efficiency@list.alibaba-inc.com
License: UNKNOWN
Location: /home/phpor/.local/lib/python2.7/site-packages
Requires: colorama, jmespath

Then check whether the package actually contains the aliyuncli command (it certainly does):

$ pip show -f aliyuncli
Name: aliyuncli
Version: 2.1.9
Summary: Universal Command Line Environment for aliyun
Home-page: http://docs.aliyun.com/?spm=5176.1829009.1002.1.LxlLfS#/pub/aliyun-command-line-interface
Author: aliyun-developers-efficiency
Author-email: aliyun-developers-efficiency@list.alibaba-inc.com
License: UNKNOWN
Location: /home/phpor/.local/lib/python2.7/site-packages
Requires: colorama, jmespath
Files:
  ../../../bin/aliyun_completer
  ../../../bin/aliyun_zsh_complete.sh
  ../../../bin/aliyuncli
  ...

From this we can tell that the aliyuncli commands were installed under /home/phpor/.local/bin. That directory is evidently not on $PATH; adding it to $PATH solves the problem.

Done

Finding bad colored pixels between boundaries


In an image I have a large number of cells of various colors separated by black boundaries. However, the boundaries were not drawn perfectly, and now some cells have a handful of pixels of the wrong color (every cell should contain only 1 color).

In the following image, I have encircled the pixels that are the wrong color. The blue pixels encircled in the top-left should be grey, and the grey pixels encircled in the other three spots should be blue.


[image: the encircled wrong-coloured pixels]

Question: How do I find the wrong colored pixels in order to replace them with the right color?

Currently I am using Python and NumPy to load the images into an array, and a double for-loop goes row by row and column by column checking every pixel.

My current method is, for every pixel, to check the pixels that directly border it (row +1, row -1, column +1 and column -1). If one of these is a different non-black color, I check that pixel's bordering pixels, and if their color is different from the original pixel, then I change the color of the original pixel.

However, it doesn't work correctly when there are more than one incorrect pixel next to each other, leading to the following image:


[image: the result produced by the current script]

Below is the script I use. I am looking for either a way to improve it, or a different algorithm altogether. The image required by the code is right below it. I have already matched its name in the code to the name stackoverflow gave it.

import Image
import numpy as np

BLACK = (0,0,0)

im = Image.open("3gOg0.png").convert('RGB')
im.load()
im_array = np.asarray(im, dtype="int32")
(height, width, dim) = im_array.shape
newim_array = np.array(im_array)

for row in range(height):
    for col in range(width):
        rgb = tuple(im_array[row,col])
        if rgb == BLACK:
            continue
        n = tuple(im_array[row-1,col])
        s = tuple(im_array[row+1,col])
        e = tuple(im_array[row,col+1])
        w = tuple(im_array[row,col-1])
        if n != BLACK and n != rgb:
            nn = tuple(im_array[row-2,col])
            ne = tuple(im_array[row-1,col+1])
            nw = tuple(im_array[row-1,col-1])
            if (nn != BLACK and nn != rgb) or (nw != BLACK and nw != rgb) or (ne != BLACK and ne != rgb):
                newim_array[row,col] = n
            continue
        if s != BLACK and s != rgb:
            ss = tuple(im_array[row+2,col])
            se = tuple(im_array[row+1,col+1])
            sw = tuple(im_array[row+1,col-1])
            if (ss != BLACK and ss != rgb) or (sw != BLACK and sw != rgb) or (se != BLACK and se != rgb):
                newim_array[row,col] = s
            continue
        if e != BLACK and e != rgb:
            ee = tuple(im_array[row,col+2])
            ne = tuple(im_array[row-1,col+1])
            se = tuple(im_array[row+1,col+1])
            if (ee != BLACK and ee != rgb) or (se != BLACK and se != rgb) or (ne != BLACK and ne != rgb):
                newim_array[row,col] = e
            continue
        if w != BLACK and w != rgb:
            ww = tuple(im_array[row,col-2])
            nw = tuple(im_array[row-1,col-1])
            sw = tuple(im_array[row+1,col-1])
            if (ww != BLACK and ww != rgb) or (nw != BLACK and nw != rgb) or (sw != BLACK and sw != rgb):
                newim_array[row,col] = w

im2 = Image.fromarray(np.uint8(newim_array))
im2.save("fix.png")

This is the example image in correct non-zoomed size:


[example input image]

Sounds like you have 2 issues:

What are the regions? What color should each be?

To find the regions, and fill each with what is the most common color within it currently:

For each non-black pixel not visited yet:
    Start a new region; initialize a counter for each color
    Recursively:
        Mark the pixel as in-region
        Increment the counter for that color
        Visit each of the adjacent pixels that are not black nor in-region
    When done, color all of the in-region pixels to the color with the highest count, and mark them as visited
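This is not part of the original answer, but a minimal Python sketch of that region-filling idea (using an explicit stack instead of recursion, and reusing the question's file name) could look like this:

from collections import Counter
import numpy as np
from PIL import Image

BLACK = (0, 0, 0)

im = Image.open("3gOg0.png").convert("RGB")
im_array = np.array(im, dtype="int32")
height, width, _ = im_array.shape
out = im_array.copy()
visited = np.zeros((height, width), dtype=bool)

def neighbors(r, c):
    # 4-connected neighbours that stay inside the image
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < height and 0 <= nc < width:
            yield nr, nc

for row in range(height):
    for col in range(width):
        if visited[row, col] or tuple(im_array[row, col]) == BLACK:
            continue
        # flood-fill one region, counting how often each colour occurs
        stack, region, counts = [(row, col)], [], Counter()
        visited[row, col] = True
        while stack:
            r, c = stack.pop()
            region.append((r, c))
            counts[tuple(im_array[r, c])] += 1
            for nr, nc in neighbors(r, c):
                if not visited[nr, nc] and tuple(im_array[nr, nc]) != BLACK:
                    visited[nr, nc] = True
                    stack.append((nr, nc))
        # repaint the whole region with its most common colour
        majority = counts.most_common(1)[0][0]
        for r, c in region:
            out[r, c] = majority

Image.fromarray(np.uint8(out)).save("fix.png")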

Learning Python 3 the Easy Way: Lists



Preface: It has been a while since I last wrote a blog post. I have recently become hooked on Python. From what I have seen, Python is easy to learn, and for anyone with Java experience the barrier to entry is very low. In the spirit of learning by recording, writing things down as I go helps me organize what I have learned and review it later, so this is the first post of my Python learning journey, purely as a record.

I will skip the basic syntax definitions and start from data structures. The most important thing a program does is manipulate data, so to learn a programming language you undoubtedly have to master its data structures. Let's start with lists.

The list is one of the most frequently used data structures in Python programming. It is an ordered collection of elements, written with square brackets [] and comma-separated elements, similar to an array in Java. Lists are created to store data and are dynamic, so you can perform CRUD operations on them at any time. Because a list usually holds multiple elements, it is conventionally given a plural name such as names or letters.

Basic format:

fruits = ["apple","bananer","oranger"]
print(fruits)

Accessing list elements

As in most programming languages, list elements in Python are accessed by index. The first element has index 0, and the index of the last element is the total number of elements minus one. Python also offers a special syntax: index -1 refers to the last element, -2 to the second to last, and so on. Once you have retrieved an element you can use it for any operation.

print(fruits[0])   # first element: 'apple'
print(fruits[-1])  # last element: 'oranger'

Modifying list elements

Use the index to get the element at a position and simply assign a new value to it.

fruits[0] = "watermelon"  # replace the first element
print(fruits)             # print the list again

The list is now: ['watermelon', 'bananer', 'oranger']

Adding elements

Call append() to add an element at the end of the list:

fruits.append("Plum")
print(fruits)

The list is now: ['apple', 'bananer', 'oranger', 'Plum']

Call insert() to insert an element at a given index:

fruits.insert(1,"pear")  # insert at index 1
print(fruits)

The list is now: ['apple', 'pear', 'bananer', 'oranger']

Removing elements

If you know the index, use del directly:

del fruits[0]  # delete the first element
print(fruits)

The list is now: ['pear', 'bananer', 'oranger']

Call pop() to pop an element off the list and return it. Without an argument it pops the last element; with an index argument it pops the element at that index.

print(fruits.pop())   # pop the last element and print it
print(fruits)
print(fruits.pop(0))  # pop the first element and print it
print(fruits)

Note that the output at this point is:

oranger
['pear', 'bananer']
pear
['bananer']

If you do not know the element's index but do know its value, you can call remove() to delete it. Note that after removal you can still keep using that value.

fruits = ["apple","bananer","oranger","prea"]
print(fruits)
delete = "bananer"     # value to remove
fruits.remove(delete)  # remove it from the list
print(fruits)
print(delete)          # the removed value is still usable

The result is:

['apple', 'bananer', 'oranger', 'prea']
['apple', 'oranger', 'prea']
bananer

Sorting a list

Call sort() to sort the elements; the default is natural (ascending) order. To sort in reverse, pass reverse=True. After sorting, the order of the list is changed permanently.

fruits = ["bananer","apple","oranger","prea"]
print(fruits)
fruits.sort()
print(fruits)

The result is:

['bananer', 'apple', 'oranger', 'prea']
['apple', 'bananer', 'oranger', 'prea']

If you only want to change the order temporarily, use the sorted() function:

fruits = ["bananer","apple","oranger","prea"]
print(fruits)
print(sorted(fruits))
print(fruits)

The result is:

['bananer', 'apple', 'oranger', 'prea']
['apple', 'bananer', 'oranger', 'prea']
['bananer', 'apple', 'oranger', 'prea']

As you can see, the order of the original list is unchanged.

Use reverse() to reverse the order of the elements:

fruits = ["bananer","apple","oranger","prea"]
print(fruits)
fruits.reverse()  # reverse the list in place
print(fruits)

The result is:

['bananer', 'apple', 'oranger', 'prea']
['prea', 'oranger', 'apple', 'bananer']

Use len() to get the length of a list:

fruits = ["bananer","apple","oranger","prea"]
print(len(fruits))

The result is obviously 4.

Looping over a list with for

Much like in Java, the format is for xxx in list_name:. Once the loop gives you each element, you can do whatever you like with it.

fruits = ["bananer","apple","oranger","prea"]
for fruit in fruits:
    print(fruit)

The result is each element printed in turn:

bananer
apple
oranger
prea

Note: Python does not use {} for code blocks; a block is marked by a four-space indent. This applies to for loops, if statements, while loops and function bodies alike, so pay close attention to indentation when writing code.

Building numeric lists quickly

range() takes a start and an end value and produces a sequence of numbers; wrapping it in list() quickly builds a numeric list over any range.

numbers = list(range(1,6))  # build the list
print(numbers)

Result: [1, 2, 3, 4, 5]

You could achieve the same thing with a loop, but this way is more convenient.

List comprehensions

With [expression for variable in range(x,x) if xxx] you can generate a numeric list in a single statement; the expression operates on each value produced by the loop, and an optional if adds a condition.

numbers = [x * x for x in range(1,6)]  # list of squares
print(numbers)

The result is: [1, 4, 9, 16, 25]

This form is remarkably concise: what used to take several lines now takes one.

Slicing to get a sub-list

Use list_name[x:y] to slice out the elements in the corresponding index range. If the start value x is omitted, slicing starts at index 0; if the end value y is omitted, slicing runs to the end of the list.

fruits = ["bananer","apple","oranger","prea"]
print(fruits[0:2])

The result is: ['bananer', 'apple']

That about covers the basic list operations. While we are at it, a quick note on tuples: a tuple is written with parentheses, like (xxx, yyy, zzz), and unlike a list it cannot be modified after creation; a short sketch follows below.
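This is not from the original post, just a tiny illustrative example of a tuple for comparison:

fruits_tuple = ("apple", "bananer", "oranger")
print(fruits_tuple[0])        # indexing works exactly like a list: 'apple'
# fruits_tuple[0] = "pear"    # would raise a TypeError, tuples cannot be modified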

This wraps up the first post of this beginner-friendly Python 3 series. If anything is off, please point it out in the comments, thanks!

Reference: Liao Xuefeng's tutorial: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000

More original articles are published first on my WeChat public account; scan the QR code to follow 张少林同学 (Zhang Shaolin).



I wrote a free book about TDD and clean architecture in Python


Hey HN,

I just published on Leanpub a free book, "Clean Architectures in Python". It's a humble attempt to organise and expand some posts I published on my blog over the last few years.

You can find it here: https://leanpub.com/clean-architectures-in-python

The main content is divided into two parts; this is a brief overview of the table of contents:

* Part 1 - Tools
  - Chapter 1 - Introduction to TDD
  - Chapter 2 - On unit testing
  - Chapter 3 - Mocks

* Part 2 - The clean architecture
  - Chapter 1 - Components of a clean architecture
  - Chapter 2 - A basic example
  - Chapter 3 - Error management
  - Chapter 4 - Database repositories

Some highlights:

- The book is written with beginners in mind

- It contains 3 full projects: two small ones to introduce TDD and mocks, and a bigger one to describe the clean architecture approach

- Each project is explained step-by-step, and each step is linked to a tag in a companion repository on GitHub

The book is free, but if you want to contribute I will definitely appreciate the help. My target is to encourage the discussion about software architectures, both in the Python community and outside it.

I hope you will enjoy the book! Please spread the news on your favourite social network

Text Summarization with the TextRank Algorithm (with Python Code)


TextRank is a graph-based ranking algorithm for text: it splits a text into units (sentences), builds a graph whose nodes are sentences connected by edges weighted with inter-sentence similarity, iteratively computes a TextRank score for each sentence, and finally extracts the top-ranked sentences to form a summary. This article introduces TextRank, an extractive summarization algorithm, and uses Python to apply it to multiple single-domain documents, extracting sentences to build a summary.

Introduction

Text summarization is one of those applications of natural language processing (NLP) that is bound to have a huge impact on our lives. With the growth of digital media and the publishing industry, who has time to go through entire articles, documents or books to decide whether they are useful? Happily, the technology is already here.

Have you used the mobile app inshorts? It is an innovative news app that condenses news articles into 60-word summaries, and that is exactly what we will learn to do in this article: automatic text summarization.

Automatic text summarization is one of the most challenging and interesting problems in NLP. It is the process of generating a concise and meaningful summary from multiple text resources such as books, news articles, blog posts, research papers, emails and tweets.

Demand for automatic text summarization systems has surged thanks to the availability of large amounts of text data.

In this article we will explore the field of text summarization, understand how the TextRank algorithm works, and implement it in Python. Hop on, it is going to be a fun ride!

Contents
1. Approaches to text summarization
2. An introduction to the TextRank algorithm
3. Problem background
4. Implementing TextRank
5. What's next?

1. Approaches to text summarization

Automatic text summarization has attracted attention since as early as the 1950s. In the late 1950s, Hans Peter Luhn published a research paper titled "The automatic creation of literature abstracts", which used features such as word frequency and phrase frequency to extract important sentences from a text for summarization purposes.

Reference:

http://courses.ischool.berkeley.edu/i256/f06/papers/luhn58.pdf

Another important piece of research was done by Harold P. Edmundson in the late 1960s. He used methods such as the presence of cue words (words from the title appearing in the text) and the position of sentences to extract significant sentences for the summary. Since then, many important and exciting studies have been published to tackle the challenge of automatic text summarization.

Reference:

http://courses.ischool.berkeley.edu/i256/f06/papers/luhn58.pdf

Text summarization can broadly be divided into two categories: extractive summarization and abstractive summarization.

Extractive summarization: this approach relies on extracting several parts of the text, such as phrases and sentences, and stacking them together to create a summary. Identifying the right sentences for the summary is therefore of utmost importance in an extractive method.

Abstractive summarization: this approach applies advanced NLP techniques to generate an entirely new summary; parts of the summary may not even appear in the original text.

In this article, we will focus on the extractive approach.

2. An introduction to the TextRank algorithm

Before getting started with TextRank, we should become familiar with another algorithm: PageRank, which in fact inspired TextRank. PageRank is used primarily for ranking web pages in online search results. Let's quickly understand the basics of this algorithm with the help of an example.

A quick introduction to PageRank:

[Figure 1: The PageRank algorithm]

Suppose we have four web pages: w1, w2, w3 and w4. These pages contain links pointing to one another. Some pages might have no links at all; these are called dangling pages.



w1 has links to w2 and w4
w2 has links to w3 and w1
w4 has a link only to w1
w3 has no outbound links and is therefore a dangling page

To rank these pages we have to compute a score called PageRank, which is the probability that a user visits the page.

To capture the probability of a user jumping from one page to another, we create an n-by-n square matrix M, where n is the number of web pages.

Each element of this matrix is the probability of moving from one page to another. For example, the highlighted cell below holds the probability of jumping from w1 to w2.


The probability initialization steps are as follows:

1. The probability of going from page i to page j, i.e. M[i][j], is initialized to 1 / (number of outbound links of page i).
2. If there is no link from page i to page j, M[i][j] is initialized to 0.
3. If a page is a dangling page, assume it links to every other page with equal probability, so M[i][j] is initialized to 1 / (total number of pages).

In this example, the matrix M is therefore initialized as follows:

[Figure: the initialized matrix M]

Finally, the values in this matrix are updated iteratively to arrive at the page rankings.

The TextRank algorithm

Now that we have a grip on PageRank, let's understand the TextRank algorithm. The two algorithms are similar in the following ways:

Sentences are used in place of web pages
The similarity between any two sentences plays the role of the page transition probability
The similarity scores are stored in a square matrix, analogous to PageRank's matrix M

TextRank is an extractive, unsupervised text summarization technique. Let's look at the workflow we will follow:


[Figure: the TextRank workflow]

1. The first step is to concatenate all the articles into a single body of text.
2. Next, split the text into individual sentences.
3. Then find a vector representation (word embeddings) for each sentence.
4. Compute the similarities between sentence vectors and store them in a matrix.
5. Convert the similarity matrix into a graph, with sentences as nodes and similarity scores as edges, for the sentence TextRank calculation.
6. Finally, a certain number of the top-ranked sentences form the final summary.

Let's fire up a Jupyter Notebook and start coding!

Note: if you want to learn more about graph theory, I recommend this article:

https://www.analyticsvidhya.com/blog/2018/09/introduction-graph-theory-applications-python/

3. Problem background

As a tennis fan, I always try to keep up with the sport by reading as many tennis news articles as possible. That has turned out to be a rather difficult job! It wastes far too much time and effort.

So I decided to design a system that scans multiple articles and gives me a single, consolidated summary of the key points. How do we go about that? That is what I will show you in this tutorial. We will apply the TextRank algorithm to a dataset of scraped articles to create a nice, concise summary.

Please note: this is a single-domain, multi-document summarization task; that is, we take multiple articles as input and generate a single key-point summary. Multi-domain text summarization is not covered in this article, but you can try it on your own.

Dataset download link:

https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/10/tennis_articles_v4.csv

4. Implementing TextRank

So, without further ado, open your Jupyter Notebook and let's implement what we have learned so far!

1. Import the required libraries

First, import the libraries we need for this problem.

2. Read the data

Now let's read the data; the dataset download link was provided above.


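The code in the original post appears as screenshots; a minimal sketch of steps 1 and 2, assuming the CSV file name from the download link above, might look like this:

import numpy as np
import pandas as pd
import nltk
import re

nltk.download('punkt')   # models used by the sentence tokenizer

df = pd.read_csv("tennis_articles_v4.csv")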
3. Inspect the data

Let's take a quick look at the data.

The dataset has three columns: 'article_id', 'article_text' and 'source'. We are most interested in the 'article_text' column, since it contains the text of the articles. Let's print the values of a few of the variables in this column to see what they look like.

Output:


[output shown as screenshots in the original post]

We now have two options: summarize each individual article, or generate a single summary of all the articles. For our purpose, we will go with the latter.

4. Split the text into sentences

The next step is to split the text of the articles into individual sentences. We will use the sent_tokenize() function of the nltk library to do this.

Let's print a few elements of the list of sentences.


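A sketch of the sentence-splitting step described above (the variable names are illustrative, not the article's exact code):

from nltk.tokenize import sent_tokenize

sentences = []
for s in df['article_text']:
    sentences.append(sent_tokenize(s))
sentences = [y for x in sentences for y in x]   # flatten into a single list

sentences[:5]   # print a few elements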

Output:


[output shown as a screenshot in the original post]
5. Download GloVe word embeddings

GloVe word embeddings are vector representations of words. These word vectors will be used to create feature vectors for the sentences. We could also have used the Bag-of-Words or TF-IDF approaches to create features for the sentences, but these methods ignore word order, and the number of features is usually very large.

We will use the pre-trained Wikipedia 2014 + Gigaword 5 GloVe vectors (link below); the file is 822 MB.

GloVe word embeddings download link:

https://nlp.stanford.edu/data/glove.6B.zip



Let's extract the word vectors:


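A sketch of the extraction step, assuming the 100-dimensional file glove.6B.100d.txt from the zip above has been unpacked into the working directory:

word_embeddings = {}
f = open('glove.6B.100d.txt', encoding='utf-8')
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    word_embeddings[word] = coefs
f.close()

len(word_embeddings)   # roughly 400,000 entries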

We now have word vectors for 400,000 different terms stored in a dictionary.

6. Text preprocessing

It is always good practice to make your text data as noise-free as possible, so let's do some basic text cleaning: removing punctuation, numbers and special characters, and lowercasing everything.

Next we remove the stopwords (commonly used words of a language such as is, am, of, in) that appear in the sentences. If you have not downloaded nltk's stopword list yet, run nltk.download('stopwords') first.

Now we can import the stopwords.

Next, define a function that removes stopwords from our dataset; a combined sketch of these preprocessing steps is shown below.


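A combined sketch of the cleaning and stopword-removal steps described above (not the article's exact code):

# remove punctuation, numbers and special characters, then lowercase
clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ", regex=True)
clean_sentences = [s.lower() for s in clean_sentences]

nltk.download('stopwords')
from nltk.corpus import stopwords
stop_words = stopwords.words('english')

def remove_stopwords(sen):
    # drop stopwords from a list of tokens and re-join into a sentence
    return " ".join([w for w in sen if w not in stop_words])

clean_sentences = [remove_stopwords(s.split()) for s in clean_sentences]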

With the help of the GloVe word vectors, we will use clean_sentences (the list variable that holds the cleaned sentences) to create feature vectors for the sentences in our dataset.

7. Sentence feature vectors

Now let's create feature vectors for our sentences. We first fetch the vectors (each of size 100) of all the constituent words of a sentence from the GloVe file, then take the mean of those vectors to arrive at a consolidated vector, which becomes the feature vector of the sentence.


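A sketch of the averaging step (the 100-element size matches the GloVe file used above):

sentence_vectors = []
for s in clean_sentences:
    if len(s) != 0:
        # average the GloVe vectors of the words in the sentence
        v = sum([word_embeddings.get(w, np.zeros((100,))) for w in s.split()]) / (len(s.split()) + 0.001)
    else:
        v = np.zeros((100,))
    sentence_vectors.append(v)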
8. Similarity matrix preparation

The next step is to find the similarities between the sentences; we will use cosine similarity for this task. Let's create an empty similarity matrix and fill it with the cosine similarities of the sentences.

First define a zero matrix of dimensions n by n, where n is the total number of sentences, and then fill it with the cosine similarities between sentence pairs.

Cosine similarity will be used to compute the similarity between any two sentences.

Initialize the similarity matrix with the cosine similarity scores.


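A sketch of building the similarity matrix, here using scikit-learn's cosine_similarity (an assumption; any cosine implementation works):

from sklearn.metrics.pairwise import cosine_similarity

n = len(sentences)
sim_mat = np.zeros([n, n])
for i in range(n):
    for j in range(n):
        if i != j:
            sim_mat[i][j] = cosine_similarity(sentence_vectors[i].reshape(1, 100),
                                              sentence_vectors[j].reshape(1, 100))[0, 0]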
9. Applying the PageRank algorithm

Before proceeding, let's convert the similarity matrix sim_mat into a graph. The nodes of this graph are the sentences, and the edges represent the similarity scores between them. On this graph we apply the PageRank algorithm to obtain the sentence rankings.


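A sketch of the graph construction and ranking step, using the networkx library (an assumption consistent with the text):

import networkx as nx

nx_graph = nx.from_numpy_array(sim_mat)   # nodes = sentences, edge weights = similarities
scores = nx.pagerank(nx_graph)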
10. Summary extraction

Finally, extract the top N sentences according to their rankings to generate the summary.
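A sketch of this final step, printing the top 10 sentences as the summary (the choice of N = 10 is arbitrary):

ranked_sentences = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
for i in range(10):
    print(ranked_sentences[i][1])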

Microsoft Announces a Public Preview of Python Support for Azure Functions


At the recent Connect() event, Microsoft announced the public preview of Python support in Azure Functions. Developers can build functions using Python 3.6, based upon the open-source Functions 2.0 runtime, and publish them to a Consumption plan.

Since the general availability of the Azure Functions runtime 2.0, reported earlier in October on InfoQ, support for Python has been one of the top requests and was available through a private preview. Now it is publicly available, and developers can start building functions useful for data manipulation, machine learning, scripting, and automation scenarios.

The Azure Functions runtime 2.0 has a language worker model, providing support for non-.NET languages such as Java and Python. Hence, developers can import existing .py scripts and modules and start writing functions. Furthermore, with a requirements.txt file developers can configure additional dependencies for pip.


[Image source: https://azure.microsoft.com/en-us/blog/taking-a-closer-look-at-python-support-for-azure-functions/]

With triggers and bindings available in the Azure Function programming model developers can configure an event that will trigger the function execution and any data sources that the function needs to orchestrate with. According to Asavari Tayal , Program Manager of the Azure Functions team at Microsoft, the preview release will support bindings to HTTP requests, timer events, Azure Storage, Cosmos DB, Service Bus, Event Hubs, and Event Grid. Once configured, developers can quickly retrieve data from these bindings or write back using the method attributes of your entry point function.
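To make the programming model concrete, here is a minimal HTTP-triggered function of the kind the preview supports. This is an illustrative sketch rather than code from the announcement; the greeting logic and parameter name are invented for the example.

import logging
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # The HTTP trigger binding hands the incoming request to the function
    logging.info("Python HTTP trigger function processed a request.")
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)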

Developers familiar with Python do not have to learn any new tooling; they can debug and test functions locally using a Mac, Linux, or Windows machine. With the Azure Functions Core Tools (CLI) developers can get started quickly using trigger templates and publish directly to Azure, while the Azure platform handles the build and configuration. Furthermore, developers can also use the Azure Functions extension for Visual Studio Code, together with a Python extension, to benefit from auto-complete, IntelliSense, linting, and debugging for Python development on any platform.


[Image source: https://azure.microsoft.com/en-us/blog/taking-a-closer-look-at-python-support-for-azure-functions/]

Hosting of Azure Functions written in Python can be either through a Consumption Plan or an App Service Plan. Tayal explains in the blog post about the Python preview:

Underneath the covers, both hosting plans run your functions in a docker container based on the open source azure-function/python base image . The platform abstracts away the container, so you're only responsible for providing your Python files and don't need to worry about managing the underlying Azure Functions and Python runtime.

Lastly, with the support for Python 3.6, Microsoft is following competitor Amazon’s offering AWS Lambda , which already supports this Python version. By promoting more languages for running code on a Cloud platform both Microsoft and Amazon try to reach a wider audience.

Tech trends: the 5 best programming languages for AI development in 2019


Introduction

AI (artificial intelligence) opens up a world of possibilities for application developers. By drawing on machine learning or deep learning you can build better user profiles, personalization and recommendations, add smarter search, voice interfaces or intelligent assistance, or improve your application in many other ways. You can even build applications that see, listen and respond.

Which programming language should you learn to plumb the depths of AI? You will certainly want a language with many good machine learning and deep learning libraries. It should also feature good runtime performance, good tooling support, a large community of programmers, and a healthy ecosystem of supporting packages. That still leaves plenty of good choices.

Here are my picks for the five best programming languages for AI development, along with three honorable mentions. Some of these languages are on the rise, while others seem to be slipping; come back in a few months and you may find these rankings have changed.

So, which programming language should you choose for your machine learning or deep learning project? Here are five of the best options.

I. The top choices for AI programming

1. Python

Number one is Python. How could it really be anything else? Although there are some maddening things about Python (significant whitespace, the major split between Python 2.x and 3.x, five different packaging systems), none of that will matter: if you are doing AI work, you will almost certainly use Python at some point.

The libraries available in Python are practically unmatched in any other language. NumPy has become so ubiquitous that it is almost a standard API for tensor operations, and Pandas brings R's powerful and flexible dataframes to Python. For natural language processing (NLP) you have the venerable NLTK and the blazingly fast SpaCy. For machine learning there is the battle-tested Scikit-learn. And when it comes to deep learning, all of the current libraries (TensorFlow, PyTorch, Chainer, Apache MXNet, Theano, and so on) are effectively Python-first projects.

If you are reading cutting-edge deep learning research on arXiv, you will almost certainly find the source code released in Python. Then there is the rest of the Python ecosystem. Although IPython has become Jupyter Notebook and is no longer Python-centric, you will still find that most Jupyter Notebook users, and most of the notebooks shared online, use Python.

There is no way around it. Python is the language at the forefront of AI research, the language for which you will find the most machine learning and deep learning frameworks, and the language almost everybody in the AI world speaks. For these reasons Python is first among AI programming languages, even though your author curses its whitespace issues at least once a day.

2. Java and friends

The JVM family of languages (Java, Scala, Kotlin, Clojure, and so on) is also an excellent choice for AI application development. Whether it is natural language processing (CoreNLP), tensor operations (ND4J), or a complete GPU-accelerated deep learning stack (DL4J), there is a wealth of libraries covering every part of the pipeline. You also get easy access to big data platforms such as Apache Spark and Apache Hadoop.

Java is the lingua franca of most enterprises, and with the new language constructs available in Java 8 and Java 9, writing Java code is not the grim experience many of us remember. Writing an AI application in Java may feel a little boring, but it gets the job done, and you can use all of your existing Java infrastructure for development, deployment and monitoring.

3. C/C++

C/C++ is unlikely to be your first choice when developing an AI application, but if you work in an embedded environment and cannot afford the overhead of a Java virtual machine or a Python interpreter, C/C++ is the way to go. And when you need to wring the last bit of performance out of the system, you have to return to the scary world of pointers.

Thankfully, modern C/C++ can be pleasant to write (honestly!). You have a choice of approaches: you can dive down to the bottom of the stack and use libraries such as CUDA to write code that runs directly on the GPU, or you can use TensorFlow or Caffe for access to flexible high-level APIs. The latter also lets you import models that your data scientists may have built with Python, and then run them in production with all the speed C/C++ offers.

Keep an eye on what Rust does in this space over the next year. Combining the speed of C/C++ with type and data safety, Rust is a great choice for achieving production performance without security headaches. TensorFlow bindings for Rust are already available.

4. JavaScript

JavaScript? What is going on here? Bear with me: Google recently released TensorFlow.js, a WebGL-accelerated library that lets you train and run machine learning models in the web browser. It also includes the Keras API, as well as the ability to load and use models trained with regular TensorFlow. This is likely to draw a large number of developers into the AI space. Although JavaScript does not currently have the same access to machine learning libraries as the other languages listed here, soon enough developers will be adding neural networks to their web pages in much the same way they add React components or CSS properties. It is both empowering and a little terrifying.

TensorFlow.js is still in its early days. At the moment it works in the browser but not in Node.js, and it does not yet implement the full TensorFlow API. But I expect both of those issues to be resolved by the end of 2018, after which JavaScript's invasion of AI will begin in earnest.

5. R

R comes in at the bottom of the top five, and the trend is downward. R is the language data scientists love. However, other programmers tend to find R a little confusing when they first encounter it, because of its dataframe-centric approach. If you have a dedicated team of R developers, it makes sense to use its integrations with TensorFlow, Keras or H2O for research, prototyping and experimentation, but I am reluctant to recommend R for production use because of performance and operational concerns. While you can write performant R code that can be deployed on production servers, it will almost certainly be easier to take that R prototype and recode it in Java or Python.

II. Other languages for AI programming

Of course, Python, Java, C/C++, JavaScript and R are not the only languages you can use for AI programming. Here are three more programming languages that did not quite make our top five: two on the way up, one on the way down.

1. Lua

A few years ago Lua was riding high in the AI world. With the Torch framework, Lua was one of the most popular languages for deep learning development, and you will still come across a lot of historical deep learning work on GitHub with models defined in Lua/Torch. I think it is a good idea to be familiar with Lua so that you can study and review that earlier work. But with the arrival of frameworks such as TensorFlow and PyTorch, the use of Lua has fallen off sharply.

2. Julia

Julia is a high-performance programming language focused on numerical computing, which makes it a great fit for the math-heavy world of AI. It is not the most popular language choice right now, but wrappers such as TensorFlow.jl and Mocha (heavily influenced by Caffe) provide good deep learning support. If you do not mind the fact that the ecosystem is still small, and you want to benefit from Julia's focus on making high-performance computing easy and fast, it is a fine choice.

3. Swift

Just as we were going to press, Chris Lattner, creator of the LLVM compiler and the Swift programming language, announced Swift for TensorFlow, a project that promises to combine the ease of use of Python with the speed and static type checking of a compiled language. As a bonus, Swift for TensorFlow also lets you import Python libraries (such as NumPy) and use them in your Swift code just like any other library.

Swift for TensorFlow is currently at an early stage of development, but being able to write with modern programming constructs and get compile-time guarantees of speed and safety is an enticing prospect. Even if you have not gone out and learned Swift yet, I suggest keeping an eye on this project.

Conclusion

The future is already here. If you are putting down roots in IT, are you ready for 2019?

Stand firm in the present, look to the future, throw yourself into the world of AI, and go create the wonderful future you are looking forward to!


Programiz: Python time Module


Python has a module named time to handle time-related tasks. To use functions defined in the module, we need to import the module first. Here's how:

import time

Here are commonly used time-related functions.

Python time.time()

The time() function returns the number of seconds passed since epoch.

For Unix system, January 1, 1970, 00:00:00 at UTC is epoch (the point where time begins).

import time

seconds = time.time()
print("Seconds since epoch =", seconds)

Python time.ctime()

The time.ctime() function takes seconds passed since epoch as an argument and returns a string representing local time.

import time

# seconds passed since epoch
seconds = 1545925769.9618232
local_time = time.ctime(seconds)
print("Local time:", local_time)

If you run the program, the output will be something like:

Local time: Thu Dec 27 15:49:29 2018

Python time.sleep()

The sleep() function suspends (delays) execution of the current thread for the given number of seconds.

import time

print("This is printed immediately.")
time.sleep(2.4)
print("This is printed after 2.4 seconds.")

To learn more, visit:Python sleep().

Before we talk about other time-related functions, let's explore time.struct_time class in brief.

time.struct_time Class

Several functions in the time module such as gmtime() , asctime() etc. either take time.struct_time object as an argument or return it.

Here's an example of time.struct_time object.

time.struct_time(tm_year=2018, tm_mon=12, tm_mday=27, tm_hour=6, tm_min=35, tm_sec=17, tm_wday=3, tm_yday=361, tm_isdst=0)

Index  Attribute  Values
0      tm_year    0000, ..., 2018, ..., 9999
1      tm_mon     1, 2, ..., 12
2      tm_mday    1, 2, ..., 31
3      tm_hour    0, 1, ..., 23
4      tm_min     0, 1, ..., 59
5      tm_sec     0, 1, ..., 61
6      tm_wday    0, 1, ..., 6; Monday is 0
7      tm_yday    1, 2, ..., 366
8      tm_isdst   0, 1 or -1

The values (elements) of the time.struct_time object are accessible using both indices and attributes.

Python time.localtime()

The localtime() function takes the number of seconds passed since epoch as an argument and returns struct_time in local time .

import time

result = time.localtime(1545925769)
print("result:", result)
print("\nyear:", result.tm_year)
print("tm_hour:", result.tm_hour)

When you run the program, the output will be something like:

result: time.struct_time(tm_year=2018, tm_mon=12, tm_mday=27, tm_hour=15, tm_min=49, tm_sec=29, tm_wday=3, tm_yday=361, tm_isdst=0)

year: 2018
tm_hour: 15

If no argument or None is passed to localtime() , the value returned by time() is used.

Python time.gmtime()

The gmtime() function takes the number of seconds passed since epoch as an argument and returns struct_time in UTC .

import time

result = time.gmtime(1545925769)
print("result:", result)
print("\nyear:", result.tm_year)
print("tm_hour:", result.tm_hour)

When you run the program, the output will be:

result = time.struct_time(tm_year=2018, tm_mon=12, tm_mday=28, tm_hour=8, tm_min=44, tm_sec=4, tm_wday=4, tm_yday=362, tm_isdst=0)
year = 2018
tm_hour = 8

If no argument or None is passed to gmtime() , the value returned by time() is used.

Python time.mktime()

The mktime() function takes struct_time (or a tuple containing 9 elements corresponding to struct_time ) as an argument and returns the seconds passed since epoch in local time. Basically, it's the inverse function of localtime() .

import time

t = (2018, 12, 28, 8, 44, 4, 4, 362, 0)
local_time = time.mktime(t)
print("Local time:", local_time)

The example below shows how mktime() and localtime() are related.

import time

seconds = 1545925769

# returns struct_time
t = time.localtime(seconds)
print("t1: ", t)

# returns seconds from struct_time
s = time.mktime(t)
print("\ns:", s)

When you run the program, the output will be something like:

t1: time.struct_time(tm_year=2018, tm_mon=12, tm_mday=27, tm_hour=15, tm_min=49, tm_sec=29, tm_wday=3, tm_yday=361, tm_isdst=0)

s: 1545925769.0

Python time.asctime()

The asctime() function takes struct_time (or a tuple containing 9 elements corresponding to struct_time ) as an argument and returns a string representing it. Here's an example:

import time

t = (2018, 12, 28, 8, 44, 4, 4, 362, 0)
result = time.asctime(t)
print("Result:", result)

When you run the program, the output will be:

Result: Fri Dec 28 08:44:04 2018

Python time.strftime()

The strftime() function takes struct_time (or tuple corresponding to it) as an argument and returns a string representing it based on the format code used. For example,

import time

named_tuple = time.localtime()   # get struct_time
time_string = time.strftime("%m/%d/%Y, %H:%M:%S", named_tuple)
print(time_string)

When you run program, the output will be something like:

12/28/2018, 09:47:41

Here, %Y, %m, %d, %H etc. are format codes:

%Y - year, e.g. 2018
%m - month as a zero-padded number (01 to 12)
%d - day of the month as a zero-padded number (01 to 31)
%H - hour (24-hour clock) as a zero-padded number (00 to 23)
%M - minute as a zero-padded number (00 to 59)
%S - second as a zero-padded number (00 to 61)

To learn more, visit: time.strftime() .

Python time.strptime()

The strptime() function parses a string representing time and returns struct_time .

import time

time_string = "21 June, 2018"
result = time.strptime(time_string, "%d %B, %Y")
print(result)

When you run the program, the output will be:

time.struct_time(tm_year=2018, tm_mon=6, tm_mday=21, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=172, tm_isdst=-1)

Simple file server to serve current directory


I'm looking for a dead simple bin that I can launch up in the shell and have it serve the current directory (preferably not ..), with maybe a -p for specifying port. As it should be a development server, it should by default allow connections from localhost only, maybe with an option to specify otherwise. The simpler, the better.

Not sure which tags to use here.

Problem courtesy of: Reactormonk

Solution

python -m SimpleHTTPServer

or

python -m SimpleHTTPServer 80

if you don't want to use the default port 8000. See the docs .

Solution courtesy of: David Pope

Discussion

There is the Perl app App::HTTPThis or I have often used a tiny Mojolicious server to do this. See my blog post from a while back.

Make a file called say server.pl . Put this in it.

#!/usr/bin/env perl
use Mojolicious::Lite;
use Cwd;

app->static->paths->[0] = getcwd;

any '/' => sub {
    shift->render_static('index.html');
};

app->start;

Install Mojolicious: curl get.mojolicio.us | sh and then run morbo server.pl .

Should work, and you can tweak the script if you need to.

Discussion courtesy of: Joel Berger

For Node, there's http-server :

$ npm install -g http-server
$ http-server Downloads -a localhost -p 8080
Starting up http-server, serving Downloads on port: 8080
Hit CTRL-C to stop the server

Python has:

Python 3: python -m http.server 8080
Python 2: python -m SimpleHTTPServer 8080

Note that these two allow all connections (not just from localhost ). Sadly, there isn't a simple way to change the address.

Discussion courtesy of: Blender

Using Twisted Web :

twistd --pidfile= -n web --path . --port 8080

--pidfile= disables the PID file. Without it a twistd.pid file will be created in the current directory. You can also use --pidfile '' .

Discussion courtesy of: Cristian Ciupitu

This recipe can be found in its original form on Stack Overflow.

The Probability that One Normal Random Variable is Greater than Another


Suppose you have a Normal (Gaussian) random variable A with mean = 2150 and standard deviation = 70. And you have a second random variable B with mean = 2000 and std = 70. What is the probability that A is truly greater than B?

The idea is that if you select an A value it will usually be about 2150 but could be as low as about 2150 - 3*70 = 1940 or as high as 2150 + 3*70 = 2360. Similarly, B will usually be near to 2000 but could be as low as 2000 - 3*70 = 1790 or as high as 2000 + 3*70 = 2210. A will be greater than B most of the time, but there’s a chance A could be less than B.



What is the probability that A > B? I wrote a little python program to solve the problem in two ways. In the first approach, I used the NumPy random.normal() function to draw 100,000 samples and counted the number of times A > B. Using this approach I got P(A>B) = 0.9351.

In the second approach, I used the SciPy stats.norm.sf() function to get the result directly. The key math trick is to look at the distribution of A - B. As it turns out, the mean of A - B is just u_A - u_B, and the std of A - B is sqrt(s_A^2 + s_B^2). Using that approach, I got the same answer.
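The post does not include the script itself; here is a minimal sketch of the two approaches it describes, using the example's means and standard deviation (the sample count and random seed are arbitrary):

import numpy as np
from scipy import stats

mean_a, mean_b, sd = 2150.0, 2000.0, 70.0

# Approach 1: simulation - draw samples and count how often A > B
rng = np.random.default_rng(0)
a = rng.normal(mean_a, sd, 100000)
b = rng.normal(mean_b, sd, 100000)
print("Simulated P(A > B):", np.mean(a > b))   # roughly 0.935

# Approach 2: exact - A - B is Normal(mean_a - mean_b, sqrt(sd**2 + sd**2))
diff_sd = np.sqrt(sd**2 + sd**2)
print("Exact P(A > B):", stats.norm.sf(0, loc=mean_a - mean_b, scale=diff_sd))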

This was a relatively easy problem for me, because I used to teach Statistics in college so I knew about the difference between two Normal distributions, and I have a lot of experience with NumPy and SciPy so I knew there’d be helpful functions (I just had to do a little searching through the documentation).

So, what’s the point? Ultimately, I want to look at rating sports teams by using these probability techniques to generate ratings where the ratings will be those that give the highest probability of observed results (“maximum likelihood estimation”, MLE). But that’s a few blog posts down the road.



Numeric ratings are very useful. But sometimes a written, subjective evaluation of a product is more informative. From Amazon.com

Data Structures & Algorithms in Python

Time Complexity

\(O(1)\) - Swap two numbers.
\(O(logN)\) - Search in a sorted array with binary search.
\(O(N)\) - Search for a maximum element in an unsorted array.
\(O(N*logN)\) - Merge Sort, Quick Sort, Heap Sort.
\(O(N^2)\) - Bubble Sort.
\(O(2^N)\) - Travelling Salesman Problem with Dynamic Programming.
\(O(N!)\) - Travelling Salesman Problem with Brute Force Search.

Complexity Classes

\(\text{P}\) - Polynomial: One of the most fundamental complexity classes. It contains all decision problems that can be solved by a deterministic Turing machine in polynomial time. \(\text{P}\) is the class of computational problems that are efficiently solvable. Ex: sorting algorithms.

\(\text{NP}\) - Non-deterministic Polynomial: If we have a solution to a problem, we can verify this solution in polynomial time (by a deterministic Turing machine). For instances where the answer is yes, there are efficiently verifiable proofs of the fact that the answer is indeed yes. The complexity class \(\text{P}\) is contained in \(\text{NP}\). The most important open question is whether \(\text{P} = \text{NP}\) is true. Ex: Integer Factorization, Travelling Salesman Problem.

\(\text{NP complete}\) - A decision problem is \(\text{NP complete}\) when it is both in \(\text{NP}\) and \(\text{NP hard}\). Although any given solution to an \(\text{NP complete}\) problem can be verified in polynomial time, there is no known efficient way to locate a solution in the first place. We usually just look for an approximate solution. Ex: Chinese Postman Problem, Graph Coloring, Hamiltonian Cycle.

\(\text{NP hard}\) - This is a class of problems that are at least as hard as the hardest problems in \(\text{NP}\). A problem H is \(\text{NP hard}\) when every problem L in \(\text{NP}\) can be reduced in polynomial time to H. As a consequence, finding a polynomial algorithm to solve any \(\text{NP hard}\) problem would give polynomial algorithms for all the problems in \(\text{NP}\). Ex: Halting problem.

Linked List

code linked_list.py

# class to create a node that has data and pointer
class node:
    def __init__(self, data=None):
        self.data = data
        self.next = None

# class to create a linked list of nodes
class linked_list:
    def __init__(self):
        self.head = node()

    def append(self, data):
        new_node = node(data)
        cur = self.head
        while cur.next != None:
            cur = cur.next
        cur.next = new_node

    def length(self):
        cur = self.head
        total = 0
        while cur.next != None:
            total += 1
            cur = cur.next
        return total

    def display(self):
        elems = []
        cur_node = self.head
        while cur_node.next != None:
            cur_node = cur_node.next
            elems.append(cur_node.data)
        print(elems)

    def get(self, index):
        if index >= self.length():
            print("ERROR: index out of range!")
            return None
        cur_idx = 0
        cur_node = self.head
        while True:
            cur_node = cur_node.next
            if cur_idx == index:
                return cur_node.data
            cur_idx += 1

    def erase(self, index):
        if index >= self.length():
            print("ERROR: index out of range!")
            return None
        cur_idx = 0
        cur_node = self.head
        while True:
            last_node = cur_node
            cur_node = cur_node.next
            if cur_idx == index:
                last_node.next = cur_node.next
                return
            cur_idx += 1

if __name__ == '__main__':
    my_list = linked_list()
    my_list.append(1)
    my_list.append(2)
    my_list.append(3)
    my_list.append(4)
    my_list.display()
    print("Element at 2nd index: {}".format(my_list.get(2)))
    my_list.erase(2)
    print("Elements after erasing element at index 2")
    my_list.display()

[1, 2, 3, 4]
Element at 2nd index: 3
Elements after erasing element at index 2
[1, 2, 4]
Bubble Sort

code bubble_sort.py

from random import randint

# create randomized array of length "length"
# array integers are of range 0, maxint
def create_array(length=10, maxint=50):
    new_arr = [randint(0, maxint) for _ in range(length)]
    return new_arr

#-------------------------------------
# bubble sort algorithm to input array
#-------------------------------------
def bubble_sort(arr):
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(arr)):
            if arr[i-1] > arr[i]:
                arr[i], arr[i-1] = arr[i-1], arr[i]
                swapped = True
    return arr

if __name__ == '__main__':
    a = create_array()
    print(a)
    a = bubble_sort(a)
    print(a)

[37, 36, 13, 12, 43, 4, 32, 14, 32, 4]
[4, 4, 12, 13, 14, 32, 32, 36, 37, 43]
Merge Sort

code merge_sort.py

from random import randint

# create randomized array of length "length"
# array integers are of range 0, maxint
def create_array(length=10, maxint=50):
    new_arr = [randint(0, maxint) for _ in range(length)]
    return new_arr

#-------------------------------------
# merge sort to combine two arrays
#-------------------------------------
def merge(a, b):
    # final output array
    c = []
    a_idx, b_idx = 0, 0
    while a_idx < len(a) and b_idx < len(b):
        if a[a_idx] < b[b_idx]:
            c.append(a[a_idx])
            a_idx += 1
        else:
            c.append(b[b_idx])
            b_idx += 1
    if a_idx == len(a):
        c.extend(b[b_idx:])
    else:
        c.extend(a[a_idx:])
    return c

#-------------------------------------
# merge sort algorithm to input array
#-------------------------------------
def merge_sort(a):
    # a list of zero or one elements is sorted, by definition
    if len(a) <= 1:
        return a
    # split the list in half and call merge sort recursively on each half
    mid = int(len(a)/2)
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # merge the now-sorted sublists
    return merge(left, right)

if __name__ == '__main__':
    a = create_array()
    print(a)
    s = merge_sort(a)
    print(s)

[45, 8, 25, 1, 32, 37, 34, 3, 4, 3]
[1, 3, 3, 4, 8, 25, 32, 34, 37, 45]
Quick Sort

code quick_sort.py

from random import randint

# create randomized array of length "length"
# array integers are of range 0, maxint
def create_array(length=10, maxint=50):
    new_arr = [randint(0, maxint) for _ in range(length)]
    return new_arr

# quick sort algorithm to input array
def quick_sort(a):
    # a list of zero or one elements is sorted, by definition
    if len(a) <= 1:
        return a
    # lists to hold values based on pivot
    smaller, equal, larger = [], [], []
    # choose a random pivot element
    pivot = a[randint(0, len(a)-1)]
    # iterate over each element and compare with pivot
    for x in a:
        if x < pivot:
            smaller.append(x)
        elif x == pivot:
            equal.append(x)
        else:
            larger.append(x)
    # recursively quick sort sub lists and concatenate
    return quick_sort(smaller) + equal + quick_sort(larger)

if __name__ == '__main__':
    a = create_array()
    print(a)
    s = quick_sort(a)
    print(s)

[3, 27, 12, 8, 12, 39, 1, 2, 23, 8]
[1, 2, 3, 8, 8, 12, 12, 23, 27, 39]

How do I access array elements in a Django model?

I am getting an array arr passed to my Django template. I want to access individual elements of the array, e.g. arr[0], arr[1] and so on, instead of looping through the whole array.

Is there a way to do that in a Django template?

Thank You.

Remember that the dot notation in a Django template is used for four different notations in python. In a template, foo.bar can mean any of:

foo[bar]   # dictionary lookup
foo.bar    # attribute lookup
foo.bar()  # method call
foo[bar]   # list-index lookup

It tries them in this order until it finds a match. So foo.3 (for example {{ arr.3 }} in your template) will get you your list index, because your object isn't a dict with 3 as a key, doesn't have an attribute named 3, and doesn't have a method named 3.

MLflow v0.8.1 Features Faster Experiment UI and Enhanced Python Model


MLflow v0.8.1 was released this week. It introduces several UI enhancements, including faster load times for thousands of runs and improved responsiveness when navigating runs with many metrics and parameters. Additionally, it expands support for evaluating python models as Apache Spark UDFs and automatically captures model dependencies as Conda environments.

Now available on PyPI and with docs online, you can install this new release with pip install mlflow as described in the MLflow quickstart guide.

In this post, we will elaborate on a couple of MLflow v0.8.1 features:

- A faster and more responsive MLflow UI experience when navigating experiments with hundreds or thousands of runs
- Expanded functionality of pyfunc models when loaded as a Spark UDF; these UDFs can now return multiple scalar or string columns
- Added support to automatically capture dependencies in a Conda environment when saving models, ensuring that they can be loaded in a new environment
- Ability to run MLflow projects from ZIP files

Faster and Improved MLflow UI Experience

In our continued commitment to give ML developers an enjoyable experience, this release adds further enhancements to MLflow Experiment UI:

Faster Display of Experiments : The improved MLflow UI can quickly display thousands of experiment runs, including all of their associated parameters and artifacts. Users who train large numbers of models should observe quicker response times.

Better Visualizations With Interactive Scatter Plots: Scatter plots for comparing runs are now interactive, providing greater insight into model performance characteristics.

Enhanced Python Model as Spark UDF

When scoring Python models as Apache Spark UDFs, users can now filter UDF outputs by selecting from an expanded set of result types. For example, specifying a result type of pyspark.sql.types.DoubleType filters the UDF output and returns the first column that contains double precision scalar values. Specifying a result type of pyspark.sql.types.ArrayType(DoubleType) returns all columns that contain double precision scalar values. The example code below demonstrates result type selection using the result_type parameter. And the short example notebook illustrates Spark Model logged and then loaded as a Spark UDF.


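The original post shows this as a screenshot; a minimal sketch of the idea, assuming an active SparkSession named spark, a DataFrame df with the model's feature columns, and a placeholder model path (the exact spark_udf arguments vary between MLflow versions), might look like:

import mlflow.pyfunc
from pyspark.sql.types import ArrayType, DoubleType

# Load a logged pyfunc model as a Spark UDF and keep every double-valued output column
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, "path/to/model",
                                     result_type=ArrayType(DoubleType()))
df = df.withColumn("predictions", pyfunc_udf("feature1", "feature2"))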

By default, pyfunc models produced by MLflow API calls such as save_model() and log_model() now include a Conda environment specifying all of the versioned dependencies necessary for loading them in a new environment. For example, the default Conda environment for the model trained in the example below has the following yaml representation:

channels:
  - defaults
dependencies:
  - python=3.5.2
  - pyspark=2.4.0
name: mlflow-env

Other Features and Bug Fixes

In addition to these features, several other new pieces of functionality are included in this release. Some items worthy of note are:

Features

- [API/CLI] Support for running MLflow projects from ZIP files (#759, @jmorefieldexpe)
- [Python API] Support for passing model conda environments as dictionaries to save_model and log_model functions (#748, @dbczumar)
- [Models] Default Anaconda environments have been added to many Python model flavors. By default, models produced by save_model and log_model functions will include an environment that specifies all of the versioned dependencies necessary to load and serve the models. Previously, users had to specify these environments manually. (#705, #707, #708, #749, @dbczumar)
- [Scoring] Support for synchronous deployment of models to SageMaker (#717, @dbczumar)
- [Tracking] Include the Git repository URL as a tag when tracking an MLflow run within a Git repository (#741, @whiletruelearn, @mateiz)
- [UI] Improved runs UI performance by using a react-virtualized table to optimize row rendering (#765, #762, #745, @smurching)
- [UI] Significant performance improvements for rendering run metrics, tags, and parameter information (#764, #747, @smurching)
- [UI] Scatter plots, including run comparison plots, are now interactive (#737, @mateiz)
- [UI] Extended CSRF support by allowing the MLflow UI server to specify a set of expected headers that clients should set when making AJAX requests (#733, @aarondav)

Bug fixes

- [Python/Scoring] MLflow Python models that produce Pandas DataFrames can now be evaluated as Spark UDFs correctly. Spark UDF outputs containing multiple columns of primitive types are now supported (#719, @tomasatdatabricks)
- [Scoring] Fixed a serialization error that prevented models served with Azure ML from returning Pandas DataFrames (#754, @dbczumar)
- [Docs] New example demonstrating how the MLflow REST API can be used to create experiments and log run information (#750, kjahan)
- [Docs] R documentation has been updated for clarity and style consistency (#683, @stbof)
- [Docs] Added clarification about user setup requirements for executing remote MLflow runs on Databricks (#736, @andyk)

The full list of changes and contributions from the community can be found in the 0.8.1 Changelog . We welcome more input on mlflow-users@googlegroups.com or by filing issues on GitHub. For real-time questions about MLflow, we also offer a Slack channel. Finally, you can follow @MLflowOrg on Twitter for the latest news.

What is the fastest axis of an array?


One of the participants in our geocomputing course asked us a tricky question earlier this year. She was a C++ and Java programmer ― we often teach experienced programmers who want to learn about python and/or machine learning ― and she worked mostly with seismic data. She had a question related to the performance of n-dimensional arrays: what is the fastest axis of a NumPy array?

I’ve written before about how computational geoscience is not ‘software engineering’ and not ‘computer science’, but something else. And there’s a well established principle in programming, first expressed by Michael Jackson :

We follow two rules in the matter of optimization:

Rule 1: Don’t do it.

Most of the time the computer is much faster than we need it to be, so we don’t spend too much time thinking about making our programs faster. We’re mostly concerned with making them work, then making them correct. But sometimes we have to think about speed. And sometimes that means writing smarter code. (Other times it means buying another GPU.) If your computer spends its days looping over seismic volumes extracting slices for processing, you should probably know whether you want to put time in the first dimension or the last dimension of your array.

The 2D case

Let’s think about a two-dimensional case first ― imagine a small 2D array, also known as a matrix in some contexts. I’ve coloured in the elements of the matrix to make the next bit easier to understand.



When we store a matrix in a computer (or an image, or any array), we have a decision to make. In simple terms, the computer’s memory is like a long row of boxes, each with a unique address ― shown here as a 3-digit hexadecimal number:



We can only store one number in each box, so we’re going to have to flatten the 2D array. The question is, do we put the rows in together, effectively splitting up the columns, or do we put the columns in together? These two options are commonly known as ‘row major’, or C-style, and ‘column major’, or Fortran-style:



Let’s see what this looks like in terms of the indices of the elements. We can plot the index number on each axis vs. the position of the element in memory. Notice that the C-ordered elements are contiguous in axis 0:



If you spend a lot of time loading seismic data, you probably recognize this issue: it’s analogous to how traces are stored in a SEG-Y file. Of course, with seismic data, two dimensions aren’t always enough…

Higher dimensions

The problem multiplies at higher dimensions. If we have a cube of data, then C-style ordering results in the first dimension having large contiguous chunks, and the last dimension being broken up. The middle dimension is somewhere in between. As before, we can illustrate this by plotting the indices of the data. This time I’m highlighting the positions of the elements with index 2 (i.e. the third element) in each dimension:



So if this was a seismic volume, we might organize inlines in the first dimension, and travel-time in the last dimension. That way, we can access inlines very quickly, but timeslices will take longer.

In Fortran order, which we can optionally specify in NumPy, the situation is reversed. Now the fast axis is the last axis:



Lots of programming languages and libraries use row-major memory layout, including C, C++, Torch and NumPy. Most others use column-major ordering, including MATLAB, R, Julia, and Fortran. (Some other languages, such as Java and .NET, use a variant of row-major order called Iliffe vectors ). NumPy calls row-major order ‘C’ (for C, not for column), and column-major ‘F’ for Fortran (thankfully they didn’t use R, for R not for row).

I expect it’s related to their heritage, but the Fortran-style languages also start counting at 1, whereas the C-style languages, including Python, start at 0.

What difference does it make?

The main practical difference is in the time it takes to access elements in different orientations. It’s faster for the computer to take a contiguous chunk of neighbours from the memory ‘boxes’ than it is to have to ‘stride’ across the memory taking elements from here and there.

How much faster? To find out, I made datasets full of random numbers, then selected slices and added 1 to them. This was the simplest operation I could think of that actually forces NumPy to do something with the data. Here are some statistics ― the absolute times are pretty irrelevant as the data volumes I used are all different sizes, and the speeds will vary on different machines and architectures:

2D data: 3.6× faster. Axis 0: 24.4 s, axis 1: 88.1 s (times relative to first axis: 1, 3.6).

3D data: 43× faster. 229 s, 714 s, 9750 s (relatively 1, 3.1, 43).

4D data: 24× faster. 1.27 ms, 1.36 ms, 4.77 ms, 30 ms (relatively 1, 1.07, 3.75, 23.6).

5D data: 20× faster. 3.02 ms, 3.07 ms, 5.42 ms, 11.1 ms, 61.3 ms (relatively 1, 1.02, 1.79, 3.67, 20.3).

6D data: 5.5× faster. 24.4 ms, 23.9 ms, 24.1 ms, 37.8 ms, 55.3 ms, 136 ms (relatively 1, 0.98, 0.99, 1.55, 2.27, 5.57).

These figures are more or less simply reversed for Fortran-ordered arrays (see the notebook for details).

Clearly, the biggest difference is with 3D data, so if you are manipulating seismic data a lot and need to access the data in that last dimension, usually travel-time, you might want to think about ways to reduce this overhead.
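As a rough illustration of the kind of timing experiment described above (this is not the author's notebook; the array size and the number of repetitions are arbitrary), one could compare slicing along each axis of a C-ordered 3D array like this:

import timeit
import numpy as np

data = np.random.random((400, 400, 400))   # C-ordered by default

def touch(axis):
    # add 1 to a slice taken across the given axis
    if axis == 0:
        data[200, :, :] + 1
    elif axis == 1:
        data[:, 200, :] + 1
    else:
        data[:, :, 200] + 1

for axis in range(3):
    t = timeit.timeit(lambda axis=axis: touch(axis), number=100)
    print("axis", axis, ":", round(t, 4), "s for 100 slices")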

What difference does it really make?

The good news is that, for most of us most of the time, we don’t have to worry about any of this. For one thing, NumPy’s internal workings (in particular, its universal functions , or ufuncs) know which directions are fastest and take advantage of this when possible. For another thing, we generally try to avoid looping over arrays at all, leaving the iterative components of our algorithms to the ufuncs ― so the slicing speed isn’t a factor. Even when it is a factor, or if we can’t avoid looping, it’s often not the bottleneck in the code. Usually the guts of our algorithm are what are slowing the computer down, not the access to memory. The net result of all this is that we don’t often have to think about the memory layout of our arrays.

So when does it matter? The following situations merit a bit of thought:

When you’re doing a very large number of accesses to memory or disk. Saving a few microseconds might add up to a lot if you’re doing it a billion times.

When the objects you’re accessing are v

Faster In-Memory Python Module Importing


I recently blogged about distributing standalone python applications. In that post, I announced PyOxidizer - a tool which leverages Rust to produce standalone executables embedding Python. One of the features of PyOxidizer is the ability to import Python modules embedded within the binary using zero-copy.

I also recently blogged about global kernel locks in APFS , which make filesystem operations slower on macOS. This was the latest wrinkle in a long battle against Python's slow startup times, which I've posted about on the official python-dev mailing list over the years .

Since I announced PyOxidizer a few days ago, I've had some productive holiday hacking sessions!

One of the reached milestones is PyOxidizer now supports macOS.

With that milestone reached, I thought it would be interesting to compare the performance of a PyOxidizer executable versus a standard CPython build.

I produced a Python script that imports almost the entirety of the Python standard library - at least the modules implemented in Python. That's 508 import statements. I then executed this script using a typical python3.7 binary (with the standard library on the filesystem) and PyOxidizer-produced standalone executables with a module importer that loads Python modules from memory using zero copy.

# Homebrew installed CPython 3.7.2
# Cold disk cache.
$ sudo purge
$ time /usr/local/bin/python3.7 < import_stdlib.py
real    0m0.694s
user    0m0.354s
sys     0m0.121s

# Hot disk cache.
$ time /usr/local/bin/python3.7 < import_stdlib.py
real    0m0.319s
user    0m0.263s
sys     0m0.050s

# PyOxidizer with non-PGO/non-LTO CPython 3.7.2
$ time target/release/pyapp < import_stdlib.py
real    0m0.223s
user    0m0.201s
sys     0m0.017s

# PyOxidizer with PGO/non-LTO CPython 3.7.2
$ time target/release/pyapp < import_stdlib.py
real    0m0.234s
user    0m0.210s
sys     0m0.019s

# PyOxidizer with PGO+LTO CPython 3.7.2
$ sudo purge
$ time target/release/pyapp < import_stdlib.py
real    0m0.442s
user    0m0.252s
sys     0m0.059s

$ time target/release/pyall < import_stdlib.py
real    0m0.221s
user    0m0.197s
sys     0m0.020s

First, the PyOxidizer times are all relatively similar regardless of whether PGO or LTO is used to build CPython. That's not too surprising, as I'm exercising a very limited subset of CPython (and I suspect the benefits of PGO/LTO aren't as pronounced due to the nature of the CPython API).

But the bigger result is the obvious speedup with PyOxidizer and its in-memory importing: PyOxidizer can import almost the entirety of the Python standard library ~100ms faster - or ~70% of original - than a typical standalone CPython install with a hot disk cache! This comes out to ~0.19ms per import statement. If we run purge to clear out the disk cache, the performance delta increases to 252ms, or ~64% of original. All these numbers are on a 2018 6-core 2.9 GHz i9 MacBook Pro, which has a pretty decent SSD.

Using dtruss -c to execute the binaries, the breakdown in system calls occurring >10 times is clear:

# CPython standalone
fstatfs64            16
read_nocancel        19
ioctl                20
getentropy           22
pread                26
fcntl                27
sigaction            32
getdirentries64      34
fcntl_nocancel      106
mmap                114
close_nocancel      129
open_nocancel       130
lseek               148
open                168
close               170
read                282
fstat64             403
stat64              833

# PyOxidizer
lseek                10
read                 12
read_nocancel        14
fstat64              16
ioctl                22
munmap               31
stat64               33
sysctl               33
sigaction            36
mmap                122
madvise             193
getentropy          315

PyOxidizer avoids hundreds of open() , close() , read() , fstat64() , and stat64() calls. And by avoiding these calls, PyOxidizer not only avoids the userland-kernel overhead intrinsic to them, but also any additional overhead that APFS is imposing via its global lock(s).

(Why the PyOxidizer binary is making hundreds of calls to getentropy() I'm not sure. It's definitely coming from Python as a side-effect of a module import and it is something I'd like to fix, if possible.)

With this experiment, we finally have the ability to better isolate the impact of filesystem overhead on Python module importing and preliminary results indicate that the overhead is not insignificant - at least on macOS (I'll get data for linux and windows later). While the test is somewhat contrived (I don't think many applications import the entirety of the Python standard library), some Python applications do import hundreds of modules. And as I've written before , milliseconds matter. This is especially true if you are invoking Python processes hundreds or thousands of times in a build system, when running a test suite, for scripting, etc. Cumulatively you can be importing tens of thousands of modules. So I think shaving even fractions of a millisecond from module importing is important.

It's worth noting that in addition to the system call overhead, CPython's path-based importer runs substantially more Python code than PyOxidizer and this likely contributes several milliseconds of overhead as well. Because PyOxidizer applications are static, the importer can remain simple (finding a module in PyOxidizer is essentially a Rust HashMap<String, Vec<u8>> lookup). While it might be useful to isolate the filesystem overhead from Python code overhead, the thing that end-users care about is overall execution time: they don't care where that overhead is coming from. So I think it is fair to compare PyOxidizer - with its intrinsically simpler import model - with what Python typically does (scanning sys.path entries and looking for modules on the filesystem).

Another difference is that PyOxidizer is almost completely statically linked. By contrast, a typical CPython install has compiled extension modules as standalone shared libraries and these shared libraries often link against other shared libraries (such as libssl). From dtruss timing information, I don't believe this difference contributes to significant overhead, however.

Finally, I haven't yet optimized PyOxidizer. I still have a few tricks up my sleeve that can likely shave off more overhead from Python startup. But so far the results are looking very promising. I dare say they are looking promising enough that Python distributions themselves might want to look into the area more thoroughly and consider distribution defaults that rely less on the every-Python-module-is-a-separate-file model.

Stay tuned for more PyOxidizer updates in the near future!

Python: requests.exceptions.ConnectionError ...

This is the script:

import requests
import json
import urlparse
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=1))

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url = line.rstrip()
                data = requests.get(url, proxies=proxy)
                data1 = data.content
                print data1
                print {'http': line}

As you can see, it's trying to access a list of URLs through a list of proxies. Here is the urls.txt file:

http://api.exip.org/?call=ip

here is the proxies.txt file:

{"http":"http://107.17.92.18:8080"}

I got this proxy at www.hidemyass.com. Could it be a bad proxy? I have tried several and this is the result. Note: if you are trying to replicate this, you may have to update the proxy to a recent one at hidemyass.com; they seem to stop working eventually. Here is the full error and traceback:

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    data=requests.get(url, proxies=proxy)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 454, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 144, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 327, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: '')

Looking at the stack trace you've provided, your error is caused by an httplib.BadStatusLine exception, which, according to the docs, is:

Raised if a server responds with a HTTP status code that we don’t understand.

In other words, whatever is returned by the proxy server (if anything is returned at all) cannot be parsed by httplib, which performs the actual request.

From my experience with (writing) HTTP proxies, I can say that some implementations may not follow the specs too strictly (the HTTP RFCs aren't easy reading, actually) or use hacks to work around old browsers with flawed implementations.

So, answering this:

Could it be a bad proxy?

... I'd say that this is possible. The only real way to be sure is to see what is actually returned by the proxy server.

Try to debug it with a debugger, or grab a packet sniffer (something like Wireshark or Network Monitor) to analyze what happens on the network. Knowing exactly what the proxy server returns should give you the key to solving this issue.
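Beyond the original answer, a small defensive variant of the question's loop can at least keep one bad proxy from killing the whole run. This is my own sketch - the file names match the question, but the log message is made up:

# Skip proxies that fail instead of letting a single bad proxy crash the script.
import json
import requests

with open('proxies.txt') as proxies:
    for proxy_line in proxies:
        proxy = json.loads(proxy_line)
        with open('urls.txt') as urls:
            for url_line in urls:
                url = url_line.rstrip()
                try:
                    response = requests.get(url, proxies=proxy, timeout=10)
                    print(response.content)
                except requests.exceptions.ConnectionError as exc:
                    # Bad or dead proxy: log it, skip its remaining URLs,
                    # and move on to the next proxy.
                    print("proxy %r failed for %s: %s" % (proxy, url, exc))
                    break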

Super Fun! A Hands-On Guide to Real-Time "Face Detection" in Python

$
0
0

[51CTO.com original article] Mike Krieger, co-founder and CTO of Instagram, has said: "Computer vision and machine learning have really started to take off, but for most people, the whole idea of what a computer is seeing when it looks at an image is still fairly obscure."



In recent years, the fascinating field of computer vision has really come into its own. The technology is already being used in all kinds of ways around the world, and we have only scratched the surface!

One of my favorite things about this field is the way our community embraces the idea of open source. Even the big technology giants are willing to share new breakthroughs and innovations with everyone, so that the technology does not become a plaything reserved for the wealthy.

Face detection is one such technology, with broad potential in real-world use cases (when used appropriately and ethically). In this article I will show how to build a powerful face detection pipeline using open source tools.

Promising applications of face detection

Let me give a few examples that show how widely face detection technology is already used. I am sure you have run into these use cases at some point without realizing what technology was at work behind the scenes!

For example, Facebook has replaced manual image tagging with automatically generated tag suggestions for every picture uploaded to the platform.

Facebook uses a simple face detection algorithm to analyze the pixels of the faces in an image and compare them with the relevant users.

We will learn how to build a face detection model of our own, but before getting into the technical details, let's discuss a few other use cases.

We have grown used to unlocking our phones with the latest "face unlock" feature. This is a small example of how face detection technology can be used to keep personal data secure.

The same technology can be implemented on a much larger scale, enabling cameras to capture images and detect faces.

There are a few other, lesser-known applications of face detection in industries such as advertising, healthcare, and banking. At most companies, and even at many conferences, you need to carry an ID card to get in.

But what if we could find a way to get in without carrying any ID at all?

Face detection helps make that process smooth and simple. A person only has to look at the camera, and it automatically determines whether he or she should be allowed in.

Another noteworthy application of face detection is counting the number of people attending an event (such as a conference or a concert).

Instead of counting participants by hand, we install a camera that captures images of the attendees and gives us a total head count. This helps automate the whole process and saves a great deal of manual work. Useful, isn't it?

In this article I will focus on the practical applications of face detection and only briefly touch on how the algorithms behind it work.

Implementing face detection with open source tools

Now that you understand the potential applications of face detection, let's look at how we can implement the technology using the open source tools at hand.

For this article, this is the hardware and software I used and recommend:

A webcam (Logitech C920) for building a real-time face detection system on a Lenovo ThinkPad E470 laptop (7th-generation Core i5).

Instead of this setup, you can also use your laptop's built-in camera, or a CCTV camera on any other suitable system, for real-time video analysis.

Using a GPU for faster video processing is always a bonus. On the software side, we used Ubuntu 18.04 with all the prerequisite software installed.

Let's go through these points in a bit more detail to make sure everything is set up correctly before we build the face detection model.

Step 1: Hardware setup

The first thing to do is check whether the webcam is set up correctly. A simple trick in Ubuntu is to see whether the device has been registered by the operating system.

You can follow these steps:

Before connecting the webcam to the laptop, open a terminal and run ls /dev/video* to check all connected video devices. This prints the video devices already attached to the system.
Connect the webcam and run the command again. If the webcam has been connected successfully, a new device will show up in the output.
You can also use any webcam application to check that the camera is working properly; on Ubuntu you can use "Cheese" for this (a Python-based check is sketched below).
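If you prefer to verify the camera from Python rather than a GUI tool, a quick OpenCV check like the following also works (this snippet is my addition, not part of the original article; the device index is an assumption and may differ on your machine):

import cv2

# Index 0 is usually the built-in camera; an external webcam often appears as 1.
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Camera could not be opened - check the device index or the connection.")
else:
    ret, frame = cap.read()
    print("Grabbed a frame:", ret, "shape:", frame.shape if ret else None)
    cap.release()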

Here we can see that the webcam has been set up correctly. That's it for the hardware!

Step 2: Software setup

1) Install Python

The code in this article was built with Python 3.5. Although there are several ways to install Python, I recommend Anaconda, the most popular Python distribution for data science.

Here is the link to install Anaconda on your system:

https://www.anaconda.com/download

2) Install OpenCV

OpenCV (Open Source Computer Vision) is a library designed for building computer vision applications. It provides many prewritten functions for image processing tasks.

To install OpenCV, pip install the library:

pip3 install opencv-python

3) Install the face_recognition API

Finally, we will use face_recognition, billed as the world's simplest face recognition API for Python.

To install it, run the following commands:

pip install dlib
pip install face_recognition

Diving into the implementation

Now that your system is set up, we can finally get into the actual implementation. First we will quickly build the program, then break it down to understand what we did.

First create a file called face_detector.py and copy in the code shown below:

# import libraries
import cv2
import face_recognition

# Get a reference to webcam
video_capture = cv2.VideoCapture("/dev/video1")

# Initialize variables
face_locations = []

while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()

    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses)
    rgb_frame = frame[:, :, ::-1]

    # Find all the faces in the current frame of video
    face_locations = face_recognition.face_locations(rgb_frame)

    # Display the results
    for top, right, bottom, left in face_locations:
        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

    # Display the resulting image
    cv2.imshow('Video', frame)

    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release handle to the webcam
video_capture.release()
cv2.destroyAllWindows()

Then run the Python file with the following command:

python face_detector.py

If everything works, a new window will pop up with real-time face detection running.

To summarize, here is what the code above does:

First, we define the hardware on which the video analysis will be done.
Next, we capture the video in real time, frame by frame.
Then we process each frame and extract the positions of all faces in the image.
Finally, we render those frames, together with the face locations, as video.

Simple, isn't it? If you want more specific details, I have included comments in each section of the code; you can always go back and review them.

A face detection use case

The fun doesn't stop there! Another cool thing we can do is build a complete use case around the code above. And you don't need to start from scratch; only a few small changes to the code are required.

For example, suppose you want to build an automated camera-based system that tracks the speaker's position in real time. Depending on that position, the system turns the camera so that the speaker always stays in the middle of the frame.

How do we tackle this problem? The first step is to build a system that identifies the person (or people) in the video and tracks the speaker's location.
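The original article stops at detection and does not implement the camera steering itself. As a rough illustration of that next step, the helper below (my own sketch, with a hypothetical function name) computes how far a detected face sits from the center of the frame, which is the signal a pan mechanism would act on:

def horizontal_offset(face_location, frame_width):
    # face_location is a (top, right, bottom, left) tuple as returned by
    # face_recognition.face_locations(); the result is in [-1, 1], where a
    # negative value means the face is left of center and positive means right.
    top, right, bottom, left = face_location
    face_center_x = (left + right) / 2.0
    frame_center_x = frame_width / 2.0
    return (face_center_x - frame_center_x) / frame_center_x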

Let's see how we can implement this. For the purposes of this article, I downloaded a video from YouTube (https://youtu.be/A_-KqX-RazQ) of a speaker at the DataHack Summit 2017.

First, we import the necessary libraries:

import cv2
import face_recognition

Then we read the video and get its length:

input_movie = cv2.VideoCapture("sample_video.mp4")
length = int(input_movie.get(cv2.CAP_PROP_FRAME_COUNT))

After that, we create an output file with the required resolution and frame rate, similar to the input file; a sketch of this step is shown below.
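The article does not show the code for creating that output file. A minimal sketch of this step, assuming an XVID-encoded .avi output that copies the input's frame rate and size, could look like this:

import cv2

input_movie = cv2.VideoCapture("sample_video.mp4")  # same input as the previous step

# Match the input's frame rate and resolution.
fps = input_movie.get(cv2.CAP_PROP_FPS)
width = int(input_movie.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(input_movie.get(cv2.CAP_PROP_FRAME_HEIGHT))

# XVID is a commonly available codec; the output container here is .avi.
fourcc = cv2.VideoWriter_fourcc(*'XVID')
output_movie = cv2.VideoWriter('output.avi', fourcc, fps, (width, height))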

Next, load a sample image of the speaker so that we can recognize him in the video:

image = face_recognition.load_image_file("sample_image.jpeg")
face_encoding = face_recognition.face_encodings(image)[0]
known_faces = [
    face_encoding,
]

With all of that in place, we now run a loop that will do the following:

Extract a frame from the video.
Find all the faces and identify them.
Create a new video combining the original frame with the annotated location of the speaker's face.

Let's look at the code:

# Initialize variables
face_locations = []
face_encodings = []
face_names = []
frame_number = 0

while True:
    # Grab a single frame of video
    ret, frame = input_movie.read()
    frame_number += 1

    # Quit when the input video file ends
    if not ret:
        break

    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses)
    rgb_frame = frame[:, :, ::-1]

    # Find all the faces and face encodings in the current frame of video
    face_locations = face_recognition.face_locations(rgb_frame, model="cnn")
    face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        # See if the face is a match for the known face(s)
        match = face_recognition.compare_faces(known_faces, face_encoding, tolerance=0.50)

        name = None
        if match[0]:
            name = "Phani Srikant"

        face_names.append(name)

    # Label the results
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        if not name:
            continue

        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

        # Draw a label with a name below the face
        cv2.rectangle(frame, (left, bottom - 25), (right, bottom), (0, 0, 255), cv2.FILLED)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 0.5, (255, 255, 255), 1)

    # Write the resulting image to the output video file
    print("Writing frame {} / {}".format(frame_number, length))
    output_movie.write(frame)

# All done!
input_movie.release()
cv2.destroyAllWindows()

The code then produces output like this:

Face detection really is a remarkable capability.

Conclusion

Congratulations! You now know how to build a face detection system for many potential use cases. Deep learning is a fascinating field, and I am very much looking forward to where it goes next.

In this article we learned how to use open source tools to build a real-time face detection system with practical uses.

I encourage you to build plenty of applications like this and try it out for yourself. Trust me, you will learn a lot, and it is great fun.



[51CTO original article. Partner sites reprinting it must credit the original author and cite 51CTO.com as the source.]

Software activities at AAS 233 in Seattle, Jan 2019


It’s that time of year again when software folks ― users and authors alike ― dream of all the software activities at the winter AAS meeting . So here is the ASCL’s (abbreviated*) annual round-up to jumpstart your dreams and warm your code-loving heart! If you have items you want added, please let me know in the comments below or send an email to editor@ascl.net. Thank you!

All rooms are in the Washington State Convention Center unless otherwise specified.

SATURDAY, 5 JANUARY 2019

Workshops

Introduction to Software Carpentry (Day 1), 9:00 AM 5:00 PM; 211

The AAS Chandra/CIAO Workshop, 9:00 AM 6:00 PM; 204

Using python to Search NASA’s Astrophysics Archives, 10:00 AM 11:30 AM; 213

SUNDAY, 6 JANUARY 2019

Workshops

SOFIA Workshop for FORCAST and HAWC+ Data Analysis, 8:30 AM 5:15 PM; 201

Adding LISA to your Astronomy Tool Box, 9:00 AM 5:00 PM; 213

Introduction to Software Carpentry (Day 2), 9:00 AM 5:00 PM; 211

Using Python and Astropy for Astronomical Data Analysis, 9:00 AM 5:00 PM; 4C-4

The AAS Chandra/CIAO Workshop, 9:00 AM 6:00 PM; 204

Advanced Searching in the New ADS: On the Web and Using the API, 3:00 PM 4:30 PM; 304

MONDAY, 7 JANUARY 2019

Splinter meetings

Data Science, 8:00 AM 6:00 PM, 4C-1

Updates on Implementing Software Citation in Astronomy, 12:30 PM 2:00 PM; 203

An Open Discussion on Astronomy Software, 2:00 PM 3:30 PM; 4C-4

Oral presentations

Session 126.Machine Learning in Astronomical Data Analysis, 2:00 PM 3:30 PM; 607 (5 presentations)

Also:

112.01. Constraining BH formation with 2M05215658+4359220, 10:00 AM 10:10 AM, 612

109.03. Real-time data reduction pipeline and image analysis software for FIREBall-2: first flight with a δ-doped UV-EMCCDs operating in counting mode, 10:30 AM 10:40 AM, 608

175.06. Python, Unix, Observing, and LaTeX: Introducing First Year Undergraduates to Astronomical Research, 10:50 AM 11:00 AM, 620

109.08. TESS Data Analysis using the community-developed Lightkurve Python Package, 11:20 AM 11:30 AM, 608

123.02D. A Uniform Analysis of Exoplanet Atmosphere Spectra Observed by HST WFC3 Is Consistent with Watery Worlds, 2:10 PM 2:30 PM, 6C

129.06. Reconstructing the Orphan Stream Progenitor with MilkyWay@home Volunteer Computing, 3:00 PM 3:10 PM, 611

Selected posters

144.25. Identifying and Comparing Centrally Star-Forming Galaxies Using MaNGA

144.29. Deriving star formation histories from photometric spectral energy distributions with diffusion k-means

144.30. Using Convolutional Neural Networks to predict Galaxy Metallicity from Three-Color Images

144.35. Automatic Detection and Analysis of Debris from Galactic Accretion Events

145.05. Galaxy Gradients Across Simulations

145.07. Reduction and Analysis of GMOS Spectroscopy for Herschel Sources in CANDELS

145.25. Comparison of the HI Signal Extraction Algorithms of SoFiA and ALFALFA

140.02. Tracking the TESS Pipeline

140.12. Undergraduates Can Find Planets Too

140.16. Identifying Transiting Exoplanets in with Deep Learning in K2 Data

140.20. The Impact of Small Statistics on Identifying Background False Positives in Kepler Data

140.23. AutoRegressive Planet Search for Ground-Based Transit Surveys

140.29. Getting to Know Your Star: A comparison of analytic techniques for deriving stellar parameters and abundances

149.18. NANOGrav: Data Accessibility, Analysis and Automation using Python

150.01. Revised Simulations of the Planetary Nebulae Luminosity Function

150.15. Identifying Binary Central Stars of Planetary Nebulae with PSF Fitting

158.02. HaloSat: X-Ray Calibration and Spectral Analysis for a NASA CubeSat

Selected iPosters

167.02. Modeling circumstellar dust around low-mass-loss rate carbon-rich AGB stars

167.04. The response of optical Fe II emission in AGNs to changes in the ionizing continuum, I: photoionization modelling

164.02. A Maximum Likelihood Approach to Extracting Photon-Starved Spectra of Directly Imaged Exoplanets

166.02. Smoothed Particle Inference Analysis of SNR DEM L71

171.03. The State of Software Tools for the Space Telescope Imaging Spectrograph

Other activities of possible interest

Monday, January 7: Data Science Splinter Meeting, 8:00 AM 6:00 PM, 4C-1

TUESDAY, 8 JANUARY 2019

Workshop

LSST Science Pipelines Stack Tutorial for AAS, 9:00 AM 5:00 PM; 310

Splinter meeting

Cafe SCiMMA: Conceptualizing an NSF Center for Scalable Cyberinfrastructure for Multimessenger Astrophysics, 3:15 PM 5:15 PM; Redwood (Sheraton Seattle Hotel)

Oral presentations

Session 225. Computation, Data Science, and Image Analysis, 2:00 PM 3:30 PM, 6E (6 presentations)

Also:

218.05. A Uniform Analysis of Kepler/K2 Exoplanet Transit Parameters, 10:40 AM 10:50 AM, 603

206.05D. High Resolution spatial analysis of z ~2 lensed galaxy using pixelated source-reconstruction algorithm, 10:50 AM 11:10 AM, 605/610

203.05. Atmosphere Retrieval of Planetary Mass Companions with the APOLLO Code: A Case Study of HD 106906b and Prospects for JWST, 11:00 AM 11:10 AM, 6B

207.10. astroquery: An Astronomical Web-Querying Package in Python, 11:03 AM 11:10 AM, 606

239.04D. Kinematics of Circumgalactic Gas and Cold Gas Accretion at Redshift z=0.2, 2:40 PM 3:00 PM, 609

227.07. Mu and You: Public Microlensing Analysis Tools and Survey Data, 3:12 PM 3:24 PM, 606

Poster presentations

Session 245. Computation, Data Science, and Image Analysis posters ( 31 posters! )

Selected posters

243.08. Utilizing Independent Component Analysis to Explore the Diversity of Quasars

245.01. Making organizational research software more discoverable

245.27. The MAESTROeX low Mach number stellar hydrodynamics code

245.29. The Castro Adaptive Mesh Refinement Hydrodynamics Code: Applications, Algorithm Development, and Performance Portability

247.30. Chemical Analysis of Tabby’s Star (KIC 8462852)

247.35. VPLanet: The VIrtual Planet Simulator

249.11. Know Your Neighbors: New Catalogs and Analysis of Star Clusters in the LMC, SMC, & M33

250.02. X-Ray Source Analysis In The Globular Clusters NGC 6341 and NGC 6541

253.06. Structure Function Analysis of Turbulent Properties in the Small and Large Magellanic Clouds

259.05. Forward-Modeling Analysis of Late-T Dwarf Atmospheres

259.15. Finding age relations for low mass stars using magnetic activity and kinematics

259.24. A Uniform Retrieval Analysis on a Sample of 16 T-dwarfs

258.25. SuperNovae Analysis aPplication (SNAP): Identifing and Understanding the Physics of Supernovae

Selected iPosters

268.02. Towards 3D Parameter Space Studies of CCSNe With Grey, Two-Moment Neutrino Transport

261.12. Using Machine Learning to Predict the Masses of Galaxy Clusters

261.15. Mapping Galaxy Cluster Orientations from Cosmo-OWLS Simulations

261.16. A Hydrodynamical Simulation of the Off-Axis Cluster Merger Abell 115

WEDNESDAY, 9 JANUARY 2019

Open meeting

AAS WorldWide Telescope with Python and Astropy, 10:00 AM 11:30 AM; 214

Oral presentations

316.04D. Feedback and Chemical Enrichment in Low Mass Dwarf Galaxies: Insights from Simulations Tracking Individual Stars, 10:30 AM 10:50 AM, 617

304.03. Recent upgrades to the pyLIMA software for microlensing modeling and analysis of two binary events, 10:10 AM 10:20 AM, 6E

311.05. Quantifying the effects of spatial resolution and noise on galaxy metallicity gradients, 11:00 AM 11:10 AM, 612

313.05D. Probabilistic data analysis methods for large photometric surveys, 10:50 AM 11:10 AM, 614

336.04D. Simultaneous modelling of X-rays emission and optical polarization of intermediate polars using the CYCLOPS code: the case of V405 Aurigae, 2:40 PM 3:00 PM, 614

342.06. On Open Cluster Disruption, 3:00 PM 3:10 PM, 620

341.01. Reproducing Stellar Rotation Periods in the Kepler Field via Magnetic Braking and Tidal Torques

Selected posters

346.04. Designing a Python Module for the Calculation of Molecular Parameters and Production Rates in Comets

347.01. Hyperlink preservation in astrophysics papers

348.19. The COBAIN code. Basic principles and geometrical considerations

348.27. Considerations and Design Principles for the 2.1 Release of the PHOEBE Eclipsing Binary Modeling Code

356.06. Analysis of a large number of spiral galaxies shows asymmetry between clockwise and counterclockwise galaxies

Session 381. Computation, Data Science, and Image Analysis session (8 iPosters)

Selected iPosters

381.03. ASTROstream: Automated claSsification of Transient astRonomical phenOmena in the streaming mode

381.05. Understanding and using the Fermitools

381.07. Polarization Calibration Post-Pipeline in CASA: Pilot Implementation

381.08. Transitioning from ADS Classic to the new ADS search platform

THURSDAY, 10 JANUARY 2019

Hack Together Day

8:30 AM 7:00 PM; 4C-2

Oral presentations

413.06. The Radio Astronomy Software Group: Foundational Tools for 21 cm Cosmology and Beyond, 11:10 AM 11:20 AM, 614

408.07D. Hundreds of New Planet Candidates from K2, 11:00 AM 11:20 AM, 608

411.05D. AzTEC Survey of the Central Molecular Zone: Modeling Dust SEDs and N-PDF with Hierarchical Bayesian Analysis, 10:40 AM 11:00 AM, 612

405.05. How can new data analysis methods get more out of Kepler/K2 data?, 10:40 AM 10:50 AM, 605/610

425.01. The Dedalus project: open source science in astrophysics with examples in convection and stellar dynamos, 2:00 PM 2:22 PM, 606

430.02D. Analysis of the spatially-resolved V-3.6μm colors and dust extinction within 257 nearby NGC and IC galaxies, 2:20 PM 2:40 PM, 612

Selected posters

443.11. WFC3 PSF Database and Analysis Tools

457.02. The Stak Notebooks: Transitioning From IRAF to Python

442.01. ExoPhotons: Exoplanet Monte Carlo Radiative Transfer

442.02. Quantifying inhomogeneities in the HI distributions of simulated galaxies

445.01. Lightkurve v1.0: Kepler, K2, and TESS time series analysis in Python

445.05. Using Kepler DR25 Products to Compute Exoplanet Ocurrence Rates

465.07. Distribution of stellar rotation periods using light curve analysis of second phase Kepler data

* abbreviated as in I haven’t listed all the posters that could be listed here, as the list was getting very very long…

Microsoft Announces Python Support for Azure Functions


At the recent Connect() conference, Microsoft announced Python support for Azure Functions. Developers can use Python 3.6 to build functions on the open source Functions runtime 2.0 and publish them to a Consumption Plan.

In early October, InfoQ covered the general availability of the Azure Functions runtime 2.0. Since then, Python support has been one of the most requested features, and a private preview has been available for some time. Now that it is publicly available, developers can start building functions for data manipulation, machine learning, scripting, and automation scenarios.

The Azure Functions runtime 2.0 includes a language worker model that provides support for non-.NET languages such as Java and Python. As a result, developers can import their existing .py scripts and modules and start writing functions. They can also configure additional dependencies for pip with a requirements.txt file (see the example below).
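As a simple illustration (mine, not from the article; the package names and versions are arbitrary), such a requirements.txt uses the usual pip format:

# requirements.txt - extra packages pip-installed when the function app is built
requests==2.20.1
numpy==1.15.4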



The Azure Functions programming model provides triggers and bindings, which let developers configure the events that trigger a function's execution and the data sources the function needs to orchestrate. According to Asavari Tayal, program manager on Microsoft's Azure Functions team, the preview will support bindings to HTTP requests, timer events, Azure Storage, Cosmos DB, Service Bus, Event Hubs, and Event Grid. Once configured, developers can quickly retrieve data from these bindings or write back using the method attributes of the entry-point function.
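To give a flavor of the model (this example is mine, not from the announcement; the greeting logic and parameter name are made up), an HTTP-triggered function in the Python worker looks roughly like this, with the trigger and any other bindings declared in the accompanying function.json:

# __init__.py - a minimal HTTP-triggered Azure Function (illustrative sketch)
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # 'name' comes from the query string of the HTTP trigger binding.
    name = req.params.get('name', 'world')
    return func.HttpResponse("Hello, {}!".format(name), status_code=200)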

Developers familiar with Python do not need to learn new tools: they can debug and test functions locally on Mac, Linux, and Windows. With the Azure Functions Core Tools (CLI), developers can start from trigger templates and publish directly to Azure, with the Azure platform handling the build and configuration. They can also use the Azure Functions extension for Visual Studio Code, together with the Python extension, to get Python auto-completion, IntelliSense, linting, and debugging on any platform.



Azure Functions written in Python can be hosted on either a Consumption Plan or an App Service Plan. Tayal explained in a blog post about the Python preview:

Behind the scenes, both hosting plans run your functions in a Docker container based on the open source azure-function/python base image. The platform abstracts the container away, so you are only responsible for providing your Python files and do not need to worry about managing the underlying Azure Functions and Python runtime.

Finally, Microsoft supports Python 3.6 because AWS Lambda, offered by competitor Amazon, supports that version. Both Microsoft and Amazon are trying to win over more users by bringing more languages to their cloud platforms.

Read the original English article:

https://www.infoq.com/news/2018/12/azure-functions-python-support
