
A Brief Introduction to Fabric, a Python Remote Execution Library


A previous article, "Getting Started with Ansible", introduced Ansible as an automated configuration management tool. This article gives a brief introduction to Fabric, a Python remote execution library.

Fabric is a library for executing shell commands remotely over SSH, used mainly for automated installation, deployment, and remote administration tasks. Ansible can do what Fabric does, but Fabric is more lightweight: it is purely for remote execution. It is itself a Python library, so you can import it directly in your own Python programs and build remote command execution on top of it. It also provides a fab command-line tool that makes it even simpler to write the tasks you need to run. This mode is the more commonly used one, so we will focus on using Fabric through the fab command.

Note that Fabric 2.x is not compatible with 1.x, and 2.x largely sidelines the fab command-line tool, so we cover the 1.x line here. The latest 1.x release is 1.14; the documentation is at http://docs.fabfile.org/en/1.14/.

First, install Fabric with pip:

pip install 'fabric<2.0'

Once installed, run fab to check the version:

[root@centos1 fabv1]# fab --version
Fabric 1.14.0
Paramiko 2.4.2

By default, the fab command-line tool loads Python code from a file named fabfile.py in the current directory. Each Python function in it defines a task that can be run directly from the fab command line.

For example, given the following fabfile.py:

def helloworld():
    print("Hello world!")

def hello(name="world"):
    print("Hello %s!" % name)

You can list the tasks defined in the current fabfile.py like so:

[root@centos1 fabv1]# fab -l
Available commands:

    hello
    helloworld
[root@centos1 fabv1]#

As you can see, two tasks, hello and helloworld, are defined.

The hello function takes parameters; arguments can be passed on the fab command line in the following form:

<task name>:<arg>,<kwarg>=<value>,...

For example, both ways of passing the argument work:

[root@centos1 fabv1]# fab hello:name=flygoast
Hello flygoast!

Done.
[root@centos1 fabv1]# fab hello:flygoast
Hello flygoast!

Done.
[root@centos1 fabv1]#

Fabric provides a number of operation functions, chiefly:

local(): run a command on the local host
run(): run a command on the remote host
sudo(): like run(), except the command is prefixed with "sudo"
get(remote, local): download a file from the remote host to the local machine
put(local, remote): upload a local file to the remote host
prompt(): prompt the user for input and return what they typed
reboot(): reboot the server
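To make these concrete, here is a hedged sketch of a deployment task combining several of them (the archive name, paths, and service name are invented for illustration):

from fabric.api import run, put, sudo

def deploy():
    # Upload a local release archive to the remote host.
    put('app.tar.gz', '/tmp/app.tar.gz')
    # Unpack it on the remote side.
    run('tar -xzf /tmp/app.tar.gz -C /opt/app')
    # Restart the service with elevated privileges.
    sudo('systemctl restart app')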

The operations to run on remote hosts are defined directly in Python functions, e.g.:

from fabric.api import run

def test_remote():
    run("hostname")

But how do we specify which hosts to run on? Fabric provides many ways; here are a few.

One is to specify the hosts directly with a fab command-line option:

[root@centos1 fabv1]# fab -H dev01 test_remote
[dev01] Executing task 'test_remote'
[dev01] run: hostname
[dev01] Login password for 'root':
[dev01] out: dev01
[dev01] out:

Done.
Disconnecting from dev01... done.
[root@centos1 fabv1]#

The supported host format is SSH-style: user@host:port. Note that when you run the command above, you are prompted to type the password by hand.

You can also specify the remote hosts in Fabric's global variable env.hosts. Change the code to:

from fabric.api import run, env

env.hosts = ['dev01', 'dev02']

def test_remote():
    run("hostname")

The output is:

[root@centos1 fabv1]# fab test_remote
[dev01] Executing task 'test_remote'
[dev01] run: hostname
[dev01] Login password for 'root':
[dev01] out: dev01
[dev01] out:

[dev02] Executing task 'test_remote'
[dev02] run: hostname
[dev02] out: bogon
[dev02] out:

Done.
Disconnecting from dev01... done.
Disconnecting from dev02... done.
[root@centos1 fabv1]#

Hosts set via env.hosts apply globally to all tasks. If different tasks run on different hosts, you can assign hosts per task with the hosts decorator, e.g.:

from fabric.api import hosts, run

my_hosts = ('dev01', 'dev03')

@hosts(my_hosts)
def test_remote2():
    run('hostname')

The output is:

[root@centos1 fabv1]# fab test_remote2
[dev01] Executing task 'test_remote2'
[dev01] run: hostname
[dev01] Login password for 'root':
[dev01] out: dev01
[dev01] out:

[dev03] Executing task 'test_remote2'
[dev03] run: hostname
[dev03] out: bogon
[dev03] out:

Done.
Disconnecting from dev01... done.
Disconnecting from dev03... done.
[root@centos1 fabv1]#

In addition, you can define Roles for different remote hosts and then assign different roles to different tasks, as sketched below. For details see: http://docs.fabfile.org/en/1.14/usage/execution.html#roles
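As a quick illustration (host and role names are invented; env.roledefs and @roles are the Fabric 1.x API):

from fabric.api import env, roles, run

env.roledefs = {
    'web': ['dev01', 'dev02'],
    'db': ['dev03'],
}

@roles('web')
def deploy_web():
    # Runs only on the hosts assigned to the 'web' role.
    run('hostname')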

In the examples above, a host password had to be typed before each run. You can also hard-code the password ahead of time, so no prompt appears:

env.password = '123456'

If hosts have different passwords, you can specify a password per host in env.passwords. This variable is a dict whose keys must be a combination of user, host, and port; all three elements must be given, e.g.:

env.passwords = {'root@dev01:22': '123456', 'root@dev03:22': '456789'}

This approach writes passwords into a file, which is a security risk; the official recommendation is to log in with an SSH key instead.
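With key-based login, a minimal sketch is to point Fabric at the private key instead (the path here is illustrative):

from fabric.api import env

env.key_filename = '/root/.ssh/id_rsa'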

This article is only a quick introduction to Fabric; for other topics such as parallel execution and interaction with remote programs, see the official documentation: http://docs.fabfile.org/en/1.14/
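As a parting sketch of the parallel execution just mentioned (using the Fabric 1.x @parallel decorator), a task decorated like this runs concurrently across all hosts instead of one host at a time:

from fabric.api import parallel, run

@parallel
def test_parallel():
    run('hostname')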


Analysis of Python Deserialization Vulnerabilities


What is serialization?

While a program runs, its variables live in memory; once the program finishes and exits, the memory occupied by the variables is released.

Persisting in-memory variables to disk is called serialization; reading the variables back from disk into memory on the next run is called deserialization.

In Python, serialization is called pickling and deserialization is called unpickling; in PHP they are called serialization and unserialization.



Pickle and marshal

Python deserialization security issues mainly involve two modules: pickle (and cPickle) and marshal.

Basic operations of pickle and marshal

pickle.dump(obj, file[, protocol]): serialize obj into the already opened file.

import marshal
import pickle

dataList = ['test1', 'test2']
f = open('dataFile.txt', 'wb')
pickle.dump(dataList, f)
f.close()

pickle.load(file): read serialized data from file and deserialize it back into a Python object.

f = open('dataFile.txt', 'rb')
dataList = pickle.load(f)
print(dataList)  # ['test1', 'test2']
f.close()

pickle.dumps(obj[, protocol]): serialize obj to a string.

class A:
    def __init__(self):
        print('This is A')

a = A()
p_a = pickle.dumps(a)
print(p_a)

pickle.loads(string): deserialize the obj object back out of string.

class A:
    def __init__(self):
        print('This is A')

a = A()
p_a = pickle.dumps(a)
pickle.loads(p_a)

The marshal module likewise provides the four functions dump, load, dumps, and loads, with basic usage similar to pickle's.

Data types that pickle supports

None, True, and False

Integers, long integers, floating-point numbers, complex numbers

Normal and Unicode strings

Tuples, lists, sets, and dictionaries containing only picklable objects

Functions defined at the top level of a module

Built-in functions defined at the top level of a module

Classes defined at the top level of a module

Instances of classes whose __dict__ or __getstate__() result is picklable

Differences between pickle and marshal

In general, pickle should always be the preferred way to serialize Python objects. marshal is a more primitive serialization module that exists mainly to support Python's .pyc files.

The pickle module keeps track of objects it has already serialized, so later references to the same object are not serialized again. marshal does not do this.

marshal cannot serialize user-defined classes or their instances. pickle can save and restore class instances transparently, but the class definition must be importable and live in the same module as when the object was stored.

The marshal serialization format is not guaranteed to be portable across Python versions, whereas the pickle serialization format is guaranteed to be backward compatible across Python versions.

class A:
    def __init__(self):
        print('This is A')

a = A()
pickle.dumps(a)
marshal.dumps(a)  # marshal cannot serialize user-defined classes or their instances;
                  # raises ValueError: unmarshallable object

Code execution through Python deserialization

object.__reduce__(): the __reduce__() method is called when serialized data is deserialized back into an object (similar to PHP's __wakeup magic method). It takes effect on new-style classes, accepts no arguments, and should return either a string or a tuple.

If a string is returned, it is interpreted as the name of a global, namely the object's local name relative to its module.

If a tuple is returned, it must contain between two and five members. Optional members can be omitted, or None can be given as their value.

In order, the members mean:

First member: the object that will be called; a callable.

Second member: a tuple of arguments for the callable. If the callable accepts no arguments, an empty tuple must be given.

When the tuple returned by __reduce__ in a Python class contains dangerous code, or is attacker-controllable, code execution follows.

import os
import pickle

class A(object):
    def __init__(self, func, arg):
        self.func = func
        self.arg = arg
        print('This is A')

    def __reduce__(self):
        return (self.func, self.arg)

a = A(os.system, ('whoami',))
p_a = pickle.dumps(a)
pickle.loads(p_a)
print('==========')
print(p_a)
'''
This is A
rai4over
==========
cposix
system
p0
(S'whoami'
p1
tp2
Rp3
.
'''

pickle.loads

If the argument to pickle.loads or pickle.load is controllable, code execution likewise results.

payload = '''cposix
system
p0
(S'whoami'
p1
tp2
Rp3
.'''
pickle.loads(payload)
#rai4over
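A defensive aside not covered by the original article: the standard mitigation is to never unpickle untrusted input, or to whitelist the globals an Unpickler may resolve by overriding find_class (which we examine below). A minimal Python 2 sketch, with an assumed whitelist:

import pickle
import StringIO

class RestrictedUnpickler(pickle.Unpickler):
    # Only these (module, name) pairs may be resolved; anything else,
    # including ('posix', 'system'), is rejected.
    ALLOWED = {('__builtin__', 'set'), ('__builtin__', 'frozenset')}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(
                'global %s.%s is forbidden' % (module, name))
        return pickle.Unpickler.find_class(self, module, name)

def restricted_loads(s):
    # Raises UnpicklingError on the malicious payload shown above.
    return RestrictedUnpickler(StringIO.StringIO(s)).load()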

A quick look at the pickle module source

Key objects in the source

First come four exception classes: pickle.PickleError, pickle.PicklingError, pickle.UnpicklingError, and _Stop.

Next are the all-important pickle opcodes, which play a central role in parsing and dispatch:

MARK = '('           # push special markobject on stack
STOP = '.'           # every pickle ends with STOP
POP = '0'            # discard topmost stack item
POP_MARK = '1'       # discard stack top through topmost markobject
DUP = '2'            # duplicate top stack item
.................
.................
NEWFALSE = '\x89'    # push False
LONG1 = '\x8a'       # push long from < 256 bytes
LONG4 = '\x8b'       # push really big long

class Pickler: both pickle.dump and pickle.dumps instantiate this class.
class Unpickler: both pickle.load and pickle.loads instantiate this class.

def dump(obj, file, protocol=None):
    Pickler(file, protocol).dump(obj)

def dumps(obj, protocol=None):
    file = StringIO()
    Pickler(file, protocol).dump(obj)
    return file.getvalue()

def load(file):
    return Unpickler(file).load()

def loads(str):
    file = StringIO(str)
    return Unpickler(file).load()

A quick look at the serialization flow

Take pickle.dumps as the example; the test code is unchanged:

class A(object):
    def __init__(self, func, arg):
        self.func = func
        self.arg = arg
        print('This is A')

    def __reduce__(self):
        return (self.func, self.arg)

a = A(os.system, ('whoami',))
p_a = pickle.dumps(a)

dumps is called first. It instantiates the Pickler class, passing an empty writable object into __init__ for initialization, then calls the instance's dump method with the object to serialize, which starts serialization.

def dumps(obj, protocol=None):
    file = StringIO()
    Pickler(file, protocol).dump(obj)
    return file.getvalue()

Initialization checks the protocol version, assigns the writable object's write method to self.write, and so on:

class Pickler:

    def __init__(self, file, protocol=None):
        if protocol is None:
            protocol = 0
        if protocol < 0:
            protocol = HIGHEST_PROTOCOL
        elif not 0 <= protocol <= HIGHEST_PROTOCOL:
            raise ValueError("pickle protocol must be <= %d" % HIGHEST_PROTOCOL)
        self.write = file.write
        self.memo = {}
        self.proto = int(protocol)
        self.bin = protocol >= 1
        self.fast = 0

Beyond initialization, note the class variable dispatch. It is a dict whose keys are not ordinary strings but the data types defined in the types module.

# types.py
NoneType = type(None)
TypeType = type
ObjectType = object
IntType = int
LongType = long
FloatType = float
BooleanType = bool
try:
    ComplexType = complex
except NameError:
    pass
StringType = str
try:
    UnicodeType = unicode
    StringTypes = (StringType, UnicodeType)
except NameError:
    StringTypes = (StringType,)
BufferType = buffer
TupleType = tuple
ListType = list
DictType = DictionaryType = dict

def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None)  # Same as FunctionType
CodeType = type(_f.func_code)

The values are the corresponding handler methods; dispatch thus maps variable types to handlers and can be thought of as a dispatch table.


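The screenshots that followed here listed entries of that table; the registration pattern in the Python 2 pickle.py source looks roughly like this (condensed, not verbatim):

# Inside class Pickler:
dispatch = {}

def save_none(self, obj):
    self.write(NONE)
dispatch[NoneType] = save_none

def save_int(self, obj):
    # Simplified: the real method also emits binary opcodes per protocol.
    self.write(INT + repr(obj) + '\n')
dispatch[IntType] = save_int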

Once the key variables are initialized, execution enters the dump method, whose most important part is self.save:

def dump(self, obj):
    """Write a pickled representation of obj to the open file."""
    if self.proto >= 2:
        self.write(PROTO + chr(self.proto))
    self.save(obj)
    self.write(STOP)

The save method works like an analysis dispatcher: it inspects the type and attributes of the object being serialized and dispatches accordingly. When the object's type is present in the dispatch table, it is handed directly to the handler and serialization completes.



The type of the object in our example, <class '__main__.A'>, is not in the dispatch table, so the result variable rv is obtained by analyzing the __reduce_ex__ attribute; it is exactly the callback content of the __reduce__ we defined in our class.


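The screenshots of the save method are gone from this copy; condensed from the Python 2 source (not verbatim), its core logic is roughly:

def save(self, obj):
    # 1. Memo check: an already-pickled object is written as a back-reference.
    x = self.memo.get(id(obj))
    if x:
        self.write(self.get(x[0]))
        return

    # 2. Type dispatch: a known type goes straight to its handler.
    t = type(obj)
    f = self.dispatch.get(t)
    if f:
        f(self, obj)
        return

    # 3. Fallback: ask the object how to reduce itself.
    reduce = getattr(obj, "__reduce_ex__", None)
    rv = reduce(self.proto)
    self.save_reduce(obj=obj, *rv)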

Finally, obj and rv are passed to the save_reduce function:

self.save_reduce(obj=obj, *rv)

Inside save_reduce, the callable func and the argument tuple args from rv are each passed back into save:

def save_reduce(self, func, args, state=None,
                listitems=None, dictitems=None, obj=None):
    if not isinstance(args, TupleType):
        raise PicklingError("args from reduce() should be a tuple")
    if not hasattr(func, '__call__'):
        #.......................
        #.......................
    if self.proto >= 2 and getattr(func, "__name__", "") == "__newobj__":
        #.......................
        #.......................
    else:
        save(func)    # re-enters save
        save(args)    # re-enters save
        write(REDUCE)

This time save sees an incoming object of type <type 'builtin_function_or_method'>, which it finds in the dispatch table.



save(args) is a tuple, which also has a matching method in the dispatch table; serialization is now essentially complete.

A quick look at the deserialization flow

Take pickle.loads as the example; the test input is the serialized string from the previous example:

payload = '''cposix
system
p0
(S'whoami'
p1
tp2
Rp3
.'''
pickle.loads(payload)

loads is called first. It instantiates the Unpickler class, runs __init__, and then calls the instance's load method, which starts deserialization.

def loads(str):
    file = StringIO(str)
    return Unpickler(file).load()

After initialization there is likewise a dict dispatch table, but it differs from the Pickler's: its keys are pickle opcodes and its values are the deserialization handler methods.


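The screenshot of this table is gone; its registration pattern mirrors the Pickler's, keyed by opcode characters (condensed, not verbatim):

# Inside class Unpickler:
dispatch = {}

def load_int(self):
    # Simplified: the real method also special-cases TRUE/FALSE.
    self.append(int(self.readline()[:-1]))
dispatch[INT] = load_int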

Execution then enters the load function, which reads opcode bytes from the serialized data and looks up the matching handler in the dispatch table:

def load(self):
    """Read a pickled object representation from the open file.

    Return the reconstituted object hierarchy specified in the file.
    """
    self.mark = object()  # any new unique object
    self.stack = []
    self.append = self.stack.append
    read = self.read
    dispatch = self.dispatch
    try:
        while 1:
            key = read(1)  # key = 'c'
            dispatch[key](self)
    except _Stop, stopinst:
        return stopinst.value

For example, the first character of our serialized string is c, so per the dispatch table we enter load_global, which reads the module posix and the method name system, and then calls find_class.



In find_class, the module is imported by name, the named method is fetched from it into klass, returned, and appended to the stack.


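The removed screenshots showed load_global and find_class; condensed from the Python 2 pickle.py source (not verbatim), they look roughly like this:

def load_global(self):
    module = self.readline()[:-1]   # e.g. 'posix'
    name = self.readline()[:-1]     # e.g. 'system'
    klass = self.find_class(module, name)
    self.append(klass)
dispatch[GLOBAL] = load_global

def find_class(self, module, name):
    # Imports the module and returns the named attribute; with
    # attacker-controlled input this happily resolves os.system.
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass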

The pickle opcodes are read one by one to finish deserialization; the key operations are as follows.

On S, load_string is called; it reads the command string whoami and appends it to the stack.



On R, load_reduce is called; it takes the callable and its arguments from the stack and invokes them.


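The screenshot of load_reduce is gone; condensed from the Python 2 source (not verbatim), it is roughly:

def load_reduce(self):
    stack = self.stack
    args = stack.pop()     # the argument tuple, e.g. ('whoami',)
    func = stack[-1]       # the callable, e.g. os.system
    value = func(*args)    # the dangerous call happens right here
    stack[-1] = value
dispatch[REDUCE] = load_reduce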

Python Bytes: #105 Colorizing and Restoring Old Images with Deep Learning


Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Brian #1: Colorizing and Restoring Old Images with Deep Learning

Text interview by Charlie Harrington of Jason Antic, developer of DeOldify. A whole bunch of machine learning buzzwords that I don't understand in the slightest combine to make a really cool tool that makes B&W photos look freaking amazing. "This is a deep learning based model. More specifically, what I've done is combined the following approaches: a Self-Attention Generative Adversarial Network; a training structure inspired by (but not the same as) Progressive Growing of GANs; the Two Time-Scale Update Rule. Generator Loss is two parts: one is a basic Perceptual Loss (or Feature Loss) based on VGG16; the second is the loss score from the critic."

Michael #2: PlatformIO IDE for VSCode

via Jason Pecor. PlatformIO is an open source ecosystem for IoT development: a cross-platform IDE and unified debugger, with remote unit testing and firmware updates. Built on Visual Studio Code, which has a nice extension for Python. PlatformIO, combined with the features of VSCode, provides some great improvements for project development over the standard Arduino IDE for Arduino-compatible microcontroller-based solutions. Some of these features are paid, but at a reasonable price. With Python becoming more popular for microcontroller design as well, this might be a very nice option for designers. And for Jason specifically, it provides a single environment that can eventually be configured to handle embedded code design, associated Python supporting-tool mods, and HDL development. The PlatformIO Core is written in Python. Python 2.7 (hiss…). Jason's test drive video from Tuesday: Test Driving PlatformIO IDE for VSCode.

Brian #3: Python Data Visualization 2018: Why So Many Libraries?

Nice overview of the visualization landscape, by the Anaconda team: differentiating factors, API types, and emerging trends. Related: Drawing Data with Flask and matplotlib. Finally! A really simple example app in Flask that shows how to both generate and display matplotlib plots. I was looking for something like this about a year ago and didn't find it.

Michael #4: coder.com - VS Code in the cloud

Full Visual Studio Code, but in your browser: code in the browser; access up to 96 cores; VS Code + extensions, so all the languages and features; collaborate in real time (think Google Docs); access Linux from any OS. Note: they sponsored an episode of Talk Python To Me, but this is not an ad here...

Brian #5: By Welcoming Women, Python's Founder Overcomes Closed Minds In Open Source

Forbes article about Guido and the Python community actively working to get more women involved in core development as well as speaking at conferences. Good lessons for other projects and work teams: you cannot just passively "let people join"; you need to work to make it happen.

Michael #6: Machine Learning Basics

From Anna-Lena Popkes: plain Python implementations of basic machine learning algorithms. The repository contains implementations of basic machine learning algorithms in plain Python (modern Python, yay!). All algorithms are implemented from scratch without using additional machine learning libraries. The goal is to provide a basic understanding of the algorithms and their underlying structure, not to provide the most efficient implementations. Most of the algorithms: Linear Regression, Logistic Regression, Perceptron, k-nearest-neighbor, k-Means clustering, simple neural network with one hidden layer, Multinomial Logistic Regression, decision tree for classification, decision tree for regression. Anna-Lena was on Talk Python episode 186: http://talkpython.fm/186

Extras:

Michael: PSF Fellow Nominations are open Michael: Shiboken has no meaning Brian: Python 3.7 runtime now available in AWS Lambda

Learning AI if You Suck at Math-Part 3-Building an AI Dream Machine


Welcome to the third installment of Learning AI if You Suck at Math. If you missed the earlier articles, be sure to check out part 1 and part 2.

Today we’re going to build our own Deep Learning Dream Machine .

We’ll source the best parts and put them together into a number smashing monster. We’ll also walk through installing all the latest deep learning frameworks step by step on Ubuntu linux 16.04.

This machine will slice through neural networks like a hot laser through butter. Other than forking over $129,000 for Nvidia’s DGX-1 , the AI supercomputer in a box, you simply can’t get better performance than what I’ll show you right here.

Lastly, if you're working with a tighter budget, don't despair; I'll also outline very budget-friendly alternatives. First, a TL;DR: the Ultracheap Upgrade Option.

Before we dig into building a DL beast, I want to give you the easiest upgrade path.

If you don’t want to build an entirely new machine, you still have one perfectly awesome option.

Simply upgrade your GPU (with either a Titan X or a GTX 1080) and get VMware Workstation, or use other virtualization software that supports GPU acceleration! Or you could simply install Ubuntu bare metal and, if you need a Windows machine, run it in a VM, so you max out your performance for deep learning.

Install Ubuntu and the DL frameworks using the tutorial at the end of the article and bam! You just bought yourself a deep learning superstar on the cheap!

All right, let’s get to it.

I’ll mark dream machine parts and budget parts like so:

MINO (Money is No Object) = Dream Machine
ADAD (A Dollar and a Dream) = Budget Alternative

Dream Machine Parts Extravaganza

GPUs First

CPUs are no longer the center of the universe. AI applications have flipped the script. If you've ever built a custom rig for gaming, you probably pumped it up with the baddest Intel chips you could get your hands on.

But times change.

Nvidia is the new Intel .

The most important component of any deep learning world destroyer is the GPU(s).

While AMD has made headway in cryptocoin mining in the last few years, they have yet to make their mark on AI. That will change soon, as they race to capture a piece of this exploding field, but for now Nvidia is king. And don't sleep on Intel either. They purchased Nervana Systems and plan to put out their own deep learning ASICs in 2017.



The king of DL GPUs

Let’s start with MINO. The ultimate GPU is the Titan X. It has no competition.

It’s packed with 3584 CUDA cores at 1531 MHz, 12GB of G5X and it boasts a memory speed of 10 Gbps.

In DL, cores matter and so does more memory close to those cores.

DL is really nothing but a lot of linear algebra. Think of it as an insanely large Excel sheet. Crunching all those numbers would slaughter a standard 4 or 8 core Intel CPU.

Moving data in and out of memory is a massive bottleneck, so more memory on the card makes all the difference, which is why the Titan X is the king of the world.

You can get a Titan X directly from Nvidia for $1,200 MSRP. Unfortunately, you're limited to two. But this is a Dream Machine and we're buying four. That's right, quad SLI!

For that you’ll need to pay a slight premium from a third party seller . Feel free to get two from Nvidia and two from Amazon. That will bring you to $5300, by far the bulk of the cost for this workstation.

Now if you're just planning to run Minecraft, it'll still look blocky, but if you want to train a model to beat cancer, these are your cards. :)

Gaming hardware benchmark sites will tell you that anything more than two cards is well past the point of diminishing returns but that’s just for gaming! When it comes to AI you’ll want to hurl as many cards at it as you can. Of course, AI has its point of diminishing returns too but it’s closer to dozens or hundreds of cards (depending on the algo), not four. So stack up, my friend.

Please note you will NOT need an SLI bridge, unless you’re also planning to use this machine for gaming. That’s strictly for graphics rendering and we’re doing very little graphics here, other than plotting a few graphs in matplotlib.

Budget-Friendly Alternative GPUs
Your ADAD card is the GeForce GTX 1080 Founders Edition. The 1080 packs 2560 CUDA cores, a lot less than the Titan X, but it rings in at half the price, with an MSRP of $699.

It also boasts less RAM, at 8GB versus 12.

EVGA has always served me well so grab four of them for your machine . At $2796 vs $5300, that’s a lot of savings for nearly equivalent performance.

The second best choice for ADAD is the GeForce GTX 1070. It packs 1920 CUDA cores so it’s still a great choice. It comes in at around $499 MSRP but superclocked EVGA 1070s will run you only $389 bucks so that brings the price to a more budget-friendly $1556. Very doable.

Of course if you don’t have as much money to spend you can always get two or three cards. Even one will get you moving in the right direction.

Let’s do the math on best bang for the b

Simple Use of Proxy IPs in a Python Crawler



A Python crawler goes through a cycle: crawl, get blocked, evade the block. Later, sites optimize their limits and crawlers counter them again, in an endless arms race. In the early stages of crawling, adding headers and proxy IPs solves most problems.

While crawling Douban Books myself, I made too many requests and my IP was banned outright, which is what got me researching proxy IPs.

(I had no idea what was happening at the time and nearly lost it...) Below is my approach to crawling data through proxy IPs; please point out anything lacking.

The problem

This is my IP getting banned. Everything was fine at first; I assumed it was a bug in my code.


The approach:

From some online material about crawler proxy IPs, I arrived at the following plan:

Crawl some proxy IPs and filter out the unusable ones.
Pass a working IP in the proxies parameter of a requests call.
Keep crawling.
Done.

OK, that's all obvious; everyone knows the theory, so straight to the code...

With the approach settled, time to get to work.

Environment

Python 3.7, PyCharm

Setting up the environment is left to you...

Preparation: a site to crawl IP addresses from (a domestic high-anonymity proxy list), a site to validate the IP addresses against, and the crawler script whose IP got banned earlier...

Pick the sites above to suit your own situation.

Complete code for crawling the IPs

PS: it simply uses bs4 to grab the IPs and ports, nothing difficult, plus a filter that drops unusable IPs.

The key places are commented.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time : 2018/11/22
# @Author : liangk
# @Site :
# @File : auto_archive_ios.py
# @Software: PyCharm

import requests
from bs4 import BeautifulSoup
import json


class GetIp(object):
    """Crawl proxy IPs"""

    def __init__(self):
        """Initialize variables"""
        self.url = 'http://www.xicidaili.com/nn/'
        self.check_url = 'https://www.ip.cn/'
        self.ip_list = []

    @staticmethod
    def get_html(url):
        """Request an HTML page"""
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
        }
        try:
            request = requests.get(url=url, headers=header)
            request.encoding = 'utf-8'
            html = request.text
            return html
        except Exception as e:
            return ''

    def get_available_ip(self, ip_address, ip_port):
        """Check whether an IP address is usable"""
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
        }
        ip_url_next = '://' + ip_address + ':' + ip_port
        proxies = {'http': 'http' + ip_url_next, 'https': 'https' + ip_url_next}
        try:
            r = requests.get(self.check_url, headers=header, proxies=proxies, timeout=3)
            html = r.text
        except:
            print('fail-%s' % ip_address)
        else:
            print('success-%s' % ip_address)
            soup = BeautifulSoup(html, 'lxml')
            div = soup.find(class_='well')
            if div:
                print(div.text)
            ip_info = {'address': ip_address, 'port': ip_port}
            self.ip_list.append(ip_info)

    def main(self):
        """Main method"""
        web_html = self.get_html(self.url)
        soup = BeautifulSoup(web_html, 'lxml')
        ip_list = soup.find(id='ip_list').find_all('tr')
        for ip_info in ip_list:
            td_list = ip_info.find_all('td')
            if len(td_list) > 0:
                ip_address = td_list[1].text
                ip_port = td_list[2].text
                # Check whether the IP address is valid
                self.get_available_ip(ip_address, ip_port)
        # Write the usable IPs to a file
        with open('ip.txt', 'w') as file:
            json.dump(self.ip_list, file)
        print(self.ip_list)


# Program entry point
if __name__ == '__main__':
    get_ip = GetIp()
    get_ip.main()

Complete code for using the proxies

PS: crawling goes through a random IP each time, and request_status is used to decide whether that IP is usable.

Why this check?

Even after the filtering above, there is no guarantee an IP still works when you actually crawl, so one more check is needed.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time : 2018/11/22
# @Author : liangk
# @Site :
# @File : get_douban_books.py
# @Software: PyCharm

from bs4 import BeautifulSoup
import datetime
import requests
import json
import random

ip_random = -1
article_tag_list = []
article_type_list = []


def get_html(url):
    header = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
    }
    global ip_random
    ip_rand, proxies = get_proxie(ip_random)
    print(proxies)
    try:
        request = requests.get(url=url, headers=header, proxies=proxies, timeout=3)
    except:
        request_status = 500
    else:
        request_status = request.status_code
    print(request_status)
    while request_status != 200:
        ip_random = -1
        ip_rand, proxies = get_proxie(ip_random)
        print(proxies)
        try:
            request = requests.get(url=url, headers=header, proxies=proxies, timeout=3)
        except:
            request_status = 500
        else:
            request_status = request.status_code
        print(request_status)
    ip_random = ip_rand
    request.encoding = 'gbk'
    html = request.content
    print(html)
    return html


def get_proxie(random_number):
    with open('ip.txt', 'r') as file:
        ip_list = json.load(file)
        if random_number == -1:
            random_number = random.randint(0, len(ip_list) - 1)
        ip_info = ip_list[random_number]
        ip_url_next = '://' + ip_info['address'] + ':' + ip_info['port']
        proxies = {'http': 'http' + ip_url_next, 'https': 'https' + ip_url_next}
        return random_number, proxies


# Program entry point
if __name__ == '__main__':
    """Only crawls the first page of books, sorted by rating"""
    start_time = datetime.datetime.now()
    url = 'https://book.douban.com/tag/?view=type&icn=index-sorttags-all'
    base_url = 'https://book.douban.com/tag/'
    html = get_html(url)
    soup = BeautifulSoup(html, 'lxml')
    article_tag_list = soup.find_all(class_='tag-content-wrapper')
    tagCol_list = soup.find_all(class_='tagCol')
    for table in tagCol_list:
        """Organize and analyze the data"""
        sub_type_list = []
        a = table.find_all('a')
        for book_type in a:
            sub_type_list.append(book_type.text)
        article_type_list.append(sub_type_list)
    for sub in article_type_list:
        for sub1 in sub:
            title = '==============' + sub1 + '=============='
            print(title)
            print(base_url + sub1 + '?start=0' + '&type=S')
            with open('book.text', 'a', encoding='utf-8') as f:
                f.write('\n' + title + '\n')
                f.write(url + '\n')
            for start in range(0, 2):
                # (start * 20): pagination goes 0, 20, 40, ...
                # type=S sorts by rating
                url = base_url + sub1 + '?start=%s' % (start * 20) + '&type=S'
                html = get_html(url)
                soup = BeautifulSoup(html, 'lxml')
                li = soup.find_all(class_='subject-item')
                for div in li:
                    info = div.find(class_='info').find('a')
                    img = div.find(class_='pic').find('img')
                    content = 'Title: <%s>' % info['title'] + ' Cover image: ' + img['src'] + '\n'
                    print(content)
                    with open('book.text', 'a', encoding='utf-8') as f:
                        f.write(content)
    end_time = datetime.datetime.now()
    print('Elapsed: ', (end_time - start_time).seconds)

Why choose a domestic high-anonymity proxy?
Summary

With a simple proxy-IP setup like this, you can basically cope with getting IP-banned while crawling. And since your own IP is never used, it even indirectly protects you?!?!

If you have faster or better methods, feel free to share and discuss them. Thanks.

My personal blog


Thanks

Thanks to GXCYUZY and lhf for their support.

Comment some lines in a text file using Python

Special end-of-line / string characters from lines read from text file, using python

I need to read lines from a text file where the 'end of line' character is not always \n or \x or a combination and may be any combination of characters like 'xyz' or '|', but the 'end of line' is always the same and known for each type of file.

Tringler a line from a text file using Python

I am trying to create genetic signatures. I have a textfile full of DNA sequences. I want to read in each line from the text file. Then add 4mers which are 4 bases into a dictionary. For example: Sample sequence ATGATATATCTATCAT What I want to add is

How do I delete some lines from the text file using php?

I have a text file contains this data : 947 11106620030 Ancho Khoren MKK6203 Introduction Busy 2,00 948 balblalbllablab 949 balblalbllablab 950 balblalbllablab 951 11106620031 Adagasa Goo MKB6201 Economy Inside 3,00 952 balblalbllablab 953 balblalbll

How to read a certain line in a text file using Python?

I have a text file with a location and its coordinates on a new line e.g. A&AB 42.289567 -83.717143 AH 42.276620 -83.739620) I have a for loop that iterates through this list and if the location matches the user input, it returns the next two lines (

Jumping lines from a text file using Python

I am processing a very large log file to extract information using Python regex. However, I would like to process all the lines only after I find a particular string, which in this case is Starting time loop. The minimal version of the log file is as

How to pass an empty line in a text file using python

I have a text file as below. l[0]l[1]l[2]l[3]l[4]l[5]l[6] ----------------------------------- 1| abc is a book and cba too 2| xyz is a pencil and zyx too 3| def is a pen and fed too 4| aaa is Actual file is: abc is a book and cba too xyz is a pencil

Verifying / writing lines to a .text file using Python

I'm new both to this site and python, so go easy on me. Using Python 3.3 I'm making a hangman-esque game, and all is working bar one aspect. I want to check whether a string is in a .txt file, and if not, write it on a new line at the end of the .txt

Delete some lines in the text file using the batch

I have a txt file . I want to delete line 4 and line 5 only. Before Line1 Line2 Line3 Line4 (need to delete) Line5 (need to delete) Line6 After Line1 Line2 Line3 Line6 @echo off setlocal EnableDelayedExpansion call :CopyLines < input.txt > output.tx

End-of-line characters from lines read from a text file, using Python

When reading lines from a text file using python, the end-line character often needs to be truncated before processing the text, as in the following example: f = open("myFile.txt", "r") for line in f: line = line[:-1] # do something wi

How to read a specific line of a text file in Python?

I'm having trouble reading an entire specific line of a text file using Python. I currently have this: load_profile = open('users/file.txt', "r") read_it = load_profile.readline(1) print read_it Of course this will just read one byte of the firs

How to delete a specific line from a text file with python

This question already has an answer here: Fastest Way to Delete a Line from Large File in Python 9 answers How can i remove a specific line from a text file using python. This is my code def Delete(): num=int(input("Enter the line number you would li

Replace strings in a text file using Python adds strange characters

I want to replace a text with a path in each line of a text file using Python, but I am getting weird characters (squares) in the path in output file. Current code: #!/usr/bin/env python f1 = open('input.txt', 'r') f2 = open('output.txt', 'w') for li

splitting text files using python

This question already has an answer here: Read specific columns from a csv file with csv module? 7 answers Im new in reading text files using python. I need to read a file which have in each line 4 data that I need, here is my text file 1 -10 0 0 2

how to get a list of files from a directory to a text file using Python

This question is how to get list of files from a directory into text file using python. Result in the text file should exactly be like this: E:\AA\a.jpg E:\AA\b.jpg ... How to correct the code below: WD = "E:\\AA" import glob files = glob.glob (

django-debug-toolbar: a powerful performance inspection tool for Django development


Django is a heavyweight Python web framework.

From the official description: Django makes it easy to build better web apps more quickly and with less code.

When debugging and optimizing, we often want to know things such as:

how many SQL statements were executed, how long they took, and the time of each query
which templates rendered the page, and the rendering time
whether the cache affects performance

django-debug-toolbar is a very powerful performance inspection tool for Django.

Installation

Install django-debug-toolbar:

pip install django-debug-toolbar

Edit settings.py and make sure debug is enabled:

DEBUG = True

Add debug_toolbar to INSTALLED_APPS:

INSTALLED_APPS = (
    ......
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'library.apps.libraryConfig',
    'debug_toolbar',
)

Add the middleware:

MIDDLEWARE = [
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    ......
    ......
]

Configure the URLs by adding to urls.py:

from django.conf.urls import include, url

if settings.DEBUG:
    import debug_toolbar
    urlpatterns = [
        url(r'^__debug__/', include(debug_toolbar.urls)),
    ] + urlpatterns

Run the project:

python3 manage.py runserver 0.0.0.0:8000
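One caveat the article skips (added here, based on django-debug-toolbar's documented default behavior): the toolbar only renders for requests coming from addresses listed in INTERNAL_IPS, so local development typically also needs this in settings.py:

INTERNAL_IPS = ['127.0.0.1']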

As a test subject I borrowed a Django project from GitHub: a library lending and return system.

It looks like this:

[screenshot of the debug toolbar panels omitted]
Operation and configuration

The debug toolbar works in two phases. First, it collects data while Django processes the request and stores that data in memory. Then, when you open a panel in the browser, it fetches the data from the server and displays it. If you see excessive CPU or memory consumption while browsing the site, consider optimizing the collection phase. If displaying the panels is slow, consider optimizing the rendering phase.

By default, django-debug-toolbar keeps the data collected during the last 10 requests in memory.

This can be changed by adding or modifying the following setting in DEBUG_TOOLBAR_CONFIG in settings.py:

DEBUG_TOOLBAR_CONFIG = {
    'RESULTS_CACHE_SIZE': 10,  # the default
}

For other options, see: Configuration ― Django Debug Toolbar 1.10.1 documentation

Three-Speed Logic



This article describes a style of coding in Python that permits easy mixing of synchronous and asynchronous code. As part of the control software for large microwave telescopes (including the South Pole Telescope), we have been using this style of code under a Tornado / Python 2.x stack with success.

Unfortunately, architectural changes in Python 3.7 conspire against the @tworoutine . In the hopes of contributing to a lively discussion about Python's asynchronous ecosystem, we describe why they have been so useful to us.

Table of Contents

Coding for Telescopes
Enter the @tworoutine
A Lament: Of Course There's A Catch

Asynchronous coding in Python was pioneered by third-party libraries like Twisted , Tornado , and gevent . An "official" event-loop implementation landed in Python 3.4 , and was expanded significantly in Python 3.7 . A new breed of asynchronous libraries like curio and trio continue to push the boundaries beyond what's "normal" in the space.

There are also some excellent (and opinionated) articles about Python's asynchronous ecosystem. I don't always agree with them and I don't intend to recapitulate them. To allow me to get to the point, though, I will provide a few links that set the stage for what follows.

PEP 3156 -- Asynchronous IO Support Rebooted: the "asyncio" Module
I don't understand Python's Asyncio
How the heck does async/await work in Python 3.5?
Controlling Python Async Creep

Of these, the last one is probably the most interesting because it identifies and attempts to address the same problem we run into when designing telescope tuning software: asynchronous and synchronous coding styles occupy different universes in Python, but it is extremely useful to mix them freely.

To motivate mixing synchronous and asynchronous code, here is a short description of the kind of code we write for tuning telescopes.

Coding for Telescopes

My day job includes work on CMB telescopes including the South Pole Telescope in Antarctica and the Simons Array on Chile's Atacama Plateau.

The readout electronics in these telescopes is a large array of software defined radios , with many thousands of transmitters and receivers used to bias and measure the leftover signature of the Big Bang. These radios are implemented in hundreds of custom boards hosting FPGAs installed in crates near the telescope, and controlled by a PC. This PC gets the system up and running, controls cryogenic refrigerators, aims the telescope, and captures the torrent of data it produces.

The entire tuning, control, and analysis stack makes very heavy use of Python, along with C, C++, and VHDL. (I am inexpressibly grateful to the many open-source communities we rely on, and it is a great privilege when I can give back in some capacity.)

As you can imagine, we don't just deploy code straight onto the telescope. Along with the telescopes themselves are small-scale installations ranging from a circuit board or two on a benchtop, to crates of cryogenic equipment at university labs around the world. During development, code might be running in a Jupyter notebook or an IPython shell, perhaps with a small crate of electronics or nothing at all. Here, interactive REPL sessions are used to prototype algorithms, explore data, and try out new tuning and analysis techniques.

For an algorithm to be useful in deployment, however, it needs to run at scale. Here's where we use asynchronous code heavily: command interactions with many hundreds of circuit boards are a natural fit for asynchronous coding styles. This leads to the following workflow:



Design flow, with separate asynchronous/synchronous implementations:

Prototype code, probably synchronous and focused on proofing out an algorithm or technique;
Test for function on a small-scale deployment, likely in an interactive (ipython) environment;
Re-code the algorithm using an asynchronous style; and
Integration testing, optimization, and deployment.

This approach has advantages:

When developing a proof-of-concept, developers are able to ignore performance and focus on the problem (physics, instrumentation, cryogenics, electronics) that they are attempting to address.
During prototyping, when interactive exploration is most useful, synchronous code promotes use of environments such as IPython or Jupyter.

However, this workflow has three major disadvantages:

It's clumsy: it requires writing and testing a synchronous version, then shifting it wholesale to an asynchronous environment. It is easy to imagine this workflow looping back on itself as bugs are discovered or introduced along the way.
The synchronous version never stops being useful, despite not scaling to telescope-level performance. We would often much rather have the simpler semantics, more predictable control flow, and shorter error traces associated with a synchronous call when debugging or experimenting. In addition, it can be conveniently invoked in a REPL environment -- invaluable if the telescope is operating and we need to do some quick hand-tuning.
It's not composable. Over the years, we have built up libraries of useful tuning and control algorithms, and as long as synchronous and asynchronous code is kept distinct, we cannot meaningfully compose algorithms out of smaller pieces without two implementations of everything.

Asking developers to maintain two versions under different coding idioms (and expecting to keep the versions synchronized) is resolving a technical flaw by requiring skilled labourers to do menial work; this is often an expensive mistake. (Interactive use of asynchronous code is getting easier in IPython 7.0 due to the autoawait functionality. This extension addresses the second point but not the third.)

Instead, we are looking for a way to freely mix asynchronous and synchronous coding styles.

Enter the @tworoutine

What's a @tworoutine ? It is a synchronous wrapper around an asynchronous function, allowing a single piece of code to be called in either idiom.

(If you are following along at home, you will need the source code . You will also need nest_asyncio .)

import tworoutine
import asyncio

@tworoutine.tworoutine
async def double_slowly(x):
    await asyncio.sleep(0.1)
    return 2*x

How can we call this function synchronously? Just call it!

>>> double_slowly(1)
2

How did this work? The @tworoutine decorator returns a class whose __call__ method is a synchronous wrapper that obtains an event loop and invokes the asynchronous code, blocking until it's complete. Because we want synchronous calling to be convenient and carpal-tunnel-friendly, that's the default.

If there's already an event loop running, this code is reasonably efficient (aside from being a blocking call, of course!) Any asynchronous events already queued in the event loop are allowed to proceed alongside this one. Only the current execution context is blocked until the coroutine completes.

So much for synchronous calls. How can we call this function asynchronously? We first have to undo or "invert" the wrapper and obtain a reference back to the coroutine.

>>> (~double_slowly)(2)
<coroutine object double_slowly at 0x7f5d494fd348>

With the exception of the invert operator around the function name, this is ordinary asynchronous code; there is no additional overhead except for the operator itself. Here is a complete example showing mixed coding styles within an event loop:

async def main():
    # Run asynchronously
    r = await (~double_slowly)(2)
    print(r)
    # Run synchronously within an event loop
    r2 = double_slowly(3)
    print(r2)

# try asynchronous entry
asyncio.run(main())

The obvious benefit, here, is the ability to call asynchronous code synchronously when we're too lazy to carry around an event loop or deal with the turtles-all-the-way-down nature of Python's asynchronous coding idiom.
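The article points at the full source; for readers who just want the shape of it, here is a minimal sketch of what such a decorator might look like (an illustration under this article's assumptions, not the project's actual code; note that the synchronous path is exactly what requires nest_asyncio on Python 3.7, as discussed below):

import asyncio
import functools

class tworoutine(object):
    def __init__(self, coroutine_function):
        functools.update_wrapper(self, coroutine_function)
        self._coroutine_function = coroutine_function

    def __call__(self, *args, **kwargs):
        # Synchronous path: block on an event loop until the coroutine is done.
        loop = asyncio.get_event_loop()
        return loop.run_until_complete((~self)(*args, **kwargs))

    def __invert__(self):
        # Asynchronous path: hand back the underlying coroutine function.
        return self._coroutine_function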



Design flow with @tworoutine . The synchronous and asynchronous implementations are replaced with a single implementation that can mix idioms.

A Lament: Of Course There's A Catch

@tworoutine 's days are probably numbered. This style of coding has been implicitly but firmly rejected by Python developers:

Issue 22239: asyncio: nested event loop

We have been using this approach (implemented on Python 2.7 and Tornado <4.5) for several years now at the South Pole and elsewhere, and we will have to adapt.

To complete a synchronous @tworoutine call, we need to obtain an event loop, schedule the asynchronous (decorated) call, and block until it is complete. Currently there is no way to do that in Python 3.7 asyncio without patching it. Asynchronous code at any point in the call stack must be linked to the event loop via asynchronous calls only, all the way up.

To work around this problem in the Python 3.7 code shown here, I have used the nest_asyncio monkey patch. It is a short and effective piece of code, but it runs against Python orthodoxy and adopting this kind of patch in production risks being stranded by changes to Python's core libraries.

Without this patch, we are able to upgrade as far as Tornado 4.5 on Python 3.x, but Tornado 5.0 moves to an asyncio event loop and we are suddenly unable to upgrade.

The code examples here have been forward-ported from Python 2.7 and Tornado 4.5 to Python 3.7 and "pure" asyncio. It's an experiment -- this is not production code!


Python String casefold()


The Python string casefold() function returns a casefolded copy of the string. This function is used to perform case-insensitive string comparisons.

Let's look at a simple example of the casefold() function.


s = 'My name is Pankaj'
print(s.casefold())

s1 = 'Python'
s2 = 'PyThon'
print(s1.casefold() == s2.casefold())

Output:


my name is pankaj
True

From the above program, the casefold() function looks exactly the same as the string lower() function. Indeed, its effect is the same when the string is made up of ASCII characters.

However, casefolding is much more aggressive: it is intended to remove all case distinctions in a string.

For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß', but casefold() will convert it to "ss".

Let’s look at another example to confirm this behavior.


s1 = 'ß'  # typed with Option+s on Mac OS
s2 = 'ss'
s3 = 'SS'

if s1.casefold() == s2.casefold():
    print('s1 and s2 are equal in case-insensitive comparison')
else:
    print('s1 and s2 are not equal in case-insensitive comparison')

if s1.casefold() == s3.casefold():
    print('s1 and s3 are equal in case-insensitive comparison')

Output:


s1 and s2 are equal in case-insensitive comparison
s1 and s3 are equal in case-insensitive comparison

You can check out more Python String examples from our GitHub Repository.

Reference: Official Documentation

Divide a string with square brackets using a regular expression in python


Capture a group in a multi-line string with multiple matches using a regular expression in Ruby

I'm trying to capture the String '1611650547*42' in the multiple line String bellow. myString = "'/absfucate/wait.do;cohrAwdSessID=jbreW9yA8R0xh9b? obfuscateId=jbreW9yA8R0xh9b&checksum=1611650547*42&tObfuscate=null& tSession_1DS=null&

How to use the regular expression in python

Sorry this might seem like a repetetive question but I really need help So I have a text file which has a line of the form: Thu Apr 28 20:51:37 +0000 2011 :: Melanie Caldwell :: judeyqwaller :: Hong Kong :: P000352670 - Toshiba Satellite 5205 Series

Removing html tags from text using the regular expression in python

I'm trying to look at a html file and remove all the tags from it so that only the text is left but I'm having a problem with my regex. This is what I have so far. import urllib.request, re def test(url): html = str(urllib.request.urlopen(url).read()

Using multiline regular expressions in Python?

I am using regular expressions in Python to search through a page source, and find all the json information in the javascript. Specifically an example would look something like this: var fooData = { id: 123456789, name : "foo bar", country_name:

How to check if a number starts with special characters using the regular expression

Using following regexp I can get if a string starts with a or b. Pattern p = Pattern.compile("^(a|b)"); Matcher m = p.matcher("0I am a string"); boolean b = m.find(); System.out.println("....output..."+b); But I need to check

Checking the String for Illegal Characters Using the Regular Expression

I want to check a for any illegal character using the following regular expression in php. Essentially, I want to allow only alphanumeric and underscore (_). Unfortunately the follow piece of code does not seem to work properly. It should return true

Find matching brackets using a regular expression

Assuming I have this string: "abc{def{ghi{jkl}mno{pqr}st}uvw}xyz" and I want to match this: "{def{ghi{jkl}mno{pqr}st}uvw}" what should my regular expression look like..? In other words, the match should start with "{" and end

How to extract topics from e-mail headers using a regular expression in Python?

I just start learning regex and encounter a problem when extracting subjects from the email headers. In order to only keep the subjects of each header and also neglect "Re:" and "Fwd:" (case insensitive), I use the following regex whic

How to extract the exact position using a regular expression in Python?

Possible Duplicate: Python Regex Use - How to Get Positions of Matches I am new to python. I have written program where I used regular expression to extract the exact number from webpage using command line arguments.First argument should be 'Amount'

Finding IP addresses using a regular expression in python

I am trying to parse the results of the TSHARK capture Here is the line I am filtering on: Internet Protocol, Src: 10.10.52.250 (10.10.52.250), Dst: 224.0.0.2 (224.0.0.2) I am trying to extract the Src and Dst, Here is my code to do that: str(re.sear

Extract 2 numbers preceded by two different strings from the paragraph using Tcl Regular expression

I need to extract two different numbers preceded by two different strings. Employee Id--> Employee16(I need 16) and Employee links--> Employee links:2 (I need 2). Source String looks like following: Employee16, Employee name is QueenRose Working for

Get a particular string from a data using the regular expression

I am trying to get particular string from the data below.It is too long am here with sharing sample data. From this I have to get the 'france24Id=7GHYUFGty6fdGFHyy56' am not that much familier with regex. How can I retreive the string 'france24Id=7GH

How do I take the first word in a string separated by commas using a regular expression

I want to take the first comma seperated value from that string. "Lines.No;StartPos=3;RightAligned;MaxLength =2" I used "\b.*\;" regex to take "Lines.No". But the result is "Lines.No;StartPos=3;RightAligned;" thanks

Separate a string in JavaScript by using a regular expression

I'm trying to write a regex for use in javascript. var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}"; var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script); I would like split t

Pygame or App Development using Kivy or XCode?


Sunil writes:

I would, first of all, like to thank you for the amazing book you have written, "Python for Kids". It's really helped my nephew learn coding in a very exciting and fun way. He is now very keen on continuing with Python. He is 8. But at this point, I do not know what the next set of things to teach him should be. Should we proceed with learning Pygame, or app development using Kivy or XCode? I am a little confused on this matter. I tried a bit of reading online, but it did not help my decision making.

What would you suggest is the best Python learning path for an 8-year-old after having followed your book.

I'm quite partial to the idea of Pygame as a natural progression from the games in the book. Adapting the Bounce or Stickman game to use Pygame, instead of tkinter, seems like a good way to get started, and then Pygame has support for audio (mixer) and joystick modules, so there are opportunities to enhance and extend the games to add new features. This would be building on learning from the book, which I think is a nice way to move forward. It's certainly going to be the easiest way to get some instant feedback as well (since it's not hard to get up-and-running), so Pygame has the edge from that perspective.

I don't know a lot about Kivy, so I can't really comment on how suitable it would be for an 8 year old -- but even if it proved too difficult, if he's interested in mobile apps, then I don't think that time would be wasted.

XCode will probably mean learning a new language (depending on what your nephew wants to do with it). If iOS apps, then the options are Objective-C or Swift -- which is not to say you're entirely limited to those languages, but you're going to find the most community support with one of the primary languages. Of the two, Swift is the most approachable (IMO), but I still wouldn't call that easy going for a child to pick up.

So with very limited experience in the other two options, I guess my personal preference would be Pygame, then Kivy, finally XCode. Do let me know how you get on...

Reference links for other readers:

Pygame: pygame.org, Getting started documentation
Kivy: kivy.org, Getting started documentation
XCode: developer.apple.com/xcode, Getting started documentation (iOS)

Placement Update 18-47


It's been a while since the last placement update. Summit happened. Seemed pretty okay, except for the food. People have things they'd like to do with placement.

Most Important

We're starting to approach the point where we're thinking about the possibility of maybe turning off placement-in-nova. We're not there yet, and as is always the case with these kinds of things, it's the details at the end that present the challenges. As such there are a mass of changes spread around nova, placement, devstack, grenade, puppet and openstack-ansible related to making things go. More details on those below, but what we need is the same as ever: reviews. Don't be shy. If you're not a core or not familiar with placement, reviews are still very helpful. A lot of the patches take the form of "this might be the right way to do this".

What's Changed

There is now a placement-manage command which can do database migrations, driven by alembic. This means that the devstack patch which uses the extracted placement can merge soon. Several other testing related (turning on tempest and grenade for placement) changes depend-on that.

Matt did a placement-status command which has a no-op we-are-here upgrade check. We've already met the python3 goals (I think?), so I reckon placement is good to go on community-wide goals. Woot.

The PlacementFixture that placement provides for other projects to do functional tests with it has merged. There's a patch for nova using it .

The spec for counting quota usage in placement has been revived after learning at summit that a proposed workaround that didn't use placement wasn't really all that good for people using cells v2.

Bugs

Placement related bugs not yet in progress: 17 (+1). In progress placement bugs: 13 (+2).

Specs

Summit and U.S. Thanksgiving has disrupted progress on some of these, but there are still plenty of specs awaiting their future.

Many of these have unaddressed negative review comments.

https://review.openstack.org/#/c/544683/ Account for host agg allocation ratio in placement (Still in rocky/)

https://review.openstack.org/#/c/595236/ Add subtree filter for GET /resource_providers

https://review.openstack.org/#/c/597601/ Resource provider - request group mapping in allocation candidate

https://review.openstack.org/#/c/549067/ VMware: place instances on resource pool (still in rocky/)

https://review.openstack.org/#/c/555081/ Standardize CPU resource tracking

https://review.openstack.org/#/c/599957/ Allow overcommit of dedicated CPU (Has an alternative which changes allocations to a float)

https://review.openstack.org/#/c/591037/ Modelling passthrough devices for report to placement

https://review.openstack.org/#/c/603955/ Nova Cyborg interaction specification.

https://review.openstack.org/#/c/601596/ supporting virtual NVDIMM devices

https://review.openstack.org/#/c/603352/ Spec: Support filtering by forbidden aggregate

https://review.openstack.org/#/c/552924/ Proposes NUMA topology with RPs

https://review.openstack.org/#/c/569011/ Count quota based on resource class

https://review.openstack.org/#/c/141219/ Adds spec for instance live resize

https://review.openstack.org/#/c/612497/ Provider config YAML file

https://review.openstack.org/#/c/509042/ Propose counting quota usage from placement and API database

https://review.openstack.org/603545 Resource modeling in cyborg.

Main Themes

Making Nested Useful

Progress is being made on gpu-reshaping for libvirt and xen:

https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open

Making use of nested is bandwidth-resource-provider:

https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/bandwidth-resource-provider

Somewhat related to nested are a stack of changes to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Stack at: https://review.openstack.org/#/c/615646/

Extraction

(There's an etherpad which tracks some of the work related to extraction. Please refer to that for additional information.)

TripleO and OpenStack-Ansible are both working on tooling to install and/or upgrade to extracted placement:

https://review.openstack.org/#/q/topic:tripleo-placement-extraction https://review.openstack.org/#/q/project:openstack/openstack-ansible-os_placement

libvirt support for GPU reshaping:

https://review.openstack.org/#/c/599208/

Grenade and tempest testing for extracted placement:

Extracted placement in devstack: https://review.openstack.org/600162 Turning on tests: https://review.openstack.org/#/c/617565/ Some fixes to grenade using python3: https://review.openstack.org/#/c/619728/

A replacement for placeload performance testing that was in the nova-next job: https://review.openstack.org/#/c/619248/ . This might be of interest to people trying to do testing of live services without devstack. It starts with a basic node, turns on mysql, runs placement with uwsgi, and does the placeload testing. Note that this found a pretty strange bug in _ensure_aggregates .

Documentation tuneups:

Front page: https://review.openstack.org/#/c/619273/
Release notes: https://review.openstack.org/#/c/618708/ (blocked until we refactor the release notes to better reflect the present)

The main remaining task here is participating in openstack-manuals.

We've been putting off making a decision about os-resource-classes. Anyone have strong opinions?

Other

Besides the 20 or so open changes in placement itself, and those mentioned above, here are some other changes that may be of interest.

https://review.openstack.org/#/q/topic:bp/initial-allocation-ratios Improve handling of default allocation ratios

https://review.openstack.org/#/q/topic:minimum-bandwidth-allocation-placement-api Neutron minimum bandwidth implementation

https://review.openstack.org/#/c/602160/ Add OWNERSHIP $SERVICE traits

https://review.openstack.org/#/c/586960/ zun: Use placement for unified resource management

https://review.openstack.org/#/q/topic:cd/gabbi-tempest-job Using gabbi-tempest for integration tests.

https://review.openstack.org/#/q/project:openstack/blazar+topic:bp/placement-api Blazar using the placement-api

https://review.openstack.org/619626 Tenks doing some node management, with a bit of optional placement.

https://review.openstack.org/617273 Extracted placement in loci

https://review.openstack.org/#/c/613589/ Extracted placement in kolla

https://review.openstack.org/#/c/613629/ Extracted placement in kolla-ansible

End

Lot going on. Thanks to everyone for their contributions.

Not Invented Here: Python Argument Surprise


Python function signatures are flexible, complex beasts, allowing for positional, keyword, variable, and variable keyword arguments (and parameters). This can be extremely useful, but sometimes the intersection between these features can be confusing or even surprising, especially on Python 2. What do you expect this to return?

>>> def test(arg1, **kwargs):
...     return arg1
>>> test(**{'arg1': 42})
...

Contents

Terminology
Surprises
Non-Default Parameters Accept Keyword Arguments
Corollary: Variable Keyword Arguments Can Bind Non-Default Parameters
Corollary: Positional Parameters Consume Keyword Arguments
Default Parameters Accept Positional Arguments
Corollary: Variable Positional Arguments Can Bind Default Parameters
Mixing Variable Parameters and Keyword Arguments Will Break
Functions Implemented In C Can Break The Rules
Python 3 Improvements

Terminology

Before we move on, let's take a minute to define some terms.

parameter
    The name of a variable listed by a function definition. These are sometimes called "formal parameters" or "formal arguments." In def foo(a, b), a and b are parameters.

argument
    The expression given to a function application (function call). In foo(1, "str"), 1 and "str" are arguments.

function signature
    The set of parameters in a function definition. This is also known as a "parameter list."

binding
    The process of associating function call arguments with the parameter names given in the function's signature. In foo(1, "str"), the parameter a will be assigned the value 1, and the parameter b will be assigned the value "str". This is also called "argument filling."

default parameter
    In a function signature, a parameter that is assigned a value. An argument for this parameter does not have to be given in a function application; when it is not, the default value is bound to the parameter. In def foo(a, b=42), b=42 creates a default parameter. It can also be said that b has a default parameter value. The function can be called as foo(1).

positional argument
    An argument in a function call that's given in order of the parameters in the function signature, from left to right. In foo(1, 2), 1 and 2 are positional arguments that will be bound to the parameters a and b.

keyword argument
    An argument in a function call that's given by name, matching the name of a parameter. In foo(a=1), a=1 is a keyword argument, and the parameter a will have the value 1.

variable parameters
    A function signature that contains *args (where args is an arbitrary identifier) accepts an arbitrary number of unnamed arguments in addition to any explicit parameters. Extra arguments are bound to args as a tuple. def foo(a, b, *args) creates a function that has variable parameters, and foo(1, 2), foo(1, 2, 3), and foo(1, 2, 3, 4) are all valid ways to call it. This is commonly called "varargs," for "variable arguments" (even though it is a parameter definition).

variable positional arguments
    Passing an arbitrary (usually unknown from the function call itself) number of arguments to a function by unpacking a sequence. Variable arguments can be given to a function whether or not it accepts variable parameters (if it doesn't, the number of variable arguments must match the number of parameters). This is done using the * syntax: foo(*(1, 2)) is the same as writing foo(1, 2), but more often the arguments are created dynamically. For example, args = (1, 2) if full_moon else (3, 4); foo(*args).

variable keyword parameters
    A function signature that contains **kwargs (where kwargs is an arbitrary identifier) accepts an arbitrary number of keyword arguments in addition to any explicit parameters (with or without default values). The definition def foo(a, b, **kwargs) creates a function with a variable keyword parameter. It can be called like foo(1, 2) or foo(1, 2, c=42, d=420).

variable keyword arguments
    Similar to variable positional arguments, but using keyword arguments. The syntax is **, and the object to be unpacked must be a mapping; extra arguments are placed in a mapping bound to the parameter identifier. A simple example is foo(**{'b': "str", 'a': 1}).

Some language communities are fairly strict about the usage of these terms, but the Python community is often fairly informal. This is especially true when it comes to the distinction between parameters and arguments (despite it being a FAQ ) which helps lead to some of the surprises we discuss below.

Surprises

On to the surprises. These will all come from the intersection of the various terms defined above. Not all of these will surprise everyone, but I would be surprised if most people didn't discover at least one mildly surprising thing.

Non-Default Parameters Accept Keyword Arguments

Any parameter can be called using a keyword argument, whether or not it has a default parameter value:

>>> def test(a, b, c=42):
...     return (a, b, c)
>>> test(1, 2)
(1, 2, 42)
>>> test(1, b='b')
(1, 'b', 42)
>>> test(c=1, b=2, a=3)
(3, 2, 1)

This is surprising because sometimes parameters with a default value are referred to as "keyword parameters" or "keyword arguments," suggesting that only they can be called using a keyword argument. In reality, the parameter just has a default value. It's the function call site that determines whether to use a keyword argument or not.

One consequence of this: the parameter names of public functions, even if they don't have a default, are part of the public signature of the function. If you distribute a library, people can and will call functions using keyword arguments for parameters you didn't expect them to. Changing parameter names can thus break backwards compatibility. (Below we'll see how Python 3 can help with this.)
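For instance, imagine a library ships the hypothetical function below, and a later version renames its parameters:

>>> def area(width, height):  # version 1
...     return width * height
>>> area(width=3, height=4)
12
>>> def area(w, h):  # version 2 renames the parameters
...     return w * h
>>> area(width=3, height=4)
Traceback (most recent call last):
...
TypeError: area() got an unexpected keyword argument 'width'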

Corollary: Variable Keyword Arguments Can Bind Non-Default Parameters

If we introduce variable keyword arguments, we see that this behaviour is consistent:

>>> kwargs = {'a': 'akw', 'b': 'bkw'}
>>> test(**kwargs)
('akw', 'bkw', 42)

Corollary: Positional Parameters Consume Keyword Arguments

Knowing what we know now, we can answer the teaser from the beginning of the article:

>>> def test(arg1, **kwargs):
...     return arg1, kwargs
>>> test(**{'arg1': 42})
(42, {})

The named parameter arg1 , even when passed in a variable keyword argument, is still bound by name. There are no extra arguments to place in kwargs .

Default Parameters Accept Positional Arguments

Any parameter can be called using a positional argument, whether or not it has a default parameter value:

>>> def test(a=1, b=2, c=3):
...     return (a, b, c)
>>> test('a', 'b', 'c')
('a', 'b', 'c')

This is the inverse of the previous surprise. It may be surprising for the same reason, the conflation of keyword arguments and default parameter values.

Of course, convention often dictates that default parameters are passed using keyword arguments, but as you can see, that's not a requirement of the language.

Corollary: Variable Positional Arguments Can Bind Default Parameters

Introducing variable positional arguments shows consistent behaviour:

>>> pos_args = ('apos', 'bpos')
>>> test(*pos_args)
('apos', 'bpos', 3)

Mixing Variable Parameters and Keyword Arguments Will Break

Suppose we'd like to define some parameters with default values (expecting them to be passed as keyword arguments by convention), and then also allow for some extra arguments to be passed:

>>> def test(a, b=1, *args):
...     return (a, b, args)

The definition works. Now let's call it in some common patterns:

>>> test('a', 'b')
('a', 'b', ())
>>> test('a', 'b', 'c')
('a', 'b', ('c',))
>>> test('a', b=1)
('a', 1, ())
>>> test('a', b='b', *(1, 2))
Traceback (most recent call last):
...
TypeError: test() got multiple values for argument 'b'

As long as we don't mix keyword and variable (extra) arguments, everything works out. But as soon as we mix the two, the variable positional arguments are bound first, and then we have a duplicate keyword argument left over for b .

This is a common enough source of errors that, as we'll see below, Python 3 added some extra help for it, and linters warn about it .

Functions Implemented In C Can Break The Rules

We generally expect to be able to call functions with keyword arguments, especially when the corresponding parameters have default values, and we expect that the order of keyword arguments doesn't matter. But if the function is not implemented in Python, and instead is a built-in function implemented in C, that may not be the case. Let's look at the built-in function math.tan. On Python 3, if we ask for the function signature, we get something like this:

>>> import math
>>> help(math.tan)
Help on built-in function tan in module math:
<BLANKLINE>
...
tan(x)
<BLANKLINE>
Return the tangent of x (measured in radians).
<BLANKLINE>

That sure looks like a parameter with a name, so we expect to be able to call it with a keyword argument:

>>> math.tan(x=1)
Traceback (most recent call last):
...
TypeError: tan() takes no keyword arguments

This is due to how C functions bind Python arguments into C variables, using functions like PyArg_ParseTuple .

In newer versions of Python, this is indicated with a trailing / in the function signature, showing that the preceding arguments are positional-only parameters. (Note that Python has no syntax for this.)

>>> help(abs)
Help on built-in function abs in module builtins:
<BLANKLINE>
abs(x, /)
    Return the absolute value of the argument.

Python 3 Improvements

Python 3 offers ways to reduce some of these surprising characteristics. (For backwards compatibility, it doesn't actually eliminate any of them.)

We've already seen that functions implemented in C can use new syntax in their function signatures to signify positional-only arguments. Plus, more C functions can accept keyword arguments for any arbitrary parameter thanks to new functions and the use of tools like Argument Clinic .

The most important improvements, though, are available to Python functions and are outlined in the confusingly named PEP 3102 : Keyword-Only Arguments.

With this PEP, functions are allowed to define parameters that can only be filled by keyword arguments. In addition, this allows functions to accept both variable arguments and keyword arguments without raising TypeError .

This is done by simply moving the variable positional parameter before any parameters that should only be allowed by keyword:

>>> def test(a, *args, b=42):
...     return (a, b, args)
>>> test(1, 2, 3)
(1, 42, (2, 3))
>>> test(1, 2, 3, b='b')
(1, 'b', (2, 3))
>>> test(1, 2, 3, b='b', c='c')
Traceback (most recent call last):
...
TypeError: test() got an unexpected keyword argument 'c'
>>> test()
Traceback (most recent call last):
...
TypeError: test() missing 1 required positional argument: 'a'

What if you don't want to allow arbitrary unnamed arguments? In that case, simply omit the variable argument parameter name:

>>> def test(a, *, b=42):
...     return (a, b)
>>> test(1, b='b')
(1, 'b')

Trying to pass extra arguments will fail:

>>> test(1, 2, b='b')
Traceback (most recent call last):
...
TypeError: test() takes 1 positional argument but 2 positional arguments (and 1 keyword-only argument) were given

Finally, what if you want to require certain parameters to be passed by name? In that case, you can simply leave off the default value for the keyword-only parameters:

>>> def test(a, *, b):
...     return (a, b)
>>> test(1)
Traceback (most recent call last):
...
TypeError: test() missing 1 required keyword-only argument: 'b'
>>> test(1, b='b')
(1, 'b')

The above examples all produce SyntaxError on Python 2. Much of the functionality can be achieved on Python 2 using variable arguments and variable keyword arguments and manual argument binding, but that's slower and uglier than what's available on Python 3. Let's look at an example of implementing the first function from this section in Python 2:

>>> def test(*args, **kwargs):
...     "test(a, *args, b=42) -> tuple"  # docstring signature for Sphinx
...     # This raises an IndexError instead of a TypeError if 'a'
...     # is missing; that's easy to fix, but it's a programmer error
...     a = args[0]
...     args = args[1:]
...     b = kwargs.pop('b', 42)
...     if kwargs:
...         raise TypeError("Got extra keyword args %s" % (list(kwargs)))
...     return (a, b, args)
>>> test(1, 2, 3)
(1, 42, (2, 3))
>>> test(1, 2, 3, b='b')
(1, 'b', (2, 3))
>>> test(1, 2, 3, b='b', c='c')
Traceback (most recent call last):
...
TypeError: Got extra keyword args ['c']
>>> test()
Traceback (most recent call last):
...
IndexError: tuple index out of range

Python Basics: List Comprehensions



After reading this article you’ll learn:

What list comprehensions are in Python
What set comprehensions and dictionary comprehensions are

What are List Comprehensions?

List comprehensions provide us with a simple way to create a list based on some iterable. During the creation, elements from the iterable can be conditionally included in the new list and transformed as needed.

An iterable is something you can loop over. If you want a more detailed explanation you can read my previous blog post.

The components of a list comprehension are:

Output expression (optional)
Iterable
Iterator variable which represents the members of the iterable
Example
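The code for this example was an image in the original; a minimal reconstruction that matches the output below:

numbers = [1, 2, 3, 4, 5]
squares = [n ** 2 for n in numbers]  # output expression: n ** 2
print(squares)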

Output:

[1, 4, 9, 16, 25]

We can also create more advanced list comprehensions which include a conditional statement on the iterable .


Example
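The snippet was again an image; one reconstruction consistent with the output below:

numbers = [1, 2, 3, 4, 5]
squares = [n ** 2 for n in numbers if n > 2]  # condition: keep only members greater than 2
print(squares)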

Output:

[9, 16, 25]

List Comprehensions vs Loops

List comprehensions are more efficient than a for loop, both computationally and in terms of coding space and time. Typically, they are written in a single line of code.

Let’s see how much more space we’ll need to get the same result from the last example using a for loop.
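The loop code was an image in the original; a sketch of the equivalent for loop:

squares = []
for n in [1, 2, 3, 4, 5]:
    if n > 2:
        squares.append(n ** 2)
print(squares)  # [9, 16, 25]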

We can clearly see that the list comprehension above was much easier to write. However, keep in mind that:

Every list comprehension can be rewritten as a for loop, but not every for loop can be rewritten as a list comprehension.

Source: https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/

What about the computational speed? We can use the timeit library to compare the speed of a for loop vs the speed of a list comprehension. We can also pass the number of executions using the number argument. We’ll set this argument to 1 million.
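The benchmark code was an image; the exact statements measured are not recoverable, so the squares-of-a-range comparison below is an assumption, but the shape of the benchmark would look like this:

import timeit

loop_code = '''
squares = []
for n in range(10):
    squares.append(n ** 2)
'''
comp_code = 'squares = [n ** 2 for n in range(10)]'

print(timeit.timeit(loop_code, number=1000000))
print(timeit.timeit(comp_code, number=1000000))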

Output:

6.255051373276501 3.7140220287915326

I've run this on my machine, so you may get different results. In my runs, however, the list comprehension implementation was consistently faster.

List Comprehensions vs map and filter

List comprehensions are a concise notation borrowed from the functional programming language Haskell. We can think of them as syntactic sugar for the filter and map functions.

We have seen that list comprehensions can be a good alternative to for loops because they are more compact and faster.

Lambda Functions

Lambda functions are small anonymous functions. They can have any number of arguments but can have only one expression.

Mostly, lambda functions are passed as arguments to functions that expect a function object as one of their parameters, like map and filter.

Map Function

The map function returns an iterator that applies a function to every item of iterable , yielding the results. Let’s compare it with a list comprehension.
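The comparison code was an image; a minimal sketch matching the output below:

numbers = [1, 2, 3, 4, 5]
mapped = list(map(lambda n: n ** 2, numbers))
comprehension = [n ** 2 for n in numbers]
print(mapped)
print(comprehension)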

Output:

[1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]

Filter Function

The filter function constructs an iterator from elements of iterable for which the passed function returns true. Again, let’s compare the filter function versus the list comprehensions.
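Again the code was an image; a minimal sketch matching the output below:

numbers = [1, 2, 3, 4, 5]
filtered = list(filter(lambda n: n % 2 == 0, numbers))
comprehension = [n for n in numbers if n % 2 == 0]
print(filtered)
print(comprehension)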

Output:

[2, 4]
[2, 4]

More Complex List Comprehensions

Additionally, when we’re creating a list comprehension we can have many conditional statements on the iterable .


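The snippet was an image; one possible reconstruction that produces the output below (numbers divisible by 2 and 3, but not by 4):

result = [n for n in range(1, 20) if n % 2 == 0 if n % 3 == 0 if n % 4 != 0]
print(result)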

Output:

[6, 18]

Moreover, we can also have an if-else clause on the output expression .


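An image in the original; a plausible reconstruction, reusing the previous result:

labels = ['small' if n < 10 else 'big' for n in [6, 18]]
print(labels)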

Output:

['small', 'big']

Readability

We can see that some list comprehensions can be very complex and it’s hard to read them. Python allows line breaks between brackets and braces. We can use this to make our complex comprehension more readable.

For example, we can transform our last example to this:
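A sketch of the same comprehension spread over several lines:

labels = [
    'small' if n < 10 else 'big'
    for n in [6, 18]
]
print(labels)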

Output:

['small', 'big']

However, be careful with list comprehensions; in some cases it's better to use for loops. If your code is not readable, use a for loop instead.

Nested For Loops

In some cases, we need nested for loops to complete some task. In these cases, we can also use a list comprehension to achieve the same result.

Imagine that we have a matrix and we want to flatten it. We can do this easily with two for loops like this:
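The loop code was an image; a minimal equivalent:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = []
for row in matrix:
    for value in row:
        flattened.append(value)
print(flattened)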

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

We can achieve the same result using a list comprehension .

Tip: the order of the for clauses remains the same as in the original for loops.
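A sketch of the comprehension version:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [value for row in matrix for value in row]
print(flattened)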

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Nested List Comprehensions

In other cases, we may need to create a matrix. We can do that with nested list comprehensions. This sounds a little bit crazy, but the concept is simple.

One list comprehension returns a list, right? So, if we place a list comprehension in the output expression of another list comprehension, we'll get a matrix as a result.
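The code was an image; reconstructed to match the output below:

matrix = [[j for j in range(5)] for i in range(3)]
print(matrix)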

Output:

[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

The range type represents an immutable sequence of numbers and is commonly used for looping a specific number of times in for loops.

Source: https://docs.python.org/3/library/stdtypes.html#ranges

Other Comprehensions

In Python, we also have dictionary comprehensions and set comprehensions. All the principles we saw apply to these comprehensions, too. We just have to know a few small details to create dictionary or set comprehensions.

Dictionary Comprehensions

To create a dictionary comprehension we just need to change the brackets [] to curly braces {}. Additionally, in the output expression, we need to separate the key and the value by a colon.
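For illustration, a minimal dictionary comprehension might look like this:

squares = {n: n ** 2 for n in range(1, 6)}  # key and value separated by a colon
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}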

PS: Import and Exporting Certs

Description:

So as part of the provisioning process, many companies will have their servers import and export certs. It shouldn't matter whether you use a third-party CA or an Enterprise CA; these scripts simply create a CSR ('Request-NewCert') and import the .cer file ('Import-Cert').

To Resolve:

1a. Go to my gwSecurity section on Github and run the scripts for importing and exporting certificates.

The ‘Request-NewCert’ will create a CSR that you can run through a third party CA and get the .cer file to import.

Then you can run ‘Import-Cert’ to import it to the Cert:\LocalMachine\My\ location.

If you want, you can also run the ‘Show-ComputerCerts’ scripts to open an MMC file directly to your local machine certificates.

2. After importing, make sure that you see the lock icon next to the cert's name. This verifies you have both the public and private key for the cert.

I have seen cases where certs didn’t import correctly. If that happens, just run:

Open an admin CMD prompt and type:

certutil -repairstore my <SerialNumber> (get the SerialNumber from viewing the cert properties; make sure to remove any special characters or spaces)

Example:

certutil -repairstore my 43e5e29096b64fd91a03b44eb040283f


Python nested loop with condition

my_list = [[1,2],[1,3],[1,3],[1,3]]
my_var = 7

My goal is to be able to see if my_var is larger than all of the positions at my_list[0][1], my_list[1][1], my_list[2][1], and so on.

my_list can vary in length and my_var can also vary so I am thinking a loop is the best bet?

*very new to python

all(variable > element for element in list)

or for element i of lists within a list

all(variable > sublist[i] for sublist in list)
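Applied to the question's data, assuming the second element (index 1) of each sublist is the one to check:

my_list = [[1, 2], [1, 3], [1, 3], [1, 3]]
my_var = 7
print(all(my_var > sublist[1] for sublist in my_list))  # True: 7 is larger than 2, 3, 3 and 3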

This has the advantage of kicking out early if any of the elements is too large. This works because the ... for ... in list is an instance of Python's powerful and multifarious generator expressions . They are similar to list comprehensions but only generate values as needed.

all itself just checks to see if all of the values in the iterable it's passed are true. Generator expressions, list comprehensions, lists, tuples, and other sorts of sequences are all iterables.

So the first statement ends up being equivalent to calling a function like

def all_are_greater_than_value(list, value):
    for element in list:
        if element <= value:
            return False
    return True

with all_are_greater_than_value(list, variable) ...

or simply

all_greater = True
for element in list:
    if element <= value:
        all_greater = False
        break

However, the generator expression is generally preferred, being more concise and "idiomatic".

Plant AI―Deploying Deep Learning Models


So in my last post, I talked about how I built Plant AI ― a plant disease detection model using a Convolutional Neural Network. At the end, we had a model, which we will be deploying in this post. The code for Plant AI can be found here and the output here.



But what good is a model that can't be used? So in this post I will talk about how I deployed this model for use with an Android application via an API.

To deploy our trained model for use via an API, we would do something similar to the following:

Load our trained model
Accept incoming data and preprocess it
Predict using our loaded model
Handle the prediction output (a minimal sketch putting these steps together appears at the end of this post)

We can use our model in production in many ways such as:

Load it directly in our application: Here we assume your model can be saved alongside other application files. This allows us to load the model directly. We could load our model by simply writing

model_file = pickle.load(open("cnn_model.pkl", 'rb'))

or model_file = load_model('my_model.h5')

Make the model available for use via an API. Which is what I did for Plant AI. There are many ways in which we can make a model available via an API. Some of them include:

Custom REST-API with Django or Flask: in this case we build a custom REST-API with either Django or Flask. With this, we have to make our model available in our project folder as stated above, load it, perform the prediction, and then send back the result as a JSON response.

Tensorflow: we could also use Tensorflow to deploy our machine learning model using Tensorflow Serving. Tensorflow Serving was developed for deploying machine learning models to production, and as such it contains some out-of-the-box tools for integration with Tensorflow models or any other model. You can check out this article on how to deploy machine learning models with Tensorflow.

AWS Lambda/Serverless: this involves the use of AWS Lambda to make your deep learning model available. You could check out the AWS documentation on deploying deep learning models with Tensorflow for help with this.

Kubernetes: another option is to make use of Kubernetes to deploy your model. You could check out the Kubernetes documentation or this medium post for a guide on how to deploy your deep learning model with Kubernetes, Docker and Flask.

I developed the API for Plant AI with Django, a Python web framework. Whatever Python framework you decide to use, the process should be the same.

First make sure you have these packages installed

Django: a Python web framework. It's perfectly fine to use Flask or any other Python web framework of your choice.

Django-heroku (Only needed if you will be hosting your app on Heroku )

Django-RestFramework : a powerful tool for building web APIs.

Gunicorn: a WSGI HTTP server for UNIX.

Numpy : a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. (Source: Wikipedia )

Keras : Keras is an open source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, or Theano. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.(Source: Wikipedia )

Sklearn: a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
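Putting the four steps from the beginning of this post together, here is a minimal sketch of such a prediction endpoint. It is written with Flask for brevity (Plant AI itself uses Django, but the structure is the same), and the model filename, input size and response fields are illustrative assumptions, not the actual Plant AI code:

import io
import numpy as np
from flask import Flask, request, jsonify
from keras.models import load_model
from PIL import Image

app = Flask(__name__)
model = load_model('my_model.h5')  # assumed filename; load once at startup

@app.route('/predict', methods=['POST'])
def predict():
    # 1. Accept incoming data: an uploaded image file
    img = Image.open(io.BytesIO(request.files['image'].read()))
    # 2. Preprocess: resize and scale to what the model was trained on (assumed 256x256 RGB)
    arr = np.array(img.convert('RGB').resize((256, 256))) / 255.0
    # 3. Predict using the loaded model
    probs = model.predict(arr.reshape(1, 256, 256, 3))[0]
    # 4. Handle the prediction output: return the top class and confidence as JSON
    return jsonify({'class_index': int(np.argmax(probs)),
                    'confidence': float(np.max(probs))})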

Real-time 3D Face Reconstruction

Face Alignment in Full Pose Range: A 3D Total Solution
[Updates]

2018.11.17: Refine code and map the 3d vertex to original image space.
2018.11.11: Update end-to-end inference pipeline: infer/serialize 3D face shape and 68 landmarks given one arbitrary image, please see readme.md below for more details.
2018.11.9: Update trained model with higher performance in models.
2018.11.9: Add removed-neck version of BFM model in BFM_Remove_Neck.
2018.10.4: Add Matlab face mesh rendering demo in visualize.
2018.9.9: Add pre-process of face cropping in benchmark.

Introduction

This repo holds the pytorch improved re-implementation of the paper Face Alignment in Full Pose Range: A 3D Total Solution. Several additional works are added in this repo, including real-time training, training strategies and so on. Therefore, this repo is far more than a re-implementation. A related blog post will be published with some important technical details in the future. So far, this repo releases the pre-trained first-stage pytorch models of MobileNet-V1 structure, the training dataset and code. The inference time is about 0.27ms per image (input batch with 128 images) on a GeForce GTX TITAN X.

This repo will keep being updated, so any meaningful issues and PRs are welcome.

Several results on the AFLW-2000 dataset (inferred from the model phase1_wpdc_vdc.pth.tar) are shown below.


Applications

Face Alignment

Face Reconstruction
Getting started

Requirements

PyTorch >= 0.4.1
Python >= 3.6 (Numpy, Scipy, Matplotlib)
Dlib (Dlib is used for detecting faces and landmarks. There is no need to use Dlib if you can provide the face bounding box and landmarks yourself. Optionally, you can use the two-step inference strategy without initialized landmarks.)
OpenCV (Python version, for image IO operations.)

# installation instructions
sudo pip3 install torch torchvision  # for the cpu version; see https://pytorch.org for more options
sudo pip3 install numpy scipy matplotlib
sudo pip3 install dlib==19.5.0  # 19.15+ versions may conflict with pytorch; this may take several minutes
sudo pip3 install opencv-python

In addition, I strongly recommend using Python 3.6+ instead of older versions for its better design.

Usage

Clone this repo (this may take some time as it is a little big)

git clone https://github.com/cleardusk/3DDFA.git # or git@github.com:cleardusk/3DDFA.git cd 3DDFA

Run the main.py with arbitrary image as input

python3 main.py -f samples/test1.jpg

If you see the following log output in the terminal, it ran successfully.

Dump tp samples/test1_0.ply Dump tp samples/test1_0.mat Save 68 3d landmarks to samples/test1_0.txt Dump tp samples/test1_1.ply Dump tp samples/test1_1.mat Save 68 3d landmarks to samples/test1_1.txt Save visualization result to samples/test1_3DDFA.jpg

Because test1.jpg has two faces, two .mat files (storing dense face vertices; can be rendered by Matlab, see visualize) and two .ply files (can be rendered by Meshlab or Microsoft 3D Builder) are predicted.

Please run python3 main.py -h or review the code for more details.

The result samples/test1_3DDFA.jpg is shown below


Real-time 3D Face Reconstruction

Additional example

python3 ./main.py -f samples/emma_input.jpg --box_init=two --dlib_bbox=false
Citation

@article{zhu2017face,
  title={Face Alignment in Full Pose Range: A 3D Total Solution},
  author={Zhu, Xiangyu and Lei, Zhen and Li, Stan Z and others},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2017},
  publisher={IEEE}
}

@misc{3ddfa_cleardusk,
  author = {Jianzhu Guo and Xiangyu Zhu},
  title = {3DDFA},
  howpublished = {\url{https://github.com/cleardusk/3DDFA}},
  year = {2018}
}

Inference speed

When batch size is 128, the inference time of MobileNet-V1 takes about 34.7ms. The average speed is about 0.27ms/pic .


Evaluation

First, you should download the cropped testsets AFLW and AFLW-2000-3D in test.data.zip, then unzip it and put it in the root directory. Next, run the benchmark code by providing the trained model path. I have already provided four pre-trained models in the models directory. These models are trained using different losses in the first stage. The model size is about 13M due to the high efficiency of the MobileNet-V1 structure.

python3 ./benchmark.py -c models/phase1_wpdc_vdc.pth.tar

The performances of pre-trained models are shown below. In the first stage, the effectiveness of different loss is in order: WPDC > VDC > PDC. While the strategy using VDC to finetune WPDC achieves the best result.

Model                                       AFLW (21 pts)   AFLW 2000-3D (68 pts)
phase1_pdc.pth.tar                          6.956±0.981     5.644±1.323
phase1_vdc.pth.tar                          6.717±0.924     5.030±1.044
phase1_wpdc.pth.tar                         6.348±0.929     4.759±0.996
phase1_wpdc_vdc.pth.tar                     5.401±0.754     4.252±0.976
phase1_wpdc_vdc_v2.pth.tar [newly added]    5.298±0.776     4.090±0.964

Training

The training scripts lie in training directory. The related resources are in below table.

Data                     Link                              Description
train.configs            BaiduYun or Google Drive, 217M    The directory containing 3DMM params and filelists of the training dataset
train_aug_120x120.zip    BaiduYun or Google Drive, 2.15G   The cropped images of the augmented training dataset
test.data.zip            BaiduYun or Google Drive, 151M    The cropped images of the AFLW and AFLW-2000-3D testsets
model_refine.mat         BaiduYun or Google Drive, 160M    BFM model without the neck part

After preparing the training dataset and configuration files, go into training directory and run the bash scripts to train. The training parameters are all presented in bash scripts.

FAQ

Face bounding box initialization

The original paper validates that using a detected bounding box instead of the ground-truth box causes a small performance drop. Thus the current face cropping method is robust. Quantitative results are shown in the table below.


Acknowledgement

Thanks to Xiangyu Zhu's great work. Thanks to Yao Feng's fantastic works PRNet and face3d.

Thanks for your interest in this repo. If your work or research benefit from this repo, please cite it and star it :smiley:

Logging a Big Process


Let's say you have a web site. When the user clicks a link, it runs a process that generates a huge report, with lots of ins and outs, and lots of places where some of the data might be questionable but not bad enough to give up. What you really want to do is warn the user. The problem is that your code is pretty modular. You could pass around a variable to keep track of the issues, but wouldn't it be better if there were a more unified approach? Some sort of error accumulator… maybe a logger. Wait, that's built into Python. This works:

# other_module.py
import logging
logger = logging.getLogger('my logger')
def f3():
logger.debug('test f3')

and

# logging_example.py
import logging
try:
from cStringIO import StringIO # Python 2
except ImportError:
from io import StringIO
import other_module
log_stream = StringIO()
logging.basicConfig(
stream=log_stream,
level=logging.DEBUG,
format='%(module)s.%(funcName)s:%(lineno)d - %(message)s'
)
logger = logging.getLogger('my logger')
def f1():
logger.error('test f1')
def f2():
logger.debug('test f2')
f1()
f2()
other_module.f3()
error_message = log_stream.getvalue()
print(error_message)

Results in:

logging_example.f1:16 - test f1
logging_example.f2:20 - test f2
other_module.f3:8 - test f3

Visualize your own image filter


First, load the necessary packages

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D
import keras.backend as K
import scipy, imageio
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline

Then show the original picture of my Jeep

# First, read the image in as a matrix.
# We can use pyplot's imshow() method to display the picture.
# This is the Wrangler JK Rubicon Unlimited I once owned.
img_data = imageio.imread('./pics/wranglerJK.jpg')
print(img_data.data.shape)
img = Image.fromarray(img_data, 'RGB')
plt.imshow(img)

Now, build our 2-D convolutional function that takes a custom filter matrix and computes the filtered output image matrix.

def my_init(shape, dtype=None):
    # Custom kernel initializer: tile the same 2-D filter across
    # every input/output channel pair of the Conv2D kernel.
    new_mat = np.zeros((shape[0], shape[1], 3, 3))
    for i in range(shape[0]):
        for j in range(shape[1]):
            new_mat[:, :, i, j] = filter_mat
    return np.array(new_mat, dtype=dtype)

def MyFilter(filter_mat):
    print(len(filter_mat.shape))
    if len(filter_mat.shape) != 2:
        print('Invalid filter matrix. It must be 2-D')
        return []
    else:
        kernel_size = filter_mat.shape
        row, col, depth = img_data.shape
        input_shape = img_data.shape
        filter_size = row * col * depth
        print(filter_size)

        # A one-layer "network" whose convolution weights are our fixed filter
        model = Sequential()
        model.add(Conv2D(depth, kernel_size=kernel_size,
                         input_shape=input_shape,
                         padding='same',
                         activation='linear',
                         data_format='channels_last',
                         kernel_initializer=my_init,
                         name='Conv'))
        model.add(Dense(1, activation='linear'))
        model.compile(optimizer='sgd', loss='mse')
        model.summary()

        # Grab the output of the Conv layer for our image (no training needed)
        inX = model.input
        outputs = [layer.output for layer in model.layers if layer.name == 'Conv']
        functions = [K.function([inX], [out]) for out in outputs]
        layer_outs = [func([img_data.reshape(1, row, col, depth)]) for func in functions]
        activationLayer = layer_outs[0][0]

        # Normalize activations to [0, 1] so they can be shown as an image
        temp = activationLayer - np.min(activationLayer)
        normalized_activationLayer = temp / np.max(np.max(temp))
        return normalized_activationLayer.reshape(row, col, depth)

Now, insert our own fixed filter matrix and get the output, using pyplot.imshow() to display the filtered picture. This time we throw in an edge detector.

filter_mat = np.array([-1, -2, -3, 0, 0, 0, 1, 2, 3]).reshape(3, 3)
outLayer = MyFilter(filter_mat)
plt.imshow(outLayer)

Below is the filtered picture.
