
Python Built-in Functions (7): bytearray


English documentation:

class bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the bytes type has; see Bytes and Bytearray Operations.

The optional source parameter can be used to initialize the array in a few different ways:

If it is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode(). If it is an integer, the array will have that size and will be initialized with null bytes. If it is an object conforming to the buffer interface, a read-only buffer of the object will be used to initialize the bytes array. If it is an iterable, it must be an iterable of integers in the range 0 <= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.

Notes:

1. The return value is a new byte array.

2. When none of the three arguments is passed, a byte array of length 0 is returned.

>>> b = bytearray()
>>> b
bytearray(b'')
>>> len(b)
0

3. When source is a string, the encoding argument must also be provided; the function converts the string into a byte array using str.encode.

>>> bytearray('中文')
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    bytearray('中文')
TypeError: string argument without an encoding
>>> bytearray('中文','utf-8')
bytearray(b'\xe4\xb8\xad\xe6\x96\x87')

4. When source is an integer, an empty (null-filled) byte array of that length is returned.

>>> bytearray(2)
bytearray(b'\x00\x00')
>>> bytearray(-2)  # the integer must be non-negative, since it is used as the array length
Traceback (most recent call last):
  File "<pyshell#51>", line 1, in <module>
    bytearray(-2)
ValueError: negative count

5. When source is an object implementing the buffer interface, a read-only buffer of the object is used to initialize the byte array.
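For example, a bytes object supports the buffer interface, so it can initialize a bytearray directly (a quick REPL check, not from the original post):

>>> bytearray(b'abc')
bytearray(b'abc')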

6. When source is an iterable, every element must satisfy 0 <= x < 256 so that the values can initialize the array.

>>> bytearray([1,2,3])
bytearray(b'\x01\x02\x03')
>>> bytearray([256,2,3])  # values outside the range 0-255 raise an error
Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    bytearray([256,2,3])
ValueError: byte must be in range(0, 256)

I like the Python 3 string.translate method


Suppose, hypothetically, that you wanted to escape the & character in text as an HTML entity:

txt = txt.replace('&', '&amp;')

Okay, maybe there's a character or two more:

... txt = txt.replace('<', '&lt;')

And so it goes. The .replace() string method is an obvious and long-standing hammer, and I've used it for any number of single-character replacements over the years (as well as some more complicated multi-character ones, such as replacing \r\n with \n ).

Recently I was working on my Exim attachment type logger , and more specifically I was fixing its handling of odd characters in the messages that it logged as part of making it work in python 3 . My Python 2 approach to this was basically to throw repr() at the problem and forget about it, but using repr() for this is a hack (especially in Python 3). As part of thinking about just what I actually wanted, I decided that I wanted control characters to be explicitly turned into some sort of clear representation of themselves. This required explicitly remapping and replacing them, and I needed to do this to a fair number of characters.

At first I thought that I would have to do this with .replace() (somehow) or a regular expression with a complicated substitution or something equally ugly, but then I ran across the Python 3 str.translate() method. In Python 2 this method is clearly very optimized but also only useful for simple things, since you can only replace a character with a single other character. In Python 3, .translate() has become much more general; it takes a dictionary of translations and the values in the dictionary don't have to be single characters.

So here's what my handling of control characters now looks like:

# ctrl-<chr> -> \xNN escape
ctrldict = {c: "\\x%02x" % c for c in range(0, 32)}
ctrldict[127] = "\\x7f"
# A few special characters get special escapes
ctrldict[ord("\n")] = "\\n"
ctrldict[ord("\r")] = "\\r"
ctrldict[ord("\t")] = "\\t"
ctrldict[ord("\\")] = "\\\\"

def dectrl(msg):
    return msg.translate(ctrldict)

That was quite easy to put together, it's pretty straightforward to understand, and it works. The only tricky bit was having to read up on how the keys for the translation dictionaries are not characters but the (byte) ordinal of each character (or the Unicode codepoint ordinal if you want to be precise). Once I found .translate() , the whole exercise was much less annoying than I expected.

Python 2's string.translate() still leaves me mostly unenthused, but now that I've found it, Python 3's has become an all-purpose tool that I'm looking forward to making more use of. I have any number of habitual uses of .replace() that should probably become .translate() in Python 3 code. That you can replace a single character with multiple characters makes .translate() much more versatile and useful, and the simplified calling sequence is nice.

(Python 3's version merges the Python 2 deletechars into the translation map, since you can just map characters to None to delete them.)

PS: Having read the documentation a bit, I now see that str.maketrans() is the simple way to get around the whole ord() stuff that I'm doing in my code. Oh well, the original code is already written. But I'll have to remember maketrans() for the future.
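For the record, here is a minimal sketch of what the maketrans() route might look like for a subset of the escapes above (the table name is mine, not from the original code):

# str.maketrans accepts a dict mapping characters (or ordinals) to
# replacement strings, so the explicit ord() bookkeeping goes away:
table = str.maketrans({"\n": "\\n", "\r": "\\r", "\t": "\\t", "\\": "\\\\"})
print("a\tb\\c".translate(table))  # prints: a\tb\\c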

(The performance and readability of .replace() versus .translate() is something that can be measured (for performance) and debated (for readability). I haven't made any performance measurements and I don't really care for most of my code. As far as readability, probably I'll conclude that .translate() wins if I'm doing more than one or two substitutions.)

Sharing an Advanced Python Book



There are quite a few Python blogs outside China, but few that update regularly.

Michael Driscoll's The Mouse vs. Python is one of them. Besides interviewing a Python developer every week for PyDev of the Week, he regularly writes beginner and advanced tutorials, and he has collected some of his posts into books through the self-publishing platforms Gumroad and Leanpub, including Python 101 and, recently, Python 201.

Python 101 is a series of tutorials on Python fundamentals, while Python 201 focuses on some of Python's more advanced libraries and features. Python 201 is the book I want to share with you today.



编程派 and the PythonTG translation group have translated some articles from these two series, roughly the following:

All of the links above can be clicked for details.

And now the good news: on October 17 the author made his new book Python 201 free for a short time, which is why I'm sharing it today. If you're comfortable reading Python books in English, send the keyword "py201" to this WeChat account to get a Baidu Cloud link to the book. If reading English is a struggle, watch for 编程派's follow-up posts.

P.S. Who knows how much longer the cloud-drive service will last...

Feel free to share this post to your WeChat Moments. Unless otherwise noted, all articles on this account are original or translated; for reprints, please contact 编程派 for authorization.


property in Python


There are generally two ways to use property:

Via a decorator

class Demo(object):
    def __init__(self, val):
        self._x = val

    # After the decorator, x is a property object
    @property
    def x(self):
        # after some operation
        return self._x

    # Calling x's setter method (a decorator) returns a new property object
    @x.setter
    def x(self, val):
        self._x = val

    @x.deleter
    def x(self):
        print 'del x'

Via a property instance

class Demo2(object):
    def __init__(self, val):
        self._x = val

    def getx(self):
        return self._x

    def setx(self, val):
        self._x = val

    def delx(self):
        print 'del x'

    x = property(getx, setx, delx)

Below is an equivalent implementation of property using __get__ and __set__, adapted from http://pyzh.readthedocs.io/en/latest/Descriptor-HOW-TO-Guide.html

class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError, "unreadable attribute"
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError, "can't set attribute"
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError, "can't delete attribute"
        self.fdel(obj)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

Suppose we create an instance demo of the Demo class.

When demo.x is accessed, Property's __get__ method is called.

At that point the obj argument is demo, and the registered fget() is called to return _x.

A strange example

Probably nobody would write it this way:

class Demo3(object):
    def __init__(self, val):
        self._x = val
        self.x = property(self.getx, self.setx, self.delx)

    def getx(self):
        return self._x

    def setx(self, val):
        self._x = val

    def delx(self):
        print 'del x'

demo = Demo3(0)
demo.x  # <property at 0x7f26e24ae890>

This is because:

demo.x first calls __getattribute__, which looks up x in Demo3's dictionary; if x defines a __get__ method, then x.__get__(demo, Demo3) is invoked.

This process is equivalent to:

type(demo).__dict__['x'].__get__(demo, type(demo))

So let's try:

demo.x.__get__(demo, type(demo))
TypeError: getx() takes exactly 1 argument (2 given)

This is because the property was created through the instance, so getx is already bound to self when it is called, which means an extra argument gets passed.

So we tweak it like this:

class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError, "unreadable attribute"
        # no need to pass obj, fget is already bound
        return self.fget()

Do Not Snap


An experiment in detecting DoNotSnap badges in photos, to protect privacy.



This program allows you to detect and identify DoNotSnap badges via a sliding-window decision tree classifier (custom heuristics are used to reduce search space). The classifier is trained by matching samples against image templates using Affine-transform invariant SURF features.

You can find examples of using the classifier in classify.py and training a new classifier in train.py

A pre-trained classifier can be found in classifier.pkl. Alternative versions of the same classifier are in classifier_alt_1.pkl and classifier_alt_2.pkl.

Running classification

Run python classify.py <path-image-to-be-tested>. This will deserialize the classifier from classifier.pkl and run it on the image you supplied. A sample image can be found in sample.jpg.

Training your own classifier

Run python train.py <output-file> <total-number-of-samples>. This will read the sample filenames from the positive.txt and negative.txt files. Template filenames are specified in templates.txt. A sample template can be found in template.png. The output is <output-file>.pkl with the serialized classifier.

Dependencies: opencv, numpy, sklearn, matplotlib, PIL

Create a Python Module


I wrote the module to parse an Oracle version string into what we’d commonly expect to see, like the release number, an “R”, a release version, and then the full version number. The module name is more or less equivalent to a package name, and the file name is effectively the module name. The file name is strVersionOracle.py , which makes the strVersionOracle the module name.

# Parse and format Oracle version.
def formatVersion(s):
    # Split string into collection.
    list = s.split(".")
    # Iterate through the result set.
    for i, l in enumerate(list):
        if i == 0 and list[i] == "11":
            label = str(l) + "g"
        elif i == 0 and list[i] == "12":
            label = str(l) + "c"
        elif i == 1:
            label = label + "R" + list[i] + " (" + s + ")"
    # Return the formatted string.
    return label

You can put this in any directory as long as you add it to the Python path. There are two Python paths to maintain: one in the file system and the other in Python's interactive IDLE environment. You can check the contents of the IDLE path with the following interactive commands:

import sys
print sys.path

It prints the following:

['', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib/python2.7/site-packages']

You can append to the IDLE path using the following command:

sys.path.append("/home/student/Code/python")

After putting the module in the runtime path, you can test the code in the IDLE environment:

import cx_Oracle
import strVersionOracle

db = cx_Oracle.connect("student/student@xe")
print strVersionOracle.formatVersion(db.version)

The final line prints the result by calling the formatVersion function inside the strVersionOracle module. It prints the following:

11gR2 (11.2.0.2.0)
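As a quick sanity check of the 12c branch, the same function applied to a hypothetical 12c version string would yield:

>>> strVersionOracle.formatVersion("12.1.0.2.0")
'12cR1 (12.1.0.2.0)'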

You can test the program outside of the IDLE environment with the following oracleConnection.py file:

# Import the Oracle library.
import cx_Oracle
import strVersionOracle

try:
    # Create a connection.
    db = cx_Oracle.connect("student/student@xe")
    # Print a message.
    print "Connected to the Oracle " + strVersionOracle.formatVersion(db.version) + " database."
except cx_Oracle.Error, e:
    # Print the error.
    print "ERROR %d: %s" % (e.args[0], e.args[1])
finally:
    # Close connection.
    db.close()

The oracleConnection.py program works when you call it from the Bash shell provided you do so from the same directory where the strVersionOracle.py file (or Python module) is located. If you call the oracleConnection.py file from a different directory, the reference to the library raises the following error:

Traceback (most recent call last):
  File "oracleConnection.py", line 3, in <module>
    import strVersionOracle
ImportError: No module named strVersionOracle

You can fix this error by adding the directory where the strVersionOracle.py file exists to the PYTHONPATH environment variable, like so:

export PYTHONPATH=/home/student/Code/python

Then, you can successfully call the oracleConnection.py file from any directory:

python oracleConnection.py

I hope this helps those trying to create and use Python modules.

Using Rollbar: Capturing and Logging All Python Exceptions


I recently wrote an article about tracking user exceptions using Rollbar. For the article, I implemented an experiment using Python. While I think Rollbar is a fantastic tool, I also think that in some ways developers could take a better approach than inserting a call to rollbar.report_exc_info in the except block. This method only traps exceptions you are expecting to be thrown and ignores unintended exceptions. In addition, if the code is not in a try … except block, then the exception does not get reported to your Rollbar account.

Fortunately, there is another solution in Python for globally capturing exceptions. In this article, I’ll take a look at the problems that this approach solves, then explain how to implement it.

First, Some Problems

So let’s take a look at some simple example code. Granted, it is silly, but it makes the point:

import rollbar
import sys

def print_message(obj):
    try:
        if not isinstance(obj.msg, str):
            raise TypeError
        print(obj.msg)
    except (AttributeError, TypeError) as e:
        exc_type, exc_value, traceback = sys.exc_info()
        rollbar.report_exc_info((exc_type, exc_value, traceback))
        # other code to handle or re-raise

Every try … except block will have to have a call to sys.exc_info and a call to Rollbar to report the exception. This ends up littering logging code throughout your code base. While many developers may be fine with this, I find it gets in the way of reading the logic of the code.

Our above code will only phone home to Rollbar if there is an AttributeError or TypeError. If another type of error is thrown, the exception is not sent to Rollbar. Consider the following code:

class SampleClass(object):
    def __init__(self, msg):
        self._msg = msg

    @property
    def msg(self):
        raise NameError

For the sake of the example, the class is sabotaged with a NameError in place of a legitimate reason to raise an exception. If we create an instance of SampleClass and access its msg property with our print_message function above, a NameError will be raised. However, no exception will be reported to Rollbar.

This implies that every time the developers determine that a method or function should throw an exception not listed in the except block, the code will have to be refactored to catch the new exception so it will be reported to Rollbar. This is an error-prone process we would wish to avoid.

Consider also that any code not wrapped in a try … except block will never report to Rollbar at all. This could possibly lead to developers overusing try … except in Python or implementing a bare except in hopes of catching everything.

A Solution

Fortunately, Python’s sys module provides sys.excepthook which allows developers to inject custom behavior into Python’s exception handling.

Here is an example of how it could be used:

import sys
import rollbar

ROLLBAR_POST_ACCESS_TOKEN = '<your token here>'
rollbar.init(ROLLBAR_POST_ACCESS_TOKEN, 'production')

def rollbar_except_hook(exc_type, exc_value, traceback):
    # Report the issue to rollbar here.
    rollbar.report_exc_info((exc_type, exc_value, traceback))
    # display the error as normal here
    sys.__excepthook__(exc_type, exc_value, traceback)

sys.excepthook = rollbar_except_hook

By replacing the default sys.excepthook with custom functionality that logs exceptions to Rollbar, we get a single place in our code which will be called when any exception is raised. Note that sys also provides the __excepthook__ dunder, which can be called to trigger the default behavior of sys.excepthook.

Let’s revisit our print_message function:

import sys

def print_message(obj):
    if not isinstance(obj.msg, str):
        raise TypeError
    try:
        print(obj.msg)
    except AttributeError as e:
        # other code to handle or re-raise
        pass

That cleans up rather nicely, and both expected and unexpected exceptions will be logged to Rollbar (rather than just the expected exceptions), while allowing developers to create custom handling for expected exceptions in the standard manner without extra code.

Further, if an exception is thrown by code not wrapped in a try … except block, it will trigger our custom behavior and log the exception to Rollbar.
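A quick hypothetical demonstration (the function and values here are mine, purely for illustration):

def divide(a, b):
    return a / b

# With the hook installed, this uncaught ZeroDivisionError is sent to
# Rollbar by rollbar_except_hook and then displayed by sys.__excepthook__.
divide(1, 0)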

In this article, we have taken a look at some of the problems with the usual way of using Rollbar, particularly the issues around logging uncaught exceptions, and a possible solution.



Awesome Python: Building an Infrared Accelerometer Car with an STM32 Dev Board


Wireless control has become mainstream in electronics, so this time I'll teach you to build something squarely in that mainstream: a wirelessly controlled car. First, a photo of the finished product:



First, the materials needed:

Two TPYBoard v102 dev boards

One car chassis

Two LoRa wireless modules

One power bank

Two 9014 transistors (as for why we need these, more on that later).

The dev board has an accelerometer on it; in fact, noticing that accelerometer is what gave me the idea for this build. So let's first introduce the accelerometer. (Note: I used the board below. Pick whichever board you like; this one is just the example.)

About TPYBoard

TPYBoard is a MicroPython dev board made by TurnipSmart under the MIT license. It is based on the STM32F405 MCU and transfers data over USB. The board has 4 built-in LEDs, an accelerometer, and a clock module, and works normally at voltages between 3V and 10V.



TPYBoard lets users control the MCU easily with Python, so more computing beginners can get hands-on with hardware. Users can access and control the low-level hardware entirely through Python scripts: driving LED bulbs and LCD displays, reading voltages, controlling motors, accessing SD cards, and so on.

In short, TPYBoard controls the MCU through Python scripts; whatever the MCU can do, TPYBoard can do.



An accelerometer consists of a silicon diaphragm with upper and lower covers, the diaphragm bonded between the two; one- or two-dimensional nanomaterials, gold electrodes, and leads are distributed on the diaphragm, and wires are brought out by a bonding process. Industrial vibration sensors are mainly piezoelectric accelerometers, which exploit the piezoelectric effect of a sensitive element to obtain a charge or voltage proportional to the vibration or pressure. The typical industrial choice today is the IEPE accelerometer, a piezoelectric accelerometer with a built-in IC circuit, which outputs a voltage signal proportional to the vibration, e.g. 100mV/g (100mV of output per unit of acceleration, where 1g = 9.81 m/s²).

Didn't follow the description above? That's fine. I wrote it from the official documentation, and honestly I can't follow it either. Put plainly: an accelerometer measures the acceleration due to gravity, from which you can compute the device's tilt angle relative to the horizontal plane; by analyzing dynamic acceleration, you can work out how the device is moving. Still not sure how to read this tilt value? No problem: MicroPython has a function that returns it, ready to use. One caveat: because of manufacturing variation, each sensor returns slightly different values, so you need to experiment with your own board.
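As a minimal sketch, reading the tilt values on the TPYBoard looks like this with MicroPython's pyb module (the value range is an assumption that varies by board, so verify it on your own hardware):

import pyb

accel = pyb.Accel()
while True:
    # x() and y() return signed tilt readings (roughly -30..30 on this sensor)
    print(accel.x(), accel.y())
    pyb.delay(100)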

Once we have the tilt value, the rest is simple: figure out which way the board is tilting and transmit that signal. Done.

That covers the controller side; now let's talk about getting the control signal out. I used a LoRa module, which is quite popular right now. I ran a range test myself; I didn't measure the exact distance, but I'd guess at least a kilometer or so, which is plenty for a toy car.

A word on using the LoRa module: it's simple serial, transparent transmission. Whatever the MCU sends to the module over the serial port, the module transmits for you (for point-to-point you prepend the address and channel); the LoRa module's manual explains this in detail. Does having to use a serial port sound like a hassle? I thought so too, but Python and this dev board are powerful enough that there's a ready-made UART class you just call (suddenly development feels easy).

That was the controller's job and how it works; now for the controlled side (the part mounted on the car).

The controlled side uses the dev board to drive the chassis motors, and here I got burned once. When I bought the chassis online I asked the seller whether I needed anything else, and they said no. I figured motor drivers weren't even needed anymore, very fancy; but when it arrived I found I still needed an L298N driver. I felt cheated, but amid the grief my two 9014 transistors stepped up: a simple transistor switching circuit did the job (though the speed is a bit slow).

The signal-receiving part is much like the controller: a LoRa module receives the data, the code checks what arrived and then drives the motors according to your own logic, and the car moves. (I won't cover how the car turns; there are plenty of tutorials online.)

That was all fairly abstract, so let's make it concrete with pictures. First, a simple schematic I drew myself.



Controller side



Controlled side

I drew these two diagrams to help you understand. (This is how I built the controlled-side circuit, and it's a bit slow; you can add an amplifier stage at the driver to get more speed, but then it can't reverse, so you may as well just use an L298N driver.) I wired mine with DuPont jumpers and never drew a proper schematic. Here's one more photo of the finished build.



Controller source code:

import pyb
xlights = (pyb.LED(2), pyb.LED(3))
ylights = (pyb.LED(1), pyb.LED(4))
from pyb import UART
from pyb import Pin
#from ubinascii import hexlify
from ubinascii import *
accel = pyb.Accel()
u2 = UART(2, 9600)
i=0
K=1

# ******************************* main program **********************************

print('while')
while (K > 0):
    _dataRead = u2.readall()
    if (1 > 0):
        x = accel.x()
        print("x=")
        print(x)
        if x > 10:
            xlights[0].on()
            xlights[1].off()
            u2.write('\x00\x05\x18YOU')
            #pyb.delay(1000)
            print('\x00\x01\x18YOU')
        elif x < -10:
            xlights[1].on()
            xlights[0].off()
            u2.write('\x00\x05\x18ZUO')
            print('\x00\x01\x18ZUO')
            #pyb.delay(1000)
        else:
            xlights[0].off()
            xlights[1].off()
        y = accel.y()
        print("y=")
        print(y)
        if y > 15:
            ylights[0].on()
            ylights[1].off()
            #u2.write('\x00\x05\x18HOU')
            #pyb.delay(1000)
            #print('\x00\x01\x18HOU')
        elif y < -15:
            ylights[1].on()
            ylights[0].off()
            u2.write('\x00\x05\x18QIAN')
            #pyb.delay(1000)
            print('\x00\x01\x18QIAN')
        else:
            ylights[0].off()
            ylights[1].off()
        pyb.delay(10)

Controlled-side source code:

import pyb
from pyb import UART
from pyb import Pin
from ubinascii import hexlify
from ubinascii import *
M1 = Pin('X1', Pin.OUT_PP)
M3 = Pin('Y1', Pin.OUT_PP)
u2 = UART(2, 9600)
i=0
K=1

# ******************************* main program **********************************

print('while')
while (K > 0):
    M1.high()
    pyb.delay(3)
    M3.high()
    if (u2.any() > 0):
        print('1234')
        M1.low()
        M3.low()
        pyb.delay(3)
        _dataRead = u2.readall()
        print('123', _dataRead)
        if (_dataRead.find(b'QIAN') > -1):
            M1.low()
            M3.low()
            print('QIAN')
            pyb.delay(250)
        elif (_dataRead.find(b'ZUO') > -1):
            M1.low()
            M3.high()
            print('ZUO')
            pyb.delay(250)
        elif (_dataRead.find(b'YOU') > -1):
            M1.high()
            M3.low()
            print('YOU')
            pyb.delay(250)

pyJacqQ: Python Implementation of Jacquez's Q-Statistics for Space-Time Clustering of Disease Exposure in Case-Control Studies

Authors: Saman Jirjies, Garrick Wallstrom, Rolf U. Halden, Matthew Scotch

Title: pyJacqQ: Python Implementation of Jacquez's Q-Statistics for Space-Time Clustering of Disease Exposure in Case-Control Studies

Abstract: Jacquez's Q is a set of statistics for detecting the presence and location of space-time clusters of disease exposure. Until now, the only implementation was available in the proprietary SpaceStat software, which is not suitable for a pipelined Linux environment. We have developed an open-source implementation of Jacquez's Q statistics in Python using an object-oriented approach. The most recent source code for the implementation is available at https://github.com/sjirjies/pyJacqQ under the GPL-3. It has a command line interface and a Python application programming interface.

Submitted: 2015-06-19. Published: 2016-10-20.

Supplements: pyJacqQ.zip (Python source code, 638KB); v74i06-replication.zip (replication materials, 2KB)

DOI: 10.18637/jss.v074.i06
This work is licensed under the following licenses:
Paper: Creative Commons Attribution 3.0 Unported License
Code: GNU General Public License (at least one of version 2 or version 3) or a GPL-compatible license.

Lintel Technologies: Writing shorthand statements in python


Python has shorthand statements and shorthand operators. These help you write more logic with fewer statements.

Let's look at the available shorthand statements.

lambda statement

Probably everybody is aware of lambda functions. The lambda statement helps you write single-line functions without naming them; it returns the function reference, which you can assign to any variable. It's much like JavaScript's anonymous functions.

>>> foo = lambda a: a+3
>>> foo(3)
6
>>> foo(8)
11

Self-called Lambda

You can write a lambda that calls itself, like self-invoking functions in JavaScript. Let's see an example:

>>> (lambda a: a+3)(8)
11
>>> (lambda x: x**x)(3)
27

List Comprehension

List comprehension is a great feature of Python. Using it you can cut down a lot of code; simple for loops can be written as one-line comprehensions.

Syntax:

L = [mapping-expression for element in source-list if filter-expression ]

Where:

L: the variable the result gets assigned to

mapping-expression: the expression executed on each iteration, but only when filter-expression in the if clause evaluates to True

This list comprehension is equivalent to,

>>> result = []
>>> for element in source-list:
...     if filter-expression:
...         result.append(mapping-expression)
...

Example

Let's see a list comprehension example: get the even numbers from a given range.

Usual code

>>> result = []
>>> for i in range(10):
...     if i%2 == 0:
...         result.append(i)
...
>>> print result
[0, 2, 4, 6, 8]

List Comprehension

>>> [i for i in range(10) if i%2==0]
[0, 2, 4, 6, 8]

Dict Comprehension

Dict comprehension is available in Python 2.7 and 3.x. This syntax encapsulates into one line the several lines you would otherwise use to create a dictionary. It is similar to list comprehension, but we use dict literals {} instead of [].

Syntax:

{key: value for element in source-list if filter-expression}

Let's see how to use it with an example.

I have a list of fruits that I want to turn into a dictionary while changing the case of the keys:

['APPLE', 'MANGO', 'ORANGE']

I want to make all the keys a uniform case. This is what we would do without using a comprehension:

>>> l = ['MANGO', 'APPLE', 'ORANGE']
>>> d = {}
>>> for i in l:
...     d[i.upper()] = 1
...
>>> d
{'ORANGE': 1, 'MANGO': 1, 'APPLE': 1}

Using a simple dict comprehension:

{i.upper(): 1 for i in l}

Set Comprehension

Set comprehension syntax is very much similar to dict comprehension with a small difference.

Consider the dict comprehension example. The following statement generates a set:

{i.upper() for i in l}

where we haven't specified a value the way we do in a dict comprehension.

Generator Expression

You might already know about generators: any function which contains a yield statement is called a generator. A generator gives you an iterable on which you can call the next method to get the next item in the sequence. Python has a short notation for generators, just as it does for lambda: it is the same as a list comprehension, but we enclose the expression in parentheses instead.

Generator Function

def gen():
    for i in range(10):
        yield i

>>> g = gen()
>>> g
<generator object gen at 0x7f60fa104410>
>>> g.next()
0
>>> g.next()
1

Generator Expression

The same generator can be written as follows:

>>> g = (i for i in range(10))
>>> g
<generator object <genexpr> at 0x7f60fa1045f0>
>>> g.next()
0

:wink:

Shorthand If Else

Like the ternary operator (?:) in C and JavaScript, you can write a shorthand if-else comparison. With readability in mind, Python uses the following syntax:

if-expression if ( condition ) else else-expression

This is equivalent to:

if condition:
    if-expression
else:
    else-expression
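A quick example:

>>> x = 5
>>> 'even' if x % 2 == 0 else 'odd'
'odd'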

Tuple Unpacking

Python 3 has an even more powerful unpacking feature. Here it is:

Example:

a, *rest = [1, 3, 4, 6]

In this case, a gets 1 and the rest of the list gets assigned to the variable rest, i.e. [3, 4, 6].

String Concatenation with delimiter

If you want to concatenate a list of strings with some delimiter, you can do that using the string method join:

>>>" || ".join(["hello", "world", "how", "are", "you"]) >>>'hello || world || how || are || you'

Analyzing social networks using Python and SAS Viya


The study of social networks has gained importance over the years within social and behavioral research on HIV and AIDS. Social network research can show routes of potential viral transfer, and be used to understand the influence of peer norms and practices on the risk behaviors of individuals.

This example analyzes the results of a study of high-risk drug use for HIV prevention in Hartford, Connecticut, using Python and SAS. This social network has 194 nodes and 273 edges, which represent drug users and the connections between those users.

Background

SAS support for network analysis has been around for a while. In fact, I have shown related techniques using SAS Visual Analytics in my previous post. If you are new to social network analysis you may want to review that blog first, as it provides a great introduction into the world of networks.

This post is written for the application developer or data scientist who has programming experience and seeks self-service access to comprehensive analytics. I will highlight how to gain access to SAS Viya using the REST API in Python, as well as demonstrate how to drive a simple analytical pipeline to analyse a social network.

The recent release of SAS Viya provides a full set of innovative algorithms and proven analytical methods for exploring experimental questions, but it's also built on an open architecture. This means you can integrate SAS Viya seamlessly into your application infrastructure as well as drive analytical models using any programming language. This blog post highlights one example of how this openness can be used to access powerful SAS analytics.

Prerequisites

While you could go ahead and simply issue a series of REST API calls to access the data it's typically more efficient to use a programming language to structure your work and make it repeatable. I decided to use Python, as it's very popular among young data scientists and very common in universities.

For demonstration purposes, I'm using an interface called Jupyter, an open and interactive web-based platform capable of running Python code as well as embedding markup text. The SAS community also hosts many additional examples for accessing SAS data with Jupyter. In fact, Jupyter supports many different programming languages, including SAS. You may also be interested in trying out the related SAS kernel.

After installing Jupyter you will also need to install the SAS Scripting Wrapper for Analytics Transfer (SWAT) . This package is the Python client to SAS Cloud Analytic Services (CAS). It allows users to execute CAS actions and process the results all from Python. SWAT package information and Jupyter Notebook examples for getting started are also available from https://github.com/sassoftware .

Accessing SAS Cloud Analytic Services (CAS)

The core of SAS Viya is the analytical run-time environment called SAS Cloud Analytic Services (CAS). In order for you to execute actions or access data, a connection session is required. You can either use a binary connection (recommended for transferring large amounts of data) or use the REST API via HTTP or HTTPS. Since I'm analyzing a very small network for demonstration purposes, I will use the REST protocol. More information about Viya and CAS can be found in the related online documentation.

One of the first steps in any program is to define the libraries you are going to use. In Python, this is done using the import statement. Besides the very common matplotlib library, I'm also going to use networkx to render and visualize the network graphs in Python.

from swat import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as colors  # package includes utilities for color ranges
import matplotlib.cm as cmx
import networkx as nx  # to render the network graph
%matplotlib inline

Now that the SWAT libraries have been loaded, we can issue the first command to connect to CAS and create a session for the given user. Note that the parameters used will vary depending on your environment. The variable "s" will hold the session object and will be referenced in future calls.

s = CAS('http://sasviya.mycompany.com:8777', 8777, 'myuser', 'mypass')

Action sets

The CAS server organizes analytical actions into action sets. An action set can hold many different actions, from simple data or session management tasks to sophisticated analytical tasks. For this network analysis I'm going to use an action set named hyperGroup that has only one action, also called hyperGroup.

s.loadactionset('hyperGroup')

Loading data

In order to perform any analytical modelling, we need data. We have several options for loading data, including using an existing data set on the server or uploading a new set from the local environment. The SAS community web site shows additional examples of how data can be loaded. The following example uploads a local CSV file to the server and stores the data in a table named DRUG_NETWORK. The table has only two columns, FROM and TO, both of type numeric.

out = s.upload("data/drug_network.csv", casout=dict(name='DRUG_NETWORK', promote = True))

During analytical modelling you often have to change data structures, filter, or merge data sources. The following code lines show an example of how to execute SAS Data Step code and derive new columns. The put function here converts both numeric columns to new character columns SOURCE and TARGET.

sasCode =

Lintel Technologies: Python metaclasses explained


Python has a wonderfully obscure mechanism behind classes and their implementation. In Python everything is an object. If you define a class, that class itself is an object in memory, and it is an instance of another class; you may call it a class object. If you instantiate it, you get a brand new object called an instance (an instance of the class).

Metaclasses are probably a somewhat confusing concept. Many people are afraid of them, but believe me, they are very nice and simple once you understand them.

In simple words, metaclasses are classes which are responsible for creating a class object (in memory).

As mentioned above, when you define a class, a class object is created in memory, but behind the scenes it is an instance of another class, by default type. So classes are created by metaclasses, and you can specify a custom metaclass that will be responsible for creating your class.

Mostly, metaclasses are used when you write APIs or frameworks; Django, for example, uses metaclasses in its models.


Meta Classes

I would like to explain this concept with a real-world example to give you a good understanding. Let's take a look at the following picture.


Python MetaClasses Illustrated

As shown in the picture, a factory can be treated as a metaclass: it produces vending machines, which can be considered classes. The class (a vending machine) in turn produces instances, which are the cans, bottles, etc.

An example showing a normal class definition and instance creation:

>>> class Foo(object):
...     a = 3
...
>>> Foo
<class '__main__.Foo'>
>>> i = Foo()
>>> i
<__main__.Foo object at 0x110126910>
>>> isinstance(i, Foo)
True
>>> isinstance(Foo, type)
True

Python has a built-in function called isinstance. Using this function we can determine whether any object is an instance of a specified class.

In the above example we haven't specified the metaclass, so Python uses the default one, i.e. type.

In Python 2, we specify the metaclass using the magic attribute __metaclass__.

__metaclass__ can be specified either as a class attribute or as a global variable. When you specify it globally at module level, all classes become instances of this metaclass. If you specify it as a class attribute, only that class becomes an instance of the specified metaclass. In Python 3 the __metaclass__ attribute is removed in favour of a metaclass argument that you specify in the class definition.
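In Python 3 the same thing is written with the metaclass keyword argument (a minimal sketch):

class MetaClass(type):
    pass

class Foo(metaclass=MetaClass):
    pass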

Implementation of a simple class with a metaclass in place:

>>> class MetaClass(type):
...     pass
...
>>> class Foo(object):
...     __metaclass__ = MetaClass
...     pass
...
>>> i = Foo()
>>> isinstance(i, Foo)
True
>>> isinstance(Foo, MetaClass)
True
>>> isinstance(MetaClass, type)
True

Here you can see that class Foo is an instance of the metaclass MetaClass; thus Foo gets created as an instance of MetaClass, and i is an instance of class Foo. As you can see, we confirmed this relationship using the isinstance function.


Python metaclasses instance, class and metaclass

There is no restriction over how many times one class can be used as a metaclass for other classes.

>>> class MetaClass(type):
...     pass
...
>>> class Foo(object):
...     __metaclass__ = MetaClass
...
>>> class Bar(object):
...     __metaclass__ = MetaClass
...
>>> i = Foo()
>>> j = Bar()
>>> isinstance(i, Foo)
True
>>> isinstance(j, Bar)
True
>>> isinstance(Foo, MetaClass)
True
>>> isinstance(Bar, MetaClass)
True

In the above example, the metaclass MetaClass is used in both the Foo and Bar classes. Likewise, you can use it in as many classes as you want. If you want to apply it to all module-level classes, you'd better use the global __metaclass__ attribute, as you can see in the following example:

""" This is a simple module to demostrate global __metaclass__ attribute """ class MetaClass(type): pass __metaclass__ = MetaClass# This will affect all class in module class Foo(): pass class Bar(): pass class NewStyleClass(object): pass print "is instance of Foo: MetaClass:: %s" % isinstance(Foo, MetaClass) print "is instance of Bar: MetaClass:: %s" % isinstance(Bar, MetaClass) print print "is instance of NewStyleClass: MetaClass:: %s" % isinstance(NewStyleClass, MetaClass) OutPut: pythonglobal_metaclass.py is instanceofFoo: MetaClass:: True is instanceofBar: MetaClass:: True is instanceofNewStyleClass: MetaClass:: False

This module-level __metaclass__ magic variable doesn't work on new-style classes, as shown in the above code.

Python Metaclasses Key Notes

Metaclasses are callables

Subclasses inherit the metaclass

Restriction over multiple metaclasses in multiple inheritance

MetaClasses are callable

A metaclass need not always be a class; you can use any callable as a metaclass. Here is a simple example demonstrating the use of a callable (a function) as a metaclass:

>>> def metaClass(name, bases, d):
...     print "CreatingClass %s" % name
...     c = type(name, bases, d)
...     return c
...
>>> class Foo(object):
...     __metaclass__ = metaClass
...
CreatingClass Foo
>>> class Bar(object):
...     __metaclass__ = metaClass
...
CreatingClass Bar
>>> Foo
<class '__main__.Foo'>
>>> Bar
<class '__main__.Bar'>

If you are using a callable as a metaclass, it should have the same signature (arguments) as type. That is:

type(name of the class, tuple of the parent classes (for inheritance, can be empty), dictionary containing attribute names and values)
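For instance, the following call creates a class equivalent to a normal class Foo with one attribute, which illustrates the signature above:

>>> Foo = type('Foo', (object,), {'a': 3})
>>> Foo.a
3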

The following example is useless but explanatory: you can hijack class creation using a metaclass. In the example below, Foo becomes 3 instead of a class object because we returned 3 from the metaclass.

>>> def makeClass(name, bases, d):
...     print name, bases, d
...     return 3
...
>>> class Foo(object):
...     __metaclass__ = makeClass
...
>>> Foo
3

Subclasses inherit the metaclass

Like all other attributes and methods, subclasses inherit the metaclass.

class M1(type):
    def __new__(meta, name, bases, atts):
        print "Meta Class M1 called for class " + name
        return super(M1, meta).__new__(meta, name, bases, atts)

class Base(object):
    __metaclass__ = M1

class Sub(Base):
    pass

Output:

$ python metaclass_inheritance.py
Meta Class M1 called for class Base
Meta Class M1 called for class Sub

#TODO: Restriction over multiple metaclasses in inheritance

Classes can have multiple base classes, and those base classes may have different metaclasses. If s

Import Python: ImportPython Issue 95


Worthy Read

Fixing Python Performance with Rust

performance

Excellent post from Armin Ronacher on tackling a CPython performance bottleneck with a custom Rust extension module.

How to create read only attributes and restrict setting attribute values on object in python ?

core python

There are different ways to prevent setting attributes and to make attributes read-only on an object in Python. We can use any one of the following approaches: 1) Property descriptor 2) Using descriptor methods __get__ and __set__ 3) Using slots (only restricts setting arbitrary attributes).

Filestack - Upload Files From Anywhere.

The API for file uploads. Integrate Filestack in 2 lines of code. Python library for Filestack https://github.com/filepicker/filepicker-python

Sponsor

How to Deploy a Django Application to Digital Ocean

deployment

In this tutorial we will be deploying https://github.com/sibtc/urban-train , an empty Django project I created to illustrate the deployment process.

Asynchronous Scraping with Python

Scraping is often an example of code that is embarrassingly parallel. With some slight changes, our tasks can be done asynchronously, allowing us to process more than one URL at a time. In version 3.2, Python introduced the concurrent.futures module, which is a joy to use for parallelizing tasks like scraping. The rest of this post will show how we can use the module to make our previously synchronous code asynchronous.

Weekly Python Chat: Class-Based Views in Django

video

Most Django programmers use function-based views, but some use class-based views. Why? Special guest Buddy Lindsey will be joining us this week to talk about how class-based views are different.

Talk Python to Me: #80 TinyDB: A tiny document db written in Python

podcast

I'm excited to introduce you to Markus Siemens and TinyDB. This is a 100% pure Python, embeddable, pip-installable document DB for Python.

Handling statuses in Django

django

,

finite state machine

Whether you're building up a CMS or a bespoke application, chances are that you will have to handle some states / statuses. Let's discuss your options in Django.

JIRA

IT Help Desk & Ticketing. Start a free trial of JIRA Service Desk and get your free Konami Code shirt.

Sponsor

Upgrading Django - Never Clever

django

General Guidelines when upgrading Django.

My Startling Encounter With Python Debuggers (Part 2)

debugging

Benoit writes about debugging his software using gdb, python-debuginfo.

Yoda on python dependency

humor

Check the tweet :)

lptrace

opensource project

lptrace is strace for Python programs. It lets you see in real-time what functions a Python program is running. It's particularly useful to debug weird issues on production.

Static types in Python, oh my(py)!

mypy

In this post, I’ll explain how mypy works, the benefits and pain points we’ve seen in using mypy, and share a detailed guide for adopting mypy in a large production codebase (including how to find and fix dozens of issues in a large project in the first few days of using mypy!).

sanic

web server

Python 3.5+ web server that's written to go fast

Great Dev - Meet Great Jobs

Try Hired and get in front of 4,000+ companies with one application. No more pushy recruiters, no more dead end applications and mismatched companies, Hired puts the power in your hands.

Sponsor

CTO / Lead Developer at Patch

Hoxton, City of London, London, United Kingdom

Patch are hiring a CTO / Lead developer. We are expanding our tech team as part of scaling the company. This is an opportunity to make a big impact on our e-commerce platform and help shape the new services we're creating.

Upcoming Conference / User Group Meet

Inland Empire Pyladies



PyKla Monthly meetup



PyCon CZ 2016



PyCon Finland 2016



PyCon Ireland 2016



PyCon Canada 2016



PyHPC 2016



PyCon Jamaica 2016



Projects

fast-neural-style.tf - 17 Stars, 6 Fork

Feed-forward neural network for real-time artistic style transfer. Curator's Note - This is a pretty cool project.

TextSum - 8 Stars, 1 Fork

Preparing a dataset for TensorFlow text summarization (TextSum) model.

countrynames - 5 Stars, 0 Fork

Utility library to turn country names into ISO two-letter codes.

celery-redundant-scheduler - 4 Stars, 0 Fork

Celery beat scheduler that provides the ability to run multiple celerybeat instances.

SlackUptimeMonitor - 3 Stars, 3 Fork

Receive notifications in Slack when your websites/api/services are down

confluence-dumper - 3 Stars, 0 Fork

Tool to export Confluence spaces and pages recursively via its API

asyncio-nats-streaming - 3 Stars, 0 Fork

An asyncio library for NATS Streaming.

Using Ansible as a Python module


Ansible is one of the best tools in the ops toolbox, and I personally like it a lot. You can flexibly configure yml files to meet different business needs, and since no client agent needs to be installed, it is very easy to get started with. In some situations you may want to use Ansible as a library component inside your own Python scripts. Today's post shows how Ansible can be combined with Python scripts, that is, how to use Ansible inside a Python script. Let's take it step by step, starting with the first example:

#!/usr/bin/python
import ansible.runner
import ansible.playbook
import ansible.inventory
from ansible import callbacks
from ansible import utils
import json

# the fastest way to set up the inventory
# hosts list
hosts = ["10.11.12.66"]
# set up the inventory; if no group is defined then the 'all' group is used by default
example_inventory = ansible.inventory.Inventory(hosts)

pm = ansible.runner.Runner(
    module_name = 'command',
    module_args = 'uname -a',
    timeout = 5,
    inventory = example_inventory,
    subset = 'all'  # name of the hosts group
)

out = pm.run()
print json.dumps(out, sort_keys=True, indent=4, separators=(',', ': '))

This example shows how to run a system command through Ansible from within a Python script. Now for the second example, which hooks into a yml playbook. A simple yml file looks like this:

- hosts: sample_group_name
  tasks:
    - name: just an uname
      command: uname -a

The Python script that invokes the playbook is as follows:

#!/usr/bin/python
import ansible.runner
import ansible.playbook
import ansible.inventory
from ansible import callbacks
from ansible import utils
import json

### setting up the inventory

## first of all, set up a host (or more)
example_host = ansible.inventory.host.Host(
    name = '10.11.12.66',
    port = 22
)
# with its variables to modify the playbook
example_host.set_variable('var', 'foo')

## secondly set up the group where the host(s) has to be added
example_group = ansible.inventory.group.Group(
    name = 'sample_group_name'
)
example_group.add_host(example_host)

## the last step is to set up the inventory itself
example_inventory = ansible.inventory.Inventory()
example_inventory.add_group(example_group)
example_inventory.subset('sample_group_name')

# setting callbacks
stats = callbacks.AggregateStats()
playbook_cb = callbacks.PlaybookCallbacks(verbose=utils.VERBOSITY)
runner_cb = callbacks.PlaybookRunnerCallbacks(stats, verbose=utils.VERBOSITY)

# creating the playbook instance to run, based on the "test.yml" file
pb = ansible.playbook.PlayBook(
    playbook = "test.yml",
    stats = stats,
    callbacks = playbook_cb,
    runner_callbacks = runner_cb,
    inventory = example_inventory,
    check=True
)

# running the playbook
pr = pb.run()

# print the summary of results for each host
print json.dumps(pr, sort_keys=True, indent=4, separators=(',', ': '))

Two small examples today; I hope they are helpful.

Bonus download: ansible for devops.pdf

TinyDB: A Small Document Database Written in Python


TinyDB is a NoSQL database written in pure Python; it is the NoSQL counterpart of SQLite. SQLite is a small, embedded relational database, while TinyDB is a small, embedded NoSQL database that needs no external server and has no dependencies at all; it stores its data in a JSON file.

TinyDB source code: https://github.com/msiemens/tinydb
TinyDB documentation: http://tinydb.readthedocs.io/en/latest/

If you need a simple document-oriented database with zero configuration, TinyDB may be exactly what you're after.

Install TinyDB:

pip install tinydb

Test code:

from tinydb import TinyDB, Query, where

db = TinyDB('db.json')

# Insert two records
db.insert({'name': 'John', 'age': 22})
db.insert({'name': 'apple', 'age': 7})

# Print all records
print(db.all())
# [{u'age': 22, u'name': u'John'}, {u'age': 7, u'name': u'apple'}]

# Query
User = Query()
print(db.search(User.name == 'apple'))
# [{u'age': 7, u'name': u'apple'}]

# Query
print(db.search(where('name') == 'apple'))

# Update a record
db.update({'age': 10}, where('name') == 'apple')
# [{u'age': 10, u'name': u'apple'}]

# Remove records with age greater than 20
db.remove(where('age') > 20)

# Purge the database
db.purge()

If TinyDB doesn't meet your needs, try the more powerful CodernityDB, an open-source, pure-Python, dependency-free, multi-platform NoSQL database.




Upgrading Django


For the second time in a few years I've found myself doing a number of Django upgrades. It's a good thing: I'm happy the framework I chose to base most of my work on when I went solo has stayed relevant. But this time I wanted to document some of the pain points to make things easier on anyone else going through the same.

First off, this round of updates has been a lot easier than the first: back then I was the point of contact for WebFaction clients who had a Django app but no longer had a developer and those orphaned projects trended ancient . I started working with Django in 2009 with Django 1.1 and that was modern compared to most of the WebFaction projects. The biggest challenge in that group was a project running on 0.96 and using psycopg . I'd been installing psycopg2 for so long at that point it never occurred to me there was a version before it. And the Internet felt the same way: obtaining a version of Django that old was a challenge (see this thread for how to get versions no longer listed in Pypi). Obtaining a copy of psycopg proved impossible (I cheated and wound up downloading the folder from the client's site-packages directory and using that to replicate the issue they were seeing-- definitely not a recommended approach).

General Guidelines

- Use a virtualenv or similar concept. If you're already doing so, great. If you are not, now is definitely the time to start. If you're doing anything more than a minor upgrade or working on a project with no 3rd party libraries, you are going to run into some "dependency hell" where you've updated to Django 1.NEW and updated all your code to be compatible, and then find out some libraries you use aren't compatible. The best case scenario is where you just need to update the libraries to their latest versions too, but you may have to play around with the versions to make it all work. Otherwise you need to figure out how to replace the library or fork it. Implicit in my virtualenv suggestion is that you also use pip to install packages and pip freeze to create a requirements file with the exact versions of the libraries your project uses.
- If you are updating from a truly old version of Django, try not to get too hung up on updating to the latest and greatest. You might have to shoot for something a little bit older because the incompatibilities are just too great or there isn't enough time right now to make it all work. In updating my old site from 1.2 to 1.10, I actually stopped at 1.6 for a while instead. This is another place virtualenv is your friend: at one point I had three local environments for my site, tkc (live), tkc16 and tkc110. Once I finished the upgrade process, I deleted the first two and renamed tkc110 to tkc and it was like nothing ever happened.
- Be aware of the long-term release versions and the deprecation schedule when targeting a new version.
- If you are jumping from a really old version, do some reading on the release notes to see what new features are available to you. I'm focusing on problems you may run into and backwards-incompatible changes, but one of the major reasons to upgrade is all the improvements you get. Be aware of things like select_related, prefetch_related and only for making queries faster.
- Change from render_to_response to render. It will make things easier, and the former is soon to be removed.
- If you get stuck because the project layout or syntax has changed so much, try creating a new virtualenv and project with 1.10 (or similar) and then dragging your apps into the project. You will still need to update a bunch of stuff but it may make things a lot easier and help with your future-proofing.

Version-Specific Notes

1.10

- Major changes to middleware.
- If you are using MySQL (don't start now), Django recommends you turn on strict mode. See the project note about why and this answer for how.
- django.conf.urls.patterns is gone, and all view references in urls.py need to be imported views, not string names.

1.9

- The admin got a facelift!
- Password validation options.
- django.contrib.sites is no longer included by default, which can cause a RuntimeError: "Model class django.contrib.sites.models.Site doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS". Just add it back to INSTALLED_APPS.
- django.utils.log.NullHandler was removed; replace the references with logging.NullHandler.

1.8

- Major settings change: the new TEMPLATES setting replaces all TEMPLATE_* settings. This is a bit of a PITA as you have to move a bunch of things into their new home in that setting.
- select_related actually checks that the fields exist on the model you are querying. Previously this silently ignored the error and you were left thinking you'd improved performance when you weren't doing anything.
- django.contrib.formtools is replaced by an external app.
- The syntax of urls.py files changed: you now have to have url(regex, view) instead of just (regex, view), otherwise you will get AttributeError: 'tuple' object has no attribute 'regex'. Make note of the changes in 1.10 to urls.py files if you're going to be updating all of them anyway.
- request.REQUEST was removed, but you weren't using that anyway, were you?
- Drops support for Postgres < 9.0 (and then 9.0 and 9.1 are dropped in .9 and .10) and MySQL < 5.5.

1.7

- Major update which makes database migrations a built-in part of Django instead of relying on 3rd party apps (usually South). See the differences here.
- App loading changes may mean needing to reorder or move some stuff.
- If you are using South, there's a guide to changing over. One thing that's not obvious: in addition to removing 'south' from your INSTALLED_APPS, you actually need to uninstall it from your virtualenv or you will get errors like "There is no South database module 'south.db.postgresql_psycopg2' for your database."
- Default middleware changed.
- The syntax/imports of the .wsgi file a project uses changed. You may run into "AppRegistryNotReady: Apps aren't loaded yet." See this StackOverflow thread.
- RuntimeError: populate() isn't reentrant: there are a number of possible causes for this (and I'm not sure they are all 1.7-specific). In my case it was a dumb error where I had an app listed twice in INSTALLED_APPS.

1.6

- Not much new: simplified setup, swapped a couple of defaults in settings. At least that's how I remember it, and it's why I chose this as a halfway step in my own update. Actually, there's one change that will bite you if you have any custom managers: get_query_set is now get_queryset.
- django.contrib.localflavor is gone, replaced by a 3rd-party app. Update your requirements file and your references.
- django.contrib.markup is gone. You will need a replacement.

1.5

- Introduces a configurable User model. A user in the reddit discussion of this post said it caused them problems due to a subclassed User model in their project.
- The syntax of the {% url %} tag changed, and now the url name has to be quoted. If you use an editor that supports multi-file search-and-replace, you can update all your templates easily. Pro-tip: do this in its own branch or similar for safety's sake.
- Introduces the ALLOWED_HOSTS setting. Make sure to do this because it will bite you: the setting only applies if DEBUG = False, so you won't run into errors locally, but then your brand-new, updated site will not respond when you post it live.
- Deprecates django.utils.simplejson. You can do a search and replace to import json instead. Unrelated to any upgrade stuff, if you do a lot of JSON processing or a little bit of JSON processing on very big pieces of JSON, take a look at ujson. I've gotten a lot of free performance improvements from it.
- direct_to_template is gone. Use this solution in its place.

1.4

- Lots of good new stuff introduced in this version, the biggest being timezone support.
- The concept of ADMIN_MEDIA is replaced by the static files app from 1.3.
- django.conf.urls.defaults is replaced by django.conf.urls. Update your urls.py files accordingly.
- Error: "No module named six": this is an annoying one I ran into with a couple 3rd party libraries. I was testing which versions I could easily update to by updating to 1.2.0, 1.3.0, 1.4.0, etc. The problem is six was introduced in Django 1.4.2, so don't update to anything less than that if you're going to use 1.4.

1.3

- The big thing here is the introduction of staticfiles. If you've never dealt with that, it's a sea change and you will need to read the documentation about how to change things. Essentially, this was splitting up the idea of media (user-generated uploads) from your site's static assets.
- Class-based views are also added here. You're not obligated to use them, but if you rely on a lot of generic views, you're going to need to update those.
- CSRF protection now applies to Ajax views as well.
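To make one of those mechanical changes concrete, here is a hypothetical before/after for a urls.py entry, combining the 1.8 url() requirement with the 1.10 removal of string view names (the view name is made up for illustration):

# Before (old tuple syntax with a string view reference):
from django.conf.urls import patterns
urlpatterns = patterns('',
    (r'^about/$', 'myapp.views.about'),
)

# After (1.8+ url() syntax with an imported view, also 1.10-ready):
from django.conf.urls import url
from myapp import views
urlpatterns = [
    url(r'^about/$', views.about),
]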

Cropping PDF Pages with OpenCV


Some lecture notes and course handouts found online come as PDFs produced by printing several PPT slides per page, like the one below:


[Figure: an example PDF page with several slides printed on one page]

Some people want to crop such PDF pages to restore the one-slide-per-page layout of the original deck. (Personally, I don't think it's really necessary: readability doesn't suffer, and this layout even means less page flipping.)

A few days ago I came across a question on Zhihu asking for exactly this.

I figured an image-processing task this small should be easy with Python and OpenCV, so I hacked together some code: simple edge detection plus contour extraction. The result looks pretty good:


[Figure: the individual slides cropped out of the page]

Below is the Python code, with brief explanations.

1. Edge Detection

Create a new file pdfcrop.py and add the following code:

import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", help="path to the image file")
args = vars(ap.parse_args())

# read the image and run edge detection
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(gray, 75, 200)

# show the original image and the edge map
cv2.imshow("Original Image", image)
cv2.imshow("Edged Image", edged)
cv2.waitKey()
cv2.destroyAllWindows()

Assuming your image file is test.png, run:

python pdfcrop.py -i test.png

Edge detection result:


[Figure: the edge map produced by the Canny detector]
2. Contour Extraction

The idea is simple: the contours we want are rectangular (four vertices), and they should be among the N largest contours by area (N = 6 in the image above). This is also straightforward with Python and OpenCV; continue editing pdfcrop.py:

# contour extraction (this two-value unpacking matches OpenCV 2.x;
# OpenCV 3.x returns three values from findContours)
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)

# keep the 4-vertex polygons among the 6 largest contours
poly_contours = []
for c in cnts[0:6]:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)
    if len(approx) == 4:
        poly_contours.append(approx)

contours_image = image.copy()
for contours in poly_contours:
    cv2.drawContours(contours_image, [contours], -1, (0, 255, 0), 2)

# show the contour image
cv2.imshow("Contours", contours_image)
cv2.waitKey()
cv2.destroyAllWindows()

Contour extraction result:


[Figure: the detected rectangular contours drawn in green]
3. Extracting the Image Inside Each Contour

Finally, we just need to save the image inside each rectangular contour detected in the previous step. Add to pdfcrop.py:

# show the image inside each rectangular contour separately
index = 0
for contours in poly_contours:
    rect = cv2.boundingRect(contours)
    cv2.imshow("test" + str(index), image[rect[1]:rect[1] + rect[3], rect[0]:rect[0] + rect[2]])
    index += 1
cv2.waitKey()
cv2.destroyAllWindows()

The code above only handles image files, not PDFs directly. To use it on a PDF, consider batch-converting the PDF into images with another tool first (or look for a Python library that handles PDFs; a sketch follows below).
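For the PDF-to-image step, one option is the third-party pdf2image package; this is a sketch assuming you have it installed along with the poppler utilities it wraps, and that the input file is called slides.pdf:

# Convert each PDF page to a PNG (sketch; pdf2image + poppler assumed installed)
from pdf2image import convert_from_path

pages = convert_from_path("slides.pdf", dpi=200)  # "slides.pdf" is a hypothetical input
for i, page in enumerate(pages):
    page.save("page-%03d.png" % i, "PNG")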

One remaining issue is the order of the cropped slides, which I haven't handled here. It is also easy: just sort the extracted rectangular contours by their position coordinates, as sketched below.
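A sketch of that ordering step, reusing poly_contours from the code above; grouping rows by the tallest rectangle's height is my own heuristic, not part of the original post:

# Order the detected rectangles top-to-bottom, then left-to-right
rects = [cv2.boundingRect(c) for c in poly_contours]  # each rect is (x, y, w, h)
row_height = max(h for (x, y, w, h) in rects)         # heuristic row size
rects.sort(key=lambda r: (r[1] // row_height, r[0]))  # sort by row band, then by x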

How To Implement Baseline Machine Learning Algorithms From Scratch With Python


It is important to establish baseline performance on a predictive modeling problem.

A baseline provides a point of comparison for the more advanced methods that you evaluate later.

In this tutorial, you will discover how to implement baseline machine learning algorithms from scratch in Python.

After completing this tutorial, you will know:

How to implement the random prediction algorithm. How to implement the zero rule prediction algorithm.

Let’s get started.


How To Implement Baseline Machine Learning Algorithms From Scratch With Python
Photo by Vanesser III, some rights reserved.

Description

There are many machine learning algorithms to choose from. Hundreds in fact.

You must know whether the predictions for a given algorithm are good or not. But how do you know?

The answer is to use a baseline prediction algorithm. A baseline prediction algorithm provides a set of predictions that you can evaluate as you would any predictions for your problem, such as classification accuracy or RMSE.

The scores from these algorithms provide the required point of comparison when evaluating all other machine learning algorithms on your problem.

Once established, you can comment on how much better a given algorithm is as compared to the naive baseline algorithm, providing context on just how good a given method actually is.

The two most commonly used baseline algorithms are:

Random Prediction Algorithm. Zero Rule Algorithm.

When starting on a new problem that is stickier than a conventional classification or regression problem, it is a good idea to first devise a random prediction algorithm that is specific to your prediction problem. Later you can improve upon this and devise a zero rule algorithm.

Let’s implement these algorithms and see how they work.

Tutorial

This tutorial is divided into 2 parts:

Random Prediction Algorithm. Zero Rule Algorithm.

These steps will provide the foundations you need to handle implementing and calculating baseline performance for your machine learning algorithms.

1. Random Prediction Algorithm

The random prediction algorithm predicts a random outcome as observed in the training data.

It is perhaps the simplest algorithm to implement.

It requires that you store all of the distinct outcome values in the training data, which could be large on regression problems with lots of distinct values.

Because random numbers are used to make decisions, it is a good idea to fix the random number seed prior to using the algorithm. This is to ensure that we get the same set of random numbers, and in turn the same decisions each time the algorithm is run.
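As a quick illustration (my own aside, not part of the tutorial), reseeding the generator reproduces exactly the same draws:

from random import seed, randrange

seed(1)
first = [randrange(2) for _ in range(5)]
seed(1)
second = [randrange(2) for _ in range(5)]
print(first == second)  # True: same seed, same sequence of decisions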

Below is an implementation of the Random Prediction Algorithm in a function named random_algorithm() .

The function takes both a training dataset that includes output values and a test dataset for which output values must be predicted.

The function will work for both classification and regression problems. It assumes that the output value in the training data is the final column for each row.

First, the set of unique output values is collected from the training data. Then, for each row in the test set, an output value is randomly selected from that set.

from random import randrange

# Generate random predictions
def random_algorithm(train, test):
    output_values = [row[-1] for row in train]
    unique = list(set(output_values))
    predicted = list()
    for row in test:
        index = randrange(len(unique))
        predicted.append(unique[index])
    return predicted

We can test this function with a small dataset that only contains the output column for simplicity.

The output values in the training dataset are either "0" or "1", meaning that the set of predictions the algorithm will choose from is {0, 1}. The test set also contains a single column, with no data, as the predictions are not known.

from random import seed
from random import randrange

# Generate random predictions
def random_algorithm(train, test):
    output_values = [row[-1] for row in train]
    unique = list(set(output_values))
    predicted = list()
    for row in test:
        index = randrange(len(unique))
        predicted.append(unique[index])
    return predicted

seed(1)
train = [[0], [1], [0], [1], [0], [1]]
test = [[None], [None], [None], [None]]
predictions = random_algorithm(train, test)
print(predictions)

Running the example calculates random predictions for the test dataset and prints those predictions.

[0, 1, 1, 0]

The random prediction algorithm is easy to implement and fast to run, but we could do better as a baseline.

2. Zero Rule Algorithm

The Zero Rule Algorithm is a better baseline than the random algorithm.

It uses more information about a given problem to create one rule in order to make predictions. This rule is different depending on the problem type.

Let’s start with classification problems, predicting a class label.

Classification

For classification problems, the one rule is to predict the class value that is most common in the training dataset. This means that if a training dataset has 90 instances of class "0" and 10 instances of class "1", it will predict "0" and achieve a baseline accuracy of 90/100 or 90%.

This is much better than the random prediction algorithm, which would only achieve 82% accuracy on average. For details on how this estimate for random prediction is calculated, see below:

= ((0.9 * 0.9) + (0.1 * 0.1)) * 100 = 82%
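Note that this 82% figure assumes the random predictor picks each class in proportion to its frequency in the training data, rather than uniformly from the set of unique values; a quick arithmetic check:

# accuracy = P(predict 0) * P(actual 0) + P(predict 1) * P(actual 1)
p0, p1 = 0.9, 0.1
expected_accuracy = (p0 * p0) + (p1 * p1)
print(round(expected_accuracy * 100, 1))  # 82.0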

Below is a function named zero_rule_algorithm_classification() that implements this for the classification case.

# zero rule algorithm for classification
def zero_rule_algorithm_classification(train, test):
    output_values = [row[-1] for row in train]
    prediction = max(set(output_values), key=output_values.count)
    predicted = [prediction for i in range(len(test))]
    return predicted

The function makes use of the max() function with the key argument, which is a little clever.

Given the list of class values observed in the training data, the max() function iterates over the set of unique class values and, for each one, counts its occurrences in the full list via output_values.count.

The result is that it returns the class value that has the highest count of observed values in the list of class values observed in the training dataset.

If all class values have the same count, then we will choose the first class value observed in the dataset.

Once we select a class value, it is used to make a prediction for each row in the test dataset.
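The trick is easy to check in isolation:

values = ['0', '0', '0', '1']
print(max(set(values), key=values.count))  # prints 0, the most common value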

Below is a worked example with a contrived dataset that contains 4 examples of class “0” and 2 examples of class “1”. We would expect the algorithm to choose the class value “0” as the prediction for each row in the test dataset.

from random import seed
from random import randrange

# zero rule algorithm for classification
def zero_rule_algorithm_classification(train, test):
    output_values = [row[-1] for row in train]
    prediction = max(set(output_values), key=output_values.count)
    predicted = [prediction for i in range(len(test))]
    return predicted

seed(1)
train = [['0'], ['0'], ['0'], ['0'], ['1'], ['1']]
test = [[None], [None], [None], [None]]
predictions = zero_rule_algorithm_classification(train, test)
print(predictions)

Running this example makes the predictions and prints them to screen. As expected, the class value of "0" was chosen and used as the prediction for every row in the test dataset.

Lazy tree walking made easy with coroutines


Let's imagine a simple unbalanced binary tree structure, in which an abstract BinaryTree<E> is either a concrete Node with a value attribute of type E and a left and right subtree, or a concrete Empty tree. With trees, one very common requirement is to traverse the nodes in some appropriate order (preorder, inorder, or postorder).

To make things more economical (and more interesting) let's state some restrictions on how we want the traversal to work:

1. We want to do it lazily. That is, we want to be able to short-circuit the traversal, visiting only as many nodes in the tree as we require to see.
2. We'd like to do the traversal either with a Stream or with an Iterator, in the latter case calling next whenever we want the next node to be supplied.
3. We'd prefer a single-threaded solution.

It's actually not trivial to do that in Java. With regard to iteration, you have to keep track of a lot of state. (Cf. these explanations.) You might scrap requirement 3 and have two threads communicating over a blocking queue, but that also entails some complexity, especially with task cancellation on short-circuiting.

And even when restricting ourselves to stream-based processing, it doesn't quite work out as expected. Here's a proposed solution in a hypothetical BinaryTreeStreamer class:

/**
 * Supplies a postorder stream of the nodes in the given tree.
 */
public static <E> Stream<Node<E>> postorderNodes(BinaryTree<E> t) {
    return t.match(
        empty -> Stream.<Node<E>>empty(),
        node -> concat(
            Stream.of(node.left, node.right).flatMap(BinaryTreeStreamer::postorderNodes),
            Stream.of(node)));
}

The corresponding inorder- or preorder-traversals would be similar. The technique for structural pattern-matching with method BinaryTree#match() goes back to Alonzo Church and is explained in more detail on Rúnar Bjarnason's blog . Basically, each subclass of BinaryTree applies the appropriate function to itself, i. e. Empty invokes the first argument of match , and Node the second.

The code above looks quite reasonable, but unfortunately it is broken by the same JDK feature/bug that I mentioned over a year ago in this post . Embedded flatMap just isn't lazy enough, and breaks short-circuiting. Suppose we construct ourselves a tree representing the expression (3 - 1) * (2 + 4 * 5). I'll use this as an example throughout this article. Then we start streaming, with the aim of finding out whether the expression contains a node for addition:

boolean adds = BinaryTreeStreamer.postorderNodes(tree)
        .filter(node -> node.value.equals("+"))
        .findAny()
        .isPresent();

which leads the code to traverse the entire tree down to nodes 4 and 5. And anyway, we are no closer to an iterating solution. This state of affairs got me thinking about what behavior I actually expected from this little test:

1. I wanted a method that lazily generated tree nodes.
2. I wanted that method to suspend (or pause, wait, block, whatever) when it had produced a node.
3. I wanted to be able to consume that node,
4. whereupon the suspended method would resume processing at exactly the point where it had left off.

This sounded a lot like requiring a continuation-passing style of programming. So I started experimenting with state monads and CompletableFuture etc. without getting anywhere. Until I finally realized that I actually already knew how to solve the problem in python. I don't remember what source I learned this from, but in any case I suppose that the following is folklore to Python developers, homework stuff.

The thing is, Python has coroutines, called generators in Python. Here's how Wikipedia defines coroutines:

Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking , by allowing multiple entry points for suspending and resuming execution at certain locations.

In Python you can say "yield" anywhere in a coroutine, and the calling coroutine starts up again with the value that was yielded. Coroutines are like functions that return multiple times and keep their state, so they can resume from where they yielded, which means they have multiple entry points as well. The state would include the values of local variables plus the command pointer. (Note: generally, a coroutine would be able to hand off a value to any other coroutine to continue, not just its caller. However, in a Python generator the control is always transferred to the caller.) So here's a Python solution to our problem, with a defaultdict as the tree implementation, using value, left, and right as the dictionary keys:

from collections import defaultdict

tree = lambda: defaultdict(tree)

def postorder(tree):
    if not tree:
        return
    for x in postorder(tree['left']):
        yield x
    for x in postorder(tree['right']):
        yield x
    yield tree

One thing to note is that we must yield each value from the sub-generators. Although the recursive calls would dutifully yield all required nodes, they would yield them in embedded generators, which we must append one level up. That corresponds to the successive flat-mapping in our Java code. Here's how we can enumerate the first few nodes of our example tree in postorder:

expr = tree()
expr['value'] = '*'
expr['left']['value'] = '-'
expr['left']['left']['value'] = '3'
expr['left']['right']['value'] = '1'
...

node = postorder(expr)
print(next(node)['value'])
print(next(node)['value'])
print(next(node)['value'])
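As an aside that goes beyond the original recipe: on Python 3.3+ the explicit delegation loops can be replaced with yield from, which forwards every value produced by a sub-generator:

# Same postorder generator, using "yield from" (Python 3.3+)
def postorder(tree):
    if not tree:
        return
    yield from postorder(tree['left'])
    yield from postorder(tree['right'])
    yield tree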

Many other languages besides Python have coroutines, or something similar, if not in the language then at least as a library. So I started looking for JVM languages that have them. There aren't many, and Java in particular is not among them. But I found a library for Scala. However, Scala is not a language that Java developers readily embrace. So I was all the happier to learn that coroutines will be a feature of Kotlin 1.1, which is now in the early access phase.

I had already kn

A Brief Look at Python's __new__ Function


Python is an object-oriented programming language; in Python, everything is treated as an object.

Python has a family of special methods, the magic methods (methods whose names start and end with __), such as __new__ and __init__.

To understand __new__, we first need to understand __init__.

The __init__ method is called when a class instance is being initialized, but it is not actually the first method invoked when instantiating a class; the first one is __new__.

__new__ is responsible for creating the class instance, while __init__ is responsible for initializing it. __new__ can be used to customize object creation: its first parameter is a reference to the class, followed by the constructor arguments, and its return value is usually a reference to the new instance.

A typical class declaration and instantiation:

class Foo(object):
    def __init__(self, a, b):
        print("__init__")
        self.a = a
        self.b = b

    def bar(self):
        pass

i = Foo(2, 3)

Using __new__:

class Foo(object):
    def __new__(cls, *args, **kwargs):
        print("__new__")
        # object.__new__ takes no extra arguments, so don't forward *args here
        instance = super(Foo, cls).__new__(cls)  # or: instance = object.__new__(cls)
        # If __new__ returns something that is not an instance of this class,
        # __init__ will not be called automatically; you must call it yourself.
        return instance

    def __init__(self, a, b):
        print("__init__")
        self.a = a
        self.b = b

    def bar(self):
        pass

i = Foo(2, 3)
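Running this version prints both hooks in order, confirming that __new__ fires before __init__:

__new__
__init__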

Applications of __new__

In general you do not need to override __new__, but it can come in handy when writing APIs.

# Singleton pattern using __new__
class Singleton(object):
    _instance = None  # holds a reference to the single instance

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = object.__new__(cls)
        return cls._instance


# Limiting the total number of instances a class may create
class LimitedInstances(object):
    _instances = []  # keeps the created instances
    limit = 5        # this class may create at most 5 objects

    def __new__(cls, *args, **kwargs):
        if len(cls._instances) >= cls.limit:
            raise RuntimeError("Can not create instance. Limit %s reached" % cls.limit)
        instance = object.__new__(cls)
        cls._instances.append(instance)
        return instance

    def __del__(self):
        # remove this instance from _instances
        self._instances.remove(self)


# Customizing instance creation
class CustomizeInstance(object):
    def __new__(cls, a, b):
        # createInstance() is an application-specific check from the original post
        if not createInstance():
            raise RuntimeError("Count not create instance")
        instance = super(CustomizeInstance, cls).__new__(cls)
        instance.a = a
        return instance

    def __init__(self, a, b):
        pass


# When __new__ returns something other than an instance, __init__ is skipped
class AbstractClass(object):
    def __new__(cls, a, b):
        instance = super(AbstractClass, cls).__new__(cls)
        instance.__init__(a, b)
        return 3

    def __init__(self, a, b):
        print("Initializing Instance", a, b)

# >>> a = AbstractClass(2, 3)
# Initializing Instance 2 3
# >>> a
# 3

To sum up: don't reach for this method unless you absolutely have to. As the Zen of Python puts it, "Simple is better than complex."

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
>>>


Reading: Euclid's Elements was written around 300 BC. Its thirteen books are Euclid's great synthesis of ancient Greek mathematics, collecting the results of Greek mathematician-scientists such as Thales, Pythagoras, and Hippocrates. It is at once a mathematical work and a philosophical one, marking humanity's first complete account of space. The books are rigorously organized into definitions, postulates, common notions, propositions (theorems), and proofs, together with symbols and figures. The Elements has been translated into nearly every written language, and its influence on people's capacity for rational deduction, and thus on scientific thought, has been profound and immense.
