Channel: CodeSection, Python development articles and tutorials - CodeSec

Scrapy: AttributeError: 'list' object has no attribute 'iteritems'


While learning Scrapy I followed a Jikexueyuan course and wrote a spider that crawls a Discuz template site. It ran perfectly on my local machine, but after copying it to the server it died with the error in the title.

Cause

The error comes from incompatibilities between Scrapy versions. My local machine runs Ubuntu 16.04 with a fairly old Scrapy install (1.0.x). In that version, following the course, ITEM_PIPELINES is configured in settings.py as a list:

ITEM_PIPELINES = ['yourspider.pipelines.yourspiderPipeline']

On the server I had installed the latest Scrapy (1.1.2) with pip. In that version, ITEM_PIPELINES must be a dict instead:

ITEM_PIPELINES = {
'yourspider.pipelines.yourspiderPipeline': 300,
}

The number is the pipeline's priority; any value between 0 and 1000 works (lower values run first).
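For reference, a minimal sketch of what such a pipeline class might look like; the class name below mirrors the 'yourspider.pipelines.yourspiderPipeline' placeholder above and is not taken from any real project:

```python
# Hypothetical minimal pipeline matching the ITEM_PIPELINES entry above.
class YourspiderPipeline(object):
    def process_item(self, item, spider):
        # A pipeline must return the item (or raise DropItem) so that
        # pipelines with higher priority numbers can keep processing it.
        return item
```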


Python built-in functions (9): callable


English documentation:

callable(object)

Return True if the object argument appears callable, False if not. If this returns true, it is still possible that a call fails, but if it is false, calling object will never succeed. Note that classes are callable (calling a class returns a new instance); instances are callable if their class has a __call__() method.

Notes:

1. The function checks whether an object can be called, i.e. whether it can be invoked with parentheses ().

>>> callable(callable)
True
>>> callable(1)
False
>>> 1()
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    1()
TypeError: 'int' object is not callable

2. A callable object may still fail when actually called, but calling a non-callable object will never succeed.

3. Class objects are always callable; whether an instance is callable depends on whether its class defines a __call__ method.

>>> class A:                 # define class A
        pass

>>> callable(A)              # class A is callable
True
>>> a = A()                  # instantiate A
>>> callable(a)              # instance a is not callable
False
>>> a()                      # calling instance a fails
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    a()
TypeError: 'A' object is not callable
>>> class B:                 # define class B
        def __call__(self):
            print('instances are callable now.')

>>> callable(B)              # class B is callable
True
>>> b = B()                  # instantiate B
>>> callable(b)              # instance b is callable
True
>>> b()                      # calling instance b succeeds
instances are callable now.

A ten-minute quick introduction to Python



Suppose you want to learn Python but can't find a short, comprehensive introduction. This tutorial will take ten minutes to walk you through the language. It sits somewhere between a tutorial and a cheat sheet, so it only covers the basic concepts; obviously, if you want to really learn a language you still need to practice it yourself. I'll assume you already have some programming experience, so I'll skip most material that isn't specific to Python. The important keywords are highlighted so you can spot them easily. Also, because space is limited, quite a few things are shown directly as code with brief comments.

Python's language properties

Python is strongly typed (types are enforced), dynamically and implicitly typed (no variable declarations needed), case sensitive (var and VAR are different variables) and object-oriented (everything is an object).

Getting help

Help is easily available right from the Python interpreter. If you want to know how an object works, all you need to do is call help(<object>)! Also useful are dir(), which shows you all of an object's methods, and <object>.__doc__, which shows you its documentation string:

>>> help(5)
Help on int object:
(etc etc)
>>> dir(5)
['__abs__', '__add__', ...]
>>> abs.__doc__
'abs(number) -> number

Return the absolute value of the argument.'

Syntax

Python has no mandatory statement termination characters, and blocks are specified by indentation: indent to begin a block, dedent to end one. Statements that open a block end with a colon (:) and start a new indentation level. Single-line comments start with the pound character (#); multi-line comments appear as multi-line strings. Values are assigned (in fact, objects are bound to names) with the equals sign ("="), and equality is tested with two equals signs ("=="). You can increment/decrement values with += and -=, by the amount on the right-hand side. This works on many data types, including strings. You can also use multiple variables on one line. For example:

>>> myvar = 3
>>> myvar += 2
>>> myvar
5
>>> myvar -= 1
>>> myvar
4
"""This is a multiline comment.
The following lines concatenate the two strings."""
>>> mystring = "Hello"
>>> mystring += " world."
>>> print mystring
Hello world.
# This swaps the variables in one line(!).
# It doesn't violate strong typing because values aren't
# actually being assigned, but new objects are bound to
# the old names.
>>> myvar, mystring = mystring, myvar

Data types

Python has three basic data structures: lists, tuples and dictionaries; sets are available in the sets library (and are a built-in type since Python 2.5). Lists are like one-dimensional arrays (you can of course also have "lists of lists", like multi-dimensional arrays), dictionaries are associative arrays (usually called hash tables), and tuples are immutable one-dimensional arrays (Python "arrays" can hold elements of any type, so you can mix integers, strings, or nest lists, dictionaries and tuples). The index of the first element is 0; negative indices count from the end, so -1 is the last element. Array elements can also point to functions. Some usage:

>>> sample = [1, ["another", "list"], ("a", "tuple")]
>>> mylist = ["List item 1", 2, 3.14]
>>> mylist[0] = "List item 1 again"  # We're changing the item.
>>> mylist[-1] = 3.21                # Here, we refer to the last item.
>>> mydict = {"Key 1": "Value 1", 2: 3, "pi": 3.14}
>>> mydict["pi"] = 3.15              # This is how you change dictionary values.
>>> mytuple = (1, 2, 3)
>>> myfunction = len
>>> print myfunction(mylist)
3

You can access ranges of an array with the colon (:) operator: leaving the left side empty starts from the first element, and likewise leaving the right side empty goes to the last element. Negative indices count from the end (-1 is the last item). For example:

>>> mylist = ["List item 1", 2, 3.14]
>>> print mylist[:]
['List item 1', 2, 3.1400000000000001]
>>> print mylist[0:2]
['List item 1', 2]
>>> print mylist[-3:-1]
['List item 1', 2]
>>> print mylist[1:]
[2, 3.14]
# Adding a third parameter, "step" will have Python step in
# N item increments, rather than 1.
# E.g., this will return the first item, then go to the third and
# return that (so, items 0 and 2 in 0-indexing).
>>> print mylist[::2]
['List item 1', 3.14]

Strings

Python strings can use either single quotes (') or double quotes ("), and you can use quotation marks of one kind inside a string that uses the other kind (e.g. "He said 'hello'."). Multi-line strings are enclosed in triple single quotes (''') or triple double quotes ("""). Python supports Unicode strings with the syntax u"This is a unicode string". To fill a string with values from variables, use the modulo operator (%) and a tuple: each %s in the target string is replaced, left to right, by an item from the tuple, or you can use a dictionary instead, as in:

>>> print "Name: %s Number: %s String: %s" % (myclass.name, 3, 3 * "-")
Name: Poromenos Number: 3 String: ---

strString = """This is
a multiline
string."""

# WARNING: Watch out for the trailing s in "%(key)s".
>>> print "This %(verb)s a %(noun)s." % {"noun": "test", "verb": "is"}
This is a test.

Flow control

Flow control in Python is done with if, for and while. There is no select; use if instead. Use for to enumerate the members of a list. To get a list of numbers, use range(<number>). These statements' syntax:

rangelist = range(10)
print rangelist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for number in rangelist:
    # Check if number is one of
    # the numbers in the tuple.
    if number in (3, 4, 7, 9):
        # "Break" terminates a for without
        # executing the "else" clause.
        break
    else:
        # "Continue" starts the next iteration
        # of the loop. It's rather useless here,
        # as it's the last statement of the loop.
        continue
else:
    # The "else" clause is optional and is
    # executed only if the loop didn't "break".
    pass  # Do nothing

if rangelist[1] == 2:
    print "The second item (lists are 0-based) is 2"
elif rangelist[1] == 3:
    print "The second item (lists are 0-based) is 3"
else:
    print "Dunno"

while rangelist[1] == 1:
    pass

Functions

Functions are declared with the "def" keyword. Optional arguments follow the required arguments in the function declaration and are given default values there. Named arguments are passed by assigning a value to the name. Functions can return a tuple (and using tuple unpacking you can effectively return multiple values). Lambda functions are special functions consisting of a single statement. Parameters are passed by reference, but immutable types (tuples, ints, strings, etc.) cannot be changed in the caller. This is because only the memory location of the object is passed, and binding another object to a name discards the old one, so immutable types are replaced rather than changed. (Translator's note: although Python's argument passing is by reference in nature, for immutables it produces the effect of passing by value.) For example:

# Same as def funcvar(x): return x + 1
funcvar = lambda x: x + 1
print funcvar(1)
2

# an_int and a_string are optional parameters with default values.
# If passing_example is called with only one argument, an_int defaults
# to 2 and a_string to "A default string". If it is called with two
# arguments, a_string still defaults to "A default string".
# a_list is required because it has no default value.
def passing_example(a_list, an_int=2, a_string="A default string"):
    a_list.append("A new item")
    an_int = 4
    return a_list, an_int, a_string

my_list = [1, 2, 3]
my_int = 10
print passing_example(my_list, my_int)
([1, 2, 3, 'A new item'], 4, "A default string")
my_list
[1, 2, 3, 'A new item']
my_int
10

Classes

Python supports a limited form of multiple inheritance. Private variables and methods are declared by adding at least two leading underscores and at most one trailing underscore (e.g. "__spam"; this is only a convention, not enforced by Python). You can also bind arbitrary names to class instances. For example:

class MyClass(object):
    common = 10
    def __init__(self):
        self.myvariable = 3
    def myfunction(self, arg1, arg2):
        return self.myvariable

# This is the class instantiation
>>> classinstance = MyClass()
>>> classinstance.myfunction(1, 2)
3
# This variable is shared by all instances.
>>> classinstance2 = MyClass()
>>> classinstance.common
10
>>> classinstance2.common
10
# Note how we use the class name
# instead of the instance.
>>> MyClass.common = 30
>>> classinstance.common
30
>>> classinstance2.common
30
# This will not update the variable on the class,
# instead it will bind a new object to the old
# variable name.
>>> classinstance.common = 10
>>> classinstance.common
10
>>> classinstance2.common
30
>>> MyClass.common = 50
# This has not changed, because "common" is
# now an instance variable.
>>> classinstance.common
10
>>> classinstance2.common
50

# This class inherits from MyClass. The example
# class above inherits from "object", which makes
# it what's called a "new-style class".
# Multiple inheritance is declared as:
# class OtherClass(MyClass1, MyClass2, MyClassN)
class OtherClass(MyClass):
    # The "self" argument is passed automatically
    # and refers to the class instance, so you can set
    # instance variables as above, but from inside the class.
    def __init__(self, arg1):
        self.myvariable = 3
        print arg1

>>> classinstance = OtherClass("hello")
hello
>>> classinstance.myfunction(1, 2)
3
# This class doesn't have a .test member, but
# we can add one to the instance anyway. Note
# that this will only be a member of classinstance.
>>> classinstance.test = 10
>>> classinstance.test
10

Exceptions

Exceptions in Python are handled with try-except [exceptionname] blocks. For example:

def some_function():
    try:
        # Division by zero raises an exception
        10 / 0
    except ZeroDivisionError:
        print "Oops, invalid."
    else:
        # Exception didn't occur, we're good.
        pass
    finally:
        # This is executed after the code block is run
        # and all exceptions have been handled, even
        # if a new exception is raised while handling.
        print "We're done with that."

>>> some_function()
Oops, invalid.
We're done with that.

Importing

External libraries are used with the import [libname] keyword. You can also use from [libname] import [funcname] to import individual functions. For example:

import random
from time import clock

randomint = random.randint(1, 100)
>>> print randomint
64

File I/O

Python has a wide array of built-in libraries for dealing with files. For example, here is how to serialize data with the pickle library (converting data structures to strings):

import pickle
mylist = ["This", "is", 4, 13327]
# Open the file C:\binary.dat for writing. The letter r before the
# filename string is used to prevent backslash escaping.
myfile = open(r"C:\binary.dat", "w")
pickle.dump(mylist, myfile)
myfile.close()

myfile = open(r"C:\text.txt", "w")
myfile.write("This is a sample string")
myfile.close()

myfile = open(r"C:\text.txt")
>>> print myfile.read()
'This is a sample string'
myfile.close()

# Open the file for reading.
myfile = open(r"C:\binary.dat")
loadedlist = pickle.load(myfile)
myfile.close()
>>> print loadedlist
['This', 'is', 4, 13327]

Miscellaneous

Comparisons can be chained: 1 < a < 3 checks whether variable a is between 1 and 3.

You can use del to delete variables or elements of arrays.
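A quick check of the two points above (the values are illustrative):

```python
a = 2
print(1 < a < 3)   # comparisons chain: both 1 < a and a < 3 must hold

lst = [1, 2, 3]
del lst[0]         # delete a single element
print(lst)
del lst            # delete the variable itself
```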

List comprehensions provide a powerful way to create and manipulate lists. They consist of an expression followed by a for clause, optionally followed by zero or more if or for clauses. Some examples:

>>> lst1 = [1, 2, 3]
>>> lst2 = [3, 4, 5]
>>> print [x * y for x in lst1 for y in lst2]
[3, 4, 5, 6, 8, 10, 9, 12, 15]
>>> print [x for x in lst1 if 4 > x > 1]
[2, 3]
# Check if an item has a specific property.
# "any" returns true if any item in the list is true.
>>> any([i % 3 for i in [3, 3, 4, 4, 3]])
True
# This is because 4 % 3 = 1, and 1 is true, so any
# returns True.
# Check how many items have this property.
>>> sum(1 for i in [3, 3, 4, 4, 3] if i == 4)
2
>>> del lst1[0]
>>> print lst1
[2, 3]
>>> del lst1

Global variables are declared outside functions and can be read without any special declaration; but if you want to write to them, you must declare them with the global keyword at the beginning of the function, otherwise Python will bind the name to a new local variable (be careful, this is an easy trap to fall into). For example:

number = 5

def myfunc():
    # This will print 5.
    print number

def anotherfunc():
    # This raises an exception because the variable has not
    # been bound before printing. Python knows that an
    # object will be bound to it later and creates a new, local
    # object instead of accessing the global one.
    print number
    number = 3

def yetanotherfunc():
    global number
    # This will correctly change the global.
    number = 3

Summary

This tutorial doesn't cover all of Python (not even a small fraction of it). Python has a great many libraries and features to learn, so to learn it well you'll need resources beyond this tutorial, such as Dive into Python. I hope this gives you a good start. If you think there's anything worth improving or adding, or some part of Python you'd like to see covered, please leave a comment.

python10min series: a multithreaded downloader


Today someone in a group chat asked about writing files from multiple threads in Python. This happens to be the entrance exercise for reboot's architect class. Thinking it over, there are quite a few pitfalls and testable points in it, enough to make a decent interview question. Here is my approach; the code with comments is on github, please give it a star.

This article assumes some Python basics; ideally you have some familiarity with the following:

File handling in Python: open, write
A rough understanding of HTTP protocol headers
The os and sys modules
Multithreading with the threading module
Sending requests with the requests module

Since the task is multithreaded downloading, the first thing to solve is downloading itself. To make testing easy, instead of something as big as the QQ installer we'll use PC's wise, mighty and very meaningful avatar as the example, which looks roughly like this (http://51reboot.com/src/blogimg/pc.jpg).


Download

Python's requests module wraps HTTP requests nicely; we'll use it to send an HTTP GET request and then write the response to a local file (for details on requests, HTTP and Python file handling, search around or stay tuned, I'll cover them later). With the idea clear, the code writes itself:

# Simple, brute-force download
import requests

res = requests.get('http://51reboot.com/src/blogimg/pc.jpg')
with open('pc.jpg', 'w') as f:
    f.write(res.content)

After running the code above, there is a new pc.jpg in the directory: the image you wanted.

That code does far too little. Remember, the requirement is a multithreaded download, and this brute-force version doesn't qualify. Think of multithreading like this: the warehouse holds many, many bags of Oreo cookies, the boss asks me to move all of them to the office, and they have to end up in their original order.

The code above is like me going to the warehouse alone and carrying all the Oreos back in a single trip; roughly like this (click the image to enlarge):



To satisfy the multithreading requirement, we first have to break the job into subtasks that can run in parallel and whose results can be combined into the final result.

Breaking down the task

To do this, we first need to know how big the data is, then fetch it in chunks. That requires a decent understanding of HTTP:

Request the data with the HEAD method: the response contains only the HTTP headers, with no body
Read the Content-Length header to learn the size of the resource, say 50 bytes
To use, say, four threads, each thread fetches roughly a quarter: 50/4 = 12, so the first threads take 12 bytes each and the last thread takes whatever is left
Each thread seeks (file.seek) to the matching position in the file and writes the content it fetched
To keep things easy to follow, we first get a single-threaded version working

The flow is roughly as follows (click the image to enlarge):
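The splitting step above can be sketched on its own; this standalone helper (my own name, not from the final script) assumes the total size and thread count are already known:

```python
def get_ranges(total, num):
    """Split `total` bytes into `num` contiguous ranges; the last range
    is open-ended ('') so the final thread picks up the remainder."""
    offset = total // num
    ranges = []
    for i in range(num):
        if i == num - 1:
            ranges.append((i * offset, ''))
        else:
            ranges.append((i * offset, (i + 1) * offset))
    return ranges

print(get_ranges(50, 4))  # [(0, 12), (12, 24), (24, 36), (36, '')]
```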

With the idea clear, the code follows; first, let's test the Range header:

The Range HTTP header, used in requests, specifies the positions of the first and last byte wanted, e.g. 1-12; if the second number is omitted, e.g. 36-, it means everything to the end.

# Range test code
import requests

# HTTP headers requesting the first 15000 bytes
headers = {'Range': 'Bytes=0-15000', 'Accept-Encoding': '*'}
res = requests.get('http://51reboot.com/src/blogimg/pc.jpg', headers=headers)
with open('pc.jpg', 'w') as f:
    f.write(res.content)

We got the first 15000 bytes of the avatar, as shown below; by the look of it, Range works:



Let's flesh out the code:

First use requests.head to get the length of the data
After deciding how many threads to use, give each thread the interval to fetch, i.e. the value of its Range field
seek and write the file
The functionality is getting complex, so organize the code as a class
Single-threaded first, then optimize step by step

The code follows:

import requests

# The downloader class
class downloader:
    # Constructor
    def __init__(self):
        # URL of the data to download
        self.url = 'http://51reboot.com/src/blogimg/pc.jpg'
        # Number of threads to use
        self.num = 8
        # Name for the saved file, taken from the end of the url
        self.name = self.url.split('/')[-1]
        # Send a HEAD request for the url
        r = requests.head(self.url)
        # Read the data length out of the headers
        self.total = int(r.headers['Content-Length'])
        print 'total is %s' % (self.total)

    def get_range(self):
        ranges = []
        # E.g. if total is 50 and there are 4 threads, offset is 12
        offset = int(self.total / self.num)
        for i in range(self.num):
            if i == self.num - 1:
                # The last thread has no end position; it takes the rest
                ranges.append((i * offset, ''))
            else:
                # The interval each thread fetches
                ranges.append((i * offset, (i + 1) * offset))
        # ranges is roughly [(0,12), (12,24), (24,36), (36,'')]
        return ranges

    def run(self):
        f = open(self.name, 'w')
        for ran in self.get_range():
            # Build the Range header and fetch that chunk
            r = requests.get(self.url, headers={'Range': 'Bytes=%s-%s' % ran, 'Accept-Encoding': '*'})
            # seek to the matching position
            f.seek(ran[0])
            # write the data
            f.write(r.content)
        f.close()

if __name__ == '__main__':
    down = downloader()
    down.run()

Multithreading

What threads and processes are would take an article of its own to explain; for now it's enough to know that the threading module handles multithreading. Rough usage is below; for details, search around or watch for later posts:

threading.Thread creates a thread and sets its handler function
start launches it
setDaemon marks it as a daemon thread
join waits for the thread to finish

The code:

import requests
import threading

class downloader:
    def __init__(self):
        self.url = 'http://51reboot.com/src/blogimg/pc.jpg'
        self.num = 8
        self.name = self.url.split('/')[-1]
        r = requests.head(self.url)
        self.total = int(r.headers['Content-Length'])
        print 'total is %s' % (self.total)

    def get_range(self):
        ranges = []
        offset = int(self.total / self.num)
        for i in range(self.num):
            if i == self.num - 1:
                ranges.append((i * offset, ''))
            else:
                ranges.append((i * offset, (i + 1) * offset))
        return ranges

    def download(self, start, end):
        headers = {'Range': 'Bytes=%s-%s' % (start, end), 'Accept-Encoding': '*'}
        res = requests.get(self.url, headers=headers)
        print '%s:%s download success' % (start, end)
        self.fd.seek(start)
        self.fd.write(res.content)

    def run(self):
        self.fd = open(self.name, 'w')
        thread_list = []
        n = 0
        for ran in self.get_range():
            start, end = ran
            print 'thread %d start:%s,end:%s' % (n, start, end)
            n += 1
            thread = threading.Thread(target=self.download, args=(start, end))
            thread.start()
            thread_list.append(thread)
        for i in thread_list:
            i.join()
        print 'download %s load success' % (self.name)
        self.fd.close()

if __name__ == '__main__':
    down = downloader()
    down.run()

Running the script produces:

total is 21520
thread 0 start:0,end:2690
thread 1 start:2690,end:5380
thread 2 start:5380,end:8070
thread 3 start:8070,end:10760
thread 4 start:10760,end:13450
thread 5 start:13450,end:16140
thread 6 start:16140,end:18830
thread 7 start:18830,end:
0:2690 is end
2690:5380 is end
13450:16140 is end
10760:13450 is end
5380:8070 is end
8070:10760 is end
18830: is end
16140:18830 is end
download pc.jpg load success

The run function was modified to add the threading machinery, and a download function was added specifically to fetch each chunk. These two functions in detail:

def download(self, start, end):
    # Build the Range field; Accept-Encoding accepts any encoding
    headers = {'Range': 'Bytes=%s-%s' % (start, end), 'Accept-Encoding': '*'}
    res = requests.get(self.url, headers=headers)
    print '%s:%s download success' % (start, end)
    # seek to the start position
    self.fd.seek(start)
    self.fd.write(res.content)

def run(self):
    # Keep the open file object on self
    self.fd = open(self.name, 'w')
    thread_list = []
    # A counter used to label each thread in the output
    n = 0
    for ran in self.get_range():
        start, end = ran
        # Print thread info
        print 'thread %d start:%s,end:%s' % (n, start, end)
        n += 1
        # Create a thread, passing arguments; the handler is download
        thread = threading.Thread(target=self.download, args=(start, end))
        # Start it
        thread.start()
        thread_list.append(thread)
    for i in thread_list:
        # Wait for each thread
        i.join()
    print 'download %s load success' % (self.name)
    # Close the file
    self.fd.close()

Things that could still be improved

Sharing one file descriptor across multiple workers can cause problems; consider os.dup to duplicate the descriptor and os.fdopen to open one per worker
The resource URL and the thread count should be command-line arguments: read them with sys.argv, supporting "python downloader.py url num"
Report an error when the number or format of the arguments is wrong
All sorts of other error handling

As the saying goes: Dior for women, Oreos for men. This article, you deserve it.

That's about it. I'm still learning Python myself, this article only represents my own views, and mistakes are unavoidable; corrections are welcome, let's learn together. The complete code is on github, please give it a star.

[Python]-17: socket programming

Introduction

This article introduces Python's built-in socket module, using it to simulate an HTTP GET request and save the data returned by the server to a local file.

0x1. Transferring data with TCP

On the Internet, a client talking to a server is really two processes on two machines talking to each other. The "socket" you often hear about is an abstract concept: it usually represents a "network interface" opened on one machine and connected to a target, and this interface comprises the target machine's IP address, a port number, and the protocol both sides use.

The following code uses a socket to send an HTTP GET request to the server hosting this site, then saves the returned HTML to a local file named qingsword.html:

#!/usr/bin/env python3
#coding=utf-8
# Import the socket module
import socket

# Create a socket object. AF_INET means IPv4 (AF_INET6 is IPv6);
# SOCK_STREAM means this socket uses TCP (UDP is SOCK_DGRAM).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Connect to port 80 of www.qingsword.com; the argument is a tuple
s.connect(("www.qingsword.com", 80))

# Send a GET request for /, protocol version HTTP/1.1,
# closing the connection once the request completes
s.send("GET / HTTP/1.1\r\nHost: www.qingsword.com\r\nConnection: Close\r\n\r\n".encode("utf-8"))

# The statement above can also be split into the lines below. Both the b
# prefix and .encode("utf-8") turn a string into bytes suitable for
# network transfer (when the string contains Chinese characters, prefer
# .encode("utf-8")).
#s.send(b"GET / HTTP/1.1\r\n")
#s.send(b"Host: www.qingsword.com\r\n")
#s.send(b"Connection: Close\r\n\r\n")

# A list to collect the data returned by the server
buffer = []
while True:
    # Read up to 1024 bytes at a time
    d = s.recv(1024)
    if d:
        # There is more data; append it to the list
        buffer.append(d)
    else:
        break

# Join the chunks in the list into a single bytes object
data = "".encode("utf-8").join(buffer)
s.close()  # close the socket

# Split data once at the first "\r\n\r\n", into two parts
header, html = data.split("\r\n\r\n".encode("utf-8"), 1)

# Decode and print the first part, the HTTP headers from the server
print(header.decode("utf-8"))

# Save the second part, the HTML, to qingsword.html in the script's directory
with open("qingsword.html", "wb") as f1:
    f1.write(html)

0x2. A socket client/server model

Below is a simple example of client-to-server socket communication:

Server:

#!/usr/bin/env python3
#coding=utf-8
from multiprocessing import Process
import socket, time

# Handles a new connection; receives the connection's socket and (IP, port)
def Hello_Socket(sock, addr):
    # addr holds the client host's IP and port number
    print("New client connection established, %s:%s" % addr)
    # Send a welcome message to the client
    sock.send("Welcome!".encode("utf-8"))
    while True:
        # Read a message from the client; leave the loop if the
        # message is empty or reads "exit"
        data = sock.recv(1024)
        time.sleep(1)  # simulate network latency
        if not data or data.decode("utf-8") == "exit":
            break
        # Prepend "Hello " to the message and send it back
        sock.send(("Hello %s" % data.decode("utf-8")).encode("utf-8"))
    # Close the connection
    sock.close()
    print("%s:%s disconnected..." % addr)

if __name__ == "__main__":
    so = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    Local_IP_Addr = "127.0.0.1"
    Local_Port = 8899
    # The server opens a socket bound to a local IP and port; clients
    # communicate with the server through this IP and port
    so.bind((Local_IP_Addr, Local_Port))
    # Listen on the port, supporting at most 10 simultaneous clients
    so.listen(10)
    print("Waiting for client connections on %s:%s ..." % (Local_IP_Addr, Local_Port))
    while True:
        # accept() blocks here waiting for clients. With several
        # connections, accept() takes them one at a time in order and
        # returns the connection's socket plus the client's IP and port,
        # stored in the two variables below.
        sock, addr = so.accept()
        # Spawn a new process to handle the incoming connection, passing
        # it the connection's socket and (IP, port)
        p1 = Process(target=Hello_Socket, args=(sock, addr))
        p1.start()  # start the process

Client:

#!/usr/bin/env python3
#coding=utf-8
import socket

# Create a socket and connect to the server's IP and port
so = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
so.connect(("127.0.0.1", 8899))

# Receive the server's welcome message
print(so.recv(1024).decode("utf-8"))

# Send the four list elements to the server
for d in ["A", "B", "C", "D"]:
    so.send(d.encode("utf-8"))
    print(so.recv(1024).decode("utf-8"))

# Send exit and disconnect
so.send("exit".encode("utf-8"))
so.close()

Start the server program first, then the client; the output looks like this:

# Server
Waiting for client connections on 127.0.0.1:8899 ...
New client connection established, 127.0.0.1:45552
127.0.0.1:45552 disconnected...

# Client
Welcome!
Hello A
Hello B
Hello C
Hello D

0x3. Transferring data with UDP

With UDP, the two sides don't establish a connection: the client only needs the server's IP and UDP port, and the server only needs to listen on a UDP port to receive data. UDP itself doesn't guarantee reliable delivery, which is why it transfers data much faster than TCP. Below is an example of a server and client communicating over UDP:

Server:

#!/usr/bin/env python3
#coding=utf-8
import socket

if __name__ == "__main__":
    # SOCK_DGRAM means this is a UDP socket
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    Local_IP_Address = "127.0.0.1"
    Local_Port = 20086
    # A UDP socket only needs to bind a local listening port on the
    # right interface to receive client messages
    s.bind((Local_IP_Address, Local_Port))
    print("Receiving client messages on %s:%s..." % (Local_IP_Address, Local_Port))
    while True:
        # recvfrom() returns two values: the data received from the
        # socket, and the client's (IP, port) tuple
        data, addr = s.recvfrom(1024)
        print("Received message from %s:%s: %s" % (addr[0], addr[1], data.decode("utf-8")))
        # sendto takes two required arguments:
        # sendto(data_to_send, (receiver_ip, port))
        s.sendto("Hello %s".encode("utf-8") % (data), addr)

Client:

#!/usr/bin/env python3
#coding=utf-8
import socket

# The client just creates a UDP socket and uses sendto to send data to
# the server. To receive the server's reply, recv is enough (we already
# know the server's IP and fixed port, so recvfrom isn't needed to learn
# them); decode the received data and print it.
so = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for d in ["A", "B", "C", "D"]:
    so.sendto(d.encode("utf-8"), ("127.0.0.1", 20086))
    print(so.recv(1024).decode("utf-8"))
so.close()

Start the server first, then the client; the output looks like this:

# Server
Receiving client messages on 127.0.0.1:20086...
Received message from 127.0.0.1:58172: A
Received message from 127.0.0.1:58172: B
Received message from 127.0.0.1:58172: C
Received message from 127.0.0.1:58172: D

# Client
Hello A
Hello B
Hello C
Hello D

Python decorators


Author: Wang Dawei

Date: 2016-10-19

1. The essence of a decorator

Essence: a decorator is itself a function; higher-order function + nested function ==> decorator.

Principles:

* 1. It must not modify the source code of the decorated function
* 2. It must not change how the decorated function is called

2. Prerequisites for understanding decorators

1. Functions are variables

2. Higher-order functions

3. Nested functions

3. Functions are variables. A vivid analogy: all of memory is a building, each room stores a value, and a variable name is the nameplate on a room's door; a room can have several nameplates, i.e. different variable names can refer to the same memory address.

A comparison of four ways of declaring and calling functions (note: a variable must be declared before it is used):

(The four example forms were shown as images in the original post.)
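Since the original examples are not available, here is an illustrative sketch of the idea (the names are my own): a function name is just another variable bound to a function object, so several names can refer to the same function.

```python
def bar():
    return "in the bar"

foo = bar          # bind a second name to the same function object
print(foo())       # in the bar
print(foo is bar)  # True: both names point at one object
```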
4. Higher-order functions

Definition: a function satisfying either of the two conditions below.

* a. It takes a function name as an argument (adding functionality without modifying the decorated function's source code)
* b. Its return value contains a function name (so the call style of the function doesn't change)

(The examples in the original post were images.)
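The original examples are not available, so here is a hedged sketch of both forms; the function names are placeholders of my own:

```python
def bar():
    return "in the bar"

def call_it(func):        # (a) takes a function as an argument
    return func()

def hand_it_back(func):   # (b) returns a function in its return value
    return func

print(call_it(bar))              # in the bar
print(hand_it_back(bar) is bar)  # True
```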
5. Nested functions

Definition: a function nested inside another function (a local function defined with def inside a function body).

(The example in the original post was an image.)
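Again the original example is not available; a minimal sketch of a nested function (names are mine):

```python
def outer():
    x = "outer's local"
    def inner():      # a local function defined with def inside outer
        return x      # inner can read outer's variables (a closure)
    return inner()

print(outer())  # outer's local
```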
6. Decorators

1. A basic decorator

Code:

import time

def timer(func):
    def inner():
        start_time = time.time()
        func()
        stop_time = time.time()
        print('the func run time is %s' % (stop_time - start_time))
    return inner

@timer
def f1():
    time.sleep(1)
    print('in the f1')

f1()

How it works: @timer rebinds the name f1 to the inner function returned by timer(f1), so calling f1() actually runs inner(), which times and then invokes the original function.

2. If the decorated function takes arguments, the wrapper needs the catch-all parameters (*args, **kwargs)

import time

def timer(func):
    def inner(*args, **kwargs):
        start_time = time.time()
        func(*args, **kwargs)
        stop_time = time.time()
        print('the func run time is %s' % (stop_time - start_time))
    return inner

@timer
def f1(name):
    time.sleep(1)
    print('in the f1,%s' % name)

f1('linda')

3. If the decorated function has a return value, the wrapper must pass it on

import time

def timer(func):
    def inner():
        start_time = time.time()
        ret = func()
        stop_time = time.time()
        print('the func run time is %s' % (stop_time - start_time))
        return ret
    return inner

@timer
def f1():
    time.sleep(1)
    print('in the f1')
    return 'hello'

print(f1())

4. The final single-level decorator

import time

def timer(func):
    def inner(*args, **kwargs):
        start_time = time.time()
        ret = func(*args, **kwargs)
        stop_time = time.time()
        print('the func run time is %s' % (stop_time - start_time))
        return ret
    return inner

@timer
def f1(name):
    time.sleep(1)
    print('in the f1,%s' % name)
    return 'hello'

print(f1('linda'))

5. Stacked decorators

Code:

def timer1(func):
    def inner1(*args, **kwargs):
        print('begin in the timer1')
        ret = func(*args, **kwargs)
        print('stop in the timer1')
        return ret
    return inner1

def timer2(func):
    def inner2(*args, **kwargs):
        print('begin in the timer2')
        ret = func(*args, **kwargs)
        print('stop in the timer2')
        return ret
    return inner2

@timer1
@timer2
def f1(name):
    print('in the f1,%s' % name)
    return 'hello'

f1('linda')

Result:

begin in the timer1
begin in the timer2
in the f1,linda
stop in the timer2
stop in the timer1

How it works:

**At definition time**: the decorators are nested and applied from the bottom up, layer by layer.
**At call time**: they are invoked and executed from the top down, layer by layer.

6. Decorators that take arguments (as used in web frameworks)

Code:

def timer1(position):
    print('position:', position)
    def inner1(func):
        def inner2(*args, **kwargs):
            if position == 'local':
                print('begin in the timer1')
                ret = func(*args, **kwargs)
                print('stop in the timer1')
                return ret
            else:
                print('You are not a local user.')
        return inner2
    return inner1

@timer1(position='now')
def f1(name):
    print('in the f1,%s' % name)
    return 'hello'

f1('linda')

6.1. When the decorator argument is @timer1(position='local')

Result:

position: local
begin in the timer1
in the f1,linda
stop in the timer1

6.2. When the decorator argument is @timer1(position='remote')

Result:

position: remote
You are not a local user.

Visualizing Bayes’ Theorem with D3


I have been interested in getting better at Bayesian statistics recently. For my Ranking PGA Tour golfers project, I have been looking at Approximate Bayesian Computation. This involves discretizing a probability distribution and updating the distribution according to the data observed and a likelihood function. Here is a discussion I had online which led me to the ABC approach.

In the process of playing around with updating normal distributions, I was really enjoying visualizing the updating process. I also wanted to keep practicing developing visualizations with D3 after learning the basics at Metis .

For this project, I actually did the computations properly. Instead of resorting to discretizing the distribution, I have the user set a prior and then I follow the conjugate-prior updating steps. The conjugate prior distribution for a normal random variable with unknown mean and variance is a Normal-Gamma (or equivalently a Normal-Inverse-Gamma). The user specifies the expected mean, a precision on this estimate, the expected standard deviation, and the precision on this estimate (unbeknownst to the user, the precision on the standard deviation is actually the precision on the variance; this is a minor technical detail). This resource was really helpful.
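As a sketch of the conjugate update described above (the parameter names mu0, kappa0, alpha0 and beta0 follow the standard Normal-Gamma textbook presentation, not the project's code):

```python
def normal_gamma_update(mu0, kappa0, alpha0, beta0, data):
    """Posterior Normal-Gamma parameters after observing `data`,
    for a normal likelihood with unknown mean and precision."""
    n = len(data)
    xbar = sum(data) / float(n)
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    mu_n = (kappa0 * mu0 + n * xbar) / (kappa0 + n)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + ss / 2.0 + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * (kappa0 + n))
    return mu_n, kappa_n, alpha_n, beta_n

# Two observations at 1.0 pull the posterior mean toward their average:
print(normal_gamma_update(0.0, 1.0, 1.0, 1.0, [1.0, 1.0]))
```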

I learned a lot of D3 with this project. I was originally going to make this a Flask app and carry out the computations in Python. However, I realized that D3 is capable of doing this sort of math! This was a fun exercise in learning the limits of D3's computational ability. The one consequence I notice is that the memory cost starts to affect D3's ability to render smoothly - you can see this if you scale the number of points up to 100.

Here is the app deployed with Heroku and here is the code for the project.

As always, if anyone has feedback for improvements, please comment below!

Setting the encoding of stdout in Python


Sometimes a process runs in an environment where the locale only supports the ASCII character set (e.g. LANG=C). Python then sets the encoding of standard output and standard error to ascii, so printing Chinese raises an error.

One fix is to set a locale that supports UTF-8, but that has to happen before the Python process starts. Once initialization has run, setting the locale again will not re-initialize those objects.

Another option is to write bytes directly to places like sys.stdout.buffer. Completely fine in theory, but it makes for tiresome code...

So I went looking for an elegant way to build a new sys.stdout. Python 3's I/O no longer uses the C standard library's I/O functions; it uses the interfaces provided by the OS directly. The wrappers live in the io module: buffered and unbuffered, binary and text.

A look at the documentation shows that sys.stdout is an io.TextIOWrapper with a buffer attribute containing an io.BufferedWriter. We can use it to construct a new io.TextIOWrapper with the encoding set to UTF-8:

import sys
import io

def setup_io():
    sys.stdout = sys.__stdout__ = io.TextIOWrapper(
        sys.stdout.detach(), encoding='utf-8', line_buffering=True)
    sys.stderr = sys.__stderr__ = io.TextIOWrapper(
        sys.stderr.detach(), encoding='utf-8', line_buffering=True)

Besides the encoding, you can also set the error handler and the buffering here, so the same trick can be used to tolerate encoding errors or to change standard output's buffering (no more -u flag at startup).
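The same wrapping works on any byte stream, which makes the error-handler option easy to see in isolation; in this sketch a BytesIO stands in for sys.stdout.buffer:

```python
import io

buf = io.BytesIO()
# write_through makes each write reach the underlying buffer immediately;
# errors='replace' substitutes undecodable characters instead of raising.
w = io.TextIOWrapper(buf, encoding='utf-8', errors='replace',
                     write_through=True)
w.write('中文 ok')
print(buf.getvalue())  # the UTF-8 bytes of '中文 ok'
```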

Even this isn't a complete fix. Python uses the default encoding in many places. For example, with subprocess and universal_newlines=True, Python automatically encodes and decodes standard input, output and error, but before Python 3.6 that encoding could not be specified manually. The encoding of command-line arguments cannot be specified either (though you can pass bytes instead).
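Until then, one workable approach is to stay in bytes with subprocess and decode explicitly; a sketch, not taken from the original post:

```python
import subprocess
import sys

# Without universal_newlines, the child's output arrives as bytes,
# so decoding is under our control regardless of the locale.
out = subprocess.check_output([sys.executable, '-c', 'print("hi")'])
print(out.decode('utf-8').strip())  # hi
```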

So, finding a way to set a proper locale is still the more reliable approach...


Mini-programs (1): an intranet scanning tool


This is an intranet scanning script meant to be used together with the reGeorg tool.

When we pivot into an intranet through a webshell proxy, reGeorg is a common technique. On Windows, reGeorg combined with Proxifier lets you reach the intranet machines behind the webshell.

Briefly, using reGeorg:

Upload the matching tunnel script to the webshell server, then run reGeorgSocksProxy.py locally as the proxy; -p sets the port:

python reGeorgSocksProxy.py -p 1234 -u http://xx.com/tunnel.jsp

Then run Proxifier, configure a proxy server whose port is the one set in the previous step, and you can happily reach the other servers on the webshell's intranet.



Why write this tool?

When you have obtained several intranet subnets from the shelled server, you want to batch-check which IPs have HTTP ports open, but many tools don't go through our proxy server, and no scanner with configurable SOCKS5 proxy support came to mind. The gap was so small that rather than hunting for a tool, I wrote a script.

Required environment and packages:

Python 2.6+
pysocks (for the SOCKS proxy)
win_inet_pton (fixes a pysocks error on Windows)

The only difference from an ordinary HTTP-request program is the extra SOCKS proxy configuration, which essentially moves Proxifier's job into the program in a few lines of code; after that, normal HTTP requests with requests or urllib2 work fine:

def scan():
    SOCKS_PROXY_HOST = '127.0.0.1'
    SOCKS_PROXY_PORT = 1234
    ......
    default_socket = socket.socket
    socks.set_default_proxy(socks.SOCKS5, SOCKS_PROXY_HOST, SOCKS_PROXY_PORT)
    socket.socket = socks.socksocket

The program defines a set of common ports:

ip_port = [22, 80, 443, 3389, 6379, 7001, 8080, 27017]

Usage: python net_scan.py ip 192.168.10.0/24

When the scan finishes, a txt file is written to the current directory.

More features

The program was very simple at this point, and then I thought: why not scan sensitive paths while we're at it? So I added a path-scanning feature, and the program structure became this:



To use it, first add dictionaries to the dic folder; the naming convention must look like this:



net_scan.py calls the methods in dir_scan.py. dir_scan.py is a class, so with small modifications it can also be used standalone. If you don't want the path-scanning feature, comment out the following code at the end of the program (after all, a large IP list with large dictionaries gets slow):

batchScan = batch_scan(success_list)
batchScan.getList()
batchScan.run()
dirScan_path = str(time.strftime('%Y%m%d%H%M%S', time.localtime(time.time()))) \
    + '_' + str(ip_lists[0]) + '_dir.txt'
for url in batchScan.successList:
    saveFile(url, dirScan_path)

The dictionary naming has to be fixed because the program selects dictionaries automatically, based on the information in the response headers:

powered_by = headers['x-powered-by'].lower()
server = headers['server'].lower()
if 'php' in powered_by:
    self.choose_dic('php')
elif 'asp' in powered_by:
    self.choose_dic('asp')
    self.choose_dic('aspx')
# Check each keyword individually: a chained ('a' and 'b') in s
# expression would only test the last string.
elif any(k in powered_by for k in ('jboss', 'java', 'jsp', 'weblogic')):
    self.choose_dic('jsp')
elif 'tomcat' in server:
    self.choose_dic('jsp')
elif any(k in server for k in ('centos', 'linux', 'redhat')):
    self.choose_dic('php')
    self.choose_dic('jsp')

Since testing has been limited, only x-powered-by and server are inspected, and relatively few keywords are matched; when nothing matches, the dir and backup dictionaries are chosen by default. I'll improve this when I have time~

A txt file is also generated when this scan finishes.

Project: https://github.com/kovige/NetScan

Don't sweat the details in the code; after all, I'm a girl, so there!

Python built-in functions (10): chr


English documentation:

chr(i)

Return the string representing a character whose Unicode code point is the integer i. For example, chr(97) returns the string 'a', while chr(8364) returns the string '€'. This is the inverse of ord(). The valid range for the argument is from 0 through 1,114,111 (0x10FFFF in base 16). ValueError will be raised if i is outside that range.

Notes:

1. The function returns the string representation of the Unicode character corresponding to the integer argument:

>>> chr(97)        # the argument is an integer
'a'
>>> chr('97')      # passing a string raises an error
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    chr('97')
TypeError: an integer is required (got type str)
>>> type(chr(97))  # the return type is str
<class 'str'>

2. It is the exact inverse of the ord function:

>>> chr(97) 'a' >>> ord('a') 97

3. The argument must be in the range 0-1114111 (0x10FFFF in hex); otherwise ValueError is raised:

>>> chr(-1)        # below 0 raises an error
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    chr(-1)
ValueError: chr() arg not in range(0x110000)
>>> chr(1114111)
'\U0010ffff'
>>> chr(1114112)   # above 1114111 raises an error
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    chr(1114112)
ValueError: chr() arg not in range(0x110000)

A hands-on look at the Locust performance testing framework


Source: cnblogs

Introduction to Locust

Locust is a Python performance testing tool: you write Python scripts to run load tests against web endpoints.

Installing Locust

First you need Python 2.6 or later, plus the pip tool. Then open a command line and install locustio and pyzmq:

pip install locustio
pip install pyzmq

After that we can write our performance test script.

Writing a Locust script

Next, let's test two endpoints. The script is below, with a comment at each step.

First we import three classes: HttpLocust (used to simulate a user making requests), TaskSet (a set of tasks, as the name suggests) and task. Additionally, I import json to make it easier to inspect the endpoints' return values, and subprocess to run a shell command so Locust can start itself.

There are three classes in the script. UserBehavior (any name works, but it subclasses TaskSet, which marks it as a class containing a task set) may define an on_start method; this is optional and runs before any task methods. The methods decorated with @task are the task methods holding the requests to send; the argument passed to @task is a weight. Below, the two decorated methods are given 1 and 2, which means that out of every 3 simulated users, 1 runs list_header and 2 run list_goods. If you omit the argument, simulated users will pick among all @task methods at random. Each method checks the endpoint's response: the returned string is parsed as JSON to get the result field, and if it isn't 100, Locust's built-in failure reporting prints the error.

The other two classes are HttpLocust subclasses (again, any name, but they must inherit from HttpLocust); they model the simulated users and carry their configuration. The task_set attribute names the TaskSet class whose requests those users perform, while min_wait and max_wait (the minimum and maximum wait time) model the pause between a user's consecutive actions, i.e. the time a simulated user waits between two requests. An HttpLocust class can be given a weight via the weight attribute; below, the two classes have weights 1 and 3, so their simulated user counts stand in a 1:3 ratio.

Finally I added a main block to run a shell command. The shell command doesn't have to live in this file: if it's kept in the script, you start the Locust project simply by running the Python file; if not (comment out the last two lines), you start the Locust project from the command line instead.

from locust import HttpLocust, TaskSet, task
import subprocess
import json

# This is the TaskSet class.
class UserBehavior(TaskSet):
    # Executed before any task.
    def on_start(self):
        pass

    # the @task decorator takes an optional weight argument.
    @task(1)
    def list_header(self):
        r = self.client.get("")
        if json.loads(r.content)["result"] != 100:
            r.failure("Got wrong response:" + r.content)

    @task(2)
    def list_goods(self):
        r = self.client.get("")
        if json.loads(r.content)["result"] != 100:
            r.failure("Got wrong response:" + r.content)

# This is one HttpLocust class.
class WebUserLocust(HttpLocust):
    # Specify the weight of the locust.
    weight = 1
    # The taskset class name is the value of task_set.
    task_set = UserBehavior
    # Wait time between the execution of tasks.
    min_wait = 5000
    max_wait = 15000

# This is another HttpLocust class.
class MobileUserLocust(HttpLocust):
    weight = 3
    task_set = UserBehavior
    min_wait = 3000
    max_wait = 6000

#if __name__ == '__main__':
#    subprocess.Popen('locust -f .\locust_test_1.py --host=http://api.g.caipiao.163.com', shell=True)

Starting Locust

To start the Locust project, run the following in a command-line terminal:

locust -f .\locust_test_1.py --host=http://api.g.caipiao.163.com

Here "-f" gives the path of the Python file to execute, and "--host" gives the host that the simulated users send their requests to. Run the command and the Locust project starts. If you run into the error below, note the [Errno 10048] line: port 8089 is already in use, so Locust fails to start; we need to find the process occupying port 8089 and kill it.

To find which process occupies a port, I wrote a small PowerShell script:

function checkPid($result, $port){
    $port = $port.split(":")[1]
    if(($result.split())[6].split(":")[($result.split())[6].split(":").Count-1] -eq $port){
        $tPid = ($result.split())[($result.split()).count-1]
        if($tPid -ne "0"){
            Write-Host "The port you queried is occupied by the following program:" -ForegroundColor Red
            $target = tasklist | findstr $tPid
            Write-Host $target
            $sig = $true
        }else{
            $sig = $false
        }
    }else{
        $sig = $false
    }
    $sig
}

function checkPort($port){
    $port = ":" + $port
    $results = netstat -ano | findstr $port
    if($results.count -gt 0){
        if($results.count -eq 1){
            $sig = checkPid $results $port
            if($sig -eq $false){
                Write-Host "The port you queried is not occupied!" -ForegroundColor Green
            }
        }else{
            foreach($result in $results){
                if($result){
                    $sig = checkPid $result $port
                    if($sig -eq $true){
                        break
                    }
                }
            }
            if($sig -eq $false){
                Write-Host "The port you queried is not occupied!" -ForegroundColor Green
            }
        }
    }else{
        Write-Host "The port you queried is not occupied!" -ForegroundColor Green
    }
}

$port = $null
while($port -ne "exit"){
    $port = Read-Host "Enter the port number to query"
    if($port -eq "exit"){
        break
    }
    checkPort $port
}

Running the PowerShell script and entering port 8089, we can see that a python.exe process is occupying that port:



Then we kill that process in PowerShell and start the Locust project again; this time it succeeds (below):


Locust load testing

Open "http://localhost:8089/" in a browser and you will see the following page:



Enter the total number of users to simulate and the number of users spawned per second as prompted, then click "Start swarming" to run the load test:



Click the "STOP" button to stop the load test; the STATUS changes to "STOPPED". Click "New test" to start a fresh test:



As the figure above shows, the Statistics tab lists performance-related results such as the total number of requests, the number of failed requests, requests per second, the minimum/maximum response time, and the average response time. The top-right corner shows the request failure rate and the total RPS (requests per second). The Failures, Exceptions, and Download Data tabs next to Statistics let us inspect failed requests, view captured exceptions, and download the test results, respectively. I won't go into more detail here; try it out in practice, and if you want a deeper understanding of the Locust framework, read the official documentation.

MicroPython Tutorial

1: If your OpenMV lens gets dirty, wipe it with a microfiber cloth and some isopropyl alcohol.
2: Connect to your OpenMV; the helloworld.py script in the IDE will help.
3: For Windows, see the setup guide linked on the documentation home page.
4: For Mac, we haven't built the IDE yet; we will produce one once we finish the documentation. Alternatively, you can install the development environment on Mac and run the IDE directly (since it is just a Python script).
5: For Linux, simply connect the camera to your computer and it should work.
6: OpenMV uses a standard USB interface.
7: Select a region in the IDE's frame buffer, then click "copy color" to get color-tracking settings.
8: If you want to help us, submit new versions. If you think something is missing, you can add new features.
9: Please ask questions on the forum; try not to send email.
10: If a problem is hard, post it on the forum. Don't be afraid of posting there; we want user feedback.

Talk Python to Me: #81 Python and Machine Learning in Astronomy

Episode #81: Python and Machine Learning in Astronomy

Published Fri, Oct 21, 2016, recorded Fri, Oct 21, 2016.

( embed this episode via SoundCloud )

The advances in Astronomy over the past century are both evidence and confirmation of the highest heights of human ingenuity. By studying the frequency of light, we have learned that the universe is expanding; by observing the orbit of Mercury, we have confirmed that Einstein's theory of general relativity is correct.

It probably won't surprise you to learn that Python and data science play a central role in modern day Astronomy. This week you'll meet Jake VanderPlas, an astrophysicist and data scientist from University of Washington. Join Jake and me while we discuss the state of Python in Astronomy.

Links from the show:

Jake on Twitter : @jakevdp

Jake on the web : staff.washington.edu/jakevdp

Python Data Science Handbook : shop.oreilly.com/product/0636920034919.do

Python Data Science Handbook on GitHub : github.com/jakevdp/PythonDataScienceHandbook

Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data : press.princeton.edu/titles/10159.html

PyData Talk : youtube.com/watch?v=qOOk6l-CHNw

eScience Institute : @UWeScience

Large Synoptic Survey Telescope : lsst.org

AstroML: Machine Learning and Data Mining for Astronomy : astroml.org

Astropy project : astropy.org

altair package : pypi.org/project/altair

Want to go deeper? Check out my courses

Jake VanderPlas

Subscribe @ iTunes Download MP3

Episode sponsors



Jake VanderPlas is a Senior Data Science Fellow at University of Washington’s eScience institute. His background is in Astronomy, and apart from his own research and writing, he spends much of his time developing, maintaining, and training users of the open software tools that are increasingly important to researchers in today’s data-centric world.

Python built-in functions (11): classmethod

English documentation:

classmethod ( function )

Return a class method for function .

A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom:

class C: @classmethod def f(cls, arg1, arg2, ...): ...

The @classmethod form is a function decorator; see the description of function definitions in the language reference for details.

It can be called either on the class (such as C.f() ) or on an instance (such as C().f() ). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument.

Class methods are different from C++ or Java static methods. If you want those, see staticmethod() in this section.

Notes:

1. classmethod is a decorator function that marks a method as a class method.

2. The first parameter of a class method is the class object, which is passed in automatically when the method is called; by convention it is named cls.

3. A method marked as a class method can be called on the class object (e.g. C.f()) as well as on an instance of the class (e.g. C().f()).

>>> class C:
...     @classmethod
...     def f(cls, arg1):
...         print(cls)
...         print(arg1)
...
>>> C.f('calling the class method on the class object')
<class '__main__.C'>
calling the class method on the class object
>>> c = C()
>>> c.f('calling the class method on an instance')
<class '__main__.C'>
calling the class method on an instance

4. After a class is subclassed, the subclass can also call the parent's class methods, but the first argument passed in is then the subclass's class object.

>>> class D(C):
...     pass
...
>>> D.f("the subclass's class object calling the parent's class method")
<class '__main__.D'>
the subclass's class object calling the parent's class method
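A common practical use of class methods, which also shows why the class itself is passed as cls, is the "alternate constructor" idiom. The example below is an illustrative sketch (not from the text above); the names are hypothetical:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_string(cls, text):
        """Alternate constructor: build a Point from an 'x,y' string."""
        x, y = (int(v) for v in text.split(','))
        # cls is Point here, or the subclass if called on a subclass,
        # so subclasses inherit a working constructor for free.
        return cls(x, y)

p = Point.from_string('3,4')
print(p.x, p.y)  # 3 4
```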

Opening the BBC micro:bit


As many of you will know, the PSF has been a partner in the British Broadcasting Corporation's (BBC) micro:bit project . A million devices capable of running Micropython have been distributed to every 11 and 12 year old in the UK. Those of you lucky enough to attend EuroPython and PyCon UK will have also been given a device to take home.

The PSF wouldn't be involved if the project were not open source, and it has always been the intention that all the software and hardware designs should be released under open licenses so that anyone can recreate the project themselves .

We're very pleased to continue our association with the project as a partner of the new MicroBit Foundation, a charity tasked with promoting and developing the project now that the BBC is stepping away. (It was always the intention of the BBC to step back once the UK "drop" of devices was complete.)

A few days ago they revealed their website and the final piece of the jigsaw was revealed: the hardware schematics.


Opening the BBC micro:bit

If you're interested in learning more, check out the hardware page , learn about MicroPython on the micro:bit, join the Slack channel and take a look around the wider project .

It's a very cool device and puts Python firmly in the world of embedded hardware and Internet of Things. It's also a great complementary device to the Raspberry Pi: the skills children learn on the micro:bit transfer to the Raspberry Pi and vice versa. That there is progression from complete beginner to professional software developer is one of Python's great strengths.

Python is for everyone, no matter their age or ability. Having open embedded hardware that runs MicroPython makes Python all the more available to enterprising people all over the world.

Have fun!


Implementing SMS PDU encoding with Python


A few days ago I picked up a 3G module and started tinkering with it. I need to send text messages mixing Chinese and English, so I use PDU mode (Google it if you're not familiar ^_^).

The biggest problem, of course, is assembling the PDU encoding (Python being so powerful, there might well be a module for it). Sure enough, I found the smspdu module (link: https://pypi.python.org/pypi/smspdu). Sadly, testing showed that the encoding it generates differs from what my module's documentation requires, but its source is still worth reading for the implementation approach. The rest is adapting it myself. Without further ado, the code:

from smspdu import SMS_SUBMIT

def format_message(phone_number, message_content):
    tpdu = []
    if phone_number and message_content:
        # +8613010112500 is the SMS service center number; it can be queried with an AT command
        pdu = SMS_SUBMIT.create('+8613010112500', phone_number, message_content)
        # 00: use the default SMS center number; 11: ordinary GSM format; 00: default sender number
        tpdu.append('001100')
        # 91: the +8613000000000 format; 81: the 13000000000 format
        formatAddress = pdu.encodeAddress().replace('0B91', '0B81')
        tpdu.append(formatAddress)
        # 00: protocol identifier (ordinary GSM); 08: UCS2 encoding; 01: validity period
        tpdu.append('000801')
        # user-data length, followed by the UCS2-encoded message content
        tpdu.append('%02X' % pdu.tp_udl)
        tpdu.append(''.join(['%02X' % ord(c) for c in pdu.tp_ud]))
    return ''.join(tpdu)
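For reference, the UCS2 hex encoding used for the user data above can be reproduced with the standard library alone. This is a minimal sketch: for characters in the Basic Multilingual Plane, UCS2 is the same as UTF-16 big-endian:

```python
def ucs2_hex(text):
    """Hex-encode text as UCS2 (UTF-16 big-endian), as used in the PDU user data."""
    data = text.encode('utf-16-be')
    return ''.join('%02X' % b for b in data)

print(ucs2_hex('Hi'))  # 00480069
print(ucs2_hex('中'))  # 4E2D
```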

All that remains is to send it with AT commands.

(This post is from Lao Shou's blog; please credit the source when reposting. Thanks on Lao Shou's behalf.)

Tiling the plane with Pillow


On a recent yak-shaving exercise, I've been playing with Pillow , an imaging library for Python. I've been creating some simple graphics: a task for which I usually use PGF or TikZ , but those both require LaTeX. In this case, I didn't have a pre-existing LaTeX installation, so I took the opportunity to try Pillow, which is just a single pip install .

Along the way, I had to create a regular tiling with Pillow. In mathematics, a tiling is any arrangement of shapes that completely covers the 2D plane (a flat canvas), without leaving any gaps. A regular tiling is one in which every shape is a regular polygon, that is, a polygon in which every angle is equal and every side has the same length.

There are just three regular tilings of the plane: with squares, equilateral triangles, and regular hexagons. Here’s what they look like, courtesy of Wikipedia :



In this post, I'll explain how I reproduced this effect with Pillow. This is a stepping stone for something bigger, which I'll write about in a separate post.

If you just want the code, it's all in a script you can download.

Coordinate systems

To do any drawing, first we need to establish a coordinate system.

The usual ( x , y ) coordinate system has two perpendicular axes. There's an origin at (0, 0), and values increase as you move bottom-to-top, left-to-right.

In Pillow, this is flipped vertically: an increase in the vertical axis means moving down, not up. The origin is the top left-hand corner of an image, and the image canvas sits below and to the right.



Practically speaking, this doesn't change much, but it's worth noting the difference, or drawings can behave in a confusing manner.

Drawing a polygon

Once you have a coordinate system, a polygon can be specified as a list of coordinate points: one for every vertex. This is a list of 2-tuples in Python, which looks very similar to mathematical notation. For example, a rectangle:

rectangle = [(0, 0), (0, 30), (100, 30), (100, 0)]

In Pillow, an image is represented by an instance of the Image class . We can draw shapes on the image using the ImageDraw module , passing it a list of coordinate points. For example, to draw this rectangle on a blank canvas:

from PIL import Image, ImageDraw

# Create a blank 500x500 pixel image
im = Image.new('L', size=(500, 500))

# Draw the rectangle
ImageDraw.Draw(im).polygon(rectangle)

# Save the image to disk
im.save('rectangle.png')

We can call this Draw(im) function as many times as we like. So if we had an iterable that gave us coordinates, we could draw multiple shapes on the canvas:

for coords in coordinates:
    ImageDraw.Draw(im).polygon(coords)

So now we need to write some code that provides us with these coordinates.

A square grid

Because a square corresponds so neatly to the coordinate system, it's a good place to start. Let's start by thinking about a single point ( x , y ): suppose this is the top left-hand corner of a unit square; then we can write down the other three vertices of the square:



We can get these points ( x , y ) by iterating over the integer coordinates of the canvas, like so:

def generate_unit_squares(image_width, image_height):
    """Generate coordinates for a tiling of unit squares."""
    for x in range(image_width):
        for y in range(image_height):
            yield [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]

I'm using yield to make this function into a generator . This allows me to efficiently compute all the coordinates required, even when I have many shapes. Iteration is a very powerful feature in Python, and if you're not familiar with it, I recommend this old PyCon talk as a good introduction.

To create bigger squares, we scale the coordinates in both directions:

def generate_squares(image_width, image_height, side_length=1):
    """Generate coordinates for a tiling of squares."""
    scaled_width = int(image_width / side_length) + 1
    scaled_height = int(image_height / side_length) + 1
    for coords in generate_unit_squares(scaled_width, scaled_height):
        yield [(x * side_length, y * side_length) for (x, y) in coords]
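As a quick check of the scaling logic, here is a self-contained copy of the two generators together with the first 50-pixel square they produce:

```python
def generate_unit_squares(image_width, image_height):
    """Generate coordinates for a tiling of unit squares."""
    for x in range(image_width):
        for y in range(image_height):
            yield [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]

def generate_squares(image_width, image_height, side_length=1):
    """Generate coordinates for a tiling of squares of the given side length."""
    scaled_width = int(image_width / side_length) + 1
    scaled_height = int(image_height / side_length) + 1
    for coords in generate_unit_squares(scaled_width, scaled_height):
        yield [(x * side_length, y * side_length) for (x, y) in coords]

print(next(generate_squares(100, 100, side_length=50)))
# [(0, 0), (50, 0), (50, 50), (0, 50)]
```

Feeding these coordinates to ImageDraw.Draw(im).polygon(...) as above draws the whole square grid.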

Isolating compute resources in Neutron networking, and testing network performance with TRex


It's been a while since my last post here. I've actually written quite a bit on the company blog, mostly short pieces, but today I wrote a longer one, so I'm publishing it.

OpenStack's native Neutron networking draws plenty of criticism, and the core issue is that its performance cannot meet production requirements. Generally, lab figures need to be several times the required numbers before a system can handle the complex network traffic of users in a real production environment smoothly and without noticeable latency.

Neutron's network stack is very complex. Taking a typical OpenStack network node as an example, the core components involved include, but are not limited to, the following modules/software:

- The Linux kernel protocol stack, from softirqs up through the full TCP/IP stack, used for sending and receiving packets, routing, and policy routing;
- iptables, a kernel module that together with conntrack performs stateful DNAT and SNAT; if there are security policies, such as FWaaS or security groups for LBaaS, iptables also filters packets;
- tc, for rate-limiting sent and received packets, for example throttling floating IPs. The community code does not include this, but if a vendor wants floating IP rate limiting, this is the most common place to implement it; a vendor that wants to improve 1:1 NAT performance can also use tc, though that is rarer;
- Openswan, strongSwan, Libreswan: vendors usually pick whichever driver they consider stable and reliable to provide basic IPsec VPN functionality;
- Open vSwitch, the virtual switch, which implements basic L2 switching plus flow tables, ARP responder, L2 tag rewriting, and so on; relatively simple traffic control can also be done in Open vSwitch, though few vendors do so today;
- The VXLAN kernel module: with VXLAN networks and the kernel datapath of Open vSwitch, the kernel's VXLAN module parses and encapsulates the outer VXLAN headers;
- HAProxy, the default OpenStack LBaaS plugin (assuming you are not using Octavia, which is still quite new), providing load balancing for users;
- Dnsmasq, the default OpenStack DHCP driver, providing a DHCP server and DNS lookups;
- NIC drivers, both physical (e.g. ixgbe) and virtual (e.g. vport-internal; the internal port is actually part of the Open vSwitch code).

(I didn't have time to draw a fully complete diagram; this one roughly shows the kernel networking path and can serve as a reference.)

With so many components and modules gathered on one server, tuning is quite difficult. If we look carefully at how the whole path uses compute resources (memory is usually plentiful on x86): the protocol stack and the software that depends on it rely on namespaces for isolation (yet much of the kernel's compute resource usage cannot be isolated), the NIC soft IRQs are effectively a shared resource whose CPU usage can only be spread by hashing, and it is hard to claim that the Open vSwitch datapath and vswitchd fully isolate resources per tenant. Overall, the network node isolates network communication through many mechanisms (L2 via VLAN/VXLAN, L3 and above via namespaces), but the compute resources consumed by I/O remain a hard, unsolved problem.

One approach is isolation by restricting traffic, for example limiting bandwidth and PPS. Limiting bandwidth alone is rather ineffective: if we allow a VM 1 Gbps of external bandwidth, equivalent to a virtual gigabit NIC, the VM can still send small packets at line rate up to 1.4 Mpps, which is a very high number. Similarly, under an external attack or a flood of packets, if QoS is only applied by tc inside the namespace, Open vSwitch and the NIC softirqs can be saturated in an instant, so one tenant affects all tenants. Enforcing the limit in Open vSwitch works somewhat better than tc, and using Flow Director may work better still.

Another approach is hard isolation, separating tenants' resource usage, for example the NFV approach of wrapping network functions in virtual machines. Resource isolation is then done entirely by the hypervisor, and flavors and dedicated resources achieve good results, but progress in the Neutron community here has not been striking.

Either way, Open vSwitch is usually involved, so evaluating the performance of Open vSwitch, and of the whole Neutron network, becomes a vexing problem for vendors.

Broadly speaking, the available evaluation tools are (the classification is not perfectly scientific; it mainly highlights their characteristics):

- Traffic tools that rely on the kernel protocol stack, such as iPerf and netperf;
- Kernel-based packet generators, such as pktgen, hping, and nping;
- Professional test equipment, such as Spirent and IXIA;
- DPDK-based packet generators, such as dpdk-pktgen, MoonGen, and trex.

Of these:

1 has weak performance and poor flow customization, making accurate results hard to obtain;

2 is not very flexible, struggles to generate flows beyond its own scope, has mediocre statistics, and cannot reach professional performance levels;

3 is the best choice in every respect, but the price is prohibitive for most vendors, and the testers are cumbersome and inefficient to use;

4: we tried both MoonGen and trex. MoonGen frequently produced unstable flows during testing and its statistics are mediocre, so we eventually chose trex, which is already used in our production R&D.

trex is very easy to install and build, and its documentation is fairly complete. Here are two simple use cases:

- Testing Open vSwitch L2 VXLAN performance with trex;
- Testing kernel routing performance in a VXLAN topology with trex.

The first can be tested as follows:



We bind two NICs on the trex server (TG), which connects through a switch to the device under test (DUT). On the DUT we configure the IP address 10.0.109.171/24 and create two VXLAN tunnels on Open vSwitch back to the TG. On the switch and the DUT we configure static ARP entries for 10.0.109.67 and 10.0.109.68 (adding CAM entries on the switch is enough) to prevent the kernel stack from sending ARP requests and the switch from flooding. In the figure, solid lines are physical links, dashed lines are tunnels, dashed parallelograms are IPs present only via static ARP rather than actually configured, and solid parallelograms are IPs configured on ports.

Then we add a flow on Open vSwitch that forwards traffic from vxlan-port-1 directly to vxlan-port-2. The DUT is an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz with an Intel 82599ES NIC; we set pstate to performance, enable hyper-threading, pin the NIC IRQs to the local NUMA node, and enable UDP RSS four-tuple hashing.

trex supports both stateful and stateless transmission; here we use the latter. Its bundled Scapy version is 2.3.1, which does not support VXLAN directly. There are two ways to handle this: modify your packet script yourself, similar to the official example stl/udp_1pkt_vxlan.py, or patch the Scapy code directly. For convenience we chose the latter:
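Before patching Scapy, it helps to recall what the 8-byte VXLAN header actually contains. It can be built and parsed with the standard struct module alone (a sketch, independent of Scapy and trex):

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags byte (only the I bit set),
    3 reserved bytes, 24-bit VNI, 1 reserved byte."""
    return struct.pack('!BxxxI', 0x08, vni << 8)

def vxlan_vni(header):
    """Extract the VNI from an 8-byte VXLAN header."""
    flags, word = struct.unpack('!BxxxI', header)
    return word >> 8

hdr = vxlan_header(42)
print(hdr.hex())       # 0800000000002a00
print(vxlan_vni(hdr))  # 42
```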

Create a file external_libs/scapy-2.3.1/python2/scapy/contrib/vxlan.py with the following content:

from scapy.packet import Packet, bind_layers
from scapy.layers.l2 import Ether
from scapy.layers.inet import UDP
from scapy.fields import FlagsField, XByteField, ThreeBytesField

_VXLAN_FLAGS = ['R' for i in range(0, 24)] + ['R', 'R', 'R', 'I', 'R', 'R', 'R', 'R', 'R']

class VXLAN(Packet):
    name = "VXLAN"
    fields_desc = [FlagsField("flags", 0x08000000, 32, _VXLAN_FLAGS),
                   ThreeBytesField("vni", 0),
                   XByteField("reserved", 0x00)]

    def mysummary(self):
        return self.sprintf("VXLAN (vni=%VXLAN.vni%)")

bind_layers(UDP, VXLAN, dport=4789)
bind_layers(VXLAN, Ether)

Then create external_libs/scapy-2.3.1/python2/scapy/contrib/all.py, which loads the contrib modules:

from scapy.config import conf
from scapy.error import log_loading
import logging

log = logging.getLogger("scapy.loading")

def _import_star(m):
    mod = __import__(m, globals(), locals())
    for k, v in mod.__dict__.iteritems():
        globals()[k] = v

_contrib_modules_ = ["bgp", "igmp", "igmpv3", "ldp", "mpls", "ospf", "ripng", "rsvp", "vxlan"]

for _l in _contrib_modules_:
    log_loading.debug("Loading module %s" % _l)
    try:
        _import_star(_l)
    except Exception, e:
        log.warning("can't import module %s: %s" % (_l, e))

Finally, add the following line to external_libs/scapy-2.3.1/python2/scapy/all.py:

from contrib.all import *

Now we can send VXLAN packets. Take our test code as an example:

from trex_stl_lib.api import *

class STLS1(object):

    def create_stream(self):
        base_pkt = Ether(dst="ec:f4:bb:d9:82:8a") / \
                   IP(src="10.0.109.67", dst="10.0.109.171") / \
                   UDP(dport=4789) / \
                   VXLAN(vni=0) / \
                   Ether(src="ba:2e:d9:67:96:7f", dst="00:00:00:00:00:11") / \
                   IP(dst="10.10.20.20") / ICMP(id=1963) / (10 * 'a')

        vm = STLScVmRaw([
            STLVmFlowVar(name="src_port", min_value=32768, max_value=33000, size=2, op="random"),
            STLVmWrFlowVar(fv_name="src_port", pkt_offset="UDP.sport"),      # change the outer UDP source port
            STLVmFlowVar(name="vxlan_ip_src", min_value="10.10.10.101", max_value="10.10.10.254", size=4, op="random"),
            STLVmWrFlowVar(fv_name="vxlan_ip_src", pkt_offset=76),           # change the inner VXLAN source IP
            STLVmFixIpv4(offset="IP:1"),                                     # fix the inner IP checksum
            STLVmFixChecksumHw(l3_offset="IP", l4_offset="UDP",
                               l4_type=CTRexVmInsFixHwCs.L4_TYPE_UDP)        # fix the outer checksums
        ])

        pkt = STLPktBuilder(pkt=base_pkt, vm=vm)
        return STLStream(packet=pkt,
                         random_seed=0x1234,  # can be removed; will give the same random values on every run
                         mode=STLTXCont())

    def get_streams(self, direction=0, **kwargs):
        # create 1 stream
        return [self.create_stream()]

# dynamic load - used for trex console or simulator
def register():
    return STLS1()

How the base packet is constructed depends on the specific test scenario. Note that, to help the DUT hash the traffic, we create multiple flows by varying the outer packet's UDP source port and the inner packet's source IP. After modifying a packet, remember to restore the checksums; for VXLAN packets in particular, both the inner L3/L4 checksums and the outer L3/L4 checksums must be correct.

Now we can generate traffic with trex. Note that, to send stateless traffic, you first start t-rex in interactive mode, then connect with trex-console to issue commands. With the send rate capped at 3.2 Mpps the loss rate is 0 and the traffic is stable; pushing higher starts to drop packets:



The figure above shows trex's own statistics: trex itself uses very few resources, and the transmit rate at this point is 3.21 Mpps.

5 Things to Know Before Starting Algorithmic Trading



5 Things to know before starting Algorithmic Trading

By Milind Paradkar

With more than 70% of the trading volume in the US markets being automated, the rise of the algorithms seems more inevitable than ever before. The mechanical jobs are shifting to computers, and only those who can tame the machines can rule the markets. Equipping oneself with the skills of algorithmic trading is one of the best ways to prepare for the changing face of financial markets.

As we inch closer to November 3, 2016, the day you will be attending QuantInsti's informative session on Algorithmic Trading, we thought we should give you a prerequisite insight into the essentials of algo and quantitative trading. We hope you have already registered for the session; if not, register now before the limited spots vanish.

One of the recent trends in markets has been the emergence of DIY traders. By day, they do their regular jobs, and by night they run their algorithmic trading strategies after putting their children to sleep. This article is specially aimed at those who want to learn algorithmic trading and wish to set up their own trading system. Your success as an algorithmic trader is determined not only by your quantitative skills but also depends to a large extent on the process and the tools you select for analyzing, devising, and executing your strategies. Let's get acquainted with the tools required for the trade!

1. Data is everything (well almost!)

The first and perhaps the most important aspect of algo trading is data. Data is an algorithmic trader’s best friend. A trader needs to have access to data for the respective segments of the exchange that he intends to trade in. How does this data originate in the first place? Let us take the case of an emerging market’s exchange:

The National Stock Exchange of India Limited (NSE)

NSE provides market quotes and data for Capital Market Segment (CM), Futures and Options Segment (F&O), Wholesale Debt Market Segment (WDM), Securities Lending & Borrowing Market (SLBM), Currency Derivative Market Segment (CDS) and Corporate Data.

These quotes are provided by DotEx International Ltd., a 100% subsidiary of NSE dedicated solely for this purpose. It broadcasts real time data to various information agencies. NSE provides the 5 different types of data products viz.

Real Time Data (Level 1, Level 2, Level 3, and tick by tick data)
Snapshot Data
End of Day (EOD) Data
Corporate Data
Historical Data

Source: www.nseindia.com

Now let us try to understand level 1, level 2, level 3, and Tick-By-Tick (TBT) data.

Level 1 data includes the Best Bid and Best Ask, plus the Bid Size and the Ask Size. Level 2 provides market depth data upto 5 best bid and ask prices and Level 3 provides market depth data upto 20 best bid and ask prices. Tick-By-Tick (TBT) data includes each and every order or a change in the order.

Level 2 data example NSE:YESBANK



For new traders, level 1 data is sufficient enough for analyzing price charts, devising strategies and to arrive at trading decisions. Other types of data are generally used by experienced traders and high frequency trading firms/institutions.
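To make the distinction concrete: a level 2 snapshot is just a list of the best bid/ask levels, and the level 1 view is its top row. The figures below are made up purely for illustration:

```python
# Hypothetical level 2 snapshot: five best (price, size) pairs per side.
bids = [(1210.90, 16), (1210.85, 2028), (1210.80, 40), (1210.75, 4), (1210.70, 58)]
asks = [(1211.00, 2), (1211.05, 909), (1211.10, 28), (1211.15, 271), (1211.20, 16)]

best_bid, bid_size = bids[0]   # the "level 1" view is just the top of book
best_ask, ask_size = asks[0]
spread = round(best_ask - best_bid, 2)
print(best_bid, best_ask, spread)  # 1210.9 1211.0 0.1
```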

NSE provides data to the authorized datavendors (List of Authorized Data Vendors/Redistributors [ 1 ] ), which in turn redistribute the data to trading firms and retail traders. Some of the datavendors for the Indian markets include:

eSignal
globaldatafeeds
iCharts
ValveNet

Some datavendors provide datafeed only, while some others provide charting platform and other analytics for creating watchlists, tracking different markets, strategy development, generating buy/sell signals etc. A trader can connect the platform with his broker’s platform via a bridge, and have the orders executed. Datavendors usually list the broker partners on their websites, and also the compatibility of their feed with different charting platforms.

eSignal

Let us take the example of eSignal to list some of the services provided by such datavendors. eSignal is a leading global datavendor which offers three main products:

SIGNATURE
CLASSIC
ELITE

SIGNATURE is the most popular one, and some of its important features include:

Streaming Real-Time Data
Advanced Charting with customizable Studies
Stocks, Futures, Forex and Options
Back-testing
Download Data using QLink or RTD
1 year Intra-day Historical Data
News, Commentary and Research

Apart from the trading platform, eSignal also offers the QLink service, which makes it quick and simple to download real-time, streaming data into your Excel worksheets. Traders can perform further analysis and build strategies in Excel using worksheet functions/macros, and have them executed via the Excel API.


2. Charting Platforms

As a trader you must acquaint yourself with different charting techniques and chart based strategies that can be profitably applied in the markets. There are many charting platforms available with advanced charting features and analytics. Some popular charting platforms among traders include:

NinjaTrader
TradeStation
MetaStock
AmiBroker
eSignal

Features offered by these platforms include real-time scanning, number of technical indicators, expert advisors, backtesting, company fundamentals, news services, placing trades automatically, forecasting, level 2 data etc. A trader should choose a platform based on his trading style, features and pricing.

Let us take the example of MetaStock to list some of the features of charting platforms. MetaStock is a very popular platform and offers solutions for individual end of day traders, real time traders, and FOREX traders. The basket of products offered includes:

METASTOCK Real Time
METASTOCK XENITH
METASTOCK Daily Charts
DataLink
Third Party add-ons

Features of METASTOCK Real Time

Markets Explorer: scan across markets and securities
Enhanced System Tester: test your trading ideas
Indicators & Trading Systems: a comprehensive collection of indicators
Expert Advisor: expert inputs from industry professionals
Forecaster: a tool to view probable future prices

Most of these charting platforms offer a trial period which can be used by a trader to assess whether the platform would fulfill his trading needs. Before subscribing to a platform it is also vital that a trader understands the pricing policy, as these platforms in addition to the software charges also charge for datafeed, exchange fees, and for third-party add-ons separately.

3. It is all about Programming, baby

Algorithmic trading involves devising & coding strategies by analyzing the historical/real-time data which is procured from the datavendors. Some of the trading platforms mentioned above have their own scripting language which can be used for coding & backtesting strategies in the platforms itself.

When Van Rossum started working on Python to keep himself occupied during his Christmas week, he wanted to make an interpreter that would appeal to Unix and C hackers. However, today Python is one of the most appealing languages for algorithmic traders all over the world. The reason is very simple and can be found here.

Using languages like Python, Java and MATLAB for trading on trading platforms is a method which is extensively used by algorithmic traders. There are hundreds of external analytical packages that can be used in these languages which aid in developing various trading strategies like momentum based, mean reverting, scalping, strategies based on machine learning algorithms, sentiment based strategies etc. We use external wrappers to implement code written by us in the trading platform. We have talked about two such wrappers which can be used to implement algorithmic trading strategies in Python on Interactive Brokers in our articles on IBPy and IBridgePy.

Hence, as a trader it is vital to have sound programming knowledge to trade successfully in the markets. QuantInsti's EPAT course includes Python, R, and MATLAB, wherein the students not only learn the basics of programming, but also learn to devise different strategies for different markets using these languages.

4. Brokers Brokers Brokers

The next aspect in algorithmic trading is choosing the right broker. Considerations that go into choosing the right broker include:

Speed and reliability of the trading platform
Segments offered
Brokerage
Leverage and the margin requirements
Compatibility of charting software with the broker's platform
Gateway APIs offered by the broker

Some of the popular brokers for the Indian markets include:

Interactive Brokers
MasterTrust
IIFL
OptionsHouse
Lightspeed Trading
Oanda
Zerodha

We have gone into great detail about the trading platforms available in India in this article.

As an algorithmic trader who wants to automate the trading process you can execute your strategies in live markets via charting platforms that connect to your broker or through the gateway API’s offered. The available API’s are usually listed by the broker on their websites.

Some brokers like Zerodha offer platforms which are a set of simple HTTP APIs built on top of their exchange-approved web based trading platform. This enables users to gain programmatic access to data such as profile and funds information, order history, positions, live quotes etc. In addition, it enables users to place orders and manage portfolio at their convenience using any programming language of their choice (from excel VBAs to Python, Java, C#).

Thus for a prospective trader it is essential that he gets himself acquainted with the workings of an API and other relevant features offered by the broker’s platform.

5. A System to beat the heat of algorithmic trading

By now you must have realized that as an algorithmic trader you will be working with different applications (charting platforms/Programming tools/Broker terminal /News feed etc.), dealing with huge data for backtesting, and multi-tasking in live markets. So, it is essential to have the right computer system that fulfills all these needs without going on occasional breaks and strikes.

After all, that is the aim of automation: to get things done smoothly and quickly (and, of course, devoid of emotions). Trading with a laptop is not reliable and would limit your multi-tasking abilities. Therefore, it is advisable to use a high-end desktop system with multiple monitors for algorithmic trading.

You'd need a reasonable desktop machine: a fast processor, plenty of RAM, multiple monitors with the relevant graphics card(s), a reliable motherboard, and ample storage space. A trader can purchase the right system after researching his requirements, or by consulting someone with sound knowledge of computer hardware and technology.

Minimum Requirements

Processor: Intel Core2Duo 2.13GHz

Operating system: Windows 7 Professional, or Ubuntu x64 (preferable if R is required)

RAM: 3 GB DDR3

Next Step

This was just a prelude that we thought you should know before you attend our November 3rd informative session on algorithmic trading. In the session, you will not only hear everything you should know before venturing into algorithmic trading from one of the stalwarts of the industry, but also get a chance to interact with Nitish Khandelwal (co-founder of iRage). Reading this article will help you frame your doubts and questions. In case you haven't registered for the informative session, please do so by clicking here.

Parsing a Redis dump.rdb file and analyzing memory usage with redis-rdb-tools


Frank, 2015/02/13


More and more people are using Redis these days, mainly because it is efficient, performant, and scales well. First, a few tools for analyzing Redis:

- Redis-sampler

Redis-sampler is a tool developed by the author of Redis. By sampling, it gives you a rough picture of the types of data currently in Redis and how the data is distributed.

- Redis-audit

Redis-audit is a script that tells us how much memory each class of keys uses. It reports data such as: how frequently each class of keys is accessed, how many values have an expiry set, and how much memory each class of keys uses. This makes it easy to find keys that are rarely or never used.

- Redis-rdb-tools

Redis-rdb-tools is similar in function to Redis-audit, except that it obtains its statistics by analyzing the rdb file.

I use Redis-rdb-tools on Windows (much more troublesome to install and deploy than on Linux).

1. Install Python

Redis-rdb-tools is written in Python, so install Python first. I chose Python 2.7 (Python 3.2 has problems and cannot run Redis-rdb-tools; if anyone has solved this, please share).



Don't forget to configure the environment variables:


2. Install rdbtools

Download: https://pypi.python.org/pypi/rdbtools



cd into the rdbtools directory and run the install command:

python setup.py install
3. Install pip

Download: https://pypi.python.org/pypi/pip



cd into the pip directory and run the install command:

python setup.py install
4. Install Redis-rdb-tools

Download: https://github.com/sripathikrishnan/redis-rdb-tools



cd into the Redis-rdb-tools directory and run the install command:

python setup.py install

That's it; Redis-rdb-tools is now installed.

cd into <Redis-rdb-tools root>\redis-rdb-tools-master\build\lib\rdbtools\cli, find the rdb.py file, and run the following on the command line:

rdb.py -help
5. Generate a memory report

rdb -c memory d:/redis/6379/dump.rdb > d:/memory.csv

This generates a memory report in CSV format. The columns are: database ID, data type, key, memory usage (bytes), and encoding. The memory usage covers the key, the value, and other overhead.

Note: the memory usage is approximate; in general it is slightly lower than the actual value.

The report contents can be filtered by key, database ID, or data type.

The memory report helps detect memory leaks caused by application logic, and also helps optimize Redis memory usage.
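As a sketch of such post-processing (assuming the column layout described above; the exact header names vary between rdbtools versions), the CSV report can be sorted with the standard csv module to find the largest keys:

```python
import csv

def top_keys(csv_path, n=10):
    """Return the n (key, size_in_bytes) pairs using the most memory,
    read from an rdbtools memory report."""
    with open(csv_path) as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: int(r['size_in_bytes']), reverse=True)
    return [(r['key'], int(r['size_in_bytes'])) for r in rows[:n]]
```

For example, top_keys('d:/memory.csv', 20) lists the 20 biggest keys in the report.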

Please include the following attribution when reposting:

"Reposted from http://www.linkedkeeper.com/detail/blog.action?bid=54 (by Frank)"
