
Python 2.8?


By Jake Edge

January 11, 2017

The appearance of a "Python 2.8" got the attention of the Python core developers in early December. It is based on Python 2.7, with features backported from Python 3.x. In general, there was little support for the effort (core developers tend to clearly see Python 3 as the way forward) but no opposition to it either. The Python license makes it clear that these kinds of efforts are legal and even encouraged; any real opposition to the project lies in its name.

Larry Hastings alerted the python-dev mailing list about the Python 2.8 project (which has since been renamed to "Placeholder" until another name can be found). It is a fork of Python 2.7.12 with features like function annotations, yield from, async/await, and the matrix multiplication operator ported from Python 3. It is meant to be a drop-in replacement for Python 2.7, so it won't have features that are incompatible with it. It is aimed at those who are not ready (or willing) to make the jump to Python 3, but want some of its features.
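For readers who have not used these features, the following standard Python 3 snippet illustrates a few of them; whether Placeholder accepts exactly this code is an assumption on my part, since the project is still in progress:

# Function annotations, "yield from", and the "@" matrix-multiplication
# operator (PEP 465) -- Python 3 syntax that the fork backports to a 2.7 base.

def chain(*iterables) -> list:       # function annotation
    def _gen():
        for it in iterables:
            yield from it            # delegate to a sub-iterator
    return list(_gen())

print(chain([1, 2], (3, 4)))         # [1, 2, 3, 4]

class Vec:
    def __init__(self, *xs):
        self.xs = xs
    def __matmul__(self, other):     # "@" dispatches here
        return sum(a * b for a, b in zip(self.xs, other.xs))

print(Vec(1, 2) @ Vec(3, 4))         # 11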

The name "Python2.8" implies a level of support, though; it also uses a name (Python) that is trademarked by the Python Software Foundation . Steven D'Apranorecalled discussions at the time of the decision to stop Python2.x development at 2.7:

I seem to recall that when we discussed the future of Python 2.x, and the decision that 2.7 would be the final version and there would be no 2.8, we reached a consensus that if anyone did backport Python 3 features to a Python 2 fork, they should not call it Python 2.8 as that could mislead people into thinking it was officially supported.

He and others called for the project to be renamed. An issue was filed for the project suggesting a rename. As it turns out, the owner of the project, Naftali Harris, is amenable to the change, which simplifies things greatly. Had that not been the case, though, it is not entirely clear that the PSF Trademark Usage Policy precludes using the name "Python" that way.

David Mertz, who is a member of the PSF Trademarks committee, believes that "Python 2.8" would be a misuse of the trademark and referred it to the committee. Terry Reedy agreed, saying that the project was a "derived work" and that clause 7 of the Python License does not automatically allow the use of the PSF trademarks.

But Marc-Andre Lemburg noted that the trademark policy is seemingly written to allow for uses like this. The policy says:

[...] stating accurately that software is written in the Python programming language, that it is compatible with the Python programming language, or that it contains the Python programming language, is always allowed. In those cases, you may use the word "Python" or the unaltered logos to indicate this, without our prior approval.

He pointed out that the project also fulfilled the license requirements by listing the differences from 2.7.12, as required in clause 3. But he agreed that a name change should be requested. For his part, Guido van Rossum is not particularly concerned by the existence of the project:

While I think the name is misleading and in violation of PSF policy and/or license, I am not too worried about this. I expect it will be tough to port libraries from Python 3 reliably because it is not true Python 3 (e.g. str/bytes). So then it's just a toy. Who cares about having 'async def' if there's no backport of asyncio?

Mertz, however, is not so sure. The existence of a "Python 2.8" may "serve as a pretext for managers to drag their feet further on migration plans", which will be detrimental to organizations where that happens. PEP 404 (the "Python 2.8 Un-release Schedule") makes it quite clear that the core development team (and, presumably, the PSF) is resolute that there will be no 2.8 release: "There never will be an official Python 2.8 release. It is an ex-release."

But there are various other projects that have "Python" in their names (IronPython, ActivePython, MicroPython, etc.) as well as projects with names that are suggestive of Python without directly using the name (Jython, PyPy, Cython, Mython, and so on). Where is the line to be drawn? As with all trademark questions, though, it comes down to a question of user confusion: will users expect that something called "Python2.8" is officially endorsed and supported by the PSF? The answer would seem to clearly be "yes".

Luckily, everyone is being fairly reasonable―no legal action has been needed or even really considered. The fact that Harris was willing to change the name obviated any need to resort to legal remedies. The GitHub issue thread is full of suggestions for alternate names, replete with Monty Python references―our communities love to bikeshed about names. There are also some snide comments about Python3 and the like, but overall the thread was constructive.

As far as new names go, an early favorite was "Pythonesque", but calling the binary "pesque" reminded some of the word "pesky", which is not quite what Harris is after (though "pyesque" might work). He renamed the project to "Placeholder" on December 12 "while we find a good permanent name that I like and that works for the PSF". The current leader appears to be Pyvergent (since Mython already exists and one might guess that Harris is not entirely serious about Placeholder). In any case, he said, the decision does not need to be made immediately.

At this point, Placeholder appears to largely be a one-developer project. Its GitHub history starts in October 2016 and some real progress has seemingly been made; quite a few features have been ported from Python3. The issues list shows some ambitious plans that might make it less of a "toy" than Van Rossum envisioned. If it ends up being popular and attracting more of a community, it could perhaps become a strong player in the Python world.

There is a balance to be struck on trademark policies for free-software projects. As we saw in the Debian-Mozilla trademark conflict, which resulted in the "Iceweasel" rebranding of Firefox in Debian, a policy that is too strictly enforced can drive communities apart.

Python 3 basics: variable-length function parameters, converting the passed-in arguments into a list


Opening verse:

Sincerely heed the Tathagata's words; at once cast off worldly fame and gain. May I be a disciple of Ksitigarbha, spreading this sutra far across Jambudvipa.

May all that I have learned go toward a conscientious blog. May those who come after restore the pure body of wisdom.

――――――――――――――――――――――――――――――――――――――――――

code:

# Variable-length parameters
def my_fun(*arguments):
    values = [x for x in arguments]
    print(values)

my_fun(1, 2, 3, 4, 6)

result:

============= RESTART: C:/Users/Administrator/Desktop/MyCode.py =============
[1, 2, 3, 4, 6]
>>>
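A side note (my addition, not from the original post): the list comprehension is not strictly necessary, because list() accepts any iterable, including the argument tuple:

def my_fun(*arguments):
    # *arguments arrives as a tuple; list() converts it directly
    return list(arguments)

print(my_fun(1, 2, 3, 4, 6))  # [1, 2, 3, 4, 6]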

――――――――――――――――――――――――――――――――――――――――――

The essence of this post lies in the technical part, and even more in the opening verse. Python version: 3.5; system: Windows 7.

Python is an excellent language, well worth the effort to learn. I am following Xiaojiayu's (小甲鱼) video tutorials, which I recommend.

I am a beginner, so if anything in this post can be improved, or is plainly wrong, please leave a comment; I will work to correct it and strive for a conscientious blog.

Note: this post is for study and research only. If I have unintentionally infringed your rights, please let me know promptly and I will correct it.

The Road to Python: Functions and Modules, Part 2


1. List comprehensions

list1 = [x*2 for x in range(1, 11)]

A list comprehension lets you create a list quickly, but a list's capacity is limited by memory. If a list held, say, a million elements and we only used a few of them, it would not only occupy a large amount of memory, most of that space would simply be wasted.

If the elements of a list can be derived by some algorithm, then we can keep computing subsequent values inside the loop, and there is no need to create the whole list and waste large amounts of memory.

2. Generators

In Python, this approach of computing values while looping is called a generator.

There are many ways to create a generator. The simplest is to change a list comprehension's [] to (), which creates a generator:

list1 = [x*2 for x in range(1, 11)]

g = (x*2 for x in range(1, 11))

g is a generator.

How do we print each element of the generator?

g.__next__()
# or
next(g)

A generator stores the algorithm. Each call to next(g) computes the value of g's next element, until the last element is reached; when there are no more elements, a StopIteration error is raised.

Of course, calling next(g) over and over like that is madness. The correct way is to use a for loop, because a generator is also an iterable object:

for i in g:
    print(i)

So after creating a generator, we basically never call next() on it; instead, we iterate over it with a for loop and don't have to worry about the StopIteration error.

Generators are very powerful. When the algorithm is too complex to express with the for-loop form of a list comprehension, it can also be implemented with a function.

For example, in the famous Fibonacci sequence, every number except the first two is the sum of the two numbers before it:

1, 1, 2, 3, 5, 8, 13, 21, 34, ...

The Fibonacci sequence cannot be written as a list comprehension, but it is easy to print it with a function:

def fib(max):
    n, a, b = 0, 1, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1
    return "done"

Note the assignment inside the loop:

a, b = b, a + b

It is effectively a tuple assignment: the right-hand side is built first, then unpacked. For example, with a = 1 and b = 2:

t = (b, a + b)  # t = (2, 3)
a, b = t        # a = 2, b = 3

Looking closely, you can see that the fib function actually defines the recurrence rule of the Fibonacci sequence: starting from the first element, it can derive any subsequent element, and this logic is very similar to a generator.

In other words, the function above is only one step away from a generator. To turn fib into a generator, just change print(b) into yield b:

def fib(max):
    n, a, b = 0, 1, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return "done"

This is the other way to define a generator: if a function definition contains the yield keyword, the function is no longer an ordinary function but a generator.

The hardest part to understand here is that a generator's execution flow differs from a function's. A function executes sequentially and returns when it reaches a return statement or its last line. A function that has become a generator executes on each call to next(), returns at the yield statement, and, when called again, resumes from the yield statement where it last returned.
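A minimal illustration of this pausing behavior (my own example, not from the original post):

def steps():
    print("step 1")
    yield 1
    print("step 2")
    yield 2

g = steps()
next(g)  # prints "step 1" and returns 1; the generator is now paused
next(g)  # resumes after the first yield: prints "step 2", returns 2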

When a for loop drives a generator, it cannot obtain the value of the generator's return statement. To get the return value, you must catch the StopIteration error; the return value is held in the value attribute of the StopIteration exception:

g = fib(6)
while True:
    try:
        x = next(g)
        print('g:', x)
    except StopIteration as e:
        print('Generator return value:', e.value)
        break

yield can also be used to achieve the effect of concurrent execution within a single thread:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# Author: jiachen

import time

def consumer(name):
    print("%s is ready to eat baozi!" % name)
    while True:
        baozi = yield
        print("Baozi [%s] arrived and was eaten by [%s]" % (baozi, name))

def producer():
    c1 = consumer("Jack")
    c2 = consumer("Tom")
    c1.__next__()
    c2.__next__()
    for i in range(1, 11):
        time.sleep(1)
        print("Made 1 baozi, split it into two halves!")
        c1.send(i)
        c2.send(i)

producer()

Announcing Pipenv


Pipenv is an experimental project that aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, pip, and virtualenv into one single toolchain. It features very pretty terminal colors.

It automatically creates and manages a virtualenv for your projects, as well as adds/removes packages from your Pipfile as you install/uninstall packages. The lock command generates a lockfile ( Pipfile.lock ).

Features

- Automatically finds your project home, recursively, by looking for a Pipfile.
- Automatically generates a Pipfile, if one doesn't exist.
- Automatically generates a Pipfile.lock, if one doesn't exist.
- Automatically creates a virtualenv in a standard location (project/.venv).
- Automatically adds packages to a Pipfile when they are installed.
- Automatically removes packages from a Pipfile when they are uninstalled.
- Also automatically updates pip.

The main commands are install, uninstall, and lock, which generates a Pipfile.lock. These are intended to replace $ pip install usage, as well as manual virtualenv management.

Basic Concepts

- A virtualenv will automatically be created when one doesn't exist.
- When no parameters are passed to install, all packages specified will be installed.
- When no parameters are passed to uninstall, all packages will be uninstalled.
- To initialize a Python 3 virtual environment, run $ pipenv --three first.
- To initialize a Python 2 virtual environment, run $ pipenv --two first.
- Otherwise, whatever $ which python points to will be the default.

Other Commands

- shell will spawn a shell with the virtualenv activated.
- run will run a given command from the virtualenv, with any arguments forwarded (e.g. $ pipenv run python).
- check asserts that PEP 508 requirements are being met by the current environment.

Usage

Developing Command Line Interpreters using python-cmd2



Many of you already know that I love command line applications, be it a simple command line tool or something more complex with a full command line interface/interpreter (CLI) attached to it. Back in my college days, I tried to write a few small applications in Java with broken implementations of a CLI. Later, when I started working with Python, I wanted to implement CLIs for various projects. Python already has a few great modules in the standard library, but I am going to talk about one external library which I prefer to use a lot. Sometimes even for fun :)

Welcome to python-cmd2

python-cmd2 is a Python module written on top of the cmd module from the standard library. It can be used as a drop-in replacement. Throughout this tutorial, we will learn how to use it for simple applications.

Installation

You can install it using pip, or standard package managers.

$ pip install cmd2
$ sudo dnf install python3-cmd2

First application

#!/usr/bin/env python3
from cmd2 import Cmd

class REPL(Cmd):
    def __init__(self):
        Cmd.__init__(self)

if __name__ == '__main__':
    app = REPL()
    app.cmdloop()

We created a class called REPL and later called the cmdloop method on an instance of that class. This gives us a minimal CLI. We can type ! followed by any bash command to execute it. Below, I called the ls command. You can also start the Python interpreter by using the py command.

$ python3 mycli.py
(Cmd)
(Cmd) !ls
a_test.png  badge.png  main.py  mycli.py
(Cmd) py
Python 3.5.2 (default, Sep 14 2016, 11:28:32)
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(REPL)

py <command>: Executes a Python command.
py: Enters interactive Python mode.
End with ``Ctrl-D`` (Unix) / ``Ctrl-Z`` (Windows), ``quit()``, ``exit()``.
Non-python commands can be issued with ``cmd("your command")``.
Run python code from external files with ``run("filename.py")``
>>>
(Cmd)

You can press Ctrl+d to quit or use quit/exit commands.

Let us add some commands

But before that, we should add a better prompt. We can get a different prompt by changing the prompt variable of the Cmd class. We can also add a banner by adding text to the intro variable.

#!/usr/bin/env python3
from cmd2 import Cmd

class REPL(Cmd):
    prompt = "life> "
    intro = "Welcome to the real world!"

    def __init__(self):
        Cmd.__init__(self)

if __name__ == '__main__':
    app = REPL()
    app.cmdloop()

$ python3 mycli.py
Welcome to the real world!
life>

Any method inside our REPL class whose name starts with do_ will become a command in our tool. For example, we will add a loadaverage command to show the load average of our system. We will read the /proc/loadavg file on our Linux computers to find this value.

#!/usr/bin/env python3
from cmd2 import Cmd

class REPL(Cmd):
    prompt = "life> "
    intro = "Welcome to the real world!"

    def __init__(self):
        Cmd.__init__(self)

    def do_loadaverage(self, line):
        with open('/proc/loadavg') as fobj:
            data = fobj.read()
        print(data)

if __name__ == '__main__':
    app = REPL()
    app.cmdloop()

The output looks like:

$ python3 mycli.py
Welcome to the real world!
life> loadaverage
0.42 0.23 0.24 1/1024 16516
life> loadaverage
0.39 0.23 0.24 1/1025 16517
life> loadaverage
0.39 0.23 0.24 1/1025 16517

If you do not know about the values in this file: the first three values are the system load averages over the last one, five, and fifteen minutes. Then we have the number of currently runnable processes and the total number of processes. The final column shows the last process ID used. You can also see that TAB will autocomplete the command in our shell. We can go back to past commands by pressing the arrow keys, and we can press Ctrl+r to do a reverse search like in the standard bash shell. This feature comes from the readline module. We can use it further and add a history file to our tool.

import os
import atexit
import readline
from cmd2 import Cmd

history_file = os.path.expanduser('~/.mycli_history')
if not os.path.exists(history_file):
    with open(history_file, "w") as fobj:
        fobj.write("")
readline.read_history_file(history_file)
atexit.register(readline.write_history_file, history_file)

class REPL(Cmd):
    prompt = "life> "
    intro = "Welcome to the real world!"

    def __init__(self):
        Cmd.__init__(self)

    def do_loadaverage(self, line):
        with open('/proc/loadavg') as fobj:
            data = fobj.read()
        print(data)

if __name__ == '__main__':
    app = REPL()
    app.cmdloop()

Taking input in the commands

We can use the positional argument of our do_ methods to accept arguments in our commands. Whatever input you pass to the command arrives in the line variable in our example, and we can use it however we like. For example, we can take any URL as input and then check its status. We will use the requests module for this example. We also use the Cmd.colorize method to add colors to our output text. I have added one extra command to make the tool more useful.


#!/usr/bin/env python3
import os
import atexit
import readline

import requests
from cmd2 import Cmd

history_file = os.path.expanduser('~/.mycli_history')
if not os.path.exists(history_file):
    with open(history_file, "w") as fobj:
        fobj.write("")
readline.read_history_file(history_file)
atexit.register(readline.write_history_file, history_file)

class REPL(Cmd):
    prompt = "life> "
    intro = "Welcome to the real world!"

    def __init__(self):
        Cmd.__init__(self)

    def do_loadaverage(self, line):
        with open('/proc/loadavg') as fobj:
            data = fobj.read()
        print(data)

    def do_status(self, line):
        if line:
            resp = requests.get(line)
            if resp.status_code == 200:
                print(self.colorize("200", "green"))
            else:
                print(self.colorize(str(resp.status_code), "red"))

    def do_alternativefacts(self, line):
        print(self.colorize("Lies! Pure lies, and more lies.", "red"))

if __name__ == '__main__':
    app = REPL()
    app.cmdloop()

Building these little shells can be a lot of fun. The documentation has all the details, but you should start by reading the standard library cmd documentation. There is also a video from PyCon 2010.

Python and successive approximation


I was doing some work in the yard and I wanted to know the smallest circle that would fit around a 4x6 inch post. Of course, just as a 2x4 is not 2 inches by 4 inches, a 4x6 post (what they call its "nominal" dimensions) is actually 3.5 inches by 5.5 inches, so this smallest circle has as its diameter the diagonal of that 3.5x5.5-inch rectangle. Pythagoras tells us that will be the square root of the sum of 3.5 squared and 5.5 squared. At the time, I only had my cell phone, which has a calculator with only basic math operations, so how do you get the square root of 42.5? Successive approximation.

You make your initial guess, knowing that it is greater than 6 but less than 7, and try 6.2 maybe, then 6.3, and so on. The number 6.5 is pretty close with a square of 42.25, so you go to the next decimal place. Finally, you get as far as 6.5192, and say you're close enough. You don't need a square root function on your calculator for this.

Some time ago it occurred to me that the long-hand way of "calculating" a square root is nothing more than this same method on paper, so I thought it would be interesting to teach Python how to do this. Before I go any further, I would add that I am fully aware that the Python math module has a sqrt function for this, but that wouldn't be nearly as much fun as creating my own script.

My approach was to have the script ask for the value for which the square root is desired, then ask for the number of decimal places, and then go through the process of progressively approximating the square root, which is going to involve a lot of repetition, testing, and retesting. What I wanted then was some sort of incrementing loop, with a test to see if I was below or above the desired value. Here is the final result:

#!/usr/bin/env python
# sqrt.py - find square root
# (Python 2 style: print statements, and input() evaluates numbers)

def sqtest(num, orig, ind):
    sq = 0
    while sq <= orig:
        num = num + ind
        sq = num * num
    num = num - ind
    return num

n = float(0)  # this is our working number. We start at zero and increment.
innum = input("For what number would you like to get the square root? ")
i = input("To how many decimal places? ")
inc = float(10)
i += 1
if (innum < 0):
    i = 0
while i:
    inc = inc * 0.1
    n = sqtest(n, innum, inc)
    i -= 1
if (innum >= 0):
    sq = n * n
    print "Approximate square root of " + str(innum) + " is " + str(n) + "\n(Actual square of this would be " + str(sq) + ")"
else:
    print "This would be an imaginary number"

For now, skip over the indented section, called a function, which begins with def sqtest. You're going to tell Python to begin the quest at zero, because you might at some point want the square root of a number between 0 and 1. After this, you ask for the number for which you need the square root, and then how many decimal places of precision you want.

Although it looks like your initial increment is going to be 10, it is multiplied by 0.1 at the top of the loop, so you can smoothly work with decimals later. Add 1 to the number of decimal places because you will have a cycle for units and above before you get to the decimals. Next, you check to see if the number input was less than zero, so you don't waste time on imaginary numbers.

Now comes the loop, where you send your working number n, your original number innum, and the current increment, initially 10 x 0.1 = 1, off to the sqtest function.

In sqtest, you receive the values of your current working number, the number whose square root you need, and the current incrementing value. After clearing any prior value, you keep adding the increment and checking the square of the working number; once the square exceeds innum, you back up one incremental notch and send back the new working number. The while loop lower down now takes your last increment value, takes one-tenth of it, and you're off to sqtest again. You repeat until you've finished the requested number of decimal places.

Here is a sample output:

For what number would you like to get the square root? 28837465
To how many decimal places? 6
Approximate square root of 28837465 is 5370.052606
(Actual square of this would be 28837464.9912)

Notice that I not only display the square root, but also show what the actual square of that would be, for a better sense of how close I am. One thing to keep in mind is the uncertainty of the last digit. Because of the way this script works, this value is truncated at 6 decimal places, not rounded, so you don't know whether the seventh decimal place would be greater or less than 5. It's better to have more places than you think you need, so that any rounding you might decide to do is accurate.
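As a quick sanity check (my addition, not part of the original article), the standard library agrees with the truncated result:

import math
print math.sqrt(28837465)  # approximately 5370.0526068, matching the truncated 5370.052606 above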

You might think that starting from 0 and incrementing by 1 would be time-consuming for large numbers, but I think you'll be surprised how fast the interpreter runs here―the square root of this 8-digit number with 6 decimal places was pretty instantaneous. For real-world numbers, this is quite adequate, and even with 15-digit numbers plus 6 decimal places, it took perhaps 3 seconds. However, at some point, there might be issues with the number of significant digits.

Building and Parsing XML Document using Python


In this blog post, we will see how to create an XML document and parse an XML document using Python.

Python supports working with various forms of structured data markup. This includes modules to work with the Hypertext Markup Language (HTML) and the Extensible Markup Language (XML).

In addition to parsing capabilities, xml.etree.ElementTree supports creating well-formed XML documents from Element objects constructed in an application. The Element class used when a document is parsed also knows how to generate a serialized form of its contents, which can then be written to a file.

Creating an XML Document:

To create an element instance, use the Element constructor or the SubElement() factory function.

import xml.etree.ElementTree as xml
filename = "/home/abc/Desktop/test_xml.xml"
root = xml.Element("Users")
userelement = xml.Element("user")
root.append(userelement)

If you serialize the tree at this point, you will see output like this:

<Users>
<user>
</user>
</Users>

Let us add the user's child elements:

uid = xml.SubElement(userelement, "uid")
uid.text = "1"
FirstName = xml.SubElement(userelement, "FirstName")
FirstName.text = "testuser"
LastName = xml.SubElement(userelement, "LastName")
LastName.text = "testuser"
Email = xml.SubElement(userelement, "Email")
Email.text = "testuser@test.com"
state = xml.SubElement(userelement, "state")
state.text = "xyz"
location = xml.SubElement(userelement, "location")
location.text = "abc"
tree = xml.ElementTree(root)
with open(filename, "w") as fh:
    tree.write(fh)

First we create the root element using ElementTree's Element function. Then we create a user element and append it to the root. Next we create SubElements by passing the user Element object (userelement) to SubElement along with a name, like "FirstName". Then for each SubElement we set its text property to give it a value. At the end of the script, we create an ElementTree and use it to write the XML out to a file.

If you run this, the response will be as follows:

<Users>
<user>
<uid>1</uid>
<FirstName>testuser</FirstName>
<LastName>testuser</LastName>
<Email>testuser@test.com</Email>
<state>xyz</state>
<location>abc</location>
</user>
</Users>

Parsing the XML Document:

import xml.etree.ElementTree as ET
tree = ET.parse('Your_XML_file_path')
root = tree.getroot()

Here getroot() will return the root Element of the XML document.

<Users version="1.0" language="SPA">
<user>
<uid>1</uid>
<FirstName>testuser</FirstName>
<LastName>testuser</LastName>
<Email>testuser@test.com</Email>
<state>xyz</state>
<location>abc</location>
</user>
</Users>

Let us take the above example for parsing.

Here getroot() will return the "Users" Element, as "<Element 'Users' at 0x7f9095365d90>".

root.tag --> This returns only the tag; in our example, "Users".

root.attrib --> This returns the attributes of the root element; in our example, "{'version': '1.0', 'language': 'SPA'}".

root.getchildren() --> This returns the child elements of that root in a list.

In our Example: [<Element 'uid' at 0x7f0c307eee10>, <Element 'FirstName' at 0x7f0c307eeed0>, <Element 'LastName' at 0x7f0c307eef90>, <Element 'Email' at 0x7f0c307f0150>, <Element 'state' at 0x7f0c307f0190>, <Element 'location' at 0x7f0c307f01d0>].

From this you can read the text of the required element and then insert it into your database.
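For example, a minimal loop over the parsed tree (my own snippet, using the element names from the sample document above; the file path is a placeholder):

import xml.etree.ElementTree as ET

tree = ET.parse('Your_XML_file_path')
root = tree.getroot()

for user in root.findall('user'):
    uid = user.find('uid').text
    email = user.find('Email').text
    print(uid, email)  # e.g. values you might insert into your database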

Develop your Python applications easily in clean Docker environments

Cage

Develop and run your Python applications in clean Docker environments. Cage aims to be as easy to use and as familiar as possible.

Requirements

- Docker
- Python 3.5+

Installation

$ pip install pycage

Usage

NOTE: Docker should be running before using any of the Cage commands. All commands should be run from your project directory!

Create a new cage

$ cage app:create <name_of_your_cage>

This command will create a new Dockerfile in the root of your project and initialize all the necessary environment files.

Activate the new environment

$ source <name_of_your_cage>/bin/activate

This command should be very familiar to virtualenv users. This adjusts your environment to make sure you use the caged python binaries.

Run your project

(<name_of_your_cage>)$ python <file.py>

Running a script with the caged Python binary will build a new Docker image with your project files, create a new container using that image, and run the Python command you specified.

Passing environment variables

You can pass environment variables to the cage by creating an ENV file in the root of your project. The file format is VAR=VALUE.

Example:

AVAR=value1 BVAR=value2

You can also place this file anywhere in your project. If it is not in the root of your project you can specify the path to it by passing the ENV variable when running a python script.

(<name_of_your_cage>)$ ENV=path/to/ENV python <file.py>

Expose a TCP Port

To expose a TCP port from the cage, use the PORT environment variable defined in your ENV file.

Example:

PORT=5000

Specifying it in the ENV file will also make it available in the cage so you can bind your apps to it easily.

Working with requirements

The current version of Cage only supports dependencies declared in a requirements file:

(<name_of_your_cage>)$ pip install -r requirements.txt

You cannot use any other pip commands with this version. This includes simple pip install commands like:

(<name_of_your_cage>)$ pip install <dependency>

Stop a cage

(<name_of_your_cage>)$ cage app:stop <name_of_your_cage>

Deactivating the environment

(<name_of_your_cage>)$ deactivate

This will return your environment to the state it was in before activating the Cage environment.

Caveats

- THIS IS A WORK IN PROGRESS. DO NOT USE THIS IF YOU DON'T KNOW WHAT YOU ARE DOING.
- You can only use pip with a requirements file. No other pip commands are supported.
- You can only expose ONE TCP port from the container, and it will be mapped to the same port number on the host.
- Tested only on OSX and Linux.

License

Cage is released under the MIT license. See LICENSE for details.

Contact

Follow me on twitter @mcostea


Mathematical Modules in Python: Random


Randomness is all around us. When you flip a coin or roll a die, you can never be sure of the final outcome. This unpredictability has a lot of applications like determining the winners of a lucky draw or generating test cases for an experiment with random values produced based on an algorithm.

Keeping this usefulness in mind, Python provides us with the random module. You can use it in games to spawn enemies randomly or to shuffle the elements in a list.

How Does Random Work?

Nearly all of the functions in this module depend on the basic random() function, which generates a random float greater than or equal to zero and less than one. Python uses the Mersenne Twister to generate the floats. It produces 53-bit precision floats with a period of 2**19937-1. It is actually the most widely used general-purpose pseudo-random number generator.

Sometimes, you want the random number generator to reproduce the sequence of numbers it created the first time. This can be achieved by providing the same seed value both times to the generator using the seed(s, version) function. If the parameter s is omitted, the generator will use the current system time to generate the numbers. Here is an example:

import random
random.seed(100)
random.random()
# returns 0.1456692551041303
random.random()
# returns 0.45492700451402135

Keep in mind that unlike a coin flip, the module generates pseudo-random numbers which are completely deterministic, so it is not suitable for cryptographic purposes.
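As an aside (my addition, not part of the original article): when you do need unpredictable values, for example for tokens, the same module offers random.SystemRandom, which draws from the operating system's entropy source instead of the Mersenne Twister:

import random

secure = random.SystemRandom()  # backed by os.urandom(); seed() has no effect
secure.random()                 # a float in [0.0, 1.0), not reproducible
secure.randint(1, 6)            # same interface as the module-level functions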

Generating Random Integers

The module has two different functions for generating random integers. You can use randrange(a) to generate a random whole number smaller than a .

Similarly, you can use randrange(a, b[, step]) to generate a random number from range(a, b, step). For example, random.randrange(0, 100, 3) will only return numbers between 0 and 100 that are also divisible by 3.

If you know both the lower and upper limit between which you want to generate the numbers, you can use a simpler and more intuitive function called randint(a, b) . It is simply an alias for randrange(a,b+1) .

import random
random.randrange(100)
# returns 65
random.randrange(100)
# returns 98
random.randrange(0, 100, 3)
# returns 33
random.randrange(0, 100, 3)
# returns 75
random.randint(1,6)
# returns 4
random.randint(1,6)
# returns 6

Functions for Sequences

To select a random element from a given non-empty sequence, you can use the choice(seq) function. With randint(), you are limited to a selection of numbers from a given range. The choice(seq) function allows you to choose an element from any sequence you want.

Another good thing about this function is that it is not limited to just numbers. It can select any type of element randomly from a sequence. For example, the name of the winner of a lucky draw among five different people, provided as a string, can be determined using this function easily.

If you want to shuffle a sequence instead of selecting a random element from it, you can use the shuffle(seq) function. This results in an in-place shuffle of the sequence. For a sequence of just n = 10 elements, there are already n! = 3,628,800 different arrangements. With a larger sequence, the number of possible permutations is even higher; this implies that the function can never generate all the permutations of a large sequence.

Let's say you have to pick 50 students from a group of 100 students to go on a trip.

At this point, you may be tempted to use the choice(seq) function. The problem is that you would have to call it about 50 times, and that is in the best-case scenario where it never chooses the same student twice.

A better solution is to use the sample(seq, k) function. It will return a list of k unique elements from the given sequence. The original sequence is left unchanged. The elements in the resulting list will be in selection order. If k is greater than the number of elements in the sequence itself, a ValueError will be raised.

import random
ids = [1, 8, 10, 12, 15, 17, 25]
random.choice(ids) # returns 8
random.choice(ids) # returns 15
names = ['Tom', 'Harry', 'Andrew', 'Robert']
random.choice(names) # returns Tom
random.choice(names) # returns Robert
random.shuffle(names)
names
# returns ['Robert', 'Andrew', 'Tom', 'Harry']
random.sample(names, 2)
# returns ['Andrew', 'Robert']
random.sample(names, 2)
# returns ['Tom', 'Robert']
names
# returns ['Robert', 'Andrew', 'Tom', 'Harry']

As you can see, shuffle(seq) modified the original list, but sample(seq, k) kept it intact.

Generating Random Floats

In this section, you will learn about functions that can be used to generate random numbers based on specific real-value distributions. The parameters of most of these functions are named after the corresponding variable in that distribution's actual equation.

When you just want a number between 0 and 1, you can use the random() function. If you want the number to be in a specific range, you can use the uniform(a, b) function with a and b as the lower and higher limits respectively.

Let's say you need to generate a random number between low and high such that it has a higher probability of lying in the vicinity of another number mode . You can do this with the triangular(low, high, mode) function. The low and high values will be 0 and 1 by default. Similarly, the mode value defaults to the mid-point of the low and high value, resulting in a symmetrical distribution.

There are a lot of other functions as well to generate random numbers based on different distributions. As an example, you can use normalvariate(mu, sigma) to generate a random number based on a normal distribution, with mu as mean and sigma as standard deviation.

import random
random.random()
# returns 0.8053547502449923
random.random()
# returns 0.05966180559620815
random.uniform(1, 20)
# returns 11.970525425108205
random.uniform(1, 20)
# returns 7.731292430291898
random.triangular(1, 100, 80)
# returns 42.328674062298816
random.triangular(1, 100, 80)
# returns 73.54693076132074

Weighted Probabilities

As we just saw, it is possible to generate random numbers with uniform distribution as well as triangular or normal distribution. Even in a finite range like 0 to 100, there are an infinite number of floats that can be generated. What if there is a finite set of elements and you want to add more weight to some specific values while selecting a random number? This situation is common in lottery systems where numbers with little reward are given a high weighting.

If it is acceptable for your application to have weights that are integer values, you can create a list of elements whose frequency depends on their weight. You can then use the choice(seq) function to select an element from that list; values with larger weights will be selected more often.
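A minimal sketch of that technique (my own example; the values and weights are made up):

import random

# Integer weights: build a pool in which each value appears as many
# times as its weight, then choose uniformly from the pool.
weights = {'small prize': 5, 'medium prize': 3, 'jackpot': 1}
pool = [value for value, w in weights.items() for _ in range(w)]

print(random.choice(pool))  # 'jackpot' comes up roughly 1 time in 9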

Python Standard Library Notes (1): the string module


The string module contains a number of useful constants and classes, as well as some obsolete legacy functions, and is used for string operations.

1. Common methods

str.capitalize() : Capitalize the first letter of the string
str.center(width) : Pad the string with spaces to length width, with the original content centered
str.count(s) : Return the number of times s appears in str
str.decode(encoding='UTF-8', errors='strict') : Decode the string using the given encoding
str.encode(encoding='UTF-8', errors='strict') : Encode the string using the given encoding
str.endswith(s) : Return whether str ends with the string s
str.find(s) : Return the index of s within str, or -1 if s is not found
str.index(s) : Like find(), but raises an exception if s is not in str
str.isalnum() : Return True if str has at least one character and all characters are letters or digits, else False
str.isalpha() : Return True if str has at least one character and all characters are letters, else False
str.isdigit() : Return True if str contains only digits, else False
str.islower() : Return True if str contains cased characters and all of them are lowercase, else False
str.isspace() : Return True if str contains only whitespace, else False
str.istitle() : Return True if str is titlecased (each word begins with an uppercase letter), else False
str.isupper() : Return True if str contains cased characters and all of them are uppercase, else False
str.ljust(width) : Return str left-justified, padded with spaces to length width
str.lower() : Convert all uppercase characters in str to lowercase
str.lstrip() : Remove leading whitespace from str
str.partition(s) : Split str around s into a tuple of three values
str.replace(a, b) : Replace occurrences of a in str with b
str.rfind(s) : Like find(), but searches from the right
str.rindex(s) : Like index(), but searches from the right
str.rjust(width) : Return str right-justified, padded with spaces to length width
str.rpartition(s) : Like partition(), but searches from the right
str.rstrip() : Remove trailing whitespace from str
str.split(s) : Split str using s as the separator
str.splitlines() : Split on line boundaries, returning a list with one element per line
str.startswith(s) : Return True if str starts with s, else False
str.strip() : Equivalent to applying both rstrip() and lstrip()
str.title() : Return a titlecased copy of str: every word starts uppercase, the remaining letters are lowercase
str.upper() : Return a copy of str with all characters uppercased
str.zfill(width) : Return a string of length width, with str right-justified and padded with zeros on the left

2. String constants

string.ascii_lowercase : The lowercase letters 'abcdefghijklmnopqrstuvwxyz'
string.ascii_uppercase : The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
string.ascii_letters : The concatenation of ascii_lowercase and ascii_uppercase
string.digits : The digits '0123456789'
string.hexdigits : The string '0123456789abcdefABCDEF'
string.letters : The string 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
string.lowercase : The lowercase letters 'abcdefghijklmnopqrstuvwxyz'
string.octdigits : The string '01234567'
string.punctuation : All punctuation characters
string.printable : The printable characters: digits, letters, punctuation, and whitespace
string.uppercase : The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
string.whitespace : The whitespace characters '\t\n\x0b\x0c\r '
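A few of these methods in action (my own quick demo, in the same Python 2 style as the examples below):

s = 'hello world'
print s.capitalize()    # 'Hello world'
print s.center(15)      # '  hello world  '
print s.partition(' ')  # ('hello', ' ', 'world')
print '42'.zfill(5)     # '00042'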

3. The Template string class

With string.Template you can define a custom substitution syntax for Python strings. Here is a concrete example:

>>> from string import Template
>>> s = Template('$who like $what')
>>> print s.substitute(who='i', what='python')
i like python
>>> print s.safe_substitute(who='i')  # does not raise when a key is missing
i like $what
>>> Template('${who}LikePython').substitute(who='I')  # use ${} inside a word
'ILikePython'

Template also has more advanced uses: by subclassing string.Template and overriding the delimiter and idpattern (substitution pattern) class attributes, you can build templates with different syntaxes.

import string

template_text = '''
Delimiter : $de
Replaced : %with_underscore
Ignored : %notunderscored
'''

d = {'de': 'not replaced',
     'with_underscore': 'replaced',
     'notunderscored': 'not replaced'}

class MyTemplate(string.Template):
    # Override the delimiter to "%"; the idpattern now requires an underscore (_)
    delimiter = '%'
    idpattern = '[a-z]+_[a-z]+'

print string.Template(template_text).safe_substitute(d)  # render with the stock Template
print MyTemplate(template_text).safe_substitute(d)       # render with the overridden MyTemplate

Output:

Delimiter : not replaced
Replaced : %with_underscore
Ignored : %notunderscored
Delimiter : $de
Replaced : replaced
Ignored : %notunderscored

As you can see, the stock Template only substitutes placeholders whose delimiter is $, while the overridden MyTemplate substitutes those whose delimiter is % and whose identifier contains an underscore.

Reuven Lerner: Data science + machine learning + Python course in Shanghai


Data science is changing our lives in dramatic ways, and just about every company out there wants to take advantage of the insights that we can gain from analyzing our data ― and then making predictions based on that analysis.

Python is a leading language (and some would say the leading language) for data scientists. So it shouldn't come as a surprise that in addition to teaching intro and advanced Python courses around the world, I'm increasingly also teaching courses in how to use Python for data science and machine learning. (Within the next few weeks, I expect to release a free e-mail course on how to use Pandas, an amazing Python library for manipulating data. I've already discussed it on my mailing list, with more discussion of the subject to come in the future.)

Next month (i.e., February 2017), I’ll be teaching a three-day course in the subject in Shanghai, China. The course will be in English (since my Chinese is not nearly good enough for giving lectures at this point), and will involve a lot of hands-on exercises, as well as lecture and discussion. And lots of bad jokes, of course.

Here’s the advertisement (in Chinese); if you’re interested in attending, contact me or the marketing folks at Trig, the company bringing me to China:

http://mp.weixin.qq.com/s/kNwRpwEdhwqjL22e4TdgLA

Can't make it to Shanghai? That's OK; maybe I can come to your city/country to teach! Just contact me at reuven@lerner.co.il, and we can chat about it in greater depth.

PyTennessee: PyTN Profiles: Greg Back and TEDxNashville


Speaker Profile: Greg Back( @gtback )

Greg is a deep and mysterious Python programmer with no bio, but don't let his mystique fool you! He's a wicked sharp local Python developer who is very active in our community.

Greg will be presenting “What Time is it Anyway?” at 1:00PM Sunday (2/5) in the auditorium. This talk will provide an overview of dates, times, and timezones, and why they can cause problems in real life and in Python programs. Solutions and helpful hints will be provided that use both the standard library and third-party packages like pytz and python-dateutil.



Sponsor Profile: TEDxNashville ( @tedxnashville )

TEDxNashville is the non-profit that has backed PyTennessee since we started, and it provides countless helpful resources. Without them, we wouldn't be able to put on PyTN each year. They just announced a stellar line-up of speakers for their 2017 event.

Handling webhooks using Django and ngrok


In this article we’ll go over how to handle webhooks using Django, create a webhook in GitHub, and test the webhook on your local machine using ngrok. But first a brief primer on webhooks.

If you’re already familiar with webhooks then feel free to skip past this first part.

What are webhooks?

Imagine you are writing an app that needs to be informed when an event occurs in another system. The event could be when a user sends a tweet or when the price of an item changes.

One way to know when the event occurs is to check every so often. For instance, your app could make a request to Twitter every 5 minutes asking "Has the user posted anything yet?" This is called polling, and it can be taxing on your servers because you must constantly make requests to external services.

Another way to know an event has occurred is to have the other service inform your app when things change. This can be accomplished using webhooks . With webhooks you no longer need to poll every 5 minutes or once a day. Instead, your app receives events in real-time .

Handling GitHubwebhooks

GitHub has a plethora of events that can trigger webhooks. The event we’ll handle is the default push event, which occurs when a user pushes commits, branches, or tags to a GitHub repository.

Let’s write some code that handles GitHub’s webhooks. We’re writing a Django app, so we’ll create a view function. Be sure to wire up this view to the URL /hooks/handle_github .

Below is a view function that will handle GitHub webhooks based on the instructions in GitHub’s documentation . For this to work, you’ll need to first add a GITHUB_WEBHOOK_SECRET to your settings file. Think of this as your webhook’s password, so make it a long string with lots of random characters. Also, remember it, because we’ll need it later.

Requests from GitHub come into our app through the handle_github_hook view function. The view ensures the request is authorized, loads the payload JSON, does something useful with the payload, and returns an HTTP response.
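The view code itself was embedded in the original post and is not reproduced here. The following is my own sketch of such a handler, based on GitHub's documented X-Hub-Signature scheme; the view name handle_github_hook and the GITHUB_WEBHOOK_SECRET setting come from the article, while the body is a reconstruction rather than the author's exact code:

import hashlib
import hmac
import json

from django.conf import settings
from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt  # GitHub cannot send a CSRF token
@require_POST
def handle_github_hook(request):
    # GitHub signs the raw request body with the shared secret:
    # X-Hub-Signature: sha1=<hex HMAC digest>
    signature = request.META.get('HTTP_X_HUB_SIGNATURE', '')
    expected = 'sha1=' + hmac.new(
        settings.GITHUB_WEBHOOK_SECRET.encode('utf-8'),
        msg=request.body,
        digestmod=hashlib.sha1,
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return HttpResponseForbidden('Invalid signature')

    payload = json.loads(request.body.decode('utf-8'))
    # ... do something useful with the payload, quickly or in a task queue ...
    return HttpResponse('Webhook received')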

When writing your handler, keep in mind that GitHub expects you to respond to webhooks within 30 seconds . If the task you need to perform can happen quickly then do it synchronously. Otherwise it’s probably best to put the task in the background using Celery or RQ .

Now that we have code that handles webhooks, we need to test it.

Enter ngrok

Webhooks take some work to test locally. That’s because by their very nature they expect a publicly accessible URL to send requests to, and most of our development laptops don’t have that. Luckily there is a very easy way that we can create a public URL that leads right to our development server: ngrok .

Ngrok is a command line application you can use to expose your development machine to the Internet. To install ngrok, go to ngrok.io and follow their installation steps. It’s as simple as downloading and unzipping. I’ll wait while you go off and do that.

♪ Jeopardy theme song ♪

Is ngrok installed now? Great! To run it, open up your terminal and enter the following.

ngrok http 8000

This should start up a secure tunnel that is connected to your local HTTP port. It will look something like this:

ngrok by @inconshreveable                               (Ctrl+C to quit)

Session Status                online
Version                       2.1.18
Region                        United States (us)
Web Interface                 http://127.0.0.1:4041
Forwarding                    http://dda5f8fd.ngrok.io -> localhost:8000
Forwarding                    https://dda5f8fd.ngrok.io -> localhost:8000

Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00

The forwarding URL http://dda5f8fd.ngrok.io is what I’ll use for the webhook. Your URL will be different, so use whatever ngrok provides.

Setting up ourwebhook

Now that we have code to handle webhooks and a publicly accessible URL, let’s set up a webhook in GitHub.

You can add a webhook to a repository programmatically using GitHub’s API. In fact, that’s what you should do to automate this whole process. In the spirit of brevity, however, we’ll add a webhook through the GitHub UI. To do that, go to one of your repositories in GitHub, select Settings, then Webhooks.

Add your ngrok URL + /hooks/handle_github to the Payload URL field. Next, add the secret string from your Django settings to the Secret field. GitHub will send along this secret string so that you can verify the request is really coming from them. Finally, choose the events you would like GitHub to notify your app about. When it’s all said and done, the form should look something like this:


Setting up our webhook in GitHub

Click the Add webhook button and your webhook is ready for action.

Testing itout

It’s finally time to confirm this whole thing is working. To do that, start up the development Django server by running python manage.py runserver . This should start your server on port 8000, which is the port ngrok expects.

Next we'll need to trigger an event in GitHub. If your webhook is configured to handle the default push event, then pushing a branch to GitHub will suffice.

Clone the repository where you created your webhook. For example:

$ git clone https://github.com/grantmcconnaughey/django-field-history.git

Now, create a new branch and push it back to GitHub.

$ git checkout -b webhook_test
$ touch new_file.py
$ git add new_file.py
$ git commit -m "Testing webhooks"
$ git push origin webhook_test

This will trigger the push event, and GitHub will make a request to the ngrok URL you entered in your repo's settings. This means you should see some activity in your local development server.

PyTennessee: PyTN Profiles: Derik Pell and Meeple Mountain


Speaker Profile: Derik Pell ( @_gignosko_ )

Derik got his Master's in Computer Science from the University of Illinois at Springfield, where he was lucky enough to study data from many different angles. That work started him on a love affair with picking apart data while writing the best code he can.

He is happy to have lived through Java so he can now enjoy writing Python at Emma in Nashville, TN, where he lives with his partner, his son, and the head of the household, a chihuahua named Kala.

Derik will be presenting "It's time for Functional Programming! Maybe." at 1:00PM Sunday (2/5) in Room 100. Functional programming is arguably the biggest buzzword in programming these days, and with good reason, since it can reduce lines of code while adding safety and efficiency. In this talk, we'll get an understanding of the foundations of FP, look at what Python does and does not have to help us out, and try to decide if and when it's a good idea to use these FP concepts in a Pythonic environment.



Sponsor Profile: Meeple Mountain ( @meeplemountain )

Meeple Mountain delivers board game reviews, board game news, editorial content, and other articles relating to tabletop board games that are fun for adults and children alike. They also host a monthly board game night in Nashville with nearly 80 attendees, and they are the driving force behind the PyTN TableTop Gaming After Party. Register here.

Mike Driscoll: PyDev of the Week: Mark Haase


This week we welcome Mark Haase as our PyDev of the Week. Mark is the author of PEP 505 None-aware operators . You can check out what projects Mark is interested in over on Github . He also has a programming blog that covers various programming topics. Let’s take some time getting to know Mark better!


Can you tell us a little about yourself (hobbies, education, etc):

As a child, I always loved building things, like Legos. I sketched designs for other projects that I daydreamed about building ― a hang glider! ― but as a kid I was obviously limited in terms of skills, tools, and resources. Nobody was going to hand me an arc-welder, after all.

I started programming when I was about 12 or 13. I learned BASIC first, then Java a couple years later. Programming didn't have the same limitations as physical things. I didn't need a whole workshop with tools and materials, just a computer and a compiler. I minored in Comp Sci in college (a mistake, in retrospect; I should have majored in it!) and managed to find a job as a programmer when I graduated. It's been my career for 10 years now.

Why did you start using Python?

I started working with a roommate on a pet project and he encouraged me to try Python. After several years working with some other, not-to-be-named web development languages, Python was a relief. It felt sensible. It felt like it had been carefully designed. It felt productive ― ideas could be turned into code easily. Python made programming fun again!

I switched jobs soon after that and had several new coworkers who liked Python, and Python has been a major part of my work ever since. As I learned more and more about Python, I became totally fascinated with the design and features of the language and ― equally important ― the culture and mindset that surrounds it.

What other programming languages do you know and which is your favorite?

Languages that I know well enough to do real work in? Perl, PHP, JavaScript, Dart, and Python. Python is my clear favorite, but using Dart as a JavaScript replacement has also been enjoyable and refreshing.

What projects are you working on now?

I work with a company called Hyperion Gray on a DARPA (a progenitor of the internet) research & development contract. We have a wide range of research, but my personal focus is combining web crawling and machine learning: building crawlers that are more intelligent about what links to follow, more resilient in the face of errors (crawling loops, soft 404s, etc.), and extracting the important content on each page. We love working with DARPA, and we are able to open source the majority of our work.

Which Python libraries are your favorite (core or 3rd party)?

Flask is the first major package I started using. I shopped around a bit and settled on Flask because it wasn’t monolithic and the documentation was fantastic. I’ve never regretted the choice: the more I learned about Flask, the more I enjoyed it. I’ve continued to use it on new projects over the last 5 years and it still feels like a “just right” tool.

Requests is another phenomenal library, because it wraps a chaotic and difficult core API with one of the most brilliant APIs I’ve ever seen: simple enough to memorize the basics but powerful enough to not limit what you can do with it.

There is also a cluster of numerical and scientific computing packages (NumPy, SciPy, etc.) that are making Python an important language to know for a wide range of data science applications. I can combine the productivity of working in a high-level language like Python with the performance characteristics of low-level, highly tuned code.

Where do you see Python going as a programming language?

I’ve always thought of Python as a glue language. In the web development world, performance critical code is handled by databases, web servers, message queues, etc. You can write your business logic in Python and pass data back and forth between those highly optimized, highly tuned components. I wouldn’t write a database server in Python, and I wouldn’t write my business logic in C++. Those are different tools that complement, not compete.

That same approach is taking Python into new areas. High performance machine learning and deep learning are going to be hugely important in the future of computing, and Python ― a notoriously “slow” language ― is a reasonable language for building deep learning systems. Why? Because the performance critical code is optimized, vectorized, and tuned in low level libraries like NumPy and Theano. A small amount of Python code can produce simple and amazing machine learning systems. This is a great example: http://neuralnetworksanddeeplearning.com/

I personally don’t feel a need to learn specialized, domain-specific programming languages, like Octave for linear algebra, or R for statistics. Python can do all these things. At work, we even create presentations in IPython notebooks!

What is your take on the current market for Python programmers?

I don't know! The last time I applied for a job was about 5 years ago, which is also roughly the time when I started learning Python. Anecdotally, it seems like job listings are always dominated by whatever flavor-of-the-month technology is hot at the time. If I was looking for a job, I would feel more confident applying to any job with my 5 years of Python experience than with 6 months of the current flavor of the month. Python is so versatile (see my previous answer) that I assume Python will continue to be highly relevant in the coming years.

Thanks so much for doing the interview!

Seam carving with OpenCV, Python, and scikit-image



Easily one of my all-time favorite papers in computer vision literature is Seam Carving for Content-Aware Image Resizing by Avidan and Shamir from Mitsubishi Electric Research Labs (MERL).

Originally published in the SIGGRAPH 2007 proceedings, I read this paper for the first time during my computational photography class as an undergraduate student.

This paper, along with the demo video from the authors, made the algorithm feel like magic, especially to a student who was just getting his feet wet in the world of computer vision and image processing.

The seam carving algorithm works by finding connected pixels called seams with low energy (i.e., least important) that traverse the entire image from left-to-right or top-to-bottom.

These seams are then removed from the original image, allowing us to resize the image while preserving the most salient regions(the original algorithm also supports adding seams , allowing us to increase the image size as well).

In the remainder of today's blog post I'll discuss the seam carving algorithm, how it works, and how to apply seam carving using Python, OpenCV, and scikit-image.

To learn more about this classic computer vision algorithm, just keep reading!

Looking for the source code to this post?

Jump right to the downloads section.

Seam carving with OpenCV, Python, and scikit-image

The first part of this blog post will discuss what the seam carving algorithm is and why we may prefer to use it over traditional resizing methods.

From there I’ll demonstrate how to use seam carving using OpenCV, Python, and scikit-image.

Finally, I’ll wrap up this tutorial by providing a demonstrationof the seam carving algorithm in action.

The seam carving algorithm

Introduced by Avidan and Shamir in 2007, the seam carving algorithm is used to resize (both downsample and upsample) an image by removing/adding seams that have low energy.

Seams are defined as connected pixels that flow from left-to-right or top-to-bottom provided that they traverse the entire width/height of the image.

Thus, in order to perform seam carving we need two important inputs:

1. The original image. This is the input image that we want to resize.
2. The energy map. We derive the energy map from the original image. The energy map should represent the most salient regions of the image. Typically, this is either the gradient magnitude representation (i.e., the output of Sobel, Scharr, etc. operators), entropy maps, or saliency maps. A sketch of computing one follows this list.
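A minimal sketch of computing a gradient-magnitude energy map (my code, not from the post itself; the filename is a placeholder):

import cv2
from skimage import filters

image = cv2.imread("input.jpg")                   # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
energy_map = filters.sobel(gray.astype("float"))  # gradient magnitude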

For example, let’s take a look at the following image:


Figure 1: Our input image to the seam carving algorithm [source: Wikipedia].

Using this image as an input, we can compute the gradient magnitude to serve as our energy map:


Figure 2: Computing the gradient magnitude representation of the input image. This representation will serve as our energy map [source: Wikipedia].
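
As a minimal sketch of this step (the input path below is a placeholder), the gradient magnitude energy map can be computed with OpenCV’s Sobel operator:

import cv2
import numpy as np

# Placeholder input path; the grayscale image is all we need for the energy map.
gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Gradient magnitude via Sobel derivatives: high energy marks salient edges.
dx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
dy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
energy = np.sqrt(dx ** 2 + dy ** 2)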

Given our energy map we can then generate a set of seams that either span the image from left-to-right or top-to-bottom:


Figure 3: Generating seams from the energy map. Low-energy seams can be removed/duplicated to perform the actual resizing [source: Wikipedia].

These seams are efficiently computed via dynamic programming and are sorted by their energy. Seams with low energy are placed at the front of the list, while high-energy seams are placed at the back of the list.
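
As a rough sketch of that dynamic-programming step (this is illustrative code, not the post’s own), a single minimum-energy vertical seam can be recovered from the energy map like this:

import numpy as np

def find_vertical_seam(energy):
    # Cumulative minimum energy: each pixel adds the cheapest of its three
    # upper neighbors (up-left, up, up-right).
    h, w = energy.shape
    cost = energy.astype(np.float64)
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            cost[y, x] += cost[y - 1, lo:hi].min()
    # Backtrack from the cheapest bottom pixel to recover the seam.
    seam = [int(np.argmin(cost[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam.append(lo + int(np.argmin(cost[y, lo:hi])))
    return seam[::-1]  # one column index per row, top to bottom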

To resize an image, we either remove seams with low energy to downsample an image, or we duplicate seams with low energy to upsample the image.

Below is an example of taking the original image, finding the seams with the lowest energy, and then removing them to reduce the final size of the output image:


Figure 4: Removing low-energy seams from an image using the seam carving algorithm [source: Wikipedia]. For more information on the seam carving algorithm, please see the original publication.

Why use seam carving over traditional resizing?

Keep in mind that the purpose of seam carving is to preserve the most salient (i.e., “interesting”) regions of an image while still resizing the image itself.

Using traditional methods for resizing changes the dimensions of the entire image ― no care is taken to determine what part of the image is most or least important.

Seam carving instead applies heuristics/path finding derived from the energy map to determine which regions of the image can be removed/duplicated to ensure (1) all “interesting” regions of the image are preserved and (2) this is done in an aesthetically pleasing way.

Note: Preserving the most interesting regions of an image in an aesthetically pleasing manner is a lot harder than it sounds. While seam carving may seem like magic, it’s actually not ― and it has its limitations. See the “Summary” section for more information on these limitations.

To compare traditional resizing versus seam carving, consider the following input image:


Figure 5: An example image to resize.

This image has a width of 600 pixels and I would like to resize it to approximately 500 pixels.

Using traditional interpolation methods my resized image would look like this:


Figure 6: Resizing an image using traditional interpolation techniques. Notice how the height changes along with the width to retain the aspect ratio.

However, by applying seam carving I can “shrink” the image along the horizontal dimension and still preserve the most interesting regions of the image without changing the image height:


Figure 7: Resizing the image using seam carving.

Utilizing seam carving in computer vision and image processing

In this section I’ll demonstrate how to use seam carving with OpenCV, Python, and scikit-image.

I’ll assume you already have OpenCV installed on your system ― if not, please refer to this page where I provided resources.
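
As a rough sketch of where the demonstration is headed (this assumes an older scikit-image release that still shipped transform.seam_carve; the function was later deprecated and removed from the library), the whole pipeline fits in a few lines:

import cv2
import numpy as np
from skimage import transform

image = cv2.imread("input.jpg")  # placeholder input path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Energy map: Sobel gradient magnitude.
dx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
dy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
energy = np.sqrt(dx ** 2 + dy ** 2)

# Remove 100 vertical seams, shrinking the width while keeping the height.
# seam_carve returned a float image scaled to [0, 1].
carved = transform.seam_carve(image, energy, "vertical", 100)
cv2.imwrite("carved.jpg", (carved * 255).astype("uint8"))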

Introducing Arcade and the ArcGIS Python API


ArcGIS 10.5 introduced a new scripting language for the whole ArcGIS platform, as well as a Python API. Both are covered below.

The Arcade scripting language

Dynamic labeling and visualization has become a lot easier with the release of Arcade, a lightweight scripting language that lets users write, share and execute expressions. These expressions can be created through simple scripts with a scripting interface inside of an application or IDE. What makes Arcade unique is that the expressions are portable, so that they can be used through the whole ArcGIS Platform and not just one or two applications, such as ArcPy.

Arcade’s syntax looks similar to JavaScript, and JavaScript developers will immediately feel at ease with it. In a similar fashion to JavaScript, Arcade enables you to declare variables, perform logical operations, take advantage of built-in functions, and write custom functions. Data is referred to through globals (short for global variables). These start with a dollar sign and represent features from a service or layer, containing a geometry and a set of attributes. Globals enable you to perform simple calculations using field values at runtime, whereas previously you had to create a new field in an attribute table, open the Field Calculator, write an expression and populate the field values.

As this is the first release of Arcade, you can use it only with ArcGIS Pro, ArcGIS Online and through apps that use the JavaScript and Runtime SDKs. It will become available in more places in later releases. Esri explicitly stated that Arcade is not meant to replace Python now or in the future.

A New Python API

As promised, Esri released the first version of their ArcGIS Python API together with ArcGIS 10.5 in late December 2016. This is a ‘pythonic’ API, which means it contains modules, classes, functions, and types for managing and working with the elements of a GIS information model. Every API provides an interface between computer systems, in this case between the GIS user (the client) and a platform, either ArcGIS Portal or ArcGIS Online. In terms of software architecture, this API is implemented on top of the REST APIs of the Web GIS platform, but you use Python to connect and interact with the platform.

This API enables users to work with data from ArcGIS Online through different Python libraries such as Pandas, NumPy and the SciPy stack, in combination with the API’s own GIS capabilities. These far exceed the capabilities of Esri’s ArcPy package: the API has no fewer than thirteen different modules, grouped into different categories, covering everything from accessing datasets to data visualization and analysis, as well as additional functionality for geospatial workflows.

The API is distributed using Conda, a modern package and environment management system for Python. Both Conda and the API use Python version 3. To get started with the API, you first need to install Conda and then the ArcGIS package that contains all the modules of the API, before you can access the ArcGIS Python API in a Python IDE of choice, or from a Jupyter Notebook environment. You do this by making a connection to ArcGIS Online or Portal with a URL and login credentials, using the GIS module and a public or organizational account.
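
A minimal sketch of that workflow (the portal URL and credentials below are placeholders):

from arcgis.gis import GIS

# Anonymous connection to ArcGIS Online; pass a URL and credentials
# to connect to your own Portal instead.
gis = GIS()
# gis = GIS("https://your-portal.example.com/portal", "username", "password")

# Search public content and print a few of the matching items.
items = gis.content.search("San Francisco parks", item_type="Feature Layer")
for item in items[:3]:
    print(item.title)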

More info on Arcade: https://developers.arcgis.com/arcade/

More info on the ArcGIS Python API: https://developers.arcgis.com/python/

IronPython in Spotfire ― Turning lines & curves on and off

Would you like to be able to turn lines and curves on and off with the click of a button? Have you tried to implement IronPython code you found in blog posts but failed?

My IronPython skills are still a work in progress, and I struggle with it just as much as the average user. However, learning new things each and every day is one of the most fun and rewarding parts of my work. Some months/years ago, I stumbled upon two sets of code written to show or hide lines/curves. I knew they would be useful, so I saved them in a Word document. At least some of what I saved was generated from this Community post (thanks to Sean Riley) ― https://community.tibco.com/questions/python-turn-lines-curves-and .

When I attempted to implement the code in my own files, I ran into issues with the code not producing the desired result, so I set out to figure out what was holding it back. It turns out that the way the code is structured imposes limitations in terms of how many lines you can have on a visualization, whether the page can contain multiple visualizations, and how many lines you want to be able to turn on and off at a time. This is one of the reasons why users struggle to implement code they find online. In the end, I learned enough to produce a third set of code that marries the strengths of the first two and performs in any scenario. The learning experience is worth sharing.

Thus, this week, I will use this v1.1 Show Hide Lines DXP file to present three sets of IronPython code that turn lines and curves on and off using different code structures. I will present the code, then describe what the code is doing, the advantages of the way it’s written, and then the limitations of the way it’s written. Again, I’m still learning IronPython, so if you see something that could be improved, please leave a comment.

Code Set 1

from Spotfire.Dxp.Application.Visuals import LineChart

for fm in vis.As[LineChart]().FittingModels:
    if vis.As[LineChart]().FittingModels[0].Enabled == True:
        vis.As[LineChart]().FittingModels[0].Enabled = False
    else:
        vis.As[LineChart]().FittingModels[0].Enabled = True

Description: This particular piece of code is referencing FittingModels, which refers to any Line or Curve. It will turn any line/curve on and off, as long as there is only one line/curve. It requires the creation of a parameter called “vis”, which is then connected to a single visualization.

Advantages: Because the ‘vis’ parameter connects to a specific visualization, the user may have other line charts with lines and curves on the same page, and the button will still work. Also, if the line/curve is changed, say from a straight line fit to an average line, the button will still work without modification.

Restrictions: With this code, the user may have only one line/curve on a chart for it to work, and that means only one line/curve setup, not just one line/curve checked in the properties dialog. I added a second line/curve to this chart, and it quit working, even if I unchecked the straight line fit. The second line/curve had to be completely deleted to get the code working again.

Code Set 2

from Spotfire.Dxp.Application.Visuals import *

for visual in Document.ActivePageReference.Visuals:
    if visual.TypeId == VisualTypeIdentifiers.LineChart:
        lc = visual.As[LineChart]()
        for fm in lc.FittingModels:
            if fm.TypeId.DisplayName == "Straight Line Fit":
                if fm.Line.DisplayName == "My line":
                    fm.Enabled = not (fm.Enabled)
        for fm in lc.FittingModels:
            if fm.TypeId.DisplayName == "Line from Column Values":
                if fm.Line.DisplayName == "Type Curve A":
                    fm.Enabled = not (fm.Enabled)

Description: This piece of code works by referencing the type of line/curve and the line/curve name. Note, there is no connection to the specific line chart.

Advantages: This code will work if there are multiple lines/curves on a visualization. The sample visualization has three lines/curves, and the code has been set up to turn off two of them. To turn off all lines, copy and paste the three lines that check the line/curve type and name, and adjust the line/curve name and line/curve type.

Restrictions: This code will not work with more than one line chart on the page. Additionally, the code must ‘import *’. It will not work with ‘import LineChart’. If anyone can tell me why (in comments), it would be greatly appreciated. I decided not to dive into that rabbit hole.

Code Set 3

from Spotfire.Dxp.Application.Visuals import *

for fm in vis.As[LineChart]().FittingModels:
    if vis.TypeId == VisualTypeIdentifiers.LineChart:
        lc = vis.As[LineChart]()
        for fm in lc.FittingModels:
            if fm.TypeId.DisplayName == "Line from Column Values":
                if fm.Line.DisplayName == "Type Curve A":
                    fm.Enabled = not (fm.Enabled)

#for fm in vis.As[LineChart]().FittingModels:
#    if vis.As[LineChart]().FittingModels[0].Enabled == True:
#        vis.As[LineChart]().FittingModels[0].Enabled = False
#    else:
#        vis.As[LineChart]().FittingModels[0].Enabled = True

Description: This code merges the work of the previous two pieces of code. It uses a parameter named ‘vis’ that connects to the ‘Visualization Connected to a Button’ visualization. It specifies both the type of line/curve and the name of the line/curve.

Advantages: This code will work with multiple lines on a visualization, and it will work if there is another line chart on the page, which makes it an improvement over the other two pieces of code.

Restrictions: This code may have other restrictions, but I haven’t run into them yet.

This post covered the syntax for a few different types of lines/curves, but it did not cover all of them. You can find the proper API for all lines and curves at this TIBCO website. (insert link)

When and why to build your own data tools?


For the record: I’m a big supporter of third-party data services (e.g. Google Analytics, Hotjar, Crazyegg, Optimizely, Mixpanel, etc.). I like them because they are easy to use and easy to set up.

But there are cases when startups, e-commerce companies and other online businesses reach a size where they grow out of these services, and other, more advanced data tools (e.g. SQL, Python, R, bash, etc.) will be needed!

By reading this article you will understand which kinds of companies need to build their own data tools, and why and when it becomes essential!

When third-party data tools are good enough

When you’re kicking off your business, you don’t have the time to create proper data analyses, nor the money to hire a data analyst. To be fair, in these first few months you most probably won’t need them either.

However, if you are smart and careful enough, you’ll at least think about collecting the data for later analysis. And for that, setting up Google Analytics, Hotjar and the other “point-and-click” services seems to be just the perfect solution. As these tracking tools require only the implementation of a small code snippet in your website’s header, you don’t need to spend much time or developer resources on them. Copy-paste the tracking code, finalize some settings (e.g. setting up goals in Google Analytics, starting polls in Hotjar, launching the heatmaps in Crazyegg, etc.) and you’re done. Doable in 2-3 hours tops.


Data36.com’s Hotjar tracking. I’ve copy-pasted the snippet and set up the whole tracking in 5 minutes.

Then, when you start to grow, you can start analyzing this data. I won’t go into details about why that is needed; if you are reading this blog, I’m pretty sure you know the importance of it:

No matter how fast you are without data, the faster you grow, the higher the chance that you will hit the wall.

― Tomi Mester (@data36_com) December 19, 2016

After a while you’ll be opening your smart data tools on a daily basis, you’ll upgrade them to Pro versions, etc. But sooner or later (usually after 2-3 years) you will realize that these services are not scaling with your company anymore. You’ll have three major problems:

1. You can’t connect all the dots.
2. You can’t do predictions.
3. You can’t fully trust your data.

And these are exactly the three problems you can fix if you build your own data infrastructure.

But what does “own data infrastructure” mean?

This can be split into two parts:

1. Collecting the data
2. Using your data (for KPIs, analyses or predictions)

First you need to implement your own tracking scripts. These won’t send your data to Hotjar, for instance, but into your own data warehouse, usually into SQL tables, plain-text files (.csv, .tsv, etc.) or both. (Read more here: Data collection.) There are many more technical solutions, but to keep it simple for now, I won’t list the rest.
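
As a minimal sketch of the idea (using nothing beyond Python’s standard library; the file, table and event names are made up), raw events can land in your own SQL table like this:

import sqlite3
import time

# Your own warehouse instead of a third-party black box.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS events "
             "(ts INTEGER, user_id TEXT, event TEXT)")

def track(user_id, event):
    # One row per interaction; analyze later with plain SQL.
    conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                 (int(time.time()), user_id, event))
    conn.commit()

track("user_42", "pageview:/pricing")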

Then you can analyze your data with SQL, Python, R or bash scripts instead of the graphical interface of Google Analytics and the others. If you want to try these data coding languages and learn them, I wrote an article about that as well: Data Coding 101: How to install Python, SQL, R and Bash?

“But wait! That sounds too difficult and tech-heavy! Why would I do that?” you could ask. So let me answer that, and let’s get back to the three main reasons for building and using your own data tools!

Reason #1: Having your own data. Connecting the dots.

The first big problem with third-party tools like Google Analytics is that they work as a black box. This means that you don’t own your data and you can’t use it for everything you want. This is not an issue as long as you want to check simple reports, like how many people scrolled down to the bottom of your landing page, or how many sessions you had from google/organic in the last month.

But if you want to combine these metrics, things can become tricky. E.g.:

“What was the bounce rate and the time spent on page for each of my A/B test buckets?”

Of course, you can solve the smaller problems by using integrations, APIs or some hacks. (Note: let me tell you from my own experience, this can be a real pain in the neck once you start to integrate more than two tools.) E.g. for this specific question above, you can connect GA to Optimizely.

But as you do more and more advanced analyses, you will reach the point where you understand:

Every third-party service is created to measure a specific part of your product. That’s their power and that’s their limitation at the same time. Even if you manage to connect them, you will never be able to see the full picture. They don’t enable you to connect the dots!


A dummy example of a SQL star schema, where you can connect all the dots based on the user_id.
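
For example, here is a minimal pandas sketch (with made-up file and column names, everything keyed by user_id) of answering the A/B-test question above from your own tables:

import pandas as pd

# Made-up exports from your own tracking, all keyed by user_id.
pageviews = pd.read_csv("pageviews.csv")    # user_id, url, seconds_on_page
ab_buckets = pd.read_csv("ab_buckets.csv")  # user_id, bucket

# Connect the dots: average time on page per A/B test bucket.
joined = pageviews.merge(ab_buckets, on="user_id")
print(joined.groupby("bucket")["seconds_on_page"].mean())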

And eventually this will lead to more and more poorly answered or even unanswered questions. In a sector as competitive as online business, this can be fatal.

Reason #2: Predictions

Part of the “not-having-your-own-data” issue is that you can’t use your data for predictions either. Predictive analytics is an iterative process, in which you need clear and transparent data tables with a big number of variables. To create a meaningful prediction, you need to be flexible with your data. And third-party tools are not flexible at all. I guess this is the main reason why I’ve never seen an analyst who created useful predictions from Mixpanel, Kissmetrics or similar tools.

Reason #3: Trust your data

“Why Mixpanel doesn’t show the same numbers as Optimizely?”

“Why Adwords conversions are different from GA conversions?”

“How come that Crazyegg shows 30% bounce rate and Mixpanel shows 50%?”


Three different tools show three different results for the same metrics. From a previous guest post by PappG.

In the last few years I’ve been working and consulting with quite a few startups and e-commerce businesses. The questions above tend to come up from time to time. The answers differ by the specific problem. Some examples:

“This tool defines that metric differently, than the other one.”

“This tool has changed its conversion tracking recently.”

“This tool uses sampling.”

“The tool has been set up improperly.”

Either way the ultimate answer is:

The single source of truth will always be in your own data tools. With your own definitions, with your own tracking and without sampling. (Note that this also means that if you have your own data tools, it’s easier to debug the third parties when needed.)

Con #1: Simplicity

However, building your own data infrastructure is not a black-or-white decision. There is one big counterargument against it: simplicity. Using third parties like Google Analytics is incredibly easy, in the implementation part and the analysis part alike.

My rule here is: simple tools for simple questions, advanced tools for complex questions.


Data analysis in Google Analytics vs. Bash.

Going with the above-mentioned examples: as long as your data analysis ends at checking the number of sessions per acquisition channel, you won’t need to spend time or money setting up your own data tools. There are businesses (e.g. sole-trader e-commerce businesses) where Google Analytics will cover the data needs forever! And that is cool!

But once you are out of the “simple questions” phase, make sure you start building advanced tools.

Hiring questions

Different tools need different skill sets.

For using Google Analytics, Hotjar and Optimizely, you have to hire a digital marketer or a digital analyst. (Or you can do it yourself, if you feel like it and have time for that.)

For building data collection scripts, SQL tables, Python scripts and the rest, you need to hire a data analyst with engineering skills or a data scientist.

It’s hard to tell the exact numbers, but if you look around on webpages like Indeed or Glassdoor, you will see that data analyst/scientist salaries are ~20% higher than digital marketer or digital analyst salaries. Obviously this can differ by country, by market, by company, by the exact role, etc.

Anyway, regardless of whether you hire a digital analyst or a data scientist, hopefully they will create much more value for your company than they cost.

Is the price of the data tools a question?

For sure. But you’d be surprised!

A small calculation for a SaaS startup: let’s say you have 5,000 users, 500 daily active users and 1,000 daily new visitors. In this case you will pay:

Optimizely: ~400$/month (link: https://www.optimizely.com/plans/ )

Mixpanel: ~150$/month (link: https://mixpanel.com/pricing/ )

Crazyegg: ~50$/month (link: https://www.crazyegg.com/pricing/ )

Hotjar: ~30$/month (link: https://www.hotjar.com/pricing )

Google Analytics: free

Altogether: $630/month.

Just to compare:

For the same number of users and visitors you can collect, store and process all your data on a data server for ~$100/month. On top of that, Python, R, SQL, bash and most of the related tools are free.

It means that even at that size, your own tools will be cheaper. And in the long term, the more you scale, the bigger this difference will be.

Note that this win will most probably be “paid back” on salaries (see above).

When to build your own data tools?

I guess you get the point now! You have to take your first steps toward the advanced data tools (SQL, Python, R, bash, etc.) when you have grown out of the basic third-party tools.

But when is that exactly?

In my experience the best possible moment to hire a data scientist/analyst and start building up your data infrastructure is when your company has between 15 and 30 employees. Obviously, this is a broad average, but usually this is the time:

when you’ve filled the must-have roles (engineers, designers, marketers) and can start to get smarter and optimize your online business (with data people)
when you’ve reached a reasonable audience size (users and/or visitors)
when the data resistance at your company is still not too big

However, if you have some engineering resources, then I recommend logging interaction data, at least in plain-text format, from day zero. Believe me, three years later you will be very thankful that you didn’t let this information get lost. I also suggest creating daily copies of your transaction/production data somewhere, for the same reasons.
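
A minimal sketch of that day-zero logging (the file name and event names are made up):

import time

def log_event(user_id, event, path="events.tsv"):
    # Append one tab-separated line per interaction; trivial to parse later
    # with bash, a SQL import, or pandas.
    with open(path, "a") as f:
        f.write("%d\t%s\t%s\n" % (int(time.time()), user_id, event))

log_event("user_42", "signup")
log_event("user_42", "pageview:/pricing")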

CONCLUSION

Hope this article gave you a good overview of when and why to build your own data infrastructure. Use third-party data services, but don’t get stuck with them. Once it’s needed, don’t be afraid to start building your own data tools!

If you want to be notified first about new content on Data36 (like articles, videos, handbooks, etc.), sign up for my Newsletter!

Cheers,

Tomi Mester



Spotluck uses Stitch to eliminate their custom Python scripts and replicate their MongoDB data to Redshift

Spotluck is an on-demand app for discovering local restaurants. Their users enjoy a gamified approach to choosing restaurants and save money with fluctuating discounts. Spotluck’s restaurant partners get exposure to new guests, customizable solutions for communicating with users, and actionable data to better understand their business.

The need for Stitch

Creating and maintaining custom ETL scripts is both complex and time-consuming, especially when your primary data source is MongoDB. After launching in Philadelphia, their second major metropolitan area outside of DC, Spotluck saw their data volume expand rapidly. The team quickly realized that maintaining a data warehouse to power business intelligence and strategic decision making was no longer optional.

Spotluck turned to Stitch to address their two main data needs:

1. Internal analysis: make business data easily accessible to all teams, without impacting the workload of the engineering team.
2. Restaurant partners: update the dynamic reporting provided to partners in near real time, enabling restaurants to act on their data at a moment’s notice.

The results

Using Stitch, Spotluck was up and running with their first reports in less than a day. By automating the ETL process in a way that scales, Spotluck has used Stitch to free up their engineers’ time, grow their business, and strengthen their relationships with restaurant partners.

Now Spotluck has the ability to answer questions in real time, something they were never able to do before.

“Stitch is perfect for us. It’s simple, and that is exactly what we needed.” ― Srivathsan Komandur, Software Engineer

Check out the full case study on replacing custom Python ETL scripts for MongoDB, or sign up now to eliminate your company’s ETL hassles.
