
Scrapy 1.2.2 Released, a Web Crawler Framework


Scrapy 1.2.2 has been released.

Scrapy is an asynchronous processing framework built on Twisted, a crawler framework implemented in pure Python. Users only need to customize a few modules to easily implement a crawler that scrapes web page content and images of all kinds.

Changes: Bug fixes

Fix a cryptic traceback when a pipeline fails on open_spider() ( issue 2011 )

Fix embedded IPython shell variables (fixing issue 396 that re-appeared in 1.2.0, fixed in issue 2418 )

A couple of patches when dealing with robots.txt:

handle (non-standard) relative sitemap URLs ( issue 2390 )

handle non-ASCII URLs and User-Agents in Python 2 ( issue 2373 )

Documentation

Document "download_latency" key in Request's meta dict ( issue 2033 )

Remove page on (deprecated & unsupported) Ubuntu packages from ToC ( issue 2335 )

A few typo fixes ( issue 2346 , issue 2369 , issue 2380 ) and clarifications ( issue 2354 , issue 2325 , issue 2414 )

Other changes

Advertise conda-forge as Scrapy's official conda channel ( issue 2387 )

More helpful error messages when trying to use .css() or .xpath() on non-Text Responses ( issue 2264 )

startproject command now generates a sample middlewares.py file ( issue 2335 )

Add more dependencies' version info in scrapy version verbose output ( issue 2404 )

Remove all *.pyc files from source distribution ( issue 2386 )

Full changelog

Downloads


Visualizing Tweet Vectors Using Python


I try to experiment with a lot of different technologies. I’ve found that having experience with a diverse set of concepts, languages, libraries, tools etc. leads to more robust thinking when trying to solve a problem. If you don’t know that something exists then you can’t use it when it would be helpful to do so! There are lots of ways to gain these experiences. One can find great content online for almost any topic imaginable. But I’ve found that the best way to understand a technology is to try to build something with it.

My latest target was a basket of different libraries in the python ecosystem covering things like web development, caching, asynchronous messaging, and visualization. And since I’m a data scientist, I threw in a machine learning component just for fun. To explore these technologies, I created a semi-practical application that reads from the Twitter stream, parses tweets, and does some machine learning magic to score the tweet’s sentiment and project it into a two-dimensional grid, where tweets with similar content will appear closer to each other. It does all of this more or less in real time using asynchronous messaging.

The remainder of this blog post is devoted to showing how to build this from scratch. Just to be completely transparent about what it does and what technologies are involved, here are the components that I’ll demonstrate how to build:

Basic web development in Python using Flask + standard front-end stuff (Bootstrap, JQuery etc.)
Asynchronous, chainable task queues using Celery and Redis
Real-time event-based communication between Flask and connected clients using Socket-IO
Twitter stream filtering/parsing using Pattern
Streaming real-time visualization using NVD3
Sentiment analysis and word embeddings using Scikit-learn and Gensim (word2vec)

And here's a screenshot of what the finished app looks like.


Visualizing Tweet Vectors Using Python

Why are we doing this? Does this app have some tangible, real-world purpose? Probably not. But it’s fun and neat and you’ll hopefully learn a lot. If that sounds interesting then read on!

(NOTE: The completed application is located here . Feel free to reference this often as I may leave out some details throughout the post).

Setup

To get started, let’s talk about the setup required to run the application. In addition to a working Python interpreter, you’ll need to install a bunch of libraries that the application uses. You’ll also need to know how to perform a few tasks such as starting Redis, launching Celery, and running a Flask server. Fortunately these are all very easy to do. I wrote up detailed instructions in the project’s README file . Follow those steps and you should be up and running.

Sentiment & Word Vector Models

Now let’s build sentiment and word vector models to transform tweets. We’ll use this Twitter sentiment dataset as training data for both models. The first step is to read in the dataset and do some pre-processing using TF-IDF to convert each tweet to a bag-of-words representation.

print('Reading in data file...')
data = pd.read_csv(path + 'Sentiment Analysis Dataset.csv',
                   usecols=['Sentiment', 'SentimentText'],
                   error_bad_lines=False)

print('Pre-processing tweet text...')
corpus = data['SentimentText']
vectorizer = TfidfVectorizer(decode_error='replace', strip_accents='unicode',
                             stop_words='english', tokenizer=tokenize)
X = vectorizer.fit_transform(corpus.values)
y = data['Sentiment'].values

Note that we’re using a custom tokenizer designed to handle patterns common in tweets. I borrowed this from a script Christopher Potts wrote and adapted it slightly (final version is in the "scripts" folder). Next, we can train the sentiment classifier and word2vec model.

print('Training sentiment classification model...')
classifier = MultinomialNB()
classifier.fit(X, y)

print('Training word2vec model...')
corpus = corpus.map(lambda x: tokenize(x))
word2vec = Word2Vec(corpus.tolist(), size=100, window=4, min_count=10, workers=4)
word2vec.init_sims(replace=True)

This should run pretty fast since the training data set is not that big. We now have a model that can read a tweet and classify its sentiment as either positive or negative, and another model that transforms the words in a tweet to 100-dimensional vectors. But we still need a way to use those 100-dimensional vectors to spatially plot a tweet on a 2-dimensional grid. To do that, we’re going to fit a PCA transform for the word vectors and keep only the first 2 principal components.

print('Fitting PCA transform...')
word_vectors = [word2vec[word] for word in word2vec.vocab]
pca = PCA(n_components=2)
pca.fit(word_vectors)

Finally, we’re going to save all of these artifacts to disk so we can call them later from the web application.

print('Saving artifacts to disk...')
joblib.dump(vectorizer, path + 'vectorizer.pkl')
joblib.dump(classifier, path + 'classifier.pkl')
joblib.dump(pca, path + 'pca.pkl')
word2vec.save(path + 'word2vec.pkl')

Web App Initialization

Now that we have all the models we need ready to go, we can get started on the meat of the application. First, some initialization. This code runs only once, when the Flask server is launched.

# Initialize and configure Flask
app = Flask(__name__)
app.config['SECRET_KEY'] = 'secret'
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/0'
app.config['SOCKETIO_REDIS_URL'] = 'redis://localhost:6379/0'
app.config['BROKER_TRANSPORT'] = 'redis'
app.config['CELERY_ACCEPT_CONTENT'] = ['pickle']

# Initialize SocketIO
socketio = SocketIO(app, message_queue=app.config['SOCKETIO_REDIS_URL'])

# Initialize and configure Celery
celery = Celery(app.name, broker=app.config['CELERY_BROKER_URL'])
celery.conf.update(app.config)

There’s a bunch of stuff going on here so let’s break it down. We’ve created a variable called "app" that’s an instantiation of Flask, and set some configuration items to do things like tell it to use Redis as the broker (note that "config" is just a dictionary of key/value pairs which we can use for other settings not required by Flask). We also created a SocketIO instance, which is a class from the Flask-SocketIO integration library that basically wraps Flask with SocketIO support. Finally, we created our Celery app and updated its configuration settings to use the "config" dictionary we defined for Flask.

Next we need to load the models we created earlier into memory so they can be used by the application.

# Load transforms and models
vectorizer = joblib.load(path + 'vectorizer.pkl')
classifier = joblib.load(path + 'classifier.pkl')
pca = joblib.load(path + 'pca.pkl')
word2vec = Word2Vec.load(path + 'word2vec.pkl')

Finally, we’ll create some helper functions that use these models to classify the sentiment of a tweet and transform a tweet into 2D coordinates.

def classify_tweet(tweet):
    """
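The listing is truncated at this point. As a rough sketch only (the toy vectors, the component matrix, and the tweet_to_coords name are stand-ins of mine, not the original code), the coordinate helper described above would average a tweet's word vectors and project the mean through the fitted PCA components:

```python
word_vectors = {             # stand-in for the word2vec model's lookups
    'happy': [0.9, 0.1, 0.0],
    'sad':   [0.1, 0.9, 0.0],
}
components = [[1.0, 0.0, 0.0],   # stand-in for pca.components_
              [0.0, 1.0, 0.0]]

def tweet_to_coords(tweet):
    """Average the word vectors of known words, then project to 2D."""
    vecs = [word_vectors[w] for w in tweet.lower().split() if w in word_vectors]
    if not vecs:
        return (0.0, 0.0)
    # Component-wise mean of the word vectors
    mean = [sum(col) / len(vecs) for col in zip(*vecs)]
    # Dot product with each principal component gives the 2D coordinates
    return tuple(sum(c * m for c, m in zip(row, mean)) for row in components)
```

With the real models, the dict lookup becomes `word2vec[w]` and the projection becomes `pca.transform(...)`, but the shape of the computation is the same.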

tryexceptpass: Threaded Asynchronous Magic and How to Wield It


tryexceptpass: Threaded Asynchronous Magic and How to Wield It

Photo Credit: Daniel Schwen via Wikipedia
A dive into Python's asyncio tasks and event loops

Ok let’s face it. Clock speeds no longer govern the pace at which computer processors improve. Instead we see increased transistor density and higher core counts. Translating to software terms, this means that code won’t run faster, but more of it can run in parallel.

Although making good use of our new-found silicon real estate requires improvements in software, a lot of programming languages have already started down this path by adding features that help with parallel execution. In fact, they’ve been there for years waiting for us to take advantage.

So why don’t we? A good engineer always has an ear to the ground, listening for the latest trends in his industry, so let’s take a look at what Python is building for us.

What do we have so far?

Python enables parallelism through both the threading and the multiprocessing libraries. Yet it wasn’t until the 3.4 branch that it gave us the asyncio library to help with single-threaded concurrency. This addition was key in providing a more convincing final push to start swapping over from version 2.

The asyncio package allows us to define coroutines. These are code blocks that have the ability of yielding execution to other blocks. They run inside an event loop which iterates through the scheduled tasks and executes them one by one. A task switch occurs when it reaches an await statement or when the current task completes.

Task execution itself happens the same as in a single-threaded system. Meaning, this is not an implementation of parallelism, it’s actually closer to multithreading. We can perceive the concurrency in situations where a block of code depends on external actions.

This illusion is possible because the block can yield execution while it waits, making anything that depends on external IO, like network or disk storage, a great candidate. When the IO completes, the coroutine receives an interrupt and can proceed with execution. In the meantime, other tasks execute.

The asyncio event loop can also serve as a task scheduler. Both asynchronous and blocking functions can queue up their execution as needed.

Tasks

A Task represents callable blocks of code designed for asynchronous execution within event loops. They execute single-threaded, but can run in parallel through loops on different threads.

Prefixing a function definition with the async keyword turns it into an asynchronous coroutine. Though the task itself will not exist until it’s added to a loop. This is usually implicit when calling most loop methods, but asyncio.ensure_future(your_coroutine) is the more direct mechanism.

To denote an operation or instruction that can yield execution, we use the await keyword. It's only available within a coroutine block, and causes a syntax error if used anywhere else.
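That restriction is easy to demonstrate: the compiler rejects a module-level await before the code can even run.

```python
# 'await' is only legal inside an 'async def' block; anywhere else the
# compiler raises SyntaxError at compile time, not at run time.
try:
    compile("await asyncio.sleep(1)", "<demo>", "exec")
    error = None
except SyntaxError as exc:
    error = exc.msg
```

After running this, `error` holds the compiler's message rather than None, confirming the code never got a chance to execute.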

Please note that the async keyword was not implemented until Python version 3.5. So when working with older versions, use the @asyncio.coroutine decorator and yield from keywords instead.

Scheduling

In order to execute a task, we need a reference to the event loop in which to run it. Using loop = asyncio.get_event_loop() gives us the current loop in our execution thread. Now it’s a matter of calling loop.run_until_complete(your_coroutine) or loop.run_forever() to have it do some work.

Let’s look at a short example to illustrate a few points. I strongly encourage you to open an interpreter and follow along:

import time
import asyncio

async def do_some_work(x):
    print("Waiting " + str(x))
    await asyncio.sleep(x)

loop = asyncio.get_event_loop()
loop.run_until_complete(do_some_work(5))

Here we defined do_some_work() as a coroutine that waits on the results of external workload. The workload is simulated through asyncio.sleep .

Running the code may be surprising. Did you expect run_until_complete to be a blocking call? Remember that we’re using the event loop from the current thread to execute the task. We’ll discuss alternatives in more detail later. So for now, the important part is to understand that while execution blocks, the await keyword still enables concurrency.

For a better picture, let’s change our test code a bit and look at executing tasks in batches:

tasks = [asyncio.ensure_future(do_some_work(2)),
         asyncio.ensure_future(do_some_work(5))]

loop.run_until_complete(asyncio.gather(*tasks))

Introducing the asyncio.gather() function enables results aggregation. It waits for several tasks in the same thread to complete and puts the results in a list.

The main observation here is that both function calls did not execute in sequence. It did not wait 2 seconds, then 5, for a total of 7 seconds. Instead it started to wait 2s, then moved on to the next item which started to wait 5s, returning when the longer task completed, for a total of 5s. Feel free to add more print statements to the base function if it helps visualize.

This means that we can put long running tasks with awaitable code in an execution batch, then ask Python to run them in parallel and wait until they all complete. If you plan it right, this will be faster than running in sequence.

Think of it as an alternative to the threading package where after spinning up a number of Threads , we wait for them to complete with .join() . The major difference is that there’s less overhead incurred than creating a new thread for each function.
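For comparison, the threading version of the batch above looks like this (sleep times shortened so it runs quickly; the overlap of the waits mirrors what gather() does):

```python
import time
from threading import Thread

def do_some_work(x):
    time.sleep(x)  # simulated blocking workload

# Start every thread, then block on .join() until they all finish.
threads = [Thread(target=do_some_work, args=(n,)) for n in (0.2, 0.5)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start  # ~0.5s: the two waits overlap
```

The cost difference is that each Thread here carries OS-thread overhead, while coroutines are scheduled within one thread.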

Of course, it’s always good to point out that your mileage may vary based on the task at hand. If you’re doing compute-heavy work, with little or no time waiting, then the only benefit you get is the grouping of code into logical batches.

Running a loop in a different thread

What if, instead of doing everything in the current thread, we spawn a separate Thread to do the work for us?

from threading import Thread
import asyncio

def start_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

new_loop = asyncio.new_event_loop()
t = Thread(target=start_loop, args=(new_loop,))
t.start()

Notice that this time we created a new event loop through asyncio.new_event_loop() . The idea is to spawn a new thread, pass it that new loop and then call thread-safe functions (discussed later) to schedule work.

The advantage of this method is that work executed by the other event loop will not block execution in the current thread. Thereby allowing the main thread to manage the work, and enabling a new category of execution mechanisms.

Queuing work in a different thread

Using the thread and event loop from the previous code block, we can easily get work done with the call_soon() , call_later() or call_at() methods. They are able to run regular function code blocks (those not defined as coroutines) in an event loop.
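Note that call_soon() itself is not thread-safe; when the loop lives in another thread, the call_soon_threadsafe() variant is the right entry point. A minimal sketch, reusing the loop-in-a-thread pattern from above:

```python
import asyncio
import time
from threading import Thread

def start_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

new_loop = asyncio.new_event_loop()
t = Thread(target=start_loop, args=(new_loop,))
t.start()

results = []

# call_soon_threadsafe queues a plain (non-coroutine) function on a
# loop owned by another thread; callbacks run in FIFO order.
new_loop.call_soon_threadsafe(results.append, 'hello from the other loop')

time.sleep(0.2)  # give the loop thread a moment to run the callback
new_loop.call_soon_threadsafe(new_loop.stop)
t.join()
new_loop.close()
```

Because the append is queued before the stop, it is guaranteed to run before the loop shuts down.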

Marcos Dione: ayrton-0.9


Another release, but this time not (only) a bugfix one. After playing with bool semantics I converted the file tests from a _X format, which, let's face it, was not pretty, into the more usual -X format. This alone merits a change in the minor version number. Also, _in , _out and _err now accept a tuple (path, flags) , so you can specify things like os.O_APPEND .

In other news, I had to drop support for Python 3.3, because otherwise I would have had to complicate the import system a lot.

But in the end, yes, this also is a bugfix release. Lots of fd leaks were plugged, so I suggest you upgrade if you can. Just remember the s/_X/-X/ change. I found all the leaks thanks to unittest 's warnings, even if sometimes they were a little misleading:

testRemoteCommandStdout (tests.test_remote.RealRemoteTests) ...
ayrton/parser/pyparser/parser.py:175: ResourceWarning: unclosed <socket.socket fd=5, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, raddr=/tmp/ssh-XZxnYoIQxZX9/agent.7248>
  self.stack[-1] = (dfa, next_state, node)

The file and line cited in the warning have nothing to do with the warning itself (it was not the code that raised it) or the leaked fd, so it took me a while to find where those leaks were coming from. I hope I have some time to find out why this is so. The most frustrating thing was that unittest closes the leaking fd, which is nice, but in one of the test cases it was closing it seemingly before the test finished, and the test failed because the socket was closed:

======================================================================
ERROR: testLocalVarToRemoteToLocal (tests.test_remote.RealRemoteTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 225, in wrapper
    test (self)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/test_remote.py", line 235, in testLocalVarToRemoteToLocal
    self.runner.run_file ('ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay')
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 304, in run_file
    return self.run_script (script, file_name, argv, params)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 323, in run_script
    return self.run_tree (tree, file_name, argv, params)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 336, in run_tree
    return self.run_code (code, file_name, argv)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 421, in run_code
    raise error
  File "/home/mdione/src/projects/ayrton_clean/ayrton/__init__.py", line 402, in run_code
    exec (code, self.globals, self.locals)
  File "ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay", line 6, in <module>
    with remote ('127.0.0.1', _test=True):
  File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 362, in __enter__
    i, o, e= self.prepare_connections (backchannel_port, command)
  File "/home/mdione/src/projects/ayrton_clean/ayrton/remote.py", line 270, in prepare_connections
    self.client.connect (self.hostname, *self.args, **self.kwargs)
  File "/usr/lib/python3/dist-packages/paramiko/client.py", line 338, in connect
    t.start_client()
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 493, in start_client
    raise e
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1757, in run
    self.kex_engine.parse_next(ptype, m)
  File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 75, in parse_next
    return self._parse_kexdh_reply(m)
  File "/usr/lib/python3/dist-packages/paramiko/kex_group1.py", line 112, in _parse_kexdh_reply
    self.transport._activate_outbound()
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 2079, in _activate_outbound
    self._send_message(m)
  File "/usr/lib/python3/dist-packages/paramiko/transport.py", line 1566, in _send_message
    self.packetizer.send_message(data)
  File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 364, in send_message
    self.write_all(out)
  File "/usr/lib/python3/dist-packages/paramiko/packet.py", line 314, in write_all
    raise EOFError()
EOFError

This probably has something to do with the fact that the test (a functional test, really) is using threads and real sockets. Again, I'll try to investigate this.

All in all, the release is an interesting one. I'll keep adding small features and releasing, let's see how it goes. Meanwhile, here's the changelog:

The 'No Government' release.

Test functions are no longer called _X but -X , which is more scripting friendly. Some of those tests had to be fixed.
Dropped support for py3.3 because the importer does not work there.
tox support, but not yet part of the stable test suite.
Lots and lots of more tests.
Lots of improvements in the remote() tests; in particular, make sure they don't hang waiting for someone who's not gonna come.
Ignore ssh remote() tests if there's no password/phrase-less connection.
Fixed several fd leaks.
_in , _out and _err also accept a tuple (path, flags) , so you can specify things like os.O_APPEND . Mostly used internally.

Get it on github or pypi !

Talk Python to Me: #88 Lightweight Django


Django is a very popular Python web framework. One reason is that you have many building blocks to drop in for large sections of your application. Need a full-on admin table editor backend? That's a few lines of code, and boom, you have a basic table editor.

This appeals to many people. But those of us, myself included, who appreciate lightweight frameworks where we choose just what is included and piece together our web apps from best-of-breed components find this a turn-off.

This week you'll meet Julia Elman and Mark Lavin, authors of Lightweight Django who are here to dispel the myth that Django apps have to be built out of large building blocks.

Links from the show:

Lightweight Django by Julia Elman and Mark Lavin :

shop.oreilly.com/product/0636920032502.do

Twitter : @juliaelman

Julia Elman web : juliaelman.com

Mark Lavin web : mlavin.org

Mark's Twitter : @DrOhYes

Lightweight Django code examples : github.com/lightweightdjango

Intermediate Django: Building Modern, Scalable, and Maintainable Web Applications by Mark Lavin :

shop.oreilly.com/product/0636920040903.do

Sponsors

GoCD : go.cd

pyup.io : pyup.io

Obey the Testing Goat: Second Edition update: Virtualenvs, Django 1.10, REST API ...


A brief update on my progress for the second edition.


Getting there! Virtualenvs all the way down.

In the first edition, I made the judgement call that telling people to use virtualenvs at the very beginning of the book would be too confusing for beginners. I've decided to revisit that decision, since virtualenvs are more and more de rigueur these days. I mean, if the djangogirls tutorial is recommending one, given that it's the most beginner-friendly tutorial on Earth, then it really must be a good idea. So there's new instructions in the pre-requisite installations chapter . Let me know if you think they could be clearer.

Django 1.10

Django 1.10 doesn't introduce that many new features over 1.8, but upgrading was still pretty fiddly. Thank goodness for my extensive tests (tests for the tests in the book about testing, yes. because of course.) The main change you're likely to notice is in Chapter 4, where I introduce the Django Test Client, much earlier than I used to (which, through a long chain of causes, is actually because of a change to the way csrf tokens are generated ). Other than that, Django 1.10 was pretty much a drop-in replacement. The main thing I'm preparing for really is the upgrade to 1.11 LTS early next year.

REST APIs

I was thinking of having a couple of in-line chapters on building a REST API, but for now I've decided to have them as appendices. It starts with how to roll your own, including an example of how to test client-side Ajax JavaScript with sinon , and then there's a second appendix on Django REST Framework . These are both very much just skeleton outlines at the moment, but, still, feedback and suggestions appreciated.

A cleaner flow for Chapter 6

Chapter 6 is all about rewriting an app that almost works, to be one that actually works, but trying to work incrementally all along, and using the FTs to tell us when we make progress, and warn us if we introduce regressions. I used to have just the one FT, and track progress/regressions by "what line number is the FT failing at? is it higher or lower than before?". Instead I've split out one FT that tests that the existing behaviour still works , and one FT for the new behaviour, and that's much neater I think.

Next: geckodriver and Selenium 3 (uh-oh!)

There are plenty more little tweaks and nice-to-have additions I can think of (React? Docker? Oh yeah, I got your trendy topics covered), but the main task that's really outstanding is upgrading to Selenium 3 and geckodriver. And the reason that's scary is because the current status of implicit waits is up for debate , and I rely on implicit waits a lot. Introducing explicit waits earlier might be a good thing (they're currently only mentioned in Chapter 20), but it would definitely add to the learning curve in the early chapters (I think they'd have to go in chapter 4 or 5, which feels very early indeed). So I'm kinda in denial about this at the moment, hoping that maybe Mozilla will reintroduce the old behaviour, or maybe I'll build some magical wrapper around selenium that just does implicit waits for you (maybe using my stale element check trick ) (in my copious spare time), or maybe switch to chromedriver, or I don't know I don't want to think about it. Suggestions, words of encouragement, moral support all welcome here.
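For what it's worth, an explicit wait is at bottom just a poll-until-truthy loop with a deadline. A generic helper along these lines (my sketch, not the book's code; the selenium call in the comment is only illustrative) is one way such a wrapper could start:

```python
import time

def wait_for(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns something truthy, or raise
    TimeoutError after `timeout` seconds -- the essence of an explicit wait."""
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() > deadline:
            raise TimeoutError('condition not met within %ss' % timeout)
        time.sleep(poll)

# e.g. wait_for(lambda: browser.find_elements_by_css_selector('#id_list_table'))
# would retry the lookup until the element appears or the deadline passes.
```

Wrapping every lookup like this is roughly what an implicit wait gives you for free, which is why losing it hurts.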

In the meantime, I hope you enjoy the new stuff. Keep in touch!

Writing autofill plugins for TeamPlayer

Background

TeamPlayer is a Django-based streaming radio app with a twist. A while back it gained a feature called "shake things up" where, instead of dead silence, "DJ Ango" would play tracks from the TeamPlayer Library when no players had any queued songs. Initially this was implemented by creating a queue for DJ Ango and then filling it with random tracks. This worked, but after a while I became annoyed by the "randomness" and so went about writing a few other implementations, which I call "autofill strategies". These were function definitions, and the autofill logic used an if/else clause to select which function to call based on what was set in the Django settings.

Recently I got rid of the if/else 's and instead use setuptools entry points . This also allows for third parties to write "autofill plugins" for TeamPlayer. Here's how to do it.

As I said every autofill strategy is a python function with the following signature:

def my_autofill_strategy(*, queryset, entries_needed, station):

This function should return a list of teamplayer.models.LibraryItem . The list should ideally have a length of entries_needed but no longer, and the returned list should contain entries from the queryset . The "should"s are emphasized because sometimes a particular strategy can't find enough entries from the queryset so it can either return a smaller list or return entries not in the queryset or both. The station argument is the teamplayer.models.Station instance for which songs are being selected. This is (almost) always Station.main_station() .

Idea

Regular terrestrial radio stations often play the same set of songs in rotation over and over again. This is one reason why I rarely listen to them. However I thought this would be an interesting (and easy) autofill strategy to write.

Implementation

Here's the idea: keep a (play)list of songs from the TeamPlayer Library for rotation, store it in a database table, and then write the autofill function to simply pick from that list. Here is the Django database model:

from django.db import models
from teamplayer.models import LibraryItem

class Song(models.Model):
    song = models.OneToOneField(LibraryItem)

This table's rows just point to a LibraryItem . We can use the Django admin site to maintain the list. So again the autofill function just points to entries from the list:

from .models import Song

def rotation_autofill(*, queryset, entries_needed, station):
    songs = Song.objects.order_by('?')[:entries_needed]
    songs = [i.song for i in songs]
    return songs

Now all that we need is some logic to run the commercial breaks and station identification. Just kidding. Now all that is needed is to "package" our plugin.

Packaging

As I've said TeamPlayer now uses setuptools entry points to get autofill strategies. The entry point group name for autofill plugins is aptly called 'teamplayer.autofill_strategy' . So in our setup.py we register our function as such:

# setup.py
from setuptools import setup

setup(
    name='mypackage',
    ...
    entry_points={
        'teamplayer.autofill_strategy': [
            'rotation = mypackage.autofill:rotation_autofill',
        ]
    }
)

Here the entry_points argument to setup defines the entry points. For this we declare the group teamplayer.autofill_strategy and in that group we have a single entry point called rotation . rotation points to the rotation_autofill function in the module mypackage.autofill (using dots for the module and a colon for the member).
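The consumer side of this declaration isn't shown in the post. As a sketch of how a host application could discover the registered strategies (the exact call TeamPlayer makes is an assumption on my part), here is the stdlib importlib.metadata equivalent of that lookup:

```python
from importlib.metadata import entry_points

# Discover every plugin registered under the group -- the consumer-side
# counterpart of the setup.py declaration above.
try:
    group = entry_points(group='teamplayer.autofill_strategy')  # Python 3.10+
except TypeError:
    group = entry_points().get('teamplayer.autofill_strategy', [])  # 3.8/3.9

strategies = {ep.name: ep for ep in group}
# strategies['rotation'].load() would return the rotation_autofill function,
# assuming the example package above is installed.
```

A 2016-era TeamPlayer would have used the setuptools pkg_resources API for the same job; importlib.metadata is the modern standard-library replacement.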

From there all you would need is to pip install your app, add it to INSTALLED_APPS (after TeamPlayer) and change the following setting:

TEAMPLAYER = {
    'SHAKE_THINGS_UP': 10,
    'AUTOFILL_STRATEGY': 'rotation',
}

The 'SHAKE_THINGS_UP' setting tells TeamPlayer the (maximum) number of Library items to add to DJ Ango's queue at a time ( 0 to disable) and the AUTOFILL_STRATEGY tells which autofill strategy plugin to load.

A (more) complete implementation of this example is here .

How to Create Group By Queries With Django ORM


This tutorial is about how to implement SQL-like group by queries using the Django ORM. It’s a fairly common operation, especially for those who are familiar with SQL. The Django ORM is actually an abstraction layer that lets us play with the database as if it were object-oriented, but in the end it’s just a relational database and all the operations are translated into SQL statements.

Most of the work can be done by retrieving the raw data from the database and playing with it on the Python side, grouping the data in dictionaries, iterating through it, computing sums, averages and what not. But the database is a very powerful tool that can do much more than simply store the data, and often you can do the work much faster directly in the database.

Generally speaking, when you start doing group by queries, you are no longer interested in the details of each model instance (or table row); instead, you want to extract new information from your dataset, based on some common aspects shared between the model instances.

Let’s have a look at an example:

class Country(models.Model):
    name = models.CharField(max_length=30)

class City(models.Model):
    name = models.CharField(max_length=30)
    country = models.ForeignKey(Country)
    population = models.PositiveIntegerField()

And the raw data stored in the database:

cities

id  name               country_id  population
1   Tokyo              28          36,923,000
2   Shanghai           13          34,000,000
3   Jakarta            19          30,000,000
4   Seoul              21          25,514,000
5   Guangzhou          13          25,000,000
6   Beijing            13          24,900,000
7   Karachi            22          24,300,000
8   Shenzhen           13          23,300,000
9   Delhi              25          21,753,486
10  Mexico City        24          21,339,781
11  Lagos              9           21,000,000
12  São Paulo          1           20,935,204
13  Mumbai             25          20,748,395
14  New York City      20          20,092,883
15  Osaka              28          19,342,000
16  Wuhan              13          19,000,000
17  Chengdu            13          18,100,000
18  Dhaka              4           17,151,925
19  Chongqing          13          17,000,000
20  Tianjin            13          15,400,000
21  Kolkata            25          14,617,882
22  Tehran             11          14,595,904
23  Istanbul           2           14,377,018
24  London             26          14,031,830
25  Hangzhou           13          13,400,000
26  Los Angeles        20          13,262,220
27  Buenos Aires       8           13,074,000
28  Xi'an              13          12,900,000
29  Paris              6           12,405,426
30  Changzhou          13          12,400,000
31  Shantou            13          12,000,000
32  Rio de Janeiro     1           11,973,505
33  Manila             18          11,855,975
34  Nanjing            13          11,700,000
35  Rhine-Ruhr         16          11,470,000
36  Jinan              13          11,000,000
37  Bangalore          25          10,576,167
38  Harbin             13          10,500,000
39  Lima               7           9,886,647
40  Zhengzhou          13          9,700,000
41  Qingdao            13          9,600,000
42  Chicago            20          9,554,598
43  Nagoya             28          9,107,000
44  Chennai            25          8,917,749
45  Bangkok            15          8,305,218
46  Bogotá             27          7,878,783
47  Hyderabad          25          7,749,334
48  Shenyang           13          7,700,000
49  Wenzhou            13          7,600,000
50  Nanchang           13          7,400,000
51  Hong Kong          13          7,298,600
52  Taipei             29          7,045,488
53  Dallas Fort Worth  20          6,954,330
54  Santiago           14          6,683,852
55  Luanda             23          6,542,944
56  Houston            20          6,490,180
57  Madrid             17          6,378,297
58  Ahmedabad          25          6,352,254
59  Toronto            5           6,055,724

Python Web Crawler Tutorial, Part 6: Using Cookies


Hi everyone. In the previous installment we looked at exception handling in crawlers; this time, let's look at how to use cookies.

Why use cookies at all?

A cookie is data (usually encrypted) that some websites store on the user's local machine in order to identify the user and track sessions.

For example, some pages on a site can only be accessed after logging in; before you log in, you are not allowed to fetch them. We can use the urllib2 library to save the cookies from our login, and then reuse them when fetching other pages.

Before we get to that, we first need to introduce the concept of an opener.

1. Opener

When you fetch a URL you use an opener (an instance of urllib2.OpenerDirector). So far we have always used the default opener, urlopen. It is a special opener (think of it as one particular opener instance) whose only parameters are url, data and timeout.

If we want to work with cookies, this default opener is not enough, so we need to build a more general opener that lets us configure cookie handling.

2. cookielib

The main job of the cookielib module is to provide objects that can store cookies, so that it can be used together with urllib2 to access Internet resources. The module is quite powerful: we can use an instance of its CookieJar class to capture cookies and resend them on subsequent requests, which is how simulated logins are implemented. The main objects in the module are CookieJar, FileCookieJar, MozillaCookieJar and LWPCookieJar.

Their relationship: CookieJar --derives--> FileCookieJar --derives--> MozillaCookieJar and LWPCookieJar.

1) Capturing cookies into a variable

First, let's use a CookieJar object to capture cookies and store them in a variable, to get a feel for it:

import urllib2
import cookielib

# Declare a CookieJar instance to hold the cookies
cookie = cookielib.CookieJar()
# Use urllib2's HTTPCookieProcessor to create a cookie handler
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)
# This open method works like urllib2's urlopen; you may also pass in a Request
response = opener.open('http://www.baidu.com')
for item in cookie:
    print 'Name = ' + item.name
    print 'Value = ' + item.value

The code above stores the cookies in a variable and then prints their values; the output looks like this:

Name = BAIDUID
Value = B07B663B645729F11F659C02AAE65B4C:FG=1
Name = BAIDUPSID
Value = B07B663B645729F11F659C02AAE65B4C
Name = H_PS_PSSID
Value = 12527_11076_1438_10633
Name = BDSVRTM
Value = 0
Name = BD_HOME
Value = 0

2) Saving cookies to a file

In the method above we stored the cookies in the variable cookie; what if we want to save them to a file instead? That is where the FileCookieJar object comes in. Here we use its subclass MozillaCookieJar to implement saving the cookies.

import cookielib
import urllib2

# File to save the cookies to, cookie.txt in the current directory
filename = 'cookie.txt'
# Declare a MozillaCookieJar instance to hold the cookies and later write them to the file
cookie = cookielib.MozillaCookieJar(filename)
# Use urllib2's HTTPCookieProcessor to create a cookie handler
handler = urllib2.HTTPCookieProcessor(cookie)
# Build an opener from the handler
opener = urllib2.build_opener(handler)
# Make a request, same principle as urllib2's urlopen
response = opener.open("http://www.baidu.com")
# Save the cookies to the file
cookie.save(ignore_discard=True, ignore_expires=True)

A note about the two arguments to the final save call:

The official docs explain them as follows:

ignore_discard: save even cookies set to be discarded.

ignore_expires: save even cookies that have expired. The file is overwritten if it already exists.

In other words, ignore_discard saves cookies even if they are marked to be discarded, and ignore_expires writes cookies to the file even if they have already expired, overwriting the file if it exists. Here we set both to True. After running this, the cookies are saved to cookie.txt; its contents look like the screenshot below.


[screenshot: contents of cookie.txt]

3) Loading cookies from the file and using them

Now that we can save cookies to a file, if we want to use them later we can read them back from the file and visit a website with them, like this:

import cookielib
import urllib2

# Create a MozillaCookieJar instance
cookie = cookielib.MozillaCookieJar()
# Load the cookies from the file into the jar
cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
# Create the Request object
req = urllib2.Request("http://www.baidu.com")
# Use urllib2's build_opener to create an opener
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open(req)
print response.read()

Imagine that cookie.txt holds the cookies of someone logged in to Baidu: if we load that cookie file, we can use the method above to simulate that person's Baidu login.

4) Simulating a website login with cookies

Below we use my university's academic administration system as an example: we simulate a login with cookies and save the cookie information to a text file. Behold the power of cookies!

Note: I changed the password, so don't go sneaking into my course-selection system. o(╯□╰)o

import urllib
import urllib2
import cookielib

filename = 'cookie.txt'
# Declare a MozillaCookieJar instance to hold the cookies and later write them to a file
cookie = cookielib.MozillaCookieJar(filename)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
postdata = urllib.urlencode({
    'stuid': '201200131012',
    'pwd': '23342321'
})
# URL of the academic system's login page
loginUrl = 'http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bks_login2.login'
# Simulate the login and capture the cookies
result = opener.open(loginUrl, postdata)
# Save the cookies to cookie.txt
cookie.save(ignore_discard=True, ignore_expires=True)
# Use the cookies to request another URL, in this case the grade-query page
gradeUrl = 'http://jwxt.sdu.edu.cn:7890/pls/wwwbks/bkscjcx.curscopre'
result = opener.open(gradeUrl)
print result.read()

The program above works as follows:

Create an opener with cookie support; when visiting the login URL, save the post-login cookies, then use those cookies to access other URLs.

For example the grade-query page or this semester's timetable, which are only visible after logging in. That's all there is to simulating a login. Pretty cool, right?
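The examples above are Python 2 (urllib2/cookielib). In Python 3 the same pattern lives in urllib.request and http.cookiejar; here is a minimal sketch of building a cookie-aware opener and saving/reloading a cookie file, without touching the network (the file name is made up):

```python
import os
import tempfile
import urllib.request
import http.cookiejar  # Python 3 name for cookielib

path = os.path.join(tempfile.gettempdir(), "cookie_demo.txt")

# Build a cookie-aware opener, same idea as urllib2.build_opener(HTTPCookieProcessor(...))
jar = http.cookiejar.MozillaCookieJar(path)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Save the (empty) jar and load it back, exactly as in the file-based examples above
jar.save(ignore_discard=True, ignore_expires=True)
jar2 = http.cookiejar.MozillaCookieJar()
jar2.load(path, ignore_discard=True, ignore_expires=True)
print(os.path.exists(path))
```

Calling opener.open(url) would then attach and capture cookies just as in the Python 2 code.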

OK, keep at it! We can now fetch website content successfully; next comes extracting the useful parts from the page, so in the next installment we'll meet regular expressions!

Introducing: fastparquet


A compliant, flexible and speedy interface to Parquet format files for python, fastparquet provides seamless translation between in-memory pandas DataFrames and on-disc storage.

In this post, we will introduce the two functions that will most commonly be used within fastparquet, followed by a discussion of the current Big Data landscape, Python's place within it and details of how fastparquet fills one of the gaps on the way to building out a full end-to-end Big Data pipeline in Python.

fastparquet Teaser

New users of fastparquet will mainly use the functions write and ParquetFile.to_pandas . Both functions offer good performance with default values, and both have a number of options to improve performance further.

import fastparquet

# write data
fastparquet.write('out.parq', df, compression='SNAPPY')

# load data
pfile = fastparquet.ParquetFile('out.parq')
df2 = pfile.to_pandas()                             # all columns
df3 = pfile.to_pandas(columns=['floats', 'times'])  # pick some columns

Introduction: Python and Big Data

Python was named as a favourite tool for data science by 45% of data scientists in 2016. Many reasons can be presented for this, and near the top will be:

Python is very commonly taught at college and university level

Python and associated numerical libraries are free and open source

The code tends to be concise, quick to write, and expressive

An extremely rich ecosystem of libraries exist for not only numerical processing but also other important links in the pipeline from data ingest to visualization and distribution of results

Big Data, however, has typically been based on traditional databases and, in latter years, the Hadoop ecosystem. Hadoop provides a distributed file-system, cluster resource management (YARN, Mesos) and a set of frameworks for processing data (map-reduce, pig, kafka, and many more). In the past few years, Spark has rapidly increased in usage, becoming a major force, with 62% of its users executing Spark jobs from Python (via PySpark).

The Hadoop ecosystem and its tools, including Spark, are heavily based around the Java Virtual Machine (JVM), which creates a gap between the familiar, rich Python data ecosystem and clustered Big Data with Hadoop. One such missing piece is a data format that can efficiently store large amounts of tabular data, in a columnar layout, and split it into blocks on a distributed file-system.

Parquet has become the de-facto standard file format for tabular data in Spark, Impala and other clustered frameworks. Parquet provides several advantages relevant to Big Data processing:

Columnar storage, only read the data of interest

Efficient binary packing

Choice of compression algorithms and encoding

Splits data into files, allowing for parallel processing

Range of logical types

Statistics stored in metadata to allow for skipping unneeded chunks

Data partitioning using the directory structure

fastparquet bridges the gap by providing native Python read/write access without the need to use Java.

Until now, Spark's Python interface provided the only way to write Parquet files from Python. Much of the time is spent deserializing the data in the Java-Python bridge. Also, note that the times column returned is then just integers, rather than the correct datetime type. Not only does fastparquet provide native access to Parquet files, it in fact makes the transfer of data to Spark much faster.

# to make and save a large-ish DataFrame
import pandas as pd
import numpy as np

N = 10000000
df = pd.DataFrame({'ints': np.random.randint(0, 1000, size=N),
                   'floats': np.random.randn(N),
                   'times': pd.DatetimeIndex(start='1980', freq='s', periods=N)})

The default Spark single-machine configuration cannot handle the above DataFrame (out-of-memory error), so we'll perform timing for 1/10 of the data:

# sending data to spark via pySpark serialization, 1/10 of the data
%time o = sql.createDataFrame(df[::10]).count()
CPU times: user 3.45 s, sys: 96.6 ms, total: 3.55 s
Wall time: 4.14 s

%%time
# sending data to spark via a file made with fastparquet, all the data
fastparquet.write('outspark.parq', df, compression='SNAPPY')
df4 = sql.read.parquet('outspark.parq').count()
CPU times: user 2.75 s, sys: 285 ms, total: 3.04 s
Wall time: 3.27 s

The fastparquet Library

fastparquet is an open source library providing a Python interface to the Parquet file format. It uses Numba and NumPy to provide speed, and writes data to and from pandas DataFrames, the most typical starting point for Python data science operations.

fastparquet can be installed using conda :

conda install -c conda-forge fastparquet

(currently only available for Python 3)

The code is hosted on GitHub; the primary documentation is on RTD.

Bleeding edge installation directly from the GitHub repo is also supported, as long as Numba, pandas, pytest and ThriftPy are installed.

Reading Parquet files into pandas is simple and, again, much faster than via PySpark serialization.

import fastparquet
pfile = fastparquet.ParquetFile('out.parq')

%time df2 = pfile.to_pandas()
CPU times: user 812 ms, sys: 291 ms, total: 1.1 s
Wall time: 1.1 s

The Parquet format is more compact and faster to load than the ubiquitous CSV format.

df.to_csv('out.csv')
!du -sh out.csv out.parq
490M out.csv
162M out.parq

In this case, the data is 229MB in memory, which translates to 162MB on-disc as Parquet or 490MB as CSV. Loading from CSV takes substantially longer than from Parquet.

%time df2 = pd.read_csv('out.csv', parse_dates=True)
CPU times: user 9.85 s, sys: 1 s, total: 10.9 s
Wall time: 10.9 s

The biggest advantage, however, is the ability to pick only some columns of interest. In CSV, this still means scanning through the whole file (if not parsing all the values), but the columnar nature of Parquet means only reading the data you need.
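The effect of a columnar layout can be sketched with the stdlib alone: store the same table once as one CSV holding every column, and once as one file per column, then compare how many bytes must be read to get a single column (the file names, sizes, and data below are illustrative, not from the article):

```python
import csv
import os
import tempfile

tmp = tempfile.mkdtemp()
rows = [(i, float(i) * 0.5, "label%d" % i) for i in range(10000)]

# Row-oriented storage: one CSV file holding every column
csv_path = os.path.join(tmp, "table.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Column-oriented storage: one file holding just the 'floats' column
col_path = os.path.join(tmp, "floats.col")
with open(col_path, "w") as f:
    f.write("\n".join(str(r[1]) for r in rows))

# Reading just one column touches far fewer bytes in the columnar layout
csv_bytes = os.path.getsize(csv_path)
col_bytes = os.path.getsize(col_path)
print(col_bytes < csv_bytes)
```

Parquet takes this further with binary packing, compression and per-chunk statistics, but the byte-count asymmetry is the core of the advantage.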

%time df3 = pd.read_csv('out.csv', usecols=['floats'])
CPU times: user 4.04 s, sys: 176 ms, total: 4.22 s
Wall time: 4.22 s

%time df3 = pfile.to_pandas(columns=['floats'])
CPU times: user 40 ms, sys: 96.9 ms, total: 137 ms
Wall time: 137 ms

Example

We have taken the airlines dataset and converted it into Parquet format using fastparquet . The original data was in CSV format, one file per year, 1987-2004. The total data size is 11GB as CSV, uncompressed, which becomes about double that in memory as a pandas DataFrame for typical dtypes. This is approaching, if not Big Data, Sizable Data, because it cannot fit into my machine's memory.

The Parquet data is stored as a multi-file dataset. The total size is 2.5GB, with Snappy compression throughout.

ls airlines-parq/ _common_metadata part.12.parquet part.18.parquet part.4.parquet _metadata part.13.parquet part.19.parquet part.5.parquet part.0.parquet part.14.parquet part.2.parquet part.6.parquet part.1.parquet part.15.parquet part.20.parquet part.7.parquet part.10.parquet part.16.parquet part.21.parquet part.8.parquet part.11.parquet part.17.parquet part.3.parquet part.9.parquet

To load the metadata:

import fastparquet
pf = fastparquet.ParquetFile('airlines-parq')

The ParquetFile instance provides various information about the data set in attributes:

pf.info

pf.schema

pf.dtypes

pf.count

Furthermore, we have information available about the "row-groups" (logical chunks) and the 29 column fragments contained within each. In this case, we have one row-group for each of the original CSV files, that is, one per year.

fastparquet will not generally be as fast as a direct memory dump, such as numpy.save or Feather , nor will it be as fast or compact as custom tuned formats like bcolz . However, it provides good trade-offs and options which can be tuned to the nature of the data. For example, the column/row-group chunking of the data allows pre-selection of only some portions of the total, which enables not having to scan through the other parts of the disc at all. The load speed will depend on the data type of the column, the efficiency of compression, and whether there are any NULLs.

There is, in general, a trade-off between compression and processing speed; uncompressed will tend to be faster, but larger on disc, and gzip compression will be the most compact, but slowest. Snappy compression, in this example, provides moderate space efficiency, without too much processing cost.

fastparquet has no problem loading a very large number of rows or columns (memory allowing):

%%time
# 124M bool values
d = pf.to_pandas(columns=['Cancelled'])
CPU times: user 436 ms, sys: 167 ms, total: 603 ms
Wall time: 620 ms

%%time
d = pf.to_pandas(columns=['Distance'])
CPU times: user 964 ms, sys: 466 ms, total: 1.43 s
Wall time: 1.47 s

%%time
# just the first portion of the data, 1.3M rows, 29 columns
d = pf.to_pandas(filters=(('Year', '==', 1987), ))
CPU times: user 1.37 s, sys: 212 ms, total: 1.58 s
Wall time: 1.59 s

The following factors are known to reduce performance:

The existence of NULLs in the data. It is faster to use special values, such as NaN for data types that allow it, or other known sentinel values, such as an empty byte-string.

Variable-length string encoding is slow on both write and read, and fixed-length will be faster, although this is not compatible with all Parquet frameworks (particularly Spark). Converting to categories will be a good option if the cardinality is low.

Some data types require conversion in order to be stored in Parquet's few primitive types. Conversion may take some time.

The Python Big Data Ecosystem

fastparquet provides one of the necessary links for Python to be a first-class citizen within Big Data processing. Although useful alone, it is intended to work seamlessly with the following libraries:

Dask , a pure-Python, flexible parallel execution engine, and its distributed scheduler. Each row-group is independent of the others, and Dask can take advantage of this to process parts of a Parquet data-set in parallel. The Dask DataFrame closely mirrors pandas, and methods on it (a subset of all those in pandas) actually call pandas methods on the underlying shards of the logical DataFrame. The Dask Parquet interface is experimental, as it lags slightly behind development in fastparquet.

hdfs3 , s3fs and adlfs provide native Pythonic interfaces to massive file systems. If the whole purpose of Parquet is to store Big Data, we need somewhere to keep it. fastparquet accepts a function to open a file-like object, given a path, and, so, can use any of these back-ends for reading and writing, and makes it easy to use any new file-system back-end in the future. Choosing the back-end is automatic when using Dask and a URL like s3://mybucket/mydata.parq .

With the blossoming of interactive visualization technologies for Python, the prospect of end-to-end Big Data processing projects is now fully realizable.

fastparquet Status and Plans

As of the publication of this article, the fastparquet library can be considered beta: useful to the general public and able to cope with many situations, but with some caveats (see below). Please try your own use case and report issues and comments on the GitHub tracker. The code will continue to develop (contributions welcome), and we will endeavour to keep the documentation in sync and provide regular updates.

A number of nice-to-haves are planned, and work to improve the performance should be completed around the new year, 2017.

Further Helpful Information

We don't have the space to talk about it here, but documentation at RTD gives further details on:

How to iterate through Parquet-stored data, rather than load the whole data set into memory at once

Using Parquet with Dask-DataFrames for parallelism and on a distributed cluster

Getting the most out of performance

Reading and writing partitioned data

Data types understood by Parquet and fastparquet

fastparquet Caveats

Aside from the performance pointers above, some specific things do not work in fastparquet, and for some of these, fixes are not planned unless there is substantial community interest.

Some encodings are not supported, such as delta encoding, since we have no test data to develop against.

Python 3.6.0 release candidate is now available


Python 3.6.0rc1 is the release candidate for Python 3.6, the next major release of Python.


Code for 3.6.0 is now frozen. Assuming no release critical problems are found prior to the 3.6.0 final release date, currently 2016-12-16, the 3.6.0 final release will be the same code base as this 3.6.0rc1. Maintenance releases for the 3.6 series will follow at regular intervals.

Among the major new features in Python 3.6 are:

* PEP 468 - Preserving the order of **kwargs in a function

* PEP 487 - Simpler customization of class creation

* PEP 495 - Local Time Disambiguation

* PEP 498 - Literal String Formatting

* PEP 506 - Adding A Secrets Module To The Standard Library

* PEP 509 - Add a private version to dict

* PEP 515 - Underscores in Numeric Literals

* PEP 519 - Adding a file system path protocol

* PEP 520 - Preserving Class Attribute Definition Order

* PEP 523 - Adding a frame evaluation API to CPython

* PEP 524 - Make os.urandom() blocking on Linux (during system startup)

* PEP 525 - Asynchronous Generators (provisional)

* PEP 526 - Syntax for Variable Annotations (provisional)

* PEP 528 - Change Windows console encoding to UTF-8

* PEP 529 - Change Windows filesystem encoding to UTF-8

* PEP 530 - Asynchronous Comprehensions
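Two of the most visible of these features, PEP 498 (f-strings) and PEP 515 (underscores in numeric literals), look like this on 3.6+ (the examples are mine, not from the announcement):

```python
name = "world"
value = 4 * 10

# PEP 498: literal string interpolation ("f-strings"),
# with the full format-spec mini-language available after the colon
greeting = f"hello {name}, value is {value:03d}"

# PEP 515: underscores as visual separators in numeric literals
big = 1_000_000

print(greeting)          # hello world, value is 040
print(big == 1000000)    # True
```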

Please see "What’s New In Python 3.6" for more information:

https://docs.python.org/3.6/whatsnew/3.6.html

You can find Python 3.6.0rc1 here:

https://www.python.org/downloads/release/python-360rc1/

Note that 3.6.0rc1 is still a preview release and thus its use is not recommended for production environments.


More information about the release schedule can be found here: https://www.python.org/dev/peps/pep-0494/

Notes on Python's tempfile module


The tempfile module is mainly used to create temporary files and directories that are deleted automatically after use, saving you the create-use-delete cycle of managing such files yourself. The most commonly used pieces are TemporaryFile and NamedTemporaryFile; for the rest, a quick skim is enough.

TemporaryFile creates a temporary file that is deleted automatically on close:

In [81]: tmp = tempfile.TemporaryFile()
In [82]: type(tmp)
Out[82]: file
In [83]: tmp.write('I am lee\n')
In [84]: tmp.seek(0)
In [85]: tmp.read()
Out[85]: 'I am lee\n'
# After close() the file is deleted automatically
In [86]: tmp.close()

NamedTemporaryFile is like TemporaryFile, but the file has a retrievable name, and the delete parameter controls whether the file is removed on close.

In [89]: tmp = tempfile.NamedTemporaryFile()
# The name attribute gives the file name
In [90]: tmp.name
Out[90]: '/tmp/tmpijT5Aj'
In [91]: os.path.exists(tmp.name)
Out[91]: True
In [92]: tmp.write("I am lee\n")
In [93]: tmp.seek(0)
In [94]: tmp.read()
Out[94]: 'I am lee\n'
# After close() the file name is gone
In [95]: tmp.close()
In [96]: os.path.exists(tmp.name)
Out[96]: False
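The module also covers temporary directories, which these (Python 2) notes don't show. In Python 3 the usual tool is tempfile.TemporaryDirectory; a minimal sketch:

```python
import os
import tempfile

# The directory and everything inside it is removed when the with block exits
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "scratch.txt")
    with open(path, "w") as f:
        f.write("I am lee\n")
    existed = os.path.exists(path)

print(existed, os.path.exists(d))  # True False
```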

Both docstrings contain the sentence "This file-like object can be used in a with statement, just like a normal file", so the objects can also be used with a with statement, saving you from handling the close yourself:

In [100]: with tempfile.NamedTemporaryFile() as tmp:
   .....:     print tmp.name
   .....:     tmp.write("I am lee\n")
   .....:     tmp.seek(0)
   .....:     print tmp.read()
   .....:     print os.path.exists(tmp.name)
   .....:
/tmp/tmp9U67Og
I am lee
True
# The file has already been deleted
In [101]: print os.path.exists(tmp.name)
False

References:

Official tempfile module docs: https://docs.python.org/2.7/library/tempfile.html#module-tempfile

Python module study, tempfile: http://www.cnblogs.com/captain_jack/archive/2011/01/19/1939555.html

Python study notes: http://wiki.jikexueyuan.com/project/the-python-study-notes-second-edition/files-and-directories.html


Python Built-in Functions Explained: A Summary

Introduction

Over the National Day holiday I finally resolved to learn Python, so I downloaded and installed a development environment. Then the question was: where to start? After some dithering, I settled on the official documentation. But it is all in English, and my English is poor, so I worked through it by translating, reading and experimenting as I went (as an aside, the same sentence often comes out of Baidu Translate and Google Translate looking more than a little different). Given past experience with learning languages, fearing I would once again study for a while and then drift away, and also wanting to record my understanding along the way, I gritted my teeth and started this series, "Python Built-in Functions Explained". I know the technical level may not be high, and there may be mistakes, but you have to stick with something to get something out of it.

Over the past two-plus months I went through all 68 built-in functions of version 3.5, one by one, in what I hoped was sufficient detail; now it is time for a summary. To make them easier to remember, the built-ins are grouped as follows:

Mathematical operations

abs: absolute value of a number
>>> abs(-2)
2

divmod: quotient and remainder of two numbers
>>> divmod(5,2)
(2, 1)
>>> divmod(5.5,2)
(2.0, 1.5)

max: largest element of an iterable, or the largest of several arguments
>>> max(1,2,3)           # three arguments: take the largest of the three
3
>>> max('1234')          # one iterable: take its largest element
'4'
>>> max(-1,0)            # by default numbers compare by value
0
>>> max(-1,0,key = abs)  # with key=abs, arguments compare by absolute value
-1

min: smallest element of an iterable, or the smallest of several arguments
>>> min(1,2,3)           # three arguments: take the smallest of the three
1
>>> min('1234')          # one iterable: take its smallest element
'1'
>>> min(-1,-2)           # by default numbers compare by value
-2
>>> min(-1,-2,key = abs) # with key=abs, arguments compare by absolute value
-1

pow: power of two numbers, optionally reduced modulo a given integer
>>> pow(2,3)    # equivalent to 2**3
8
>>> pow(2,3,5)  # equivalent to pow(2,3)%5
3

round: round a floating-point number
>>> round(1.1314926,1)
1.1
>>> round(1.1314926,5)
1.13149

sum: sum the elements of an iterable of numbers
>>> sum((1,2,3,4))          # pass in an iterable
10
>>> sum((1.5,2.5,3.5,4.5))  # the element type must be numeric
12.0
>>> sum((1,2,3,4),-10)      # optional start value
0

Type conversions

bool: create a boolean from the truth value of the argument
>>> bool()   # no argument
False
>>> bool(0)  # the number 0, empty sequences etc. are falsy
False
>>> bool(1)
True

int: create an integer from the argument
>>> int()    # with no argument, the result is 0
0
>>> int(3)
3
>>> int(3.6)
3

float: create a float from the argument
>>> float()  # with no argument, returns 0.0
0.0
>>> float(3)
3.0
>>> float('3')
3.0

complex: create a complex number from the arguments
>>> complex()        # with no arguments, returns 0j
0j
>>> complex('1+2j')  # create a complex number from a string
(1+2j)
>>> complex(1,2)     # create a complex number from numbers
(1+2j)

str: string representation of an object (for users)
>>> str()
''
>>> str(None)
'None'
>>> str('abc')
'abc'
>>> str(123)
'123'

bytearray: create a mutable byte array from the arguments
>>> bytearray('中文','utf-8')
bytearray(b'\xe4\xb8\xad\xe6\x96\x87')

bytes: create an immutable byte array from the arguments
>>> bytes('中文','utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'

memoryview: create a memory view object from the argument
>>> v = memoryview(b'abcefg')
>>> v[1]
98
>>> v[-1]
103

ord: integer code point of a Unicode character
>>> ord('a')
97

chr: Unicode character for an integer code point
>>> chr(97)  # the argument is an integer
'a'

bin: convert an integer to a binary string
>>> bin(3)
'0b11'

oct: convert an integer to an octal string
>>> oct(10)
'0o12'

hex: convert an integer to a hexadecimal string
>>> hex(15)
'0xf'

tuple: create a tuple from the argument
>>> tuple()       # no argument: create an empty tuple
()
>>> tuple('121')  # iterable argument: create a tuple of its elements
('1', '2', '1')

list: create a list from the argument
>>> list()        # no argument: create an empty list
[]
>>> list('abcd')  # iterable argument: create a list of its elements
['a', 'b', 'c', 'd']

dict: create a dictionary from the arguments
>>> dict()                      # with no arguments, returns an empty dict
{}
>>> dict(a = 1,b = 2)           # from keyword arguments (key/value pairs)
{'b': 2, 'a': 1}
>>> dict(zip(['a','b'],[1,2]))  # from zipped key and value sequences
{'b': 2, 'a': 1}
>>> dict((('a',1),('b',2)))     # from an iterable of pairs
{'b': 2, 'a': 1}

set: create a set from the argument
>>> set()               # no argument: create an empty set
set()
>>> a = set(range(10))  # iterable argument: create a set of its elements
>>> a
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

frozenset: create an immutable set from the argument
>>> a = frozenset(range(10))
>>> a
frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})

enumerate: create an enumerate object from an iterable
>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))  # with an explicit start value
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

range: create a range object from the arguments
>>> a = range(10)
>>> b = range(1,10)
>>> c = range(1,10,3)
>>> a,b,c                    # show a, b and c
(range(0, 10), range(1, 10), range(1, 10, 3))
>>> list(a),list(b),list(c)  # show the elements of a, b and c
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 4, 7])

iter: create an iterator from the argument
>>> a = iter('abcd')  # a string sequence
>>> a
<str_iterator object at 0x03FB4FB0>
>>> next(a)
'a'
>>> next(a)
'b'
>>> next(a)
'c'
>>> next(a)
'd'
>>> next(a)
Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    next(a)
StopIteration

slice: create a slice object from the arguments
>>> c1 = slice(5)      # define c1
>>> c1
slice(None, 5, None)
>>> c2 = slice(2,5)    # define c2
>>> c2
slice(2, 5, None)
>>> c3 = slice(1,10,3) # define c3
>>> c3
slice(1, 10, 3)

super: create a proxy object that delegates from a subclass to its parent class
# define the parent class A
>>> class A(object):
        def __init__(self):
            print('A.__init__')
# define the subclass B, inheriting from A
>>> class B(A):
        def __init__(self):
            print('B.__init__')
            super().__init__()  # super calls the parent's method
>>> b = B()
B.__init__
A.__init__

object: create a new featureless object
>>> a = object()
>>> a.name = 'kim'  # attributes cannot be set on it
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    a.name = 'kim'
AttributeError: 'object' object has no attribute 'name'

Sequence operations

all: whether every element of an iterable is truthy
>>> all([1,2])    # every element is truthy, so True
True
>>> all([0,1,2])  # the 0 is falsy, so False
False
>>> all(())       # empty tuple
True
>>> all({})       # empty dict
True

any: whether any element of an iterable is truthy
>>> any([0,1,2])  # one truthy element is enough, so True
True
>>> any([0,0])    # all elements falsy, so False
False
>>> any([])       # empty list
False
>>> any({})       # empty dict
False

filter: filter the elements of an iterable with the given function
>>> a = list(range(1,10))  # define a sequence
>>> a
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> def if_odd(x):         # define an odd-number predicate
        return x%2==1
>>> list(filter(if_odd,a)) # keep the odd numbers of the sequence
[1, 3, 5, 7, 9]

map: apply the given function to each element of the iterables, producing a new iterable
>>> a = map(ord,'abcd')
>>> a
<map object at 0x03994E50>
>>> list(a)
[97, 98, 99, 100]

next: return the next element of an iterator
>>> a = iter('abcd')
>>> next(a)
'a'
>>> next(a)
'b'
>>> next(a)
'c'
>>> next(a)
'd'
>>> next(a)
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    next(a)
StopIteration
# With a default argument, next returns remaining elements as usual, but once
# the iterator is exhausted it returns the default instead of raising StopIteration
>>> next(a,'e')
'e'
>>> next(a,'e')
'e'

reversed: reverse a sequence, producing a new iterable
>>> a = reversed(range(10))  # pass in a range object
>>> a                        # the type is now an iterator
<range_iterator object at 0x035634E8>
>>> list(a)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

sorted: sort an iterable, returning a new list
>>> a = ['a','b','d','c','B','A']
>>> a
['a', 'b', 'd', 'c', 'B', 'A']
>>> sorted(a)                  # by default, sorted by character code
['A', 'B', 'a', 'b', 'c', 'd']
>>> sorted(a,key = str.lower)  # lower-cased before comparing, so 'a' equals 'A' and 'b' equals 'B'
['a', 'A', 'b', 'B', 'c', 'd']

zip: aggregate same-position elements of the given iterators into a new tuple iterator
>>> x = [1,2,3]      # length 3
>>> y = [4,5,6,7,8]  # length 5
>>> list(zip(x,y))   # truncated to the shorter length, 3
[(1, 4), (2, 5), (3, 6)]

Object operations

help: help information about an object
>>> help(str)
Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |
 |  Methods defined here:
 |
 |  __add__(self, value, /)
 |      Return self+value.
 |  ***************************

dir: list of an object's attributes, or of the names in the current scope
>>> import math
>>> math
<module 'math' (built-in)>
>>> dir(math)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']

id: unique identifier of an object
>>> a = 'some text'
>>> id(a)
69228568

hash: hash value of an object
>>> hash('good good study')
1032709256

type: the type of an object, or a new type created from the arguments
>>> type(1)  # return the object's type
<class 'int'>
# use type to create a type D with an attribute InfoD
>>> D = type('D',(A,B),dict(InfoD='some thing defined in D'))
>>> d = D()
>>> d.InfoD
'some thing defined in D'

len: length of an object
>>> len('abcd')                     # string
4
>>> len(bytes('abcd','utf-8'))      # byte array
4
>>> len((1,2,3,4))                  # tuple
4
>>> len([1,2,3,4])                  # list
4
>>> len(range(1,5))                 # range object
4
>>> len({'a':1,'b':2,'c':3,'d':4})  # dict
4
>>> len({'a','b','c','d'})          # set
4
>>> len(frozenset('abcd'))          # frozenset
4

ascii: printable-string representation of an object
>>> ascii(1)
'1'
>>> ascii('&')
"'&'"
>>> ascii(9000000)
'9000000'
>>> ascii('中文')  # non-ASCII characters
"'\\u4e2d\\u6587'"

format: format a value for display
# format parameters available for strings: 's', None
>>> format('some string','s')
'some string'
>>> format('some string')
'some string'
# format parameters available for integers: 'b' 'c' 'd' 'o' 'x' 'X' 'n' None
>>> format(3,'b')   # to binary
'11'
>>> format(97,'c')  # integer code point to Unicode character
'a'
>>> format(11,'d')  # to decimal
'11'
>>> format(11,'o')  # to octal
'13'
>>> format(11,'x')  # to hexadecimal, lower-case letters
'b'
>>> format(11,'X')  # to hexadecimal, upper-case letters
'B'
>>> format(11,'n')  # same as 'd'
'11'
>>> format(11)      # defaults to 'd'
'11'
# format parameters available for floats: 'e' 'E' 'f' 'F' 'g' 'G' 'n' '%' None
>>> format(314159267,'e')     # scientific notation, 6 decimal places by default
'3.141593e+08'
>>> format(314159267,'0.2e')  # scientific notation, 2 decimal places
'3.14e+08'
>>> format(314159267,'0.2E')  # scientific notation, 2 decimal places, upper-case E
'3.14E+08'
>>> format(314159267,'f')     # fixed-point notation, 6 decimal places by default
'314159267.000000'
>>> format(3.14159267000,'f') # fixed-point notation, 6 decimal places by default
'3.141593'
>>> format(3.14159267000,'0.8f')   # fixed-point notation, 8 decimal places
'3.14159267'
>>> format(3.14159267000,'0.10f')  # fixed-point notation, 10 decimal places
'3.1415926700'
>>> format(3.14e+1000000,'F')      # fixed-point notation, infinity in upper case
'INF'
# 'g' is special. Let p be the precision given in the format. The value is first
# formatted in scientific notation to get the exponent exp; if -4 <= exp < p, the
# value is then rendered in fixed-point notation with p-1-exp decimal places,
# otherwise in scientific notation with p-1 decimal places.
>>> format(0.00003141566,'.1g')  # p=1, exp=-5: -4<=exp<p fails, scientific, 0 decimals
'3e-05'
>>> format(0.00003141566,'.2g')  # p=2, exp=-5: -4<=exp<p fails, scientific, 1 decimal
'3.1e-05'
>>> format(0.00003141566,'.3g')  # p=3, exp=-5: -4<=exp<p fails, scientific, 2 decimals
'3.14e-05'
>>> format(0.00003141566,'.3G')  # same, with upper-case E
'3.14E-05'
>>> format(3.1415926777,'.1g')   # p=1, exp=0: -4<=exp<p holds, fixed-point, 0 decimals
'3'
>>> format(3.1415926777,'.2g')   # p=2, exp=0: -4<=exp<p holds, fixed-point, 1 decimal
'3.1'
>>> format(3.1415926777,'.3g')   # p=3, exp=0: -4<=exp<p holds, fixed-point, 2 decimals
'3.14'
>>> format(0.00003141566,'.1n')  # same as 'g'
'3e-05'
>>> format(0.00003141566,'.3n')  # same as 'g'
'3.14e-05'
>>> format(0.00003141566)        # same as 'g'
'3.141566e-05'

vars: dict of local variables and their values in the current scope, or an object's attributes
# applied to a class instance
>>> class A(object): pass
>>> a = A()
>>> a.__dict__
{}
>>> vars(a)
{}
>>> a.name = 'Kim'
>>> a.__dict__
{'name': 'Kim'}
>>> vars(a)
{'name': 'Kim'}

Reflection

__import__: import a module dynamically

index = __import__('index')
index.sayHello()

isinstance: whether an object is an instance of a class (or of any class in a tuple)
>>> isinstance(1,int)
True
>>> isinstance(1,str)
False
>>> isinstance(1,(int,str))
True

issubclass: whether a class is a subclass of another class (or of any class in a tuple)
>>> issubclass(bool,int)
True
>>> issubclass(bool,str)
False
>>> issubclass(bool,(str,int))
True

hasattr: whether an object has a given attribute
# define the class Student
>>> class Student:
        def __init__(self,name):
            self.name = name
>>> s = Student('Aim')
>>> hasattr(s,'name')  # s has a name attribute
True
>>> hasattr(s,'age')   # s has no age attribute
False

getattr: get the value of an object's attribute
# define the class Student
>>> class Student:
        def __init__(self,name):
            self.name = name
>>> getattr(s,'name')   # the name attribute exists
'Aim'
>>> getattr(s,'age',6)  # no age attribute, but a default was given, so return the default
6
>>> getattr(s,'age')    # no age attribute and no default: the call raises an error
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    getattr(s,'age')
AttributeError: 'Student' object has no attribute 'age'

setattr: set the value of an object's attribute
>>> class Student:
        def __init__(self,name):
            self.name = name
>>> a = Student('Kim')
>>> a.name
'Kim'
>>> setattr(a,'name','Bob')
>>> a.name
'Bob'

delattr: delete an object's attribute
# define the class A
>>> class A:
        def __init__(self,name):
            self.name = name
        def sayHello(self):
            print('hello',self.name)
>>> a = A('小麦')
# exercise the attribute and the method
>>> a.name
'小麦'
>>> a.sayHello()
hello 小麦
# delete the attribute
>>> delattr(a,'name')
>>> a.name
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    a.name
AttributeError: 'A' object has no attribute 'name'

callable: whether an object can be called
>>> class B:  # define the class B
        def __call__(self):
            print('instances are callable now.')
>>> callable(B)  # the class B is callable
True
>>> b = B()      # instantiate B
>>> callable(b)  # the instance b is callable
True
>>> b()          # calling the instance b works
instances are callable now.

Variables

globals: dict of the global variables in the current scope and their values
>>> globals()
{'__spec__': None, '__package__': None, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', '__doc__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>}
>>> a = 1
>>> globals()  # now also contains a
{'__spec__': None, '__package__': None, '__builtins__': <module 'builtins' (built-in)>, 'a': 1, '__name__': '__main__', '__doc__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>}

locals: dict of the local variables in the current scope and their values
>>> def f():
        print('before define a')
        print(locals())  # no variables in the scope yet
        a = 1
        print('after define a')
        print(locals())  # the scope now has one variable a with value 1
>>> f
<function f at 0x03D40588>
>>> f()
before define a
{}
after define a
{'a': 1}

Interaction

print: print to the standard output object
>>> print(1,2,3)
1 2 3
>>> print(1,2,3,sep = '+')
1+2+3
>>> print(1,2,3,sep = '+',end = '=?')
1+2+3=?

input: read a value typed by the user
>>> s = input('please input your name:')
please input your name:Ain
>>> s
'Ain'

File operations

open: open a file with the given mode and encoding, returning a file object for reading and writing
# 't' for text mode, 'b' for binary mode
>>> a = open('test.txt','rt')
>>> a.read()
'some text'
>>> a.close()

Compilation and execution

compile: compile a string into a code or AST object, so that it can be run with an exec statement or evaluated with eval
>>> # use exec for statements with control flow
>>> code1 = 'for i in range(0,10): print (i)'
>>> compile1 = compile(code1,'','exec')
>>> exec (compile1)
0
1
2
3
4
5
6
7
8
9
>>> # use eval for simple value expressions
>>> code2 = '1 + 2 + 3 + 4'
>>> compile2 = compile(code2,'','eval')
>>> eval(compile2)
10

eval: evaluate a dynamic expression
>>> eval('1+2+3+4')
10

exec: execute a dynamic block of statements
>>> exec('a=1+2')  # execute the statement
>>> a
3

repr: string representation of an object (for the interpreter)
>>> a = 'some text'
>>> str(a)
'some text'
>>> repr(a)
"'some text'"

Decorators

property: decorator that marks a method as a property

>>> class C:
...     def __init__(self):
...         self._name = ''
...     @property
...     def name(self):
...         """i'm the 'name' property."""
...         return self._name
...     @name.setter
...     def name(self, value):
...         if value is None:
...             raise RuntimeError('name can not be None')
...         else:
...             self._name = value
>>> c = C()
>>> c.name                # read the property
''
>>> c.name = None         # the setter validates on assignment
Traceback (most recent call last):
  File "<pyshell#84>", line 1, in <module>
    c.name = None
  File "<pyshell#81>", line 11, in name
    raise RuntimeError('name can not be None')
RuntimeError: name can not be None
>>> c.name = 'Kim'        # set the property
>>> c.name
'Kim'
>>> del c.name            # no deleter defined, so it cannot be deleted
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    del c.name
AttributeError: can't delete attribute
>>> c.name
'Kim'

classmethod: decorator that marks a method as a class method

>>> class C:
...     @classmethod
...     def f(cls, arg1):
...         print(cls)
...         print(arg1)
>>> C.f('called on the class')
<class '__main__.C'>
called on the class
>>> c = C()
>>> c.f('called on an instance')
<class '__main__.C'>
called on an instance

staticmethod: decorator that marks a method as a static method

# define a static method with the decorator
>>> class Student(object):
...     def __init__(self, name):
...         self.name = name
...     @staticmethod
...     def sayHello(lang):
...         print(lang)
...         if lang == 'en':
...             print('Welcome!')
...         else:
...             print('你好!')
>>> Student.sayHello('en')    # called on the class; 'en' is passed to lang
en
Welcome!
>>> b = Student('Kim')
>>> b.sayHello('zh')          # called on an instance; 'zh' is passed to lang
zh
你好!

GET and POST requests using Python


This post discusses two HTTP (Hypertext Transfer Protocol) request methods, GET and POST, and their implementation in Python.

What is HTTP?

HTTP is a set of protocols designed to enable communication between clients and servers. It works as a request-response protocol between a client and server.

A web browser may be the client, and an application on a computer that hosts a web site may be the server.

So, to request a response from the server, there are mainly two methods:

GET : to request data from the server.
POST : to submit data to be processed to the server.

Here is a simple diagram which explains the basic concept of GET and POST methods.
[Diagram: GET and POST requests using Python]
Now, to make HTTP requests in Python, we can use several HTTP libraries like:

httplib
urllib
requests

The most elegant and simplest of above listed libraries is Requests. We will be using requests library in this article. To download and install Requests library, use following command:

pip install requests

OR, download it from here and install manually.

Making a GET request

# importing the requests library
import requests
# api-endpoint
URL = "http://maps.googleapis.com/maps/api/geocode/json"
# location given here
location = "delhi technological university"
# defining a params dict for the parameters to be sent to the API
PARAMS = {'address':location}
# sending get request and saving the response as response object
r = requests.get(url = URL, params = PARAMS)
# extracting data in json format
data = r.json()
# extracting latitude, longitude and formatted address
# of the first matching location
latitude = data['results'][0]['geometry']['location']['lat']
longitude = data['results'][0]['geometry']['location']['lng']
formatted_address = data['results'][0]['formatted_address']
# printing the output
print("Latitude:%s\nLongitude:%s\nFormatted Address:%s"
%(latitude, longitude,formatted_address))

Output:



The above example finds latitude, longitude and formatted address of a given location by sending a GET request to the Google Maps API. An API (Application Programming Interface) enables you to access the internal features of a program in a limited fashion. And in most cases, the data provided is in JSON (JavaScript Object Notation) format (which is implemented as dictionary objects in Python!).

Important points to infer :

PARAMS = {'address':location}

The URL for a GET request generally carries some parameters with it. For requests library, parameters can be defined as a dictionary. These parameters are later parsed down and added to the base url or the api-endpoint.

To understand the parameters role, try to print r.url after the response object is created. You will see something like this:

http://maps.googleapis.com/maps/api/geocode/json?address=delhi+technological+university

This is the actual URL on which the GET request is made.

r = requests.get(url = URL, params = PARAMS)

Here we create a response object ‘r’ which will store the request-response. We use requests.get() method since we are sending a GET request. The two arguments we pass are url and the parameters dictionary.

data = r.json()

Now, in order to retrieve the data from the response object, we need to convert the raw response content into a JSON type data structure. This is achieved by using json() method. Finally, we extract the required information by parsing down the JSON type object.
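The "parsing down" is ordinary dict/list indexing once json() has done its work. Here is a runnable sketch against a dict shaped like the Geocoding response above (the values are made up for illustration):

```python
# Stand-in for r.json(): same shape as the Geocoding API response,
# but with made-up values.
data = {
    'results': [
        {
            'geometry': {'location': {'lat': 28.75, 'lng': 77.117}},
            'formatted_address': 'Delhi Technological University, New Delhi, India',
        }
    ]
}

latitude = data['results'][0]['geometry']['location']['lat']
longitude = data['results'][0]['geometry']['location']['lng']
formatted_address = data['results'][0]['formatted_address']
print(latitude, longitude, formatted_address)
```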

Making a POST request

# importing the requests library
import requests
# defining the api-endpoint
API_ENDPOINT = "http://pastebin.com/api/api_post.php"
# your API key here
API_KEY = "XXXXXXXXXXXXXXXXX"
# your source code here
source_code = '''
print("Hello, world!")
a = 1
b = 2
print(a + b)
'''
# data to be sent to api
data = {'api_dev_key':API_KEY,
'api_option':'paste',
'api_paste_code':source_code,
'api_paste_format':'python'}
# sending post request and saving response as response object
r = requests.post(url = API_ENDPOINT, data = data)
# extracting response text
pastebin_url = r.text
print("The pastebin URL is:%s"%pastebin_url)

This example explains how to paste your source_code to pastebin.com by sending POST request to the PASTEBIN API.

First of all, you will need to generate an API key by signing up here and then access your API key here.

Important features of this code:

data = {'api_dev_key':API_KEY,
'api_option':'paste',
'api_paste_code':source_code,
'api_paste_format':'python'}

Here again, we will need to pass some data to the API server. We store this data as a dictionary.

r = requests.post(url = API_ENDPOINT, data = data)

Here we create a response object ‘r’ which will store the request-response. We use requests.post() method since we are sending a POST request. The two arguments we pass are url and the data dictionary.

pastebin_url = r.text

In response, the server processes the data sent to it and sends the pastebin URL of your source_code which can be simply accessed by r.text .

The requests.post method could be used for many other tasks as well, like filling and submitting web forms, posting on your FB timeline using the Facebook Graph API, etc.

Here are some important points to ponder upon:

When the method is GET, all form data is encoded into the URL, appended to the action URL as query string parameters. With POST, form data appears within the message body of the HTTP request.
In the GET method, the parameter data is limited to what we can stuff into the request line (URL). It is safest to use less than 2K of parameters; some servers handle up to 64K. There is no such problem in the POST method, since we send data in the message body of the HTTP request, not the URL.
Only ASCII characters are allowed for data to be sent in the GET method. There is no such restriction in the POST method.
GET is less secure compared to POST because the data sent is part of the URL. So, the GET method should not be used when sending passwords or other sensitive information.
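The difference in where the data travels can be seen without hitting any server, using requests' PreparedRequest (example.com is just a placeholder host; nothing is actually sent):

```python
import requests

# Build (but don't send) one request of each kind to see where the data goes.
get_req = requests.Request('GET', 'http://example.com/api', params={'q': 'test'}).prepare()
post_req = requests.Request('POST', 'http://example.com/api', data={'q': 'test'}).prepare()

print(get_req.url)    # data is encoded into the URL as a query string
print(post_req.url)   # URL unchanged
print(post_req.body)  # data travels in the message body instead
```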

This blog is contributed by Nikhil Kumar . If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.

MonkeyRunner填坑之jython


MonkeyRunner ships with jython-standalone-2.5.3 as its Jython environment. A Python script that worked fine elsewhere failed at runtime: `import json` raised an ImportError. The 2.7 Jython jar does include json, but simply swapping the jar in doesn't work, so another way out was needed.

The eventual solution: download simplejson manually and fall back to it:

import sys, time, datetime

# make the unpacked simplejson available on the path (only once)
if 'simplejson-3.10.0' not in sys.path:
    sys.path.append('simplejson-3.10.0')

try:
    import json
except ImportError:
    import simplejson as json

Done? Not quite. A script that ran fine on Mac fell over again when moved to Windows:

LookupError: unknown encoding 'ms936'. This one is easy: run `chcp 437` in the console before executing monkeyrunner.


How Python's yield works


yield can only appear inside a function; it can no longer be used on its own. A function containing yield is generally considered a generator (or, strictly, a function that produces a generator). Straight to the Fibonacci example found all over the web:

def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        # print b
        yield b
        a, b = b, a + b
        n = n + 1

c = fab(5) is a generator; note that fab and fab(5) are not the same thing. When execution reaches a yield, the code suspends; when the iterator's c.next() is called, execution resumes from where it stopped and runs until the next yield, at which point yield b hands b back as the return value of c.next(). You can think of yield as a put --> wait_and_get pair: it sends a value out, then waits for the next iteration. That is why, inside the while loop, b seems to "remember" its value: the generator keeps its local state between iterations.

yield keeps the code short, gives us a generator, and runs quickly too.
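In Python 3 the same generator is driven with the built-in next() (or a for loop) instead of the .next() method used above; a runnable sketch:

```python
def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b          # suspend here and hand b back to the caller
        a, b = b, a + b  # resumed: advance the sequence
        n = n + 1

g = fab(5)
print(next(g))       # 1  (Python 2 would use g.next())
print(next(g))       # 1
print(list(fab(5)))  # [1, 1, 2, 3, 5]
```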

EOF

为什么抵制Python 3


来源:CSDN

这份文档列出了为什么初学者应该避免学Python 3的原因,在这里我给出两类原因:第一类是针对零基础的,另一类是对于有一定编程基础的人来说的。第一部分,将会从非技术的角度谈论,帮助初学者不受外部宣传和压力的影响做出合理的决定。第二部分,将讨论目前Python 3存在的缺陷,以及这些缺陷为什么会阻碍程序员的工作。

我不会教初学者Python 3,因为我不想让他们觉得自己“在编程方面糟透了”――而实际上,这并不是他们的错,而是Python 3的问题。对他们来说,这很不公平,所以我一般会教Python 2,最大限度地降低他们的阻碍。

最重要的原因

在讲技术层面的东西之前,我想讨论一下作为初学者不应该使用Python 3最根本的社会原因:

Python 3极有可能是一个巨大的失败,它将会杀了Python。

Python 3的接受度只有30%,可能不是很准确,不过总归不高。我认为,Python 3基本上算是“死了”,也不会走得更远了,原因在下面列出。这些原因都是易用性的问题,很容易就能修改,但是Python项目组采取了一种傲慢的态度,认为这是一种feature,是对用户好。这是一门语言死亡的信号,所以如果你计划将来使用Python 3,有可能陷入麻烦。

道理很简单。如果你学Python 2,你可以使用所有Python 2的库,除非这门语言死了,或者你有更高的需求。但是如果你学Python 3的话,未来就很不确定了。有很大的可能,到头来你还得从Python 2重新开始。

现在已经十年过去了,可能下一个十年过去,Python 3的接受程度还是30%。即使是在Python最擅长的领域――科学方面,接受程度也是只有25-30%。现在,是时候承认它的失败,采取新计划了。

初学者可以理解的原因

除了Python 3的不确定之外,还有很多原因。在这部分中,我将从非程序员也能理解的角度描述,如果你理解不了的话,也可以继续读下面技术方面的讨论。

没有考虑到你的兴趣

Python项目组极力劝说你使用Python 3,并不是站在你的角度考虑,而是为了推广Python项目。

这也是你不应该使用Python 3的根本原因。为什么他们向你推荐一个只有30%接受度、还在不断变动、充满问题的语言?原因是,越多的人从Python 3开始,意味着越多的接受度,这对Python 3有好处。他们不关心你的问题,不关心你遇到多么大的阻力,也不关心你因为Python 3的技术限制很可能学不下去。他们只担心你,一个Python的初学者,可能会壮大Python 3的使用者队伍。

我只希望你开始编程,不在乎你用什么语言。这就是为什么我有两本Python和Ruby免费书籍的原因。语言对我来说并不是很重要,我也不会向你推荐一门快要破产的语言。Python项目组却不在乎这么做,所以你应该避免上了他们的当,直到他们把问题都修补好。很多人在这条路上越走越远,甚至开始禁我的书,就因为此书的内容不支持Python 3。

换个角度说,如果Python 3真有那么好,他们也不需要说服你,你自然而然就会去用,也不会有什么问题。相反,他们没有把精力放在修复问题上,反而去宣传,利用社会压力和市场来说服你使用Python 3。从技术上看,他们会采取市场和宣传手段,恰恰说明Python 3是有缺陷的。

这种忽视Python 3本身存在的问题去宣传的行为,在我看来很不合理。

错误的决策

Python项目决定,Python 3将不兼容Python 2的代码。即使基础计算科学证明,新版本的语言兼容旧版本是可行的,但是他们还是声称“不可能”。鉴于他们控制着语言的实现,又故意让Python 3不兼容Python 2,我只好总结:他们也是故意让Python 3有缺陷的。这样做的结果是,与其选择花大工夫重写Python 2的代码,还要切换到一个截然不同的语言,大多数人选择不如就用Python 2吧。

这意味着如果你用Python 3,就是在一个很可能破产的平台工作,我个人是不能容忍教人们快要破产的东西的。

String难以使用

Python 3的String对初学者来说非常难用。其初衷是让Python更加国际化,但是却导致这个类型难以使用,错误信息很少。每次你想在程序中处理字符,都要搞明白Byte和Unicode的编码区别。不明白这是什么东西?错不在你。Python本来是一个简单的语言,对初学者非常友好,只需要简单的操作就能让程序工作,你可以一步一步地学习string。更糟糕的是,当String出错时(经常发生),得到的错误信息非常有有限,甚至不会告诉你哪一个变量需要修改。

除此之外,Python 3.6有三种不同的字符串格式化方式。这意味着你要学习三种完全不同的方法。即使是经验丰富的专业程序员,也不能轻松地记住这些方式和不断改变的feature。

我喜欢python的原因就是,它对初学者来说非常友好,不需要学习很深入地知识,马上就能写出有用的代码。而Python 3对国际化的尝试正好毁了这一点。

初学者开始编程,接触的第一件事就是String,Python使其变得如此复杂,像我这样的程序员都处理不好,很难让人坚持学下去。

核心库没有更新

很多核心库都用Python 3重写了,但是已经很久没有更新,新特性也没用上。Python 3变化如此之大,状态不稳定,让库怎么维护?在我自己的测试中,我发现当使用Unicode处理库的返回结果时就不能正常工作,得到的是Bytes。这意味着如果去使用一个你认为正常的库,说不定就会遇到一堆错误信息。这些错误信息是随机分布的,说不定有些能正确处理Unicode,却又在别的地方出错。

除非Python 3出一个标准,规定每一个库应该以什么样的编码返回结果,不然你永远不能依赖一个Python 3的库。解决方法是,创建一个“String”类型,能魔法般地感知字符串是Unicode编码的还是Bytes,但是Python上去反对任何他们认为是“魔法”的东西,所以他们坚持自己糟糕的设计,并声称这样做很好。

站在程序员的角度分析

这部分内容,初学者可以拿给你博学的朋友或大牛看。在这部分我用了很多代码示例,来展示为什么Python 3糟透了。目前的你可能对这部分内容有些疑惑,但是你的朋友能看的懂,而且很可能他们看了之后劝你不要碰Python 3。

开始之前,我想先声明一点,关于Python 3我已经写了一部分不错的教程,事实是我也想支持Python 3,但是在他们改良那些糟粕之前不会这么做,因为这违背了我的信念:初学者应该使用易用的东西。

Python 3虚拟机设计不良

不能兼容旧版本的代码是Python 3接受度不高的主要原因。下面是我的根据:

Python 3和Python 2一样使用了虚拟机 本来Python 3代码是可以和Python 2代码在相同的虚拟机中运行的,它们控制着两种语言,完全可以做到兼容,但是却选择让用户手动转换代码 F#/C#和JRuby/Java就是很好的例子,前者同时控制着两种语言,后者没有,但是都做到了兼容 Java是向前兼容的另一个经典例子,也许对Python的迁移来说难度更小 同一个虚拟机不能同时支持Python 2和Python 3,增加了Python 2代码的转化难度。更糟的是,Python项目组没有提供一个平滑的转换过程,而是让你去手动转换,同时支持两种语言 当被问及为什么不同时支持两种语言时,Python项目的人告诉我,这是不可能的。但从技术上讲,这是完全可能的,而且这对代码迁移来说优点很多 Python 3的设计缺陷提高了迁移难度,降低了接受度,强迫你做代码转换工作。由此带来的问题,意味着Python 3有很大的可能性不会成功 当我说“不会成功”的时候,人们可能想,这是相比于Python 2而言的,但是实际上我是相较于其他编程语言而言的 对我而言,我认为因为设计上的不良,就需要人们耗费无意义的额外劳动是不合理的。我很怀疑他们是否能改正这些缺点,但是据上面这些证据,Python 3是不会成功的

故意挫败2to3翻译器

我曾经说过100%确定2与3兼容是可能的,像2to3转化这种东西没有必要。我觉得这种说法不妥,正确的说法应该是,“如果Python 2和3设计良好,那么2to3转换应该是能完美工作的。”

当然,我知道这并不算是一个有说服力的理由。代码转换高成本,以及不兼容Python 2,糟糕的2to3转换等,只是问题的一个表现,并不算得上是一个原因。

静态类型字符串

Python是一个动态类型的语言。这意味着我并不需要知道变量的类型就可以使用它。只要它表现得和另一种类型一样,我就可以把它当做是另一种类型。至于这是否符合计算机科学,就不重要了。动态类型的特性让Python变得易于使用,也是我将它推荐给初学者的原因。

在Python 3中,我不能像下面这样灵活地写代码了:

def addstring(a, b):
    return a + b

def catstring(a, b):
    return "{}{}".format(a, b)

如果我们有一个String和一个Bytes类型,那么调用第一个函数会得到一个error,第二个会得到一个用repr格式化的byte string。我比较倾向于第一个结果,至少会让我知道这个地方有问题,而不是在各个地方埋下隐患。

下面是我在Python 3.5中使用这两个函数的方法:

Python 3.5.1 (default, Sep 16 2016, 13:36:12)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def catstring(a, b):
...     return "{} {}".format(a, b)
...
>>> x = bytes("hello", 'utf-8')
>>> y = "hello"
>>> catstring(x, y)
"b'hello' hello"
>>>
>>> def addstring(a, b):
...     return a + b
...
>>> addstring(x, y)
Traceback (most recent call last):
  File "<stdin>", line 2, in addstring
TypeError: can't concat bytes to str
>>> ^D

下面是Python 2的版本:

Python 2.7.11 (default, May 25 2016, 05:27:56)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def catstring(a, b):
...     return "{}{}".format(a, b)
...
>>> def addstring(a, b):
...     return a + b
...
>>> x = "hello"
>>> y = bytes("hello")
>>> catstring(x, y)
'hellohello'
>>> addstring(x, y)
'hellohello'
>>>

关于为什么Python 3的版本表现和Python 2不同,没有技术上的原因,仅仅是因为他们决定让你手动去处理类型问题。经典的“Macho Code Guy”被抛弃了,这不再是适合初学者的一门语言。难用的东西并不是好的编程语言或好的程序员做出来的事情,而是傲慢的人做出来的事情。然而,当用户指出来这很难用的时候,他们的说法则是这对你好。这完全是对糟糕设计的辩解。

另外,错误提示也非常大男子主义,非常简短:

", line 2, in addstring TypeError: can't concat bytes to str

如果他们执意要求初学者弄明白什么是Bytes,什么是Unicode的话,至少他们要告诉人们哪些变量是Bytes,哪些是Strings。

动态与静态类型不匹配

string作为静态类型的一个致命缺点是,Python缺少相关处理的类型安全组件。Python是一个动态语言,函数声明的时候不支持声明类型。它也不是静态编译的,所以直到你运行代码之前,你都不知道是否存在类型错误。过去没有这些特性,也能正常工作,因为你可以写测试,Python动态的特性让Python只要方法类型的签名相同就能正常工作。

字符串也经常从外部资源得到,比如Socket连接,文件或者类似的输入。这意味着Python 3的静态类型字符串和静态类型安全缺失会导致Python 3程序崩溃更频繁,与Python 2相比将存在更多的安全问题。

核心库没有更新

抛开编码的问题不说,目前还有很多库,都不能正确地返回String结果。尤其是一些处理HTTP协议的库,很多情况下,库依然使用旧版的API,都依赖动态特性,但是在返回值方面没有更新。你以为返回的是String,但其实返回的是Bytes。

更愚蠢的是,Python原本有一个Chardet库,可以判断字节流的编码。如果Python 3非要像上面那样做,也可以将Chardet收入核心库,进行自动转换,让用户无需操心编码问题。这样的话,你就可以这样用:

x = bytes('hello', 'utf-8')
y = x.guess

编码类型检测本来是一个已经解决的问题。其他语言差不多也都完美地处理好这个问题了。但是Python 3却认为,String的Unicode编码非常重要,用户应该自己处理。他们说易用性和安全性同样重要,但实际上并没有那么做。

太多的format选择

目前,Python 3支持旧的格式化风格:

print "Howdy %s" % ('Zed')

新的format风格:

print "Howdy {}".format('Zed')

Python 3.6会有一种新的format风格:

x = 'Zed'
print f"Howdy {x}"

我比较偏向最后一种,不知道为什么之前不这样支持,而是用愚蠢的format函数。String插入应该是所有人用得最多一个特性。
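For reference, here are the three styles written as runnable Python 3.6+ code (the article's snippets use Python 2 print statements):

```python
name = 'Zed'

print("Howdy %s" % (name,))     # old %-interpolation
print("Howdy {}".format(name))  # str.format, added in 2.6/3.0
print(f"Howdy {name}")          # f-string, added in 3.6
```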

现在的问题是,初学者必须了解目前所有的format格式,这实在是太多了。我在更新《笨方法学Python》(Learn Python the Hard Way)的时候,一般我只介绍Python 3.6的f-string风格,其他的只是提一下。

String的更多版本

最后,我听说现在又有一个新的String类型,同时支持Unicode和Bytes。这可能简化String的处理。但是我依然会让初学者避免学Python 3,直到“chimera string”能依赖动态很好地工作。在这之前,你都需要艰难地在一个动态语言里处理一个静态类型。

总结和警告

我已经坚持写Python 3存在的问题很久了,期间经过了五个版本。因为我希望它能为初学者带来帮助。每年我都尝试将我的一些代码用Python 3重写,都会遇到挫折。如果我都不能熟练地使用Python 3,那么对初学者来说就更不可能了。所以每年我都必须等其修复问题。我是真心喜欢Python,并且希望Python项目组能放下傲慢的态度,更加注重易用性。

这五个版本之后,我必须承认,Python 3已经不再适合初学者了。我希望这份列表能带来一些实质性的改变,但我对此并不抱太大期望。随着Python 3的开发,接受度却没有提高。Python 3项目组对此的措施是利用社会压力,市场鼓动来获得接受度,而不是去修复Python 3存在的问题。这很尴尬,修复技术上的问题很简单,但是错误地看待社会反应带来的问题就难对付了。

现在,Python社区已经被洗脑了,他们认为目前的String很好,Python 3不兼容Python 2是一种特殊的爱,甚至去反对语言翻译背后的数学,尝试用蛮力去转换Python 2代码。

残酷的现实是,如果Python 3被设计成同时支持Python 3和Python 2代码,String也和Python 2一样是动态的,情况就不会像现在这样。现在,我害怕的情况是,很多Python用户更倾向于使用一个稳定的语言,比如Go、Rust、Clojure或Elixir。切换到Python 3的成本太高了,倒不如直接切换到另一种不会破产,更稳定的语言。

也许Python项目更专注于让初学者学Python 3,是因为对这些人来说,没有什么转换成本。如果你按照他们的方式去学习,在Python 3的坑里摸索,那么你就忽视了其他可能的选择。他们牺牲初学者的利益去拯救一个需要修复的项目,这是不道德的。这意味着如果初学者学习编程失败了,那么不是他们自己的问题,而应该归咎于Python 3。

从我的意愿出发,我希望这篇文章能给他们一些警示,能改变他们做事情的方式,对别人友好一些。承认他们在Python 3设计上的瑕疵,在Python 4上拯救这个语言。遗憾的是,我很怀疑这个愿望是否能成真,很可能他们会觉得我在说些什么,到底懂不懂Python 3,认为我应该闭嘴。

无论如何,我都会继续用我认为最简单、最好的方式教Python 3,带给初学者学习编程的热情。

感谢阅读。

Codementor: Extending Apache Pig with Python UDFs


( image source )

Introduction

Apache Pig is a popular system for executing complex Hadoop map-reduce based data-flows. It adds a layer of abstraction on top of Hadoop's map-reduce mechanisms in order to allow developers to take a high-level view of the data and operations on that data. Pig allows you to do things more explicitly. For example, you can join two or more data sources (much like an SQL join). Writing a join as a map and reduce function is a bit of a drag and it's usually worth avoiding. So Pig is great because it simplifies complex tasks - it provides a high-level scripting language that allows users to take more of a big-picture view of their data flow.

Pig is especially great because it is extensible. This tutorial will focus on its extensibility. By the end of this tutorial, you will be able to write PigLatin scripts that execute python code as a part of a larger map-reduce workflow. Pig can be extended with other languages too, but for now we'll stick to Python.

Before we continue

This tutorial relies on a bunch of knowledge. It'll be very useful if you know a little Python and PigLatin. It'll also be useful to know a bit about how map-reduce works in the context of Hadoop.

User Defined Functions (UDFs)

A Pig UDF is a function that is accessible to Pig, but written in a language that isn't PigLatin. Pig allows you to register UDFs for use within a PigLatin script. A UDF needs to fit a specific prototype - you can't just write your function however you want because then Pig won't know how to call your function, it won't know what kinds of arguments it needs, and it won't know what kind of return value to expect. There are a couple of basic UDF types:

Eval UDFs

This is the most common type of UDF. It's used in FOREACH type statements. Here's an example of an eval function in action:

users = LOAD 'user_data' AS (name: chararray);
upper_users = FOREACH users GENERATE my_udfs.to_upper_case(name);

This code is fairly simple - Pig doesn't really do string processing so we introduce a UDF that does. There are some missing pieces that I'll get to later, specifically how Pig knows what my_udfs means and suchlike.
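The to_upper_case UDF used above is never shown in the article; here is a minimal sketch of what it might look like (the try/except stand-in for pig_util is only there so the function can also be exercised outside Pig):

```python
try:
    from pig_util import outputSchema   # available when the script runs inside Pig
except ImportError:
    # no-op stand-in so the UDF can be tested outside Pig
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema('name:chararray')
def to_upper_case(name):
    # Pig may hand us a null field, which arrives as None
    if name is None:
        return None
    return name.upper()

print(to_upper_case('alex'))  # ALEX
```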

Aggregation UDFs

These are just a special case of an eval UDF. An Aggregate function is usually applied to grouped data. For example:

user_sales = LOAD 'user_sales' AS (name: chararray, price: float);
grouped_sales = GROUP user_sales BY name;
number_of_sales = FOREACH grouped_sales GENERATE group, COUNT(user_sales);

In other words, an aggregate UDF is a udf that is used to combine multiple pieces of information. Here we are aggregating sales data to show how many purchases were made by each user.

Filter UDFs

A filter UDF returns a boolean value. If you have a data source that has a bunch of rows and only a portion of those rows are useful for the current analysis then a filter function of some kind would be useful. An example of a filter function in action follows:

user_messages = LOAD 'user_twits' AS (name:chararray, message:chararray);
rude_messages = FILTER user_messages by my_udfs.contains_naughty_words(message);

Enough talk, let's code

In this section we'll be writing a couple of Python UDFs and making them accessible within PigLatin scripts.

Here's about the simplest Python UDF you can write:

from pig_util import outputSchema

@outputSchema('word:chararray')
def hi_world():
    return "hello world"

The data output from a function has a specific form. Pig likes it if you specify the schema of the data because then it knows what it can do with that data. That's what the outputSchema decorator is for. There are a bunch of different ways to specify a schema, we'll get to that in a little bit.

Now if that were saved in a file called "my_udfs.py" you would be able to make use of it in a PigLatin script like so:

-- first register it to make it available
REGISTER 'myudf.py' using jython as my_special_udfs

users = LOAD 'user_data' AS (name: chararray);
hello_users = FOREACH users GENERATE name, my_special_udfs.hi_world();

Specifying the UDF output schema

Now a UDF has input and output. This little section is all about the outputs. Here we'll go over the different ways you can specify the output format of a Python UDF through use of the outputSchema decorator. We have a few options, here they are:

# our original udf
# it returns a single chararray (that's PigLatin for String)
@outputSchema('word:chararray')
def hi_world():
    return "hello world"

# this one returns a Python tuple. Pig recognises the first element
# of the tuple as a chararray like before, and the next one as a
# long (a kind of integer)
@outputSchema("word:chararray,number:long")
def hi_everyone():
    return "hi there", 15

# we can use outputSchema to define nested schemas too, here is a bag of tuples
@outputSchema('some_bag:bag{t:(field_1:chararray, field_2:int)}')
def bag_udf():
    return [
        ('hi', 1000),
        ('there', 2000),
        ('bill', 0)
    ]

# and here is a map
@outputSchema('something_nice:map[]')
def my_map_maker():
    return {"a": "b", "c": "d", "e": "f"}

So outputSchema can be used to imply that a function outputs one or a combination of basic types. Those types are:

chararray: like a string
bytearray: a bunch of bytes in a row. Like a string but not as human friendly
long: long integer
int: normal integer
double: floating point number
datetime
boolean

If no schema is specified then Pig assumes that the UDF outputs a bytearray.

UDF arguments

Not only does a UDF have outputs but inputs as well! This sentence should be filed under 'dah'. I reserved it for a separate section so as not to clutter the discussion on output schemas. This part is fairly straight-forward so I'm just going to breeze through it...

First some UDFs:

def deal_with_a_string(s1):
    return s1 + " for the win!"

def deal_with_two_strings(s1, s2):
    return s1 + " " + s2

def square_a_number(i):
    return i * i

def now_for_a_bag(lBag):
    lOut = []
    for i, l in enumerate(lBag):
        lNew = [i,] + l
        lOut.append(lNew)
    return lOut

And here we make use of those UDFs in a PigLatin script:

REGISTER 'myudf.py' using jython as myudfs

users = LOAD 'user_data' AS (firstname: chararray, lastname:chararray, some_integer:int);

winning_users = FOREACH users GENERATE myudfs.deal_with_a_string(firstname);
full_names = FOREACH users GENERATE myudfs.deal_with_two_strings(firstname, lastname);
squared_integers = FOREACH users GENERATE myudfs.square_a_number(some_integer);

users_by_number = GROUP users by some_integer;
indexed_users_by_number = FOREACH users_by_number GENERATE group, myudfs.now_for_a_bag(users);

Beyond Standard Python UDFs

There are a couple of gotchas to using Python in the form of a UDF. Firstly, even though we are writing our UDFs in Python, Pig executes them in Jython. Jython is an implementation of Python that runs on the Java Virtual Machine (JVM). Most of the time this is not an issue as Jython strives to implement all of the same features of CPython but there are some libraries that it doesn't allow. For example you can't use numpy from Jython.

Besides that, Pig doesn't really allow for Python Filter UDFs. You can only do stuff like this:

user_messages = LOAD 'user_twits' AS (name:chararray, message:chararray);
--add a field that says whether it is naughty (1) or not (0)
messages_with_rudeness = FOREACH user_messages GENERATE name, message, contains_naughty_words(message) as naughty;
--then filter by the naughty field
filtered_messages = FILTER messages_with_rudeness by (naughty==1);
-- and finally strip away the naughty field
rude_messages = FOREACH filtered_messages GENERATE name, message;

Python Streaming UDFs

Pig allows you to hook into the Hadoop Streaming API, this allows us to get around the Jython issue when we need to. If you haven't heard of Hadoop Streaming before, here is the low down: Hadoop allows you to write mappers and reducers in any language that gives you access to stdin and stdout. So that's pretty much any language you want. Like Python 3 or even Cow . Since this is a Python tutorial the examples that follow will all be in Python but you can plug in whatever you want.

Here's a simple Python streaming script, lets call it simple_stream.py :

#! /usr/bin/env python

import sys
import string

for line in sys.stdin:
    if len(line) == 0:
        continue
    l = line.split()  # split the line by whitespace
    for i, s in enumerate(l):
        # give out a key value pair for each word in the line
        print "{key}\t{value}\n".format(key=i, value=s)

The aim is to get Hadoop to run the script on each node. That means that the hash bang line ( #! ) needs to be valid on every node, all the import statements must be valid on every node (any packages imported must be installed on each node); and any other system level files or resources accessed within the Python script must be accessible in the same way on every node.
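A quick way to check those requirements is to run the mapper locally by piping sample text through it, exactly as Hadoop Streaming would. A sketch (the file name is arbitrary; the heredoc recreates a print()-based variant of the script above, so it runs on both Python 2.7 and 3):

```shell
# recreate the mapper locally
cat > simple_stream_local.py <<'EOF'
#!/usr/bin/env python3
import sys
for line in sys.stdin:
    for i, s in enumerate(line.split()):
        print("{0}\t{1}".format(i, s))
EOF

# simulate the streaming harness: lines in on stdin, key/value pairs out
echo "hello streaming world" | python3 simple_stream_local.py
```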

Ok, onto the Pig stuff...

To make the streaming UDF accessible to Pig we make use of the define statement. You can read all about it here

Here is how we can use it with our simple_stream script:

DEFINE stream_alias 'simple_stream.py' SHIP('simple_stream.py');
user_messages = LOAD 'user_twits' AS (name:chararray, message:chararray);
just_messages = FOREACH user_messages generate message;
streamed = STREAM just_messages THROUGH stream_alias;
DUMP streamed;

Let's look at that DEFINE statement a little closer. The general format we are using is:

DEFINE alias 'command' SHIP('files');

The alias is the name we use to access our streaming function from within our PigLatin script. The command is the system command Pig will call when it needs to use our streaming function. And finally SHIP tells Pig which files and dependencies Pig needs to distribute to the Hadoop nodes for the command to be able to work.

Then, once we have the resources we want to pass through our streaming function, we just use the STREAM command as above.

And that's it

Well, sort of. PigLatin is quite a big thing, this tutorial just barely scraped the surface of its capabilities. If all the LOADing and FOREACHing and suchlike didn't make sense to you then I would suggest checking out a more introductory PigLatin tutorial before coming back here. This tutorial should be enough to get you started in using Python from within Pig jobs.

Python is also quite a big thing. Understanding the Python import system is really worthwhile if you want to use Python on a Hadoop cluster. It's also worthwhile understanding some little details like how Python decorators work.

There are also some more technical ways of calling Python from Pig, this tutorial aimed to be an introduction to UDFs, not a definitive guide. For more examples and more in-depth discussions of the different decorators and suchlike that Pig makes available to Jython based UDFs I would suggest taking a look at Pig's official documentation.

Another topic only touched on briefly was Hadoop Streaming, this in itself is a powerful technology but actually pretty easy to use once you get started. I've made use of the Streaming API many times without needing anything as complicated as PigLatin - it's worthwhile being able to use that API as a standalone thing.

人生苦短,我用python-- Day18 正则+组件+django框架

目录

1.正则表达式

2.组件

3.django框架

一、正则表达式

作用:1,判断字符串是否符合规定的正则表达式 ----test

2,获取匹配的数据 exec

用户登录的时候 常常需要用到正则进行匹配用户输入的是否符合要求:

实验案例一:判断字符串是否符合定义的正则表达式要求

exec 使用方法:

rep = /\d+/;                              // 定义一个正则表达式,匹配数字
str = "DongGuang_061600_BeiJing_10000";   // 定义一个字符串
rep.exec(str);                            // 使用rep正则表达式匹配str这个字符串中符合的数据
// ["061600"]  结果明显看出,使用这种方式,无论执行几次都是获取第一个数据

str = 'JavaScript is more fun than Java or JavaBeans!';   // 定义一个字符串
var pattern = /Java\w*/;                  // 定义一个正则规则,\w*匹配以Java开头的一个单词
pattern.exec(str);
// ["JavaScript"]

str = 'JavaScript is more fun than Java or JavaBeans!';
var pattern = /\bJava(\w*)\b/;            // 把\w*括起来作为捕获组,结果中除了整体匹配,还会给出括号捕获到的部分
pattern.exec(str);
// ["JavaScript", "Script"]

全局匹配

关键参数g

str = 'JavaScript is more fun than Java or JavaBeans!';
var pattern = /\bJava(\w*)\b/g;   // 加上g(全局)后,每次执行exec都会从上次匹配结束的位置继续,匹配完最后一个后返回null
pattern.exec(str);
// ["JavaScript", "Script"]
pattern.exec(str);
// ["Java", ""]
pattern.exec(str);
// ["JavaBeans", "Beans"]
pattern.exec(str);
// null

多行匹配

关键参数 m

str = 'JavaScript is more fun than \nJava or JavaBeans!';   // 定义一个字符串,其中有一个换行符号
var pattern = /^Java\w*/g;    // 定义一个正则表达式,全局匹配开头为Java的单词
pattern.exec(str);            // 第一次匹配结果
// ["JavaScript"]
pattern.exec(str);            // 第二次匹配结果
// null
pattern.exec(str);            // 第三次匹配结果
// ["JavaScript"]
pattern.exec(str);            // 第四次匹配结果
// null
var pattern = /^Java\w*/gm;   // 加上m(多行)后,每一行的行首都能被^匹配
pattern.exec(str);
// ["JavaScript"]
pattern.exec(str);
// ["Java"]

不区分大小写匹配

关键参数 i

响应式: 响应式html编程大概意思是当浏览器的宽度到达某个程度的时候,css中的某个样式生效 <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Title</title> <style> .c1{ background-color: red; } /*c1这个div最小像素是500,当宽度小于500px的时候,此样式不生效*/ @media (min-width: 500px) { .c1{ background-color: green; } } </style> </head> <body style="margin: auto"> <div class="c1"> 1 </div> </body> </html> 三、django框架的安装及使用

1.django安装

pip3 install django

2.创建一个工程

django-admin startproject management_system

django-admin startproject 项目名称

3.启动工程

cd management_system # 进入到项目目录

python3 manage.py runserver #启动项目

如果不指定端口默认启动端口为127.0.0.1:8000端口

也可以指定端口

python3 manage.py runserver 127.0.0.1:8001

4.web访问



5.目录介绍

-management_system 项目名称

|--management_system #对整个程序进行配置

-__init__.py

-settings.py # 配置文件,配置数据库连接、模板连接等等

-urls.py # URL和函数对应关系,当用户来访问程序的是,就会根据url进行匹配

-wsgi.py # 首先wsgi是一套规则,django是一个web框架不负责给我创建socket连接,所以这里调用wsgi模块进行socket的创建,我们代码只需

要进行编写函数,处理wsgi传过来的数据就可以了。

-manage.py # 管理django程序的:

1.python manage.py runserver 启动django

2.python manage.py startapp APP名称 创建app程序(可以理解为子模块)

3.python manage.py makemigrations

python manage.py migrate 通过这两个命令可以连接数据库,创建表

6.创建工程pycharm也可以帮我们创建,这样创建和我们使用命令创建是一个效果



7.写一个测试页面的html,http://127.0.0.1:8000/test.html

"""management_system URL Configuration The `urlpatterns` list routes URLs to views. For more information please see: https://docs.djangoproject.com/en/1.10/topics/http/urls/ Examples: Function views 1. Add an import: from my_app import views 2. Add a URL to urlpatterns: url(r'^$', views.home, name='home') Class-based views 1. Add an import: from other_app.views import Home 2. Add a URL to urlpatterns: url(r'^$', Home.as_view(), name='home') Including another URLconf 1. Import the include() function: from django.conf.urls import url, include 2. Add a URL to urlpatterns: url(r'^blog/', include('blog.urls')) """ from django.conf.urls import url from django.contrib import admin from django.shortcuts import HttpResponse def test(request): return HttpResponse('<h1>This is a test pag!</h1>') urlpatterns = [ url(r'^admin/', admin.site.urls), url(r'^test.html/', test), ] urls.py

效果:



8.创建一个app

当写一个网站的时候,往往会有很多模块,上面的app其实就是模块的概念,看下图:



当我写一个运维平台的是,可能会有这么的模块,那么每个模块我们叫做一个app,这样就实现了代码分离,数据库共享的效果!

win下面创建app没有什么好的办法,需要我们使用命令:

python manage.py startapp APP名称

mac下面创建app有一个快捷键,option+r,然后直接输入startapp APP名称



9.app目录介绍

APP目录

|--migrations 数据库操作记录目录

--__init__.py

--admin.py #django后台管理配置

--apps.py # 配置当前app

--models.py # django后台管理数据库表管理文件,创建指定的类,models可以创建表结构

--tests.py # 单元测试

--views.py # 写和当前app相关的所有业务代码

10.django的html模板目录的配置

依次找到:工程目录-->settings.py文件 修改TEMPLATES列表中的DIRS对应的值为“[os.path.join(BASE_DIR,'创建的模板目录名字')]”

11.django的静态文件目录的配置

依次找到:工程目录-->settings.py文件 最后面增加STATICFILES_DIRS ,切记最后一定要逗号

STATICFILES_DIRS = (
    os.path.join(BASE_DIR, 'static'),
)

12. django返回数据的时候如果返回一个html的时候,我们的想法是,打开一个html读出数据来,把整个数据返回给客户端,但是这种模式假如有好多访问需要不断的我们来打开文件,这样很麻烦;django后来给我们封装了一个方法我们引入render这个方法进行返回就行了,其实他内部也是打开返回这么操作的。

from django.shortcuts import render

If we want to redirect to another site instead, we import the redirect helper:

from django.shortcuts import redirect
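Putting the two helpers together, a login view might look like the sketch below (the credentials check, template name, and redirect target are made up for illustration, and it needs a Django project to run):

```python
from django.shortcuts import render, redirect


def login(request):
    if request.method == 'POST':
        user = request.POST.get('username')
        pwd = request.POST.get('password')
        if user == 'alex' and pwd == '123':
            # Success: tell the browser to go somewhere else.
            return redirect('/home/')
        # Failure: re-render the form, passing the error message
        # that the template will substitute into {{ error_msg }}.
        return render(request, 'login.html',
                      {'error_msg': 'wrong username or password'})
    return render(request, 'login.html')
```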

Let's walk through the rough flow of an incoming request:

(diagram)

To embed a placeholder keyword in HTML, use double curly braces:

(screenshot)

The body tag contains the placeholder {{error_msg}}. The name error_msg is one we define ourselves, but the double curly braces are fixed syntax. When Django is about to return this HTML to the browser, it first parses the template (in plain terms, it replaces the keywords). Replaces them with what?

(screenshot)

As you can see, if the login fails, the backend defines a variable error_msg = 'wrong username or password' and returns an HTML page, passing error_msg along as a value. Django then substitutes that value for the keyword defined earlier in the template.
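What Django does here can be illustrated outside the framework with a few lines of plain Python; render_placeholders below is a made-up helper, not Django's API:

```python
import re


def render_placeholders(template, context):
    # Replace every {{ name }} with the matching value from the context
    # dict, mimicking (very roughly) what the template engine does.
    def substitute(match):
        key = match.group(1)
        return str(context.get(key, ''))
    return re.sub(r'\{\{\s*(\w+)\s*\}\}', substitute, template)


html = '<body><h1>{{ error_msg }}</h1></body>'
print(render_placeholders(html, {'error_msg': 'wrong username or password'}))
# → <body><h1>wrong username or password</h1></body>
```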

Template loops:

The data passed back:

USER_LIST = [
    {'username': 'alex', 'email': 'alex3714@163.com', 'gender': '男'},
    {'username': 'eriuc', 'email': 'yinjiao@163.com', 'gender': '男'},
    {'username': 'tom', 'email': 'tom@163.com', 'gender': '女'},
]
return render(request, 'home.html', {'user_list': USER_LIST})

After the html template receives user_list, it loops over it. Each row is a dict, and the template language reads values out of a dict using dot notation:

{% for row in user_list %}
<tr>
    <td>{{ row.username }}</td>
    <td>{{ row.gender }}</td>
    <td>{{ row.email }}</td>
</tr>
{% endfor %}
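A rough plain-Python picture of what the engine does with this loop (an illustration, not Django code):

```python
USER_LIST = [
    {'username': 'alex', 'email': 'alex3714@163.com', 'gender': '男'},
    {'username': 'eriuc', 'email': 'yinjiao@163.com', 'gender': '男'},
    {'username': 'tom', 'email': 'tom@163.com', 'gender': '女'},
]


def render_rows(user_list):
    # Expand the {% for %} block: one <tr> per dict, reading the same
    # keys the template accesses via dot notation.
    rows = []
    for row in user_list:
        rows.append('<tr><td>{username}</td><td>{gender}</td>'
                    '<td>{email}</td></tr>'.format(**row))
    return '\n'.join(rows)


print(render_rows(USER_LIST))
```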

The code example used with Django in this section:

"""onedjango URL Configuration The `urlpatterns` list routes URLs to views. For more information please see: https://docs.djangoproject.com/en/1.10/topics/http/urls/ Examples: Function views 1. Add an import: from my_app import views 2. Add a URL to urlpatterns: url(r'^$', views.home, name='home') Class-based views 1. Add an import: from other_app.views import Home 2. Add a URL to urlpatterns: url(r'^$', Home.as_view(), name='home') Including another URLconf 1. Impo

Python Web Frameworks

Structures (source: Eos Maia).

Introduction

At the time of this writing, the web development landscape is dominated by JavaScript tools. Frameworks like ReactJS and AngularJS are very popular, and many things that were previously done on the server are handled on the client side by these frameworks. This is not limited to the client. Server-side JavaScript frameworks like NodeJS are also prominent.

Does that mean that languages like Python should throw in the towel and forget about web applications? On the contrary. Python is a very powerful language that is easy to learn and provides a fast development pace. It has many mature libraries for web-related tasks, from object-relational mapping (ORM) to web scraping. Python is also a fabulous "glue" language for making disparate technologies work together. In this era where JSON APIs and communication with multiple systems are so important, Python is a great choice for server-side web development. And it's great for full-scale web applications, too!

There are many web frameworks for Python; some provide more facilities than others, some offer a greater degree of flexibility or more extensibility. Some try to provide everything you need for a web application and require the use of very specific components, whereas others focus on giving you the bare minimum so that you can pick only the components your application needs.

Among these frameworks, there are dozens that have a significant number of users. How do newcomers to the language choose the right one for their needs? The easiest criterion would probably be popularity, and there are two or three frameworks that will easily be found doing web searches or asking around. This is far from ideal, however, and leaves the possibility of overlooking a framework that is better suited to a developer's needs, tastes, or philosophy.

In this report, we will survey the Python web framework landscape, giving aspiring web developers a place to start their selection process for a web framework. We will look in some detail at a few of the available frameworks, as well as give pointers about how to pick one, and even how to go about creating your own.

Hopefully, this will make it easier for new developers to find what's available, and maybe give experienced Python developers an idea or two about how other web frameworks do things.

What Do Web Frameworks Do?

A web application is not a standalone program but part of the web "pipeline" that brings a website to a user's browser. There's much more than your application code working under the hood to make the web work, and having a good understanding of the other pieces of the puzzle is key to being a good web developer. In case you are new to web development or need a refresher, take a look at Python Web Development Fundamentals to get your bearings.

When writing a web application, in addition to writing the code that does the "business logic" work, it's necessary to figure out things like which URL runs which code, plus take care of things like security, sessions, and sending back attractive and functional HTML pages. For a web service, perhaps we need a JSON rendering of the response instead of an HTML page. Or we might require both.

No matter what our application does, these are parts of it that very conceivably could be used in other, completely different applications. This is what a web framework is: a set of features that are common to a wide range of web applications.

Exactly which set of features a framework provides can vary a lot among frameworks. Some frameworks offer a lot of functionality, including URL routing, HTML templating systems, ORMs to interact with relational databases, security, sessions, form generation, and more. These are sometimes referred to as full-stack frameworks .

Other frameworks, known by many as micro frameworks , offer a much less varied set of features and focus on simplicity. They usually offer URL routing, templating, and not much else.

This emphasis on size (micro and full-stack) can sometimes be confusing. Are we referring to the framework's codebase? Are micro frameworks for small applications and full-stack frameworks for large applications? Also, not all frameworks easily fit into one of these categories. If a framework has lots of features but makes most of them optional, does that still count as full-stack?

From an experienced developer point of view, it could make sense to examine frameworks in terms of decisions made . Many features offered by frameworks, like which ORM it supports or which templating system it's bundled with, imply a decision to use that specific tool instead of other similar components.

Obviously, the more decisions made by the framework, the fewer decisions the developer needs to make. That means more reliance on the way the framework works, more knowledge of how its parts fit together, and more integrated behavior, all within the confines of what the web framework considers a web application. Conversely, if a developer needs to make more decisions, they'll have more work to do, but they will also have more control over their application, and can concentrate on the parts of a framework they specifically need.

Even if the framework makes these decisions, most of them are not set in stone. A developer can change these decisions, maybe by replacing certain components or libraries. The trade-off is losing some framework functionality in return for that freedom.

There are many Python web frameworks. Besides size and decisions made for the developer, many of them offer unique features or special twists on what a web application should do. Some developers will immediately feel attracted to some frameworks, or conclude after some analysis that one of them is better suited for the specific project they have in mind. Regardless of the chosen framework, it's always a good idea to be aware of the variety of other available frameworks so that a better choice can be made if necessary.

Python Web Framework Landscape

There are many options for building web applications with Python. Python's powerful yet flexible nature makes it perfect for this task. It's a good idea to know what's available before going in that direction, though. Perhaps one of the many existing options will suit your needs and save you a ton of work.

To make it easier to know at a glance what frameworks are out there, the following list shows 30 web frameworks that are active and have more than 1,000 monthly downloads at the time of this writing. For each framework, the list presents the following information:

Slogan
A short phrase that comes from the framework's site or documentation and attempts to convey the spirit of the framework according to its creators.

Description
In a nutshell, what this framework is and why you should use it.

Author
Main author, according to the Python Package Index.

Website
Official website of the framework, or code repository if no site is available.

Relative popularity
A very crude attempt at gauging a project's popularity, by normalizing the number of monthly downloads and generating a score. Its purpose is only to give the reader a general idea about how one framework compares to another in terms of number of users. For example, Django, which is the Python framework with the largest number of downloads, has 10 stars. At the other end of the spectrum, BlueBream, which is barely above 1,000 downloads, has one star. This popularity scale should not be taken too seriously.

Python versions
Shows the versions of Python that the framework runs on.

License
Shows the license under which the framework is distributed.

Documentation
This is a key part of any framework, because the more you know about how to use it, the quicker you can get started and take advantage of its features. Some people learn by example, so having tutorials and sample code can be very helpful too, both for beginners and more advanced users. For each framework, documentation is graded using a very simple scale: poor, adequate, extensive, or comprehensive. Again, this is very subjective and only meant as a simple guide to know what to expect.

Features
A short list of what the framework's authors consider its best features.

Other resources
This refers to resources other than web pages to get help and information for a framework, like mailing lists and IRC channels.

Persistence
Many web applications require a storage layer of some sort, usually a database. Because of this, most web frameworks are designed to use one or more specific data persistence options.

Templating
This is another very common feature of web frameworks. The HTML markup for an application page is usually written in a templating language.

Web Framework List

Appier

Joyful Python Web App development.

Appier is an object-oriented Python web framework built for super-fast app development. It’s as lightweight as possible, but not too lightweight. It gives you the power of bigger frameworks, without the complexity.

Author

Hive Solutions Lda.

Website

http://appier.hive.pt

Relative popularity

*****

Python versions

2.6 to 3.5

License

Apache

Documentation

Adequate

Other resources

None

Persistence

MongoDB

Templating

Jinja2

Features

- REST dispatching
- JSON response encoding
- Admin interface
- i18n

Aspen

A Python web framework that makes the most of the filesystem. Simplates are the main attraction.

Aspen maps your URLs directly to the filesystem. It’s way simpler than regular expression routing or object traversal.

Author

Gratipay, LLC

Website

http://aspen.io

Relative popularity

******

Python versions

2.6, 2.7

License

MIT

Documentation

Adequate

Other resources

IRC

Persistence

Any

Templating

Python, Jinja2, Pystache

Features

- Simplates: code and template in same file, with structure
- JSON helpers
- Filesystem-based URL mapping

BlueBream

The Zope Web Framework.

BlueBream is an open source web application server, framework, and library created by the Zope community and formerly known as Zope 3. It is best suited for medium to large projects split into many interchangeable and reusable components.

Author

Zope Foundation and Contributors

Website

http://bluebream.zope.org

Relative popularity

*

Python versions

2.6, 2.7

License

ZPL

Documentation

Extensive

Other resources

Mailing list

Persistence

ZODB

Templating

ZPT

Features

- Built on top of Zope 3
- Full stack, but with distributed architecture
- Mature, well-tested components
- Object database

Bobo

Web application framework for the impatient.

Bobo is a lightweight framework for creating WSGI web applications. Its goal is to be easy to use and remember.

Author

Jim Fulton

Website

http://bobo.digicool.com

Relative popularity

*

Python versions

2.6 to 3.5

License

ZPL

Documentation

Extensive

Other resources

Mailing list

Persistence

Any

Templating

Any

Features

- Subroutes for multiple-step URL matching
- JSON request bodies
- Automatic response generation, based on return value

Bottle

Fast and simple WSGI framework for small web applications.

Bottle is a fast, simple, and lightweight WSGI micro web framework for Python. It is distributed as a single-file module and has no dependencies other than the Python Standard Library.

Author

Marcel Hellkamp

Website

http://bottlepy.org

Relative popularity

******

Python versions

2.6 to 3.5

License

MIT

Documentation

Extensive

Other resources

Mailing list, IRC, Twitter

Persistence

Any

Templating

Simple templates, Jinja2, Mako, Cheetah

Features

- HTTP utilities
- Single file distribution

CherryPy

A Minimalist Python Web Framework.

CherryPy allows developers to build web applications in much the same way they would build any other object-oriented Python program.

Author

CherryPy Team

Website

http://www.cherrypy.org

Relative popularity

******

Python versions

2.6 to 3.5

License

BSD

Documentation

Comprehensive

Other resources

Mailing list, IRC

Persistence

Any

Templating

Any

Features

- Authorization, sessions, static content, and more
- Configuration system
- Plugin system
- Profiling and coverage support

Clastic

A functional Python web framework that streamlines explicit development practices while eliminating global state.

Clastic was created to fill the need for a minimalist web framework that does exactly what you tell it to, while eliminating common pitfalls and delays in error discovery.

Author

Mahmoud Hashemi

Website

https://github.com/mahmoud/clastic

Relative popularity

*

Python versions

2.6, 2.7

License

BSD

Documentation

Basic

Other resources

None

Persistence

Any

Templating

Any

Features

- No global state
- Proactive URL route checking
- Improved middleware paradigm

Cyclone

Facebook's Tornado on top of Twisted.

Cyclone is a web server framework for Python that implements the Tornado API as a Twisted protocol.

Author

Alexandre Fiori

Website

https://cyclone.io

Relative popularity

***

Python versions

2.6, 2.7

License

Apache

Documentation

Adequate

Other resources

None

Persistence

Twisted adbapi, redis, sqlite, mongodb

Templating

Cyclone templates

Features

- Asyncio stack
- Command-line integration

Django

The web framework for perfectionists with deadlines.

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.

Author

Django Software Foundation

Website

https://djangoproject.com

Relative popularity

**********

Python versions

2.6 to 3.5

License

BSD

Documentation

Comprehensive

Other resources

Mailing lists, IRC

Persistence

Django ORM

Templating

Django templates, Jinja2

Features

- Fully loaded: authentication, site maps, feeds, etc.
- Superbly documented
- Extensible admin interface
- Security-minded

Falcon

An unladen web framework for building APIs and app backends.

Falcon is a minimalist, high-performance web framework for building RESTful services and app backends with Python.

Author

Kurt Griffiths

Website

http://falconframework.org

Relative popularity

****

Python versions

2.6 to 3.5

License

Apache

Documentation

Extensive

Other resources

Mailing list, IRC

Persistence

Any

Templating

Any

Features

- Web service oriented
- Focused on performance

Fantastico

Pluggable, developer-friendly content publishing framework for Python 3 developers.

Python 3 MVC web framework with built-in capabilities for developing web services and modular web applications.

Author

Radu Viorel Cosnita

Website

https://github.com/rcosnita/fantastico/

Relative popularity

*

Python versions

3.3, 3.4, 3.5

License

MIT

Documentation

Adequate

Other resources

None

Persistence

Fantastico ORM

Templating

Any

Features

- Extensible routing engine
- ORM
- Dynamic content generation

Flask

Web development one drop at a time.

A micro framework based on Werkzeug, Jinja2, and good intentions.

Author

Armin Ronacher

Website

http://flask.pocoo.org

Relative popularity

*********

Python versions

2.6, 2.7, 3.3, 3.4, 3.5

License

BSD

Documentation

Comprehensive

Other resources

Mailing list, IRC

Persistence

Any

Templating

Jinja2

Features

- Built-in debugger
- RESTful request dispatching
- Allows modular applications with plugins
- Extensible

Giotto

Web development simplified. An MVC framework supporting Python 3.

Giotto is a Python web framework. It encourages a functional style where model, view, and controller code is strongly decoupled.

Author

Chris Priest

Website

http://giotto.readthedocs.org

Relative popularity

**

Python versions

2.7, 3.3, 3.4, 3.5

License

Own

Documentation

Adequate

Other resources

Google group

Persistence

SQLAlchemy

Templating

Jinja2

Features

- Generic views and models
- Functional CRUD patterns
- Automatic RESTful interface
- Automatic URL routing

Grok

A smashing web framework.

Grok uses the Zope Component Architecture and builds on Zope concepts like content objects (models), views, and adapters. Its simplicity lies in using convention over configuration and sensible defaults when wiring components together.

Author

Grok Team

Website

http://grok.zope.org

Relative popularity

***

Python versions

2.6, 2.7

License

ZPL

Documentation

Extensive

Other resources

Mailing list

Persistence

ZODB

Templating

Zope page templates

Features

- Convention over configuration
- Takes advantage of full Zope toolkit
- Object-oriented database

kiss.py

MVC web framework in Python with Gevent, Jinja2, and Werkzeug.

Author

Stanislav Feldman

Website

http://stanislavfeldman.github.io/kiss.py

Relative popularity

*

Python versions

2.6, 2.7

License

Own

Documentation

Poor

Other resources

None

Persistence

Pewee

Templating

Jinja2

Features

- Integration with Gevent
- REST controllers
- Minified templates

Klein

Werkzeug + twisted.web.

Klein is a micro framework for developing production-ready web services with Python. It’s built on widely used and well-tested components like Werkzeug and Twisted.

Author

Amber Brown

Website

http://klein.readthedocs.org

Relative popularity

*****

Python versions

2.6 to 3.5

License

MIT

Documentation

Adequate

Other resources

IRC

Persistence

Any

Templating

Twisted templates

Features

- Focus on web services
- Integrates Twisted concepts like deferreds

Morepath

A micro web framework with superpowers.

Morepath is a Python WSGI micro framework. It uses routing, but the routing is to models. Morepath is model-driven and flexible, which makes it expressive.

Author

Martijn Faassen

Website

http://morepath.readthedocs.org/

Relative popularity

**

Python versions

2.6 to 3.5

License

BSD

Documentation

Extensive

Other resources

Mailing list, IRC

Persistence

Any

Templating

Any

Features

- Automatic hyperlinks that don't break
- Generic UIs
- Simple, flexible permissions
- Easy to extend and override

Muffin

Web framework based on Asyncio stack.

Muffin is a fast, simple, and asynchronous web framework for Python 3.

Author

Kirill Klenov

Website

https://github.com/klen/muffin

Relative popularity

*****

Python versions

2.6 to 3.5

License

MIT

Documentation

Poor

Other resources

None

Persistence

Any

Templating

Any

Features

- Asyncio stack
- Command-line integration

Pylons

A framework to make writing web applications in Python easy.

Pylons 1.0 is a lightweight web framework emphasizing flexibility and rapid development.

Author

Ben Bangert, Philip Jenvey, James Gardner

Website

http://www.pylonsproject.org/projects/pylons-framework/

Relative popularity

****

Python versions

2.6, 2.7

License

BSD

Documentation

Extensive

Other resources

Mailing lists, IRC

Persistence

SQLAlchemy

Templating

Mako, Genshi, Jinja2

Features

- Uses existing and well-tested Python packages
- Extensible application design
- Minimalist, component-based philosophy

Pyramid

The start small, finish big, stay finished framework.

Pyramid is a general, open source, Python web application development framework. Its primary goal is to make it easier for a Python developer to create web applications.

Author

Chris McDonough, Agendaless Consulting

Website

https://trypyramid.com

Relative popularity

******

Python versions

2.6 to 3.5

License

BSD derived

Documentation

Comprehensive

Other resources

Mailing lists, IRC

Persistence

Any

Templating

Any

Features

- Powerful configuration system
- Overridable asset specifications
- Extensible templating
- Flexible view and rendering systems

Tornado

A Python web framework and asynchronous networking library, originally developed at FriendFeed.

A simple web framework with asynchronous features that allow it to scale to large numbers of open connections, making it ideal for long polling.

Author

Facebook

Website

http://www.tornadoweb.org

Relative popularity

*********

Python versions

2.6 to 3.5

License

Apache

Documentation

Adequate

Other resources

Mailing list, wiki

Persistence

Any

Templating

Tornado templates

Features

- Ideal for long-polling and websockets
- Can scale to tens of thousands of open connections

TurboGears

The web framework that scales with you.

TurboGears is a Python web framework based on the ObjectDispatch paradigm. It is meant to make it possible to write both small and concise applications in Minimal mode or complex applications in Full Stack mode.

Author

TurboGears Release Team

Website

http://www.turbogears.org

Relative popularity

***

Python versions

2.6, 2.7, 3.3, 3.4, 3.5

License

MIT

Documentation

Extensive

Other resources

Mailing list, IRC, Google+

Persistence

SQLAlchemy

Templating

Genshi

Features

- From micro framework to full-stack applications
- Pluggable applications
- Widget system
- Horizontal data partitioning

Twisted

Building the engine of your Internet.

An extensible framework for Python programming, with special focus on event-based network programming and multiprotocol integration. Twisted includes twisted.web, a web application server based on the concept of resources.

Author

Glyph Lefkowitz

Website

https://twistedmatrix.com

Relative popularity

*******

Python versions

2.6 to 3.5

License

MIT

Documentation

Adequate

Other resources

Mailing list, IRC

Persistence

Any

Templating

twisted.web.template

Features

- Takes advantage of Twisted networking power
- Allows "spreadable" web servers (multiple servers answer requests on same port)
- Can use any WSGI application as a resource

Uliweb

Unlimited Python web framework.

Uliweb is a full-stacked Python-based web framework. It has three main design goals: reusability, configurability, and replaceability. Its functionality revolves around these goals.

Author

Limodou

Website

http://limodou.github.io/uliweb-doc/

Relative popularity

*

Python versions

2.6, 2.7

License

BSD

Documentation

Adequate

Other resources

Mailing list

Persistence

Uliorm

Templating

Uliweb

Features

- Based on SQLAlchemy and Werkzeug
- Extensible
- Command-line tools

Watson

It's elementary, my dear Watson.

A framework designed to get out of your way and let you code your application rather than spend time wrangling with the framework. It follows the "convention over configuration" ideal.

Author

Simon Coulton

Website

http://watson-framework.readthedocs.org

Relative popularity

*

Python versions

3.3, 3.4, 3.5

License

Own

Documentation

Adequate

Other resources

Mailing list

Persistence

Any

Templating

Jinja2

Features

- Event-based
- Dependency injection
- Form library

web.py

Think about the ideal way to write a web app. Write the code to make it happen.

web.py is a web framework for Python that is as simple as it is powerful.

Author

Anand Chitipothu

Website

http://webpy.org

Relative popularity

*****

Python versions

2.6, 2.7

License

Public Domain

Documentation

Adequate

Other resources

Mailing list

Persistence

web.database

Templating

Templetor

Features

- Simple to use
- Own database and template libraries
- Form library

web2py

Everything in one package with no dependencies.

Free open source full-stack framework for rapid development of fast, scalable, secure, and portable database-driven web-based applications.

Author

Massimo Di Pierro

Website

http://web2py.com

Relative popularity

*

Python versions

2.6, 2.7

License

LGPL 3

Documentation

Extensive

Other resources

Mailing list

Persistence

DAL

Templating

web2py

Features

- Requires no installation and no configuration
- Web-based IDE
- Everything included; no dependencies
- Always backward-compatible

webapp2

Taking Google App Engine's webapp to the next level!

webapp2 is a lightweight Python web framework compatible with Google App Engine’s webapp.

Author

Rodrigo Moraes

Website

http://webapp-improved.appspot.com/

Relative popularity

****

Python versions

2.6, 2.7

License

Apache

Documentation

Extensive

Other resources

Mailing list

Persistence

Google datastore

Templating

Jinja2, Mako

Features

- Compatible with webapp
- Better URI routing and exception handling
- Extras package with optional utilities

WebPages

A Python web framework.

This project was designed for web developers who want to do more in less time. To create a new project with Hello World and a database connection, you only need a few minutes.

Author

Anton Danilchenko

Website

https://github.com/webpages/webpages

Relative popularity

*

Python versions

3.3, 3.4, 3.5

License

MIT

Documentation

Poor

Other resources

Facebook

Persistence

WebPages ORM

Templating

WebPages templates

Features

- Convention over configuration
- Settings per component
- User authentication out of the box
- ORM with simplified syntax

wheezy.web

Python's fastest web framework.

A lightweight, high-performance, high-concurrency WSGI web framework with the key features to build modern, efficient web applications.

Author

Andriy Kornatskyy

Website

http://wheezyweb.readthedocs.org

Relative popularity

***

Python versions

2.6 to 3.5

License

MIT

Documentation

Adequate

Other resources

None

Persistence

Any

Templating

Jinja2, Mako, Tenjin, Wheezy

Features

- High performance
- Authentication/authorization
- Model update/validation

Some Frameworks to Keep an Eye On

As we have seen, there are many Python web frameworks to choose from. In fact, there are too many to be able to cover every one in detail in this report. Instead, we will take a deeper look at six of the most popular. There is enough diversity here to give the reader some idea about how different frameworks work and what a web application's code looks like when using them.

For each framework, we are going to give a general description, discuss some key features, look at some sample code, and talk a bit about when it should be used. When possible, code for a simple single-file application will be shown. Quick start instructions assume Python and pip or easy_install are present on the system. It is also recommended that you use virtualenv (or pyvenv for Python 3.3+) to create an isolated environment for your application. For simplicity, the examples do not show the setup of pip and virtualenv. See Python Web Development Fundamentals for help with any of these tools.

Django

Django is without a doubt the most popular web framework for Python at the time of this writing. Django is a high-level framework, designed to take care of most common web application needs.

Django makes a lot of decisions for you, from code layout to security. It's also very well documented, so it's very easy to get a project off the ground quickly. There are also many third-party applications that can complement its many features nicely.

Django is very well-suited for database-driven web applications. Not only does it include its own object-relational mapping (ORM), but it can do automatic form generation based on the schemas and even helps with migrations. Once your models are defined, a rich Python API can be used to access your data.

Django also offers a dynamic administrative interface that lets authenticated users add, change, and delete objects. This makes it possible to get a nice-looking admin site up very early in the development cycle, and start populating the data and testing the models while the user-facing parts of the application are taking shape.

In addition to all this, Django has a clean and simple way of mapping URLs to code (views), and deals with things like caching, user profiles, authentication, sessions, cookies, internationalization, and more. Its templating language is simple, but it's possible to create custom tags for more advanced needs. Also, Django now supports Jinja2, so there's that option if you require a bit more powerful templating.

Quick Start

To install Django:

$ pip install Django

Unlike the other frameworks discussed in this chapter, Django is a full-stack framework, so we won't show a code listing for a Hello World application. While it's possible to create a Django application in a single file, this goes against the way Django is designed, and would actually require more knowledge about the various framework pieces than a complete application.

Django organizes code inside a project. A project has a configuration, or settings, plus a set of URL declarations. Since Django is intended for working with relational databases, the settings usually include database configuration information. Inside the project, there is a command-line utility, named manage.py, for interacting with it in various ways. To create a project:

$ django-admin startproject mysite

A project can contain one or more applications. An application is an ordinary Python package where Django looks for some things. An application can contain the database models, views, and admin site registrations that will make your models be part of the automatic admin interface. The basic idea in Django is that an application performs one defined task.

Representative Code

Django's database integration is one of its strong suits, so let's take a look at a few examples of that.

Defining models

from django.db import models


class Author(models.Model):
    first_name = models.CharField(max_length=70)
    last_name = models.CharField(max_length=70)

    def __str__(self):
        return self.full_name

    @property
    def full_name(self):
        return '{} {}'.format(self.first_name, self.last_name)


class Book(models.Model):
    author = models.ForeignKey(Author)
    title = models.CharField(max_length=200)
    description = models.TextField()
    pub_date = models.DateField()

    def __str__(self):
        return self.title

A model contains data about one single part of your application. It will usually map to a single database table. A model is defined in a class that subclasses django.db.models.Model. There are several types of fields, like CharField, TextField, DateField, etc. A model can have as many fields as needed, which are added simply by assigning them to attributes of the model. Relationships can be expressed easily, like in the author field in the Book model above, which uses a ForeignKey field to model a many-to-one relationship.

In addition to fields, a model can have behaviors, or "business logic." This is done using instance methods, like full_name in our sample code. All models also include automatic methods, which can be overridden if desired, like the __str__ method in the example, which gives a string representation for a model in Python 3.
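Once these models exist, data access reads like plain Python. The sketch below assumes the Author and Book models live in a hypothetical app named myapp; it needs a configured Django project to actually run:

```python
from datetime import date

from myapp.models import Author, Book  # hypothetical app name

# Create and save rows without writing SQL.
author = Author.objects.create(first_name='Ursula', last_name='Le Guin')
Book.objects.create(author=author, title='The Dispossessed',
                    description='An ambiguous utopia.',
                    pub_date=date(1974, 5, 1))

# Field lookups express WHERE clauses; relationships are traversed
# directly from instances.
for book in Book.objects.filter(pub_date__year=1974):
    print(book.title, book.author.full_name)
```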

Registering models with the admin interface

from django.contrib import admin

from mysite.myapp.models import Book


class BookAdmin(admin.ModelAdmin):
    list_display = ['title', 'author', 'pub_date']
    list_filter = ['pub_date']
    search_fields = ['title', 'description']


admin.site.register(Book, BookAdmin)

To make your models appear in the Django admin site, you need to register them. This is done by inheriting from django.contrib.admin.ModelAdmin , customizing the display and behavior of the admin, and registering the class, like we do in the last line of the previous example. That's all the code needed to get a polished interface for adding, changing, and removing books from your site.

Django's admin is very flexible, and has many customization hooks. For example, the list_display attribute takes a list of fields and displays their values in columns for each row of books, rather than just showing the result of the __str__() method. The hooks are not just for display purposes. You can add custom validators, or define actions that operate on one or more selected instances and perform domain specific transformations on the data.
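As a hedged sketch of one of those hooks, an admin action is just a function taking the model admin, the request, and the selected queryset. The make_published name and the status field below are hypothetical, not part of the Book model above:

```python
# Illustrative admin action; assumes a hypothetical 'status' field
# on the model. The function receives the queryset of selected rows.
def make_published(modeladmin, request, queryset):
    """Mark every selected row as published in a single query."""
    queryset.update(status='published')

# In a real admin.py it would be registered on the ModelAdmin:
# class BookAdmin(admin.ModelAdmin):
#     actions = [make_published]
```

Once registered, the action appears in a dropdown above the change list, and Django calls it with whatever rows the administrator selected.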

Views

from django.shortcuts import render
from .models import Book

def publication_by_year(request, year):
    books = Book.objects.filter(pub_date__year=year)
    context = {'year': year, 'book_list': books}
    return render(request, 'books/by_year.html', context)

A view in Django can be a simple method that takes a request and zero or more URL parameters. The view is mapped to a URL using Django's URL patterns. For example, the view above might be associated with a pattern like this:

url(r'^books/([0-9]{4})/$', views.publication_by_year)

This is a simple regular expression pattern that will match any four-digit number after "books/" in the URL. Let's say it's a year. This number is passed to the view, where we use the model API to filter all existing books with this year in the publication date. Finally, the render method is used to generate an HTML page with the result, using a context object that contains any results that need to be passed to the template.

Automated Testing

Django recommends using the unittest module for writing tests, though any testing framework can be used. Django provides some tools to help write tests, like TestCase subclasses that add Django-specific assertions and testing mechanisms. It has a test client that simulates requests and lets you examine the responses.

Django's documentation has several sections dedicated to testing applications, giving detailed descriptions of the tools it provides and examples of how to test your applications.

When to Use Django

Django is very good for getting a database-driven application done really quickly. Its many parts are very well integrated and the admin site is a huge time saver for getting site administrators up and running right away.

If your data is not relational or you have fairly simple requirements, Django's features and parts can be left just sitting there, or even get in the way. In that case, a lighter framework might be better.

Flask

Flask is a micro framework. Micro refers to the small core of the framework, not the ability to create single-file applications. Flask basically provides routing and templating, wrapped around a few configuration conventions. Its objective is to be flexible and allow the user to pick the tools that are best for their project. It provides many hooks for customization and extensions.

Flask curiously started as an April Fool's joke, but it's in fact a very serious framework, heavily tested and extensively documented. It features integrated unit testing support and includes a development server with a powerful debugger that lets you examine values and step through the code using the browser.

Flask is unicode-based and supports the Jinja2 templating engine, which is one of the most popular for Python web applications. Though it can be used with other template systems, Flask takes advantage of Jinja2's unique features, so it's really not advisable to do so.
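To see why Jinja2 is a good fit, here is a minimal sketch of the template language used standalone; in a Flask view you would call render_template with a file from the templates/ folder instead, but the syntax is the same:

```python
from jinja2 import Template

# A tiny inline Jinja2 template. Flask's render_template uses the
# same {{ ... }} syntax but loads templates from the templates/ folder.
template = Template("Hello {{ name }}! You have {{ count }} messages.")
print(template.render(name="Ana", count=3))
```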

Flask's routing system is very well-suited for RESTful request dispatching, which is really a fancy name for allowing specific routes for specific HTTP verbs (methods). This is very useful for building APIs and web services.
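A sketch of what that dispatching looks like in practice; the /tasks resource and the in-memory list here are made up for illustration:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
tasks = []  # in-memory store, purely for illustration

@app.route('/tasks', methods=['GET'])
def list_tasks():
    # GET on the collection returns all tasks as JSON.
    return jsonify(tasks)

@app.route('/tasks', methods=['POST'])
def add_task():
    # POST creates a new task from the JSON request body.
    tasks.append(request.get_json())
    return jsonify(tasks[-1]), 201
```

The same URL maps to different view functions depending on the HTTP verb, which is the core of RESTful dispatching.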

Other Flask features include sessions with secure cookies, pluggable views, and signals (for notifications and subscriptions to them). Flask also uses the concept of blueprints for making application components.
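A minimal blueprint sketch (the admin_bp name and its single route are made up for illustration):

```python
from flask import Blueprint, Flask

# A blueprint groups related routes so they can be registered on
# any application, optionally under a URL prefix.
admin_bp = Blueprint('admin', __name__)

@admin_bp.route('/dashboard')
def dashboard():
    return 'Admin dashboard'

app = Flask(__name__)
app.register_blueprint(admin_bp, url_prefix='/admin')
```

Registering the same blueprint under different prefixes, or on different applications, is what makes it a useful unit of composition.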

Quick Start

To install Flask:

$ pip install Flask

Flask "Hello World"

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

The Flask class is used to create an instance of a WSGI application, passing in the name of the application's package or module. Once we have a WSGI application object, we can use Flask's specific methods and decorators.

The route decorator is used to connect a view function with a URL (in this case, the root of the site). This is a very simple view that just returns a string of text.

Finally, we use the common Python idiom for executing some code when a script is called directly by the interpreter, where we call app.run() to start the development server.

Representative Code

Let's look at a few examples of what Flask code looks like inside real applications.

Per request connections

@app.before_request
def before_request():
    g.db = connect_db()

@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()

It's common to need some resources on a per-request basis, such as connections to services like Redis, Salesforce, or databases. Flask provides various decorators to set this up easily. In the example above, we assume that the connect_db method is defined somewhere else and takes care of connecting to an already initialized database. Any function decorated with before_request will be called before a request, and we use that call to store the database connection in the special g object provided by Flask.

To make sure that the connection is closed at the end of the request, we can use the teardown_request decorator. Functions decorated with this are guaranteed to be executed even if an exception occurs. In fact, if this happens, the exception is passed in. In this example, we don't care if there's an exception; we just try to get the connection from g , and if there is one, we close it.

There's also an after_request decorator, which gets called with the response as a parameter and must return that or another response object.
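For instance, an after_request hook could stamp every response with a header; this is a sketch, and the header name is arbitrary:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'ok'

@app.after_request
def add_header(response):
    # The hook must return the (possibly modified) response object.
    response.headers['X-App-Version'] = '1.0'
    return response
```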

Sessions

from flask import (Flask, session, redirect, url_for,
                   escape, request, render_template)

app = Flask(__name__)

@app.route('/')
def index():
    if 'username' in session:
        return 'Logged in as %s' % escape(session['username'])
    return 'You are not logged in'

@app.route('/login', methods=['GET', 'POST'])
def login():
    if request.method == 'POST':
        session['username'] = request.form['username']
        return redirect(url_for('index'))
    return render_template('login.html')

@app.route('/logout')
def logout():
    session.pop('username', None)
    return redirect(url_for('index'))

app.secret_key = 'A secret'

The session object allows you to store information specific to a user from one request to the next. Sessions are implemented using secure cookies, and thus need a key to be used.

The index view checks the session for the presence of a user name, and shows the logged-in state accordingly. The login view is a bit more interesting. It renders the login template if called with the GET method, and sets the session username variable if called with POST. The logout view simply removes the variable from the session, in effect logging out the user.

Views

@app.route('/')
def show_entries():
    cur = g.db.execute(
        'select title, text from entries order by id desc')
    entries = [dict(title=row[0], text=row[1])
               for row in cur.fetchall()]
    return render_template('show_entries.html', entries=entries)

@app.route('/add', methods=['POST'])
def add_entry():
    if not session.get('username'):
        abort(401)
    g.db.execute(
        'insert into entries (title, text) values (?, ?)',
        [request.form['title'], request.form['text']])
    g.db.commit()
    flash('New entry was successfully posted')
    return redirect(url_for('show_entries'))

Here we show how to define views. The route decorator that we saw in the quickstart application is used to connect the add_entry method with the /add URL. Note the use of the methods parameter to restrict that view to POST requests.

We examine the session to see if the user is logged in, and if not, abort the request. We assume that the request comes from a form that includes the title and text parameters, which we extract from the request to use in an insert statement. The request object referenced here has to be imported from flask , as in the sessions example.

Finally, a flash message is set up to display the change to the user, and the browser is redirected to the main show_entries view. This last view is simple, but it shows how to render a template, calling it with the template name and the context data required for rendering.

Automated Testing

Flask exposes the Werkzeug test client to make testing applications easier. It also provides test helpers to let tests access the request context as if they were views.

The documentation has a long section about testing applications. The examples use unittest , but any other testing tool can be used. Since Werkzeug is fully documented itself, there is very good information available about the test client too.
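A minimal sketch of what such a test looks like, reusing the quickstart app from earlier:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello World!'

# The test client simulates requests without starting a server.
def test_hello():
    client = app.test_client()
    response = client.get('/')
    assert response.status_code == 200
    assert b'Hello World!' in response.data
```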

When to Use Flask

Flask can be used to write all kinds of applications, but by design it's better for small- to medium-sized systems. It is also not ideal for composing multiple applications, because of its use of global variables. It's especially good for web APIs and services. Its small core allows it to be used as "glue" code for many data backends, and it's a very powerful companion for SQLAlchemy when dealing with database-driven web applications.

Tornado

Tornado is a combination of an asynchronous networking library and a web framework. It is intended for use in applications that require long-lived connections to their users.

Tornado has its own HTTP server based on its asynchronous library. While it's possible to use the web framework part of Tornado with WSGI, to take advantage of its asynchronous nature it's necessary to use it together with the web server.

In addition to typical web framework features, Tornado has libraries and utilities to make writing asynchronous code easier. Instead of depending on callbacks, Tornado's coroutines library allows a programming style more similar to synchronous code.

Tornado includes a simple templating language. Unlike other templating languages discussed here, in Tornado templates there are no restrictions on the kind of expressions that you can use. Tornado also has the concept of UI modules , which are special function calls to render UI widgets that can include their own CSS and JavaScript.
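A quick sketch of the template language in isolation, showing that arbitrary expressions are allowed inside the braces:

```python
from tornado import template

# Tornado templates allow arbitrary Python expressions inside {{ }};
# generate() returns bytes.
t = template.Template("Hello {{ name.upper() }}, 2 + 2 = {{ 2 + 2 }}")
print(t.generate(name="world"))
```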

Tornado also offers support for authentication and security, including secure cookies and CSRF protection. Tornado authentication includes support for third-party login systems, like Google, Facebook, and Twitter.

Quick Start

To install Tornado:

$ pip install tornado

Tornado "Hello World"

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.current().start()

First, we define a request handler, which will simply write our "Hello World" message in the response. A Tornado application usually consists of one or more handlers. The only prerequisite for defining a handler is to subclass from the tornado.web.RequestHandler class.

To route requests to the appropriate handlers and take care of global configuration options, Tornado uses an application object. In the example above, we can see how the application is passed the routing table, which in this case includes only one route. This route assigns the root URL of the site to the MainHandler created above.

Once we have an application object, we configure it to listen to port 8888 and start the asynchronous loop to serve our application. Note that there's no specific association of the application object we created and the ioloop , because the listen call actually creates an HTTP server behind the scenes.

Representative Code

Since Tornado's asynchronous nature is its main feature, let's see some examples of that.

Synchronous and asynchronous code

from tornado.httpclient import HTTPClient

def synchronous_fetch(url):
    http_client = HTTPClient()
    response = http_client.fetch(url)
    return response.body

from tornado.httpclient import AsyncHTTPClient

def asynchronous_fetch(url, callback):
    http_client = AsyncHTTPClient()
    def handle_response(response):
        callback(response.body)
    http_client.fetch(url, callback=handle_response)

from tornado import gen

@gen.coroutine
def fetch_coroutine(url):
    http_client = AsyncHTTPClient()
    response = yield http_client.fetch(url)
    raise gen.Return(response.body)

In these three short examples, we can see how Tornado uses asynchronous calls and how that compares with the normal, synchronous calls that we would use in a WSGI application.

In the first example, we use tornado.httpclient.HTTPClient to fetch a URL from somewhere in the cloud. This is the regular case, and the synchronous_fetch call will not return until the client gets the response back.

The second example uses the AsyncHTTPClient . The call will return immediately after the fetch call, which is why Tornado can scale more. The fetch method is passed a callback, which is a function that will be executed when the client gets a response back. This works, but it can lead to situations where you have to chain callbacks together, which can quickly become confusing.

For this reason, coroutines are the recommended way to write asynchronous code in Tornado. Coroutines take advantage of Python generators to be able to return immediately with no callbacks. In the fetch_coroutine method above, the gen.coroutine decorator takes care of waiting without blocking for the client to finish fetching the URL, and then passes the result to the yield.

Request handlers

class BaseHandler(tornado.web.RequestHandler):
    def get_current_user(self):
        return self.get_secure_cookie("user")

class MainHandler(BaseHandler):
    def get(self):
        if not self.current_user:
            self.redirect("/login")
            return
        name = tornado.escape.xhtml_escape(self.current_user)
        self.render("hello.html", title="Welcome", name=name)

class LoginHandler(BaseHandler):
    def get(self):
        self.render("login.html", title="Login Form")

    def post(self):
        self.set_secure_cookie("user", self.get_argument("name"))
        self.redirect("/")

application = tornado.web.Application([
    (r"/", MainHandler),
    (r"/login", LoginHandler)],
    cookie_secret="__TODO:_GENERATE_A_RANDOM_VALUE_HERE__")

Since request handlers are classes, you can use inheritance to define a base request handler with the basic behavior your whole application needs. In the previous example, the get_current_user method defined in BaseHandler is available to both handlers that subclass it.

A handler should have a method for every HTTP method that it can handle. In MainHandler , the GET method gets a look at the current user and redirects to the login handler if it is not set (remember that get_current_user is inherited from the base handler). If there's a user, its name is escaped before being passed to the template. The render method of a handler gets a template by name, optionally passes it some arguments, and renders it.

LoginHandler has both GET and POST methods. The first renders the login form, and the second sets a secure cookie with the name and redirects to the MainHandler . The Tornado handlers have several utility methods to help with requests. For example, the self.get_argument method gets a parameter from the request. The request itself can be accessed with self.request .

UI modules

class Entry(tornado.web.UIModule):
    def embedded_css(self):
        return ".entry { margin-bottom: 1em; }"

    def render(self, entry, show_comments=False):
        return self.render_string(
            "module-entry.html", entry=entry,
            show_comments=show_comments)

UI modules are reusable UI widgets that you can use across your application. They make it easy to design your page layouts using independent components. UI modules subclass from tornado.web.UIModule and must include a render method. In the example above, we define a UI module that represents a blog entry.

The render method can include arbitrary parameters, which usually will be passed on to the module template, like in the example above. A UI module can also include its own CSS and JavaScript. In our example, we use the embedded_css method to return some CSS to use for the entry class. There are also methods for embedding JavaScript and for pointing to CSS and JavaScript files.

Once the UI module is defined, we can call it within a template with:

{% module Entry(entry, show_comments=True) %}

Automated Testing

Tornado offers support classes for automated testing that allow developers to test asynchronous code. It has a simple test runner, which wraps unittest.main . It also has a couple of test helper functions.

Tornado's test module is documented, but there is no specific tutorial or narrative section devoted to testing.
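A hedged sketch of what such a test looks like, using a trivial handler made up for the occasion:

```python
import tornado.testing
import tornado.web

class PingHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("pong")

class PingTest(tornado.testing.AsyncHTTPTestCase):
    def get_app(self):
        # The test case starts this application on an unused port.
        return tornado.web.Application([(r"/ping", PingHandler)])

    def test_ping(self):
        # self.fetch runs the IOLoop until the response arrives,
        # so the test reads like synchronous code.
        response = self.fetch("/ping")
        self.assertEqual(response.code, 200)
        self.assertEqual(response.body, b"pong")
```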

When to Use Tornado

Tornado is a bit different from the other web frameworks discussed here, in that it goes hand in hand with asynchronous networking. It's ideal when you need websockets, long polling, or any other kind of long-lived connection. It can also help you scale your application to tens of thousands of open connections, provided your code is written to be asynchronous and nonblocking.

For more "regular" applications, like database-driven sites, using a WSGI framework is probably a better choice. Some of those frameworks also include a lot of features that the Tornado web framework does not have.

Bottle

Bottle is a true Python micro framework, in that it's actually distributed as a single file and has no dependencies outside of the Python standard library. It's lightweight and fast.

Bottle focuses on providing clean and simple URL routing and templating. It includes utilities for many web development needs, like access to form data, cookies, headers, and file uploads.